
Trying to strip_tags in searchresults

posts: 3510

Using version 1.9.4.

In tiki-searchresults.tpl, I want to strip out all the wiki tags, module code, etc. from the search results so that only the actual text appears. The TPL file already uses:

{$results[search].data|strip_tags}


However, this doesn't seem to work: the wiki markup is still shown in the search results. I experimented with


and it produced exactly the same results. Does anyone know how to display the text without any wiki formatting?


posts: 1

Here are my changes to achieve the same goal (from lib/search/searchlib.php):

// Markup fragments to remove from the excerpt, and what to replace each one with
$search1  = array('*', '!', '__', '((', '))');
$replace1 = array('', '', '', '', '');

while ($res = $result->fetchRow()) {
    // only list pages the current user may view
    if ($this->user_has_perm_on_object($user, $res["page"], 'wiki page', 'tiki_p_view')) {
        $href = "tiki-index.php?page=" . urlencode($res["page"]);
        $ret[] = array(
            'pageName'  => $res["page"],
            'location'  => tra("Wiki"),
            // take the first 250 characters of the page source, then strip the markup
            'data'      => str_replace($search1, $replace1, substr($res["data"], 0, 250)),
            'hits'      => $res["hits"],
            'lastModif' => $res["lastModif"],
            'href'      => $href,
            'relevance' => $res["count"]
        );
    }
}

Essentially, $search1 is an array of strings that I want to replace with the corresponding entries of the array $replace1 (here, all empty strings). The actual replacement is done a few lines down, in the str_replace() call.
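To try the replacement outside Tiki, here is a minimal standalone sketch of the same str_replace() technique. The function name strip_simple_markup() and the sample text are illustrative only; the marker list is the one from the snippet above.

```php
<?php
// Standalone sketch of the str_replace() technique above.
// strip_simple_markup() is a hypothetical name, not a Tiki function.
function strip_simple_markup(string $data, int $length = 250): string
{
    // Same markup fragments as $search1; each is replaced by the matching '' in $replace1
    $search1  = array('*', '!', '__', '((', '))');
    $replace1 = array('', '', '', '', '');

    // As in searchlib.php: cut the excerpt first, then strip the markup
    return str_replace($search1, $replace1, substr($data, 0, $length));
}

echo strip_simple_markup("!Intro\n__Bold__ text linking to ((HomePage))."), "\n";
```

Because str_replace() removes every occurrence, ordinary exclamation marks and asterisks in the page text disappear too (strip_simple_markup("Hello!") gives "Hello"). Also, since the text is truncated before stripping, the excerpt ends up shorter than 250 characters whenever markup is removed.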

Hope this helps,

posts: 3510

> Here are my changes to achieve the same goal (from lib/search/searchlib.php):


EXCELLENT! This is what I was looking for as a start. How can this be updated to also strip out the modules? I need to remove all the {......} as well.


posts: 80 Austria

Where do I put this to make it work correctly? Can you please post a line number?
Thank you.

posts: 3510

tmuth's solution works great for the index search (tiki-searchindex.php) but not for the database search (tiki-searchresults.php), where I kept hitting a String/Array error.

So, borrowing from tmuth, here's my hack:

  1. In tiki-searchresults.php add the following lines:
    • $search1= array('__','!','!!','!!!','^','{maketoc}','%%%','::');
    • $replace1= array('','','','','','','','');
  2. Then assign the variables to be used in the template:
    • $smarty->assign('search1', $search1);
    • $smarty->assign('replace1', $replace1);
  3. Finally, in the tiki-searchresults.tpl template file, replace:
    • {$results[search].data|strip_tags}

    with this:

    • {$results[search].data|replace:$search1:$replace1}

I'm sure there's a more elegant solution, but this seems to work for me.
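As one possible tidier variant (not taken from Tiki's code), the markup could be removed with preg_replace() instead of str_replace(), which also covers the {......} module calls asked about earlier. This is a sketch: the function name is hypothetical, and the patterns cover only the markup named in this thread.

```php
<?php
// Hypothetical helper, not part of Tiki: a regex-based alternative to the
// $search1/$replace1 lists, which also removes single-level {...} plugin/module tags.
function strip_wiki_markup_regex(string $data): string
{
    $patterns = array(
        '/\{[^{}]*\}/',    // {maketoc}, {MODULE(...)}, {MODULE} and other {...} tags (not nested)
        '/^!{1,3}/m',      // !, !! and !!! heading markers, only at the start of a line
        '/__|\^|%%%|::/',  // the other markers from the $search1 list
        '/\(\(|\)\)/',     // (( and )) around wiki links
    );
    // One empty replacement string is used for every pattern
    return preg_replace($patterns, '', $data);
}

echo strip_wiki_markup_regex("!!Title\n{maketoc}\n__Bold__ see ((HomePage)). Great!"), "\n";
```

Anchoring the ! pattern to the start of a line keeps ordinary exclamation marks ("Great!" survives), unlike removing every '!'. Text between an opening and a closing plugin tag is still kept, nested braces are not handled, and removing %%% can join the words on either side of it.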


posts: 2808

Hello, is anyone still working on this?

The latest versions (HEAD/v1.10 and BRANCH-1-9/v1.9.8) still seem to return wiki markup, so I am assuming not (hmm, I hate assuming!).

Well, I had a go (sorry, I didn't find this thread and the tracker itemId=299 before I started!), but my fix seems to work for me, so I was trying to work out how to submit it as a patch (I'm not quite ready for full CVS commits yet). Should I attach it to the tracker item? For now I'll

My approach is probably too processor-heavy at the moment, but my main aim was to avoid rewriting the wiki parsing routine, for a more future-proof long-term solution.

I'm only working on "searchindex", not "searchresults", at the moment. I'm still not 100% certain what the differences are, as we've been using the Google search until now for this very reason.

In find_exact_wiki() in lib/search/searchlib.php, I:

  • use html_entity_decode(strip_tags($this->parse_data($res["data"]))) to get rid of all formatting,
  • then do a strpos to find the first highlighted word in the text,
  • then trim the start and end points so as not to chop words up (I'm that picky).

$href = "tiki-index.php?page=" . urlencode($res["page"]);

// start: clean the output to remove the wiki source
$chunksize = 250; $offs = -1; $startoff = 0;
// parse the wiki source, then strip_tags
// (a bit heavy-handed; I would prefer to leave the text formatting tags in...)
$data = html_entity_decode(strip_tags($this->parse_data($res["data"])));

// position of the first search word only: workaround for no stripos() in PHP 4
$offs = strpos($data, stristr($data, $words[0]));
$startoff = $offs - ($chunksize / 2);      // start of the result chunk
if ($startoff < 0) { $startoff = 0; }

// correct for chopped-off words: move the start back to a whitespace character...
while ($startoff > 0 && preg_match("/\S/", $data[$startoff])) {
    $startoff--;
}
// ...and shrink the chunk (to no less than 200 chars) until the next character is whitespace
while ($chunksize > 200 && $startoff + $chunksize < strlen($data)
       && preg_match("/\S/", $data[$startoff + $chunksize])) {
    $chunksize--;
}

$ret[] = array(
  'pageName' => $res["page"],
  'location' => tra("Wiki"),
  'data' => substr($data, $startoff, $chunksize),    // end: cleaned output without wiki source
  'hits' => $res["hits"],
  'lastModif' => $res["lastModif"],
  'href' => $href,
  'relevance' => $res["count"]
);
What do you think? Any use? Too many comments? :P

jonny B

posts: 1442 Canada

Nyloth added this to 1.10 and Sylvie made it optional.

M ;-)
