Features / Usability

Re: Trying to strip.tags in searchresults

Hello, is anyone still working on this?

The latest versions (HEAD/v1.10 and BRANCH-1-9/v1.9.8) still seem to return wiki markup in the search results, so I'm assuming not (hmm, I hate assuming!)

Well, I had a go (sorry, I didn't find this thread and the tracker item, itemId=299, before I started!). My fix seems to work for me, so I was trying to work out how to submit it as a patch (I'm not quite ready for full CVS commits yet). Should I attach it to the tracker item? For now I'll

My approach is probably too processor-heavy at the moment, since it runs the full wiki parser on every result, but my main aim was to avoid rewriting the wiki parsing routine, which should make for a more future-proof long-term solution.

I'm only working on "searchindex", not "searchresults", at the moment. I'm still not 100% certain what the differences are, as we've been using the Google search until now for this very reason.

In find_exact_wiki() in lib/search/searchlib.php I use:

  • html_entity_decode(strip_tags($this->parse_data($res["data"]))) to get rid of all formatting,
  • then a strpos to find the first highlighted word ($words[0]) in the text,
  • then trim the start and end points so as not to chop words up (I'm that picky).

$href = "tiki-index.php?page=" . urlencode($res["page"]);
++$cant;

// start: clean the output to remove wiki source
$chunksize = 250;
// parse the wiki source and strip_tags
// a bit heavy handed - would prefer to leave text formatting tags in...
$data = html_entity_decode(strip_tags($this->parse_data($res["data"])));
$len = strlen($data);

// position of the first word only - workaround for no stripos() in php4;
// falls back to the start of the text if the word isn't found
$match = stristr($data, $words[0]);
$offs = ($match === false) ? 0 : $len - strlen($match);

// start of result chunk, centred on the match
$startoff = max(0, $offs - (int)($chunksize / 2));
// back up to whitespace so the first word isn't chopped
while ($startoff > 0 && preg_match("/\S/", $data[$startoff])) {
	--$startoff;
}
// shrink the chunk so the last word isn't chopped, without reading past the end
while ($chunksize > 200 && $startoff + $chunksize < $len
		&& preg_match("/\S/", $data[$startoff + $chunksize])) {
	--$chunksize;
}
$ret[] = array(
  'pageName' => $res["page"],
  'location' => tra("Wiki"),
  'data' => substr($data, $startoff, $chunksize),		// end: clean the output to remove wiki source
  'hits' => $res["hits"],
  'lastModif' => $res["lastModif"],
  'href' => $href,
  'relevance' => $res["count"]
);
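
For anyone who wants to try the idea outside searchlib.php, here is a rough standalone sketch of the three steps listed above. The function name excerpt_around() and its parameters are made up for illustration and are not part of Tiki. It assumes PHP 5 or later (so stripos() is available) and that the input is HTML that has already been through the wiki parser, so it does not call parse_data() itself.

```php
<?php
// Hypothetical helper, not part of Tiki: returns a plain-text excerpt of at
// most $chunksize bytes around the first case-insensitive match of $word.
function excerpt_around($html, $word, $chunksize = 250, $minsize = 200)
{
    // Step 1: drop the tags, then decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html));
    $len = strlen($text);

    // Step 2: find the first match; fall back to the start if there is none.
    $pos = ($word === '') ? false : stripos($text, $word);
    if ($pos === false) {
        $pos = 0;
    }

    // Centre the window on the match, clamped to the start of the text.
    $start = max(0, $pos - (int)($chunksize / 2));

    // Step 3a: walk back to the start of a word so it is not chopped.
    while ($start > 0 && !ctype_space($text[$start - 1])) {
        --$start;
    }

    // Step 3b: shrink the window until it ends on a word boundary,
    // but never below $minsize and never past the end of the text.
    $size = min($chunksize, $len - $start);
    while ($size > $minsize && $start + $size < $len
            && !ctype_space($text[$start + $size])) {
        --$size;
    }

    return substr($text, $start, $size);
}
```

Unlike the snippet above, this one starts the excerpt on the first letter of a word instead of on the whitespace before it, and it works on bytes, so multibyte text would need the mb_* equivalents.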


What do you think? Any use? Too many comments? :P

jonny B
