Features / Usability

Re: Trying to strip.tags in searchresults

Hello, is anyone still working on this?

The latest versions (HEAD/v1.10 and BRANCH-1-9/v1.9.8) still seem to return wiki mark-up so i am assuming not (hmm, i hate assuming!)

Well, i had a go (sorry, i didn't find this thread or the tracker itemId=299 before i started!) and my fix seems to work for me, so i was trying to work out how to submit it as a patch (i'm not quite ready for full CVS commits yet). Should i attach it to the tracker item? For now i'll

My approach is probably too processor-heavy at the moment, but my main aim was to avoid rewriting the wiki parsing routine: reusing the existing parser should be the more future-proof long-term solution.

I'm only working on "searchindex" not "searchresults" at the moment (still not 100% certain what the differences are as we've been using the Google search until now for this very reason)

In lib/search/searchlib.php find_exact_wiki() i use:

  • html_entity_decode(strip_tags($this->parse_data($res["data"]))) to get rid of all formatting,
  • then do a strpos to find the first highlighted word in the text.
  • then trim the start and end points so as to not chop words up (i'm that picky)

$href = "tiki-index.php?page=".urlencode($res["page"]);
// start: to clean the output to remove wiki source
$chunksize = 250; $offs = -1; $startoff = 0;
// parse the wiki source and strip_tags
// a bit heavy handed - would prefer to leave text formatting tags in...
$data = html_entity_decode(strip_tags($this->parse_data($res["data"])));
// just the first one's pos - workaround for no stripos() in php4
$offs = strpos($data, stristr( $data, $words[0] ));
// start of result chunk
$startoff = $offs - ($chunksize / 2);
if ($startoff < 0) { $startoff = 0; }
// correct for chopped off words
// (loop bodies assumed from the loop conditions: step back / shrink until whitespace)
while($startoff > 0 && preg_match("/\S/", $data[$startoff])) {
	$startoff--;
}
// isset() guards against reading past the end of short pages
while($chunksize > 200 && isset($data[$chunksize + $startoff]) && preg_match("/\S/", $data[$chunksize + $startoff])) {
	$chunksize--;
}
$ret[] = array(
  'pageName' => $res["page"],
  'location' => tra("Wiki"),
  'data' => substr($data, $startoff, $chunksize),		// end: to clean the output to remove wiki source
  'hits' => $res["hits"],
  'lastModif' => $res["lastModif"],
  'href' => $href,
  'relevance' => $res["count"]
);
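
For anyone who wants to try the idea outside of Tiki, here is a minimal standalone sketch of the same three steps (strip the formatting, find the first match, trim to word boundaries). It is only an illustration under a few assumptions: make_excerpt() and its $minsize parameter are hypothetical names and not part of searchlib.php, the input is assumed to be HTML that has already been through the wiki parser, and it uses stripos(), so it needs PHP 5 or later rather than php4.

```php
<?php
// Hypothetical helper, not part of Tiki: builds a plain-text excerpt around
// the first case-insensitive match of $word in already-rendered HTML.
function make_excerpt($html, $word, $chunksize = 250, $minsize = 200)
{
    // Same clean-up as in the post: drop the tags, then decode entities.
    $data = html_entity_decode(strip_tags($html));
    $len = strlen($data);

    // stripos() returns false when there is no match; fall back to the start.
    $offs = stripos($data, $word);
    if ($offs === false) {
        $offs = 0;
    }

    // Centre the chunk on the match, clamped to the start of the text.
    $startoff = max(0, $offs - (int) ($chunksize / 2));

    // Step back to the previous whitespace so the first word is not cut.
    while ($startoff > 0 && !ctype_space($data[$startoff - 1])) {
        $startoff--;
    }

    // Shrink the chunk until it ends on a word boundary, but never below
    // $minsize and never reading past the end of the text.
    while ($chunksize > $minsize
        && $startoff + $chunksize < $len
        && !ctype_space($data[$startoff + $chunksize])) {
        $chunksize--;
    }

    return substr($data, $startoff, $chunksize);
}

// Small chunk sizes here only to keep the example readable.
echo make_excerpt('<p>The <b>quick</b> brown fox &amp; the lazy dog</p>', 'FOX', 10, 4);
// prints "brown fox"
```

Checking the character just before the start and just after the end of the chunk means the excerpt begins and ends on whole words. The price is that the chunk can come out a little longer at the front and shorter at the back than $chunksize.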

What do you think? Any use? Too many comments?

jonny B