Features / Usability

Trying to strip.tags in searchresults

posts: 3665 United States

Using 1.9.4....

In tiki-searchresults.tpl, I want to strip out all the wiki tags, module coding, etc. from the search results so that only actual text appears. The TPL file already uses:

{$results[search].data|strip_tags}

However, this doesn't seem to work — the wiki codes are still being shown in the search results. I experimented with

{$results[search].data}

and it produced exactly the same results. Anyone know how to display text without any wiki formatting?

-Rick
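
A likely reason both template variants look identical: PHP's strip_tags() removes only HTML/PHP-style tags written in angle brackets, so wiki markup passes through untouched. The minimal sketch below shows this on a made-up string; the sample text is hypothetical and only stands in for what a search result's data field might hold.

```php
<?php
// Hypothetical sample of raw wiki source (not taken from a real Tiki page).
$data = "!Heading\n__bold text__ and ((PageLink)) {maketoc} <b>html</b>";

// strip_tags() drops the <b>...</b> pair but knows nothing about wiki syntax.
$stripped = strip_tags($data);

// The heading marker, bold markers, link brackets and {maketoc} all survive.
echo $stripped, "\n";
```

Only the angle-bracket tags disappear, which matches the behaviour described in the question.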

posts: 1

Here are my changes to achieve the same goal (from lib/search/searchlib.php):

$search1= array('*','!','__','((','))');
$replace1= array('','','','','');
$cant=0;
$ret=array();
while ($res = $result->fetchRow()) {
    if($this->user_has_perm_on_object($user,$res["page"],'wiki page','tiki_p_view')) {
        $href = "tiki-index.php?page=".urlencode($res["page"]);
        ++$cant;
        $ret[] = array(
            'pageName' => $res["page"],
            'location' => tra("Wiki"),
            'data' => str_replace($search1,$replace1,substr($res["data"],0,250)),
            'hits' => $res["hits"],
            'lastModif' => $res["lastModif"],
            'href' => $href,
            'relevance' => $res["count"]
        );
    }
}

Essentially, $search1 is an array of strings that I want to replace with the corresponding entries of the array $replace1. I do the actual replacement a few lines down with the str_replace function.

Hope this helps,
Tyler
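
To try the replacement step outside of searchlib.php, here is a small standalone sketch. The two arrays are the ones from the post above; the sample string is hypothetical and simply stands in for $res["data"].

```php
<?php
// Markers to remove; same list as in the post above.
$search1  = array('*', '!', '__', '((', '))');
$replace1 = array('', '', '', '', '');

// Hypothetical raw wiki source standing in for $res["data"].
$raw = "!Welcome\n* See ((HomePage)) for __important__ news";

// Take the first 250 characters, then remove each marker in array order.
$clean = str_replace($search1, $replace1, substr($raw, 0, 250));

echo $clean, "\n";
```

Two things to keep in mind with this approach: every '!' and '*' is removed, including ones that are ordinary punctuation, and because substr() runs before the replacement, a two-character marker that straddles the 250-character cut can leave one stray character behind.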

posts: 3665 United States

> Here are my changes to achieve the same goal (from lib/search/searchlib.php):

[snip]

EXCELLENT! This is what I was looking for as a start. How can this be updated to also strip out the modules? I need to remove all the {......} as well.

-Rick
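
One possible way to drop the {......} parts is a regular expression rather than a fixed list of strings. The sketch below assumes that plugin and module calls never contain nested braces; the helper name strip_brace_tags and the sample input are hypothetical, not part of Tiki.

```php
<?php
// Hypothetical helper name; not an existing Tiki function.
function strip_brace_tags($text)
{
    // Remove every run that starts with "{" and ends at the next "}".
    // Assumes no nested braces. Text between an opening and a closing
    // plugin tag is left in place; only the tags themselves are removed.
    return preg_replace('/\{[^{}]*\}/', '', $text);
}

// Hypothetical sample input mixing a bare tag and a paired plugin tag.
$raw = "{maketoc}Intro text {CODE(caption=>x)}echo 1;{CODE} more text";

echo strip_brace_tags($raw), "\n";
```

If the body of a plugin should disappear as well, the pattern would need to match the opening and closing tags as a pair, which this sketch deliberately does not attempt.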


posts: 80 Austria

Hi!
Where do I put this to make it work correctly? Can you please post a line number?
Thank you.


posts: 3665 United States

tmuth's solution works great for the index search (tiki-searchindex.php) but not for the database search (tiki-searchresults.php) — I kept hitting a String/Array error.

So, borrowing from tmuth, here's my hack:

  1. In tiki-searchresults.php add the following lines:
    • $search1= array('__','!','!!','!!!','^','{maketoc}','%%%','::');
    • $replace1= array('','','','','','','','');
  2. Then assign the variables to be used in the template:
    • $smarty->assign('search1', $search1);
    • $smarty->assign('replace1', $replace1);
  3. Finally, in the tiki-searchresults.tpl template file, replace:
    • {$results[search].data|strip_tags}

with this:

    • {$results[search].data|replace:$search1:$replace1}


I'm sure there's a more elegant solution, but this seems to work for me.

-Rick


posts: 126886 United Kingdom

Hello, is anyone still working on this?

The latest versions (HEAD/v1.10 and BRANCH-1-9/v1.9.8) still seem to return wiki mark-up so i am assuming not (hmm, i hate assuming!)

Well, i had a go (sorry i didn't find this thread and the tracker itemId=299 before i started!) but my fix seems to work for me, so i was trying to work out how to submit it as a patch (i'm not quite ready for full CVS commits yet). Should i attach it to the tracker item? For now i'll

My approach is probably too processor heavy at the moment but my main aim was to avoid rewriting the wiki parsing routine for a more future-proof long term solution.

I'm only working on "searchindex" not "searchresults" at the moment (still not 100% certain what the differences are as we've been using the Google search until now for this very reason)

In lib/search/searchlib.php find_exact_wiki() i use:

  • html_entity_decode(strip_tags($this->parse_data($res["data"]))) to get rid of all formatting,
  • then do a strpos to find the first highlighted word in the text.
  • then trim the start and end points so as to not chop words up (i'm that picky)

$href = "tiki-index.php?page=".urlencode($res["page"]);
++$cant;
$chunksize = 250;
$offs = -1;
$startoff = 0;
// start: to clean the output to remove wiki source
$data = html_entity_decode(strip_tags($this->parse_data($res["data"]))); // parse the wiki source and strip_tags
// a bit heavy handed - would prefer to leave text formatting tags in...
$offs = strpos($data, stristr( $data, $words[0] )); // just the first one's pos - workaround for no stripos() in php4
$startoff = $offs - ($chunksize / 2); // start of result chunk
if ($startoff < 0) { $startoff = 0; }
// correct for chopped off words
while($startoff > 0 && preg_match("/\S/", $data[$startoff])) { --$startoff; }
while($chunksize > 200 && preg_match("/[\S\w]/", $data[$chunksize + $startoff])) { --$chunksize; }
$ret[] = array(
    'pageName' => $res["page"],
    'location' => tra("Wiki"),
    'data' => substr($data, $startoff, $chunksize),
    // end: to clean the output to remove wiki source
    'hits' => $res["hits"],
    'lastModif' => $res["lastModif"],
    'href' => $href,
    'relevance' => $res["count"]
);


What do you think? Any use? Too many comments? :-P

jonny B
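
The excerpt logic above (find the first search word, centre a window on it, then avoid chopping words) can also be tried on its own. The following is a minimal standalone sketch, not the patch itself: it assumes the text has already been parsed and stripped, relies on stripos() (PHP 5 and later), and the function name make_excerpt and its parameters are hypothetical.

```php
<?php
// Hypothetical helper; name and defaults are illustrative, not Tiki's.
function make_excerpt($text, $word, $chunksize = 250)
{
    $len = strlen($text);
    $pos = stripos($text, $word);   // case-insensitive; false if absent
    if ($pos === false) {
        $pos = 0;                   // fall back to the start of the text
    }

    // Centre the window on the match, clamped to the string bounds.
    $start = max(0, $pos - (int) ($chunksize / 2));
    $end   = min($len, $start + $chunksize);

    // Move the start left to a word boundary so the first word is whole.
    while ($start > 0 && !ctype_space($text[$start - 1])) {
        --$start;
    }
    // Pull the end back to whitespace so the last word is not cut.
    while ($end < $len && $end > $start && !ctype_space($text[$end])) {
        --$end;
    }

    return trim(substr($text, $start, $end - $start));
}

$text = "The quick brown fox jumps over the lazy dog near the river bank";
echo make_excerpt($text, "fox", 20), "\n";   // prints "quick brown fox jumps"
```

Unlike the snippet above, this version checks the string bounds before indexing, so a match near the end of a short page does not read past the last character.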


posts: 1630 Canada

Nyloth added this to 1.10 and Sylvie made it optional.

M ;-)

