Features / Usability


Setting Cron Job for Search

posts: 13

I'm trying to find some details on how to set up the cron job for the search refresh. Can someone explain how to do it or point me in the right direction?

Thanks in advance.

Mokonzi

posts: 13
Any suggestions or help on this?

posts: 113 Ireland

I am wondering why you need to set up a cron job. Why aren't you using the auto indexing (create index on upload)? Admin -> File Galleries and Define MIME types and System Commands. This puts the searchable text in a column of the database. Easy to set up and use.

See MIME types

gg


posts: 13

Ok, I added the last two of the MIME types in, and then reindexed the pages. Is that the only thing I need to do for it to refresh the Search?

I assume it refreshes automatically when I add new content?

Thanks for the help gg.


posts: 13
So far I'm not seeing how it refreshes the search when I add a Wiki page manually. Am I missing something?

posts: 113 Ireland

When you added the MIME types, you enabled File Galleries to be automatically indexed. The file gallery content is often stored outside the database, and in a non-searchable format (like .doc, .pdf, ...) so this is the only way to index.

For all of the other types of data (wiki, forums, FAQs, blogs, articles), the search goes through the database. So there is no concept of indexes (the content is in the database).

gg


posts: 13
Ok, in which case, what can I do to solve the timeouts I'm getting when I use the refresh option for the search?
posts: 15

Hi.

Solving the timeout issue

First, increase the execution time by raising the value of max_execution_time in php.ini. If PHP runs as an Apache module, you'll then need to restart Apache.

This is the relevant section of php.ini:

;;;;;;;;;;;;;;;;;;;
; Resource Limits ;
;;;;;;;;;;;;;;;;;;;

max_execution_time = 30 ; Maximum execution time of each script, in seconds

You wouldn't want to make this daftly high (5 minutes, for example), so experiment with increasing it until your search script no longer times out. See also the CRON job section, as you may not need to change this figure at all.

For me, 40 seconds is fine although since upgrading to 1.9.10.1 from 1.9.2, I get this error:

"Notice: Use of undefined constant PLUGINS_DIR - assumed 'PLUGINS_DIR' in E:\services\web\tikiwiki\lib\wiki-plugins\wikiplugin_pluginmanager.php on line 42"

If anyone knows a fix for this, that'd be much appreciated. At the moment I'm not sure whether the indexes are being refreshed, i.e. whether this is just a warning or the index rebuild is failing because of it.

CRON job

I'm also keen to set up the index refresh as a cron job.

When you click on the "Refresh wiki search index now" link from Admin -> Search, it links to this page, which I suppose contains the code that rebuilds the index:
http://www.yourserver.com/yourtikiwikipath/tiki-admin.php?page=search&refresh_index_now=y

So I guess a cron job would run a command that includes this URI as a parameter (unless that doc/devtools/batch_refresh_indexes_tikisearch.php file that gg mentions is available somewhere - that would be a better option).

This page shows some potential options using wget or lynx or php:
http://docs.phplist.com/CronJobExamples

With the php option, you could also include the -d parameter to alter max_execution_time for that one run, rather than changing it in php.ini, which would make it a sitewide change.

However, this leaves the problem of authentication since, I think, you can only execute this page while logged in as an admin.
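As a rough sketch of the URL-based approach described above, a cron job could run a small script that requests the refresh link with a generous client-side timeout. The function names and the 300-second timeout are illustrative assumptions, and this sketch does not solve the admin login problem, so the request may simply be answered with a login page.

```python
import urllib.parse
import urllib.request


def build_refresh_url(base_url: str) -> str:
    """Build the admin URL that the 'Refresh wiki search index now' link points to."""
    query = urllib.parse.urlencode({"page": "search", "refresh_index_now": "y"})
    return f"{base_url.rstrip('/')}/tiki-admin.php?{query}"


def refresh_index(base_url: str, timeout_seconds: float = 300.0) -> int:
    """Request the refresh URL and return the HTTP status code.

    Assumption: the server accepts the request. No admin authentication is
    performed here, which is the open problem mentioned in the post.
    """
    url = build_refresh_url(base_url)
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.status
```

Calling refresh_index("http://www.yourserver.com/yourtikiwikipath") at the end of such a script, and scheduling the script from cron (nightly, say), would then replace the manual click. The client-side timeout only controls how long the script waits; the server-side max_execution_time still applies to the page itself.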

Would be interested to know if anyone's got this working.

Cheers,

Nick.


posts: 113 Ireland

I am not sure I understand what you mean by 'refresh the search option'.

Do you mean: Admin -> File Galleries and press the 'reindex all files for search'? You should only need to do that if you just enabled the MIME types or added a new MIME type. Otherwise, the new index would be identical to the old index. It is only doing File Galleries. If this is timing out, then check your PHP timeout settings (Admin -> phpinfo, then look at the PHP settings and change as appropriate in the php.ini file).

Do you mean: Hit the browser's refresh button? I can't see how that would time out.

gg


posts: 13

I'll start from scratch:

I'm talking about the Search config:

http://doc.tikiwiki.org/tiki-index.php?page=Search+Config&highlight=cron%20job, which shows that the normal search refresh can time out, and that for larger wikis, a cron job is needed.

Get to the refresh option by:

Admin -> Search -> Refresh wiki search index now (which is under search features)

Hope I've made myself a bit clearer now...


posts: 113 Ireland

Now I understand the question. I actually never manually use this, as I let the system incrementally index the search table. Take a look at the tiki_searchindex table in the database. That is where all of the results of this search are stored. If you sort by last_update, you can see how it is incrementally building the index. If you have a site that is fairly stable, I would suggest that you let the system do the index.

If you manually refresh the index, it can exceed the php time limit on a large site. If your site is changing significantly, then I guess a cron job might be used. The Search Admin page states there is a sample file in the source tree (doc/devtools/batch_refresh_indexes_tikisearch.php), but it is not there.

gg


posts: 13
Thanks GG. Any suggestions on how I might get some info on how to set up the cron job?

posts: 13
I'm really lost on this guys, anyone done this yet? Any comments or suggestions?

posts: 1

Hiya gg.. this is exactly the question I joined the forum to ask!

You said: "I actually never manually use this, as I let the system incrementally index the search table. Take a look at the tiki_searchindex table in the database. That is where all of the results of this search are stored. If you sort by last_update, you can see how it is incrementally building the index. If you have a site that is fairly stable, I would suggest that you let the system do the index."

Does this happen automatically? That would be fine, but I just haven't seen this behaviour.

thanks in advance,

Alan


posts: 113 Ireland

Alan

Look in Admin -> Features: Search for the field 'Search Refresh Rate'. If it is non-zero, then every Nth time someone logs in, a random part of the wiki gets reindexed. The results are put into the tiki_searchindex table. To confirm that this is happening, look at the field 'last_update' in this table. This is the timestamp of the most recent update. If your table is empty, then I suspect you have a 0 for the search refresh rate. You can use something like phpMyAdmin to view the database.

Another way to determine whether this is working is to look at the most recent timestamp in the table. Then log in (non-admin) to the site multiple times (how many depends on the setting for Search Refresh Rate). Then check the most recent timestamp again. It should have updated. Or you can just set the refresh rate to 1, and then some part of the wiki should get reindexed on every login.
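If you'd rather script the check than browse the table by hand, something like the following could report the row count and newest timestamp. This is only a sketch: the function name is made up, `connection` stands for any Python DB-API connection to the Tiki database (the driver is your choice), and it assumes last_update holds a Unix timestamp in seconds.

```python
from datetime import datetime, timezone


def search_index_status(connection):
    """Return (row_count, newest_update) for the tiki_searchindex table.

    Assumption: last_update is a Unix timestamp in seconds.
    """
    cursor = connection.cursor()
    cursor.execute("SELECT COUNT(*), MAX(last_update) FROM tiki_searchindex")
    row_count, newest = cursor.fetchone()
    cursor.close()
    if newest is None:
        # Empty table: nothing has been indexed yet.
        return row_count, None
    return row_count, datetime.fromtimestamp(int(newest), tz=timezone.utc)
```

Run it once, log in a few times as a non-admin user, then run it again. If the newest timestamp has moved forward, the incremental indexing is doing its job.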

gg

posts: 13
Would this solution help in my case? Or is it still going to need a cron job to sort it out?

posts: 13

>Can you tell me how big your site is and what the setting for refresh rate is? I would suggest setting refresh to 1 and then logging in with a couple of different user names.
>
>Have you looked at the table that stores the indexes? It would be useful to know how big it is. Maybe it is very big and working and you just don't realize it is working. Compare the datestamps in tiki_log (which contains login details) with those in tiki_searchindex. This can tell you if it is working.
>
>Gary

Thanks for the info on what tables to check out. It is updating the wiki slowly. The refresh rate is set to 5. Would it be better dropping it to 1, or is that going in the wrong direction? :-)

The wiki search index has almost 32,000 entries. There are also 450+ wiki pages at the moment.

Is it better overall to get the index to refresh via cron or leave it to do itself over time?

Thanks for your help gg.

Mokonzi

PS I'm having issues sending you a message directly.