Features / Usability

MIME Type pdftotext

posts: 10

I was looking over this document:

At the bottom of this page it explains how to enter a MIME type and command so that TikiWiki will search the PDF files.
I have set up pdftotext and it works at the Linux command line. I also added the line:
application/pdf /usr/local/bin/pdftotext %1
to my File Galleries admin page. I am using a directory to store all uploads, and when I upload a new PDF I can see that pdftotext creates the .txt output there. But when I use the TikiWiki search, it doesn't find any of the text inside the PDF or in the .txt file created by pdftotext.

Any ideas how I can get TikiWiki to "index" the .txt file into the MySQL database so I can search on it?


posts: 113 Ireland

The line should be

application/pdf /usr/local/bin/pdftotext %1 -

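You left off the redirecting '-'. With it, pdftotext writes the extracted text to standard output instead of to a .txt file, so the text is never created as a file; it is piped back to TikiWiki.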
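
To illustrate why the '-' matters, here is a minimal Python sketch of how a handler line could be expanded and run. This is not TikiWiki's actual code (Tiki is written in PHP); the function names build_command and extract_text are hypothetical, and the sketch only assumes that the handler's standard output is what ends up in the search index.

```python
import shlex
import subprocess
from typing import List


def build_command(template: str, path: str) -> List[str]:
    """Split a handler template and replace the %1 placeholder with the file path."""
    return [path if token == "%1" else token for token in shlex.split(template)]


def extract_text(template: str, path: str) -> str:
    """Run the handler and return whatever it wrote to standard output."""
    result = subprocess.run(
        build_command(template, path),
        capture_output=True,  # collect stdout instead of letting it go to the terminal
        text=True,
        check=True,           # raise if the handler exits with an error
    )
    return result.stdout
```

With the template "/usr/local/bin/pdftotext %1 -" the extracted text arrives in result.stdout. Without the '-', pdftotext writes a .txt file next to the PDF and prints nothing, so a caller like this one would get an empty string back.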

Now you should reindex all files for search.

The result will be that if you look in the tiki_files table in the database, the search_data field will now hold all of the text that was extracted. You should look there to confirm that your data has arrived.
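
A rough sketch of that check in Python follows. It assumes the table is named tiki_files and has fileId, filename and search_data columns (adjust the names if your schema differs), and it uses an in-memory sqlite3 database as a stand-in so that it runs without a MySQL server. The helper name files_missing_search_data is hypothetical. Against a real installation you would run the same SELECT through your MySQL client.

```python
import sqlite3


def files_missing_search_data(conn):
    """Return (fileId, filename) rows whose search_data is NULL or empty."""
    query = (
        "SELECT fileId, filename FROM tiki_files "
        "WHERE search_data IS NULL OR search_data = '' "
        "ORDER BY fileId"
    )
    return conn.execute(query).fetchall()


# Stand-in data: one file that was indexed and one that was not.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiki_files (fileId INTEGER PRIMARY KEY, filename TEXT, search_data TEXT)"
)
conn.executemany(
    "INSERT INTO tiki_files VALUES (?, ?, ?)",
    [(1, "manual.pdf", "extracted text"), (2, "report.pdf", "")],
)
print(files_missing_search_data(conn))  # [(2, 'report.pdf')]
```

Any file listed by this query was uploaded, but its handler returned no text.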

I use:

application/msword /usr/local/bin/catdoc %1 -
application/pdf /usr/local/bin/pdftotext %1 -
application/vnd.ms-excel /usr/local/bin/xls2csv %1
application/vnd.ms-powerpoint /usr/local/bin/catppt %1
text/html strings %1
text/plain strings %1

posts: 10

Thank you! Adding the '-' made it work.
Not sure how I missed that one. So simple, heh heh.


posts: 3

I have the exact same problem, only my TikiWiki is installed on a Windows server running Apache.
I made the change so that my command is now:

application/pdf /GnuWin32/bin/pdftotext.exe %1 -

The command works when I run it at the cmd prompt, but it fails when I use it in TikiWiki.

Any ideas?

posts: 1
Same problem here on FreeBSD (even with the '-', and also with pstotext, which does not need the '-'). Have you found a solution in the meantime?

posts: 22 Germany

Sorry for bumping this old thread, but I have the same problem with Apache running on a Windows server.

All PDF files are read by pdftotext and exported to a text file, but that's all. The redirecting '-' does not work in a Windows environment.

Is there any Windows user with a working PDF search?