Mediawiki importer Posted by Rodrigo Primo 24 Sep 2010 20:26 GMT-0000 Use this topic to discuss issues related to the Mediawiki importer.
Posted by Christophe 04 Oct 2010 06:07 GMT-0000 Hi Rodrigo, I saw your message on IRC. I'm using Tiki version 5.1 and PHP 5.1.6, and I don't know where DOMDocument should be. The start of the Mediawiki export is as follows: <mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en">
Posted by Rodrigo Primo 04 Oct 2010 16:54 GMT-0000 Hi tochinet, DOMDocument is part of PHP. See http://php.net/manual/en/book.dom.php Soon I will add a check to the code so that it prints a user-friendly message instead of the error when DOMDocument is not available. I have also updated the documentation; DOMDocument is now listed as a requirement. To use the Mediawiki importer you have to enable DOMDocument in your PHP installation. It is enabled by default in recent versions. The importer uses DOMDocument to read the XML file, so it is not possible to use the importer without it. Let me know if you need further assistance.
Posted by Rodrigo Primo 08 Oct 2010 17:01 GMT-0000 Hi tochinet, I have updated the code: it now checks whether DOMDocument is available and, if not, displays a user-friendly error message. The change is on trunk. Thanks, Rodrigo.
Posted by Christophe 11 Oct 2010 08:59 GMT-0000 Rodrigo, both servers use PHP 5.1.6. I found the following information (in /info.php): DOM/XML: enabled; DOM/XML API Version: 20031129; libxml Version: 2.6.26; HTML Support: enabled; XPath Support: enabled; XPointer Support: enabled; Schema Support: enabled; RelaxNG Support: enabled. But there is also a --disable-dom in the "Configure Command" table. Any idea what the problem could be? According to other forums on the Net, a yum update php-xml solved the issue for some people, but my php-xml seems up to date: yum says "No package marked for updates".
Posted by Rodrigo Primo 13 Oct 2010 13:20 GMT-0000 Hi tochinet, Unfortunately I don't have any experience with yum. I use Ubuntu, and the PHP packages built for it (and probably for all Debian-based distros) include DOMDocument by default. My PHP version is 5.3.2, but as far as I know XML support only changed significantly between PHP 4 and 5, so I don't think the problem is that you are running 5.1.6. Nonetheless, it is a good idea to check that. I was unable to find out in which PHP version DOMDocument was added. If you find this information, please let me know and I will add it to the documentation. I can't say whether you have XML support enabled. Apparently yes, but this --disable-dom seems to indicate the contrary.
Posted by Christophe 20 Oct 2010 12:05 GMT-0000 Looking further in forums, --disable-dom seems to be a compile-time (?) option that is set by default. People who had a similar issue and solved it (by upgrading, which didn't work for me) still had it set after the fix. But, strangely enough, it worked today. There are still some quirks (some links seem to be translated in a strange way, and there are strange error messages to investigate), but at least I got something. About these errors:
- Notice: Undefined variable : page in ./lib/tikilib.php on line 4609
- Notice: Undefined offset : 1 in ./lib/filegals/filegallib.php on line 1261
- Warning: Missing argument 13 for sendWikiEmailNotification(), called in ./lib/tikilib.php on line 4587 and defined in ./lib/notifications/notificationemaillib.php on line 172
Posted by Rodrigo Primo 22 Oct 2010 12:34 GMT-0000 Hi tochinet, Maybe you managed to enable DOM but had not restarted the web server, so after the last reboot DOM was working. Just a guess. Since I'm working on the importer again to create the Wordpress importer, I would like to check the errors you mentioned. If it is not confidential data, could you send me a copy of the XML file you are importing? You can find my e-mail address on my user page. It will be much easier for me with the XML file. Also, what kind of links were not translated correctly? Do you have an example? Have you checked the importer limitations: http://doc.tiki.org/Mediawiki%20importer#Known_issues Thanks, Rodrigo.
Posted by beneason 26 Oct 2010 17:32 GMT-0000 Rodrigo - I'm getting the following error message: "XML file does not validate against the Mediawiki XML schema" What am I doing wrong? Ben
Posted by Rodrigo Primo 27 Oct 2010 14:16 GMT-0000 Hi Ben, This error probably means that your XML file is not valid according to the Mediawiki DTD (http://en.wikipedia.org/wiki/Document_Type_Definition) file. It might be a problem with your XML file or with its version. Which version of Mediawiki are you running? Can you tell me the version of the XML file (at the top of the XML file, look for something like "http://www.mediawiki.org/xml/export-0.3/")? The importer was tested with Mediawiki 1.14, but it is very likely that it works with other versions. Maybe they have updated their XML definition; if no structural changes have been made, it should be very easy to support the new version. Rodrigo
Posted by beneason 27 Oct 2010 15:34 GMT-0000 Rodrigo - I may have started out wrong here. I'm trying to pull data directly from Wikipedia, which I know is not the same as MediaWiki. From your reply, I'm wondering whether this tool only works for pulling from MediaWiki. Do you know how to make it work for pulling from Wikipedia? There is a special export function in Wikipedia, and that is what I was pulling from. I really appreciate the help here. Ben
Posted by beneason 27 Oct 2010 19:55 GMT-0000 Rodrigo - In case it helps, I've attached a copy of the file I was trying to bring in from Wikipedia. Ben
Posted by Rodrigo Primo 28 Oct 2010 13:11 GMT-0000 Hi Ben, I think you forgot to attach the file. Anyway, I think I know the problem. It is fine to import a Wikipedia file (Wikipedia is just one huge Mediawiki installation). But newer versions of Mediawiki generate a new XML format that, as expected, does not validate against the DTD of the old one. The importer works with version 0.3 (http://www.mediawiki.org/xml/export-0.3/), and Mediawiki now uses version 0.4 (http://www.mediawiki.org/xml/export-0.4/). Next week I will take a look at this problem, and if not much has changed from one version to the other I should be able to fix it. If you are a programmer, you can take a look at lib/importer/tikiimporter_wiki_mediawiki.php and see if you can fix the problem yourself. Thanks for reporting this problem; I was not aware of this new version of the XML file.
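If you want to see which export format a file uses before importing it, a quick check is to read the namespace of the root <mediawiki> element, which is where the version (e.g. http://www.mediawiki.org/xml/export-0.3/) appears. The sketch below is written in Python purely for illustration; it is not Tiki's code (the importer is PHP and uses DOMDocument). The function names and the SUPPORTED_VERSIONS tuple are hypothetical; ("0.3", "0.4") just mirrors the versions discussed in this thread.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import Optional

# Root tag as ElementTree reports it, e.g.
# "{http://www.mediawiki.org/xml/export-0.3/}mediawiki"
_ROOT_TAG = re.compile(r"\{http://www\.mediawiki\.org/xml/export-(\d+\.\d+)/\}mediawiki")

# Hypothetical: the export versions an importer has been updated to accept.
SUPPORTED_VERSIONS = ("0.3", "0.4")


def export_version(source) -> Optional[str]:
    """Return the export version ("0.3", "0.4", ...) taken from the root
    element's namespace, or None if the root is not a MediaWiki <mediawiki>.

    `source` is a file path or a binary file object. Only the first start
    tag is inspected, so large Wikipedia dumps are not loaded into memory.
    """
    for _event, elem in ET.iterparse(source, events=("start",)):
        match = _ROOT_TAG.fullmatch(elem.tag)
        return match.group(1) if match else None
    return None


def check_export(source) -> str:
    """Raise a readable error instead of a bare schema-validation failure."""
    version = export_version(source)
    if version is None:
        raise ValueError("Not a MediaWiki XML export (unexpected root element)")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(
            f"MediaWiki export version {version} is not supported "
            f"(supported: {', '.join(SUPPORTED_VERSIONS)})"
        )
    return version


if __name__ == "__main__":
    sample = (
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/" '
        b'version="0.4" xml:lang="en"><page/></mediawiki>'
    )
    print(check_export(io.BytesIO(sample)))  # prints 0.4
```

Note that reading the namespace only tells you which format the file claims to use; it does not replace validating the file against the matching schema, which is the step that produces the "does not validate" error.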
Posted by Rodrigo Primo 02 Nov 2010 18:16 GMT-0000 Hi Ben, I just added support for Mediawiki XML files 0.4 on trunk. For more information, I'm copying a message I sent to another developer on the Tiki devel list (he was also asking about support for version 0.4). I suggest you try importing your Mediawiki file with Tiki trunk. Let me know if you need more information on how to get Tiki trunk running. Cheers, Rodrigo. Hi Jonny, I think I'm using schemaValidate() instead of validate() because the latter generates a "no DTD found" error. I haven't investigated why. See r30477; it adds support for the Mediawiki XML file version 0.4. Apparently there are some validation problems in the DTD of version 0.4. See the Mediawiki bug I have reported: https://bugzilla.wikimedia.org/show_bug.cgi?id=25753 Now you should be able to import the file you were using for testing, http://en.wikipedia.org/wiki/Special:Export/Train I tested it and got some weird results: a big portion of this Wikipedia article is not displayed. I have checked, and its content appears to be correctly parsed by TikiImporter_Wiki_Mediawiki::convertMarkup() (see test testConvertMarkupParserWikipediaSamplePage) and correctly added to the tiki_pages table. So the problem seems to be in the Tiki parser when the page is displayed. I haven't checked beyond this. This problem might be related to the fact that in this article (and in most Wikipedia articles) the syntax is used a lot. Text_Wiki ignores this syntax. A solution might be to change the Text_Wiki Tiki renderer to add ~np~ when rendering . I'm not planning to spend more time on this in the near future, but let me know if you need any help. Cheers, Rodrigo.
Posted by whit 16 Nov 2010 19:56 GMT-0000 "I just added support for Mediawiki XML files 0.4 on trunk." How usable is trunk in general right now? We've got a fair-sized internal Mediawiki that we'd dearly love to import into something better-featured like Tiki, but it's only feasible if the conversion is automated. Obviously trunk comes with no guarantees. But does it largely work, or are there so many loose ends that we'd kick ourselves if we tried to use it seriously after converting our internal docs to it?
Posted by Rodrigo Primo 17 Nov 2010 13:07 GMT-0000 Hi, Trunk is not stable and is not recommended for a production site. I just backported the support for Mediawiki XML files version 0.4 from trunk to branch 6.x, so it should be included in version 6.1. I suggest you wait for version 6.1 or start your site running branches/6.x, which is much more stable than trunk (after 6.1 is released, all commits are reviewed before being added to branches/6.x). Note that the only thing I did was change the importer to accept Mediawiki XML file 0.4. I haven't checked what changed between versions 0.3 and 0.4, and I only did a very basic test importing a Wikipedia article XML file version 0.4. Some of those changes might affect the importer. Please let me know if you find any issues. Thanks, Rodrigo.
Posted by Rodrigo Primo 17 Nov 2010 13:10 GMT-0000 Hi, I have created a forum for the Tiki Importer (including Mediawiki and Wordpress support, though Wordpress support is still under development). I will keep this topic for documentation purposes, but locked. Thanks, Rodrigo.