Tiki Importer


Mediawiki Importer - blank screens, segfaults, uploaded file inclusion

Rodrigo has a forum here for the Mediawiki Importer - but he's locked it and posted a notice that there's a new forum to discuss it in, which links back to the same locked forum. I went to send him a private message, but after entering the message I got a response saying that his user cannot receive messages - which really should be Tiki's response before it presents the message form (feature request), but I digress.

On Rodrigo's advice, I installed Tiki 6.x from SVN, which he said includes his latest importer that handles newer versions of Mediawiki. When I try the import, it fails instantly and bounces back to the screen where the import starts. What's the right place to discuss this? Thanks.

Brazil

Hi whit,

Sorry for the confusion with the forums. Before, we had a single topic. I created a new forum for the Tiki Importer and moved the old topic into its "Features/Usability" forum. The old topic is locked but the new forum is open, so you can create new topics there.

I have changed my user preferences so now I should be able to receive messages using Tiki internal messaging system.

Thanks for improving Mediawiki Importer documentation. I saw your changes on http://doc.tiki.org/Mediawiki+Importer.

We can continue the discussion in this topic (which I have already moved to the new forum).

A while ago a user reported that Mediawiki was generating an invalid XML file, which caused the importer to display a blank screen. So please check with an external tool whether your XML file is valid. If you open it in Firefox, for example, it will tell you whether it is valid or not. I have added a paragraph about this to the documentation, and when I have some free time I plan to improve the importer so that it displays an error message instead of a blank screen when it handles an invalid XML file.
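
To check a dump outside the browser, a short Python sketch like the one below can help. It only tests well-formedness (every tag closed and properly nested, which is what Firefox reports), not conformance to the Mediawiki export schema. It streams the file with the standard library's xml.sax module, so large dumps don't need to fit in memory. The function name and command-line usage are illustrative, not part of Tiki or Mediawiki.

```python
import sys
import xml.sax


def check_xml(path):
    """Return None if the file parses as well-formed XML, else an error string."""
    try:
        # A bare ContentHandler ignores the content; we only care whether parsing succeeds.
        xml.sax.parse(path, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as err:
        return "line %d, column %d: %s" % (
            err.getLineNumber(), err.getColumnNumber(), err.getMessage())
    return None


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_xml.py dump1.xml dump2.xml
    for path in sys.argv[1:]:
        problem = check_xml(path)
        print(path, "well-formed" if problem is None else "NOT well-formed: " + problem)
```

The reported line and column point at where the parser gave up, which is usually at or just after the broken markup.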

Let me know if you have a valid XML file so we can further investigate why the importer is not working in your case.

Thanks, Rodrigo.

Hi Rodrigo,

One problem down, another to go. According to the Apache error log, the blank page was caused by the POST content length exceeding the configured limit. Raising that limit further fixed the blank page problem. So the fix there would be to show the error, as is already done for other PHP errors, rather than fail silently.

Having got past that, the importer now gets partway through the initial read of the pages and then causes the Apache child process to segfault. Thinking this might be an Ubuntu thing (possibly the preg library version), I did a fresh install of Fedora 14 and put Tiki on it. It has precisely the same segfault, in the same place, in both of the largish files I'm trying this on. Both are dumps of the same Mediawiki made with its PHP dump script: one with history and no attached-file links, the other the other way around. I've filed a bug report here on the segfault, with more details.

The files open without any errors in Firefox.

The uploaded-file links, by the way, appear not to work - the importer claims they're not good - even after I edited the XML to point to the Mediawiki server (necessary because it's on a LAN elsewhere, so the links in the XML use a local name that doesn't resolve from here). Perhaps that's because the remote server requires a user ID/password sign-in? Is there any way to handle that? Is it just a matter of moving the files to upload to a location the right URL can reach, and then replacing the URL in the XML with that?

Thanks,
Whit

I've narrowed down the cause of the segfault to a particular page with a large table. Removing that page from the XML, and also using the dumpBackup.php option to chunk the export by page number range so as to get smaller files to import, allows almost all the pages to import, and fairly well at that. If someone wants a copy of the page that chokes the importer as a test case, I can probably provide it, although not publicly, as there's some private info in it. Update: I've now found about half a dozen large tables that reliably produce segfaults if included in the XML for import. Removing them by hand gets the rest of the pages in, at least. The tables generally appear within a few pages of the page the importer lists on screen at the time of the segfault.
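
Since the segfaults track pages with large tables, a sketch like the following could list candidate pages before importing, instead of hunting for them by hand. It assumes Mediawiki table syntax, where rows are separated by lines starting with "|-", and uses the number of such lines on a page as a rough size measure. The default threshold of 200 rows is a hypothetical starting point, not a known limit of the importer, and this does not establish what actually triggers the crash.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # Mediawiki dumps put every element in a namespace: "{uri}page" -> "page".
    return tag.rsplit("}", 1)[-1]


def table_row_count(wikitext):
    # Rows are separated by "|-" lines; counting them gives a rough table size.
    return sum(1 for line in wikitext.splitlines() if line.lstrip().startswith("|-"))


def pages_with_big_tables(source, threshold=200):
    """Yield (title, rows) for pages whose wikitext has more than `threshold`
    row separators in some revision. threshold=200 is a hypothetical default."""
    title = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "page":
            # Largest row count over all revisions stored for this page.
            rows = max((table_row_count(t.text or "")
                        for t in elem.iter() if local_name(t.tag) == "text"),
                       default=0)
            if rows > threshold:
                yield title, rows
            elem.clear()  # release the page's revisions to keep memory bounded
```

For example, looping over pages_with_big_tables("dump.xml") (a placeholder file name) and printing each title gives a list of pages to remove or import separately.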

I still haven't figured out how to make the uploads feature of the importer work. The importer suggests that I haven't used "--uploads" with dumpBackup.php, yet I have. This could be because I'm using a more recent version of dumpBackup.php than came with our Mediawiki. The version that came with it lacks the "--uploads" option, so I grabbed the most recent one from the Mediawiki site, which appears to work just fine. I've confirmed that the uploaded images and PDFs are available without logging in - Mediawiki has very loose access controls in a default installation, so if you know the file paths and names you can get them without a login. Update on this: the "--uploads" flag actually makes no difference to the contents of the resulting XML file in my case, so maybe slightly older Mediawiki versions just don't support it. Or maybe I'm misunderstanding what an "upload" is in this context. Does it include images uploaded to the Mediawiki for inclusion in pages, or is it something narrower that perhaps we just don't have?
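
One way to tell whether the --uploads option had any effect is to count the upload records in the dump. This sketch assumes, based on the Mediawiki export format, that each such record is an <upload> element inside its page; a count of 0 would mean the importer has nothing to fetch attachments from. The file name in the usage line is a placeholder.

```python
import xml.etree.ElementTree as ET


def count_upload_records(source):
    """Count <upload> elements in a Mediawiki XML dump, ignoring namespaces."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = elem.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
        if name == "upload":
            count += 1
        elif name == "page":
            elem.clear()  # page fully processed; release its children
    return count
```

Running print(count_upload_records("dump.xml")) on dumps made with and without the option is a quick way to compare them.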

Note: I just changed the title of the thread to match its current contents. I realize it makes the importer sound pretty bad. In truth I'm quite impressed with it, now that I've got some results, so this is more about rough edges.

Brazil

Hi whit,

If I remember correctly, it is not that simple to handle the error when your file is bigger than PHP's post_max_size setting (are you talking about this one or some other Apache setting?). My impression is that you can't catch this error: it happens before the PHP side gets a chance to check the size of the file against the setting value. But I might be wrong.

If this is the case, could you please update the documentation with your experience? Maybe add one more item to the list of known issues, warning users to manually check the size of the file against the post_max_size value.
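
A manual check along those lines can be scripted. This sketch converts php.ini shorthand (a number optionally followed by K, M or G, each a power of 1024) into bytes and compares it with the size of the dump. The 64 KB margin for multipart form overhead is a hypothetical safety allowance. The post_max_size value should be read from the php.ini the web server actually uses, which can differ from the command-line one. The same check could be repeated against upload_max_filesize, which also limits uploaded files.

```python
import os

# php.ini shorthand multipliers (case-insensitive suffixes).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def php_size_to_bytes(value):
    """Convert php.ini shorthand such as '8M' or '512K' to a byte count."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value)  # plain byte count, no suffix


def fits_in_post(path, post_max_size, margin=64 * 1024):
    """True if the file plus a hypothetical form-overhead margin fits the limit."""
    return os.path.getsize(path) + margin <= php_size_to_bytes(post_max_size)
```

For example, fits_in_post("dump.xml", "8M") returns False once the file gets within 64 KB of 8 MB, which is a signal to raise the limit or split the export.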

Others have reported this segmentation fault. Please send me by e-mail (I just sent you a message with my e-mail address) a copy of the file causing the problem and I will try to check it. I have already checked one file causing this problem and wasn't able to find any solution. As far as I know, segmentation faults should not occur at all and smell like PHP bugs, but I don't have enough knowledge to be sure about that. As you guessed, it is very likely that the problem occurs inside the preg library.

About the uploads, you got the idea right. It is possible to import files or images attached to Mediawiki pages. The catch is that the importer needs the information added by dumpBackup.php --uploads; if there is no difference between the XML file generated by dumpBackup.php --uploads and the one from the Mediawiki export interface, the importer will not be able to import the attachments. The standard Mediawiki XML file has no information about the location of the files, which is why dumpBackup.php --uploads exists. As you said, it is probably a Mediawiki version issue.

Rodrigo

You may well be right on PHP's post_max_size.

As a kludge to import files I tar'd the Mediawiki images directory tree, recreated it on the Tiki server, and then did two things:

  • edited the XML file before import so that references to /images/x/whatever would point to the location of the copied tree (in my case that happens to be /wikiimages/x/whatever)

  • also copied all the files from that tree to /img/wiki_up - where the imported wiki also ends up trying to link to some of them
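
The first step, repointing the file references in the XML, can also be scripted rather than done by hand. This sketch streams the dump line by line, replaces one URL prefix with another, and returns the number of replacements made. Both prefixes in the usage line are hypothetical placeholders for the LAN address of the Mediawiki server and the location of the copied tree.

```python
def rewrite_file(src_path, dst_path, old_prefix, new_prefix):
    """Copy src_path to dst_path, replacing old_prefix with new_prefix.

    Works line by line, so large dumps are never loaded whole. URLs do not
    span lines, so a prefix cannot be split across two reads.
    """
    replaced = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            replaced += line.count(old_prefix)
            dst.write(line.replace(old_prefix, new_prefix))
    return replaced
```

Usage, with placeholder names: rewrite_file("dump.xml", "dump-tiki.xml", "http://oldwiki.lan/images/", "http://tiki.example.org/wikiimages/"). Writing to a new file keeps the original dump intact in case the prefix was wrong.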


For the second copy, I used a few lines of Python:

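#!/usr/bin/env python3
# Flatten the copied Mediawiki images tree into Tiki's img/wiki_up directory.
import os
import shutil

inroot = '/var/www/tiki/wikiimages'    # copy of the Mediawiki images/ tree
outroot = '/var/www/tiki/img/wiki_up'  # where imported pages also link to files

os.makedirs(outroot, exist_ok=True)    # make sure the target directory exists
for root, dirs, files in os.walk(inroot):
    for name in files:
        oldfile = os.path.join(root, name)
        newfile = os.path.join(outroot, name)  # same file name, no subdirectories
        shutil.copy2(oldfile, newfile)         # copy2 also keeps timestamps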


The only serious thing missing is getting the file names into Tiki's database, so that they're available for new inclusions in other pages.

Brazil

Thanks for the script. I have updated the documentation with a link to your post here. It is a good workaround for those having problems creating the XML file with information about the uploads.

About your question, it is not that simple to get the file names into the database. The importer uses the old way of adding images to wiki pages: it copies them to img/wiki_up and adds the syntax to display each file in the wiki page content.

To get the file names into the database, the importer would have to be changed to import the attachments into a file gallery instead of img/wiki_up. This would be a nice improvement to the Tiki importer, but it still needs to be coded.

Thanks, Rodrigo.

