Development


Very effective performance fix

posts: 13

Hi,

I was poking around some in the guts of tikilib, and we're doing a lot of expensive substr() calls in the parse_pp_np() function. I profiled it, and we're spending almost 50% of the time in this function.

Below is a simple fix I implemented on my site, it reduces the use of substr() quite dramatically. I haven't tried to optimize any of the other code, just trying to avoid those substr(), so my "fix" most certainly isn't the best possible. But it gives an idea what can be done. If there's interest, I'd be more than happy to rewrite this function and submit a patch.

On my site, this reduces the effective load time of a typical page (tiki-view_blog) from 0.75s to 0.45s. This is simply because I reduce the time spent in substr() from about 40% to 3% (of total CPU time used).
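For anyone curious why this matters: substr() allocates a new string on every call, while $data[$i] reads a byte directly. A standalone micro-benchmark (my illustration, not from the original post) shows the two approaches give the same result while the offset version avoids all the per-character allocations:

```php
<?php
// Micro-benchmark (illustrative, not from the original patch):
// compare substr($data, $i, 1) against direct offset access $data[$i].
$data = str_repeat('abc~np~xyz~/np~', 10000);
$len  = strlen($data);

$t0 = microtime(true);
$hits = 0;
for ($i = 0; $i < $len; $i++) {
    if (substr($data, $i, 1) == '~') $hits++;   // allocates a one-char string each time
}
$t1 = microtime(true);

$hits2 = 0;
for ($i = 0; $i < $len; $i++) {
    if ($data[$i] == '~') $hits2++;             // direct byte access, no allocation
}
$t2 = microtime(true);

printf("substr: %.4fs, offset: %.4fs, same result: %s\n",
       $t1 - $t0, $t2 - $t1, $hits === $hits2 ? 'yes' : 'no');
```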

Thanks,

-- Leif

In lib/tikilib.php, in parse_pp_np(), I simply added a test on the next character before doing substr(), i.e.

```php
// Find all sections delimited by ~np~ ... ~/np~
$new_data = '';
$nopa = '';
$state = true;
$skip = false;
$dlength = strlen($data);

for ($i = 0; $i < $dlength; $i++) {
    if ($data[$i] == '~') {
        $tag5 = substr($data, $i, 5);
        $tag4 = substr($tag5, 0, 4);
        $tag1 = substr($tag4, 0, 1);

        // Beginning of a noparse section found
        if ($state && $tag4 == '~np~') {
            $i += 3;
            $state = false;
            $skip = true;
        }

        // Termination of a noparse section found
        if (!$state && ($tag5 == '~/np~')) {
            $state = true;
            $i += 4;
            $skip = true;

            $key = md5($this->genPass());
            $new_data .= $key;
            $aux["key"] = $key;
            $aux["data"] = $nopa;
            $noparsed[] = $aux;
            $nopa = '';
        }
    } else {
        $tag1 = $data[$i];
    }
    . . .
```

posts: 13

Well, I got some time to wrap my noodle on this function when I got home from work, and I have a proposed patch now (a diff is here). This reduces the overall time spent in parse_pp_np() from 49% to 2%. This has almost cut my page load time in half.
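The linked diff isn't reproduced here, but the general shape of such a rewrite is to let strpos() jump straight to the next tag instead of walking the string byte by byte. A rough sketch of that idea (my reconstruction, not the actual patch; the function name and the key generation are illustrative stand-ins):

```php
<?php
// Illustrative strpos()-based noparse scan (not the actual patch).
// Assumes ~np~ ... ~/np~ sections are not nested, as in the original post.
function extract_noparse($data, &$noparsed) {
    $new_data = '';
    $pos = 0;
    while (($start = strpos($data, '~np~', $pos)) !== false) {
        $end = strpos($data, '~/np~', $start + 4);
        if ($end === false) break;                 // unterminated section: stop scanning

        $new_data .= substr($data, $pos, $start - $pos);
        $key = md5(uniqid(rand(), true));          // stand-in for $this->genPass()
        $new_data .= $key;
        $noparsed[] = array(
            'key'  => $key,
            'data' => substr($data, $start + 4, $end - $start - 4),
        );
        $pos = $end + 5;                           // skip past ~/np~
    }
    $new_data .= substr($data, $pos);
    return $new_data;
}
```

With input like "a ~np~raw~/np~ b", this replaces the protected section with a key and records the raw text in $noparsed, touching substr() only a handful of times per section instead of once per character.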

I'm making the assumption that these ~np~ tags aren't nested here, and I can't say I have tested this very thoroughly :-). But it gives a better idea how we can improve performance here by avoiding expensive calls. Btw, this eliminated about 37,000 calls to substr() when viewing my page.

There's still room for more improvement, but this was easily the biggest and easiest win to start with. I'm somewhat surprised that we're now spending about 13% of the time calling basename(), it seems to be a pretty expensive call, and it's the next biggest consumer.
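The basename() cost can often be amortized by caching, since the same path strings tend to recur many times within a single request. A minimal sketch of a memoizing wrapper (my suggestion, not part of the Tiki codebase):

```php
<?php
// Illustrative memoizing wrapper around basename() (not from Tiki itself).
// Worthwhile only if the same paths are resolved repeatedly in one request.
function cached_basename($path) {
    static $cache = array();
    if (!isset($cache[$path])) {
        $cache[$path] = basename($path);
    }
    return $cache[$path];
}
```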

Cheers,

-- Leif


posts: 1092

If you want to become a developer on tikiwiki.org, tell us. You are welcome.
It is hard for a developer to integrate somebody else's patches unless he is already working on that part of the code.
I think (not sure) the function you are talking about has been touched by rlpowel recently. You can speak with him if you want.

My impression about wiki parsing is that I don't understand why the parsing is done with preg_match and the like. It is impossible to manage some cases with this method. I never understood why a BNF grammar was not written for the TikiWiki syntax, with an LL(1) or LL(2) parser built on top of it. Traditional parsing methods would be safe.
The other day I was trying to figure out if I can put a structure in a table cell... Without a grammar, I can only test.
If people are interested in joining me in a team, welcome :-)
Sylvie


posts: 13

Ah, indeed, I pulled the source from CVS, and the tip of the tree has improved a lot. I guess I should have checked this before :-).

Using this code, we are only spending about 14% in parse_data(), so it doesn't leave a lot of room for improvement. Writing a recursive descent parser in PHP seems very reasonable; perhaps writing an LR(1) parser extension using an existing tool like Bison would be even better (but requires compiling ...). Either way, this would certainly shave off some time, but probably the best we could hope for is about a 10% improvement. Not bad though, probably worth doing at some point.
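To make the recursive-descent idea concrete, here is a minimal sketch of such a parser for a toy fragment of wiki syntax (bold marked with '' ... ''). Everything here is illustrative; the real TikiWiki grammar is far richer than this two-rule toy:

```php
<?php
// Minimal recursive-descent sketch for a toy wiki fragment (illustrative only):
//   text := (bold | char)*
//   bold := "''" text "''"
// Produces HTML from the toy grammar above.
function parse_text($s, &$i, $stop = null) {
    $out = '';
    $len = strlen($s);
    while ($i < $len) {
        if (substr($s, $i, 2) === "''") {
            if ($stop === "''") return $out;      // this '' closes the caller's bold
            $i += 2;                              // consume opening ''
            $inner = parse_text($s, $i, "''");    // recurse for the bold body
            $i += 2;                              // consume closing ''
            $out .= '<b>' . $inner . '</b>';
        } else {
            $out .= $s[$i++];
        }
    }
    return $out;
}

function wiki_to_html($s) {
    $i = 0;
    return parse_text($s, $i);
}
```

The point of the grammar-driven approach is that each rule is a function, so nesting and error cases are handled by structure rather than by ad-hoc regular expressions.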