Ticket #2477 (new enhancement )

Opened 3 years ago

Last modified 2 weeks ago

xml serialize optimizations

Reported by: jack Assigned to:
Type: enhancement Priority: normal
Milestone: Component: words
Keywords: xish domish xml review Cc: jack, twonds, ralphm, exarkun, thijs
Branch: branches/faster-domish-serialization-2477 Author: exarkun
Launchpad Bug:

Description

twisted.words.xish's xml serialization is quite slow. It is routinely showing up as the top cycle eater in profiles. The following two patches attempt to address this.

The first one gets rid of all string formatting operations in favor of more calls to write(). This results in about a 17% speedup.

The second patch caches the results of escapeToXml and achieves about another 25% performance boost at the cost of memory. This patch is usable but could probably be done better.

The performance is still not great, but I'm unsure how further gains can be made. I tried amny things includes cStringIO and a non-recursive implementation (both made the code slower, not faster). I think the real solution is to make domish a light wrapper around some fast underlying C library.

Attachments

domish.diff (4.9 kB) - added by jack 3 years ago.
replace string formatting with more calls to write()
domish2.diff (1.1 kB) - added by jack 3 years ago.
cache results of escapeToXml
test.log.bz2 (127.0 kB) - added by jack 3 years ago.
test data from our xmpp component
timer.py (1.6 kB) - added by jack 3 years ago.
benchmark script
cserialize.diff (1.7 kB) - added by thijs 3 weeks ago.
diff from #3633

Change History

  2007-02-24 03:48:57+00:00 changed by jack

  • attachment domish.diff added

replace string formatting with more calls to write()

  2007-02-24 03:49:43+00:00 changed by jack

  • attachment domish2.diff added

cache results of escapeToXml

  2007-02-25 03:49:24+00:00 changed by exarkun

Can you attach the benchmarks you used?

  2007-02-25 03:59:58+00:00 changed by jack

Sure. I'm attaching a small data file with xml extracted from our components logs (ie, real world data for our application) and the timing code I used for benchmarking performance on this data.

  2007-02-25 04:00:44+00:00 changed by jack

  • attachment test.log.bz2 added

test data from our xmpp component

  2007-02-25 04:01:08+00:00 changed by jack

  • attachment timer.py added

benchmark script

  2007-02-26 16:39:28+00:00 changed by ralphm

The problem with using a c based XML library like elementtree, is that it seems hard to control the serialization. XMPP has special requirements in this regard in terms of how namespaces are used and that we serialize stanzas, not complete documents. I'd like to be proven wrong, though.

  2007-03-03 10:59:36+00:00 changed by ralphm

  • owner changed from exarkun to ralphm

  2008-04-03 09:40:55+00:00 changed by ralphm

  • owner changed from ralphm to jack
  • branch deleted
  • author deleted

Jack, I'm not sure how the log file relates to the timer script here. Could you expand on that? It would be nice to have a script that we can use to test performance before and after applying the patches, that we can put alongside the Words documentation, similar to the ones in doc/core/benchmarks.

  2008-04-03 13:57:28+00:00 changed by twonds

It is just data to test with the profile script. You can just use that data or generate some other xml to run the script and profile before and after the patches right? We could just switch this to use timeit instead of profile. I could do this or get jack to do it if that is what you want.

  2009-01-17 03:38:14+00:00 changed by jack

  • owner changed from jack to ralphm
  • launchpad_bug deleted

  2009-01-17 08:56:24+00:00 changed by glyph

  • keywords changed from xish domish xml to xish domish xml review
  • owner deleted

So, according to this blog post, Jack has already been using this for "2 years", and has started distributing a fork of Twisted that includes this patch. In that same post he complained that it hasn't been merged yet, so maybe somebody should review it.

  2009-01-17 18:28:58+00:00 changed by exarkun

  • branch set to branches/faster-domish-serialization-2477
  • branch_author set to exarkun

(In [26066]) Branching to 'faster-domish-serialization-2477'

  2009-01-18 00:46:55+00:00 changed by exarkun

(In [26075]) Fix a bug in the cache optimization which results in incorrect XML escaping

refs #2477

  2009-01-18 04:08:26+00:00 changed by exarkun

  • cc changed from jack, twonds, ralphm to jack, twonds, ralphm, exarkun
  • owner set to ralphm

I applied both patches to a branch. I also grabbed some of the test data and worked it into a modified version of the benchmark which can easily be run to produce some performance data.

The caching part of the patch had two problems which I have addressed (with different success, I think). First, it caused escapeToXml to return incorrect results sometimes because the cache did not take into account the value of isattrib. The code now makes sure to respect this value. Second, the cache was allowed to grow to a million items, then completely discarded. A million items is too large for an upper bound for a private cache which users have no control over. The behavior of discarding all entries when the cache became full also seems strange; I suspect this case was never encountered, or was at least encountered only very rarely, but I don't know. It seems as though most keys in the cache would have to be very low value, only being computed once or twice in the lifetime of the cache. However, I don't really know what a typical xmpp stream's values look like (possibly there is no such thing as a typical stream). So, I have replaced the caching scheme with a different one. This one imposes a (comparatively) low limit and uses a very simple two-generation scheme to attempt to preserve keys with frequent hits while discarding those which are not often requested. As to whether it really succeeds at this goal, I do not know. It likely depends on the actual data used with it, which again I am not very knowledgeable of.

  2009-01-18 05:02:14+00:00 changed by jack

That sounds like the right improvement. I believe we used fill and discard because it is very fast. Most other methods we could think of incurred enough overhead to make the performance gains less interesting.

Does your new benchmarking find similar results to what I reported with the original cache method? What are they like with the new cache method?

  2009-01-18 11:09:26+00:00 changed by ralphm

  • keywords changed from xish domish xml review to xish domish xml
  • owner changed from ralphm to exarkun

I've run the benchmark a bunch of times on trunk, branch with only Jack's patches (r26063), branch up till exarkun's caching experiments (r26073), and the branch as it is now (r26082). On my machine, I consistently get respectively 2.2⋅10³, 3.4⋅10³, 3.7⋅10³ and 4.1⋅10³ elements/s. Thanks for the improvements!

Some review comments:

  • domish_serialization.py import sys, but doesn't use it.
  • serialize should have a proper docstring.
  • A bunch of docstrings don't have their opening on a line of their own, unfortunately, and this should at least be fixed for escapeToXml.
  • escapeToXml's docstring should mention the caching.
  • The cache limit should probably have a name.
  • The change to addContent made me discover an issue with SerializedXml. I filed #3623 for that, but maybe it could be solved in this branch.

  2009-01-18 17:48:15+00:00 changed by exarkun

Jack, is there any chance you can try this code out and compare its performance to trunk and to your original patches?

  2009-01-27 10:00:36+00:00 changed by jack

I can confirm Ralph's findings. Trunk gives ~2100 elements/sec. My patches give ~3100 elements/sec. And the final branch code gives ~3700 elements/sec. Great job.

I've now written a C serializer for domish Elements in and attached it to a new bug. This code gives ~7500 elements/sec. See #3633.

  2009-01-27 20:14:21+00:00 changed by exarkun

Jack, have you tried using psyco with this code? In my tests, it produces roughly a 3x speedup in this branch (distinctly less in trunk, I suppose there are fewer opportunities for it there). How does this compare to the C serialize you've written?

  2009-03-24 16:14:16+00:00 changed by exarkun

Ping?

  2009-05-05 05:06:15+00:00 changed by jack

We used to run psyco but ran into some weird problems and turned it off some time back. I'm a bit wary of turning that on in our production systems unless I can dig up the reasons we turned it off, or we can do some load testing with it enabled on the staging server.

If you are saying it's 3x faster, that would make it approximately (and hypothetically since I haven't run it) 11k elements/s, which is a fair bit faster than my C version. However, I think the C version can be sped up considerably.

I see no reason not to commit this work since there is still outstanding work to do on the C version, and there may be good reasons to have both implementations (ie, C may be faster for CPython, but unworkable on Jython or IronPython?).

  2009-05-05 18:55:12+00:00 changed by jack

Addressing exarkun's comments from #3633:

What's interesting is that XML serialization is fast. In the light of the 2x improvement of the approach of using a C library instead of a Python library, one thought I have is that the Python optimizations are irrelevant and should be discarded. They obscure the code and don't bring its performance even close to the performance this optimization gives. That said, I'd also really like to explore using an existing C library to do this serialization rather than including a completely custom C extension.

I disagree that the Python optimizations are irrelevant now for the reasons stated above. I agree that the python optimization is a tradeoff on code readability, but I think as long as the rest of the domish stuff is nice and readable, it's not a big deal.

The problem with using an existing C library to do serialization is that it will need the XML represented in its internal format to start with. Translating from domish's format to say the libxml2 format is probably just as expensive an operation as serializing to a string. This story would be radically different if domish itself were just a wrapper around libxml2 or something similar. This has been discussed before (see #3201 for example).

  2010-01-19 02:23:59+00:00 changed by thijs

  • cc changed from jack, twonds, ralphm, exarkun to jack, twonds, ralphm, exarkun, thijs

From #3633:

For most XMPP applications, XML serialization is the bottleneck. While #2477 improves the Python serializer quite a bit, this can be made faster by dropping down to C.

I've written a Python c module called cserialize which serializes domish.Element objects. This module is compatible with domish and passes most tests. It gives slightly more than a factor of 2 performance increase over the code in #2477.

The failing tests have to do with namespace issues, and after investigating the code, I believe the tests are wrong and need to be rewritten. I haven't rewritten them as I wanted to discuss it first.

You can find the cserialize code here: http://github.com/metajack/cserialize

  2010-01-19 02:24:42+00:00 changed by thijs

  • attachment cserialize.diff added

diff from #3633

  2010-01-19 02:25:24+00:00 changed by thijs

  • keywords changed from xish domish xml to xish domish xml review
  • owner deleted

  2010-01-25 17:46:41+00:00 changed by exarkun

I think we should ignore the cserializer stuff. If someone wants to review what's in the branch referenced here, I guess that's fine. I'm not too excited about this, since it adds a lot of code (much of which is not idiomatic), but there's a benchmark and it shows this is a performance improve so great I guess.

It would be better if we could let someone else deal with this kind of performance issue, of course.

Note: See TracTickets for help on using tickets.