Ticket #3633 (closed enhancement: duplicate )

Opened 1 year ago

Last modified 1 year ago

C based domish.Element serializer

Reported by: jack Assigned to: jack
Type: enhancement Priority: normal
Milestone: Component: words
Keywords: xish domish xml xmpp Cc: ralphm, exarkun, twonds
Branch: Author:
Launchpad Bug:

Description

For most XMPP applications, XML serialization is the bottleneck. While #2477 improves the Python serializer quite a bit, this can be made faster by dropping down to C.

I've written a Python c module called cserialize which serializes domish.Element objects. This module is compatible with domish and passes most tests. It gives slightly more than a factor of 2 performance increase over the code in #2477.

The failing tests have to do with namespace issues, and after investigating the code, I believe the tests are wrong and need to be rewritten. I haven't rewritten them as I wanted to discuss it first.

You can find the cserialize code here: http://github.com/metajack/cserialize

Attachments

cserialize.diff (1.7 kB) - added by jack 1 year ago.
patch to domish.py to use cserialize

Change History

  2009-01-27 09:59:24+00:00 changed by jack

  • attachment cserialize.diff added

patch to domish.py to use cserialize

  2009-01-27 10:00:23+00:00 changed by jack

I just noticed the patch includes an unintended change to the namespace prefixes. Sorry about that.

  2009-01-27 16:11:04+00:00 changed by ralphm

  • cc set to ralphm, exarkun
  • keywords changed from xish domish xml xmpp review to xish domish xml xmpp
  • owner changed from exarkun to jack

First of all, thanks for working on this!

Why not include the C-based serializer in Twisted proper? That would remove the additional external dependency and allow for easier maintenance.

The patch itself is way too complicated. In previous times, there were multiple serializer classes, and they could be switched by setting the public module attribute SerializerClass to point to the serializer class of choice. At some point the even slower _Serializer was removed and _ListSerializer was made the default.

Assuming the inclusion of the c-based serializer as a module, we can just try to import it and have a class _CSerializer (or something) that provides serialize and __init__ like _ListSerializer. If available, it can be assigned to SerializerClass in preference to the pure Python one. An interface definition might be in order.

Could you expand on the failing tests?

  2009-01-27 16:18:01+00:00 changed by exarkun

A procedural interjection:

#2477 is already a ticket for XML serialization optimization. This ticket is a duplicate. It's not really important that this ticket was filed for a different set of optimizations. The specific changes which result in fast XML serialization aren't very interesting. What's interesting is that XML serialization is fast. In the light of the 2x improvement of the approach of using a C library instead of a Python library, one thought I have is that the Python optimizations are irrelevant and should be discarded. They obscure the code and don't bring its performance even close to the performance this optimization gives. That said, I'd also really like to explore using an existing C library to do this serialization rather than including a completely custom C extension. However, regardless of what we ultimately decide to do, #2477 is the ticket to associate discussion with.

So, Jack, could you close this ticket as a duplicate and raise the ideas you've presented here on #2477?

  2009-01-28 11:33:13+00:00 changed by ralphm

While this ticket is still open, I was reminded about some discussion over at #2741 on the selection of different serializer implementations in domish that should be considered in light of the provided patch.

  2009-01-28 19:41:14+00:00 changed by twonds

  • cc changed from ralphm, exarkun to ralphm, exarkun, twonds

  2009-02-07 06:59:35+00:00 changed by exarkun

  • status changed from new to closed
  • resolution set to duplicate

I was hoping Jack would notice my comment and agree with my assessment that this is a duplicate or explain why it's not. It's been a couple weeks, though, so I'm just going to call this a duplicate unilaterally.

  2009-02-07 10:29:14+00:00 changed by jack

I'm fine with moving the discussion to the other bug. I didn't respond sooner because right after I filed this I got a series of illnesses. I haven't forgotten, and I appreciate your prodding.

Note: See TracTickets for help on using tickets.