Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#3633 enhancement closed duplicate (duplicate)

C based domish.Element serializer

Reported by: jack Owned by: jack
Priority: normal Milestone:
Component: words Keywords: xish domish xml xmpp
Cc: ralphm, Jean-Paul Calderone, twonds Branch:
Author:

Description

For most XMPP applications, XML serialization is the bottleneck. While #2477 improves the Python serializer quite a bit, this can be made faster by dropping down to C.

I've written a Python c module called cserialize which serializes domish.Element objects. This module is compatible with domish and passes most tests. It gives slightly more than a factor of 2 performance increase over the code in #2477.

The failing tests have to do with namespace issues, and after investigating the code, I believe the tests are wrong and need to be rewritten. I haven't rewritten them as I wanted to discuss it first.

You can find the cserialize code here: http://github.com/metajack/cserialize

Attachments (1)

cserialize.diff (1.7 KB) - added by jack 9 years ago.
patch to domish.py to use cserialize

Download all attachments as: .zip

Change History (8)

Changed 9 years ago by jack

Attachment: cserialize.diff added

patch to domish.py to use cserialize

comment:1 Changed 9 years ago by jack

I just noticed the patch includes an unintended change to the namespace prefixes. Sorry about that.

comment:2 Changed 9 years ago by ralphm

Cc: ralphm Jean-Paul Calderone added
Keywords: review removed
Owner: changed from Jean-Paul Calderone to jack

First of all, thanks for working on this!

Why not include the C-based serializer in Twisted proper? That would remove the additional external dependency and allow for easier maintenance.

The patch itself is way too complicated. In previous times, there were multiple serializer classes, and they could be switched by setting the public module attribute SerializerClass to point to the serializer class of choice. At some point the even slower _Serializer was removed and _ListSerializer was made the default.

Assuming the inclusion of the c-based serializer as a module, we can just try to import it and have a class _CSerializer (or something) that provides serialize and __init__ like _ListSerializer. If available, it can be assigned to SerializerClass in preference to the pure Python one. An interface definition might be in order.

Could you expand on the failing tests?

comment:3 Changed 9 years ago by Jean-Paul Calderone

A procedural interjection:

#2477 is already a ticket for XML serialization optimization. This ticket is a duplicate. It's not really important that this ticket was filed for a different set of optimizations. The specific changes which result in fast XML serialization aren't very interesting. What's interesting is that XML serialization is fast. In the light of the 2x improvement of the approach of using a C library instead of a Python library, one thought I have is that the Python optimizations are irrelevant and should be discarded. They obscure the code and don't bring its performance even close to the performance this optimization gives. That said, I'd also really like to explore using an existing C library to do this serialization rather than including a completely custom C extension. However, regardless of what we ultimately decide to do, #2477 is the ticket to associate discussion with.

So, Jack, could you close this ticket as a duplicate and raise the ideas you've presented here on #2477?

comment:4 Changed 9 years ago by ralphm

While this ticket is still open, I was reminded about some discussion over at #2741 on the selection of different serializer implementations in domish that should be considered in light of the provided patch.

comment:5 Changed 9 years ago by twonds

Cc: twonds added

comment:6 Changed 9 years ago by Jean-Paul Calderone

Resolution: duplicate
Status: newclosed

I was hoping Jack would notice my comment and agree with my assessment that this is a duplicate or explain why it's not. It's been a couple weeks, though, so I'm just going to call this a duplicate unilaterally.

comment:7 Changed 9 years ago by jack

I'm fine with moving the discussion to the other bug. I didn't respond sooner because right after I filed this I got a series of illnesses. I haven't forgotten, and I appreciate your prodding.

Note: See TracTickets for help on using tickets.