[Twisted-Python] Re: Disabling PB (de)serialization

Brian Warner warner at lothar.com
Wed Sep 7 22:07:33 EDT 2005

> Mostly, *very* long lists of small objects, each containing a few numbers
> and short strings.

What order of magnitude are we talking about? Would something like
[(1,12,"foo","bar") for i in range(10000)] be close? If so, I'll use this as
one of the benchmark cases.

> In my case, the data is going between instances of a VNC server and a
> VNC viewer. The viewer, of course, isn't sending much, but using PB rref
> method calls to transfer the data from the VNC server was enough to max
> out my Athlon 2200, albeit with both the PB client and the PB server
> running on the same machine (it was just a prototype).

So most of the data is opaque VNC blobs? What kind of a size-histogram are we
talking about? Or is this a python implementation of the VNC protocol?

FYI, newpb is scheduled to have an opportunistic string-caching scheme in
which any string that gets sent over the wire more than a couple of times gets
replaced by a VOCAB token with a number. The idea is to compress all the
standard internal PB sequences (like "list", "tuple", "my-reference", "call")
into short two-byte tokens, and for the sender to decide which strings get
tokenized (there will be a special sequence that adds/removes entries in the
receiver's mapping). Incidentally, oldpb used a "dialect" number (of which
there was only one) that indicated a static list of strings to tokenize this
way.
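To make the sender-decides idea concrete, here is a minimal sketch in Python, assuming a simple in-memory model. The class names `VocabSender` and `VocabReceiver`, the `("add-vocab", ...)` / `("del-vocab", ...)` control messages, and the `("vocab", n)` / `("string", s)` tagging are all hypothetical and are not newpb's actual wire format. The point it illustrates is that the receiver never chooses anything; it only mirrors the mapping the sender announces.

```python
# Hypothetical sketch: VocabSender, VocabReceiver and the control-message
# tuples are illustrative names, not newpb's actual wire format or API.

MAX_TOKENS = 2 ** 16  # tokens must fit in two bytes


class VocabSender:
    """Sender side: decides which strings get tokenized."""

    def __init__(self):
        self.table = {}      # string -> token number
        self.free = []       # token numbers released by remove()
        self.next_token = 0

    def add(self, word):
        """Assign a token to `word` and return the control message
        that installs the same mapping on the receiver."""
        if self.free:
            token = self.free.pop()          # reuse a released token first
        else:
            if self.next_token >= MAX_TOKENS:
                raise OverflowError("no two-byte tokens left")
            token = self.next_token
            self.next_token += 1
        self.table[word] = token
        return ("add-vocab", token, word)

    def remove(self, word):
        """Drop `word` and return the control message that removes
        it from the receiver's mapping."""
        token = self.table.pop(word)
        self.free.append(token)
        return ("del-vocab", token)

    def encode(self, word):
        """Send the token if the string is in the table, else the string."""
        if word in self.table:
            return ("vocab", self.table[word])
        return ("string", word)


class VocabReceiver:
    """Receiver side: only mirrors whatever the sender announces."""

    def __init__(self):
        self.table = {}      # token number -> string

    def handle_control(self, msg):
        if msg[0] == "add-vocab":
            _, token, word = msg
            self.table[token] = word
        elif msg[0] == "del-vocab":
            _, token = msg
            del self.table[token]
        else:
            raise ValueError("unknown control message %r" % (msg,))

    def decode(self, item):
        kind, value = item
        return self.table[value] if kind == "vocab" else value
```

For example, after the sender adds "list", "tuple", "my-reference" and "call" in that order, `sender.encode("call")` yields `("vocab", 3)` while `sender.encode("foo")` passes the string through unchanged. oldpb's single dialect corresponds roughly to both sides preloading the same fixed list instead of exchanging control messages.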

I haven't implemented this part yet, but when I do I'll be curious about how
to keep it from thrashing on the strings in user data. I'm vaguely planning
on something that ignores any string longer than 20 characters and keeps a
list of 100 or so candidates with a counter for each. When a counter hits 3,
that word gets VOCABized; if the list is full when a new word is introduced,
an old one gets thrown out at random. No idea how this will perform. Worst
case, I'll go back to a static list, but still have it sender-chosen (based
upon just the strings that actually appear in the newpb code), making it more
flexible and less negotiation-heavy than oldpb's approach.
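The candidate-tracking heuristic above could be sketched roughly like this in Python. `VocabCandidates` and its parameters are hypothetical names; the defaults (20 characters, 100 candidates, threshold 3) come from the paragraph above. One assumption the text leaves open is that a word leaves the candidate list once it is promoted, so it stops occupying one of the 100 slots.

```python
import random


class VocabCandidates:
    """Hypothetical sketch of the adaptive VOCAB heuristic: skip long
    strings, count short ones, promote at a threshold, evict at random."""

    def __init__(self, max_len=20, capacity=100, threshold=3, rng=None):
        self.max_len = max_len        # longer strings are never tokenized
        self.capacity = capacity      # size of the candidate list
        self.threshold = threshold    # sightings needed before promotion
        self.counts = {}              # candidate string -> times seen
        self.vocab = set()            # strings already promoted
        self.rng = rng or random.Random()

    def observe(self, word):
        """Record one transmission of `word`; return True if this call
        promoted it (the caller would then send an add-vocab message)."""
        if len(word) > self.max_len or word in self.vocab:
            return False
        if word not in self.counts:
            if len(self.counts) >= self.capacity:
                # List is full: throw out an existing candidate at random.
                victim = self.rng.choice(list(self.counts))
                del self.counts[victim]
            self.counts[word] = 0
        self.counts[word] += 1
        if self.counts[word] >= self.threshold:
            # Assumption: promoted words leave the candidate list.
            del self.counts[word]
            self.vocab.add(word)
            return True
        return False
```

Random eviction keeps the bookkeeping trivial. The cost is that a steady stream of one-off user strings can keep knocking out a genuinely common word before it reaches the threshold, which is exactly the thrashing concern raised above.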

