[Twisted-Python] Safe Pickling using banana and jelly

Heiko Wundram heikowu at ceosg.de
Tue May 27 22:21:52 MDT 2003


On Tue, 2003-05-27 at 10:01, Christopher Armstrong wrote:
> It seems you've missed other posts in this thread (or my own? I don't
> remember now), that point out that jelly can indeed restrict instantiation
> of arbitrary classes. Implementing another half-broken serialization scheme
> is definitely not a good solution to any problem :-)

The thing is: I don't like any serialization scheme that just says,
"Okay, all data is fine with us, we'll coredump if we deconstruct
anything bad or obnoxious"... (sort of like that). I guess that's not
the point when doing serialization for data received over the net...

My intentions behind creating my own serialization scheme were the
following:

1. I can control how the data is actually stored in the pickle (classes
aren't pickled as is, but instead get the chance to __serialize
themselves; that's just mildly important, but... Hey. :)) And foremost:
I can control exactly which classes are okay for
serialization/unserialization.

2. Neither Jelly nor Banana can handle a special case where I have an
instance of class y (derived from class x), and want the serialization
stream to show that class x was sent. An example of this: My current
program has a Host class, and a LocalHost class which is derived from
it. The LocalHost, when sent to other hosts, should just get serialized
as a Host. Normally you'd have to "cast" the LocalHost to a Host object,
and then serialize that. My scheme can handle "alias classes", which get
serialized using the other class's name, making sure that the derived
class doesn't even get a chance to serialize its private attributes.

3. Support for different serialization protocols. I guess I haven't made
the mistake that most other authors of pickle-like modules made: not
implementing support for more than one protocol right from the
start... (hypocrisy... ;)) This means, e.g., that object-ids (which
basically are names for the pickled class) can have different meanings
when the protocol changes.

4. And most important for me (except 1): I can control exactly what data
gets put into the unserialized classes, as all __unserialize functions
are called in reverse order (when having classes which are bases to
other classes, etc.). The point being: when I load a Host from the network,
I have to reduce several timeouts as "local" timeouts are much higher
than timeouts for instances received over the net. As I don't reduce
them when I send them out, I just reduce them when they come in. As
such, the Host class is derived from another class, which is called
TimeoutBase, which controls this timeout stuff, as there are other
classes which also need timeout behaviour. With all "normal" picklers,
the Host instance would be created without it ever having the chance to
change the private data of the TimeoutBase (yeah, I use private
variables for all class data), except by calling some kind of utility
function like checkTimeout() (yuck...) or hacking by changing private
attributes (even more yuck...). I think this is pretty crufty. When the
unserializer just calls each base's __unserialize in turn, I can just
have the TimeoutBase store (just) its values to the network when
__serialize is called, and when __unserialize is called load them again,
and check/reduce them right as they are being put into the class. This
also protects against broken objects which are received from the
network, which I would otherwise have to catch using some kind of
checkInstance() logic too... (objects which are loaded with incomplete
data, e.g.)

5. My pickles are signed data items, and while signing banana'd packets
is no problem, I have integrated signature/digesting algorithms with the
stream class, which allows for checking of SHA-digests and the like as
they are being written out. I don't really know of any other pickler that
takes care of checksumming packets right in the pickler. You can of
course turn off this functionality... :)

6. Without being a hypocrite, I guess I can say that the serialization
scheme is so simple that there can't be many holes an attacker could
come through... Tests like the ones Andrew did will certainly not
work, and the burden is put on the programmer to make sure that the
classes he registers as being serializable are really classes that are
okay to serialize (destroy, etc.). All base classes are safe to
serialize. And actually only int, string, unicode, complex, long and
float ever get written out to the stream, all other objects being
abstractions of a _DataStream (which is basically a list kind of
object).
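
To make points 1, 2 and 4 concrete, here is a minimal sketch in Python of
how a class whitelist, alias classes and per-base hooks could fit
together. Every name in it (register, dump, load, NET_MAX, the two
registry dicts) is an assumption made for illustration and not the API
of the module described above. It also assumes that the hooks run
base-most first, and it leans on Python's private name mangling, so each
class in the hierarchy keeps its own __serialize/__unserialize.

```python
# Hypothetical sketch: none of these names come from the module in this post.
_REGISTRY = {}  # wire name -> class that may be instantiated on load
_ALIASES = {}   # class -> class whose name it is serialized under


def register(cls, alias_of=None):
    """Whitelist cls; with alias_of, cls is sent as if it were alias_of."""
    if alias_of is None:
        _REGISTRY[cls.__name__] = cls
    else:
        _ALIASES[cls] = alias_of
    return cls


def _hooks(cls, name):
    """Yield (base, hook) for every class in cls's MRO, base-most first."""
    for base in reversed(cls.__mro__):
        # "__serialize" written inside class Foo is stored as "_Foo__serialize".
        mangled = "_%s__%s" % (base.__name__.lstrip("_"), name)
        hook = base.__dict__.get(mangled)
        if hook is not None:
            yield base, hook


def dump(obj):
    wire_cls = _ALIASES.get(type(obj), type(obj))
    if _REGISTRY.get(wire_cls.__name__) is not wire_cls:
        raise TypeError("%r is not registered" % (type(obj),))
    # Only hooks of wire_cls and its bases run, so an aliased subclass
    # never gets to write its own private attributes.
    state = {b.__name__: h(obj) for b, h in _hooks(wire_cls, "serialize")}
    return (wire_cls.__name__, state)


def load(packet):
    name, state = packet
    cls = _REGISTRY.get(name)
    if cls is None:
        raise ValueError("class %r is not registered" % (name,))
    obj = cls.__new__(cls)  # no __init__; the hooks fill in the state
    for base, hook in _hooks(cls, "unserialize"):
        if base.__name__ not in state:
            raise ValueError("incomplete data for %s" % base.__name__)
        hook(obj, state[base.__name__])
    return obj


class TimeoutBase:
    NET_MAX = 30  # assumed cap for timeouts that arrive from the network

    def __init__(self, timeout):
        self.__timeout = timeout

    def __serialize(self):
        return self.__timeout

    def __unserialize(self, data):
        if not isinstance(data, int):
            raise ValueError("bad timeout")
        self.__timeout = min(data, self.NET_MAX)  # reduce on the way in

    def timeout(self):
        return self.__timeout


class Host(TimeoutBase):
    def __init__(self, name, timeout):
        TimeoutBase.__init__(self, timeout)
        self.__name = name

    def __serialize(self):
        return self.__name

    def __unserialize(self, data):
        if not isinstance(data, str):
            raise ValueError("bad host name")
        self.__name = data

    def name(self):
        return self.__name


class LocalHost(Host):
    def __init__(self, name, timeout, secret):
        Host.__init__(self, name, timeout)
        self.__secret = secret  # never leaves the machine


register(Host)
register(LocalHost, alias_of=Host)
```

With this, dump(LocalHost("alpha", 600, "s3cret")) produces a packet
named "Host" that carries only the TimeoutBase and Host state, and
load() on that packet gives back a plain Host whose timeout has already
been cut down to NET_MAX. A packet naming an unregistered class, or one
missing the state for a base, is rejected before any object is handed
back.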
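
The digesting stream from point 5 can be sketched in a similar way: a
wrapper that feeds every chunk into a SHA digest as it passes through,
so the checksum is ready the moment the packet has been written or
read. The class and function names, the 4-byte length prefix and the
choice of SHA-1 are all assumptions for illustration. Note that an
unkeyed digest like this only detects corruption; an actual signature
would need a key on top of it.

```python
import hashlib
import hmac
import io


class DigestingStream:
    """Hypothetical wrapper that hashes data as it is written or read."""

    def __init__(self, raw):
        self._raw = raw
        self._sha = hashlib.sha1()

    def write(self, data):
        self._sha.update(data)
        return self._raw.write(data)

    def read(self, size):
        data = self._raw.read(size)
        self._sha.update(data)
        return data

    def digest(self):
        return self._sha.digest()


def write_packet(raw, payload):
    """Write a length prefix and the payload, then the digest of both."""
    stream = DigestingStream(raw)
    stream.write(len(payload).to_bytes(4, "big"))
    stream.write(payload)
    raw.write(stream.digest())  # the trailer itself is not hashed


def read_packet(raw):
    """Read one packet and raise ValueError if the digest does not match."""
    stream = DigestingStream(raw)
    size = int.from_bytes(stream.read(4), "big")
    payload = stream.read(size)
    trailer = raw.read(hashlib.sha1().digest_size)
    if len(payload) != size or not hmac.compare_digest(trailer, stream.digest()):
        raise ValueError("digest mismatch")
    return payload
```

A round trip through io.BytesIO returns the original payload, and
flipping any bit of the payload in between makes read_packet() raise
ValueError.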

> btw, thanks to Andrew Dalke for an *excellent* beginning of a security
> audit for jelly. :-)

Thanks here too... Made me finally give myself the push to actually
write my own serializer... ;) I've wanted to do that for a long time,
but I've always feared that it would be an overly complex task.
Yesterday proved me wrong about that... :) I needed the functionality
from points 1 and 4 above more than once when it wasn't available, and
that always produced pretty crufty code...

Heiko.




