[Twisted-Python] Safe Pickling using banana and jelly

Andrew Bennetts andrew-twisted at puzzling.org
Wed May 28 00:45:38 EDT 2003


On Wed, May 28, 2003 at 06:21:52AM +0200, Heiko Wundram wrote:
> On Tue, 2003-05-27 at 10:01, Christopher Armstrong wrote:
> > It seems you've missed other posts in this thread (or my own? I don't
> > remember now), that point out that jelly can indeed restrict instantiation
> > of arbitrary classes. Implementing another half-broken serialization scheme
> > is definitely not a good solution to any problem :-)
> 
> The thing being: I don't like any serialization scheme that just says:
> Okay, all data is fine with us, we'll coredump if we deconstruct
> anything bad or obnoxious... (sort'a like that) I guess that's not the
> point when doing serialization for data received over the net...

Twisted Spread is intended to be safe and secure, because data from the net
could be from anywhere, so it has to be robust in the face of dangerous
data.

> My intention behind creating my own serialization scheme were the
> following:
> 
> 1. I can control how the data is actually stored in the pickle (classes
> aren't pickled as is, but instead get the chance to __serialize
> themselves; that's just mildly important, but... Hey. :)) And foremost:
> I can control exactly which classes are okay for
> serialization/unserialization.

That's what __getstate__ and __setstate__ are for with pickle; IIRC Jelly
supports those, and also its own 'getStateToCopyFor' or something along those
lines.
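
To make the pickle half of that concrete, here is a minimal sketch using only
the standard library.  The Host class and its attributes (name, timeout,
_connection) and the 30-second cap are made up for illustration; they are not
taken from your code or from Twisted.  Note that this only shows the state
hooks: plain pickle still shouldn't be fed data from untrusted peers.

```python
import pickle


class Host:
    """Hypothetical class; the attribute names are assumptions for this sketch."""

    MAX_REMOTE_TIMEOUT = 30  # assumed cap applied to state loaded from a pickle

    def __init__(self, name, timeout):
        self.name = name
        self.timeout = timeout
        self._connection = object()  # local-only state that should not be stored

    def __getstate__(self):
        # Decide exactly which attributes end up in the pickle.
        return {"name": self.name, "timeout": self.timeout}

    def __setstate__(self, state):
        # Decide exactly what goes back into the instance when loading.
        self.name = state["name"]
        self.timeout = min(state["timeout"], self.MAX_REMOTE_TIMEOUT)
        self._connection = None


original = Host("example.org", 300)
restored = pickle.loads(pickle.dumps(original))
```

After the round trip, restored carries the name, a timeout reduced to the cap,
and no connection object, which is the sort of control points 1 and 4 ask for.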

> 2. Both Jelly and Banana can't handle a special case where I have an
> instance of class y (derived from class x), and want the serialization
> stream to show that class x was sent. Examples of this: My current
> program has a Host class, and a LocalHost class which is derived from
> it. The LocalHost, when sent to other hosts, should just get serialized
> as a Host. Normally you'd have to "cast" the LocalHost to a host object,
> and then serialize that. My scheme can handle "alias classes", which get
> serialized using the other's name, making sure that the other class
> doesn't even get a chance to serialize private attributes.

I'm sure PB can handle this.  Have you read "PB Copyable: Passing Complex
Types", http://twistedmatrix.com/documents/howto/pb-copyable? 
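
I haven't written out the PB version here, but the "serialize the subclass
under the base class's name" idea itself can be sketched with the standard
library's pickle and __reduce__.  This is an analogue under stated
assumptions, not PB's API: Host, LocalHost and secret_key are hypothetical
names for illustration only.

```python
import pickle


class Host:
    """Hypothetical base class that is allowed to travel."""

    def __init__(self, name):
        self.name = name


class LocalHost(Host):
    """Hypothetical subclass with private state that must stay local."""

    def __init__(self, name, secret_key):
        super().__init__(name)
        self.secret_key = secret_key

    def __reduce__(self):
        # Tell pickle to rebuild this object as a plain Host, passing only
        # the public name.  secret_key never reaches the stream.
        return (Host, (self.name,))


payload = pickle.dumps(LocalHost("alpha", "s3cret"))
received = pickle.loads(payload)
```

The receiving side ends up with a plain Host, so no explicit "cast" is needed
before serializing.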

> 3. Support for different serialization protocols. I guess I've not made
> the mistake that most other pickle-like module authors made, that they
> didn't implement support for more than one protocol right from the
> start... (hypocrisy... ;)) This e.g. means that object-ids (which
> basically are names for the pickled class) can have different meanings,
> when the protocol changes.

I'm not quite sure what you're saying here.

> 4. And most important for me (except 1): I can control exactly what data
> gets put into the unserialized classes, as all __unserialize functions
> are called in reverse order (when having classes which are bases to
> other classes, etc.). The point being: when I load a Host from network,
> I have to reduce several timeouts as "local" timeouts are much higher
> than timeouts for instances received over the net. As I don't reduce
[...]

Again, read http://twistedmatrix.com/documents/howto/pb-copyable.  With PB
you can control exactly what data is sent over the wire.  You also get
Referenceables, Copyables and Cacheables, which is considerably richer than
a simple object protocol that just passes the occasional object-graph.

> 5. My pickles are signed data items, and while signing banana'd packets
> is no problem, I have integrated signature/digesting algorithms with the
> stream class, which allows for checking of SHA-digests and the like as
> they are being written out. I don't really know any other pickler that
> takes care of checksumming packets right in the pickler. You can of
> course turn off this functionality... :)

TCP's checksums guard against accidental corruption on the way through.  Use
SSL to guarantee that the traffic isn't being sniffed or tampered with in
transit.  PB has an authentication step so that you can know who the clients
are.  Between all this, what problem are your signatures solving that isn't
already solved?

> 6. Without being hypocrite, I guess I can say that the serialization
> scheme is so simple, that there couldn't be much holes an attacker could
> come through... Tests like the things Andrew did will certainly not
> work, and the burden is put on the programmer to make sure that the
> classes he registers as being serializable are really classes that are
> okay to serialize (destroy, etc.). All base classes are safe to
> serialize. And actually only int, string, unicode, complex, long and
> float ever get written out to the stream, all other objects being
> abstractions of a _DataStream (which is basically a list kind of
> object).

Banana only understands a very small number of primitives too.  PB has been
implemented in languages other than Python, so I don't think it's overly
complex -- and I believe Brian Warner is working on a rewrite that might
make it simpler, faster and more flexible.

> > btw, thanks to Andrew Dalke for an *excellent* beginning of a security
> > audit for jelly. :-)
> 
> Thanks here too... Made me finally give myself the turn to actually
> write my own serializer... ;) I've wanted to do that for a long time,
> but I've always feared that it would be an overly complex task.
> Yesterday proved me wrong about that... :) I needed above functionality
> 1+4 more than once, when it wasn't available, and that always produced
> pretty crufty code...

As far as I can see (although I'm no PB expert), PB satisfies 1 and 4.  What
makes you think it doesn't?

-Andrew.




