[Twisted-Python] Large Transfers

Glyph Lefkowitz glyph at twistedmatrix.com
Sat May 10 11:17:31 EDT 2003


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Saturday, May 10, 2003, at 08:59 AM, Uwe C. Schroeder wrote:

> Is this really a good thing to do? Shouldn't pb see that the arguments are
> larger than 640k and start paging?

No, because this would violate lots of order-of-execution guarantees  
that PB normally provides.

Let's say that you did

	foo.callRemote("call1", "x" * 1024 * 1024)
	foo.callRemote("call2", "x")

You would expect 'call1' to execute before 'call2', right?  But no:  
because call1 began paging its arguments, call2 would be sent  
interleaved with call1's pages (the desired result for paged calls) and  
would execute first.

The same thing would be true of sending return values.
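
To make the ordering problem concrete, here is a toy model in plain Python. It is not how PB is implemented; the page size and the round-robin scheduling are assumptions picked only for illustration, and 'simulate' is a made-up name.

```python
from collections import deque

PAGE_SIZE = 64 * 1024  # hypothetical page size, chosen only for this demo


def simulate(calls, page_size=PAGE_SIZE):
    """Return the order in which calls would complete if every argument
    were sent in pages and pages of pending calls were interleaved."""
    # Each pending entry is [name, bytes_left_to_send].
    pending = deque([name, len(arg)] for name, arg in calls)
    completed = []
    while pending:
        call = pending.popleft()
        call[1] -= page_size  # send one page of this call's argument
        if call[1] <= 0:
            completed.append(call[0])  # all pages arrived, call can run
        else:
            pending.append(call)  # more pages left, let other calls go
    return completed


order = simulate([("call1", "x" * 1024 * 1024), ("call2", "x")])
print(order)
```

In this model the small call finishes while the large one is still being paged, which is exactly the reordering that implicit paging would introduce.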

> What I'm doing is to hand down XML data which is database-generated on the
> server side. Whenever a user requests a too large resultset the network
> layer fails. On the other hand the resultset already is in memory, so why
> not just transfer it?
> I realize that this is probably bad design

If your goal is to facilitate bad design with huge gobs of XML, PB is  
probably not for you.  There are a number of other protocols which are  
designed for exactly this kind of application: HTTP, XML-RPC, or SOAP,  
depending on your level of complexity.  Twisted provides native support  
for the first two, and SOAP could probably be added without too much  
trouble.

> , but it's the easiest way to transfer this information. Sure I can write
> the stuff to a temporary file and page it over, however this defies the
> purpose, since then the original call results in a message to go get the
> file. This means I need at least 4 callbacks for any given call (the
> original ok callback, then another one for the possible paging as well as
> 2 error callbacks, one for each call)

The error-handling needs to be improved, but this is what  
'twisted.spread.util.getAllPages' is for.  You only need one callback.
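
As a rough picture of why one callback is enough, the collecting side can be modelled in plain Python like this. This is only a sketch of the pattern, not Twisted's code: 'PageCollector' and 'send_in_pages' are made-up names, and the chunk size is an assumption.

```python
class PageCollector:
    """Accumulates pages and fires a single callback once paging ends."""

    def __init__(self, on_done):
        self.pages = []
        self.on_done = on_done

    def got_page(self, page):
        self.pages.append(page)

    def ended_paging(self):
        # The caller hears about the result exactly once, at the end.
        self.on_done(self.pages)


def send_in_pages(collector, data, chunk_size=8192):
    """Hypothetical sender: deliver data to the collector chunk by chunk."""
    for start in range(0, len(data), chunk_size):
        collector.got_page(data[start:start + chunk_size])
    collector.ended_paging()


results = []
collector = PageCollector(lambda pages: results.append("".join(pages)))
send_in_pages(collector, "x" * 100000)
print(len(results), len(results[0]))
```

The per-page traffic stays inside the collector; application code only sees the final list of pages.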

> I can extend the problem by compressing the parameters with zlib (which
> I'm doing anyways), but at some point I will hit the limit.

Hard limits like this one should never be pushed so closely.  If your  
data is more than, say, 60k, you should probably be looking at paging  
it.  More than 500k and you are definitely abusing the protocol.
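
Those rules of thumb could be written down as a small helper. This is a sketch only: 'transfer_advice' is a made-up name, "k" is assumed to mean 1024 bytes, and the size measured is the raw payload, without any protocol overhead.

```python
PAGING_SUGGESTED = 60 * 1024   # above this, paging is probably the right call
PROTOCOL_ABUSE = 500 * 1024    # above this, a single call is abusing the protocol
HARD_LIMIT = 640 * 1024        # hard limit; never plan to get close to it


def transfer_advice(payload):
    """Return 'inline', 'page' or 'must-page' for a payload of bytes or str."""
    size = len(payload)
    if size > PROTOCOL_ABUSE:
        return "must-page"
    if size > PAGING_SUGGESTED:
        return "page"
    return "inline"


print(transfer_advice("x" * 1024))
print(transfer_advice("x" * (100 * 1024)))
print(transfer_advice("x" * (600 * 1024)))
```

The thresholds are deliberately far below the hard limit, so the decision never depends on knowing the exact encoded size.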

> The other problem this creates is a timing issue. Since I have to make
> several calls in order to transfer the resultset, I have to delay database
> calls until the whole resultset is valid.
> To put it more technically:
>
> self.perspective.transfer_large_result(small_int,label,large_result_array)
>
> will fail if large_result_array exceeds 640k. However small_int and label
> can be transferred. The only way to do this is

> if numberofbytes(large_result_set) >= 640k:

Wrong.  If numberofbytes(large_result_set) + banana_epsilon(()) +  
pb_call_overhead ...

This is not a number that you can calculate reliably.  640k is a hard  
high limit.

> self.perspective.transfer_first_part(small_int,label).callback(self.smallpart_ok)

I don't understand.  Do you want to get the whole result at once?  Or  
do you want to send it only when necessary?  If you want to send it  
only when necessary then aren't these two steps required anyway?  If  
not, then can't you use the methods in twisted.spread.util to retrieve  
the pager when you would normally be retrieving a string, in the same  
step?

It would be helpful for my understanding if you would use real method  
names like "addCallback" and "callRemote" here.  I don't have any idea  
what 'self.perspective' is, or whether 'transfer_first_part' is  
supposed to be remote or local.

> and in smallpart_ok
>
> self.page_the_rest(large_result_array).callback(self.whole_stuff_transfered)

Again, not really sure what you're saying.  Why not -

	from twisted.spread import util
	d = util.getAllPages(serverThingy, "getStuff", small_int, label)
	d.addCallback(lambda l: ''.join(l)).addCallback(gotABigString)

> This is an enormous overhead.

As far as I understand it, the only overhead you require is that line  
above.  But I admit I do not understand it terribly well.

> What strikes me even more is that this size limit does not even prevent
> large memory consumption - since the object is already there and in
> cBanana the object is already stored in the buffer.

Where is 'there'?

On the side of the connection that wishes to send the data, it's in  
memory.  If you modify the sending side locally, the whole string may  
even be in your outgoing buffer there, but on the receiving side only  
the beginning will be, since upon receiving the length it will  
terminate the connection.  (At least, I don't think TCP normally sends  
multi-megabyte packets.)
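
The receiving-side behaviour can be sketched with a toy length-prefixed receiver. This is not Banana's wire format: the 4-byte big-endian length prefix and the names 'LengthCheckingReceiver' and 'TooLarge' are assumptions made for the example.

```python
import struct

MAX_LENGTH = 640 * 1024  # limit taken from this thread


class TooLarge(Exception):
    """Raised where a real protocol would drop the connection."""


class LengthCheckingReceiver:
    def __init__(self, max_length=MAX_LENGTH):
        self.max_length = max_length
        self.buffer = b""
        self.expected = None  # length of the message currently being read
        self.messages = []

    def data_received(self, data):
        self.buffer += data
        while True:
            if self.expected is None:
                if len(self.buffer) < 4:
                    return  # length prefix not complete yet
                (self.expected,) = struct.unpack("!I", self.buffer[:4])
                self.buffer = self.buffer[4:]
                if self.expected > self.max_length:
                    # Reject on the announced length, before the body arrives.
                    raise TooLarge(self.expected)
            if len(self.buffer) < self.expected:
                return  # body not complete yet
            self.messages.append(self.buffer[:self.expected])
            self.buffer = self.buffer[self.expected:]
            self.expected = None


receiver = LengthCheckingReceiver()
receiver.data_received(struct.pack("!I", 5) + b"hello")
print(receiver.messages)
```

An oversized message is refused as soon as its length prefix is read, so the receiver never holds more than whatever arrived in that first read.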

> I think I'll just remove all size limits and go through the (unwanted)
> way of creating my own package.

Please, don't do this.  If PB is not working with your application, use  
something else.  This kind of a brute-force solution will undoubtedly  
cause problems down the road, and the people most suited to help you  
with them will not be interested in doing so.

There's nothing wrong with a hybrid approach, either.  You could  
transfer the file over HTTP rather than in the PB connection, but still  
use PB as your control protocol.  You could implement an even simpler  
file-transfer protocol reminiscent of HTTP/0.9 rather than use  
Twisted's full HTTP layer.
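
A minimal sketch of the request handling for such a transfer protocol, assuming an HTTP/0.9-style exchange where the client sends a single "GET <path>" line and the server answers with the raw bytes and closes. 'handle_request' and the 'files' dictionary are made-up names; the PB side would only hand the client the path to fetch.

```python
def handle_request(request_line, files):
    """Return the bytes to send for one request line, or None to just close.

    request_line: the raw line received, e.g. b"GET /results/42\r\n"
    files: hypothetical mapping of path (bytes) to payload (bytes)
    """
    parts = request_line.strip().split()
    if len(parts) != 2 or parts[0] != b"GET":
        return None  # malformed request: close without sending anything
    return files.get(parts[1])  # unknown path also yields None


files = {b"/results/42": b"<rows>" + b"x" * 1000000 + b"</rows>"}
body = handle_request(b"GET /results/42\r\n", files)
print(len(body))
```

There are no headers and no status line, so the large payload travels outside PB while the PB connection stays free for small control calls.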
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (Darwin)

iD8DBQE+vRgQvVGR4uSOE2wRAv60AJ46qvOBQAjiliEBKIAuGqP1vtibuwCff6DM
099lnO4JoOM0PphdHPK3+Ec=
=O9xE
-----END PGP SIGNATURE-----




