[Twisted-Python] Large Transfers
glyph at twistedmatrix.com
Sat May 10 11:17:31 EDT 2003
On Saturday, May 10, 2003, at 08:59 AM, Uwe C. Schroeder wrote:
> Is this really a good thing to do ? Shouldn't pb see that the
> arguments are
> larger than 640k and start paging ?
No, because this would violate lots of order-of-execution guarantees
that PB normally provides.
Let's say that you did
    foo.callRemote("call1", "x" * 1024 * 1024)
and then immediately made a second remote call, 'call2'.
You would expect 'call1' to execute before 'call2', right? But no:
because call1 began paging its arguments, call2 would be sent
interleaved (the desired result for paged calls) and would execute
first.
The same thing would be true of sending return values.
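The reordering described above can be shown with a small simulation. This is only a sketch in plain Python, not PB code: the names frames_for and execution_order, the 64k page size, and the round-robin sending policy are all assumptions made for illustration.

```python
from collections import deque

PAGE_SIZE = 64 * 1024  # hypothetical page size, chosen for illustration


def frames_for(call_name, payload, paged):
    """Split one call into wire frames: (call_name, chunk, is_last)."""
    if not paged or len(payload) <= PAGE_SIZE:
        return [(call_name, payload, True)]
    chunks = [payload[i:i + PAGE_SIZE]
              for i in range(0, len(payload), PAGE_SIZE)]
    return [(call_name, c, i == len(chunks) - 1)
            for i, c in enumerate(chunks)]


def execution_order(calls, paged):
    """Round-robin the frames of all calls onto one connection and return
    the order in which the receiver could execute the calls."""
    queues = deque(deque(frames_for(name, data, paged))
                   for name, data in calls)
    order = []
    while queues:
        q = queues.popleft()
        name, _chunk, is_last = q.popleft()
        if is_last:
            order.append(name)  # all arguments arrived; the call can run
        else:
            queues.append(q)    # more pages pending; other calls go ahead
    return order


calls = [("call1", "x" * 1024 * 1024), ("call2", "y")]
print(execution_order(calls, paged=False))  # ['call1', 'call2']
print(execution_order(calls, paged=True))   # ['call2', 'call1']
```

With implicit paging, the small call finishes arriving while the large one is still in transit, so the ordering guarantee is lost.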
> What I'm doing is to hand down XML data which is database-generated on
> the server side. Whenever a user requests too large a resultset, the
> network layer fails. On the other hand, the resultset is already in
> memory, so why not just transfer it?
> I realize that this is probably bad design
If your goal is to facilitate bad design with huge gobs of XML, PB is
probably not for you. There are a number of other protocols which are
designed for exactly this kind of application - HTTP, XML-RPC, SOAP,
depending on your level of complexity. Twisted provides native support
for the first two, and SOAP could probably be added without too much
effort.
> , but it's the easiest way to
> transfer this information. Sure, I can write the stuff to a temporary
> file and page it over; however, this defeats the purpose, since then the
> original call results in a message to go get the file. This means I need
> at least 4 callbacks for any given call (the original ok callback, then
> another one for the possible paging, as well as 2 error callbacks, one
> for each call)
The error-handling needs to be improved, but this is what
'twisted.spread.util.getAllPages' is for. You only need one callback.
> I can extend the problem by compressing the parameters with zlib (which
> I'm doing anyways), but at some point I will hit the limit.
Hard limits like this one should never be approached so closely. If your
data is more than, say, 60k, you should probably be looking at paging
it. More than 500k and you are definitely abusing the protocol.
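As a rough illustration of that rule of thumb, here is a minimal sketch in plain Python that decides whether a payload should go inline or be split into pages. The names prepare and to_pages and the 16k page size are hypothetical, and this is not how PB's own pagers are implemented.

```python
PAGE_THRESHOLD = 60 * 1024  # "say, 60k" from the advice above
PAGE_SIZE = 16 * 1024       # hypothetical page size for this sketch


def to_pages(data, page_size=PAGE_SIZE):
    """Split data into pages of at most page_size bytes."""
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def prepare(data):
    """Return ('inline', data) for small payloads, ('paged', pages) otherwise."""
    if len(data) <= PAGE_THRESHOLD:
        return ("inline", data)
    return ("paged", to_pages(data))


payload = b"x" * (200 * 1024)
kind, pages = prepare(payload)
print(kind, len(pages))

# The receiving side reassembles the pages with a single join.
assert b"".join(pages) == payload
```

The point of choosing the threshold well below the hard limit is that the sender never has to estimate how close the encoded size is to 640k.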
> The other problem this creates is a timing issue. Since I have to make
> calls in order to transfer the resultset, I have to delay database
> until the whole resultset is valid.
> To put it more technically:
> will fail if large_result_array exceeds 640k. However small_int and
> label can
> be transferred. The only way to do this is
> if numberofbytes(large_result_set) >= 640k:
Wrong. If numberofbytes(large_result_set) + banana_epsilon(()) +
This is not a number that you can calculate reliably. 640k is a hard
I don't understand. Do you want to get the whole result at once? Or
do you want to send it only when necessary? If you want to send it
only when necessary then aren't these two steps required anyway? If
not, then can't you use the methods in twisted.spread.util to retrieve
the pager when you would normally be retrieving a string, in the same
It would be helpful for my understanding if you would use real method
names like "addCallback" and "callRemote" here. I don't have any idea
what 'self.perspective' is, or whether 'transfer_first_part' is
supposed to be remote or local.
> and in smallpart_ok
Again, not really sure what you're saying. Why not -
    from twisted.spread import util
    util.getAllPages(serverThingy, "getStuff", small_int, label
        ).addCallback(lambda l: ''.join(l)
        ).addCallback(gotABigString)
> This is an enormous overhead.
As far as I understand it, the only overhead you require is that line
above. But I admit I do not understand it terribly well.
> What strikes me even more is that this size limit does not even prevent
> memory consumption - since the object is already there and in cBanana the
> object is already stored in the buffer.
Where is 'there'?
On the side of the connection that wishes to send the data, it's in
memory. If you modify the sending side locally, the whole string may
even be in your outgoing buffer there, but on the receiving side only
the beginning will be, since upon receiving the length it will
terminate the connection. (At least, I don't think TCP normally sends
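To make the receiving-side behaviour concrete, here is a toy sketch of a receiver that rejects a message as soon as its declared length is known, before the body is buffered. The 4-byte length prefix and the Receiver and TooLarge names are assumptions for illustration; this is not Banana's actual wire format or code.

```python
import struct

SIZE_LIMIT = 640 * 1024  # the 640k limit discussed in this thread


class TooLarge(Exception):
    """Raised when a peer announces a message above SIZE_LIMIT."""


class Receiver:
    """Toy receiver for frames with a 4-byte big-endian length prefix.
    (Assumed framing for this sketch, not Banana's real encoding.)"""

    def __init__(self):
        self.buffer = b""
        self.expected = None
        self.messages = []

    def data_received(self, data):
        self.buffer += data
        while True:
            if self.expected is None:
                if len(self.buffer) < 4:
                    return
                (self.expected,) = struct.unpack("!I", self.buffer[:4])
                self.buffer = self.buffer[4:]
                if self.expected > SIZE_LIMIT:
                    # Reject once the length is known; a real protocol
                    # would drop the connection at this point.
                    raise TooLarge(self.expected)
            if len(self.buffer) < self.expected:
                return
            self.messages.append(self.buffer[:self.expected])
            self.buffer = self.buffer[self.expected:]
            self.expected = None


r = Receiver()
r.data_received(struct.pack("!I", 5) + b"hello")
print(r.messages)

try:
    # Only the first 100 bytes of a declared 1 MB message have arrived.
    r.data_received(struct.pack("!I", 1024 * 1024) + b"x" * 100)
except TooLarge as e:
    print("rejected message of", e.args[0], "bytes")
```

In this sketch the receiver never holds more than the fragment that arrived alongside the length, which is the sense in which the limit protects the receiving side.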
> I think I'll just remove all size limits and go the (unwanted) route of
> creating my own package.
Please, don't do this. If PB is not working with your application, use
something else. This kind of a brute-force solution will undoubtedly
cause problems down the road, and the people most suited to help you
with them will not be interested in doing so.
There's nothing wrong with a hybrid approach, either. You could
transfer the file over HTTP rather than in the PB connection, but still
use PB as your control protocol. You could implement an even simpler
file-transfer protocol reminiscent of HTTP/0.9 rather than use
Twisted's full HTTP layer.
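A minimal sketch of such a bulk-transfer side channel, using only the standard library: the client sends one "GET <name>" line, and the server answers with the raw bytes and closes the connection. The names serve_one, fetch, and FILES are hypothetical, a socketpair stands in for a real TCP connection, and the PB control call that would tell the client which name to fetch is left out.

```python
import socket
import threading

# Hypothetical in-memory result, standing in for the generated XML.
FILES = {"result.xml": b"<rows>" + b"<row/>" * 100000 + b"</rows>"}


def serve_one(conn, files):
    """Read one 'GET <name>' line, send the raw bytes, then close."""
    with conn:
        request = b""
        while not request.endswith(b"\r\n"):
            chunk = conn.recv(1)
            if not chunk:
                return
            request += chunk
        parts = request.split()
        if len(parts) == 2 and parts[0] == b"GET":
            # Unknown names get an empty body in this sketch.
            conn.sendall(files.get(parts[1].decode("ascii"), b""))


def fetch(conn, name):
    """Request a name and read until the server closes the connection."""
    with conn:
        conn.sendall(b"GET " + name.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)


server_end, client_end = socket.socketpair()
t = threading.Thread(target=serve_one, args=(server_end, FILES))
t.start()
body = fetch(client_end, "result.xml")
t.join()
print(len(body), "bytes received")
```

Because end-of-data is signalled by closing the connection, there is no length to declare up front and no size limit to negotiate on this channel.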