[Twisted-Python] Large Transfers

Sat May 10 12:03:18 EDT 2003

On Saturday 10 May 2003 08:17 am, Glyph Lefkowitz wrote:
> On Saturday, May 10, 2003, at 08:59 AM, Uwe C. Schroeder wrote:
> > Is this really a good thing to do? Shouldn't pb see that the arguments
> > are larger than 640k and start paging?
>
> No, because this would violate lots of order-of-execution guarantees
> that PB normally provides.
>
> Let's say that you did
>
> 	foo.callRemote("call1", "x" * 1024 * 1024)
> 	foo.callRemote("call2", "x")
>
> You would expect 'call1' to execute before 'call2', right?  But no:
> because call1 began paging its arguments, call2 would be sent
> interleaved (the desired result for paged calls) and would execute
> first.

Hmmm - you'd call foo.callRemote("call2", "x") in the callback of the first
call, since both calls return a Deferred. You can't rely on the first call
being completed before issuing the next one. Well, you can, but that is not
quite safe. If the two calls depend on each other, you have to issue the
second one in the callback of the first. If the calls are not related, the
execution order isn't that important (well, at least I can't think of a case
right now where it would be).
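
The ordering concern quoted above can be illustrated with a toy model in
plain Python. This is not PB's wire protocol: the page size, the round-robin
interleaving, and the names paginate, interleave and receive are all
assumptions made up for the illustration. It only shows why a paged call1
could finish after a small call2 that was issued later.

```python
from itertools import zip_longest

PAGE = 64 * 1024  # hypothetical page size, chosen only for this illustration


def paginate(call_id, payload, page=PAGE):
    """Split one call's argument into (call_id, chunk, is_last) messages."""
    chunks = [payload[i:i + page] for i in range(0, len(payload), page)] or [""]
    return [(call_id, c, i == len(chunks) - 1) for i, c in enumerate(chunks)]


def interleave(*streams):
    """Round-robin the messages of several calls onto one connection."""
    wire = []
    for group in zip_longest(*streams):
        wire.extend(m for m in group if m is not None)
    return wire


def receive(wire):
    """Run each call once its last page has arrived; return execution order."""
    executed = []
    for call_id, _chunk, is_last in wire:
        if is_last:
            executed.append(call_id)
    return executed


call1 = paginate("call1", "x" * 1024 * 1024)  # large argument, many pages
call2 = paginate("call2", "x")                # small argument, one page
order = receive(interleave(call1, call2))
print(order)  # ['call2', 'call1']
```

If call2 depends on call1, issuing it from call1's callback avoids the
problem regardless of how the arguments travel.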

>
> The same thing would be true of sending return values.
>
> > What I'm doing is to hand down XML data which is database-generated on
> > the server side. Whenever a user requests too large a resultset, the
> > network layer fails. On the other hand, the resultset is already in
> > memory, so why not just transfer it?
> > I realize that this is probably bad design

> If your goal is to facilitate bad design with huge gobs of XML, PB is
> probably not for you.  There are a number of other protocols which are
> designed for exactly this kind of application - HTTP, XML/RPC, SOAP,
> depending on your level of complexity.  Twisted provides native support
> for the first 2, and SOAP could probably be added without too much
> trouble.

Yep, XML/RPC might be the right thing. SOAP has nice features but I don't like 
the bulky approach.

> > , but it's the easiest way to transfer this information. Sure, I can
> > write the stuff to a temporary file and page it over; however, this
> > defeats the purpose, since then the original call results in a message
> > to go get the file. This means I need at least 4 callbacks for any
> > given call (the original ok callback, then another one for the possible
> > paging, as well as 2 error callbacks, one for each call)
>
> The error-handling needs to be improved, but this is what
> 'twisted.spread.util.getAllPages' is for.  You only need one callback.

Didn't see that one.

> > I can postpone the problem by compressing the parameters with zlib
> > (which I'm doing anyway), but at some point I will hit the limit.
>
> Hard limits like this one should never be pushed so closely.  If your
> data is more than, say, 60k, you should probably be looking at paging
> it.  More than 500k and you are definitely abusing the protocol.

no argument about that
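
A minimal sketch of the compress-then-page idea, using only the standard
library. The 60k page size follows the rule of thumb quoted above; the helper
names to_pages and from_pages are hypothetical and nothing here is part of
twisted.spread. How the pages travel over the connection (for example via the
pager helpers in twisted.spread.util) is left out.

```python
import zlib

PAGE_SIZE = 60 * 1024  # assumption: page at the ~60k mark suggested above


def to_pages(data, page_size=PAGE_SIZE):
    """Compress data and cut the result into pages of at most page_size bytes."""
    compressed = zlib.compress(data)
    return [compressed[i:i + page_size]
            for i in range(0, len(compressed), page_size)]


def from_pages(pages):
    """Reassemble the pages in order and decompress them."""
    return zlib.decompress(b"".join(pages))


# Made-up stand-in for a database-generated XML resultset.
resultset = b"".join(b"<row id='%d'>value</row>" % i for i in range(50000))
pages = to_pages(resultset, page_size=8192)  # small pages just for the demo
assert from_pages(pages) == resultset
```

Compression only moves the point at which paging becomes necessary; each page
still has to stay well below the hard limit once call overhead is added.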

> > The other problem this creates is a timing issue. Since I have to make
> > several calls in order to transfer the resultset, I have to delay
> > database calls until the whole resultset is valid.
> > To put it more technically:
> >
> > self.perspective.transfer_large_result(small_int,label,large_result_array)
> >
> > will fail if large_result_array exceeds 640k. However, small_int and
> > label can be transferred. The only way to do this is
> >
> > if numberofbytes(large_result_set) >= 640k:

> Wrong.  If numberofbytes(large_result_set) + banana_epsilon(()) +
> pb_call_overhead ...

Yeah - just wanted to keep it simple :-)

> This is not a number that you can calculate reliably.  640k is a hard
> high limit.
>
> > self.perspective.transfer_first_part(small_int,label).callback(self.smallpart_ok)
>
> I don't understand.  Do you want to get the whole result at once?  Or
> do you want to send it only when necessary?  If you want to send it
> only when necessary then aren't these two steps required anyway?  If
> not, then can't you use the methods in twisted.spread.util to retrieve
> the pager when you would normally be retrieving a string, in the same
> step?

I do not WANT to get the whole thing; sadly, I NEED the whole thing.

> It would be helpful for my understanding if you would use real method
> names like "addCallback" and "callRemote" here.  I don't have any idea
> what 'self.perspective' is, or whether 'transfer_first_part' is
> supposed to be remote or local.

Well, I thought that would be clear. perspective is a pb.Perspective, and
callback is addCallback. The whole thing runs on the initiating side, which
could be server or client, since those chunks get sent in both directions and
I have the initiating side actively SEND the data rather than have the
receiver fetch it.

>
> > and in smallpart_ok
> >
> > self.page_the_rest(large_result_array).callback(self.whole_stuff_transfered)
>
> Again, not really sure what you're saying.  Why not -
>
> 	from twisted.spread import util
> 	util.getAllPages(serverThingy, "getStuff", small_int,
> 	    label).addCallback(lambda l: ''.join(l)).addCallback(gotABigString)
>
> > This is an enormous overhead.
>
> as far as I understand it, the only overhead you require is that line
> above.  But I admit I do not understand it terribly well.

Me neither :-) I'll do some tests to see if I can put that in easily. I'd
prefer a solution that uses standard Twisted without modifying it.

> > What strikes me even more is that this size limit doesn't even prevent
> > large memory consumption, since the object is already there and in
> > cBanana the object is already stored in the buffer.
>
> Where is 'there'?
>
> On the side of the connection that wishes to send the data, it's in
> memory.  If you modify the sending side locally, the whole string may
> even be in your outgoing buffer there, but on the receiving side only
> the beginning will be, since upon receiving the length it will
> terminate the connection.  (At least, I don't think TCP normally sends
> multi-megabyte packets.)

Nope, TCP doesn't. Why will it terminate the connection?

> > I think I'll just remove all size limits and go the (unwanted) route
> > of creating my own package.
>
> Please, don't do this.  If PB is not working with your application, use
> something else.  This kind of a brute-force solution will undoubtedly
> cause problems down the road, and the people most suited to help you
> with them will not be interested in doing so.
>
> There's nothing wrong with a hybrid approach, either.  You could
> transfer the file over HTTP rather than in the PB connection, but still
> use PB as your control protocol.  You could implement an even simpler
> file-transfer protocol reminiscent of HTTP/0.9 rather than use
> Twisted's full HTTP layer.

For reasons beyond my control I can't use more than one port. If I could
talk "them" into using several ports I'd love to do that. As it is, I have
to find a way to handle everything with one port. XML/RPC was my original
protocol choice; however, I think pb is much nicer :-)

	UC
