[Twisted-Python] Streaming File Transfer Protocol?

David Bolen db3l.net at gmail.com
Sat Feb 13 18:22:10 MST 2010


Darren Govoni <darren at ontrenet.com> writes:

> I spoke too fast. But pardon my noobiness.
>
> Ok, so I am using a simple protocol that is listening on a TCP port.
>
> On the client side, I write 4096 bytes using
> self.transport.write(bytes)
>
> on dataReceived side, I get only 1448. 

Quite possible, and even likely with a chunk of 4096, given typical
network latencies and the physical packet sizes at each network hop
along the way.

However, dataReceived will eventually be called additional times until
all of the 4096 bytes that were transmitted and received over the
socket connection have been handed off to your protocol.  That's just
the nature of a stream protocol - it's a constant stream of data being
fed by one end and drained on the other, without any natural
boundaries or structures within (other than, I suppose, the boundary
of an octet since you can't receive a partial octet).

The alternative is to use a datagram protocol like UDP, but then you
have all the negatives of no guaranteed delivery, out of order
delivery, completely impossible delivery (when trying a datagram
larger than the UDP limit), etc...

Far easier to just handle the TCP stream properly.

> Now, what I "want" to happen is when I issue a write of a known
> number of bytes. I "want" those bytes to arrive in total because
> they represent a pickled object.  The server has no idea if the
> bytes are split and scattered (again, I want the control protocol to
> take effect).

I suspect it may just be a difference in phrasing, but note that I
consider "arrive in total" to be different from "arrive in the same
number of I/O operations".  TCP guarantees the former (sans dropped
connections) but not the latter.  It's a trade-off that you make in
order to get the other benefits of guaranteed delivery with TCP,
regardless of network disruptions, latency, etc...

You're fine as long as you just accept up front that you can't make
any assumptions as to how the data will arrive at the receiving end.
So combine the data in whatever sizes it is received (and any number
of received chunks) until you have it all.  You can then de-pickle it
or do anything else with it.  As a comparison, that's really all PB is
doing, although it's banana-encoding the object on the wire rather
than pickling.
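
To make the "combine until you have it all" idea concrete, here is a
minimal sketch in plain Python rather than actual Twisted code.  The
class OneShotCollector and its method names are hypothetical; they
only mirror the roles of dataReceived and connectionLost.  It assumes
one pickled object per connection, with the sender closing the
connection to signal the end.

```python
import pickle


class OneShotCollector:
    """Hypothetical helper: buffers one pickled object per connection."""

    def __init__(self):
        self._chunks = []
        self.result = None

    def data_received(self, chunk):
        # Chunks may be any size; just keep them in arrival order.
        self._chunks.append(chunk)

    def connection_lost(self):
        # Assumption: the sender closing the connection marks the end of
        # the object. Only unpickle data from peers you trust.
        self.result = pickle.loads(b"".join(self._chunks))


# Simulate a payload arriving in 1448-byte pieces.
original = {"name": "example", "values": list(range(2000))}
payload = pickle.dumps(original)

collector = OneShotCollector()
for start in range(0, len(payload), 1448):
    collector.data_received(payload[start:start + 1448])
collector.connection_lost()
```

The receiver never cares how many pieces there were, only that the
joined bytes are complete before de-pickling.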

Depending on the client/server interaction, you may also have the
opposite problem - the final chunk of data received may cover more
than one client transmission, and you'll have to split it up
appropriately.

That's why if you will be transmitting multiple sets of data over a
single connection, you'll want some structure (unique boundary codes,
encoded length information, parseable data like XML, etc...) in the wire
protocol so your server knows when it is done.
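
As one possible illustration of "encoded length information", the
sketch below prefixes each message with a 4-byte big-endian length.
That header format, the frame() helper, and the LengthPrefixedFramer
class are all assumptions made up for this example, not anything
taken from Twisted.  The feed() method handles both cases described
above: a message spread over several chunks, and a chunk that covers
more than one message.

```python
import struct

# Assumed wire format: 4-byte big-endian unsigned length, then the payload.
HEADER = struct.Struct("!I")


def frame(message):
    """Prepend the length header to one message."""
    return HEADER.pack(len(message)) + message


class LengthPrefixedFramer:
    """Hypothetical receiver-side helper; feed() plays the role of dataReceived."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add received bytes and return every message that is now complete."""
        self._buffer += data
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # payload not fully received yet; wait for more data
            messages.append(self._buffer[HEADER.size:end])
            self._buffer = self._buffer[end:]
        return messages


first = b"first request"
second = b"second, longer request" * 100
stream = frame(first) + frame(second)

# Case 1: the stream arrives in 1448-byte pieces.
framer = LengthPrefixedFramer()
received = []
for start in range(0, len(stream), 1448):
    received.extend(framer.feed(stream[start:start + 1448]))

# Case 2: both messages arrive in a single chunk.
all_at_once = LengthPrefixedFramer().feed(stream)
```

Either way the caller sees the same two messages, regardless of how
the bytes were chunked in transit.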

> 1) Am I doing something wrong here?

Not so much wrong, as perhaps a little misguided in terms of trying to
have a stream protocol work less as a stream than it does.

I suspect you may also be over-estimating a little the complexity of
handling this aspect of TCP in your own code.

> 2) Can I force twisted to send ALL the bytes I issue in the write
> without re-thinking TCP or forcing me to re-implement TCP?

Again, distinguish between "send ALL the bytes" which *does* in fact
happen, versus "receive bytes in identically sized chunks" which will
not happen.  Though I seriously doubt that your demands are such that
it requires "re-thinking" or "re-implement[ing]" TCP.

Much easier to stick with the TCP base (loads of benefits), and just
encode enough structure into your stream to permit the server to
identify the boundaries of the requests.  Then, code the server to
look for such boundaries while accepting data in any size chunks, and
you're done.  It's pretty much what every other TCP protocol that has
structure to its data does, whether that's length counted, flag bytes,
specific textual content (such as the final empty line in an HTTP
request), etc...
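
For the "specific textual content" style of boundary, a rough sketch
might look like the following.  It treats an empty line (CRLF CRLF)
as the end of each request, in the spirit of an HTTP request without
a body.  The BlankLineFramer class is hypothetical, and the sample
requests are made up purely for illustration.

```python
DELIMITER = b"\r\n\r\n"  # an empty line terminates each request


class BlankLineFramer:
    """Hypothetical helper; feed() plays the role of dataReceived."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Add received bytes and return every request that is now complete."""
        self._buffer += data
        # Everything before the last delimiter is complete; the final
        # piece (possibly empty) stays buffered until more data arrives.
        *complete, self._buffer = self._buffer.split(DELIMITER)
        return complete


stream = (
    b"GET /a HTTP/1.1\r\nHost: example.invalid\r\n\r\n"
    b"GET /b HTTP/1.1\r\nHost: example.invalid\r\n\r\n"
)

# Deliver the stream in awkward 7-byte pieces, so that the delimiter
# itself is sometimes split across two chunks.
framer = BlankLineFramer()
requests = []
for start in range(0, len(stream), 7):
    requests.extend(framer.feed(stream[start:start + 7]))
```

Because the check always runs against the accumulated buffer, it does
not matter where the chunk boundaries happen to fall.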

As has been posted in another response, you may find some of the
existing protocols in twisted.protocols.basic to be helpful for this.
The older posting of mine that you referenced used a subclass of
LineReceiver to encode the length in ASCII as part of an initial
header, for example, though it closed the connection when done.  And,
for example, the NetstringReceiver or Int##StringReceiver classes take
care of the counting on your behalf, and even give subclasses a nice
single entry
point (stringReceived) to use instead of dataReceived, so your server
need not think about the aggregation or splitting of chunks.

If nothing else, reading the source to one of those receiver classes
might help provide a concrete example of the aggregation (or
splitting) of the stream data that I mention above.

-- David




