[Twisted-Python] Send many large files with PB

Brian Warner warner at lothar.com
Wed May 17 02:41:12 EDT 2006


Justin Mazzola Paluska <jmp at MIT.EDU> writes:

> - Should I send the files from SRC to DEST one-by-one?

That's how I would do it. If you're talking about gigabyte-sized files, the
protocol overhead will be pretty minimal compared to the data being
transferred. You've got a couple of objects to keep track of for each file
being sent, but on the other hand it will be a lot easier to keep track of
how much progress you've made (and keep the user informed) that way.
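To make the progress-tracking point concrete, here is a minimal sketch of a per-file chunk reader that reports how far along it is. The name iter_file_chunks and the 64k default are made up for illustration; nothing here is part of PB itself.

```python
import os


def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield (chunk, bytes_so_far, total_bytes) for a single file.

    Hypothetical helper: each chunk would be handed to whatever sends it
    to the far end, and bytes_so_far / total_bytes drives a progress display.
    """
    total = os.path.getsize(path)
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sent += len(chunk)
            yield chunk, sent, total
```

Because each file gets its own iterator, the sender always knows which file it is on and how many bytes of it have gone out, which is the bookkeeping the one-by-one approach buys you.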

> - Or, is it better to use something like tarfile module to create a
>   stream of bytes that I stream to the other side and decode?

I would recommend this approach if you had a bunch of small files. You'd run
a 'tar cf - WHAT' child under a ProcessProtocol that reacts to
outReceived(data) by doing rref.callRemote("moreDataForYou", data). You'd
probably want to accumulate data into chunks of maybe 4k or so to increase
efficiency. At the far end, your remote_moreDataForYou() method would write
that data into the stdin of an untarring ProcessProtocol. Take a look at
doc/core/howto/process.xhtml for details on ProcessProtocols and
reactor.spawnProcess.
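The chunk-accumulating step could look roughly like the sketch below. It is plain Python with no Twisted imports so it runs on its own; ChunkAccumulator and its method names are hypothetical, and the 4096-byte default just mirrors the "4k or so" guess above.

```python
class ChunkAccumulator:
    """Buffer incoming bytes and pass them on in fixed-size chunks.

    Hypothetical helper: 'send' is any callable taking one bytes object,
    for example a function that wraps rref.callRemote("moreDataForYou", chunk).
    """

    def __init__(self, send, chunk_size=4096):
        self.send = send
        self.chunk_size = chunk_size
        self._buf = bytearray()

    def add(self, data):
        # Call this from the protocol method that receives the child's
        # stdout (outReceived on a Twisted ProcessProtocol).
        self._buf.extend(data)
        while len(self._buf) >= self.chunk_size:
            chunk = bytes(self._buf[:self.chunk_size])
            del self._buf[:self.chunk_size]
            self.send(chunk)

    def flush(self):
        # Call this when the child exits, so the final short chunk is sent.
        if self._buf:
            chunk = bytes(self._buf)
            self._buf.clear()
            self.send(chunk)
```

The flush() call matters: tar's output will rarely be an exact multiple of the chunk size, so whatever is left in the buffer when the process ends still has to go out. This sketch also ignores the Deferred that callRemote returns; a real sender would want to watch those so it doesn't queue data faster than the connection can drain it.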

> - Finally, should I be doing something completely different?
>   Normally, outside of my application, I'd just use rsync, scp, or
>   some such.

I'd certainly investigate this method if most of the files you are
sending are already in place on the far end. The bandwidth savings are worth
the extra setup hassle.

Is there a way to get rsync to speak to stdout/stdin instead of using a TCP
socket? If so, you could spawnProcess('rsync') and proxy it to the far end
over PB as with 'tar' above. Or, you could have your PB-connection-wielding
process listen on a local TCP socket, then tell rsync to talk directly to
that port, then do a socket-level proxy over PB to the far system.

Also remember that scp (or rsync-over-ssh or tar|ssh, etc) gives you stronger
authentication than PB, plus encryption: PB sends everything in cleartext.
Many applications don't require confidentiality, but before you switch from
ssh to straight PB you should be aware of exactly what you're giving up.

<shameless plug>
But if you use NewPB, you get the strong authentication and confidentiality
of ssh with all of the juicy RemoteReference model you've come to know and
love from PB. Check out NewPB[1] today.
</shameless plug>


cheers,
 -Brian

[1]: http://twistedmatrix.com/trac/wiki/NewPB



