[Twisted-Python] Send many large files with PB

Justin Mazzola Paluska jmp at MIT.EDU
Wed May 17 09:17:37 EDT 2006


On Tue, May 16, 2006 at 11:41:12PM -0700, Brian Warner wrote:
> Justin Mazzola Paluska <jmp at MIT.EDU> writes:
> 
> > - Should I send the files from SRC to DEST one-by-one?
> 
> That's how I would do it. If you're talking about gigabyte-sized files, the
> protocol overhead will be pretty minimal compared to the data being
> transferred. You've got a couple of objects to keep track of for each file
> being sent, but on the other hand it will be a lot easier to keep track of
> how much progress you've made (and keep the user informed) that way.

OK.  I could also possibly stream multiple files at once with this
method, which is an added bonus.
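
A minimal sketch of the one-file-at-a-time approach with progress
reporting, using only the standard library. The names send_chunk,
on_progress and CHUNK_SIZE are hypothetical: send_chunk stands in for
whatever actually pushes bytes to DEST (for example, a wrapper around a
PB remote call). A real PB call returns a Deferred that you would wait
on before sending the next chunk, which this synchronous sketch ignores.

```python
import os

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for your link


def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield successive byte chunks of the file at `path`."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def send_file(path, send_chunk, on_progress=None, chunk_size=CHUNK_SIZE):
    """Send one file chunk by chunk and return the number of bytes sent.

    `send_chunk` is a hypothetical callable that delivers bytes to DEST.
    `on_progress`, if given, is called as on_progress(path, sent, total).
    """
    total = os.path.getsize(path)
    sent = 0
    for chunk in iter_chunks(path, chunk_size):
        send_chunk(chunk)
        sent += len(chunk)
        if on_progress is not None:
            on_progress(path, sent, total)
    return sent


def send_files(paths, send_chunk, on_progress=None, chunk_size=CHUNK_SIZE):
    """Send files one at a time, in order; return the total bytes sent."""
    return sum(send_file(p, send_chunk, on_progress, chunk_size) for p in paths)
```

Because each file gets its own send_file call, streaming several files
at once would amount to running several of these loops concurrently,
each with its own progress callback.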

> > - Finally, should I be doing something completely different?
> >   Normally, outside of my application, I'd just use rsync, scp, or
> >   some such.
> 
> I'd certainly investigate this method if most of the files you are
> sending are already in place on the far end. The bandwidth savings are worth
> the extra setup hassle.

For this particular job, none of the files are initially in place on
the remote end, so rsync itself won't be a big win.

> Is there a way to get rsync to speak to stdout/stdin instead of using a TCP
> socket? If so, you could spawnProcess('rsync') and proxy it to the far end
> over PB as with 'tar' above. Or, you could have your PB-connection-wielding
> process listen on a local TCP socket, then tell rsync to talk directly to
> that port, then do a socket-level proxy over PB to the far system.

For future reference, I think there are ways of hacking this (these
statements are conjectures; I haven't actually tried them):

- on the side pushing data, use --rsh=SCRIPT, where SCRIPT just takes
  the output of rsync and pushes it to stdout.

- on the side receiving the data, use --server to read from stdin.
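
A rough sketch of the pieces such a hack might be built from; like the
conjectures above, it has not been tried against a real rsync. The
helper names pump and sender_argv, the "placeholder" host name, and the
relay script path are all hypothetical. The only rsync options used are
-a and --rsh.

```python
def pump(src, dst, bufsize=64 * 1024):
    """Copy bytes from binary stream `src` to `dst` until EOF.

    Returns the number of bytes copied. A relay script would use a loop
    like this to move rsync's protocol stream between its own
    stdin/stdout and whatever channel leads to the far end.
    """
    copied = 0
    while True:
        data = src.read(bufsize)
        if not data:
            break
        dst.write(data)
        copied += len(data)
    dst.flush()
    return copied


def sender_argv(relay_script, src, dest):
    """Build an rsync command line that runs `relay_script` in place of
    the remote shell. "placeholder" is a dummy host name (an assumption)
    used only so that rsync treats `dest` as remote.
    """
    return ["rsync", "-a", "--rsh=" + relay_script, src, "placeholder:" + dest]
```

The argument list from sender_argv could be handed to spawnProcess or
subprocess; the relay script itself would call pump once per direction.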

> Also remember that scp (or rsync-over-ssh or tar|ssh, etc) will be doing
> better authentication than PB, since PB is all in cleartext. Many
> applications don't require confidentiality, but before you switch from ssh to
> straight PB you should be aware of what exactly you're giving up.

Our PB connections go over SSL and we have a custom auth module, so
piping everything over PB wouldn't be a big loss.

> <shameless plug>
> But, if you use NewPB, you get the strong authentication and confidentiality
> of ssh with all of the juicy RemoteReference model you've come to know and
> love from PB. Check out NewPB[1] today.
> </shameless plug>

I've been reading about NewPB and it might be exactly what we'll need
for the next revision of our application.  We're just too close to
pushing out this version to switch to a new RPC method for the core of
the program.

Thanks,
	--Justin



