[Twisted-Python] Uploading multiple files using ftpclient in Twisted

Jaepyoung Kim jaepyoung.kim at gmail.com
Sat Jul 10 20:26:19 EDT 2010


Thank you very much for your excellent answer.

I think disk I/O will not be a bottleneck.
There are four servers which share disks.
I executed the script on separate servers and this reduced the upload
time a lot.
After I saw this performance improvement, I started changing the script.

I will try your suggestion.
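
A rough sketch of the connection-pool idea from the quoted reply, under
some assumptions: it uses plain threads and ftplib rather than Twisted's
FTPClient so that it stays self-contained, and the names upload_all,
make_connector and pool_size are made up for illustration. Each worker
owns one connection and picks up the next file as soon as its previous
transfer finishes.

```python
import os
import queue
import threading
from ftplib import FTP


def make_connector(host, user, password):
    """Return a callable that opens a new, logged-in FTP connection.

    host, user and password are placeholders supplied by the caller.
    """
    def connect():
        ftp = FTP(host)
        ftp.login(user, password)
        return ftp
    return connect


def upload_all(paths, connect, pool_size=4):
    """Upload every path using pool_size independent connections.

    connect is a zero-argument callable returning an ftplib.FTP-like
    object. Returns a dict mapping each path to None on success, or to
    the exception raised while uploading it.
    """
    pending = queue.Queue()
    for path in paths:
        pending.put(path)

    results = {}
    lock = threading.Lock()

    def worker():
        ftp = connect()  # one control connection per worker
        try:
            while True:
                try:
                    path = pending.get_nowait()
                except queue.Empty:
                    return  # no files left to pick up
                try:
                    with open(path, "rb") as fh:
                        # Only one transfer runs on this connection at a time.
                        ftp.storbinary("STOR " + os.path.basename(path), fh)
                    outcome = None
                except Exception as exc:
                    outcome = exc
                with lock:
                    results[path] = outcome
        finally:
            ftp.quit()

    threads = [threading.Thread(target=worker) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Something like upload_all(paths, make_connector(host, user, password))
would then spread the files over four connections. With Twisted the same
structure should apply, with a pool of FTPClient instances in place of
the threads and each store callback picking up the next file.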


On Sat, Jul 10, 2010 at 3:56 PM, David Bolen <db3l.net at gmail.com> wrote:
> Jaepyoung Kim <jaepyoung.kim at gmail.com> writes:
>> The current script uploads using ftplib and takes about 1 hour.
>> I want to change this script to use Twisted's asynchronous functions.
>> I thought that if I used the asynchronous functions in Twisted like the
>> following, the file uploads would be executed in parallel.
>> But they were executed sequentially. The second file's upload starts
>> only after the first file's upload completes.
>> Could you check what is wrong in my source code? Or am I
>> misunderstanding how asynchronous functions work?
> I'm pretty sure you'll need separate connections to an FTP server to
> achieve parallel transfers, regardless of how you write the client.
> At least as long as you stick with regular get/put commands.  So while
> using a twisted approach can enable you to manage those parallel
> streams pretty easily, you'll need distinct connections for each
> transfer and manage which file transfer is using which connection in
> your code.
> Essentially a store or fetch FTP operation initiates a transfer over
> the dedicated data channel, so that channel is in use until the
> transfer completes or is aborted.  The data on the data channel is not
> encapsulated nor multiplexed in any way so you can only have a single
> transfer using the data channel at once.  Passive transfers do create
> new data channels, but the FTP protocol specifically says a server
> needs to stop accepting connections and shut down any open connections
> on old passive ports once a new passive request is received, so you're
> still limited to one at a time.
> Thus, your callbacks for each store operation will only fire when the
> store has completed, and you'll only be able to initiate the next
> store request at that point since it's only then that the channel to
> the server is free to transfer another file.
> I believe some servers have implemented custom extensions to implement
> parallel operations at a finer grained level than a file, but I don't
> think they're commonly implemented in ftp libraries (nor in servers
> commonly in use).
> What I'd suggest, in terms of your code, is to instantiate a pool of
> FTPClients to the same server, initiate transfers on them in parallel
> and then as one completes, use it to pick up the next file.  You'll
> need to handle the distribution of files amongst the pool of clients
> yourself.
> Is there any particular reason you expect this to yield an improvement
> in overall time?  Unless you're transferring very large numbers of
> files that are very small compared to the bandwidth*latency of your
> network connection to the server (which doesn't sound like the case
> here), the overhead of the protocol itself will be quite small, and
> your bottleneck is either going to be the network throughput, or the
> slower of the disk I/O on either end.
> Neither of those bottlenecks will likely be improved by doing multiple
> transfers in parallel, and in fact your total time can worsen if the
> prior bottleneck was the disk I/O since you'll have competing
> operations for the disks as opposed to simple sequential access.  Or
> you may find that you get a very marginal benefit at the expense of
> code that is much more complicated to maintain.
> You might grab an existing ftp client that supports parallel transfers
> and use it to run some tests before trying to re-implement things
> yourself.  There should be several options, but for example, I believe
> FileZilla supports it under Windows, or lftp under Linux.
> -- David

Jaepyoung Kim
(Cellular phone) 1-310-848-7774
