[Twisted-Python] Memory usage in large file transfers

Jp Calderone exarkun at intarweb.us
Tue Dec 2 10:21:42 EST 2003


On Tue, Dec 02, 2003 at 03:46:07PM +0200, Nikolaos Krontiris wrote:
> Hi again.
> 
> ----- Original Message ----- 
> From: "Andrew Bennetts" <andrew-twisted at puzzling.org>
> To: <twisted-python at twistedmatrix.com>
> Sent: Monday, December 01, 2003 2:11 PM
> Subject: Re: [Twisted-Python] Memory usage in large file transfers
> 
> 
> > On Mon, Dec 01, 2003 at 11:25:13AM +0200, Nikolaos Krontiris wrote:
> > >    Hi there.

> > >    I am writing a file transfer program using Twisted as my framework.
> > >    I have been having some problems as far as memory usage is
> > >    concerned (i.e. both client and server just eat through available
> > >    memory without ever releasing it back to the kernel while
> > >    transferring data). I am aware that in theory, the client and server
> > >    will consume at least as much memory as the file to be transferred,
> > >    but this memory should also be made available to the O/S after the
> > >    operation has completed. I also use a garbage collector, which
> > >    makes things just marginally better, and the only Twisted operations
> > >    I use are a few transport.write and callLater calls.
> >
> > You don't say how large "large" is, but you probably should be using
> > producer/consumer APIs rather than just plain transport.write(data).  See
> > twisted.protocols.basic.FileSender for an example.  If I'm understanding
> > your problem correctly, you should see a significant improvement.  This
> > technique doesn't require holding the entire file in memory to transfer
> > it.
> >
> > I'm not sure what you mean about using a garbage collector -- Python
> > automatically cleans up objects with zero reference counts, and
> > periodically finds and collects unreachable object cycles.
> >
> nk: The size of the files I am referring to can be anything from 20MB up to
> 500MB, but right now I'm taking it easy with the client/server model; I'm
> sending a single 43MB file, and as I debug and improve performance, I will
> increase the file size...
> nk: I had originally thought about using basic.FileSender, but a) it has
> been marked as unstable by the Twisted development team, and b) I need to
> send a client ID each time I send a single buffer (security... what can you
> say...).

  "Unstable" comments in the Twisted sources only mean that we are not
promising the API won't change.  The code in question is bug-free, as far as
I know.
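
  For what it's worth, here is roughly how FileSender gets used (the host,
port, path, and class names below are invented for the example):

    from twisted.internet import reactor, protocol
    from twisted.protocols.basic import FileSender

    class FileStreamer(protocol.Protocol):
        def connectionMade(self):
            # FileSender registers itself as a producer on the transport and
            # pulls chunks from the file only as fast as the network can
            # accept them, so memory usage stays flat.
            self.openFile = open(self.factory.path, 'rb')
            d = FileSender().beginFileTransfer(self.openFile, self.transport)
            d.addBoth(self.transferDone)

        def transferDone(self, result):
            self.openFile.close()
            self.transport.loseConnection()

    class FileStreamerFactory(protocol.ClientFactory):
        protocol = FileStreamer

        def __init__(self, path):
            self.path = path

    # Invented host/port/path, just to make the sketch runnable:
    reactor.connectTCP('localhost', 8123, FileStreamerFactory('bigfile.bin'))
    reactor.run()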

  Client ID?  I assume you're using TCP; what do you need to add an ID to
every packet for?

> To make sure that I'm not holding the entire file's contents in
> memory, I read (at most) 64K of the file each time and send this data
> away. After it has been sent, this data buffer is flushed. I guess I can try
> changing this to file.open, file.seek, file.read and file.close on each
> read, so that the only parts of the file in system memory are the ones
> actually needed...

  That is not necessary.  If you call transport.write() faster than the
network can accept the data, the reactor buffers all of it in memory; that,
rather than the open file object, is the likely cause of this kind of growth,
and it is exactly what the producer/consumer API (e.g. FileSender) avoids.

> nk: When talking about the garbage collector, I'm just referring to Python's
> gc.enable() and gc.collect() calls, nothing more... Unfortunately I don't
> believe that the built-in periodic collection of unreachable object cycles
> is very useful in the case of the client, since it shuts down after the
> file's EOF...

  Unless you are preventing the garbage collector from working, I don't see
why this should be the case.  To see if your program is building up an
unreasonable number of unfreeable objects, look at the gc.garbage list.
Any object the collector has determined to be unreachable but which it
cannot actually free will end up there.
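
  Something as simple as the following, run just before the client exits,
will show whether that is happening (illustrative only):

    import gc

    gc.collect()
    if gc.garbage:
        # Objects that are unreachable but could not be collected.
        print 'uncollectable objects:', len(gc.garbage)
        for obj in gc.garbage[:10]:
            print type(obj), repr(obj)[:80]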

> [snip]
> >
> > >    As professional network programmers, do you believe my diagnosis is
> > >    correct? Have you encountered such problems in the past? Are there
> > >    workarounds for this?
> >
> > I really can't say.  You've given no specific data at all... How large are
> > the files?  How much memory does your server appear to lose per request?
> > How much memory does the server take overall (both initially and after
> > running for a while)?  How many concurrent requests are you dealing with?
> > What platform, version of Python, and version of Twisted?  Anything else
> > you think is relevant?  :)
> >
> nk: Right now, I'm testing the server/client model with a 43MB file. The
> memory consumed on a WinMe system using Python 2.3.2 and Twisted 1.1.0 with
> a 64K buffer is 58MB, while with a 4KB buffer it is around 80MB. On Linux
> using Python 2.3.2 and Twisted 1.1.0, the memory consumed with a 4K buffer
> is always a bit more than 100MB. I can't use very large buffers on my Linux
> system because of the ID I have to send with each buffer. It seems that the
> Linux default SOL_SOCKET/SO_RCVBUF sizes are relatively small, which
> confuses my client-ID handling, since the reads on the receiving side come
> back in different sizes... Note that these results are for 1 server and 1
> client. I have not yet dared to run 2 concurrent clients at once!

  You absolutely must not rely on the number of bytes received from a
particular read from a socket.  I cannot stress this enough.  You
*absolutely* *can* *not* rely on it.
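
  TCP is a stream protocol: what you hand to one write() can arrive on the
other side split across several dataReceived() calls, or glued onto the next
write, so tying your client ID to "one read per buffer" cannot work.  Put
explicit framing in the protocol instead.  A rough sketch (the class and
helper names are made up):

    import struct
    from twisted.internet import protocol

    class FramedReceiver(protocol.Protocol):
        def connectionMade(self):
            self.buf = ''

        def dataReceived(self, data):
            # Reassemble messages no matter how TCP chops up the bytes.
            self.buf += data
            while len(self.buf) >= 4:
                (length,) = struct.unpack('!I', self.buf[:4])
                if len(self.buf) < 4 + length:
                    break       # wait for the rest of this message
                message = self.buf[4:4 + length]
                self.buf = self.buf[4 + length:]
                self.messageReceived(message)

        def messageReceived(self, message):
            # e.g. the first few bytes are your client ID, the rest is file
            # data; parse it however you like, the framing is reliable.
            pass

    def frame(message):
        # Sender side: prefix each message with a 4-byte big-endian length.
        return struct.pack('!I', len(message)) + message

  (twisted.protocols.basic also has receivers, such as NetstringReceiver,
that do this sort of framing for you.)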

> nk: The server consumes only 3MB of memory while idle. Unfortunately, I
> cannot tell if the erratic memory consumption lies on the server or client
> side (or both), since I only have 1 PC...

  The client and server are still separate processes.  You should be able to
view resource usage for each individually.

  Jp