[Twisted-Python] Memory usage in large file transfers
nkrontir at hotmail.com
Tue Dec 2 08:46:07 EST 2003
----- Original Message -----
From: "Andrew Bennetts" <andrew-twisted at puzzling.org>
To: <twisted-python at twistedmatrix.com>
Sent: Monday, December 01, 2003 2:11 PM
Subject: Re: [Twisted-Python] Memory usage in large file transfers
> On Mon, Dec 01, 2003 at 11:25:13AM +0200, Nikolaos Krontiris wrote:
> > Hi there.
> > I am writing a file transfer program using twisted as my framework.
> > I have been having some problems as far as memory usage is concerned
> > (i.e. both client and server just eat through available memory without
> > ever releasing it back to the kernel while transferring data). I am aware
> > that in theory, the client and server will consume at least as much memory
> > as the file to be transferred, but this memory should also be made
> > available to the O/S after the operation has completed.
> > I also use a garbage collector, which makes things just marginally better,
> > and the only TWISTED operations I use are a few transport.write and
> > callLater commands.
> You don't say how large "large" is, but you probably should be using
> producer/consumer APIs rather than just plain transport.write(data). See
> twisted.protocols.basic.FileSender for an example. If I'm understanding
> your problem correctly, you should see a significant improvement. This
> technique doesn't require holding the entire file in memory to transfer it.
> I'm not sure what you mean about using a garbage collector -- Python
> automatically cleans up objects with zero reference counts, and periodically
> finds and collects unreachable object cycles.
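The pull-producer pattern behind FileSender can be sketched without Twisted itself. The following is a minimal stand-in, assuming a consumer with the write()/registerProducer()/unregisterProducer() methods of Twisted's IConsumer interface (the class and function names here are illustrative, not Twisted's actual code):

```python
import io

CHUNK_SIZE = 2 ** 14  # 16 KiB per read; only one chunk is in memory at a time

class FilePullProducer:
    """Toy analogue of twisted.protocols.basic.FileSender: the consumer
    asks for data by calling resumeProducing(), one chunk per call."""

    def __init__(self, fileobj, consumer):
        self.fileobj = fileobj
        self.consumer = consumer
        self.done = False
        consumer.registerProducer(self, streaming=False)

    def resumeProducing(self):
        chunk = self.fileobj.read(CHUNK_SIZE)
        if chunk:
            self.consumer.write(chunk)
        else:
            self.done = True
            self.consumer.unregisterProducer()

class ListConsumer:
    """Toy consumer that just collects the chunks it is given."""

    def __init__(self):
        self.chunks = []
        self.producer = None

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.chunks.append(data)

def transfer(fileobj):
    consumer = ListConsumer()
    producer = FilePullProducer(fileobj, consumer)
    while not producer.done:      # in Twisted the reactor drives this loop
        producer.resumeProducing()
    return b"".join(consumer.chunks)

data = b"x" * 50000
assert transfer(io.BytesIO(data)) == data
```

In real Twisted the transport plays the consumer role and resumeProducing() is only called when the socket buffer has drained, which is what keeps memory flat.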
nk: The size of files I am referring to can be anything from 20MB up to
500MB, but right now I'm taking it easy with the client/server model; I'm
sending a single 43MB file, and as I'm debugging and improving
performance, I will increase this filesize...
nk: I had originally thought about using basic.FileSender but a) it has been
marked as unstable by the twisted development team and b) I need to send a
client ID each time I send a single buffer (security... what can you
say...). To make sure that I'm not holding the entire file's contents in
memory, I read (at most) 64K of the file each time, and send this data
away. After it has been sent, this data buffer is flushed. I guess I can try
to change this to file.open, file.seek, file.read and file.close each time I
read the file, so that the only parts of the file held in system memory are
the ones currently necessary...
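The open/seek/read/close scheme described above can be sketched as follows (a rough illustration, not the poster's actual code; the chunk size matches the post, the function names are assumptions):

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # 64 KiB, as in the post

def read_chunk(path, offset):
    """Re-open the file for every chunk so that at most 64 KiB of its
    contents is resident in this process at any time."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(CHUNK_SIZE)

def iter_chunks(path):
    """Walk the file chunk by chunk; in the poster's protocol each chunk
    would be prefixed with the client ID and passed to transport.write()."""
    offset, size = 0, os.path.getsize(path)
    while offset < size:
        chunk = read_chunk(path, offset)
        yield chunk
        offset += len(chunk)

# Round-trip check against a temporary 200 KiB file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"y" * (200 * 1024))
data = b"".join(iter_chunks(tmp.name))
assert data == b"y" * (200 * 1024)
os.unlink(tmp.name)
```

Note that this keeps the *file* out of memory, but anything handed to transport.write() still sits in Twisted's outgoing buffer until the socket drains, which is exactly what a producer avoids.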
nk: When talking about the garbage collector, I'm just referring to python's
gc.enable() and gc.collect() commands, nothing more... Unfortunately I don't
believe that the built-in periodic find-and-collect of unreachable object
cycles is very useful in the case of the client, since it shuts down after
the file's EOF...
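It is worth noting what gc.collect() actually reclaims: only objects kept alive by reference cycles, which plain byte strings read from a file never form. A tiny demonstration (the Node class is illustrative):

```python
import gc

gc.disable()   # stop automatic collection so we can observe it ourselves
gc.collect()   # clear any pre-existing cyclic garbage

class Node:
    def __init__(self):
        self.ref = self  # reference cycle: refcount never reaches zero

for _ in range(100):
    Node()     # each instance becomes unreachable cyclic garbage

# gc.collect() returns the number of unreachable objects it found;
# refcounting alone could never free these.
collected = gc.collect()
assert collected >= 100
gc.enable()
```

Since the 64K buffers in the transfer are acyclic, their memory is returned to the allocator by reference counting the moment the last reference is dropped; gc.collect() cannot release it any sooner.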
> > The only culprits responsible for this I can imagine to be a mismatch
> > between the hardcoded buffer sizes in TWISTED and the amount of data I
> > send (I send 64Kb of data per request for faster delivery in LANs), or
> > possibly that this memory lost is in many small chunks of data -- in which
> > case no O/S can free this data, since there are always limits only above
> > which the kernel will deem an amount of memory worth the trouble to be
> > released (I think glibc has around a 2MB limit)...
> Memory fragmentation can prevent the OS reclaiming memory, but generally
> you'd expect memory growth to slow as it asymptotically reaches a high
> enough limit to accommodate all memory allocations for your load, even with
> fragmentation.
> I believe Python 2.3's pymalloc allocates memory for different types in
> different "arenas", which are separately mmapped, so fragmentation in e.g.
> the string arena (strings being the type that is read from files, split and
> sent over the network, etc.) hopefully won't impact other memory use.
> So with 2.3 vs. 2.2 (or earlier) you should see... different memory use
> characteristics. Hopefully better, but you never know :)
> Also, transport.write and callLater in 64kB chunks is unlikely to be the
> fastest or most memory-efficient technique. Producers/consumers should be
> better, but I'd suspect that even a single transport.write of the entire
> content would probably be better than the callLater loop. Actual benchmarks
> to support this claim would be very welcome!
> > As professional network programmers, do you believe my diagnosis is
> > correct? Have you encountered such problems in the past? Are there
> > workarounds for this?
> I really can't say. You've given no specific data at all... How large are
> the files? How much memory does your server appear to lose per request?
> How much memory does the server take overall (both initially and after
> running for a while)? How many concurrent requests are you dealing with?
> What platform, version of Python, and version of Twisted? Anything else you
> think is relevant? :)
nk: Right now, I'm testing the server/client model with a 43 MB file. The
memory consumed on a WinMe system using Python 2.3.2 and Twisted 1.1.0 with
a 64K buffer is 58MB, while with a 4KB buffer it is around the 80MB region.
On Linux using Python 2.3.2 and Twisted 1.1.0, the memory consumed with a 4K
buffer is always a bit more than 100MB. I can't use very large buffers on my
Linux system, because of the ID I have to send per buffer sent. It seems
that the Linux default SOL_SOCKET, SO_RCVBUF sizes are relatively small, so
the client ID gets confused, since the packets it receives will have
different sizes... Note that these results are for 1 server and 1 client. I
have not yet dared do 2 concurrent clients at once!
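The "different sizes" symptom is a consequence of TCP being a byte stream rather than a message protocol: a single send can arrive split or coalesced, so the receiver cannot assume each read holds exactly one ID-plus-chunk message, whatever SO_RCVBUF is set to. The usual fix is explicit framing, e.g. a length prefix. A sketch (Twisted ships the same idea as twisted.protocols.basic.Int32StringReceiver; the function names and 4-byte ID here are assumptions):

```python
import struct

def frame(client_id: bytes, payload: bytes) -> bytes:
    """Prefix each message with its 4-byte big-endian length so the
    receiver can find message boundaries no matter how TCP splits
    the stream."""
    body = client_id + payload
    return struct.pack("!I", len(body)) + body

def unframe(buffer: bytes):
    """Extract all complete messages from buffer; return them plus any
    leftover partial bytes to be retried when more data arrives."""
    messages = []
    while len(buffer) >= 4:
        (length,) = struct.unpack("!I", buffer[:4])
        if len(buffer) < 4 + length:
            break  # message not fully received yet
        messages.append(buffer[4:4 + length])
        buffer = buffer[4 + length:]
    return messages, buffer

# Two messages arriving split across arbitrary stream reads:
stream = frame(b"ID01", b"hello") + frame(b"ID01", b"world")
msgs, rest = unframe(stream[:7])        # partial first message
assert msgs == [] and rest == stream[:7]
msgs, rest = unframe(stream)
assert msgs == [b"ID01hello", b"ID01world"] and rest == b""
```

With framing in place the buffer size becomes a pure performance knob, decoupled from protocol correctness, so large buffers on Linux stop being a problem.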
nk: The server consumes only 3MB of memory while idle. Unfortunately, I
cannot tell whether the erratic memory consumption lies on the server or
the client side (or both), since I only have 1 PC...
> If you could answer some of these sorts of questions, we could maybe tell
> you if what you're seeing is expected behaviour, or unusual, and maybe
> suggest specific remedies.