[Twisted-Python] Memory leak//problem in twisted write procedures
exarkun at divmod.com
Thu Sep 16 20:29:48 EDT 2004
On Thu, 16 Sep 2004 20:03:38 -0400, Joshua Moore-Oliva <josh at chatgris.com> wrote:
>I believe I have identified a memory problem in twisted.
> However, I have since been hit with an apparent memory leak. After reading through the twisted source (specifically the write routines in twisted/internet/abstract.py)
> I believe I have come across the problem.
> the write implementation continually increases its write data buffer, only freeing up memory when the data buffer is all sent
> However, successive write operations append data to the data buffer. So if a program is able to keep the data buffer from ever completely emptying (which mine is)
> then the data buffer will forever grow, resulting in a memory problem.
> I have added some output to print out the length of the data buffer and the offset, and after only 10 minutes of running the numbers are already
> offset == 76906496
> dataBuffer len == 120768384
Whether or not "leak" is the appropriate term for this is debatable. That aside, it is somewhat undesirable, but it is completely avoidable without making any changes to Twisted*. There are two concepts involved, Producers and Consumers. The transport is a Consumer in this case, and whatever protocol you have that is writing to it is the Producer. The Consumer will notify the Producer when it would be a good idea to write more bytes. Since this will only happen when the write buffer is empty, it avoids the problem of an ever-growing buffer in the transport.
Producers and Consumers aren't _too_ well documented, but the relevant interfaces (IProducer and IConsumer, in twisted.internet.interfaces) are quite simple.
There are a few examples throughout Twisted itself. One such is FileSender, at the bottom of twisted.protocols.basic.
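For a rough feel of how the hand-off works, here is a minimal standalone sketch of a pull producer. It does not import Twisted: ToyConsumer and ChunkProducer are hypothetical stand-ins that only borrow the method names from the real interfaces (registerProducer, unregisterProducer, write, resumeProducing, stopProducing), and doWrite here simply pretends the socket accepted everything that was buffered.

```python
class ToyConsumer:
    """Hypothetical stand-in for a transport (not Twisted's implementation)."""

    def __init__(self):
        self.producer = None
        self.buffer = b""
        self.sent = []          # chunks that "went out on the wire"
        self.max_buffered = 0   # high-water mark of the write buffer

    def registerProducer(self, producer, streaming):
        # streaming=False means a pull producer: it writes only when asked.
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.buffer += data
        self.max_buffered = max(self.max_buffered, len(self.buffer))

    def doWrite(self):
        # Assumption: the whole buffer is sent in one go.
        if self.buffer:
            self.sent.append(self.buffer)
            self.buffer = b""
        # The buffer is empty now, so ask the producer for more.
        if self.producer is not None:
            self.producer.resumeProducing()


class ChunkProducer:
    """Hypothetical pull producer that hands over one chunk per request."""

    def __init__(self, data, consumer, chunk_size=4):
        self.data = data
        self.consumer = consumer
        self.chunk_size = chunk_size
        self.offset = 0

    def start(self):
        self.consumer.registerProducer(self, False)

    def resumeProducing(self):
        chunk = self.data[self.offset:self.offset + self.chunk_size]
        if not chunk:
            # Nothing left to send; stop being asked for more.
            self.consumer.unregisterProducer()
            return
        self.offset += len(chunk)
        self.consumer.write(chunk)

    def stopProducing(self):
        # The consumer went away; skip whatever data remains.
        self.offset = len(self.data)


consumer = ToyConsumer()
producer = ChunkProducer(b"0123456789", consumer, chunk_size=4)
producer.start()
while consumer.producer is not None:
    consumer.doWrite()   # stands in for the reactor's "socket is writable" event
```

Because the producer writes only when it is asked to, the consumer's buffer never holds more than one chunk, no matter how much data there is to send in total.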
> Now, reading through the source to fix this problem, the fastest solution (requiring the least change to the existing code) would be to splice//reduce the size of the
> dataBuffer after offset exceeds a certain number.
It is the easiest change to make, but it leads to detrimental string copying behavior. While this is a minor concern in the case where many small writes are being made while the buffer is large (because huge amounts of copying are already going on), it is a noticeable slowdown in the more common case when the buffer is typically empty or almost empty.
* A change _should_ be made to Twisted eventually. A good solution would involve a zero-copy buffering system, such as a list. There is an implementation of this, but it involves so many nasty hacks that I don't feel it is worth including. Shortly after 2.0 I plan to find time to clean up many of the low-level TCP implementation details, as they have grown increasingly crufty over the last year.
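To illustrate what a list-based buffer could look like, here is a small sketch. It is not the implementation mentioned above, just one possible shape under my own assumptions: ChunkBuffer and its methods are hypothetical names, chunks are kept in a deque, and memoryview slices are used so that neither appending nor discarding sent bytes copies the payload.

```python
from collections import deque


class ChunkBuffer:
    """Hypothetical write buffer that stores chunks instead of one big string."""

    def __init__(self):
        self._chunks = deque()
        self._size = 0

    def __len__(self):
        return self._size

    def append(self, data):
        # O(1): keep a view of the chunk, do not concatenate.
        if data:
            self._chunks.append(memoryview(data))
            self._size += len(data)

    def peek(self):
        # The next chunk that would be handed to send(); empty if none.
        return self._chunks[0] if self._chunks else memoryview(b"")

    def consume(self, n):
        # Discard n bytes that were successfully sent.
        n = min(n, self._size)
        self._size -= n
        while n:
            head = self._chunks[0]
            if n >= len(head):
                n -= len(head)
                self._chunks.popleft()
            else:
                # Slicing a memoryview does not copy the underlying bytes.
                self._chunks[0] = head[n:]
                n = 0


buf = ChunkBuffer()
buf.append(b"hello")
buf.append(b"world")
buf.consume(3)            # part of the first chunk was sent
first = bytes(buf.peek())
buf.consume(4)            # rest of the first chunk, plus two bytes of the second
second = bytes(buf.peek())
```

The trade-off is that each send() only sees one chunk at a time, so many tiny writes would mean many tiny sends unless chunks are joined before sending.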