[Twisted-Python] memory leaks

Glyph Lefkowitz glyph at twistedmatrix.com
Mon Dec 5 23:09:15 MST 2016


> On Dec 4, 2016, at 9:50 AM, Jean-Paul Calderone <exarkun at twistedmatrix.com> wrote:
> 
> On Sun, Dec 4, 2016 at 12:50 AM, Glyph Lefkowitz <glyph at twistedmatrix.com> wrote:
> Following up on a Stack Overflow question from some time ago, http://stackoverflow.com/questions/40604545/twisted-using-connectprotocol-to-connect-endpoint-cause-memory-leak?noredirect=1#comment68573508_40604545 since the submitter added a minimal reproducer, I used Heapy (https://pypi.org/project/guppy/) to look at memory sizing, and saw large numbers of Logger instances and type objects leaking when using client endpoints.  It was not immediately obvious to me where the leak was occurring, though, as I was careful to clean up the Deferred results and not leave them in a failure state.
> 
> I am hoping that I can entice someone else to diagnose this far enough to actually file a bug :-).
> 
> 
> Answered.  I didn't file a bug, I'll let someone else with ideas about twisted.logger think about what exactly the bug is.
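For anyone who wants to repeat that diagnosis, here is a minimal sketch of that kind of heap inspection with guppy/heapy (not necessarily the exact commands used above); the "run the reproducer" step is a placeholder for the Stack Overflow example, not working code on its own:

    # Minimal heap-inspection sketch using guppy/heapy (pip install guppy).
    from guppy import hpy

    heap = hpy()
    heap.setrelheap()        # only measure objects allocated after this point

    # ... run the client-endpoint reproducer here for a while ...

    snapshot = heap.heap()
    print(snapshot)          # summary grouped by type; Logger instances show up here
    print(snapshot.byrcs)    # regroup by referrers to see what keeps them alive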

I wrote up this bug, and will file it when the ability to file bugs on Trac comes back:

twisted.logger._initialBuffer can consume a surprisingly large amount of memory if logging is not initialized


The way that `twisted.logger` is supposed to work is that, at process startup, the global log observer keeps a ring buffer of any messages emitted before logging is initialized, and then emits those messages to the initial set of log observers passed to `globalLogBeginner.beginLoggingTo`.
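For reference, a minimal sketch of starting logging with the public API; this is the point at which the buffered startup events get replayed and the buffer is retired (the choice of observer here is just an example):

{{{
import sys
from twisted.logger import globalLogBeginner, textFileLogObserver

# Once beginLoggingTo is called, events buffered since startup are delivered
# to these observers, and the startup buffer stops accumulating new events.
globalLogBeginner.beginLoggingTo([textFileLogObserver(sys.stdout)])
}}}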

The size of this buffer (`twisted.logger._buffer._DEFAULT_BUFFER_MAXIMUM`) is 65535.  This value was selected arbitrarily, probably because somebody (me or wsanchez) thought "huh, yeah, 64k, that's probably a fine number"; but of course, that intuition is about 64k ''bytes''.
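To make the shape of the problem concrete, the buffer behaves roughly like the following sketch (an illustration in terms of `collections.deque`, not Twisted's actual implementation in `twisted.logger._buffer`):

{{{
from collections import deque

class BufferingObserver(object):
    """
    Illustrative ring buffer of log events: bounded by event *count*, not by
    the size of anything those events reference.
    """
    def __init__(self, maxEvents=65535):
        self._events = deque(maxlen=maxEvents)

    def __call__(self, event):
        # Holding the event dict keeps log_logger, log_source, and anything
        # reachable from them alive as well.
        self._events.append(event)
}}}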

If this were a buffer of actual formatted log messages, of say 200 bytes each, that would be about 13 megabytes, which is maybe an acceptable amount of RAM to spend on a log buffer.

However, it isn't that.  It's a buffer of 64k log ''events'', each of which probably has `log_logger` and `log_source` set, and each of those is an object attached to potentially arbitrary data.  For example, every `Factory` that starts up logs something, which means you're holding on to the factory instance, and its instance dictionary, and the protocol instance, and the protocol instance's dictionary.  Worse yet, any logged ''failures'' might hold on to all the stuff on their stack.
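As a concrete (hypothetical) illustration of how a single buffered event can pin a large object graph via `log_source`:

{{{
from twisted.logger import Logger

captured = []

class BigFactoryLike(object):
    # Hypothetical stand-in for a Factory-like object with a lot of state.
    def __init__(self):
        self.payload = bytearray(10 * 1024 * 1024)  # ~10 MB of instance state
        self._log = Logger(namespace="example.BigFactoryLike", source=self,
                           observer=captured.append)

    def start(self):
        self._log.info("starting up")

thing = BigFactoryLike()
thing.start()
event = captured[0]
assert event["log_source"] is thing  # the event keeps the ~10 MB alive too
}}}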

Add it all up and you end up with a log buffer totaling in the hundreds of megabytes, or even gigabytes, once it's full.  In an application that naively uses Twisted without ever initializing logging, this hangs around forever.

This buffer should probably be a ''lot'' smaller, and we might want to emit a warning when it fills up, reminding people that it is ''only polite'' to start up the logging subsystem, even just to explicitly throw logs away.
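For example, an application that genuinely does not want any logging can retire the buffer by beginning logging with an observer that drops everything:

{{{
from twisted.logger import globalLogBeginner

# Begin logging with a do-nothing observer: the buffered startup events are
# drained (into the void) and the buffer stops accumulating from here on.
globalLogBeginner.beginLoggingTo([lambda event: None])
}}}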

The text is here in case someone else manages to make Trac come back and would like to file it before I get back :).

-glyph
