Opened 6 years ago

Closed 5 years ago

#8164 closed defect (fixed)

(bounded) leak in twisted since 15.2.0 - twisted.logger._initialBuffer can consume a surprisingly large amount of memory if logging is not initialized

Reported by: Bret Curtis
Owned by: Jean-Paul Calderone
Priority: normal
Milestone:
Component: logger
Keywords:
Cc: wsanchez@…
Branch: 8164-initial-log-buffer-size
Author:

Description (last modified by Glyph)

The way that twisted.logger is supposed to work is that at process startup time, the global log observer has a ring buffer for any messages emitted before logging is initialized, and emits those messages to the initial set of log observers passed to globalLogBeginner.beginLoggingTo.

The size of this buffer (twisted.logger._buffer._DEFAULT_BUFFER_MAXIMUM) is 65535. This value was selected arbitrarily, probably because somebody (me or wsanchez) thought "huh, yeah, 64k, that's probably a fine number"; but of course, that would only be a fine number of bytes.

If this were a buffer of actual formatted log messages of, say, 200 bytes each, that would be about 13 megabytes, which is maybe an acceptable amount of RAM to spend on a log buffer.

However, it isn't that. It's a buffer of 64k log events, each of which probably has a log_logger and log_source set, each of which is an object attached to potentially arbitrary data. For example, every Factory that starts up logs something, which means you're holding on to an instance, and an instance dictionary, and the protocol instance, and the protocol instance dictionary. Worse yet, any logged failures might hold on to all the stuff on their stack.

Add it all up and you end up with a log buffer totaling in the hundreds of megabytes, or even gigabytes, once it's full. In an application that naively uses Twisted without ever initializing logging, this hangs around forever.

This buffer should probably be a lot smaller, and we might want to emit a warning when it fills up, reminding people that it is only polite to start up the logging subsystem, even just to explicitly throw logs away.
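
For example, a minimal sketch of initializing the logging subsystem so that this buffer stops accumulating (assuming the application just wants Twisted's output on stdout; it could equally pass an observer that discards events):

Code highlighting:

import sys

from twisted.logger import globalLogBeginner, textFileLogObserver

# Begin logging to stdout; this flushes the startup ring buffer to the
# observer and stops it from holding events for the process lifetime.
globalLogBeginner.beginLoggingTo([textFileLogObserver(sys.stdout)])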


Making GET or POST requests to a Twisted web resource (render_GET or render_POST) reproduces the leak.

To reproduce, run the following code using any version of Twisted from 15.2.0 onward:

Code highlighting:

import os

from twisted.web.server import Site
from twisted.web.resource import Resource
from twisted.internet import reactor

def get_mem_usage():
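    # Read this process's resident set size (VmRSS) from /proc; Linux-only.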
    with open('/proc/%d/status' % os.getpid(), 'r') as f:
        for line in f.read().splitlines():
            if line.startswith('VmRSS'):
                return line

class FormPage(Resource):
    isLeaf = True
    def render_GET(self, request):
        return 'GET response - %s\n' % get_mem_usage()

factory = Site(FormPage())
reactor.listenTCP(8880, factory)
print 'Please GET http://127.0.0.1:8880/'
reactor.run()

When the twisted process is running, you can then use curl:

Code highlighting:

while [ 1 ]; do curl http://127.0.0.1:8880; done

You can use 15.1.0 as a control to show that memory usage is constant.

Change History (23)

comment:1 Changed 6 years ago by Bret Curtis

For OSX or Windows, you can remove the get_mem_usage part to get it to run. You can then use Task Manager or top to watch the process's memory usage balloon in size.
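
A cross-platform variant of the memory check is also possible; a sketch, assuming the third-party psutil package is installed:

Code highlighting:

import os

import psutil

def get_mem_usage():
    # Resident set size of the current process, reported in MiB via psutil.
    rss = psutil.Process(os.getpid()).memory_info().rss
    return 'VmRSS: %d MiB' % (rss // (1024 * 1024))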

comment:2 Changed 6 years ago by hawkowl

This is because the logger is never started, and so the LogBeginner holds up to 65,000 log messages. Not sure if this is as designed (I think it is, psi29a does not :P )... maybe the log message holding count needs to be brought down a bit?

comment:3 Changed 6 years ago by Bret Curtis

We have code that worked fine up until 15.2.0.

So this is a change in behaviour over <= 15.1.0 that increases the memory usage of a process; I would consider this a regression.

Consider services that have their own logging and do not rely on Twisted's implementation.

comment:4 Changed 6 years ago by Glyph

Even 65,000 is a finite number :-). Perhaps it's a bit high, and we could tune it down, but that would suggest that the memory would level off eventually. If it doesn't, then that's not the issue.

comment:5 in reply to:  3 Changed 6 years ago by Glyph

Replying to psi29a:

Consider services that have their own logging and do not rely on Twisted's implementation.

Services which have their own logging are still responsible for hooking up Twisted's logs to that output; numerous mechanisms exist to do that.
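
For example, one such mechanism, sketched here under the assumption that the service routes its output through the standard library logging module, is to forward Twisted's events there:

Code highlighting:

from twisted.logger import globalLogBeginner, STDLibLogObserver

# Hand Twisted's log events to the stdlib logging module so the service's
# existing handlers and formatters see them; this also drains the buffer.
globalLogBeginner.beginLoggingTo([STDLibLogObserver()])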

That said, it's definitely not appropriate for Twisted to leak memory forever while buffering infinite log messages, or if 64k is enough messages to make a significant impact on memory usage, even to buffer that many; so don't take this to mean I don't think this is a real issue.

comment:6 Changed 6 years ago by Bret Curtis

Keywords: logging added

I let one process go for a minute and it jumped to 100MiB before I cut it off. My colleague who originally found and documented the problem yesterday saw that the process ballooned to 400MiB after "a few iterations of 100,000 gets".

I still don't like how, one day while upgrading Twisted, my application suddenly balloons up. Logging shouldn't get in the way.

comment:7 Changed 6 years ago by Bret Curtis

I can confirm that it does plateau and that any additional messages will cause the first ones to be removed from the buffer. I think the reason we saw 400MiB in one instance is that the log output contains additional information (a longer GET URL than just GET '/').

From our point of view, we already have a logging system and we already do file descriptor remapping, and this is done in pre-flight (daemonization code, not twistd) before we start the reactor loop.

So shifting to using Twisted's logging system is really not an option. We expected that any logging from Twisted would go to stdout and stderr by default, unless asked to do otherwise.

This behaviour has changed and we're stuck with 15.1.0 for another 6 months before we can add additional code that just tells Twisted to dump to stdout/stderr anyway.

Code highlighting:

from twisted.logger import globalLogBeginner; 

globalLogBeginner.beginLoggingTo([lambda _: None], redirectStandardIO=False, discardBuffer=True)

I guess the question is, should I really have to do this? I would expect that if I wanted that kind of logging, I would ask for it... otherwise use the default stdout and stderr.

comment:8 in reply to:  7 Changed 6 years ago by Glyph

Replying to psi29a:

I can confirm that it does plateau and that any additional messages will cause the first ones to be removed from the buffer. I think the reason we saw 400MiB in one instance is that the log output contains additional information (a longer GET URL than just GET '/').

OK, good to know we have the correct diagnosis.

From our point of view, we already have a logging system and we already do file descriptor remapping, and this is done in pre-flight (daemonization code, not twistd) before we start the reactor loop.

This is all fine and good, but the fact remains that *twisted* is using some API for logging itself, and you have to let twisted know about your thing, whatever it is.

So shifting to using Twisted's logging system is really not an option.

I was not suggesting that, and in any case it wouldn't help. If you started pushing more messages into Twisted's log buffer from your application, they still wouldn't have anywhere to go. So whether you use Twisted's logger or not is entirely irrelevant to this issue. The relevant bit of Twisted you might use which would address this problem is twistd (as it initializes logging for you), but I totally understand that it might not be appropriate for your application; the only change I'm recommending you make is adding the beginLoggingTo call you've already written out.

We expected that any logging from Twisted would go to stdout and stderr by default, unless asked to do otherwise.

What format would you expect this to be in? In the future we'd really like Twisted's logging default format to be ietf-json-text-sequence, not plain text. Also, twisted has never done this in the past.

This behaviour has changed and we're stuck with 15.1.0 for another 6 months before we can add additional code that just tells Twisted to dump to stdout/stderr anyway.

I don't follow. Is this because you have a 6-month deployment cycle? If you properly initialize logging you'll stop having this issue with every version of Twisted; no need to be stuck.

Code highlighting:

from twisted.logger import globalLogBeginner; 

globalLogBeginner.beginLoggingTo([lambda _: None], redirectStandardIO=False, discardBuffer=True)

I guess the question is, should I really have to do this? I would expect that if I wanted that kind of logging, I would ask for it... otherwise use the default stdout and stderr.

Yes, you should. Consider that Twisted potentially begins logging as soon as you start calling its APIs. If you're writing an interactive command-line tool, you don't want the fact that you started the reactor to start vomiting JSON logs all over your user's screen. If you're writing a GUI you similarly might want to direct the logs - including any status messages that were logged before you could initialize logging, which is the point of this buffer - to a more useful location than a console that the user will never see and can't be collected for diagnostic purposes. You are also talking as if Twisted *used* to log to stderr by default, which was never the case: previously, you should have been calling 'startLogging', which would have had a similar effect.
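
For reference, that legacy call looks roughly like this (the older twisted.python.log API, shown as a sketch):

Code highlighting:

import sys

from twisted.python import log

# Pre-twisted.logger way of directing Twisted's log output to stdout.
log.startLogging(sys.stdout)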

Now, again, I want to be clear that the buffer size here is still a bug, and we vastly mis-estimated a reasonable size. But even after we fix it, your application should still do proper initialization of the logging subsystem, even if that initialization just tells it to discard all logs, as above. And doing the right thing here in your own code will address your immediate problem.

comment:9 Changed 6 years ago by Bret Curtis

Upon further inspection, our use case was to ignore anything Twisted had to say unless it came to stderr. For versions of Twisted 15.1 and lower, the bits were just dropped, while in 15.2, with the new logging core, they are held in a queue.

The 6-month freeze on our code is something beyond our control and not a part of our typical development cycle. This is just an unfortunate case of "timing", which means our only recourse is to fall back to Twisted 15.1, which just drops its logs on the floor.

Once we're able to get back to it, we'll be sure to tell Twisted explicitly to not redirectStandardIO as part of our pre-flight checks.

As for the 'bug' itself: is there an upper bound on the length of a log message? I could imagine the buffer could still bloat heavily even if you lowered the number of messages down from 65K.

comment:10 Changed 5 years ago by Glyph

#8936 was a duplicate of this issue, but its description was a bit better, so I'm going to add it to the description here.

comment:11 Changed 5 years ago by Glyph

Keywords: memory leak logging removed

(Keywords are for workflow and not description.)

comment:12 Changed 5 years ago by Glyph

Component: web → core
Description: modified (diff)
Summary: Memory leak in twisted web since 15.2.0 → (bounded) leak in twisted since 15.2.0 - twisted.logger._initialBuffer can consume a surprisingly large amount of memory if logging is not initialized

comment:13 Changed 5 years ago by Wilfredo Sánchez Vega

Cc: wsanchez@… added
Milestone: New Logging System

comment:14 Changed 5 years ago by Wilfredo Sánchez Vega

Component: core → logger

comment:15 Changed 5 years ago by Wilfredo Sánchez Vega

Milestone: New Logging System

Closing New Logging System milestone; use logger component for ongoing tickets.

comment:16 Changed 5 years ago by Jean-Paul Calderone

Keywords: review added

comment:17 Changed 5 years ago by Jean-Paul Calderone

#9020 was a duplicate of this.

comment:18 Changed 5 years ago by Jean-Paul Calderone

Branch: 8164-initial-log-buffer-size

comment:19 Changed 5 years ago by Glyph

Keywords: review removed
Owner: set to Jean-Paul Calderone

I love the changes. The docs are much better. The improved test coverage is welcome. Please:

  1. fix the one doc builder error,
  2. file a ticket for your very good "insert a log event if the initial log buffer ends up dropping anything" idea,

and land.

comment:20 Changed 5 years ago by Wilfredo Sánchez Vega

+1, this is great, thanks JP

comment:21 Changed 5 years ago by Jean-Paul Calderone

Thanks!

(1) fixed in 1bb61b5; (2) is #9022

comment:22 Changed 5 years ago by Jean-Paul Calderone <exarkun@…>

In 4acde62:

Merge pull request #699 from twisted/8164-initial-log-buffer-size

Reduce the size of the LogBeginner event buffer.

Author: exarkun
Reviewers: glyph, wsanchez
Fixes: 8164

The previous larger event buffer size caused noticeable problems for several users due to the associated memory and CPU cost, particularly in cases where no observer was ever registered (leaving the events in memory for the full process lifetime).

comment:23 Changed 5 years ago by Jean-Paul Calderone

Resolution: fixed
Status: new → closed