Opened 3 years ago

Closed 13 months ago

#5487 defect closed (wontfix)

Memory Consumption Issue with Twisted+pyOpenSSL

Reported by: wirepair
Owned by:
Priority: high
Milestone:
Component: core
Keywords:
Cc:
Branch:
Author:
Launchpad Bug:

Description (last modified by thijs)

While running some production code that uses the twisted.web Agent, I noticed memory climbing excessively high (4 GB) when making many SSL-based requests. I've reduced the issue to something reproducible even with the sample Twisted TLS echo server and a modified TLS client (using endpoints). I've tried the following operating environments:

CentOS 6.0 (x86_64)
(Linux 2.6.32-71.el6.x86_64 #1 SMP Fri May 20 03:51:51 BST 2011 x86_64 x86_64 x86_64 GNU/Linux)
Python 2.6.6, Twisted 11.1 (and 12.0), pyOpenSSL 0.13.
Python 2.7.2 (built from source), Twisted 12.0, pyOpenSSL 0.13.

Amazon Linux (x86_64)
(2.6.35.14-95.38.amzn1.x86_64 #1 SMP Thu Aug 25 17:11:23 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux)
Python 2.6.7, Twisted 11.1, pyOpenSSL 0.13.

I tried a very simple blocking pyOpenSSL-*only* script, which didn't appear to consume memory even though the request was basically the same (sending/receiving 1100 bytes). (See attached pyossltest.py.)
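For reference, here is a minimal sketch of the kind of blocking pyOpenSSL-only client described. This is an illustrative reconstruction, not the attached pyossltest.py; the host, port, iteration count, and payload are placeholders.

# Illustrative reconstruction of a blocking pyOpenSSL echo client; not the
# attached pyossltest.py.  Host, port, and payload are placeholders.
import socket

from OpenSSL import SSL

HOST, PORT = "127.0.0.1", 8000
PAYLOAD = b"x" * 1100  # roughly the 1100 bytes mentioned above

ctx = SSL.Context(SSL.SSLv23_METHOD)

for _ in range(1000):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    conn = SSL.Connection(ctx, sock)
    conn.connect((HOST, PORT))   # handshake happens on the first I/O
    conn.send(PAYLOAD)           # send ~1100 bytes
    conn.recv(len(PAYLOAD))      # read the echo back
    conn.shutdown()              # send the TLS close_notify
    conn.close()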

It should be noted that Python 2.7 with Twisted 11.0 and pyOpenSSL 0.13 on win32 (x86 Python) would properly release memory and never appeared to consume more than ~40 MB using the attached test code. The systems showing problems consumed around 300 MB over the same time period and never appeared to release any memory (usage always went up, never down).

Also, if you only establish the connection with the client and then immediately close it (call self.transport.loseConnection() prior to calling self.transport.write()), the memory footprint does not increase.

I also tried the epoll and select reactors and didn't notice any change.
Let me know if you need any more information; I've brought this issue up in #twisted on IRC and other people appear to be able to reproduce it.

Attachments (5)

tlscli.py (1.4 KB) - added by wirepair 3 years ago.
Client that sends multiple concurrent requests and consumes vast amounts of memory
tlsserv.py (704 bytes) - added by wirepair 3 years ago.
TLS server code which also consumes vast amounts of memory (taken from twisted documentation)
pyossltest.py (986 bytes) - added by wirepair 3 years ago.
quick dirty pyopenssl blocking code which doesn't appear to consume memory
tlsserv.2.py (993 bytes) - added by exarkun 3 years ago.
Modified server example
tlscli.2.py (1.5 KB) - added by exarkun 3 years ago.
Modified client example


Change History (20)

Changed 3 years ago by wirepair

Client that sends multiple concurrent requests and consumes vast amounts of memory

Changed 3 years ago by wirepair

TLS server code which also consumes vast amounts of memory (taken from twisted documentation)

Changed 3 years ago by wirepair

quick dirty pyopenssl blocking code which doesn't appear to consume memory

comment:1 Changed 3 years ago by wirepair

therve pointed me to this article: http://journal.paul.querna.org/articles/2011/04/05/openssl-memory-use/. It seems someone else has seen this behavior. I'm not sure this is really a fix, though: even if connection objects are large, they should be released at some point (as they are on Windows).

comment:2 Changed 3 years ago by thijs

  • Description modified (diff)

comment:3 Changed 3 years ago by exarkun

One thing to pay attention to in the attached Twisted-using client code is that it doesn't wait for old connections to close before starting new ones. Considering this, it's not surprising to see memory usage grow without bound, since the number of open connections might grow without bound.

I changed the client to only start a new connection when one of the old connections has closed. This seems to have put a ceiling on memory usage at around 208 MB (with the code running for about ten minutes).
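A minimal sketch of that kind of throttling follows. It is not the attached tlscli.2.py (which, per comment 6, apparently ran up to five of these loops in parallel via coiterate); the host, port, and payload are placeholders.

# Illustrative sketch: cooperate() waits for each yielded Deferred, so a new
# connection is only started once the previous one has closed.
from twisted.internet import defer, reactor, ssl, task
from twisted.internet.endpoints import SSL4ClientEndpoint
from twisted.internet.protocol import Factory, Protocol


class EchoOnce(Protocol):
    def connectionMade(self):
        self.transport.write(b"x" * 1100)

    def dataReceived(self, data):
        # Got the echo back; close the connection.
        self.transport.loseConnection()

    def connectionLost(self, reason):
        self.factory.finished.callback(None)


def requests(count):
    endpoint = SSL4ClientEndpoint(
        reactor, "127.0.0.1", 8000, ssl.ClientContextFactory())
    for _ in range(count):
        factory = Factory()
        factory.protocol = EchoOnce
        factory.finished = defer.Deferred()
        d = endpoint.connect(factory)
        # Chain to finished so the cooperator waits for connectionLost,
        # not just for the connection to be set up.
        yield d.addCallback(lambda proto, f=factory: f.finished)


task.cooperate(requests(10000)).whenDone().addBoth(lambda ign: reactor.stop())
reactor.run()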

I also instrumented the Twisted-using server code to report the number of TLSMemoryBIOProtocol instances every 15 seconds. It occasionally reports rather large numbers. Here is an excerpt:

57 TLSMemoryBIOProtocol
130 TLSMemoryBIOProtocol
79 TLSMemoryBIOProtocol
67 TLSMemoryBIOProtocol
74 TLSMemoryBIOProtocol
143 TLSMemoryBIOProtocol
141 TLSMemoryBIOProtocol
132 TLSMemoryBIOProtocol
108 TLSMemoryBIOProtocol
164 TLSMemoryBIOProtocol
160 TLSMemoryBIOProtocol
179 TLSMemoryBIOProtocol
120 TLSMemoryBIOProtocol
133 TLSMemoryBIOProtocol
202 TLSMemoryBIOProtocol
180 TLSMemoryBIOProtocol
214 TLSMemoryBIOProtocol
96 TLSMemoryBIOProtocol
87 TLSMemoryBIOProtocol
69 TLSMemoryBIOProtocol

These all seem too large, based on what's going on, but this may just be normal non-deterministic garbage collector behavior. The number never seems to go much above a couple hundred, at least.

Attaching my modified versions; as things stand, I'm not sure there's any real memory leak in Twisted or pyOpenSSL.
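A minimal sketch of that kind of instance-count instrumentation follows; it is an illustrative reconstruction, not the attached tlsserv.2.py.

# Report the number of live TLSMemoryBIOProtocol instances every 15 seconds
# by walking the objects the garbage collector knows about.
import gc

from twisted.internet import task
from twisted.protocols.tls import TLSMemoryBIOProtocol


def reportInstances():
    count = sum(1 for obj in gc.get_objects()
                if isinstance(obj, TLSMemoryBIOProtocol))
    print("%d TLSMemoryBIOProtocol" % (count,))


# Start this alongside the echo server, before reactor.run().
task.LoopingCall(reportInstances).start(15)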

Changed 3 years ago by exarkun

Modified server example

Changed 3 years ago by exarkun

Modified client example

comment:4 Changed 3 years ago by itamar

If it's "normal non-deterministic garbage collection", gc.collect() should free the memory. Does it?

comment:5 Changed 3 years ago by exarkun

If it's "normal non-deterministic garbage collection", gc.collect() should free the memory. Does it?

Not necessarily. Even once the objects are freed, you can't ever really be sure what memory will actually be returned to the platform.
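One way to check is sketched below. It is a rough, Linux-specific sketch (it reads /proc/self/status), and the 60-second interval is arbitrary: force a full collection periodically and watch whether resident memory actually drops.

# Force a full collection and log how many unreachable objects were found
# alongside the current resident set size.
import gc

from twisted.internet import task


def currentRSS():
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return " ".join(line.split()[1:])
    return "unknown"


def collectAndReport():
    unreachable = gc.collect()
    print("gc.collect() found %d unreachable objects, RSS now %s"
          % (unreachable, currentRSS()))


task.LoopingCall(collectAndReport).start(60)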

comment:6 Changed 3 years ago by wirepair

OK, that makes sense. However, given your modifications that ensure each client connection is only started after one completes, that would mean there's a maximum client count of 5 (due to being locked with max_runs/coiterate). So 5 concurrent SSL connections take up 200 MB, is that right?

If that's true, I really need to stop my connection count from ever growing that large. Is there something specific I need to do to ensure that twisted.web.client.Agent closes its TLS connections after it finishes receiving the information? I don't see anything immediately obvious from http://twistedmatrix.com/documents/current/api/twisted.web.client.Agent.html#request.

comment:7 Changed 3 years ago by itamar

200 MB for 5 connections seems... problematic.

Agent connections will close immediately after they're done by default, unless you choose to use a persistent pool.
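For context, here is a minimal sketch of that default (non-persistent) flow: delivering the response body and letting it run to completion is what allows the connection to be torn down. The URL and class names are placeholders, not part of the ticket.

# Minimal sketch of default Agent usage (no persistent pool).
from twisted.internet import defer, reactor
from twisted.internet.protocol import Protocol
from twisted.web.client import Agent


class DiscardBody(Protocol):
    def __init__(self, finished):
        self.finished = finished

    def dataReceived(self, data):
        pass  # drain and ignore the body

    def connectionLost(self, reason):
        # Fired once the body has been fully delivered.
        self.finished.callback(None)


def fetch(url):
    agent = Agent(reactor)

    def deliver(response):
        finished = defer.Deferred()
        response.deliverBody(DiscardBody(finished))
        return finished

    return agent.request("GET", url).addCallback(deliver)


fetch("https://example.com/").addBoth(lambda ign: reactor.stop())
reactor.run()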

comment:8 Changed 3 years ago by exarkun

200 MB for 5 connections seems... problematic.

This seems to be on the OpenSSL configuration side. At least, I haven't seen any investigation suggesting that anything in Twisted is responsible for this amount of memory being used.

If it is on the OpenSSL side, then there's little we can do about it. Our SSL API is such that it is up to application developers to properly configure OpenSSL for their needs.

Agent connections will close immediately after they're done by default, unless you choose to use a persistent pool.

Are you really confident that, in all cases, Agent doesn't call connectionLost on the body protocol until after its transport has gone away (presuming persistent connections are not in use)?

comment:9 Changed 3 years ago by itamar

connectionLost may be called on the body protocol before the underlying connection is lost, yes, but it does tell the connection to close at the same time, so it shouldn't take *too* long. The TLS shutdown handshake over the internet might take a bit of time, though.

Before just writing this off as "OpenSSL sucks", the next step is to try with the non-memory-BIO version and compare memory usage.

comment:10 follow-up: Changed 3 years ago by wirepair

Another update on this. I just built openssl-1.0.0g:

include/openssl/opensslv.h:28:#define OPENSSL_VERSION_NUMBER	0x1000007fL

Then built pyopenssl using the following steps:

python setup.py build_ext -I/usr/local/ssl/include/ -L/usr/local/ssl/lib/
python setup.py build
sudo python setup.py install
python
>>> import OpenSSL
>>> "%02x"%OpenSSL.SSL.OPENSSL_VERSION_NUMBER
'1000007f'

It looks like SSL_OP_NO_COMPRESSION is not exposed as an attribute on OpenSSL.SSL, so I manually passed in the value 0x00020000 taken from the header:

ssl/ssl.h:553:#define SSL_OP_NO_COMPRESSION				0x00020000L

After making the change only on the server (code below), memory consumption still appears to increase. I cannot set SSL_MODE_RELEASE_BUFFERS on the context because pyOpenSSL 0.13 doesn't appear to expose the Context set_mode method (although it appears to have been added in a branch: https://bugs.launchpad.net/pyopenssl/+bug/836901). There seemed to be no difference between turning encryption off and having it on with regards to how quickly memory climbed.

I also tried running my client against an nginx server that I set up, to confirm that it wasn't some OpenSSL issue and to see what its memory usage looked like. After running for a few minutes, the server showed little to no memory consumption.

#!/usr/bin/env python
from OpenSSL import SSL
from twisted.internet import ssl, reactor
from twisted.internet.protocol import Factory, Protocol
"""
tlsserv.py (original code with set_options modified)
mkdir keys
openssl genrsa -des3 -out server.key 4096
openssl req -new -key server.key -out server.csr
openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt
"""
class Echo(Protocol):
    def dataReceived(self, data):
        """As soon as any data is received, write it back."""
        self.transport.write(data)

if __name__ == '__main__':
    factory = Factory()
    factory.protocol = Echo
    myContextFactory = ssl.DefaultOpenSSLContextFactory('keys/server.key', 'keys/server.crt')
    ctx = myContextFactory.getContext()
    # define SSL_OP_NO_COMPRESSION 0x00020000L, passed as a raw value
    # because pyOpenSSL 0.13 does not expose OP_NO_COMPRESSION
    ctx.set_options(0x00020000)
    reactor.listenSSL(8000, factory, myContextFactory)
    reactor.run()
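For completeness, here is a hedged sketch of what setting SSL_MODE_RELEASE_BUFFERS would look like, assuming a pyOpenSSL whose Context exposes set_mode (the branch linked above; 0.13 itself does not). The mode value is taken from OpenSSL's ssl.h, the same way SSL_OP_NO_COMPRESSION is passed above.

# Sketch only: requires a pyOpenSSL whose Context exposes set_mode(), which
# 0.13 does not.  The constant comes straight from OpenSSL's ssl.h:
#   #define SSL_MODE_RELEASE_BUFFERS 0x00000010L
SSL_MODE_RELEASE_BUFFERS = 0x00000010

ctx = myContextFactory.getContext()
ctx.set_options(0x00020000)               # SSL_OP_NO_COMPRESSION, as above
ctx.set_mode(SSL_MODE_RELEASE_BUFFERS)    # let OpenSSL free idle read/write buffers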

comment:11 in reply to: ↑ 10 Changed 3 years ago by wirepair

There seemed to be no difference between turning encryption off and having it on with regards to how quickly memory climbed.

*Edit: I meant compression.

comment:12 Changed 3 years ago by wirepair

OK, I tried exarkun's scripts again, and limiting the connection count does seem to fix the problem. Translating this over to my problem with Agent, I found that by not calling response.deliverBody in some cases (whenever there were redirects) I was increasing my connection count and waiting for those connections to time out (hence my memory consumption issue). Closing the connection after reading the response headers, when you don't need the body, turns into the following:

def cbRequest(response):
    # Do stuff with the headers... but don't call response.deliverBody,
    # because maybe you don't need the body.  Note that _transport._producer
    # is private API; this forcibly closes the underlying connection.
    response._transport._producer.loseConnection()

Running with a maximum of 100 concurrent connections, but making sure to call loseConnection, kept my memory usage around 100 MB, which is totally acceptable for *100* concurrent connections.
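As an aside, one alternative that avoids the private response._transport._producer attribute is to deliver even an unwanted body to a throwaway protocol, so Agent can close the connection itself. A rough sketch, reusing the DiscardBody protocol and the defer import from the sketch after comment 7:

# Hedged alternative: drain and discard the body instead of reaching into
# private transport attributes.  Assumes DiscardBody and defer from the
# earlier sketch are in scope.
def cbRequest(response):
    finished = defer.Deferred()
    # Even for responses we don't care about (e.g. redirects), deliver the
    # body to a throwaway protocol so the connection is closed cleanly.
    response.deliverBody(DiscardBody(finished))
    return finished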

End result: not closing connections was the real culprit, I think. For t.w.c.Agent it is not clear that you need to call deliverBody directly to have the transport closed, so I created ticket #5488.

As far as I'm concerned this ticket can be closed (unless someone disagrees).

comment:13 Changed 3 years ago by exarkun

connectionLost may be called on the body protocol before the underlying connection is lost, yes, but it does tell the connection to close at the same time, so it shouldn't take *too* long. The TLS shutdown handshake over the internet might take a bit of time, though.

If connectionLost is called first, then there's no way to limit the number of connections.

Before just writing this off as "OpenSSL sucks", the next step is to try with the non-memory-BIO version and compare memory usage.

Using my modified non-leaky version, memory usage is roughly the same whether I use pyOpenSSL 0.9, which does not expose the APIs necessary for the memory BIO to work, or pyOpenSSL trunk@HEAD (which does, of course).

comment:14 Changed 3 years ago by exarkun

Using my modified non-leaky version, memory usage is roughly the same whether I use pyOpenSSL 0.9, which does not expose the APIs necessary for the memory BIO to work, or pyOpenSSL trunk@HEAD (which does, of course).

Sorry. I confused some numbers. The memory usage is not the same. Both versions seem to have a ceiling, but it's higher with the memory BIO version.

comment:15 Changed 13 months ago by exarkun

  • Resolution set to wontfix
  • Status changed from new to closed

I think the conclusion from comment 12 is that there isn't a memory leak being observed here. There was a kind of connection leak in the application code that was suspected of triggering a memory leak.
