[Twisted-web] render_GET and memory consumption

Tue Dec 21 10:59:46 EST 2010

On 02:34 pm, psanchez at fosstel.com wrote:
>On 10-12-21 09:00 AM, exarkun at twistedmatrix.com wrote:
>>On 05:19 am, psanchez at fosstel.com wrote:
>>>Hello,
>>>
>>>Here's a demo HTTP server that returns 10 MB of random data each time a
>>>client connects.
>>>
>>>import os
>>>from twisted.internet import reactor
>>>from twisted.web.server import Site
>>>from twisted.web.resource import Resource
>>>
>>>data = os.urandom(10*1024*1024)
>>>
>>>class TestPage(Resource):
>>>      isLeaf = True
>>>      def render_GET(self, request):
>>>          return data
>>>
>>>root = Resource()
>>>root.putChild('test', TestPage())
>>>reactor.listenTCP(8880, Site(root))
>>>reactor.run()
>>>
>>>Now, when I run N clients simultaneously from a different host, I see
>>>that the server's memory consumption increases by N*10 MB. I can't
>>>reproduce this when running the clients from the same host as the
>>>server; the test goes so fast that I can't gather any useful data.
>>>
>>>I run the test with the following httperf command on a different host
>>>while watching the GNOME System Monitor on the server (top will do as
>>>well).
>>>
>>>httperf --server 192.168.1.10 --port 8880 --uri /test \
>>>          --rate 10 --num-conn 10
>>>
>>>When the server is idle memory consumption is 17.1 MB, but during the
>>>test it jumps to 117.2 MB. My questions are then:
>>>
>>>1. Given that 'data' is a global variable, and effectively read-only as
>>>well, why is it replicated for each request? And who is replicating it?
>>
>>It's copied as part of the process of writing it to the socket.  You
>>can't write 10MB at once, and you can't slice a string (to throw away
>>the part that you did manage to write) without making a copy of part
>>of it.
>>
>>>2. What would be the proper way to re-write this example so that there
>>>is one and only one 'data' structure at any time?
>>
>>Split data up into ~32kB-64kB chunks and write them to the request
>>individually.  Then each chunk can just be dropped with no copying.
>>
>>Jean-Paul
>
>Thanks Jean-Paul,
>
>Here is my modified example, unfortunately with the same bad results
>regarding memory consumption.
>
>import os
>from twisted.internet import reactor
>from twisted.web.server import Site
>from twisted.web.resource import Resource
>
>CHUNK_SIZE = 32*1024
>data = os.urandom(10*1024*1024)
>
>class TestPage(Resource):
>      isLeaf = True
>
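>      def render_GET(self, request):
>          s = 0
>          # Each data[s:s+CHUNK_SIZE] slice allocates a new string.
>          for chunk in iter(lambda: data[s:s+CHUNK_SIZE], ''):
>              request.write(chunk)
>              s = s + CHUNK_SIZE
>          return ''  # lets Twisted finish the response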

You've just moved the copying-by-slicing out of the transport and into 
your Resource. :)  You need to do all that copying/slicing at the 
beginning, where it only needs to happen once, so that every render can 
share those allocated strings.
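
As a rough sketch of that idea (not tested against any particular Twisted
release): the slicing below runs once at import time, and each render only
hands out references to the same chunk objects. The names make_chunks,
write_chunks, CHUNKS and RecordingRequest are made up for illustration;
RecordingRequest merely stands in for a Twisted request so the snippet runs
without Twisted installed. Inside a real render_GET you would presumably call
write_chunks(request) and then return an empty string.

```python
import os

CHUNK_SIZE = 32 * 1024
TOTAL_SIZE = 10 * 1024 * 1024


def make_chunks(payload, size):
    """Slice payload into a list of pieces of at most `size` bytes."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


# Slicing copies, so do it exactly once, at import time. The original
# 10 MB string is not kept around; only the chunks are.
CHUNKS = make_chunks(os.urandom(TOTAL_SIZE), CHUNK_SIZE)


def write_chunks(request):
    """Write every pre-sliced chunk to `request` without slicing again."""
    for chunk in CHUNKS:
        request.write(chunk)


class RecordingRequest(object):
    """Hypothetical stand-in for a Twisted request; it only records writes."""

    def __init__(self):
        self.written = []

    def write(self, chunk):
        self.written.append(chunk)


if __name__ == "__main__":
    first, second = RecordingRequest(), RecordingRequest()
    write_chunks(first)
    write_chunks(second)
    # Both "requests" received the very same objects, not fresh copies.
    print(all(a is b for a, b in zip(first.written, second.written)))
```

This only removes the per-request slicing in the resource. Whether the
transport still copies data while buffering it can depend on the Twisted
version, so it is worth measuring before relying on it.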

Jean-Paul