[Twisted-web] render_GET and memory consumption

Tue Dec 21 09:34:40 EST 2010

On 10-12-21 09:00 AM, exarkun at twistedmatrix.com wrote:
> On 05:19 am, psanchez at fosstel.com wrote:
>> Hello,
>>
>> Here's a demo HTTP server that returns 10 MB of random data each time a
>> client connects.
>>
>> import os
>> from twisted.internet import reactor
>> from twisted.web.server import Site
>> from twisted.web.resource import Resource
>>
>> data = os.urandom(10*1024*1024)
>>
>> class TestPage(Resource):
>>     isLeaf = True
>>     def render_GET(self, request):
>>         return data
>>
>> root = Resource()
>> root.putChild('test', TestPage())
>> reactor.listenTCP(8880, Site(root))
>> reactor.run()
>>
>> Now, when I run N clients simultaneously from a different host I see
>> that the server's memory consumption increases by N*10 MB. I can't
>> reproduce this example when running the clients from the same host as
>> the server; the test goes so fast that I can't gather any useful data.
>>
>> I run the test with the following httperf command on a different host
>> and watch the server's memory in the GNOME System Monitor (top will do
>> as well).
>>
>> httperf --server 192.168.1.10 --port 8880 --uri /test \
>>          --rate 10 --num-conn 10
>>
>> When the server is idle memory consumption is 17.1 MB, but during the
>> test it jumps to 117.2 MB. My questions are then:
>>
>> 1. Given that 'data' is a global variable, eventually read-only as well,
>> why is it replicated for each request? And who is replicating it?
>
> It's copied as part of the process of writing it to the socket.  You
> can't write 10MB at once, and you can't slice a string (to throw away
> the part that you did manage to write) without making a copy of part of
> it.
>
>> 2. What would be the proper way to rewrite this example so that there
>> is one and only one 'data' structure at any time?
>
> Split data up into ~32kB-64kB chunks and write them to the request
> individually.  Then each chunk can just be dropped with no copying.
>
> Jean-Paul

Thanks Jean-Paul,

Here is my modified example. Unfortunately, it shows the same memory
consumption as before.

import os
from twisted.internet import reactor
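from twisted.web.server import Site, NOT_DONE_YET
from twisted.web.resource import Resource

CHUNK_SIZE = 32*1024
data = os.urandom(10*1024*1024)

class TestPage(Resource):
    isLeaf = True

    def render_GET(self, request):
        # Write the payload in CHUNK_SIZE slices instead of returning it whole.
        for s in range(0, len(data), CHUNK_SIZE):
            request.write(data[s:s+CHUNK_SIZE])
        # render_GET must return a string or NOT_DONE_YET, so finish the
        # response explicitly.
        request.finish()
        return NOT_DONE_YET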

root = Resource()
root.putChild('test', TestPage())
reactor.listenTCP(8880, Site(root))
reactor.run()

Shall I add some delay before writing each chunk?
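
A fixed delay is probably not what is needed here. Each request.write()
call only appends the chunk to the transport's buffer, and nothing is sent
until render_GET returns to the reactor, so the loop still queues the full
10 MB for every request. One possible alternative is to let the transport
ask for the next chunk when it is ready for it, using a pull producer. The
following is a minimal sketch under that assumption, not a tested fix for
this server: ChunkProducer is a hypothetical name, and the class is
duck-typed here rather than declared as implementing Twisted's
IPullProducer interface.

```python
CHUNK_SIZE = 32 * 1024


class ChunkProducer(object):
    """Hypothetical pull producer: one slice per resumeProducing() call."""

    def __init__(self, request, data, chunk_size=CHUNK_SIZE):
        self.request = request
        self.data = data
        self.chunk_size = chunk_size
        self.offset = 0

    def resumeProducing(self):
        # Called by the transport each time it is ready for more data.
        if self.request is None:
            return
        chunk = self.data[self.offset:self.offset + self.chunk_size]
        self.offset += len(chunk)
        if chunk:
            self.request.write(chunk)
        if self.offset >= len(self.data):
            # Everything has been handed over: detach and end the response.
            request, self.request = self.request, None
            request.unregisterProducer()
            request.finish()

    def stopProducing(self):
        # Called if the client disconnects before the response is complete.
        self.request = None
```

In render_GET this would be hooked up with
request.registerProducer(ChunkProducer(request, data), False), followed by
returning NOT_DONE_YET. With a non-streaming (pull) producer, each
connection should then hold roughly one chunk in its buffer at a time
instead of a full copy of 'data'.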

-- 
Pedro