[Twisted-web] render_GET and memory consumption

Tue Dec 21 11:43:11 EST 2010

On 10-12-21 10:59 AM, exarkun at twistedmatrix.com wrote:
> On 02:34 pm, psanchez at fosstel.com wrote:
>> On 10-12-21 09:00 AM, exarkun at twistedmatrix.com wrote:
>>> On 05:19 am, psanchez at fosstel.com wrote:
>>>> Hello,
>>>>
>>>> Here's a demo HTTP server that returns 10 MB of random data each time
>>>> a client connects.
>>>>
>>>> import os
>>>> from twisted.internet import reactor
>>>> from twisted.web.server import Site
>>>> from twisted.web.resource import Resource
>>>>
>>>> data = os.urandom(10*1024*1024)
>>>>
>>>> class TestPage(Resource):
>>>>     isLeaf = True
>>>>     def render_GET(self, request):
>>>>         return data
>>>>
>>>> root = Resource()
>>>> root.putChild('test', TestPage())
>>>> reactor.listenTCP(8880, Site(root))
>>>> reactor.run()
>>>>
>>>> Now, when I run N clients simultaneously from a different host I see
>>>> that the server's memory consumption increases by N*10 MB. I can't
>>>> reproduce this example when running the clients from the same host as
>>>> the server; the test goes so fast that I can't gather any useful
>>>> data.
>>>>
>>>> I run the test using the following httperf command on a different
>>>> host and looking at the Gnome system monitor in the server (top will
>>>> do as well).
>>>>
>>>> httperf --server 192.168.1.10 --port 8880 --uri /test \
>>>>           --rate 10 --num-conn 10
>>>>
>>>> When the server is idle memory consumption is 17.1 MB, but during the
>>>> test it jumps to 117.2 MB. My questions are then:
>>>>
>>>> 1. Given that 'data' is a global variable, and effectively read-only,
>>>> why is it replicated for each request? And who is replicating it?
>>>
>>> It's copied as part of the process of writing it to the socket.  You
>>> can't write 10MB at once, and you can't slice a string (to throw away
>>> the part that you did manage to write) without making a copy of part
>>> of it.
>>>
>>>> 2. What would be the proper way to rewrite this example so that there
>>>> is one and only one 'data' structure at any time?
>>>
>>> Split data up into ~32kB-64kB chunks and write them to the request
>>> individually.  Then each chunk can just be dropped with no copying.
>>>
>>> Jean-Paul
>>>
>>> _______________________________________________
>>> Twisted-web mailing list
>>> Twisted-web at twistedmatrix.com
>>> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web
>>
>> Thanks Jean-Paul,
>>
>> Here is my modified example, unfortunately with the same bad results
>> regarding memory consumption.
>>
>> import os
>> from twisted.internet import reactor
>> from twisted.web.server import Site
>> from twisted.web.resource import Resource
>>
>> CHUNK_SIZE = 32*1024
>> data = os.urandom(10*1024*1024)
>>
>> class TestPage(Resource):
>>     isLeaf = True
>>
>>     def render_GET(self, request):
>>         s = 0
>>         for chunk in iter(lambda: data[s:s+CHUNK_SIZE], ''):
>>             request.write(chunk)
>>             s = s + CHUNK_SIZE
>
> You've just moved the copying-by-slicing out of the transport and into
> your Resource. :)  You need to do all that copying/slicing at the
> beginning, where it only needs to happen once, so that every render can
> share those allocated strings.
>
> Jean-Paul

OK, I guess I'm being slow :-( Here's another version, same results.

import os
from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.resource import Resource

CHUNK_SIZE = 32*1024
data = os.urandom(10*1024*1024)
chunks = []

def make_chunks():
    s = 0
    for chunk in iter(lambda: data[s:s+CHUNK_SIZE], ''):
        chunks.append(chunk)
        s = s + CHUNK_SIZE

class TestPage(Resource):
    isLeaf = True

    def render_GET(self, request):
        for chunk in chunks:
            request.write(chunk)

make_chunks()
root = Resource()
root.putChild('test', TestPage())
reactor.listenTCP(8880, Site(root))
reactor.run()

I also tried preparing the chunks in a TestPage.__init__()
implementation, with the same results. So, where exactly do I have to
put the make_chunks() steps?
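
The difference Jean-Paul describes between slicing on every request and slicing once can be checked outside Twisted. The following is a minimal sketch, assuming Python 3 bytes objects on CPython; the names slice_per_request, shared_chunks and CHUNKS are hypothetical and not part of the code in this thread, and the payload is cut to 1 MiB only to keep the demo quick.

```python
import os

CHUNK_SIZE = 32 * 1024
data = os.urandom(1024 * 1024)  # 1 MiB here; the thread uses 10 MiB


def slice_per_request(payload, size=CHUNK_SIZE):
    """Slice on every call: each call allocates a fresh set of chunk objects."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


# Slice once, up front, so later callers can share the same objects.
CHUNKS = slice_per_request(data)


def shared_chunks():
    """Hand out the pre-built chunks; no new bytes objects are created."""
    return CHUNKS


# Two simulated "requests" that slice for themselves get distinct objects...
first, second = slice_per_request(data), slice_per_request(data)
fresh_copies = sum(a is not b for a, b in zip(first, second))

# ...while two "requests" using the pre-built list see the very same objects.
shared = sum(a is b for a, b in zip(shared_chunks(), shared_chunks()))

print(fresh_copies, shared)
```

This only shows that pre-sliced chunks are shared between callers. It says nothing about how the transport buffers data that has been passed to request.write() but not yet accepted by the socket, so it does not by itself explain the memory growth reported above.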

Thanks,

-- 
Pedro