[Twisted-web] xmlrpc resource file descriptor leak

Werner Thie wthie at thiengineering.ch
Thu Jul 3 02:59:48 EDT 2008


Hi

Having suffered from the same problem a year or so ago, I dug into this 
by following the call chain, and cgi.py is the source of the 'too many 
fd' problem.

For an explanation read the comment starting at line 417 in cgi.py which 
reads:

     The class is subclassable, mostly for the purpose of overriding
     the make_file() method, which is called internally to come up with
     a file open for reading and writing.  This makes it possible to
     override the default choice of storing all files in a temporary
     directory and unlinking them as soon as they have been opened.

The trick used here is that an unlinked file stays usable for as long as 
its fd is open, so the fd hangs around for some time after the unlink. 
Those fds are only released once the file objects are closed or garbage 
collected, which takes a while, but they will be released eventually. 
The number of fds needed per process when using cgi.py (which twisted 
uses) therefore depends on the burst rate of requests, because every 
request has a FieldStorage by default and therefore an fd.
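
As a rough illustration of the unlink-while-open behaviour described 
above, here is a minimal sketch, assuming a POSIX system (Windows refuses 
to unlink an open file). The function name unlinked_file_demo is made up 
for this example; it is not taken from cgi.py or twisted.

```python
import os
import tempfile


def unlinked_file_demo():
    """Show that an unlinked temp file keeps its fd until it is closed."""
    # Create a temp file and immediately remove its directory entry.
    fd, path = tempfile.mkstemp()
    os.unlink(path)
    still_on_disk = os.path.exists(path)

    # The fd is still fully usable although the name is gone.
    os.write(fd, b"payload")
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 7)

    # Only closing the fd gives the descriptor back to the process.
    os.close(fd)
    try:
        os.fstat(fd)
        released = False
    except OSError:
        released = True
    return still_on_disk, data, released


if __name__ == "__main__":
    print(unlinked_file_demo())
```

Until the close happens, every such file counts against the per-process 
fd limit, which is why a burst of requests can exhaust it.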

The only solution is to raise the number of allowed fds per process and 
per machine; how to do that depends on the OS:

MS Windows: if CRT is used, hardcoded to 2048 else limited by mem

On *nixes use 'ulimit -a' or 'sysctl -a | grep files' to get a printout 
of the system value, usually something along the lines of 
kern.maxfiles=10000

Per machine:
/etc/sysctl.conf contains the values for the kernel preset when booting.

Per process:
/etc/login.conf usually contains a variable called openfiles-max

On my OpenBSD production system (avg load 30 req/sec) values are

kern.maxfiles=10000

openfiles-max=8192
openfiles-cur=8192

which allows smooth operation of two twisted processes on a dual core 
machine.
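
The per-process limit can also be inspected from inside the Python 
process. The following is only a sketch, assuming a Unix system where the 
standard resource module is available; the helper name 
raise_nofile_soft_limit is hypothetical. It can only lift the soft limit 
up to the hard limit that login.conf or ulimit already grants, it cannot 
go beyond that.

```python
import resource


def raise_nofile_soft_limit():
    """Raise the soft fd limit to the hard limit and return (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Skip the attempt if there is nothing to raise or the hard limit
    # is reported as unlimited.
    if soft != hard and hard != resource.RLIM_INFINITY:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse the request; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print(raise_nofile_soft_limit())
```

Calling this once at startup, before the reactor runs, would show which 
limit the process really got.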

HTH, Werner

FYI the output of top:

load averages:  0.34,  0.31,  0.31    08:55:01
31 processes:  1 running, 29 idle, 1 on processor
CPU0 states: 10.8% user,  0.0% nice,  2.6% system,  0.0% interrupt, 86.6% idle
CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Memory: Real: 325M/608M act/tot  Free: 2913M  Swap: 0K/4096M used/tot

PID  UNAME PRI NICE  SIZE   RES STATE    WAIT  TIME   CPU    COMMAND
4562 www   2   0     125M   97M sleep/0  poll  242:39 11.82% python2.5
6506 www   2   0     205M  181M run/0    -     34:20   2.00% python2.5


Phil Mayers wrote:
> This is a bit vague, and I wanted to get some feedback before I submit a 
> ticket.
> 
> We have a long-running twisted / nevow process that basically has:
> 
>  root
>   \- RPC2 - a twisted.web.xmlrpc.XMLRPC sub-class
>   \- ui   - nevow pages
> 
> The thing hung up over the weekend with "too many open file descriptors" 
> and before I killed it I did an "lsof"; lots of the files were:
> 
> python25 20163  nsg   31u   REG              253,0      370   3276854 
> /tmp/tmp5QJivu (deleted)
> 
> ...and "cat /proc/20163/fd/31" shows:
> 
> <?xml version='1.0'?>
> <methodCall>
> <methodName>classify_maclist</methodName>
> <params>
> <param>
> <value><string>HORPROD</string></value>
> </param>
> <param>
> <value><array><data>
> <value><string>xxxx</string></value>
> </data></array></value>
> </param>
> <param>
> <value><int>-1</int></value>
> </param>
> <param>
> <value><int>5</int></value>
> </param>
> </params>
> </methodCall>
> 
> ...which is an XMLRPC call from a Zope server on another machine to this 
> process. I presume the t.w.http.Request content is getting written to a 
> tempfile, but I can't understand why - the Content-Length is tiny (<400 
> bytes).
> 
> I can't seem to reproduce this in a sample application though; does 
> anyone have any ideas how I can narrow down the problem?
> 
> _______________________________________________
> Twisted-web mailing list
> Twisted-web at twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web


