<div dir="ltr"><div> - Python 2.7.3</div><div> - [twisted, version 13.1.0]</div><div> - xen-domU</div><div><br></div><div>`atop` shows that `carbon-relay` is consuming 80-90% USRCPU. Output from `strace`:</div><div><br></div><div>
accept(7, {sa_family=AF_INET, sin_port=htons(60649), sin_addr=inet_addr("192.237.222.81")}, [16]) = 257</div><div> accept(7, {sa_family=AF_INET, sin_port=htons(51564), sin_addr=inet_addr("166.78.1.48")}, [16]) = 257</div>
<div> accept(7, 0x7ffff4679550, [16]) = -1 EAGAIN (Resource temporarily unavailable)</div><div> accept(7, {sa_family=AF_INET, sin_port=htons(33654), sin_addr=inet_addr("198.61.194.248")}, [16]) = 257</div>
<div> accept(7, {sa_family=AF_INET, sin_port=htons(50037), sin_addr=inet_addr("166.78.181.204")}, [16]) = 257</div><div> accept(7, 0x7ffff4679550, [16]) = -1 EAGAIN (Resource temporarily unavailable)</div>
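<div><br></div><div>For reference, the accept/EAGAIN pattern in the trace above can be reproduced with a plain non-blocking socket. This is only a minimal sketch of the general pattern, not carbon's or Twisted's actual code; the local listener and the name `drain_accept_queue` are made up for illustration. Once the backlog is empty, `accept()` on a non-blocking socket fails with EAGAIN, which Python 3 surfaces as `BlockingIOError`:</div><div><br></div>
<pre>
```python
import errno
import select
import socket


def drain_accept_queue(listener):
    """Accept connections until accept() reports EAGAIN; return them."""
    accepted = []
    while True:
        try:
            conn, _peer = listener.accept()
        except BlockingIOError as exc:
            # Empty backlog on a non-blocking socket: this corresponds to the
            # "-1 EAGAIN (Resource temporarily unavailable)" lines in strace.
            if exc.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                raise
            return accepted
        accepted.append(conn)


# Hypothetical local listener, used only for illustration.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(16)
listener.setblocking(False)

client = socket.create_connection(listener.getsockname())
select.select([listener], [], [], 5.0)  # wait until the connection is queued
connections = drain_accept_queue(listener)
print(len(connections), "accepted, then EAGAIN")

for sock in connections + [client, listener]:
    sock.close()
```
</pre>
<div>So an EAGAIN after a run of successful accepts is not an error by itself; it only marks the point where the pending-connection queue was empty.</div>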
<div><br></div><div>The strange thing is that even after restarting the service, `strace` shows it stuck on fd 7 every time. Does that mean this fd is not being cleaned up properly?</div><div><br></div><div>I have increased the maximum number of open files:</div>
<div><br></div><div>**/proc/2891/limits**</div><div><br></div><pre>
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             15834                15834                processes
Max open files            16384                16384                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       15834                15834                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
</pre>
<div><br></div><div>After that, CPU usage dropped to about 50%.</div><div><br></div><div>My problem looks similar to this [thread](<a href="http://twistedmatrix.com/pipermail/twisted-python/2008-September/018361.html">http://twistedmatrix.com/pipermail/twisted-python/2008-September/018361.html</a>), but since we have only a few sockets in the TIME_WAIT state, I don't think enabling `tw_recycle` would help. As for `tcp_syncookies`, I don't see any related messages in the syslog.</div>
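<div><br></div><div>One way to check the TIME_WAIT count mentioned above is to tally the state column of `/proc/net/tcp`. The following is a rough sketch that assumes a Linux host and IPv4 sockets only; the function names are made up for illustration, and the state table lists just a handful of the kernel's state codes:</div><div><br></div>
<pre>
```python
from collections import Counter

# Hex codes from the "st" column of /proc/net/tcp on Linux (partial list).
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        try:
            code = fields[3]  # columns: sl, local_address, rem_address, st
        except IndexError:
            continue  # blank or truncated line
        # Unknown codes are counted under their raw hex value.
        counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_tcp_states(path="/proc/net/tcp"):
    """Read the live table; only meaningful on a Linux host."""
    with open(path) as handle:
        return count_tcp_states(handle.read())
```
</pre>
<div>On the relay host, `read_tcp_states()["TIME_WAIT"]` would then give the number of IPv4 sockets in that state (IPv6 sockets are listed separately in `/proc/net/tcp6`).</div>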
<div><br></div><div>This is what I get when trying to start `carbon-relay` in debug mode:</div><div><br></div><div> 26/11/2013 02:22:14 :: [listener] MetricPickleReceiver connection with 50.56.249.127:48772 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.</div>
<div> 26/11/2013 02:22:14 :: [listener] MetricPickleReceiver connection with 198.101.241.101:50672 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.</div>
<div> 26/11/2013 02:22:14 :: [listener] MetricPickleReceiver connection with 166.78.2.167:43346 lost: Connection to the other side was lost in a non-clean fashion: Connection lost.</div>
<div><br></div><div>This is from `twisted`:</div><div><br></div><pre>
class ConnectionLost(ConnectionClosed):
    """Connection to the other side was lost in a non-clean fashion"""

    def __str__(self):
        s = self.__doc__.strip().splitlines()[0]
        if self.args:
            s = '%s: %s' % (s, ' '.join(self.args))
        s = '%s.' % s
        return s
</pre><div><br></div><div>I have also tried to [debug with `gdb`](<a href="https://wiki.python.org/moin/DebuggingWithGdb">https://wiki.python.org/moin/DebuggingWithGdb</a>), but `pystack` returns nothing.</div>
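<div><br></div><div>Going back to the `ConnectionLost` class quoted above: as far as I can tell, the log text is just the docstring plus the exception arguments. A standalone sketch that reproduces the message without importing Twisted follows; the `ConnectionClosed` base class here is only a stand-in for the real one in `twisted.internet.error`:</div><div><br></div>
<pre>
```python
class ConnectionClosed(Exception):
    """Stand-in for twisted.internet.error.ConnectionClosed (assumption)."""


class ConnectionLost(ConnectionClosed):
    """Connection to the other side was lost in a non-clean fashion"""

    def __str__(self):
        # First line of the docstring, then the joined args, then a period.
        s = self.__doc__.strip().splitlines()[0]
        if self.args:
            s = '%s: %s' % (s, ' '.join(self.args))
        s = '%s.' % s
        return s


message = str(ConnectionLost("Connection lost"))
print(message)
```
</pre>
<div>With the single argument "Connection lost", this gives the same text that follows "lost:" in the debug log lines, so the message only says that the peer connection ended without a clean shutdown; it does not say why.</div>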
</div>