Ticket #401 (new defect )

Opened 5 years ago

Last modified 4 years ago

too many requests handled by banana in one batch

Reported by: bruce Assigned to: warner
Type: defect Priority: high
Milestone: Component: pb
Keywords: Cc: glyph, exarkun, itamarst, warner, PenguinOfDoom, bruce
Branch: Author:
Launchpad Bug:

Attachments

Change History

  2003-11-17 09:11:45+00:00 changed by bruce

I have a twisted process which is part of our cluster. It communicates entirely
over PB.  Randomly, which can be once a day, multiple times a day, once a week,
etc, it will:
   * Stop responding to network traffic
   * Chew 100% CPU
   * Process PB requests with log output getting written to file. The log
     output appears to be slow, but this could be due to the PB requests
     being executed slowly, the file I/O happening slowly, or the general
     100% CPU usage.  I assume nothing about the speed as of yet.
   * After several minutes, which can range from 5 minutes to over 30, it
     completes processing PB requests and finally issues a select() call.
I know that while it is using 100% CPU it does *not* issue a select() call
because I have logged over 7 minutes of it running via strace.
Typical strace output looks like:
munmap(0x415e7000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x41255000
munmap(0x40ec1000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x40ec1000
munmap(0x41255000, 3751936)             = 0
gettimeofday({1068777839, 650421}, NULL) = 0
write(3, "2003/11/13 19:43:59 MST [Broker,"..., 72) = 72
brk(0x8db5000)                          = 0x8db5000
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x41255000
munmap(0x40ec1000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x40ec1000
munmap(0x41255000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x41255000
munmap(0x40ec1000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x40ec1000
munmap(0x41255000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x41255000
munmap(0x40ec1000, 3751936)             = 0
gettimeofday({1068777839, 764617}, NULL) = 0
write(3, "2003/11/13 19:43:59 MST [Broker,"..., 72) = 72
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x40ec1000
munmap(0x41255000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x41255000
munmap(0x40ec1000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x40ec1000
munmap(0x41255000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x41255000
munmap(0x40ec1000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x40ec1000
munmap(0x41255000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x41255000
munmap(0x40ec1000, 3751936)             = 0
old_mmap(NULL, 3751936, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x40ec1000
munmap(0x41255000, 3751936)             = 0
I'm running Python 2.2.1.  I'm not going to upgrade to 2.3.2 until I get this
resolved because I need to be sure that it is resolve.  I did rebuild my Python
install to have debug data so that I could get the Python stack trace from gdb.
I did that today and this was one of the stack traces, with lots of interspersed
gdb goo:
(gdb) up 3
#3  0x0807400d in eval_frame (f=0x867779c) at Python/ceval.c:973
973                                     x = PyNumber_Add(v, w);
(gdb) pyframe
0x829dc94: "/home/bruce/nobackup/twisted/Twisted/twisted/internet/abstract.py"
0x829dcf4:       "write"
$1 = 159
(gdb) up 3
#6  0x08075e31 in eval_frame (f=0x8679e14) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x82f9914: "/home/bruce/nobackup/twisted/Twisted/twisted/spread/banana.py"
0x82f9974:       "sendEncoded"
$2 = 316
(gdb) up 3
#9  0x08075e31 in eval_frame (f=0x8665e3c) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x83047d4:       "/home/bruce/nobackup/twisted/Twisted/twisted/spread/pb.py"
0x830482c:       "sendCall"
$3 = 532
(gdb) up 3
#12 0x08075e31 in eval_frame (f=0x8678de4) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x8308edc:       "/home/bruce/nobackup/twisted/Twisted/twisted/spread/pb.py"
0x8308f34:       "_sendMessage"
$4 = 818
(gdb) up 3
#15 0x08075e31 in eval_frame (f=0x86788b4) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x8300d6c:       "/home/bruce/nobackup/twisted/Twisted/twisted/spread/pb.py"
0x8300dc4:       "callRemote"
$5 = 338
(gdb) up 3
#18 0x080a6730 in PyObject_Call (func=0x8327dbc, arg=0x864fed4, kw=0x0) at
Objects/abstract.c:1684
1684                    PyObject *result = (*call)(func, arg, kw);
(gdb) up
#19 0x080ad04f in instancemethod_call (func=0x8327dbc, arg=0x83b7804, kw=0x0) at
Objects/classobject.c:2276
2276            result = PyObject_Call((PyObject *)func, arg, kw);
(gdb) down
#18 0x080a6730 in PyObject_Call (func=0x8327dbc, arg=0x864fed4, kw=0x0) at
Objects/abstract.c:1684
1684                    PyObject *result = (*call)(func, arg, kw);
(gdb) down
#17 0x080b7b69 in function_call (func=0x8327dbc, arg=0x864fed4, kw=0x0) at
Objects/funcobject.c:374
374             result = PyEval_EvalCodeEx(
(gdb) up
#18 0x080a6730 in PyObject_Call (func=0x8327dbc, arg=0x864fed4, kw=0x0) at
Objects/abstract.c:1684
1684                    PyObject *result = (*call)(func, arg, kw);
(gdb) up
#19 0x080ad04f in instancemethod_call (func=0x8327dbc, arg=0x83b7804, kw=0x0) at
Objects/classobject.c:2276
2276            result = PyObject_Call((PyObject *)func, arg, kw);
(gdb) up
#20 0x080a6730 in PyObject_Call (func=0x866223c, arg=0x83b7804, kw=0x0) at
Objects/abstract.c:1684
1684                    PyObject *result = (*call)(func, arg, kw);
(gdb) up
#21 0x0807754f in PyEval_CallObjectWithKeywords (func=0x866223c, arg=0x83b7804,
kw=0x0) at Python/ceval.c:3049
3049            result = PyObject_Call(func, arg, kw);
(gdb) up
#22 0x080c4f81 in builtin_apply (self=0x0, args=0x867a9a4) at
Python/bltinmodule.c:95
95              retval = PyEval_CallObjectWithKeywords(func, alist, kwdict);
(gdb) up
#23 0x080c35d4 in PyCFunction_Call (func=0x80ff9d8, arg=0x867a9a4, kw=0x0) at
Objects/methodobject.c:80
80                      return (*meth)(self, arg);
(gdb) up
#24 0x08075d93 in eval_frame (f=0x869ffc4) at Python/ceval.c:2004
2004                                        x = PyCFunction_Call(func, callargs,
NULL);
(gdb) pyframe
0x81ee83c:       "/var/lotr/lotr/server/sigil/services/lobby.py"
0x81ee88c:       "callRemote"
$6 = 1558
(gdb) up 3
#27 0x08075e31 in eval_frame (f=0x86a217c) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x81e158c:       "/var/lotr/lotr/server/sigil/services/lobby.py"
0x81e15dc:       "memberLeft"
$7 = 1047
(gdb) up 3
#30 0x08075e31 in eval_frame (f=0x86a3fb4) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x836cb5c:       "/var/lotr/lotr/server/sigil/services/lobby.py"
0x836cbac:       "removeMember"
$8 = 2080
(gdb) up 3
#33 0x08075e31 in eval_frame (f=0x86d428c) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x81dce2c:       "/var/lotr/lotr/server/sigil/services/lobby.py"
0x81dce7c:       "removeMember"
$9 = 1839
(gdb) up 3
#36 0x08075e31 in eval_frame (f=0x869834c) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x81decec:       "/var/lotr/lotr/server/sigil/services/lobby.py"
0x81ded3c:       "leaveGroup"
$10 = 907
(gdb) up 3
#39 0x08075e31 in eval_frame (f=0x866773c) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x81dd764:       "/var/lotr/lotr/server/sigil/services/lobby.py"
0x81dd7b4:       "detached"
$11 = 813
(gdb) up 3
#42 0x08075e31 in eval_frame (f=0x867048c) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x81eda94:       "/var/lotr/lotr/server/sigil/services/lobby.py"
0x81edae4:       "removeAccount"
$12 = 1524
(gdb) up 3
#45 0x080a6730 in PyObject_Call (func=0x83fe1b4, arg=0x86787ec, kw=0x8b06f3c) at
Objects/abstract.c:1684
1684                    PyObject *result = (*call)(func, arg, kw);
(gdb) up
#46 0x08077bb8 in ext_do_call (func=0x83fe1b4, pp_stack=0xbfffcf44, flags=3,
na=1, nk=0) at Python/ceval.c:3317
3317            result = PyObject_Call(func, callargs, kwdict);
(gdb) up
#47 0x08075f5b in eval_frame (f=0x86b5d3c) at Python/ceval.c:2072
2072                        x = ext_do_call(func, &stack_pointer, flags, na, nk);
(gdb) pyframe
0x82fee44:       "/home/bruce/nobackup/twisted/Twisted/twisted/spread/pb.py"
0x82fee9c:       "perspectiveMessageReceived"
$13 = 234
(gdb) up 3
#50 0x08075e31 in eval_frame (f=0x86b492c) at Python/ceval.c:2024
2024                                        x = fast_function(func, &stack_pointer,
(gdb) pyframe
0x83098bc:       "/home/bruce/nobackup/twisted/Twisted/twisted/spread/pb.py"
0x8309914:       "_recvMessage"
$14 = 836
So, we can see that PB is executing messages.  The exact message that is being
executed varies. I cont'd and ^C'd multiple times.  Each time, it broke in the
memcpy() routine while writing data into self.dataBuffer.
My version of Twisted is a bit old, it is the tag that was called
1.0.8.something.  I've been trying to avoid the pain of dealing with API
changes, so I haven't yet updated any further than that.  If you need
translations of the line numbers to individual lines of source, let me know.
The self.dataBuffer object looks like this:
(gdb) print *((PyStringObject*)v)
$19 = {ob_refcnt = 2, ob_type = 0x80df160, ob_size = 2759220, ob_shash = -1,
ob_sinterned = 0x0, ob_sval = "\a"}
So, we can see that it is about 2.7 megs in size.
The data that was being appended was this:
(gdb) print *((PyStringObject*)w)
$20 = {ob_refcnt = 4, ob_type = 0x80df160, ob_size = 71, ob_shash = -1,
ob_sinterned = 0x0, ob_sval = "\a"}
The actual contents of the buffers were:
(gdb) x/s ((PyStringObject*)v)->ob_sval
0x40d9b01c:
"\a\200\a\202message\021>*\201\001\201\016\202notifyGroupAdd\001\201\004\200\005\202tuple'G\201\002\201\n\200\n\202dictionary\002\200\t\202subGroups\001\200\004\202list\002\200\004\202namez%\a\201\002\200\017\202defaultMetadata\001\200\n\202dictionary\002\200\027\202childrenDefaultMetadata\001\200\n\202dictionary\002\200\031\202childrenMandatoryMetadata\001"...
(gdb) x/s ((PyStringObject*)2)->ob_sval
0x16:    <Address 0x16 out of bounds>
(gdb) x/s ((PyStringObject*)w)->ob_sval
0x8b412d4:
"\a\200\a\202message\006r+\201\001\201\023\202notifyStatusChanged\001\201\004\200\005\202tuplec+\201Y\020\201\005\201\001\200\n\202dictionary"
So, we can see that on this particular occasion that I printed 'w' (data), it
was sending a 'notifyStatusChanged' message.  self.dataBuffer started with a PB
message for 'notifyGroupAdd'.

  2003-11-17 09:17:30+00:00 changed by bruce

My current theory is this:
1) The server that issues requests to the failing server dumps a boatload of PB
requests on it at once.
2) It makes a call to cBanana.dataReceived() and processes them until they've
all been executed.
3) Each command is appending to the transport's dataBuffer.  This dataBuffer is
a Python string which has abysmal behavior for lots of small appends that total
up to something large.
So, there is a factor that causes this problem, which is not yet identified
which dumps a boatload of PB requests in a single, very large shot.
And then there is the seemingly/potentially disastrous behavior of the
dataBuffer as it expands.  This helps push us way over a Quadratic Cliff,
leading to the 7+ minutes of processing time.
There may be other factors as well either that are part of the root cause or
things that help push the server over the Quadratic Cliff while it is trying to
recover from the root cause.

  2003-11-17 10:10:17+00:00 changed by bruce

Discussion #twisted leads to 3 conclusions as I understand it:
   * Need for iovec to avoid the string concatentation behavior.
   * Need for rate-limiting the # of expressionReceived's per
     dataReceived/reactor.iterate that Banana will process.
   * Need to find out what causes those boatloads of PB requests
     to land at once, or if indeed that is what is happening.

  2003-11-19 05:51:32+00:00 changed by itamarst

(Why the hell wasn't I getting copies of nosy emails?)
I suggest issues #2 and #3 be added as issues for (or just noted to) warner.

  2004-05-25 15:16:20+00:00 changed by glyph

Since it falls roughly into PB...

  2004-06-08 20:29:56+00:00 changed by jknight

Is this not actually (primarily) a bug in t.i.abstract? Surely dataBuffer should either be a
list of strings or a cStringIO?

  2004-09-11 04:19:11+00:00 changed by warner

Newpb should make it easier to handle this.
On the sending side, Banana.produce() is used to add things to the output queue.
It loops until a) it runs out of objects to send, b) some object tells it (with
a Deferred) that it needs a minute before it is ready to provide the next token,
or c) the .paused flag is set. The intention was to make Banana into a proper
Producer, but I haven't written any tests to verify that it can be used that way.
That means it ought to be possible to have the transport pause() the send-side
of Banana when the transmit buffer becomes full, rather than buffer-to-RAM a
bazillion tokens. This would at least limit the size of the problem.
It is also possible to have Banana notify the transport when it has finished
doing writes() for a while. This would allow us to make an enhanced transport
which implements write() by adding the data chunks to a list instead of doing
string concatenates, then implement flush() by using iovec. This needs some work
on the transport/protocol relationship, but Banana is designed to take advantage
of it when it becomes available.
On the receive side, we need to implement some kind of .pauseReceiver method
which does .stopReading(). .resumeReceiver should re-fire dataReceived and do
.startReading if we don't get re-paused by the time we finish handling the
current backlog. I don't think this will be too hard to implement. Some other
level can then decide when to pause/resume the protocol (perhaps a ThrottledBroker).
As for the cause, you could send a lot of little PB calls (say a few thousand of
them) fairly quickly. You'd fill the TCP transmit buffer in short order, then
the rest would get buffered to RAM. Once they're in RAM, TCP can fling them out
in huge batches, which means the receiving end will receive them in huge
batches. A single dataReceived call with like 4k of data could result in
hundreds of PB method calls, starving the reactor.
A test case for this would be awesome, but of course it would be really
difficult to write. Something which doesn't use the network (just flings PB
requests across a loopback transport) might work, it could count the number of
callLater(0) calls that got run while the requests were being sent and make sure
it's more than one.

  2004-09-11 04:21:54+00:00 changed by itamarst

My long term goal is to fix this in a way that lets me access this from the
C++/Pyrex API I have as well. However supporting both Pyrex and C++ at once in a
nice way is problematic.

  2004-10-26 00:45:51+00:00 changed by jknight

t.i.abstract doesn't have such horrible performance any more, so removing that from title. PB issue
may still be present, though, I have no idea.
Note: See TracTickets for help on using tickets.