Opened 13 years ago
Closed 9 years ago
#3803 defect closed duplicate (duplicate)
fix many LineReceiver slowdowns and bugs
Reported by: | ivank | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | core | Keywords: | |
Cc: | zooko@… | Branch: | |
Author: |
Description
LineReceiver can be slowed down by various input in three different ways.
The attached implementation:
does not rely on string concatenation optimizations available in CPython 2.5+ and other implementations.
stays fast when `data' contains many lines delivered at once, unless there is excess toggling between line mode and raw mode, with a large `extra' being passed to setLineMode each time.
(note: many lines at once will be slow if Python is missing collections.dequeue, available since CPython 2.4)
searches for the delimiter only in recently-received data, preventing unnecessary searching of the delimiter in a long buffer.
The attached implementation also fixes #3277 and #3050 and probably fixes #3353 because it no longer erroneously rejects lines under the MAX_LENGTH.
A related ticket is #2611 which is focused on fixing O(N2) behavior in all protocols in protocols.basic
Attachments (2)
Change History (16)
Changed 13 years ago by
Attachment: | better-LineReceiver-1.diff added |
---|
comment:1 Changed 13 years ago by
Keywords: | review added |
---|
comment:2 Changed 13 years ago by
Owner: | Glyph deleted |
---|
Thanks for the patch!
Please leave reviews unassigned though, so the first person who has some free time can review it.
Changed 13 years ago by
Attachment: | better-LineReceiver-2.diff added |
---|
changes over #1: remove cStringIO nonsense, add test for delimiter change while in raw mode
comment:3 Changed 13 years ago by
Sorry for using "over #1", I meant "over better-LineReceiver-1.diff"
comment:4 Changed 13 years ago by
Should delimiter be a property so that it can raise an exception if it is changed at a bad time (while in the while loop)?
comment:5 Changed 13 years ago by
I was wrong, this does not fix #3353, because that concerns LineOnlyReceiver as well, where it incorrectly does:
len(self._buffer) > self.MAX_LENGTH
comment:6 Changed 13 years ago by
The attached implementation is still suboptimal, because if data with many delimiters arrives, and the data has lines that switch it into raw mode, the implementation will create and re-join far more string objects than necessary.
It also does unnecessary work converting data into lines, when the protocol is paused. This is probably a minor deficiency.
LineOnlyReceiver should probably have its MAX_LENGTH logic fixed at the same time, if it is fixed for LineReceiver in this branch/ticket.
Anything that hits MAX_LENGTH needs to be tested precisely for both protocols.
The docstrings need to stop referring to tickets and bugs.
Are there any other obvious problems that I have missed?
comment:7 Changed 13 years ago by
Keywords: | review removed |
---|
I think my design is bad because of my first point in the comment above.
It should probably be rewritten to store incoming data into some kind of a buffer object where it is possible to left-truncate data. Then, the implementation can efficiently find lines in the buffer and remove them as they are lineReceived
. I don't know how to do this, but the details of this buffering shouldn't be in LineReceiver
.
So, I'm removing this from review, because this has enough problems as-is.
comment:8 follow-up: 9 Changed 13 years ago by
I think there's a way to do this without changing much of the original LineReceiver
code:
Use a linked list of strings as a buffer (maybe reuse some of the pure-Python StringIO
?). This buffer can be truncated from the left in O(1) time. To stop the 'dribble in bytes slowly' attack, optimize the buffer's x in buffer
operation by remembering where there are no matches for x
.
By putting all the optimization in a special buffer object, LineReceiver
can be kept simple.
Another advantage is that this code will operate more correctly (compared to better-LineReceiver-2.diff), because line extraction will be as lazy as possible (no early split step).
comment:9 Changed 12 years ago by
Cc: | zooko@… added |
---|
Replying to ivank:
Use a linked list of strings as a buffer
I've written things like this a few times, and so the most recent time I went ahead and encapsulated it into a class with unit tests and benchmarks:
http://tahoe-lafs.org/trac/stringchain
Then I used it to successfully fix a performance problem in foolscap while not making too much change to the surrounding foolscap parsing logic: http://foolscap.lothar.com/trac/ticket/149
I think it might be useful here.
comment:10 follow-up: 12 Changed 12 years ago by
Correct me if I'm wrong, but I don't think Twisted can use GPL'd code (as the above stringchain class is). A similar approach might be useful though.
comment:11 Changed 12 years ago by
I suppose I should have said: "I don't think Twisted can _include_ GPL'd code"
comment:12 Changed 12 years ago by
Replying to khorn:
Correct me if I'm wrong, but I don't think Twisted can use GPL'd code (as the above stringchain class is). A similar approach might be useful though.
Indeed not. Good catch.
comment:13 Changed 11 years ago by
comment:14 Changed 9 years ago by
Resolution: | → duplicate |
---|---|
Status: | new → closed |
I've started the work in #6357.
"attached implementation"