Opened 9 years ago

Closed 5 years ago

#3803 defect closed duplicate (duplicate)

fix many LineReceiver slowdowns and bugs

Reported by: ivank Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: zooko@… Branch:
Author:

Description

LineReceiver can be slowed down by various input in three different ways.

The attached implementation:

does not rely on string concatenation optimizations available in CPython 2.5+ and other implementations.

stays fast when `data' contains many lines delivered at once, unless there is excess toggling between line mode and raw mode, with a large `extra' being passed to setLineMode each time.

(note: many lines at once will be slow if Python is missing collections.dequeue, available since CPython 2.4)

searches for the delimiter only in recently-received data, preventing unnecessary searching of the delimiter in a long buffer.

The attached implementation also fixes #3277 and #3050 and probably fixes #3353 because it no longer erroneously rejects lines under the MAX_LENGTH.

A related ticket is #2611 which is focused on fixing O(N2) behavior in all protocols in protocols.basic

Attachments (2)

better-LineReceiver-1.diff (14.1 KB) - added by ivank 9 years ago.
"attached implementation"
better-LineReceiver-2.diff (14.6 KB) - added by ivank 9 years ago.
changes over #1: remove cStringIO nonsense, add test for delimiter change while in raw mode

Download all attachments as: .zip

Change History (16)

Changed 9 years ago by ivank

Attachment: better-LineReceiver-1.diff added

"attached implementation"

comment:1 Changed 9 years ago by ivank

Keywords: review added

comment:2 Changed 9 years ago by Glyph

Owner: Glyph deleted

Thanks for the patch!

Please leave reviews unassigned though, so the first person who has some free time can review it.

Changed 9 years ago by ivank

Attachment: better-LineReceiver-2.diff added

changes over #1: remove cStringIO nonsense, add test for delimiter change while in raw mode

comment:3 Changed 9 years ago by ivank

Sorry for using "over #1", I meant "over better-LineReceiver-1.diff"

comment:4 Changed 9 years ago by ivank

Should delimiter be a property so that it can raise an exception if it is changed at a bad time (while in the while loop)?

comment:5 Changed 9 years ago by ivank

I was wrong, this does not fix #3353, because that concerns LineOnlyReceiver as well, where it incorrectly does:

len(self._buffer) > self.MAX_LENGTH

comment:6 Changed 9 years ago by ivank

The attached implementation is still suboptimal, because if data with many delimiters arrives, and the data has lines that switch it into raw mode, the implementation will create and re-join far more string objects than necessary.

It also does unnecessary work converting data into lines, when the protocol is paused. This is probably a minor deficiency.

LineOnlyReceiver should probably have its MAX_LENGTH logic fixed at the same time, if it is fixed for LineReceiver in this branch/ticket.

Anything that hits MAX_LENGTH needs to be tested precisely for both protocols.

The docstrings need to stop referring to tickets and bugs.

Are there any other obvious problems that I have missed?

comment:7 Changed 9 years ago by ivank

Keywords: review removed

I think my design is bad because of my first point in the comment above.

It should probably be rewritten to store incoming data into some kind of a buffer object where it is possible to left-truncate data. Then, the implementation can efficiently find lines in the buffer and remove them as they are lineReceived. I don't know how to do this, but the details of this buffering shouldn't be in LineReceiver.

So, I'm removing this from review, because this has enough problems as-is.

comment:8 Changed 8 years ago by ivank

I think there's a way to do this without changing much of the original LineReceiver code:

Use a linked list of strings as a buffer (maybe reuse some of the pure-Python StringIO?). This buffer can be truncated from the left in O(1) time. To stop the 'dribble in bytes slowly' attack, optimize the buffer's x in buffer operation by remembering where there are no matches for x.

By putting all the optimization in a special buffer object, LineReceiver can be kept simple.

Another advantage is that this code will operate more correctly (compared to better-LineReceiver-2.diff), because line extraction will be as lazy as possible (no early split step).

comment:9 in reply to:  8 Changed 7 years ago by zooko

Cc: zooko@… added

Replying to ivank:

Use a linked list of strings as a buffer

I've written things like this a few times, and so the most recent time I went ahead and encapsulated it into a class with unit tests and benchmarks:

http://tahoe-lafs.org/trac/stringchain

Then I used it to successfully fix a performance problem in foolscap while not making too much change to the surrounding foolscap parsing logic: http://foolscap.lothar.com/trac/ticket/149

I think it might be useful here.

comment:10 Changed 7 years ago by khorn

Correct me if I'm wrong, but I don't think Twisted can use GPL'd code (as the above stringchain class is). A similar approach might be useful though.

comment:11 Changed 7 years ago by khorn

I suppose I should have said: "I don't think Twisted can _include_ GPL'd code"

comment:12 in reply to:  10 Changed 7 years ago by Glyph

Replying to khorn:

Correct me if I'm wrong, but I don't think Twisted can use GPL'd code (as the above stringchain class is). A similar approach might be useful though.

Indeed not. Good catch.

comment:13 Changed 7 years ago by <automation>

comment:14 Changed 5 years ago by therve

Resolution: duplicate
Status: newclosed

I've started the work in #6357.

Note: See TracTickets for help on using tickets.