Opened 11 years ago

Closed 5 years ago

Last modified 5 years ago

#2611 enhancement closed duplicate (duplicate)

Protocols in twisted.protocols.basic use O(n**2) complexity

Reported by: therve Owned by:
Priority: highest Milestone:
Component: core Keywords:
Cc: ivank, spiv, bshi, indigo, zooko@… Branch: branches/faster-string-receivers-2611-2
branch-diff, diff-cov, branch-cov, buildbot
Author: therve, danderson

Description (last modified by therve)

LineOnlyReceiver and LineReceiver hold data received in a string whereas they should use a list. This could probably be solved in one pass, the hard part being the tests, of course.

Attachments (6)

bench_intn.png (4.4 KB) - added by danderson 10 years ago.
Benchmark of the IntNStringReceiver
bench_line.png (4.1 KB) - added by danderson 10 years ago.
Benchmark of the LineReceiver
bench_lineonly.png (3.9 KB) - added by danderson 10 years ago.
Benchmark of the LineOnlyReceiver
reduce_basic_protocols_complexity.patch (15.5 KB) - added by danderson 10 years ago.
Patch closing this ticket
faster-string-receivers-2611.patch (19.1 KB) - added by indigo 6 years ago.
resolved conflicts
faster-string-receivers-2611.2.patch (19.0 KB) - added by indigo 6 years ago.
fixed a few errors introduced by merge conflicts

Download all attachments as: .zip

Change History (40)

comment:1 Changed 10 years ago by Jean-Paul Calderone

#2987 was a duplicate of this.

comment:2 Changed 10 years ago by danderson

Owner: changed from therve to danderson
Status: newassigned

comment:3 Changed 10 years ago by danderson

The attached patch reduces the complexity of LineOnlyReceiver, LineReceiver and IntNStringReceiver to O(n), with n the number of packets sitting in the receive buffer.

I wrote a quick benchmark that records the runtime for each of these protocols when receiving 100 "messages" (lines for the first two protocols, length-prefixed packets for IntNStringReceiver), each message being split into N 50 byte slices. The attached plots comparing trunk Twisted to trunk+patch fairly clearly show the reduction in complexity. Performance in non-pathological cases (< 10 packets per message) is equivalent to or better than the original code.

The patched trunk passes basic.* unit tests.

Changed 10 years ago by danderson

Attachment: bench_intn.png added

Benchmark of the IntNStringReceiver

Changed 10 years ago by danderson

Attachment: bench_line.png added

Benchmark of the LineReceiver

Changed 10 years ago by danderson

Attachment: bench_lineonly.png added

Benchmark of the LineOnlyReceiver

comment:4 Changed 10 years ago by danderson

Keywords: review added
Priority: normalhighest

comment:5 Changed 10 years ago by therve

Owner: danderson deleted
Status: assignednew

comment:6 Changed 10 years ago by therve

Keywords: review removed
Owner: set to danderson

First, thanks a lot for your contribution, this is a really nice work.

Some comments:

  • please provide the source of your benchmark, to be put in doc/core/benchmarks. I also wonder what python version you use.
  • don't write self._buffer = self._buffer or [], make an explicit check of None
  • at several place, you're doing buffer + [data]. I expect appending the data to be faster, because this realloc a new list each time.
  • the new added attributes should be documented in the class docstring.

comment:7 in reply to:  6 Changed 10 years ago by danderson

Keywords: review added

Replying to therve:

Some comments:

The attached patch has been updated in response to your comments.

  • please provide the source of your benchmark, to be put in doc/core/benchmarks. I also wonder what python version you use.

Done. Warning: it's not the prettiest code in the world, but it gets the job done.

The benchmark was run with Python 2.5. I have a 2.4 handy if you'd like a run with that version as well.

  • don't write self._buffer = self._buffer or [], make an explicit check of None

Done.

  • at several place, you're doing buffer + [data]. I expect appending the data to be faster, because this realloc a new list each time.

Done. New benchmarks show no noticeable change in performance.

  • the new added attributes should be documented in the class docstring.

Which new attributes? All the additions are private, and from what I see in the rest of basic.py, those shouldn't be documented in the class docstring.

A note however: I have removed the recvd attribute from IntNStringReceiver, since it appears to be an internal attribute that was exposed as public. Given that the type of the buffer has now changed to a list, either recvd should be removed, or made into a property which converts between the representations as needed. The property would be good, but I don't know if Twisted needs to support Python versions where properties didn't exist. What do you think?

Changed 10 years ago by danderson

Patch closing this ticket

comment:8 Changed 10 years ago by danderson

The latest revised version of this patch adds documentation for all attributes, public and private, of the three classes I touched. It also adds a property() based accessor for IntNStringReceiver.recvd, which allows access to the incoming buffer as a string, as it was before this patch. Since recvd should have been private to begin with, the accessors fire DeprecationWarnings.

I hope that these revisions fit what is best for the codebase.

comment:9 Changed 10 years ago by Jean-Paul Calderone

Owner: danderson deleted

comment:10 Changed 10 years ago by therve

Keywords: review removed
Owner: set to danderson

For the record, it breaks with multi-characters line separator feed one by one (for example in twisted.web tests).

comment:11 Changed 9 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/faster-string-receivers-2611

(In [25588]) Branching to 'faster-string-receivers-2611'

comment:12 Changed 9 years ago by Jean-Paul Calderone

(In [25589]) Apply reduce_basic_protocols_complexity.patch (resolving conflicts)

refs #2611

comment:13 Changed 9 years ago by ivank

There is an attempt to fix O(n2) complexity among other things for just LineReceiver in #3803

comment:14 Changed 9 years ago by ivank

Cc: ivank added

comment:15 Changed 8 years ago by Jean-Paul Calderone

#4160 was a duplicate of this.

comment:16 Changed 8 years ago by spiv

Cc: spiv added

comment:17 Changed 8 years ago by Itamar Turner-Trauring

Is the branch ready for review?

comment:18 Changed 8 years ago by bshi

Cc: bshi added

If performance is a goal here, perhaps cStringIO would provide a constant factor speedup over using list() to maintain what appears to be essentially a byte buffer? You'd need an extra state variable to keep track of the string size as it grows.

If there's interest, I could try modifying the patch and running the performance tests to see if there's enough of an improvement to warrant action.

Changed 6 years ago by indigo

resolved conflicts

Changed 6 years ago by indigo

fixed a few errors introduced by merge conflicts

comment:19 Changed 6 years ago by indigo

I've updated the patch and resolved conflicts. The lasted patch (faster-string-receivers-2611.2.patch) has a few failing tests in twisted.test.test_protocols which seem to be a good indication of issues that need to be resolved before this can be accepted.

Also, there's a new DeprecationWarning which references an old version.

comment:20 Changed 6 years ago by indigo

Cc: indigo added

comment:21 Changed 6 years ago by Jean-Paul Calderone

Author: exarkundanderson

Just trying to fix the metadata. I think danderson wrote the code in this branch, I only checked it in.

comment:22 Changed 5 years ago by therve

Description: modified (diff)

comment:23 Changed 5 years ago by Itamar Turner-Trauring

Keywords: review added
Owner: danderson deleted

New patch, so I guess this should have review keyword.

comment:24 Changed 5 years ago by Tom Prince

There are a bunch of non-trivial merge conflicts here.

comment:25 Changed 5 years ago by Glyph

Keywords: review removed
Owner: set to Tom Prince

I did some optimizations in #5075 which are probably the source of some of those merge conflicts; I don't think that IntNStringReceiver has superlinear complexity any more; I am not sure about LineReceiver.

In any case, somebody needs to deal with those merge conflicts now, so it's probably not appropriate for this to be in review any more. Tom, I nominate you to figure it out since you've got some time on your hands, and this is a pretty low ticket number (and therefore high priority by virtue of being super old...)

comment:26 Changed 5 years ago by Tom Prince

So, I've done a little bit of benchmarking of this vs. trunk (using https://code.launchpad.net/~exarkun/+junk/twisted-benchmarks).

Varying (struct.pack("!H", 50) + "a" * 50) * 1000) in int16receiver.py, I get that trunk decodes ~1.5x as many strings with 50+1000/50+10,000 and the branch decodes ~1.5x as many strings with 5000+100. In other words, trunk appears to be faster at longer inputs, but the branch at longer strings.

Thus, picking one implementation vs. another appears to be a tradeoff. I'm not sure if there is a way to behave better in both cases.

comment:27 Changed 5 years ago by Tom Prince

Testing linereceiver.py, trunk appears faster (4896 vs 4624) and (6528 vs 6256 with 500+100).

Testing lineonlyreceiver.py, trunk appears faster on more lines (13056 vs 10608) but the branch on longer lines (14960 vs 16048).

comment:28 Changed 5 years ago by Tom Prince

Testing with receivers_fragmentation.py from the branch:

branch:

$ python receivers_fragmentation.py 1000 200 100 1000 300 
100 1.623524 4.786991 3.291329
400 6.369877 18.849241 6.462620
700 43.241146 33.814037 9.774048

trunk:

$ python receivers_fragmentation.py 1000 200 100 1000 300
100 5.372282 8.180363 5.309428
400 67.117141 77.499025 15.540092

(700 takes too long to get numbers)

comment:29 in reply to:  28 Changed 5 years ago by Glyph

Replying to tom.prince:

Testing with receivers_fragmentation.py from the branch: trunk: (700 takes too long to get numbers)

So it sounds like the performance is still quadratic on trunk, then, and we should be trying to get the branch to apply cleanly as a patch to trunk? A few more data points would help fit the curve more certainly :).

comment:30 Changed 5 years ago by therve

Author: dandersontherve, danderson
Branch: branches/faster-string-receivers-2611branches/faster-string-receivers-2611-2

(In [37277]) Branching to 'faster-string-receivers-2611-2'

comment:31 Changed 5 years ago by Itamar Turner-Trauring

Keywords: review added
Owner: Tom Prince deleted

Putting back in review, since it was merged forward.

comment:32 Changed 5 years ago by Jean-Paul Calderone

Keywords: review removed

< therve> not ready for review

comment:33 Changed 5 years ago by therve

Resolution: duplicate
Status: newclosed

I've opened #6356, #6357 and #6358 to manage the 3 different classes mentioned here.

comment:34 Changed 5 years ago by zooko

Cc: zooko@… added
Note: See TracTickets for help on using tickets.