Opened 8 years ago

Closed 5 years ago

#2310 enhancement (closed: fixed)

twisted.spread.banana.b1282int could be faster

Reported by: exarkun
Owned by:
Priority: low
Milestone:
Component: pb
Keywords:
Cc: catlee
Branch: branches/faster-b1282int-2310
Author: catlee
Launchpad Bug:

Description

Banana calls this function a lot. It might be a bottleneck, who knows. Someone should profile it. I forget if I did. I found some code that apparently I wrote a while ago, though (patch reversed):

Index: twisted/spread/banana.py
===================================================================
--- twisted/spread/banana.py    (revision 18957)
+++ twisted/spread/banana.py    (working copy)
@@ -35,19 +35,18 @@
         stream(chr(integer & 0x7f))
         integer = integer >> 7

-def b1282int(st):
-    oneHundredAndTwentyEight = 128l
+
+def b1282int(st, _powersOfOneTwentyEight = []):
     i = 0
-    place = 0
-    for char in st:
+    if len(st) > len(_powersOfOneTwentyEight):
+        _powersOfOneTwentyEight.extend([
+            128 ** n for n in xrange(len(_powersOfOneTwentyEight), len(st))])
+    for place, char in enumerate(st):
         num = ord(char)
-        i = i + (num * (oneHundredAndTwentyEight ** place))
-        place = place + 1
-    if i <= 2147483647:
-        return int(i)
-    else:
-        return i
+        i = i + (num * _powersOfOneTwentyEight[place])
+    return i

+
 # delimiter characters.
 LIST     = chr(0x80)
 INT      = chr(0x81)

The strategy is simple: cache powers instead of recomputing them all the time. This drastically speeds up the function for multiple invocations.
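
For reference, here is a minimal standalone sketch of that caching idea, adapted to Python 3 bytes purely for illustration; the patch above targets Python 2 strings, ord() and xrange, and the names below are made up for this sketch rather than taken from banana.py:

    _POWERS_OF_128 = []

    def b1282int_caching(st):
        # Grow the shared power table whenever a longer string shows up,
        # so 128 ** place is only ever computed once per place.
        if len(st) > len(_POWERS_OF_128):
            _POWERS_OF_128.extend(
                128 ** n for n in range(len(_POWERS_OF_128), len(st)))
        i = 0
        for place, byte in enumerate(st):
            i += byte * _POWERS_OF_128[place]
        return i

    # Example: b"\x05\x01" is little-endian base 128, so it decodes to
    # 5 * 128**0 + 1 * 128**1 == 133.
    assert b1282int_caching(b"\x05\x01") == 133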

Attachments (2)

banana_speed.2.py (1.4 KB) - added by catlee 5 years ago.
benchmark of b1282int methods
timing-test-2310.py (1.1 KB) - added by TimAllen 5 years ago.
An alternative microbenchmark


Change History (12)

comment:1 Changed 7 years ago by exarkun

  • Priority changed from normal to low

Plenty of Twisted could be faster, I guess. I don't think the above patch actually ended up helping a whole lot, anyway.

comment:2 Changed 5 years ago by catlee

b1282int came up fairly high when profiling one of our buildbot masters that handles a lot of incoming data from the slaves.

The following patch speeds up b1282int by about 4x:

Index: banana.py
===================================================================
--- banana.py	(revision 27222)
+++ banana.py	(working copy)
@@ -34,17 +34,13 @@
         integer = integer >> 7
 
 def b1282int(st):
-    oneHundredAndTwentyEight = 128l
+    e = 1
     i = 0
-    place = 0
     for char in st:
-        num = ord(char)
-        i = i + (num * (oneHundredAndTwentyEight ** place))
-        place = place + 1
-    if i <= 2147483647:
-        return int(i)
-    else:
-        return i
+        n = ord(char)
+        i += (n * e)
+        e <<= 7
+    return i
 
 # delimiter characters.
 LIST     = chr(0x80)

For strings shorter than about 15 characters, this is faster than the caching patch above.
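
As a cross-check, here is the same bit-shifting decoder as a standalone sketch, together with a matching little-endian base-128 encoder for a round trip. It is adapted to Python 3 bytes for illustration, and int2b128_sketch is a name made up here rather than the function in banana.py:

    def b1282int_shifting(st):
        i = 0
        e = 1
        for byte in st:
            # e tracks 128 ** place, advanced with a shift instead of an
            # exponentiation.
            i += byte * e
            e <<= 7
        return i

    def int2b128_sketch(integer):
        # Emit seven bits at a time, least significant digit first.
        out = bytearray()
        while integer:
            out.append(integer & 0x7f)
            integer >>= 7
        return bytes(out)

    # 5 + 1 * 128 == 133, and encode/decode round-trips.
    assert b1282int_shifting(b"\x05\x01") == 133
    assert b1282int_shifting(int2b128_sketch(2 ** 40 + 17)) == 2 ** 40 + 17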

Some simple benchmarks on my machine (measured in calls / second):

    string length   original    caching version     bitshifting version
    5               96602       282676              343451
    10              46349       176972              187775
    15              32511       119855              109543
    50              7894        33980               28125
    100             2998        14342               11502

comment:3 Changed 5 years ago by catlee

  • Cc catlee added

Changed 5 years ago by catlee

benchmark of b1282int methods

comment:4 Changed 5 years ago by catlee

  • Keywords review added

Changed 5 years ago by TimAllen

An alternative microbenchmark

comment:5 Changed 5 years ago by TimAllen

The attached banana_speed.2.py didn't work for me at first, so I wrote my own microbenchmark (attached as timing-test-2310.py)... the original seems to be working for me again now, not quite sure why.

Here are the results of my benchmark on my machine ('cps' means 'calls per second'; I'm running Python 2.4 on Fedora 6, 32-bit):

Function       'caching',   1 byte string:     126582 cps
Function      'original',   1 byte string:     156250 cps
Function   'bitshifting',   1 byte string:     370370 cps
Function       'caching',   2 byte string:     196078 cps
Function      'original',   2 byte string:     101010 cps
Function   'bitshifting',   2 byte string:     285714 cps
Function       'caching',   3 byte string:     169491 cps
Function      'original',   3 byte string:      69444 cps
Function   'bitshifting',   3 byte string:     222222 cps
Function       'caching',   4 byte string:     147058 cps
Function      'original',   4 byte string:      52631 cps
Function   'bitshifting',   4 byte string:     188679 cps
Function       'caching',   5 byte string:     119047 cps
Function      'original',   5 byte string:      44642 cps
Function   'bitshifting',   5 byte string:     133333 cps
Function       'caching',  10 byte string:      67567 cps
Function      'original',  10 byte string:      21739 cps
Function   'bitshifting',  10 byte string:      63694 cps
Function       'caching',  50 byte string:      13661 cps
Function      'original',  50 byte string:       3825 cps
Function   'bitshifting',  50 byte string:      11534 cps
Function       'caching', 100 byte string:       6485 cps
Function      'original', 100 byte string:       1616 cps
Function   'bitshifting', 100 byte string:       5277 cps

I would guess that most integers transmitted by banana are smaller than 32 bits, and thus take at most 5 bytes in the b128 encoding. 'caching' scales very well to large numbers, but very poorly to small numbers: for one-byte strings it is even slower than the original algorithm! 'bitshifting' scales up nearly as well as 'caching' (both are much better than the original), and it also scales down to small strings spectacularly well.
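
(A quick sanity check of that size estimate, added here as an illustration rather than part of the original comment: each base-128 byte carries seven bits, so anything below 2**31 needs at most 5 encoded bytes.)

    def encoded_length(n):
        # Count base-128 digits, i.e. how many bytes the encoder would emit.
        count = 0
        while n:
            n >>= 7
            count += 1
        return count

    assert encoded_length(2 ** 28 - 1) == 4
    assert encoded_length(2 ** 31 - 1) == 5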

Also, note that the bit-shifting implementation is vastly simpler than the caching implementation, and passes all the tests.

I'm +1 for bit-shifting.
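
For anyone wanting to reproduce numbers like these without the attachments, a rough timeit-based sketch follows. The inputs and repeat counts are assumptions rather than the methodology of timing-test-2310.py, and it assumes a Twisted recent enough that b1282int accepts a bytes argument:

    import timeit

    from twisted.spread.banana import b1282int

    def calls_per_second(length, number=20000):
        data = b"\x7f" * length          # worst case: every 7-bit digit set
        elapsed = timeit.timeit(lambda: b1282int(data), number=number)
        return number / elapsed

    for length in (1, 2, 5, 10, 50, 100):
        print("%3d byte string: %8d cps" % (length, int(calls_per_second(length))))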

comment:6 Changed 5 years ago by exarkun

  • Owner exarkun deleted

comment:7 Changed 5 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/faster-b1282int-2310

(In [28057]) Branching to 'faster-b1282int-2310'

comment:8 Changed 5 years ago by exarkun

  • Author changed from exarkun to catlee
  • Keywords review removed
  • Owner set to exarkun
  • Status changed from new to assigned

Please attach patches as files rather than including them inline in comments; Trac destroys patches posted this way. Fortunately this patch is simple, so I could reasonably apply it by hand.

The change looks good. I checked in a simpler version of the benchmark. It doesn't need to run against all versions, just against the version it can import from Twisted.

Going to apply this change now. Thanks.

comment:9 Changed 5 years ago by exarkun

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [28061]) Merge faster-b1282int-2310

Author: catlee
Reviewer: TimAllen, exarkun
Fixes: #2310

Switch the banana function for decoding base-128 encoded integers from an expensive
exponentiation-based approach to a much cheaper bit-shifting approach. Also add a
benchmark which demonstrates how much faster decoding these integers now is.

comment:10 Changed 4 years ago by <automation>

  • Owner exarkun deleted