Opened 11 years ago

Closed 8 years ago

twisted.spread.banana.b1282int could be faster

Reported by: Owned by: Jean-Paul Calderone low pb catlee branches/faster-b1282int-2310 branch-diff, diff-cov, branch-cov, buildbot catlee

Description

Banana calls this function a lot. It might be a bottleneck, who knows. Someone should profile it. I forget if I did. I found some code that apparently I wrote a while ago, though (patch reversed):

```Index: twisted/spread/banana.py
===================================================================
--- twisted/spread/banana.py    (revision 18957)
+++ twisted/spread/banana.py    (working copy)
@@ -35,19 +35,18 @@
stream(chr(integer & 0x7f))
integer = integer >> 7

-def b1282int(st):
-    oneHundredAndTwentyEight = 128l
+
+def b1282int(st, _powersOfOneTwentyEight = []):
i = 0
-    place = 0
-    for char in st:
+    if len(st) > len(_powersOfOneTwentyEight):
+        _powersOfOneTwentyEight.extend([
+            128 ** n for n in xrange(len(_powersOfOneTwentyEight), len(st))])
+    for place, char in enumerate(st):
num = ord(char)
-        i = i + (num * (oneHundredAndTwentyEight ** place))
-        place = place + 1
-    if i <= 2147483647:
-        return int(i)
-    else:
-        return i
+        i = i + (num * _powersOfOneTwentyEight[place])
+    return i

+
# delimiter characters.
LIST     = chr(0x80)
INT      = chr(0x81)
```

The strategy is simple: cache powers instead of recomputing them all the time. This drastically speeds up the function for multiple invocations.

comment:1 Changed 11 years ago by Jean-Paul Calderone

Priority: normal → low

Plenty of Twisted could be faster I guess. I don't think the above patch actually ended up helping a whole lot, anyway.

comment:2 Changed 9 years ago by catlee

b1282int came up fairly high on one of our buildbot masters that handles a lot of incoming data from the slaves.

The following patch speeds up b1282int about 4x:

```Index: banana.py
===================================================================
--- banana.py	(revision 27222)
+++ banana.py	(working copy)
@@ -34,17 +34,13 @@
integer = integer >> 7

def b1282int(st):
-    oneHundredAndTwentyEight = 128l
+    e = 1
i = 0
-    place = 0
for char in st:
-        num = ord(char)
-        i = i + (num * (oneHundredAndTwentyEight ** place))
-        place = place + 1
-    if i <= 2147483647:
-        return int(i)
-    else:
-        return i
+        n = ord(char)
+        i += (n * e)
+        e <<= 7
+    return i

# delimiter characters.
LIST     = chr(0x80)
```

For strings less than around 15 characters this is faster than the patch above.

Some simple benchmarks on my machine (measured in calls / second):

```    string length   original    caching version     bitshifting version
5               96602       282676              343451
10              46349       176972              187775
15              32511       119855              109543
50              7894        33980               28125
100             2998        14342               11502
```

Changed 8 years ago by catlee

benchmark of b1282int methods

Changed 8 years ago by TimAllen

An alternative microbenchmark

comment:5 Changed 8 years ago by TimAllen

The attached `banana_speed.2.py` didn't work for me at first, so I wrote my own microbenchmark (attached as `timing-test-2310.py`)... the original seems to be working for me again now, not quite sure why.

Here's the results of my benchmark on my machine ('cps' is 'calls per second', I'm running Python 2.4 on Fedora 6, 32-bit):

```Function       'caching',   1 byte string:     126582 cps
Function      'original',   1 byte string:     156250 cps
Function   'bitshifting',   1 byte string:     370370 cps
Function       'caching',   2 byte string:     196078 cps
Function      'original',   2 byte string:     101010 cps
Function   'bitshifting',   2 byte string:     285714 cps
Function       'caching',   3 byte string:     169491 cps
Function      'original',   3 byte string:      69444 cps
Function   'bitshifting',   3 byte string:     222222 cps
Function       'caching',   4 byte string:     147058 cps
Function      'original',   4 byte string:      52631 cps
Function   'bitshifting',   4 byte string:     188679 cps
Function       'caching',   5 byte string:     119047 cps
Function      'original',   5 byte string:      44642 cps
Function   'bitshifting',   5 byte string:     133333 cps
Function       'caching',  10 byte string:      67567 cps
Function      'original',  10 byte string:      21739 cps
Function   'bitshifting',  10 byte string:      63694 cps
Function       'caching',  50 byte string:      13661 cps
Function      'original',  50 byte string:       3825 cps
Function   'bitshifting',  50 byte string:      11534 cps
Function       'caching', 100 byte string:       6485 cps
Function      'original', 100 byte string:       1616 cps
Function   'bitshifting', 100 byte string:       5277 cps
```

I would guess that most integers transmitted by banana would be less than 32-bit, and thus take less than 5 bytes in the b128 encoding. 'caching' scales very well to large numbers, but very poorly to small numbers - for one byte strings, it's even slower than the original algorithm! 'bitshifting' scales up nearly as well as 'caching' (both are much better than the original), but it scales down spectacularly as well.

Also, note that the bit-shifting implementation is vastly simpler than the caching implementation, and passes all the tests.

I'm +1 for bit-shifting.

comment:6 Changed 8 years ago by Jean-Paul Calderone

Owner: Jean-Paul Calderone deleted

comment:7 Changed 8 years ago by Jean-Paul Calderone

Author: → exarkun → branches/faster-b1282int-2310

(In [28057]) Branching to 'faster-b1282int-2310'

comment:8 Changed 8 years ago by Jean-Paul Calderone

Author: exarkun → catlee review removed set to Jean-Paul Calderone new → assigned

Please attach patches as files rather than including them inline in comments. trac destroys patches posted this way. Fortunately this patch is simple, so I applied could reasonably apply it by hand.

The change looks good. I checked in a simpler version of the benchmark. It doesn't need to run against all versions, just against the version it can import from Twisted.

Going to apply this change now. Thanks.

comment:9 Changed 8 years ago by Jean-Paul Calderone

Resolution: → fixed assigned → closed

(In [28061]) Merge faster-b1282int-2310

Author: catlee Reviewer: TimAllen, exarkun Fixes: #2310

Switch the banana function for decoding base-128 encoded integers from an expensive exponentiation-based approach to a much cheaper bit-shifting approach. Also add a benchmark which demonstrates how much faster decoding these integers now is.

comment:10 Changed 7 years ago by <automation>

Owner: Jean-Paul Calderone deleted
Note: See TracTickets for help on using tickets.