[Twisted-Python] Twisted Python vs. "Blocking" Python: Weird performance on small operations.

Tue Oct 13 09:41:19 EDT 2009

Dirk,

Using a Deferred directly in your bin2intAsync() may be somewhat less
efficient than the approach described in Recipe 439358, "[Twisted]
From blocking functions to deferred functions"
(http://code.activestate.com/recipes/439358/).

You would get the same effect (asynchronous execution), but potentially
more efficiently, by just decorating your synchronous functions so that
they run in the reactor's thread pool:

import struct
from twisted.internet.threads import deferToThread

# Binding deferToThread to the decorated function makes every call
# equivalent to deferToThread(func, *args, **kwargs).
deferred = deferToThread.__get__
# ....
@deferred
def int2binAsync(anInteger):
    # Packs an integer, result is 4 bytes
    return struct.pack("i", anInteger)

@deferred
def bin2intAsync(aBin):
    # Unpacks a bytestring into an integer
    return struct.unpack("i", aBin)[0]
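
To see why deferToThread.__get__ works as a decorator, here is a minimal
sketch that does not need Twisted. defer_to_thread is a hypothetical
stand-in built on the standard library's concurrent.futures; it is not
Twisted's implementation, and it returns a Future rather than a Deferred.
Calling __get__(x) on a plain function returns a bound method whose first
argument is fixed to x, so deferred(func) gives a callable that behaves
like defer_to_thread(func, *args, **kwargs).

```python
import struct
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for twisted.internet.threads.deferToThread.
_pool = ThreadPoolExecutor(max_workers=2)


def defer_to_thread(f, *args, **kwargs):
    """Run f(*args, **kwargs) in a worker thread and return a Future."""
    return _pool.submit(f, *args, **kwargs)


# Same trick as in the recipe: binding fixes the first argument (f).
deferred = defer_to_thread.__get__


@deferred
def int2bin_async(an_integer):
    # Packs an integer using the native "i" format
    return struct.pack("i", an_integer)


@deferred
def bin2int_async(a_bin):
    # Unpacks a bytestring into an integer
    return struct.unpack("i", a_bin)[0]


# Each call returns immediately with a Future; result() waits for it.
packed = int2bin_async(42).result()
roundtrip = bin2int_async(packed).result()
```

One consequence of this shortcut is that each decorated name becomes a
bound method of the stand-in, so the original function's __name__ and
docstring are not preserved. A wrapper written with functools.wraps
would keep them, at the cost of a few more lines.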

Kind regards,

Valeriy Pogrebitskiy
vpogrebi at verizon.net

On Oct 13, 2009, at 9:18 AM, Dirk Moors wrote:

> Hello Everyone!
>
> My name is Dirk Moors, and for 4 years now I've been involved in
> developing a cloud computing platform, using Python as the
> programming language. A year ago I discovered Twisted Python, and it
> got me very interested, up to the point where I made the decision to
> convert our platform (in progress) to a Twisted platform. One year
> later I'm still very enthusiastic about the overall performance and
> stability, but last week I encountered something I didn't expect:
>
> It appeared that it was less efficient to run small "atomic"  
> operations in different deferred-callbacks, when compared to running  
> these "atomic" operations together in "blocking" mode. Am I doing  
> something wrong here?
>
> To prove the problem to myself, I created the following example
> (full source and test code are attached):
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
> import struct
>
> from twisted.internet import defer, reactor
>
> def int2binAsync(anInteger):
>     def packStruct(i):
>         #Packs an integer, result is 4 bytes
>         return struct.pack("i", i)
>
>     d = defer.Deferred()
>     d.addCallback(packStruct)
>
>     reactor.callLater(0,
>                       d.callback,
>                       anInteger)
>
>     return d
>
> def bin2intAsync(aBin):
>     def unpackStruct(p):
>         #Unpacks a bytestring into an integer
>         return struct.unpack("i", p)[0]
>
>     d = defer.Deferred()
>     d.addCallback(unpackStruct)
>
>     reactor.callLater(0,
>                       d.callback,
>                       aBin)
>     return d
>
> def int2binSync(anInteger):
>     #Packs an integer, result is 4 bytes
>     return struct.pack("i", anInteger)
>
> def bin2intSync(aBin):
>     #Unpacks a bytestring into an integer
>     return struct.unpack("i", aBin)[0]
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> While running the test code I got the following results:
>
> (1 run = converting an integer to a byte string, converting that  
> byte string back to an integer, and finally checking whether that  
> last integer is the same as the input integer.)
>
> *** Starting Synchronous Benchmarks. (No Twisted => "blocking" code)
>   -> Synchronous Benchmark (1 runs) Completed in 0.0 seconds.
>   -> Synchronous Benchmark (10 runs) Completed in 0.0 seconds.
>   -> Synchronous Benchmark (100 runs) Completed in 0.0 seconds.
>   -> Synchronous Benchmark (1000 runs) Completed in 0.00399994850159 seconds.
>   -> Synchronous Benchmark (10000 runs) Completed in 0.0369999408722 seconds.
>   -> Synchronous Benchmark (100000 runs) Completed in 0.362999916077 seconds.
> *** Synchronous Benchmarks Completed in 0.406000137329 seconds.
>
> *** Starting Asynchronous Benchmarks. (Twisted => "non-blocking" code)
>   -> Asynchronous Benchmark (1 runs) Completed in 34.5090000629 seconds.
>   -> Asynchronous Benchmark (10 runs) Completed in 34.5099999905 seconds.
>   -> Asynchronous Benchmark (100 runs) Completed in 34.5130000114 seconds.
>   -> Asynchronous Benchmark (1000 runs) Completed in 34.5859999657 seconds.
>   -> Asynchronous Benchmark (10000 runs) Completed in 35.2829999924 seconds.
>   -> Asynchronous Benchmark (100000 runs) Completed in 41.492000103 seconds.
> *** Asynchronous Benchmarks Completed in 42.1460001469 seconds.
>
> Am I really seeing factor 100x??
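
One way to read these figures is to separate the fixed cost from the
per-run cost. The sketch below uses only the numbers reported above; the
variable names are mine. The asynchronous timings start at about 34.5
seconds even for a single run, so most of the total looks like a constant
offset rather than per-call overhead. Comparing marginal per-run costs
gives a factor of roughly 19x instead of 100x. Where the constant offset
comes from cannot be determined from the numbers alone.

```python
# Timings copied from the benchmark output above (seconds).
sync_100k = 0.362999916077
async_1 = 34.5090000629
async_100k = 41.492000103
sync_total = 0.406000137329
async_total = 42.1460001469

# Per-run cost of the synchronous path at 100000 runs.
sync_per_run = sync_100k / 100000

# Marginal per-run cost of the asynchronous path, with the
# single-run figure treated as a fixed offset.
async_per_run = (async_100k - async_1) / (100000 - 1)

# Ratio of the reported totals versus ratio of per-run costs.
total_ratio = async_total / sync_total
per_run_ratio = async_per_run / sync_per_run

print("ratio of totals:        %.1f" % total_ratio)
print("ratio of per-run costs: %.1f" % per_run_ratio)
```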
>
> I really hope that I made a huge reasoning error here, but I just
> can't find it. If my results are correct, then I really need to go
> and check my entire cloud platform for the places where I decided to
> split functions into atomic operations, thinking that it would
> improve performance when in fact it did the opposite.
>
> I personally suspect that I lose my CPU cycles to the reactor
> scheduling the deferred callbacks. Would that assumption make any
> sense?
> The part where I need these conversion functions is in
> marshalling/protocol reading and writing throughout the cloud platform,
> which implies that these functions will be called constantly, so I need
> them to be superfast. I always thought I had to split the entire
> marshalling process into small atomic (deferred-callback) functions
> to be efficient, but these figures tell me otherwise.
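
The suspicion about scheduling overhead can be illustrated with a toy
model. The ToyLoop class below is a hypothetical stand-in for an event
loop, not Twisted's reactor: it queues each call and runs it later, much
as reactor.callLater(0, ...) hands work back to the loop. Both paths
produce the same results; the queued path simply does extra bookkeeping
per operation, which matters when the operation itself takes only
microseconds. Timings are printed but will vary by machine.

```python
import struct
import time
from collections import deque


class ToyLoop:
    """Hypothetical minimal event loop: a FIFO of pending calls."""

    def __init__(self):
        self._pending = deque()

    def call_later(self, func, *args):
        # Queue the call instead of running it now.
        self._pending.append((func, args))

    def run(self):
        # Drain the queue, including calls queued while running.
        while self._pending:
            func, args = self._pending.popleft()
            func(*args)


def roundtrip_sync(n):
    # Pack and unpack in one plain function call.
    return struct.unpack("i", struct.pack("i", n))[0]


def roundtrip_queued(loop, n, results):
    # Same work, split into two separately scheduled steps.
    def unpack(packed):
        results.append(struct.unpack("i", packed)[0])

    def pack(value):
        loop.call_later(unpack, struct.pack("i", value))

    loop.call_later(pack, n)


def run_sync(count):
    return [roundtrip_sync(i) for i in range(count)]


def run_queued(count):
    loop, results = ToyLoop(), []
    for i in range(count):
        roundtrip_queued(loop, i, results)
    loop.run()
    return results


if __name__ == "__main__":
    for fn in (run_sync, run_queued):
        start = time.perf_counter()
        fn(100000)
        print(fn.__name__, time.perf_counter() - start)
```

The design point is that splitting work into scheduled steps pays off
when a step would otherwise block on I/O, not when it is a cheap
CPU-only operation such as struct.pack.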
>
> I really hope someone can help me out here.
>
> Thanks in advance,
> Best regards,
> Dirk Moors
>
> <twistedbenchmark.py>
