Hello Everyone!<div><br></div><div>My name is Dirk Moors, and for four years now I&#39;ve been involved in developing a cloud computing platform, using Python as the programming language. A year ago I discovered Twisted Python, and it got me very interested, up to the point where I decided to convert our platform (in progress) to a Twisted platform. One year later I&#39;m still very enthusiastic about the overall performance and stability, but last week I encountered something I didn&#39;t expect:</div>

<div><br></div><div>It appears to be less efficient to run small &quot;atomic&quot; operations in separate deferred callbacks than to run these same &quot;atomic&quot; operations together in &quot;blocking&quot; mode. Am I doing something wrong here?</div>

<div><br></div><div>To prove the problem to myself, I created the following example (Full source- and test code is attached):</div><div>---------------------------------------------------------------------------------------------------------------------------------------------------------------------</div>

<div>import struct</div><div><br></div><div>from twisted.internet import defer, reactor</div><div><br></div><div><div>def int2binAsync(anInteger):</div><div>    def packStruct(i):</div><div>        #Packs an integer, result is 4 bytes</div><div>        return struct.pack(&quot;i&quot;, i)    </div>

<div><br></div><div>    d = defer.Deferred()</div><div>    d.addCallback(packStruct)</div><div><br></div><div>    reactor.callLater(0,</div><div>                      d.callback,</div><div>                      anInteger)</div>

<div><br></div><div>    return d</div><div><br></div><div>def bin2intAsync(aBin):   </div><div>    def unpackStruct(p):</div><div>        #Unpacks a bytestring into an integer</div><div>        return struct.unpack(&quot;i&quot;, p)[0]</div>

<div><br></div><div>    d = defer.Deferred()</div><div>    d.addCallback(unpackStruct)</div><div><br></div><div>    reactor.callLater(0,</div><div>                      d.callback,</div><div>                      aBin)</div>

<div>    return d</div><div><br></div><div>def int2binSync(anInteger):</div><div>    #Packs an integer, result is 4 bytes</div><div>    return struct.pack(&quot;i&quot;, anInteger)    </div><div><br></div><div>def bin2intSync(aBin):</div>

<div>    #Unpacks a bytestring into an integer</div><div>    return struct.unpack(&quot;i&quot;, aBin)[0]  </div><div><br></div><div>---------------------------------------------------------------------------------------------------------------------------------------------------------------------</div>

<div><br></div><div>While running the test code I got the following results:</div><div><br></div><div>(1 run = converting an integer to a byte string, converting that byte string back to an integer, and finally checking whether that last integer is the same as the input integer.)</div>
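<div><br></div><div>For reference, a single synchronous run as described above could look roughly like the sketch below. The names run_once and benchmark_sync are hypothetical and used only for illustration; the attached test code may be structured differently.</div>

```python
import struct
import time


def int2bin_sync(an_integer):
    # Pack an integer using the native "i" format
    return struct.pack("i", an_integer)


def bin2int_sync(a_bin):
    # Unpack a byte string back into an integer
    return struct.unpack("i", a_bin)[0]


def run_once(an_integer):
    # One run: integer -> byte string -> integer, then compare with the input
    return bin2int_sync(int2bin_sync(an_integer)) == an_integer


def benchmark_sync(runs):
    # Time `runs` consecutive round trips; returns (all_ok, elapsed_seconds)
    start = time.perf_counter()
    all_ok = all(run_once(i) for i in range(runs))
    return all_ok, time.perf_counter() - start
```

<div>Calling benchmark_sync(1000) returns whether every round trip matched, together with the elapsed time in seconds.</div>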

<div><br></div><div><div>*** Starting Synchronous Benchmarks. <b>(No Twisted =&gt; &quot;blocking&quot; code)</b></div><div>  -&gt; Synchronous Benchmark (1 runs) Completed in 0.0 seconds.</div><div>  -&gt; Synchronous Benchmark (10 runs) Completed in 0.0 seconds.</div>

<div>  -&gt; Synchronous Benchmark (100 runs) Completed in 0.0 seconds.</div><div>  -&gt; Synchronous Benchmark (1000 runs) Completed in 0.00399994850159 seconds.</div><div>  -&gt; Synchronous Benchmark (10000 runs) Completed in 0.0369999408722 seconds.</div>

<div>  -&gt; Synchronous Benchmark (100000 runs) Completed in 0.362999916077 seconds.</div><div>*** Synchronous Benchmarks Completed in<b> 0.406000137329</b> seconds.</div><div><br></div><div><div>*** Starting Asynchronous Benchmarks . <b>(Twisted =&gt; &quot;non-blocking&quot; code)</b></div>

<div>  -&gt; Asynchronous Benchmark (1 runs) Completed in 34.5090000629 seconds.</div><div>  -&gt; Asynchronous Benchmark (10 runs) Completed in 34.5099999905 seconds.</div><div>  -&gt; Asynchronous Benchmark (100 runs) Completed in 34.5130000114 seconds.</div>

<div>  -&gt; Asynchronous Benchmark (1000 runs) Completed in 34.5859999657 seconds.</div><div>  -&gt; Asynchronous Benchmark (10000 runs) Completed in 35.2829999924 seconds.</div><div>  -&gt; Asynchronous Benchmark (100000 runs) Completed in 41.492000103 seconds.</div>

<div>*** Asynchronous Benchmarks Completed in <b>42.1460001469</b> seconds.</div><div><br></div><div>Am I really seeing a factor of 100?</div><div><br></div><div>I really hope that I made a huge reasoning error here, but I just can&#39;t find it. If my results are correct, then I need to go through my entire cloud platform and check every place where I split functions into atomic operations in the belief that it would improve performance, when in fact it did the opposite.</div>

<div><br></div><div>I personally suspect that I lose my CPU cycles to the reactor scheduling the deferred callbacks. Would that assumption make any sense?</div><div>The part where I need these conversion functions is in marshalling and in protocol reading and writing throughout the cloud platform, which means these functions will be called constantly, so I need them to be very fast. I always thought I had to split the entire marshalling process into small atomic (deferred-callback) functions to be efficient, but these figures tell me otherwise.</div>
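<div><br></div><div>To make that suspicion concrete, here is a toy sketch that does not use Twisted at all. It assumes a plain FIFO queue as a stand-in for an event loop; convert_direct and convert_scheduled are hypothetical names, and a real reactor does more work per scheduled call than this queue does. The point is only that every tiny step pays for being queued and dispatched, on top of the actual pack/unpack work.</div>

```python
import struct
from collections import deque


def convert_direct(values):
    # Pack and unpack each value in a plain loop, with no scheduling involved
    return [struct.unpack("i", struct.pack("i", v))[0] for v in values]


def convert_scheduled(values):
    # Toy stand-in for an event loop (NOT Twisted's reactor): every tiny
    # step is queued as a (function, argument) pair and run by a dispatch loop.
    queue = deque()
    results = []

    def pack(v):
        # Step 1: pack, then schedule the unpack step instead of calling it
        queue.append((unpack, struct.pack("i", v)))

    def unpack(b):
        # Step 2: unpack and record the result
        results.append(struct.unpack("i", b)[0])

    for v in values:
        queue.append((pack, v))

    # Dispatch loop: pop and call until nothing is left
    while queue:
        func, arg = queue.popleft()
        func(arg)
    return results
```

<div>Both functions return the same list, so timing them against each other (for example with timeit) gives a rough feel for how much pure dispatch overhead is added per operation.</div>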

<div><br></div><div>I really hope someone can help me out here.</div><div><br></div><div>Thanks in advance,</div><div>Best regards,</div><div>Dirk Moors</div></div></div></div>