[Twisted-Python] Asynchronous gzipped content decompression: best approach

Fri Jul 30 06:08:20 MDT 2010

On Fri, 2010-07-30 at 11:28 +0100, Michele - wrote:
> Hi,
> 
> 
> I have written a small utility function to replace
> "twisted.web.client.getPage" so that I can read the response headers.
> 
> 
> I have to say that the ever-improving documentation made it quite easy
> for me to do it using the new twisted.web.client.Agent, so well done
> to all!
> 
> 
> Since my wrapper works quite well, I decided to add gzip response
> support, as it's another feature missing from the original getPage.
> Again, it was quite simple, and it looks like it works quite well in a
> proof-of-concept scenario.
> 
> 
> Then came my dilemma. What I'm doing now is
> synchronous decompression, as shown below:
> 
> 
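> import gzip
> import StringIO  # Python 2 module; io.BytesIO on Python 3
> 
> def decompress_gzip(inzip):  # hypothetical name for the enclosing function
>     compressedstream = StringIO.StringIO(inzip)
>     gzipper = gzip.GzipFile(fileobj=compressedstream)
>     _data = gzipper.read()  # blocks until the whole body is inflated
>     return _data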

In the standard Agent API, streaming data is delivered to a protocol.
So a gunzipping version would do the same: you have a wrapper protocol
that decompresses the data, then delivers it to the underlying protocol.
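
A minimal sketch of such a wrapper, written for Python 3. To run without Twisted installed, it is duck-typed against the three methods that Response.deliverBody drives on a protocol (makeConnection, dataReceived, connectionLost); in real code you would subclass twisted.internet.protocol.Protocol instead. The names GunzipProtocol and Collector are hypothetical. Rather than skipping the header by hand, this version swaps in zlib's own gzip support: passing wbits=16 + zlib.MAX_WBITS to zlib.decompressobj asks zlib to parse the gzip header and trailer itself.

```python
import gzip
import zlib


class GunzipProtocol:
    """Hypothetical wrapper that gunzips body bytes and forwards them.

    Duck-typed stand-in for a twisted.internet.protocol.Protocol subclass:
    it implements the three methods Response.deliverBody calls.
    """

    def __init__(self, wrapped):
        self.wrapped = wrapped
        # 16 + MAX_WBITS: zlib expects and parses the gzip header/trailer itself.
        self._decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def makeConnection(self, transport):
        self.wrapped.makeConnection(transport)

    def dataReceived(self, data):
        # Inflate only what has arrived so far; zlib keeps state between calls.
        chunk = self._decompressor.decompress(data)
        if chunk:
            self.wrapped.dataReceived(chunk)

    def connectionLost(self, reason):
        # Flush whatever zlib is still holding, then pass the end on.
        rest = self._decompressor.flush()
        if rest:
            self.wrapped.dataReceived(rest)
        self.wrapped.connectionLost(reason)


class Collector:
    """Hypothetical stand-in for the application's own body protocol."""

    def __init__(self):
        self.chunks = []
        self.done = False

    def makeConnection(self, transport):
        pass

    def dataReceived(self, data):
        self.chunks.append(data)

    def connectionLost(self, reason):
        self.done = True


# Simulate a gzipped body arriving in small pieces, as it would from the network.
body = gzip.compress(b"hello " * 1000)
sink = Collector()
proto = GunzipProtocol(sink)
proto.makeConnection(None)
for i in range(0, len(body), 7):
    proto.dataReceived(body[i:i + 7])
proto.connectionLost(None)
```

With Twisted, the wrapper would be handed to the response as response.deliverBody(GunzipProtocol(yourProtocol)), presumably only when the response headers report Content-Encoding: gzip. No decompression work happens beyond what each arriving chunk requires, so the reactor is never blocked on the whole body.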

The basic logic would require reimplementing a small part of the gzip
module: the first few bytes of data are the gzip header, which you skip.
Then use the zlib module to decompress data as it arrives (specifically,
you'd want a decompression object) and deliver it to the wrapped
protocol's dataReceived.
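
The header-skipping approach could be sketched like this, assuming the gzip layout from RFC 1952: ten fixed bytes, then optional extra, file-name, comment and header-CRC fields selected by bits in the flag byte. The names gzip_header_length, GzipStreamDecoder and decode_in_chunks are hypothetical. The decoder buffers input until the whole header has arrived, then hands everything after it to a raw-deflate zlib.decompressobj(-zlib.MAX_WBITS).

```python
import gzip
import io
import struct
import zlib

# Flag bits from the gzip header's FLG byte (RFC 1952).
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def gzip_header_length(buf):
    """Return the header length at the start of buf, or None if incomplete."""
    if len(buf) < 10:  # fixed part: magic, method, flags, mtime, xfl, os
        return None
    if buf[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a deflate-compressed gzip stream")
    flags = buf[3]
    pos = 10
    if flags & FEXTRA:
        if len(buf) < pos + 2:
            return None
        (xlen,) = struct.unpack("<H", buf[pos:pos + 2])
        pos += 2 + xlen
    for flag in (FNAME, FCOMMENT):  # zero-terminated strings, in this order
        if flags & flag:
            end = buf.find(b"\x00", pos)
            if end == -1:
                return None
            pos = end + 1
    if flags & FHCRC:
        pos += 2
    return pos if len(buf) >= pos else None


class GzipStreamDecoder:
    """Hypothetical incremental gunzipper: skip the header, then raw inflate."""

    def __init__(self):
        self._pending = b""
        self._inflater = None  # created once the header has been skipped

    def feed(self, data):
        if self._inflater is None:
            self._pending += data
            size = gzip_header_length(self._pending)
            if size is None:
                return b""  # still waiting for the rest of the header
            data, self._pending = self._pending[size:], b""
            # Negative wbits: raw deflate data with no zlib or gzip wrapper.
            self._inflater = zlib.decompressobj(-zlib.MAX_WBITS)
        return self._inflater.decompress(data)

    def finish(self):
        if self._inflater is None:
            raise ValueError("stream ended inside the gzip header")
        # The 8-byte CRC32/ISIZE trailer lands in unused_data; not checked here.
        return self._inflater.flush()


def decode_in_chunks(payload, size):
    decoder = GzipStreamDecoder()
    parts = [decoder.feed(payload[i:i + size])
             for i in range(0, len(payload), size)]
    parts.append(decoder.finish())
    return b"".join(parts)


plain = gzip.compress(b"spam and eggs\n" * 500)
# GzipFile with a filename sets FNAME, so this exercises an optional field.
buf = io.BytesIO()
with gzip.GzipFile(filename="data.txt", mode="wb", fileobj=buf) as f:
    f.write(b"abc" * 100)
named = buf.getvalue()

result = decode_in_chunks(plain, 5)
named_result = decode_in_chunks(named, 3)
```

In a wrapper protocol, dataReceived would call feed() and pass on any non-empty result, and connectionLost would pass on whatever finish() returns before forwarding the reason. This sketch does not verify the CRC32/ISIZE trailer, which stays in the decompressor's unused_data.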