[Twisted-Python] [Tutor] Help on personnal Proxy

Eric Holm twisted at eholm.com
Tue May 27 00:42:17 EDT 2003


"POYEN OP Olivier (DCL)" <Olivier.POYEN at clf-dexia.com> writes:
> I tried to read some parts of the how-to, [...] with no luck. 
> 
The short answer to your question is to replace:

   f.HTTPChannel=proxy.Proxy

with:

   f.protocol=proxy.Proxy
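
To show why the attribute name matters, here is a rough sketch using stand-in classes rather than the real Twisted ones. It assumes only that a factory builds one protocol object per connection by calling whatever class is stored in its protocol attribute, which is how twisted.internet.protocol.Factory.buildProtocol behaves. SketchFactory, SketchHTTPFactory, DefaultChannel and this Proxy are made-up names for illustration.

```python
class SketchFactory:
    """Simplified stand-in for twisted.internet.protocol.Factory."""
    protocol = None  # class used to build one protocol object per connection

    def buildProtocol(self, addr):
        # Only self.protocol is consulted here; no other attribute matters.
        p = self.protocol()
        p.factory = self
        return p


class DefaultChannel:
    """Stand-in for the protocol the factory uses by default."""


class Proxy:
    """Stand-in for proxy.Proxy."""


class SketchHTTPFactory(SketchFactory):
    protocol = DefaultChannel


f = SketchHTTPFactory()

f.HTTPChannel = Proxy  # creates a new, unrelated attribute that is never read
before = type(f.buildProtocol(None)).__name__

f.protocol = Proxy     # the attribute buildProtocol actually reads
after = type(f.buildProtocol(None)).__name__

print(before, after)
```

With the first assignment the factory keeps handing out its default protocol; only the second one swaps in the proxy.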

Alternatively, I've written yet another web proxy that you might 
want to try.  Using my MinProxy (for Minimal proxy) should 
be fairly straightforward: dump MinProxy.py and 
HttpHeader.py into the same directory, and then run MinProxy 
(there's no #! line, so you'll have to type python MinProxy.py 
if you're not on Windows).

If you want to look at (or mess with) the pages as they flow 
through MinProxy, it's a bit different (hopefully simpler) 
than web.Proxy.  Simply subclass MinProxy, and then override:

  onHeaderFromBrowser( self, header ):
  onHeaderFromServer( self, header ):
  onBodyFromBrowser( self, header, body ):
  onBodyFromServer( self, header, body ):

depending on what you're interested in.  If you override them, 
you'll probably want to pass the data along by calling the 
matching send method:

  self.sendLinesToServer( header )
  self.sendLinesToBrowser( header )
  self.sendBufferToServer( body )
  self.sendBufferToBrowser( body )
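
As a rough sketch of how an override might look: the class below uses a stand-in base class (StubMinProxy, a hypothetical name) instead of the real MinProxy, so that it runs on its own. The stub only records what would have been sent. The hook and send method names are the ones listed above; the assumption that each hook forwards data by calling the matching send method is mine, so check it against the actual source.

```python
class StubMinProxy:
    """Hypothetical stand-in for MinProxy; records instead of sending."""

    def __init__(self):
        self.to_browser = []

    def sendLinesToBrowser(self, header):
        self.to_browser.append(("lines", list(header)))

    def sendBufferToBrowser(self, body):
        self.to_browser.append(("buffer", body))


class RewritingProxy(StubMinProxy):
    """Looks at server headers and edits server bodies on the way through."""

    def onHeaderFromServer(self, header):
        # Inspect the header, then pass it along unchanged.
        self.header_line_count = len(list(header))
        self.sendLinesToBrowser(header)

    def onBodyFromServer(self, header, body):
        # Mess with the page before the browser sees it.
        self.sendBufferToBrowser(body.replace(b"Hello", b"HELLO"))


p = RewritingProxy()
response_header = ["HTTP/1.0 200 OK", "Content-Type: text/html"]
p.onHeaderFromServer(response_header)
p.onBodyFromServer(response_header, b"<p>Hello</p>")
print(p.to_browser)
```

With the real thing you would subclass MinProxy itself and drop the stub.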

One other difference is that the headers in the above methods 
are encapsulated as HTTPHeader objects.  When treated as lists, 
they return the header lines, untouched, in the order received. 
For example:

  for line in header:
      print(line)

will print the lines as they were received.  When treated 
as dictionaries, HTTPHeader objects will return the desired 
field.  For example:

   header['content-type']

might return 'text/html', and:

   header.get( 'content-encoding', None )

might return 'gzip' (or None).  The keys are all converted 
to lower-case; the field values are left untouched.
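
To make the list-and-dictionary behaviour concrete, here is a minimal sketch of a class that acts this way. It is not the code from HttpHeader.py; HeaderSketch is a made-up name. It also assumes that the first line is the request or status line, and that surrounding whitespace is trimmed from the values, so it may differ from the real class in the details.

```python
class HeaderSketch:
    """Illustrative header object: iterates like a list, looks up like a dict."""

    def __init__(self, lines):
        self.lines = list(lines)  # raw lines, in the order received
        self.fields = {}          # lower-cased field name -> value
        # Skip the first line (request or status line); it is not a field.
        for line in self.lines[1:]:
            name, sep, value = line.partition(":")
            if sep:
                self.fields[name.strip().lower()] = value.strip()

    def __iter__(self):
        return iter(self.lines)

    def __getitem__(self, key):
        return self.fields[key.lower()]

    def get(self, key, default=None):
        return self.fields.get(key.lower(), default)


raw_lines = ["HTTP/1.0 200 OK", "Content-Type: text/html", "Server: demo"]
header = HeaderSketch(raw_lines)

for line in header:
    print(line)

content_type = header["content-type"]
encoding = header.get("content-encoding", None)
print(content_type, encoding)
```

Keeping the raw lines next to the lookup table is what lets the proxy forward the header exactly as it arrived while still offering convenient field access.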

Anyway, if any of this looks the least bit interesting, 
please feel free to grab it from:

   http://nonplussed.eholm.com/twisted

Why write another web proxy?  Well, after playing with 
web.Proxy, I decided to simplify life (or, at least my 
particular situation) by writing a proxy based on 
basic.LineReceiver instead of http.HTTPClient and 
http.HTTPChannel.  I found that web.Proxy was choking on 
some mainstream-ish sites such as msn.com and amazon.com.  
While attempting to patch up the problems, it occurred to 
me that I was messing with code in the HTTPxxx classes that 
was parsing the headers.  And while *real* HTTP clients and 
servers need to grok the headers, my simple proxy application 
doesn't need such sophistication.  So I decided to bypass the 
hard-core HTTP code and come up with a simpler approach (famous 
last words...).

I'd be happy to contribute this to twisted, if anybody's 
interested.  It needs some cleanup, as none of it conforms 
to the twisted coding standards (e.g. not a single docstring), 
and there aren't any twisted-style test cases (oh, and the name 
sucks too...).  Hopefully there's enough there to give folks 
a feel for whether or not it might be useful.  If so, I'll be 
happy to clean it up and submit it; if not, I won't bother.  In 
the meantime, feel free to give it a try, and send any comments 
or questions my way.


Eric.




