[Twisted-web] new http

Wed May 19 20:09:22 MDT 2004

Firstly: HTTP 1.1 compliance is not easy. It's not *too* bad for an 
origin server, but if Twisted wants a compliant HTTP proxy module (even 
non-caching), there's a lot of requirements. Squid made a nice table of 
the 473 MUST/MAY/SHOULD [NOT]s which is helpful.

Anyhow as it's nearing workability, I thought I'd just write a bit 
about how I it's structured. This is kinda rambling but oh well. ;)

There's 4 main classes on the path for handling a HTTP connection:
HTTPFactory: a ServerFactory that creates HTTPChannel objects for each 
incoming TCP connection.
HTTPChannel: keeps track of queued 'ChannelRequest' objects, and some 
of the splitting up of the incoming data into distinct requests work.
ChannelRequest: handles all the low-level hop-by-hop behavior.
Request: high level request/response behavior.

The split off of ChannelRequest from bits of HTTPChannel and bits of 
Request is geared towards allowing a different request transport than 
normal HTTP. PB is one possibility. Also, it simplifies the Request 
object and gives it a cleaner API that only has to deal with the actual 
request, not the details of transfer encodings, pipelined connections, 
etc.

ChannelRequest provides the following methods for Requests to call:
     def writeIntermediateResponse(self, code, headers=None, 
code_message=None):
     def writeHeaders(self, code, headers, code_message=None):
     def writeData(self, data):
     def finish(self):
     def abortConnection(self):
Also the producer methods:
     def registerProducer(self, producer, streaming):
     def unregisterProducer(self):

Request provides the following callbacks that are called by 
ChannelRequest:
     def __init__(self, chanRequest, command, path, version, in_headers):
     def handleContentChunk(self, data):
     def handleContentComplete(self):
     def connectionLost(self, reason):

The core of the public interface to this whole thing are the 
fields/methods on Request:
   method: HTTP method used
   uri: URI passed in the request.
   clientproto: Tuple like (1,1)
   out_headers: a Headers object containing the headers to output.
   in_headers: a Headers object containing the incoming headers.

   acceptData(self): Call to notify the sender that you intend to accept 
the request.
   checkPreconditions(self); check if the preconditions are satisfied, 
and thus whether the action should take place/the output data should be 
written.
   write(self, data): Call to write some data. If headers haven't been 
written yet, write them.
   writeFile(...): Call to write a file in an optimized way like 
sendfile(). TBD what actually goes here.
   finish(self): Call when you've finished writing data.

Callbacks to override:
   process(self): called from __init__. Incoming headers have been 
received, but no data yet. Should do resource lookup.
   handleContentChunk(self, data): A chunk of data was received.
   handleContentComplete(self): The incoming data is done.
   connectionLost(self): the underlying connection was lost.

in_headers/out_headers are objects of type http_headers.Headers which 
provides for a standardized way of translating between raw string 
headers and structured data headers. Some of the header parsers are not 
written yet.

Unlike the old Request, this one is going to do nothing with incoming 
data. No form processing, no buffering, no nothing. No args processing 
of the uri, either. The "full featured" subclass of Request (e.g. 
server.Request) can do that stuff. It is expected to do all URI 
frobbing and then a locateChild() lookup at process() time (before the 
data has arrived). Then, figure out what the located resource wants to 
do with the incoming data (ignore it, buffer it all up into one string, 
or pass it along as it comes in). Note that this means locateChild 
can't use POST arguments. Then, in the usual case, render() would be 
called after all the data has arrived and form processing has been done 
on it. But for some resources, e.g. a proxying resource, it would just 
send all the data straight through, without doing form processing.

James