[Twisted-web] new http
James Y Knight
foom at fuhm.net
Wed May 19 20:09:22 MDT 2004
Firstly: HTTP 1.1 compliance is not easy. It's not *too* bad for an
origin server, but if Twisted wants a compliant HTTP proxy module (even
non-caching), there are a lot of requirements. Squid made a nice table of
the 473 MUST/MAY/SHOULD [NOT]s, which is helpful.
Anyhow, as it's nearing workability, I thought I'd just write a bit
about how it's structured. This is kinda rambling but oh well. ;)
There are 4 main classes on the path for handling an HTTP connection:
HTTPFactory: a ServerFactory that creates HTTPChannel objects for each
incoming TCP connection.
HTTPChannel: keeps track of queued 'ChannelRequest' objects, and does
some of the work of splitting the incoming data into distinct requests.
ChannelRequest: handles all the low-level hop-by-hop behavior.
Request: high level request/response behavior.
The split off of ChannelRequest from bits of HTTPChannel and bits of
Request is geared towards allowing a different request transport than
normal HTTP. PB is one possibility. Also, it simplifies the Request
object and gives it a cleaner API that only has to deal with the actual
request, not the details of transfer encodings, pipelined connections,
etc.
ChannelRequest provides the following methods for Requests to call:
def writeIntermediateResponse(self, code, headers=None, code_message=None):
def writeHeaders(self, code, headers, code_message=None):
def writeData(self, data):
def finish(self):
def abortConnection(self):
Also the producer methods:
def registerProducer(self, producer, streaming):
def unregisterProducer(self):
Request provides the following callbacks that are called by
ChannelRequest:
def __init__(self, chanRequest, command, path, version, in_headers):
def handleContentChunk(self, data):
def handleContentComplete(self):
def connectionLost(self, reason):
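
To make the two directions of that interface concrete, here's a rough sketch. It is not the real code: RecordingChannelRequest and EchoRequest are made-up stand-ins that only reuse the method names listed above, and plain dicts stand in for header objects.

```python
class RecordingChannelRequest:
    """Hypothetical in-memory stand-in for ChannelRequest.

    It only records what a Request asks it to send; a real one would
    deal with transfer encodings, pipelining and the transport.
    """

    def __init__(self):
        self.events = []

    def writeIntermediateResponse(self, code, headers=None, code_message=None):
        self.events.append(("intermediate", code))

    def writeHeaders(self, code, headers, code_message=None):
        self.events.append(("headers", code, dict(headers)))

    def writeData(self, data):
        self.events.append(("data", data))

    def finish(self):
        self.events.append(("finish",))

    def abortConnection(self):
        self.events.append(("abort",))


class EchoRequest:
    """Toy Request: buffers the body and echoes it once it is complete."""

    def __init__(self, chanRequest, command, path, version, in_headers):
        self.chanRequest = chanRequest
        self.method = command
        self.uri = path
        self.clientproto = version
        self.in_headers = in_headers
        self.chunks = []

    # Callbacks invoked by the ChannelRequest as data arrives.
    def handleContentChunk(self, data):
        self.chunks.append(data)

    def handleContentComplete(self):
        body = b"".join(self.chunks)
        # Calls back down into the ChannelRequest to send the response.
        self.chanRequest.writeHeaders(200, {"content-length": str(len(body))})
        self.chanRequest.writeData(body)
        self.chanRequest.finish()

    def connectionLost(self, reason):
        self.chunks = []


def run_echo_demo():
    """Drive the toy Request the way a ChannelRequest would."""
    chan = RecordingChannelRequest()
    req = EchoRequest(chan, "POST", "/echo", (1, 1), {})
    req.handleContentChunk(b"hel")
    req.handleContentChunk(b"lo")
    req.handleContentComplete()
    return chan.events
```

The point is just that the Request never touches the wire format; swapping RecordingChannelRequest for something PB-backed wouldn't change EchoRequest at all.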
The core of the public interface to this whole thing is the set of
fields/methods on Request:
method: HTTP method used
uri: URI passed in the request.
clientproto: Tuple like (1,1)
out_headers: a Headers object containing the headers to output.
in_headers: a Headers object containing the incoming headers.
acceptData(self): Call to notify the sender that you intend to accept
the request.
checkPreconditions(self): Check if the preconditions are satisfied,
and thus whether the action should take place/the output data should be
written.
write(self, data): Call to write some data. If headers haven't been
written yet, write them.
writeFile(...): Call to write a file in an optimized way like
sendfile(). TBD what actually goes here.
finish(self): Call when you've finished writing data.
Callbacks to override:
process(self): called from __init__. Incoming headers have been
received, but no data yet. Should do resource lookup.
handleContentChunk(self, data): A chunk of data was received.
handleContentComplete(self): The incoming data is done.
connectionLost(self, reason): the underlying connection was lost.
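
A minimal sketch of how a subclass might use that interface, under some assumptions: SketchRequest, HelloRequest and ListChannel are hypothetical names, out_headers is a plain dict rather than a Headers object, the `code` attribute is invented (nothing above says where the status code lives), and having finish() flush unwritten headers is a guess.

```python
class ListChannel:
    """Hypothetical stand-in for ChannelRequest; appends calls to a list."""

    def __init__(self):
        self.calls = []

    def writeHeaders(self, code, headers, code_message=None):
        self.calls.append(("writeHeaders", code, dict(headers)))

    def writeData(self, data):
        self.calls.append(("writeData", data))

    def finish(self):
        self.calls.append(("finish",))


class SketchRequest:
    code = 200  # assumed attribute, not part of the described interface

    def __init__(self, chanRequest, command, path, version, in_headers):
        self.chanRequest = chanRequest
        self.method = command
        self.uri = path
        self.clientproto = version
        self.in_headers = in_headers
        self.out_headers = {}  # plain dict standing in for Headers
        self._headers_written = False
        self.process()  # headers are in, no body data yet

    def _ensure_headers(self):
        if not self._headers_written:
            self._headers_written = True
            self.chanRequest.writeHeaders(self.code, self.out_headers)

    def write(self, data):
        # Headers go out lazily, on the first write.
        self._ensure_headers()
        self.chanRequest.writeData(data)

    def finish(self):
        self._ensure_headers()  # assumption: covers bodiless responses
        self.chanRequest.finish()

    # Callbacks to override.
    def process(self):
        pass

    def handleContentChunk(self, data):
        pass

    def handleContentComplete(self):
        pass

    def connectionLost(self, reason):
        pass


class HelloRequest(SketchRequest):
    def process(self):
        self.out_headers["content-type"] = "text/plain"

    def handleContentComplete(self):
        self.write(b"hello, ")
        self.write(b"world")
        self.finish()
```

Constructing HelloRequest sends nothing; the headers only go out when handleContentComplete() makes the first write() call.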
in_headers/out_headers are objects of type http_headers.Headers, which
provides a standardized way of translating between raw string
headers and structured data headers. Some of the header parsers are not
written yet.
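
Roughly the idea, as a toy: SketchHeaders and its method names are invented for illustration and are not the http_headers.Headers API. Only two headers get a parser/generator pair here; anything else passes through as a raw string.

```python
class SketchHeaders:
    """Hypothetical container with a raw-string view and a parsed view."""

    # raw string -> structured value
    parsers = {
        "content-length": int,
        "connection": lambda raw: [t.strip().lower() for t in raw.split(",")],
    }
    # structured value -> raw string
    generators = {
        "content-length": str,
        "connection": ", ".join,
    }

    def __init__(self):
        self._raw = {}

    def setRawHeader(self, name, value):
        self._raw[name.lower()] = value

    def getRawHeader(self, name):
        return self._raw.get(name.lower())

    def getHeader(self, name):
        """Return the parsed value, or the raw string if no parser exists."""
        name = name.lower()
        raw = self._raw.get(name)
        if raw is None:
            return None
        parser = self.parsers.get(name)
        return parser(raw) if parser else raw

    def setHeader(self, name, value):
        """Store a structured value by generating its raw string form."""
        name = name.lower()
        generator = self.generators.get(name)
        self._raw[name] = generator(value) if generator else value
```

For example, setRawHeader("Content-Length", "42") followed by getHeader("content-length") gives back the integer 42 rather than the string.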
Unlike the old Request, this one is going to do nothing with incoming
data. No form processing, no buffering, no nothing. No args processing
of the uri, either. The "full featured" subclass of Request (e.g.
server.Request) can do that stuff. It is expected to do all URI
frobbing and then a locateChild() lookup at process() time (before the
data has arrived). Then, figure out what the located resource wants to
do with the incoming data (ignore it, buffer it all up into one string,
or pass it along as it comes in). Note that this means locateChild
can't use POST arguments. Then, in the usual case, render() would be
called after all the data has arrived and form processing has been done
on it. But for some resources, e.g. a proxying resource, it would just
send all the data straight through, without doing form processing.
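
Here's a sketch of that flow, with heavy assumptions: Site, FullRequest, body_policy and dataReceived are all made-up names, locateChild is simplified to a dict lookup on the path, and render() just returns bytes. It only illustrates the ordering (lookup first, then a per-resource decision about the body, then render).

```python
IGNORE, BUFFER, STREAM = "ignore", "buffer", "stream"


class FormResource:
    body_policy = BUFFER  # wants the whole body before render()

    def render(self, request, body):
        return b"got %d bytes" % len(body)


class ProxyResource:
    body_policy = STREAM  # wants chunks passed along as they come in

    def __init__(self):
        self.forwarded = []

    def dataReceived(self, request, data):
        self.forwarded.append(data)

    def render(self, request, body):
        return b"forwarded %d chunks" % len(self.forwarded)


class Site:
    """Hypothetical root; a real locateChild would walk URI segments."""

    def __init__(self, children):
        self.children = children

    def locateChild(self, request, path):
        return self.children[path]


class FullRequest:
    def __init__(self, site, path):
        self.site = site
        self.uri = path
        self.output = None
        self._buffer = []
        self.process()

    def process(self):
        # Lookup happens before any body data exists, so it cannot
        # depend on POST arguments.
        self.resource = self.site.locateChild(self, self.uri)

    def handleContentChunk(self, data):
        policy = self.resource.body_policy
        if policy == BUFFER:
            self._buffer.append(data)
        elif policy == STREAM:
            self.resource.dataReceived(self, data)
        # IGNORE: drop the chunk on the floor

    def handleContentComplete(self):
        # Form processing on the buffered body would go here.
        body = b"".join(self._buffer)
        self.output = self.resource.render(self, body)
```

With this shape the proxying resource sees each chunk immediately and never pays for buffering, while the form resource gets the whole body in one piece at render() time.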
James