Ticket #3420 enhancement closed fixed
twisted.web.client persistent connections
Description
The current HTTP client in twisted.web.client (HTTPPageGetter and HTTPClientFactory) does not support persistent connections. Since the current implementation is mostly designed for single use, I am proposing a manager object that handles connections, both persistent and non-persistent. The closest analogue is twisted.web2.channel.http.HTTPChannel. This approach keeps backwards compatibility with existing web.client code and should not break any tests; it is mainly functionality exposed by the new client API (a ticket for that is forthcoming). The current proposal allows for the following:
1. A maximum number of connections across all domains, likely enforced with a Cooperator.
2. Persistent connections per server.
Item (1) is quite simple and would work essentially the same way as yielding getPage Deferreds from a generator driven by a Cooperator. Item (2) would automatically manage persistent connections on a per-server basis; each connection would time out after a given period with no further requests. In accordance with RFC 2616 (section 8.1.4), no more than 2 connections would be maintained per server for a single client. Here is a simple example:
agent = Agent(persist=True)
# Opens connection to foo.com and makes request
response = yield agent.requestString("http://foo.com/file-to-download.txt")
# ...
# Later, reuse same connection to foo.com
response = yield agent.requestString("http://foo.com/some-page.html")
# ...
# If the idle timeout expires before the next request, a new connection is opened here
response = yield agent.requestString("http://foo.com/some-other-page.html")
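To make item (2) and the example concrete, here is a minimal pure-Python model of the bookkeeping such a manager might do. It is not Twisted code and opens no sockets: ConnectionPool, server_key, the connect callback, and the defaults (10 connections in total, a 30-second idle timeout) are all hypothetical. It keys connections by (scheme, host, port), reuses idle ones, forgets them once the idle timeout passes, and enforces the 2-per-server cap from RFC 2616 plus a global cap for item (1).

```python
import time
from collections import defaultdict
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def server_key(url):
    """Map a URL to the (scheme, host, port) a connection can be shared on."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Unknown schemes raise KeyError; only http/https are modelled here.
    return scheme, (parts.hostname or "").lower(), parts.port or DEFAULT_PORTS[scheme]


class ConnectionPool:
    """Hypothetical bookkeeping model of the proposed manager (no real I/O)."""

    def __init__(self, connect, max_per_server=2, max_total=10,
                 idle_timeout=30.0, clock=time.monotonic):
        self._connect = connect          # callable(key) -> new connection object
        self.max_per_server = max_per_server
        self.max_total = max_total
        self.idle_timeout = idle_timeout
        self._clock = clock
        self._idle = defaultdict(list)   # key -> [(conn, idle_since), ...]
        self._open = defaultdict(int)    # key -> connections currently open

    def _expire(self):
        # Forget idle connections older than idle_timeout; a real manager
        # would also close their transports here.
        now = self._clock()
        for key, entries in self._idle.items():
            fresh = [(c, t) for c, t in entries if now - t < self.idle_timeout]
            self._open[key] -= len(entries) - len(fresh)
            entries[:] = fresh

    def acquire(self, url):
        """Return (key, conn), reusing an idle connection to the same server."""
        key = server_key(url)
        self._expire()
        if self._idle[key]:
            return key, self._idle[key].pop()[0]
        # A real manager would queue the request (e.g. via a Cooperator)
        # instead of raising when a limit is hit.
        if self._open[key] >= self.max_per_server:
            raise RuntimeError("per-server connection limit reached")
        if sum(self._open.values()) >= self.max_total:
            raise RuntimeError("global connection limit reached")
        self._open[key] += 1
        return key, self._connect(key)

    def release(self, key, conn):
        """Mark conn idle once its response has been fully read."""
        self._idle[key].append((conn, self._clock()))
```

With an injected clock, acquiring http://foo.com/ at t=0, releasing it, and acquiring again at t=10 returns the same connection object, while a request after more than 30 idle seconds gets a freshly opened one, mirroring the agent example. Raising on a full pool only keeps the model small; queuing is the behaviour item (1) describes.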
As the agent example shows, handling of persistent connections would be completely transparent to the user. The easiest and least disruptive way I see to do this right now is to refactor HTTPPageGetter (which in time will be replaced by a generic protocol, far better than having to create subclasses for "getting", "downloading" and whatever else) and to create a manager for HTTPClientFactory instances. Assuming HTTPPageGetter has been refactored so that it no longer closes the transport after a response (and anywhere else applicable), here is a brief example:
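from twisted.web.client import HTTPClientFactory, _makeGetterFactory, _parse

def requestString(self, url):
    scheme, host, port, path = _parse(url)
    if self.persist:
        for r in self.requestFactories:
            if r.scheme == scheme and r.host == host and r.port == port:
                # We already have a connection to this server, so reuse it.
                r.setURL(url)
                # Note: r.protocol is the protocol *class*; real code would need
                # a reference to the connected protocol instance here.
                r.protocol.connectionMade()
                return
    # No reusable connection: _makeGetterFactory builds the factory and connects.
    self.requestFactories.append(_makeGetterFactory(url, HTTPClientFactory))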
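The refactored HTTPPageGetter also has to decide, after each response, whether the connection may actually be reused. Under HTTP/1.1 a connection is persistent unless "Connection: close" is sent, HTTP/1.0 needs an explicit keep-alive, and the body must be delimited by Content-Length or chunked encoding, since otherwise it ends when the server closes the connection (RFC 2616 sections 4.4 and 8.1). A minimal sketch of that check, assuming the response headers are available as a plain dict of lower-cased names to single string values; the function name and signature are hypothetical, not part of twisted.web.client:

```python
def can_keep_alive(version, headers, status=200, method="GET"):
    """Decide whether the connection may be reused after this response.

    version: (major, minor) tuple, e.g. (1, 1).
    headers: dict mapping lower-cased header names to string values.
    """
    tokens = {t.strip().lower()
              for t in headers.get("connection", "").split(",") if t.strip()}
    if "close" in tokens:
        return False
    if version < (1, 1) and "keep-alive" not in tokens:
        # HTTP/1.0 connections are non-persistent unless keep-alive is sent.
        return False
    # Responses that never carry a body are delimited by the headers alone.
    if method == "HEAD" or status in (204, 304) or 100 <= status < 200:
        return True
    if "chunked" in headers.get("transfer-encoding", "").lower():
        return True
    # Without Content-Length the body ends only when the connection closes.
    return "content-length" in headers
```

When this returns False, the protocol would close the transport as HTTPPageGetter does today; when it returns True, the connection can be handed back to the manager for the next request to that server.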
Obviously the requestString sketch is very simplified (and suggests the connection logic lives in Agent, which isn't terribly likely), but hopefully it gets the point across. I'm not entirely sure I'm following procedure here, but I wanted a sanity check on the idea before writing tests and implementing it. Thanks, all.

