[Twisted-web] getting response headers from client.getPage

Tommi Virtanen tv at twistedmatrix.com
Thu Nov 11 07:35:19 MST 2004


Lars Woetmann Pedersen wrote:

> how do I get the response header in printResult
> the code looks like this now, the content of the page
> is in data, but I also need the header to check the etag and 
> last-modified
>
>         d = client.getPage(myurl)
>         d.addCallback(printResult)
>
>
> def printResult(data):
>         print 'printing result:'
>         print data

I do not want to be rude, but honestly, I think this requires some
rudeness: ASKING QUESTIONS WITHOUT SPENDING FIVE MINUTES
ON THE PROBLEM YOURSELF DOES NOT LEAVE A GOOD IMPRESSION
OF YOU IN THE MINDS OF OTHERS.

Looking for the word "header" in twisted.web.client:

class HTTPClientFactory(protocol.ClientFactory):
    """Download a given URL.

    @type deferred: Deferred
    @ivar deferred: A Deferred that will fire when the content has
          been retrieved. Once this is fired, the ivars `status', `version',
          and `message' will be set.

    @type status: str
    @ivar status: The status of the response.

    @type version: str
    @ivar version: The version of the response.

    @type message: str
    @ivar message: The text message returned with the status.

    @type response_headers: dict
    @ivar response_headers: The headers that were specified in the
          response from the server.
    """

Okay, so the headers are in the factory after the deferred fires.
Let's see how we can get our hands on them:

def getPage(url, contextFactory=None, *args, **kwargs):
    """Download a web page as a string.

    Download a page. Return a deferred, which will callback with a
    page (as a string) or errback with a description of the error.

    See HTTPClientFactory to see what extra args can be passed.
    """
    scheme, host, port, path = _parse(url)
    factory = HTTPClientFactory(url, *args, **kwargs)
    if scheme == 'https':
        from twisted.internet import ssl
        if contextFactory is None:
            contextFactory = ssl.ClientContextFactory()
        reactor.connectSSL(host, port, factory, contextFactory)
    else:
        reactor.connectTCP(host, port, factory)
    return factory.deferred

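Okay, so getPage gives you no way to reach the factory: it only returns
factory.deferred. Just write a custom getPage that returns the factory
itself; you can still use factory.deferred as before, and read
factory.response_headers once it has fired.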

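from twisted.internet import reactor
from twisted.web.client import HTTPClientFactory, _parse

def myGetPage(url, contextFactory=None, *args, **kwargs):
    # Like getPage, but return the factory instead of factory.deferred.
    # Plain HTTP only: contextFactory is accepted but ignored here.
    scheme, host, port, path = _parse(url)
    factory = HTTPClientFactory(url, *args, **kwargs)
    reactor.connectTCP(host, port, factory)
    return factory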
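With the factory in hand, the callback can pull ETag and Last-Modified out of factory.response_headers. The following is a minimal sketch, not code from Twisted. It assumes, as in the twisted.web.client of this vintage, that response_headers maps lower-cased header names to lists of values. cache_validators is a hypothetical helper name.

```python
def cache_validators(headers):
    """Return (etag, last_modified) from a response_headers-style dict.

    Assumes keys are lower-cased header names and values are lists of
    strings; a header that is absent (or has no values) yields None.
    """
    def first(name):
        values = headers.get(name)
        return values[0] if values else None
    return first('etag'), first('last-modified')


def printResult(data, factory):
    # 'factory' is the HTTPClientFactory returned by myGetPage; by the
    # time factory.deferred fires, response_headers has been filled in.
    etag, last_modified = cache_validators(factory.response_headers)
    print('printing result:')
    print(data)
    print('ETag: %s' % etag)
    print('Last-Modified: %s' % last_modified)
```

Wiring it up would look like `factory = myGetPage(myurl)` followed by `factory.deferred.addCallback(printResult, factory)`. Deferred.addCallback passes any extra arguments to the callback after the page data.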
