Ticket #1157 (new enhancement)

Opened 5 years ago

Last modified 10 months ago

web.http.HTTPClient does not support chunked transfer-encoding

Reported by: jeffsilver Owned by: tdavis
Priority: low Milestone:
Component: web Keywords: httpclient
Cc: jeffsilver, exarkun, andi5 Branch:
Author: Launchpad Bug:

Description


Attachments

http-client-chunked-24691.patch Download (19.3 KB) - added by tdavis 2 years ago.
Patch and tests to give HTTPClient chunked response reading capability

Change History

Changed 5 years ago by jeffsilver

I'm rather hoping to get HTTPClient to be able to do HTTP/1.1. The lack of
support for chunked transfer-encoding looks like the major obstacle to this, so
I intend to fix it.

Changed 2 years ago by glyph

  • owner changed from jeffsilver to dialtone

This is a subset of #2951, which dialtone is working on.

Changed 2 years ago by exarkun

Is this ticket for sending chunked or receiving chunked?

Changed 2 years ago by dialtone

The one I've been working on was both but I started to work on the sending side.

Changed 2 years ago by tdavis

Patch and tests to give HTTPClient chunked response reading capability

Changed 2 years ago by tdavis

I have attached a patch for review which adds proper chunked-response parsing to HTTPClient. IAW glyph's recommendation in #2951 I have attached it here. Further, I have gone over and remedied (most of) exarkun's issues with dialtone's patches in #2951.

The majority of the code was taken from dialtone's branch which I then updated to work properly with trunk (as of r24691).

Changed 2 years ago by tdavis

  • keywords review added

Changed 2 years ago by exarkun

  • owner dialtone deleted

Changed 2 years ago by exarkun

  • cc exarkun added
  • keywords review removed
  • owner set to tdavis

Coding standard points:

  • twisted/web/client.py
    • Attributes should be documented in class docstring rather than with comments. This includes private attributes like _headers and __buffer.
    • sendCommand needs a docstring (even though you didn't write it, you changed it)
    • sendHeader needs a docstring
    • endHeaders needs a docstring
    • lineReceived needs a docstring
    • In lineReceived, \-style line continuation is not allowed - use parenthesis instead.
  • twisted/web/test/test_webclient.py
    • HTTPLoopbackTestCase.Client and HTTPLoopbackTestCase.Request should be clientFactory and requestFactory
    • Test methods should be named like test_foo rather than testFoo and have docstrings.

Regarding the approach in HTTPLoopbackTestCase and HTTP11LoopbackTestCase, putting asserts into methods that the HTTP client is probably going to call means that the test is relying on the client to actually call those methods in order for the important parts of the test to run. It's also relying on whatever code is calling those methods to not handle the exceptions in a way which causes assertion failures to not actually fail the test. So, it's better to have assertions somewhere that you know they'll run and you know there isn't something above them on the call stack that will trap and discard the exceptions they raise (that's the mechanism for test failure, by the way - assertFoo raises a FailTest exception). A slightly less good fix is to set some flag/variable/whatever after your assertions, then check that flag at the end of the test, that way you know those assertions ran and succeeded, otherwise the flag wouldn't have been set. This fixes the reliability issue, but it leaves the tests in a state that's a bit hard for future maintainers to read.

So, a different approach would be to collect all the values passed to handleHeader, handleResponse, etc and make assertions about them after the loopbackAsync Deferred fires. If you want to make sure things happen in the right order or the right number of times, one common pattern is to keep appending data about events to a list shared by all the callbacks in the test, then assert things about the list at the end. This lets you catch things like multiple calls to endHeaders.

Aside from that, there are some parts of HTTPClient.lineReceived which are untested. I used trial --coverage twisted.web.test.test_http and looked at _trial_temp/coverage/twisted.web.http.cover to find out which parts. There are (at least) two things to note about the resulting file:

  • Lines beginning with >>>>>> are never executed by the test suite.
  • Lines that begin with a number are executed that number of times by the test suite. As a first approximation, any number 1 or greater is a great thing to have. Higher numbers aren't necessarily good, but there are some cases where a value greater than 1 is desirable. Consider this code:
    if foo:
        bar

If each line is executed once, then at least you know all the lines are run, so there are no glaring bugs. But there are actually two possible "paths" through this code. If each line executes once, then the test is only covering the "positive" path (ie, the path where foo is true). You could delete the if foo: line and the tests would still pass. So either you should delete that line in such a case, or you should add a new test where foo is false. Once you do that, you'll see that if foo: is executed twice but bar is still only executed once. This generalizes to numbers greater than 1, of course. If your test suite runs if foo: 50 times and bar 50 times, you're still only exercising the positive path through the code and a new test should be added.

Changed 22 months ago by andi5

  • cc andi5 added

Changed 21 months ago by exarkun

  • keywords httpclient added

Changed 10 months ago by exarkun

  • priority changed from normal to low

Considering #886 and #3987 have been resolved, this is of rather diminished importance, I think. If someone wants to finish it before we actually get around to deprecated HTTPClient and the related APIs, that's fine, though.

Note: See TracTickets for help on using tickets.