Opened 3 years ago

Last modified 13 months ago

#5192 enhancement new

100-continue support for twisted.web.client.Agent

Reported by: mithrandi Owned by: shira
Priority: normal Milestone:
Component: web Keywords:
Cc: jknight Branch: branches/100continue-5192-2
(diff, github, buildbot, log)
Author: dreid, glyph Launchpad Bug:

Attachments (6)

100Continue.patch (12.1 KB) - added by darfire 3 years ago.
Adds 100-Continue to HTTP11ClientProtocol
5192_2.patch (15.8 KB) - added by darfire 3 years ago.
5192_3.patch (30.8 KB) - added by darfire 3 years ago.
5192_4.patch (35.9 KB) - added by darfire 3 years ago.
5192_5.patch (3.0 KB) - added by sdsern 18 months ago.
5192_6.patch (3.3 KB) - added by sdsern 18 months ago.

Download all attachments as: .zip

Change History (33)

comment:1 Changed 3 years ago by DefaultCC Plugin

  • Cc jknight added

comment:2 Changed 3 years ago by darfire

How transparent needs this to be to the user? Should it see the '100-continue' response and maybe call a writeBodyTo method on the request or just set the Expect and let twisted handle it.

comment:3 Changed 3 years ago by exarkun

It seems like there probably aren't many cases where you will care if you got the 100 status (as opposed to not getting anything). It should perhaps be possible to tell if you did or not, but the default behavior should probably be to just do the right thing when the status arrives (ie, begin writing the body using the object already passed to the request method).

comment:4 Changed 3 years ago by darfire

Is there a reason not to move this logic down to HTTP11ClientProtocol? It seems a more natural place for things that have to do with the HTTP protocol. The interface would be the same, only the 100-Continue response will be handled internally. A way the user can find out whether the body was actually written would be to use the body producer.

comment:5 Changed 3 years ago by exarkun

Sure; I doubt this feature could be implemented without some changes to HTTP11ClientProtocol. The summary says Agent because Agent is the public-facing API.

Changed 3 years ago by darfire

Adds 100-Continue to HTTP11ClientProtocol

comment:6 Changed 3 years ago by darfire

  • Keywords review added

Adding 100-Continue support to HTTP11ClientProtocol.

comment:7 Changed 3 years ago by itamar

What curl does to handle badly behaving servers that don't support "100 continue" (like twisted.web until #4673 is merged) is send headers, wait for response, if none is forthcoming after 1 second it then sends body on assumption server doesn't do 100 continue support. This seems like it'd be nice to have.

This is not necessary to close this ticket, so if this behavior is not included at merge time, please open a separate ticket.

Changed 3 years ago by darfire

comment:8 Changed 3 years ago by darfire

5192_2.patch adds support for sending the request body after a timeout expires in case the server didn't send a response. This timeout is fixed and there's no mechanism for the moment for the caller to know whether the request body was sent because the server replied with 100-continue or because the timeout expired (apart from the 1 second delay).

comment:9 Changed 3 years ago by exarkun

  • Keywords review removed
  • Owner set to darfire

Thanks. It's really great to see progress here. Some thoughts about the patch:

  1. Instead of re-using the WAITING state, it's probably better to introduce a new state. WAITING already means something, and the state machine is easier to follow if the states don't get overloaded.
  2. Similarly, instead of the callbacks and errbacks underneath the if _expects100Continue: check, it would be better to extend the state machine with more states and transitions. For example, the current brokenServerTimer approach looks like it will cause problems if the 100-Continue response arrives after the timeout fires.
  3. In the tests, BufferProducer can be replaced with FileBodyProducer(StringIO(...)).
  4. Both the implementation and the tests should avoid using the global reactor. Instead parameterize it and use a twisted.internet.task.Clock instance in the tests to keep them isolated from the real time.
  5. The tests don't exercise all of the new/changed code. brokenServerTimer and ebFirstResponse in particular need more test coverage. I highly recommend coverage.py to report the state of test coverage.

Thanks a lot for your efforts here! I'm looking forward to the next version.

Changed 3 years ago by darfire

comment:10 Changed 3 years ago by darfire

Ok, added another iteration of this patch. Some notes:

  • As exarkun suggested I've expanded the state machine by adding states for waiting for the 100-continue response and its body.
  • When we send the request body because we didn't get a response in time (maybe because the server doesn't know about expectations), we check the first response received. If it is an 100-Continue response this means that the server was just slow and we consume it and return the second response to the user.
  • As a result of the approach at 2. the TRANSMITTING and WAITING states are a little bit overloaded, in the sense that while in there we might have self._forcedRequestBody == True but it seemed overkill to double them for this situation.
  • Finally, a request call produces a few request specific variables in the HTTP11ClientProtocol and these should probably be migrated into something like a RequestContext, maybe when we do persistent HTTP connections.

comment:11 Changed 3 years ago by itamar

  • Keywords review added
  • Owner darfire deleted

Looks like this is ready for review again.

comment:12 Changed 3 years ago by therve

  • Keywords review removed
  • Owner set to darfire

Sorry for the long wait.

As a preliminary comment, I'd say it's too bad the changes are so deep in HTTP11ClientProtocol. I guess it makes sense, but it feels weird.

  • I've noticed a couple of things untested (ie no tests fail if I touch it). Eg:
    +        _requestBodyDeferred.chainDeferred(self._requestDeferred)
    
    As well as _cleanupOn100ContinueError_TRANSMITTING, _cleanupOn100ContinueError_WAITING, _connectionLost_WAITING_100_CONTINUE_RESPONSE, _connectionLost_WAITING_100_CONTINUE_RESPONSE_BODY

Alright, for the cosmetics:

  • +        self.protocol = HTTP11ClientProtocol(reactor = self.clock)
    
    +        cooperator = Cooperator(scheduler = _immediateScheduler, started = False)
    +        producer = FileBodyProducer(StringIO(body), cooperator = cooperator)
    
    +    def __init__(self, reactor = None):
    
    +    def _setupParser(self, request, data = ''):
    
    Please remove the spaces around the "=" operator in these cases.
  • Please put 2 blank lines between your test methods.
  • +               TEorCL = "Transfer-Encoding: chunked\r\n"
    +           else:
    +               TEorCL = 'Content-Length: %d\r\n' % (self.bodyProducer.length,)
    
    You should use single quotes consistently. Also, TEorCL is not a great variable name. What about "header"?
  • +        Format this L{Request}'s headers as HTTP/1.1 and write them
    +        synchronously to the given transport
    
    I'm not sure what you mean by synchronously in this context, but it sounds a bit weird.
  • +            _expects100Continue = '100-continue' in [x.lower() for x in
    +                    _expectations]
    
    Indentation is a little bit weird. I also think the code would be more clear if you had a distinct for loop.
  • +            if self._state in ['TRANSMITTING',
    +                    'WAITING_100_CONTINUE_RESPONSE',
    +                    'WAITING_100_CONTINUE_RESPONSE_BODY']:
    
    Again strange indentation. What about:
    +            if self._state in ['TRANSMITTING',
    +                               'WAITING_100_CONTINUE_RESPONSE',
    +                               'WAITING_100_CONTINUE_RESPONSE_BODY']:
    
    Otherwise, when you break a line you should indent by one level only, like
    +        self._firstResponseDeferred.addCallbacks(
    +            self._handleFirstResponse, self._handle100ContinueError)
    
  • +        self._forcedRequestBody = False #yet
    
    You can remove the comment.
  • +        L{Request} has been parse. If the L{Response} wasn't an 100-Continue
    
    Typo: parsed.
  • +        expects 100-Continue. This are errors on parsing the first response
    
    Typo: these.

Thanks a lot for the patch, it's in a great shape. I hope we can review it faster next time...

comment:13 Changed 3 years ago by exarkun

What about "header"?

That's bad in a different way. TEorCL stands for "Transfer-Encoding or Content-Length". How about "bodyFramingHeader"?

Changed 3 years ago by darfire

comment:14 Changed 3 years ago by darfire

Sorry for the delay. Also, thank you for the updates, they were very useful.

I've updated the patch to work with quiescent callbacks.

Also, I've integrated your suggestions.

Regarding the parts that are not tested:

+        _requestBodyDeferred.chainDeferred(self._requestDeferred)

and cleanupOn100ContinueError_TRANSMITTING, _cleanupOn100ContinueError_WAITING seem to cause some tests to fail when I touch (comment parts of ) them. However, the failure consists in the test timeouting. Is that an accepted mode of failure?

The tests for _connectionLost_* were indeed missing. Also, I've added tests for the correct interaction with quiescent callbacks.

Thank you for the suggestions, looking forward to the next iteration.

comment:15 Changed 3 years ago by darfire

  • Keywords review added

comment:16 Changed 2 years ago by thijs

  • Owner darfire deleted

comment:17 Changed 2 years ago by glyph

  • Owner set to glyph
  • Status changed from new to assigned

comment:18 Changed 2 years ago by glyph

  • Author set to glyph
  • Branch set to branches/100continue-5192

(In [35762]) Branching to '100continue-5192'

comment:19 Changed 2 years ago by glyph

  • Keywords review removed

darfire,

Thank you so much for your continued patience and this excellent and substantial patch.

There are a couple of issues, which are minor enough that I think I can fix them for you and then commit it.

  1. As per the api-documentation builder, there are newly-invalid docstrings along with your changes.
  2. There should actually be two spaces after a period in a docstring, in HTTP11ClientProtocol where you "fixed" that it should be adjusted back :).
  3. This needs a news file.
  4. Timing out is not a particularly great failure mode. Tests should always fail (not error, not time out, not hang) when you write them. Luckily this is easy to fix, because you're not using the global reactor any more! When you have a Clock(), that means that you can assert that the Deferred was called by advancing the clock rather than waiting for the Deferred to be called by the reactor. (If this requires more extensive fixing than it appears at first glance, then I'll have to send this back to you.)
  5. All new and modified functions and classes need to have docstrings; even private ones, for future maintainers. So _writeHeaders should have a docstring. Thanks for covering everything else though!

-glyph

comment:20 Changed 18 months ago by sdsern

  • Owner changed from glyph to sdsern
  • Status changed from assigned to new

comment:21 Changed 18 months ago by sdsern

  • Status changed from new to assigned

Changed 18 months ago by sdsern

Changed 18 months ago by sdsern

comment:22 Changed 18 months ago by sdsern

  • Keywords review added
  • Owner sdsern deleted
  • Status changed from assigned to new

Attachment 5192_6.patch contains the changes mentioned in comment 19. Ignore attachment 5192_5.patch which is missing a new file.

comment:23 Changed 17 months ago by dreid

(In [38667]) Apply 5192_6.patch from sdsern. Refs #5192.

comment:24 Changed 17 months ago by dreid

  • Author changed from glyph to dreid, glyph
  • Branch changed from branches/100continue-5192 to branches/100continue-5192-2

(In [38668]) Branching to 100continue-5192-2.

comment:25 Changed 17 months ago by dreid

  • Keywords review removed
  • Owner set to sdsern

sdsern,

Thanks for your contribution.

I applied your patch in anticipation of reviewing this branch but I was unable to merge forward and review it because of non-trivial merge conflicts the work done in #4330.

I did however notice a few issues:

  1. In HTTP11ClientProtocol the docstring:
    +    @ivar _firstResponseTimer: A L{Deferred} that fires after
    +        TIMEOUT_100_CONTINUE seconds and forcefully sends the L{Request} body
    +        to the server.
    

_firstResponseTimer is not a Deferred, it is a DelayedCall.

  1. The original patch introduces trailing whitespace in test_newclient.py
  2. In _finishResponse_WAITING:
    +        if self._state in ['WAITING',
    +                'WAITING_100_CONTINUE_RESPONSE']:
    
    should be:
+        if self._state in ['WAITING',
+                           'WAITING_100_CONTINUE_RESPONSE']:
  1. All these new tests should be able to use self.successResultOf and self.failureResultOf instead of returning a Deferred from the test method. Tests using these methods are generally shorter and easier to read while also allowing you to:
    1. Easily assert things about the expected result of a deferred.
    2. Ensure that the tests are entirely synchronous and remove unwanted non-determinism.

Again, thank you for your contribution, I look forward to a more thorough review of this patch once it applies cleanly to trunk.

comment:26 Changed 16 months ago by shira

  • Owner changed from sdsern to shira
  • Status changed from new to assigned

comment:27 Changed 13 months ago by shira

  • Status changed from assigned to new
Note: See TracTickets for help on using tickets.