Opened 6 years ago

Closed 3 years ago

#3450 enhancement closed (fixed)

HTTP client API doesn't allow the connection setup timeout to be specified

Reported by: palo
Owned by: therve
Priority: normal
Milestone:
Component: web
Keywords: timeout web httpclient
Cc: exarkun, riteshn@…
Branch: branches/agent-connect-timeout-3450
Author: therve
Launchpad Bug:

Description (last modified by exarkun)

reactor.connectTCP imposes a timeout - 30 seconds by default. twisted.web.client.getPage offers no way for application code to specify this value.

There should be a way to control the value of the connection timeout for HTTP clients.

Attachments (1)

twisted-bug.zip (5.4 MB) - added by palo 6 years ago.
The test script, a spreadsheet with detailed statistics, and other material

Change History (17)

Changed 6 years ago by palo

The test script, a spreadsheet with detailed statistics, and other material

comment:1 Changed 6 years ago by palo

Trac has removed some line breaks. How can I edit the text?
The attached zip archive contains a properly formatted version of the text.

The timeout exception is:

Traceback (most recent call last):
Failure: twisted.internet.error.TimeoutError: User timeout caused connection failure.

comment:2 Changed 6 years ago by exarkun

  • Description modified (diff)

Fix some of the description markup.

comment:3 Changed 6 years ago by exarkun

  • Cc exarkun added
  • Priority changed from high to normal

reactor.connectTCP typically defaults to a 30 second connection setup timeout. It sounds like this is the timeout you're encountering.

Can you try raising this timeout (edit twisted/web/client.py, the _makeGetterFactory function, add a higher timeout value to the call to connectSSL or connectTCP there) and run your tests again?

comment:4 Changed 6 years ago by exarkun

  • Resolution set to invalid
  • Status changed from new to closed

Please raise this on the mailing list or on the IRC channel if you are still having difficulty. I don't think there's a bug in Twisted here, though. Timeouts are happening as expected due to the workload.

comment:5 Changed 6 years ago by Mekk

  • Resolution invalid deleted
  • Status changed from closed to reopened

The problem is not in the timeout as such, but in the fact that it is set incorrectly.
getPage (and HTTPClientFactory) accept a timeout parameter, but even if it is set to a large value, the default 30-second connection timeout still applies.

See http://www.twistedmatrix.com/pipermail/twisted-web/2006-February/002490.html
and http://www.twistedmatrix.com/pipermail/twisted-web/2006-February/002491.html

comment:6 Changed 6 years ago by exarkun

  • Description modified (diff)
  • Summary changed from user timeout bug to HTTP client API doesn't allow the connection setup timeout to be specified
  • Type changed from defect to enhancement

Adjusting summary and description to reflect the feature request.

The old description was:

Hello,


Sorry for my bad English. I hope someone understands what I mean.


Short description: there must be a bug in twisted.web.client.
If I have a URL list with many bad URLs, I get a lot (50 - 100%) of strange
user timeout exceptions, but only if I have too many parallel requests
(more than 10).


For example:


I have a list of 20,000 URLs (with many bad URLs in it).
I have a script that always runs 10 parallel requests -> 2.65% of the
requests get the timeout exception.
I start the same script with the same URL list and 50 parallel requests
-> 5.88% of the requests get the timeout exception.
I start the same script with the same URL list and 100 parallel requests
-> 64.1% of the requests get the timeout exception.
250 parallel requests -> 99.38% of the requests get the timeout exception.


But if I use the script with a list of 20,000 good URLs, I get this result:


100 parallel requests -> 0.34% of the requests get the timeout exception.
250 parallel requests -> 0.48% of the requests get the timeout exception.



Therefore, I think there must be a bug.


Why are there nearly 100% timeouts with 250 parallel requests, but only
2.65% with 10 parallel requests, for the same bad URL list?



The timeout exception is:


Traceback (most recent call last):
Failure: twisted.internet.error.TimeoutError: User timeout caused connection failure.

---------------------



I will attach my test script, a spreadsheet with detailed statistics, and
other material (so you can take a closer look at the bug) to this bug report:


Content of the zip file:


------------------
The URL lists:
------------------


  • the good urls: miscurls.py
  • the bad urls: webproxys.py



--------------------------------
The script (WebProxyChecker.py):
--------------------------------


Input (via the command line):


  • first parameter: number of parallel requests
  • second parameter: the URL list "mode":
    • "miscurls" for the good URL list (20,000 URLs)
    • "webproxy" for the bad URL list (20,000 URLs)
  • third parameter: number of requests
    • (the script selects the URLs from the URL list using random.sample())


example: python WebProxyChecker.py 20000 webproxy 50


The script prints statistics and all exceptions to stdout.


---------------
bash_script.sh
---------------
Calls the test script with many different parameters.
(It runs for about 30 hours on my machine, with 526,100 requests.)


usage example: bash bash_script.sh | tee WebProxyCheckStatRaw.nfo


----------
parser.py
----------


Parser for the output of the bash script, which generates
a) a shelve db for easy access to the data (incl. all received exceptions) and
b) a spreadsheet for a better overview.


----------------
The spreadsheet
----------------


  1. column: number of parallel requests (first command line parameter of WebProxyChecker.py)
  2. column: URL list (second command line parameter of WebProxyChecker.py)
  3. column: number of requests (third command line parameter of WebProxyChecker.py)
  4. column: number of successful requests
  5. column: errors (number of received exceptions)
  6. column: user timeouts (number of the strange user timeouts)
  7. column: % requests with user timeouts -> column_6 / (column_3 / 100)
  8. column: duration of all requests (in seconds)
  9. column: duration per request (in seconds)
  10. column: the key in the shelve db for this record (for example, to look at the exceptions)
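
The formula for column 7 can be written out as a small helper. This is only an illustration: the function name is hypothetical, and the sample count of 530 timeouts is made up so that it reproduces the 2.65% figure quoted above; it is not taken from the spreadsheet.

```python
def timeout_percentage(user_timeouts, total_requests):
    """Return the share of requests that hit a user timeout, in percent.

    Mirrors the spreadsheet formula: column_6 / (column_3 / 100).
    """
    if total_requests == 0:
        # No requests were made, so no percentage can be computed.
        return 0.0
    return user_timeouts / (total_requests / 100)


# Hypothetical input: 530 user timeouts out of 20,000 requests.
example = timeout_percentage(530, 20000)
```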



---------------
The shelve db
---------------


db key = the command as given on the command line (column 10 in the spreadsheet)

for example: python WebProxyChecker.py 20000 webproxy 50


    >>> import shelve
    >>> db = shelve.open('result_db.slv')
    >>> record = [r for r in db.values()][0]
    >>> record['parallel_requests'] # 1. spreadsheet column
    '250'
    >>> record['URL_mode'] # 2. spreadsheet column
    'miscurls'
    >>> record['checked_URLs'] # 3. spreadsheet column
    '100'
    >>> record['successful_requests'] # 4. spreadsheet column
    '99'
    >>> record['errors'] # 5. spreadsheet column
    '1'
    >>> record['seconds_altogether'] # 8. spreadsheet column
    '10.0048542023'
    >>> record['seconds_per_url'] # 9. spreadsheet column
    '0.100048542023'
    >>> exceptions = record['exceptions'] # all exceptions
    >>> for exception, data in exceptions.iteritems():
    ...     print exception # the original exception string
    ...     data['cnt'] # occurrence of this exception
    ...     data['urls'] # a set of the urls that caused this exception
    ...     data['vars'] # a dict of lists, with removed data (all strings between '', "" and <>)
    ...     break
    ... 
    Traceback (most recent call last):
    Failure: twisted.web.error.Error: 400 Bad Request
    1
    set(['http://www.cruise.ch'])
    {}
    >>> 

----------
The rest:
----------


  • WebProxyCheckStatRaw.nfo: output from the bash script.

Not essential; all data is also available in the shelve db or in the
spreadsheet, which are easier to use.

  • this text



-------------------------------------------------------


My System:


  • Ubuntu 8.04
  • Python 2.5.2
  • Twisted 8.1.0


Do you need additional information?


I have been hunting this bug for over a week, but I don't know the
Twisted internals and can't find it.


Sorry for my terrible, poor English.

comment:7 Changed 6 years ago by exarkun

  • Keywords httpclient added

comment:8 Changed 4 years ago by psykidellic

  • Cc riteshn@… added

comment:9 Changed 4 years ago by <automation>

  • Owner jknight deleted

comment:10 Changed 3 years ago by therve

  • Owner set to therve
  • Status changed from reopened to new

comment:11 Changed 3 years ago by therve

  • Author set to therve
  • Branch set to branches/agent-connect-timeout-3450

(In [32164]) Branching to 'agent-connect-timeout-3450'

comment:12 Changed 3 years ago by therve

  • Keywords review added
  • Owner therve deleted

Here it is. I added support to customize the connection timeout to Agent, and I added bindAddress at the same time as it sounded trivial. Please review.

comment:13 Changed 3 years ago by jonathanj

  • Keywords review removed
  • Owner set to therve

I think "argument which forwarded to the" should read "argument which is forwarded to the", times four.

Looks good to merge.

comment:14 Changed 3 years ago by exarkun

There's no test for the negative case. Since this is a security-related feature, it's probably doubly important to exercise the code path that causes secure cookies for a particular request to not be sent over an insecure transport (though even for non-security related cases, it's important to test positive and negative paths).

comment:15 Changed 3 years ago by exarkun

My previous comment was meant for another ticket.

comment:16 Changed 3 years ago by therve

  • Resolution set to fixed
  • Status changed from new to closed

(In [32244]) Merge agent-connect-timeout-3450

Author: therve
Reviewer: jonathanj
Fixes: #3450

Support the parametrization of connection timeout and bind address in
twisted.web.client.Agent.
