Opened 17 months ago

Closed 5 months ago

Last modified 5 months ago

#9547 defect closed fixed (fixed)

twisted.web.client can fail with `exceptions.ValueError: list.remove(x): x not in list`

Reported by: Richard van der Hoff Owned by: Glyph
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch:
Author:

Description

If an outgoing HTTP request is cancelled between the connect call and the request being written to the socket, then we later get an unhandled exception reported by the reactor:

Unhandled Error
Traceback (most recent call last):
  File "./test.py", line 59, in <module>
    reactor.run()
  File "/home/rav/work/synapse/env3/lib/python3.5/site-packages/twisted/internet/base.py", line 1261, in run
    self.mainLoop()
  File "/home/rav/work/synapse/env3/lib/python3.5/site-packages/twisted/internet/base.py", line 1270, in mainLoop
    self.runUntilCurrent()
--- <exception caught here> ---
  File "/home/rav/work/synapse/env3/lib/python3.5/site-packages/twisted/internet/base.py", line 896, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/home/rav/work/synapse/env3/lib/python3.5/site-packages/twisted/internet/task.py", line 671, in _tick
    taskObj._oneWorkUnit()
  File "/home/rav/work/synapse/env3/lib/python3.5/site-packages/twisted/internet/task.py", line 521, in _oneWorkUnit
    self._completeWith(TaskFailed(), Failure())
  File "/home/rav/work/synapse/env3/lib/python3.5/site-packages/twisted/internet/task.py", line 477, in _completeWith
    self._cooperator._removeTask(self)
  File "/home/rav/work/synapse/env3/lib/python3.5/site-packages/twisted/internet/task.py", line 644, in _removeTask
    self._tasks.remove(task)
builtins.ValueError: list.remove(x): x not in list

The error is relatively harmless, but does cause noise and confusion which distracts from other problems.

I'll attach a script which demonstrates the problem. It's slightly contrived by sticking a sleep() call in the middle, but the race condition is otherwise hard to reproduce.

It seems two attempts are being made to remove the FileBodyProducer's write loop task - once when the request is cancelled, and again when the write loop later raises an exception.

Attachments (1)

test.py (1.6 KB) - added by Richard van der Hoff 17 months ago.
script demonstrating the issue

Download all attachments as: .zip

Change History (8)

Changed 17 months ago by Richard van der Hoff

Attachment: test.py added

script demonstrating the issue

comment:1 Changed 6 months ago by jchenggggg

Keywords: review added

Pull request: https://github.com/twisted/twisted/pull/1184

From the pull request:

twisted.web._newclient.Request._writeToBodyProducerContentLength returns a deferred that will be callbacked or errbacked depending on whether a Producer and Consumer pair are successful in writing the request body to the transport. This deferred has a canceller, which calls the Producer's canceller. However, the producer being used to reproduce the bug in 9547 (FileBodyProducer) has no canceller, so this just results in the cancel failure being passed to the producer's errback chain. This errback assumes that since the producer is errbacking, it should already have stopped producing, so stopProducing is not called in this case. However, the errback does stop the consumer.

This means that after the request has been cancelled, the producer will continue to produce as it is a scheduled task. But the consumer in this case (LengthEnforcingConsumer) does not expect the producer to continue to write to it, and so when the producer tries to write, execution comes to a branch at twisted/web/_newclient.py:992, which has a comment stating that if you've reached it then you've written a bug. In this branch, stopProducing is called on the producer (removing it from the task queue), and then an exception is raised (which causes the producer job to fail, and causes the task scheduler to try to remove the task from the queue again.) . This causes the exception reported in 9547.

This PR adds a canceller to FileBodyProducer. The canceller calls stopProducing, which will prevent the producer from trying to produce after the request has already been cancelled. As far as I can tell, FileBodyProducer is the only implementer of IBodyProducer for which it would make sense to have a canceller. The other way to do this would have been to have twisted.web._newclient.Request._writeToBodyProducerContentLength call stopProducing directly instead of calling cancel on the producer's deferred, but to me it seems to make more sense that Producers should be the ones to know how to cancel themselves.

comment:2 Changed 5 months ago by Tom Most

Owner: set to Tom Most
Status: newassigned

I'm assigning this to myself as I plan to do a review early next week. However, if anyone else gets to it sooner feel free to steal it.

comment:3 Changed 5 months ago by Glyph

Keywords: review removed
Owner: changed from Tom Most to jchenggggg
Status: assignednew

comment:4 Changed 5 months ago by jchenggggg

Keywords: review added
Owner: jchenggggg deleted

.

Last edited 5 months ago by Glyph (previous) (diff)

comment:5 Changed 5 months ago by Glyph

Keywords: review removed
Owner: set to Glyph

Ready to merge this now, just waiting for CI to finish up.

comment:6 Changed 5 months ago by Glyph <glyph@…>

Resolution: fixed
Status: newclosed

In e9b6f56:

Merge pull request #1184 from jchenggggg/9547-jchenggggg-make-filebodyproducer-cancellable

Author: jchenggggg

Reviewer: glyph

Fixes: ticket:9547

twisted.web.client.FileBodyProducer will now stop producing when the Deferred returned by FileBodyProducer.startProducing is cancelled.

comment:7 Changed 5 months ago by Glyph

Thank you for fixing this bug - your bearing with the process and driving this to completion is much appreciated!

Note: See TracTickets for help on using tickets.