Opened 12 years ago

Closed 9 years ago

#3233 defect closed fixed (fixed)

twisted.test.test_tcp.LargeBufferTestCase.testWriter fails with iocp reactor

Reported by: therve Owned by: PenguinOfDoom
Priority: highest Milestone:
Component: core Keywords:
Cc: ivank Branch: branches/iocp-tcp-large-write-3233-2
branch-diff, diff-cov, branch-cov, buildbot
Author: pahan, therve

Description

For example, on a recent buildot run:

[FAIL]: twisted.test.test_tcp.LargeBufferTestCase.testWriter

Traceback (most recent call last):
  File "c:\twistedbuildbot24\WXP32-full2.5-iocp\Twisted\twisted\test\test_tcp.py", line 1506, in check
    "(%d != %d)" % (clientF.len, self.datalen))
twisted.trial.unittest.FailTest: client didn't receive all the data it expected (0 != 62914560)

I think the fix is fairly trivial: writeToHandle doesn't limit the quantity of data sent.

Change History (23)

comment:1 Changed 12 years ago by therve

author: therve
Branch: branches/iocp-tcp-large-write-3233

(In [23607]) Branching to 'iocp-tcp-large-write-3233'

comment:2 Changed 12 years ago by therve

Keywords: review added
Owner: therve deleted

Hooray for self-contained change! Please review.

comment:3 Changed 12 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to therve

So simple looking. :)

Can you add better docs for IWriteHandle.writeToHandle and give the implementation a docstring as well? In particular, it seems like the return value is somewhat important.

comment:4 Changed 12 years ago by therve

Keywords: review added
Owner: therve deleted

I've added some doc. This is not totally clear to me as well, so I tried to explain what I understood.

comment:5 Changed 12 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to therve

Cool. Unfortunately, I still see some failures on some builders. eg, on Vista:

===============================================================================
[FAIL]: twisted.test.test_tcp.LargeBufferTestCase.testWriter

Traceback (most recent call last):
  File "c:\twistedbuildbot\Vista-full2.5-scmikes-iocp\Twisted\twisted\test\test_tcp.py", line 1506, in check
    "(%d != %d)" % (clientF.len, self.datalen))
twisted.trial.unittest.FailTest: client didn't receive all the data it expected (62913428 != 62914560)
===============================================================================

The xp machines seem happier now though. Do we have any idea what the difference might be? If it's really fixed on xp, then I see no reason not to merge now; vista is not a supported platform. I just wonder if it's really fixed on xp or if it's just getting lucky and some random irrelevant difference on vista is revealing a real problem that's obscured on xp.

comment:6 Changed 11 years ago by ivank

Author: thervetherve, ivank

I see this bug on Server 2008 SP1 (Python 2.6.2 32 bit, Twisted r27226). It seems to always fail, and does so with wildly varying numbers:

C:\temp\temp3>python C:\Python26\Scripts\trial.py -r iocp twisted.test.test_tcp.LargeBufferTestCase.testWriter
twisted.test.test_tcp
  LargeBufferTestCase
    testWriter ...                                                       [FAIL]

===============================================================================
[FAIL]: twisted.test.test_tcp.LargeBufferTestCase.testWriter

Traceback (most recent call last):
  File "C:\Python26\lib\site-packages\twisted\test\test_tcp.py", line 1544, in check
    "(%d != %d)" % (clientF.len, self.datalen))
twisted.trial.unittest.FailTest: client didn't receive all the data it expected (62907020 != 62914560)
-------------------------------------------------------------------------------
Ran 1 tests in 0.633s

FAILED (failures=1)

C:\temp\temp3>python C:\Python26\Scripts\trial.py -r iocp twisted.test.test_tcp.LargeBufferTestCase.testWriter
twisted.test.test_tcp
  LargeBufferTestCase
    testWriter ...                                                       [FAIL]

===============================================================================
[FAIL]: twisted.test.test_tcp.LargeBufferTestCase.testWriter

Traceback (most recent call last):
  File "C:\Python26\lib\site-packages\twisted\test\test_tcp.py", line 1544, in check
    "(%d != %d)" % (clientF.len, self.datalen))
twisted.trial.unittest.FailTest: client didn't receive all the data it expected (25256540 != 62914560)
-------------------------------------------------------------------------------
Ran 1 tests in 0.177s

Also:

twisted.trial.unittest.FailTest: client didn't receive all the data it expected (233600 != 62914560)
twisted.trial.unittest.FailTest: client didn't receive all the data it expected (16116372 != 62914560)
twisted.trial.unittest.FailTest: client didn't receive all the data it expected (2854300 != 62914560)
twisted.trial.unittest.FailTest: client didn't receive all the data it expected (9193620 != 62914560)

http://twistedmatrix.com/trac/ticket/3567

comment:7 Changed 11 years ago by ivank

Please ignore the last link to an irrelevant ticket in my previous comment.

comment:8 Changed 11 years ago by ivank

Author: therve, ivanktherve
Cc: ivank added

sorry, I meant to Cc

comment:9 Changed 10 years ago by ghazel

I'm not sure this problem is related to writes or write size at all. But slowing down or spacing out the writes might prevent the test from failing since the receiving side isn't reading quite as fast.

I did a little debugging of this test failure on Windows 7. Here's what I found:

The receiving side indicates a clean socket closure first, and then the sending side gets a RST - which indicates Windows thought there was pending IO on the socket.

iocpsupport claims WSARecv is returning 0 with a lpNumberOfBytesRecvd of 0. MSDN claims "For byte streams, zero bytes having been read (as indicated by a zero return value to indicate success, and lpNumberOfBytesRecvd value of zero) indicates graceful closure and that no more bytes will ever be read."

However, if I ignore this occurrence when WSARecv returns immediately (but continue to treat it as EOF when a callback presents this 0,0 result) I am able to continue reading bytes from the socket, and the test always passes.

I am much more inclined to think this is a mis-use of IOCP somewhere than a bug in WSARecv.

comment:11 Changed 10 years ago by ghazel

Ok, SetFileCompletionNotificationModes has core affinity problems and does not fix the problem here, so let's avoid that for now.

Looking at boost::asio, which seems to use IOCP successfully even at high speeds, the return value of WSARecv is only considered to complete an async operation if an error occurs. Other completions, including EOF, are only processed when the overlapped is returned by GQCS.

So, iocpcreator should probably be modified to do the same. It certainly solves this bug.

comment:12 Changed 9 years ago by <automation>

Owner: therve deleted

comment:13 Changed 9 years ago by pahan

Author: thervepahan, therve
Branch: branches/iocp-tcp-large-write-3233branches/iocp-tcp-large-write-3233-2

(In [31017]) Branching to 'iocp-tcp-large-write-3233-2'

comment:14 Changed 9 years ago by PenguinOfDoom

Keywords: review added
Priority: normalhighest

I removed immediate success handling from every operation

comment:15 Changed 9 years ago by Jean-Paul Calderone

Owner: set to Jean-Paul Calderone
Status: newassigned

comment:16 Changed 9 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from Jean-Paul Calderone to PenguinOfDoom
Status: assignednew

Woop. Thanks for working on this.

  1. I'm not sure I fully understand the code. You said you removed the immediate success handling, but there is lots of code that does handleRead if rc != ERROR_IO_PENDING. That seems like what I would call "immediate success handling". What's up?
  2. Could the doRead in udp.py go away and be replaced by a suitable implementation of readFromHandle, which the doRead in abstract.py will call? Also why is evt.buff set to self._readBuffers in one place and self._readBuffers[0] in another?
  3. Some style issues here and there - stop using underscores in the middle of variables, add a few more docstrings (to changed methods) to explain what they do a little bit.
  4. There's the test that fails on POSIX that should be skipped, as we discussed IRL

comment:17 Changed 9 years ago by PenguinOfDoom

  1. There is immediate success (rc == 0) and immediate failure (rc not in [0, ERROR_IO_PENDING]). The latter doesn't put an event into the completion queue and has to be handled immediately.
  2. That code is, indeed, icky. I opened #4959 to refactor it.
  3. #4959 is probably a better place for mass renames.
  4. The test is now skipped if the imports fail.

comment:18 Changed 9 years ago by PenguinOfDoom

Keywords: review added
Owner: changed from PenguinOfDoom to Jean-Paul Calderone

comment:20 Changed 9 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from Jean-Paul Calderone to PenguinOfDoom

Woop. Hooray. The implementation and test coverage looks good. There are a couple minor issues with the test code itself:

  1. If you can remove the duplication of pauseProducing/resumeProducing in StopStartReadingProtocol methods without damaging readability too much, that'd be good.
  2. Some docstrings would be nice. Fewer methods will let you write fewer docstrings, fwiw. ;)
  3. Rename the terrible methods if you like
  4. Use assertTrue instead of assert_, or use assertEquals if there's an == in the expression passed in.
  5. don't use under_scores, use camelCase
  6. Don't use semicolons!

comment:21 Changed 9 years ago by PenguinOfDoom

Keywords: review added
Owner: changed from PenguinOfDoom to Jean-Paul Calderone

comment:22 Changed 9 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from Jean-Paul Calderone to PenguinOfDoom

This looks sort of great now. The gtk2reactor/glib2reactor failures need to be addressed. It should be skipped if you don't want to figure out how it might be fixed.

comment:23 Changed 9 years ago by pahan

Resolution: fixed
Status: newclosed

(In [31217]) Merge iocp-tcp-large-write-3233-2

Author: PenguinOfDoom Reviewer: exarkun Fixes: #3233

Remove immediate successful completion handling for all overlapped operations.

Note: See TracTickets for help on using tickets.