Opened 2 years ago

Closed 2 years ago

#8967 enhancement closed fixed (fixed)

twisted.internet.test.test_newtls.ProducerTestsMixin tests are flaky

Reported by: hawkowl Owned by: Glyph
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch:
Author:

Description

[ERROR]
Traceback (most recent call last):
  File "/buildslave/freebsd10-py2_7/Twisted/build/py27-alldeps-nocov-posix/lib/python2.7/site-packages/twisted/internet/test/test_newtls.py", line 124, in test_producerSSLFromStart
    SSLCreator())
  File "/buildslave/freebsd10-py2_7/Twisted/build/py27-alldeps-nocov-posix/lib/python2.7/site-packages/twisted/internet/test/connectionmixins.py", line 178, in runProtocolsWithReactor
    reactorBuilder.runReactor(reactor)
  File "/buildslave/freebsd10-py2_7/Twisted/build/py27-alldeps-nocov-posix/lib/python2.7/site-packages/twisted/internet/test/reactormixins.py", line 323, in runReactor
    "reactor still running after %s seconds" % (timeout,))
twisted.internet.test.reactormixins.TestTimeoutError: reactor still running after 120.0 seconds

twisted.internet.test.test_newtls.ProducerTestsMixin_SelectReactorTests.test_producerSSLFromStart

Change History (13)

comment:1 Changed 2 years ago by hawkowl

I'm giving up on this. It seems that the client doesn't connect to the server. Things it isn't:

  • A bug in connectionmixins.
  • listen or bind raising EAGAIN (probably).
  • connect raising EAGAIN (probably).
  • using connectionMade vs handshakeRecieved.

????? good luck

comment:2 in reply to:  1 Changed 2 years ago by mark williams

Replying to hawkowl:

Things it isn't:

[snip]

  • using connectionMade vs handshakeRecieved.

I went ahead and tested this out after being aggravated by the bug and found that switching ProducerProtocol.connectionMade to handshakeCompleted did fix the issue! I did have to patch something to make that even possible:

https://twistedmatrix.com/trac/ticket/8973

I'm not ready to convert the tests to use this just yet, though. I'm still worried there's an authentic race condition on BSDs.

Last edited 2 years ago by mark williams (previous) (diff)

comment:3 Changed 2 years ago by hawkowl

I can still trigger it when I switched it to using handshakeCompleted, though. It's just much less rare. (Also, the bug still exists, even if the tests stop failing :))

comment:4 Changed 2 years ago by mark williams

Progress! The test hangs on FreeBSD when the client aborts the connection and sends an RST before the server can accept it, causing accept(2) to fail with ECONNABORTED and thus prevent TLSMemoryBIOFactory from creating the protocol whose _done Deferred holds up the test.

See https://github.com/twisted/twisted/blob/trunk/src/twisted/internet/tcp.py#L1044-L1062

That's why this doesn't manifest with a non-TLS version of the test (e.g., replacing SSLCreator with TCPCreator) and why this particular issue doesn't manifest with handshakeCompleted. TLSMemoryBIOProtocol.loseConnection aborts the connection if the handshake isn't done.

I think for this instance of the bug, the right thing to do is make ProducerProtocol an IHandshakeListener.

More research is needed for macOS.

Last edited 2 years ago by mark williams (previous) (diff)

comment:5 Changed 2 years ago by mark williams

The situation on macOS is weirder!

You can use the following script to send a reset to a server before it can call accept(2):

import argparse
import socket
import sys
import struct


def no_time_wait(listener):
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)


def send_rst(connection):
    connection.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.pack("ii", 1, 0))


def server(_):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    no_time_wait(listener)

    listener.bind(('localhost', 0))
    listener.listen(1)
    _, port = listener.getsockname()
    print("Listening on port %d."
          "  Press enter after the client has connected." % (port,))
    sys.stdin.readline()
    listener.accept()


def client(parsed):
    connection = socket.create_connection(('localhost', parsed.port))
    send_rst(connection)
    connection.close()


parser = argparse.ArgumentParser("Can accept(2) can fail with ECONNABORTED?")
subparsers = parser.add_subparsers()

server_parser = subparsers.add_parser("server")
server_parser.set_defaults(function=server)

client_parser = subparsers.add_parser("client")
client_parser.add_argument("port", type=int)
client_parser.set_defaults(function=client)

parsed = parser.parse_args()
parsed.function(parsed)

Run the server somewhere:

$ python econnaborted.py server
Listening on port NNNN.  Press enter after the client has connected.

Run the client somewhere else on the same machine:

$ python econnaborted.py client NNNN

Go back to the server, press enter, and see what accept did.

Here are my results:

Platform Result
Recent Linux (4.8.0) accept(2) returns a connection without raising an exception
FreeBSD 10.3-RELEASE accept(2) fails with ECONNABORTED
macOS Sierra accept(2) does not return (!)

The fact that macOS' accept(2) never returns explains why no log message appears.

I'm sure now that switching to handshakeCompleted will effectively eliminate this bug. I'm not at all sure why it hasn't appeared to totally eliminate it!

comment:6 in reply to:  5 Changed 2 years ago by mark williams

Replying to mark williams:

You can use the following script to send a reset to a server before it can call accept(2):

...except that it skips a step: it should first poll the socket for readability, then call accept(2).

Here's a correct version:

import argparse
import select
import socket
import sys
import struct


def no_time_wait(listener):
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)


def send_rst(connection):
    connection.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.pack("ii", 1, 0))


def server(_):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    no_time_wait(listener)

    listener.bind(('localhost', 0))
    listener.listen(1)
    _, port = listener.getsockname()
    print("Listening on port %d."
          "  Press enter after the client has connected." % (port,))
    sys.stdin.readline()
    readable, writable, exceptional = select.select([listener], 
                                                    [listener], 
                                                    [listener])
    print("readable: %r" % (readable,))
    print("writable: %r" % (writable,))
    print("exceptional: %r" % (writable,))
    if listener in readable:
        listener.accept()


def client(parsed):
    connection = socket.create_connection(('localhost', parsed.port))
    send_rst(connection)
    connection.close()


parser = argparse.ArgumentParser("Can accept(2) can fail with ECONNABORTED?")
subparsers = parser.add_subparsers()

server_parser = subparsers.add_parser("server")
server_parser.set_defaults(function=server)

client_parser = subparsers.add_parser("client")
client_parser.add_argument("port", type=int)
client_parser.set_defaults(function=client)

parsed = parser.parse_args()
parsed.function(parsed)

Run it the same way. And here are the results:

Platform select accept
Recent Linux (4.8.0) Listener is readable returns a connection without raising an exception
FreeBSD 10.3-RELEASE Listener is readable fails with ECONNABORTED
macOS Sierra does not return (!) N/A

That select(2) under macOS doesn't see the listener as readable explains why t.i.tcp.Port.doRead never runs.

comment:7 Changed 2 years ago by Glyph

What Sierra version are you testing on? For me, I get:

Listening on port 64189.  Press enter after the client has connected.

readable: [<socket._socketobject object at 0x10b562130>]
writable: []
exceptional: []

from the server, and the client just exits immediately on the first invocation.

comment:8 Changed 2 years ago by mark williams

Should have been El Capitan. Time for a new table!

Platform select accept
Recent Linux (4.8.0) Listener is readable returns a connection without raising an exception
FreeBSD 10.3-RELEASE Listener is readable fails with ECONNABORTED
OS X El Capitan does not return (!) N/A
macOS Sierra* Listener is readable returns a connection without raising an exception
  • Only sometimes?

comment:9 Changed 2 years ago by mark williams

https://github.com/twisted/twisted/pull/665

This should fix the FreeBSD builders at least...

Last edited 2 years ago by mark williams (previous) (diff)

comment:10 Changed 2 years ago by mark williams

Keywords: review added

comment:11 Changed 2 years ago by mark williams

I patched two additional issues.

One was in twisted.internet.test.test_ssl: ImmediatelyDisconnectingProtocol failed on macOS for what seems to be the same reason for the FreeBSD failure in test_producerSSLFromStart.

I also changed CloseAfterHandshake to be an IHandshakeListener even though no tests failed. Consistency!

The PR appears to have stabilized the FreeBSD builders:

https://buildbot.twistedmatrix.com/builders/freebsd10-py2.7-coverage/builds/82

https://buildbot.twistedmatrix.com/builders/freebsd10-py3.5-coverage/builds/79

comment:12 Changed 2 years ago by Glyph

Keywords: review removed
Owner: set to Glyph

Reviewed & approved. Since there's already clean CI, will land momentarily.

comment:13 Changed 2 years ago by Glyph <glyph@…>

Resolution: fixed
Status: newclosed

In aeeffc7:

Merge pull request #665 from twisted/8967-test_newtls-avoid-econnaborted

Author: markrwilliams

Reviewer: glyph

Fixes: ticket:8967

Fix test reliability bug in twisted.internet.test.test_newtls.ProducerTestsMixin by waiting for handshake completion for completion-sensitive tests.

Note: See TracTickets for help on using tickets.