[Twisted-Python] How do I debug this network problem?

Peter Westlake peter.westlake at pobox.com
Fri Nov 21 17:50:25 MST 2014


On Fri, 21 Nov 2014, at 19:26, Glyph wrote:
>> On Nov 21, 2014, at 15:13, Peter Westlake
>> <peter.westlake at pobox.com> wrote:
>>
>> I *am* missing something obvious. The file opened by open() immediately
>> goes out of scope. AAAUGH!
>
> So... back to square one? Or is this the solution to your problem? I
> don't entirely follow how this connects..

I didn't explain, sorry. This does indeed solve my problem!

It's a mistake in very nearly the first line of the first real Python
program I wrote. It goes like this:

devnull = open('/dev/null').fileno()

# 1. Create a Python file object on /dev/null. Its file descriptor is
#    the first one available, 3.
# 2. Use it in an expression, and store the FD (3) in the variable
#    devnull.
# 3. Lose the last reference to it.
# 4. *The file object is garbage collected and the file is closed.*
# 5. The variable devnull remembers the file descriptor number of this
#    short-lived file.

reactor.connectTCP('localhost', 'http', factory)
# 6. Open a socket. Its file number is the first one available, which is
#    3 again.

reactor.callWhenRunning(reactor.spawnProcess, ProcessProtocol(),
                        '/bin/sleep', args=['/bin/sleep', '1000'],
                        childFDs={0: devnull, 1: 'r', 2: 'r'})

# 7. Pass the file on descriptor 3 to the spawned process. But it's the
#    socket, not /dev/null!
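
The descriptor reuse in steps 1 to 6 can be reproduced without Twisted.
This is only a minimal sketch, assuming CPython (where reference counting
closes the unreferenced file object straight away) and a POSIX system
(where a new descriptor gets the lowest free number). The function name
stale_and_socket_fds is made up for the illustration, and a plain
unconnected socket stands in for the connectTCP call.

```python
import os
import socket


def stale_and_socket_fds():
    """Return (stale_fd, socket_fd), following steps 1-6 above."""
    # The temporary file object has no other reference, so CPython's
    # reference counting closes it as soon as this statement finishes.
    # Only the bare descriptor number survives.
    stale_fd = open(os.devnull).fileno()

    # POSIX hands out the lowest free descriptor, so the socket is
    # expected to receive the number that was just released.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        return stale_fd, sock.fileno()
    finally:
        sock.close()


stale_fd, sock_fd = stale_and_socket_fds()
print("stale devnull fd:", stale_fd, "socket fd:", sock_fd)
```

Under those assumptions the two numbers should come out the same, which
is exactly how the child process ended up with the socket as its stdin.
On an interpreter without reference counting the file may stay open
longer and the numbers can differ.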

In the real program the socket is an AMP connection, and the
spawnProcess is a vast tree of processes that build a large and complex
piece of software. Eventually the build process reaches an SSH call, and
for some reason SSH sometimes reads from its stdin. I don't know why,
and I don't know why it always tried to read at precisely the same time
as Twisted. But I do know that it does, thanks to a systemtap probe.

To fix the bug (and it *is* fixed, and passes testing) I had the option
of either saving the file object:

devnull = open('/dev/null')
...
spawnProcess(...childFDs={0: devnull.fileno(), 1: 'r', 2: 'r'}...)

or closing stdin:

spawnProcess(...childFDs={1: 'r', 2: 'r'}...)

The fact that the AMP connection ever worked at all showed that the
build process didn't try to read stdin, so I chose the latter option.
The SSH still works, so I suppose it is using a select() or similar
to read stdin. That would account for it being synchronised with
Twisted: when they are both waiting for data, both of them see it at
once and collide.
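
For comparison, the first option (saving the file object) can be checked
with the standard library alone. Again this is a sketch rather than the
code from the real program: it assumes a POSIX system where /dev/null is
a character device, and the names fds_differ and is_char_device are made
up for the illustration.

```python
import os
import socket
import stat

# Holding the file object in a variable keeps /dev/null open, so its
# descriptor number cannot be handed out again.
devnull = open(os.devnull)
devnull_fd = devnull.fileno()

# A descriptor opened afterwards therefore gets a different number.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fds_differ = sock.fileno() != devnull_fd

# The saved number still refers to a character device (/dev/null on
# POSIX), not to the socket.
is_char_device = stat.S_ISCHR(os.fstat(devnull_fd).st_mode)

print("descriptors differ:", fds_differ)
print("saved fd is a character device:", is_char_device)

sock.close()
devnull.close()
```

The file object has to stay referenced for as long as the descriptor
number is in use, which in the real program means until the process has
been spawned.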

I suspect that the reason this went away in 2009 was because a load of
SSH calls were removed (they were to Windows machines, and very
unreliable!), and the reason it came back now was because a load more
were put in. The system was moved behind a firewall, and had to do a lot
of its former work by remote control using SSH.

Again, thank you for all the help. I do like Twisted!

Peter.
