Opened 5 years ago

Last modified 4 years ago

#5921 defect new

Trial fails flushing reporter stream with ENOSPC on Windows

Reported by: Itamar Turner-Trauring Owned by: Itamar Turner-Trauring
Priority: normal Milestone:
Component: core Keywords:
Cc: Tom Prince, Jean-Paul Calderone Branch: branches/ENOSPC-5921-4
branch-diff, diff-cov, branch-cov, buildbot
Author: itamarst, itamar, exarkun, glyph

Description

When running under a buildslave on Windows, trial is writing to stdout, attempts to flush, and sometimes gets ENOSPC:

...
  File "C:\Users\Build Slave\bot-glyph-6\windows7-64-py2.7-select\Twisted\twisted\trial\unittest.py", line 752, in run
    result.startTest(self)
  File "C:\Users\Build Slave\bot-glyph-6\windows7-64-py2.7-select\Twisted\twisted\trial\reporter.py", line 765, in startTest
    self._write('%s ... ', tm.id())
  File "C:\Users\Build Slave\bot-glyph-6\windows7-64-py2.7-select\Twisted\twisted\trial\reporter.py", line 436, in _write
    untilConcludes(self._stream.flush)
  File "C:\Users\Build Slave\bot-glyph-6\windows7-64-py2.7-select\Twisted\twisted\python\util.py", line 776, in untilConcludes
    return f(*a, **kw)
IOError: [Errno 28] No space left on device

Here's something that may be relevant: https://lists.gnu.org/archive/html/bug-gnulib/2011-04/msg00169.html

It suggests that untilConcludes or maybe trial's SafeStream should treat ENOSPC on writes to pipes the same as EAGAIN, at first glance.

Change History (23)

comment:1 Changed 5 years ago by Itamar Turner-Trauring

Summary: Trial fails flushing log with ENOSPC on WindowsTrial fails flushing reporter stream with ENOSPC on Windows

comment:2 Changed 5 years ago by itamarst

Author: itamarst
Branch: branches/ENOSPC-5921

(In [35395]) Branching to 'ENOSPC-5921'

comment:3 Changed 5 years ago by Itamar Turner-Trauring

Author: itamarstitamar
Cc: Tom Prince Jean-Paul Calderone added

The attempted fix I checked in seemed to work (http://buildbot.twistedmatrix.com/builders/windows7-64-py2.7/builds/1636 passed with select() reactor). At this point it needs to be expanded to cover write() as well as flush(). And, of course, limited to Windows pipes (an infinite loop on ENOSPC on the filesystem isn't a good idea).

comment:4 Changed 5 years ago by Itamar Turner-Trauring

Owner: set to Itamar Turner-Trauring

comment:5 Changed 5 years ago by Itamar Turner-Trauring

Other thoughts: this may need to be fixed in stdio and process support as well. And of course we should find and read relevant MSDN docs before proceeding.

comment:6 Changed 5 years ago by Glyph

Relevant references: _write, _doserrno, don't have anything to say about this. ENOSPC is listed as just meaning that there's no disk space left.

comment:7 Changed 5 years ago by Glyph

Also nothing on _pipe.

comment:8 Changed 5 years ago by Glyph

Based on my research just now I think that this behavior may just be an undocumented bug, and it would be okay to treat the GNU mailing list message as authoritative.

comment:9 Changed 5 years ago by Itamar Turner-Trauring

Did you manage to reproduce it in a test script? I'm planning to include one as part of the review submission.

comment:10 Changed 5 years ago by itamarst

Author: itamaritamarst, itamar
Branch: branches/ENOSPC-5921branches/ENOSPC-5921-2

(In [35852]) Branching to 'ENOSPC-5921-2'

comment:11 Changed 5 years ago by Itamar Turner-Trauring

It occurs to me that calling untilConcludes(file.write, ...) is a bug: file.write may call os.write multiple times, and only the last one that raised the EINTR/EAGAIN should be retried. This suggests that ENOSPC handling should be moved into untilConcludes, and SafeStream refactored to use os.write.

First step is getting a test to reproduce the underlying problem.

comment:12 Changed 5 years ago by Itamar Turner-Trauring

http://buildbot.twistedmatrix.com/builders/winxp32-py2.7/builds/1466/steps/select/logs/stdio demonstrates the brokenness of current solution. The end looks like this:

===============================================================================
[TODO]
Reason: 'This may be impossible but it sure would be nice.'
Traceback (most recent call last):
  File "C:\buildslave\twisted\winxp32-py2.7\Twisted\twisted\test\test_modules.py", line 357, in test_underUnderPathNotAlreadyImported
    self._underUnderPathTest(False)
  File "C:\buildslave\twisted\winxp32-py2.7\Twisted\twisted\test\test_modules.py", line 341, in _underUnderPathTest
    importPackages=doImport),
  File "C:\buildslave\twisted\winxp32-py2.7\Twisted\twisted\python\test\modules_helpers.py", line 26, in findByIteration
    self.fail("Unable to find module %r through iteration." % (modname,))
twisted.trial.unittest.FailTest: Unable to find module 'pyspacetests3.foozle' through iteration.

twisted.test.test_modules.PathModificationTest.test_underUnderPathNotAlreadyImported
===============================================================================
[TODO]

twisted.test.test_modules.RebindingTest.test_underUnderPathNotAlreadyImported

[TODO]



[TODO]



[TODO]





program finished with exit code 1
elapsedTime=306.156000

comment:13 Changed 5 years ago by Glyph

Author: itamarst, itamaritamarst, itamar, glyph
Branch: branches/ENOSPC-5921-2branches/ENOSPC-5921-3

(In [36863]) Branching to 'ENOSPC-5921-3'

comment:14 Changed 5 years ago by Tom Prince

#1305 looks related which also points to http://bugs.python.org/issue11395

comment:15 Changed 4 years ago by Jean-Paul Calderone

(In [38233]) test cleanups

  • split up a test into two to exercise each case separately
  • be explicit about bytes in the SafeStream tests
  • wrap long lines
  • minor docstring improvements
  • make the error parameter to BrokenStream required for explicitness
  • factor out IOError creation to remove redundancy
  • fix some possible fd leaks

refs #5921

comment:16 Changed 4 years ago by Jean-Paul Calderone

Author: itamarst, itamar, glyphitamarst, itamar, exarkun, glyph
Branch: branches/ENOSPC-5921-3branches/ENOSPC-5921-4

(In [38241]) Branching to 'ENOSPC-5921-4'

comment:17 Changed 4 years ago by Jean-Paul Calderone

Well I thought I had some ideas, but they didn't come to anything. Basically just churned the code a little bit. I 'hope' the test coverage is at least a little better overall now, but I made it at least a little worse by getting rid of the ENOSPC/flush coverage because I thought that wasn't a real case (but it turns out it is a real case).

The branch basically fails the same way now as it did when I started, with the weird semi-formed summary that itamar quoted in his comment above. I also noticed other problems with the output, such as a big grouping of empty lines seemingly randomly inserted in the middle of a list of skips. I'm beginning to think naive retry on encountering ENOSPC is not a correct strategy (nor even the retry I implemented, where each write is attempted with fewer and fewer bytes). It's hard to explain how the observed behavior could be happening without the underlying write discarding some of the bytes even when it claims success.

I also observed that test.log appears truncated in the buildbot build report. It ends at twisted.test.test_twistd.TapFileTest.test_createOrGetApplicationWithTapFile but this is not the last test in the suite. I noticed this after adding some debug logging to SafeStream.

I also noticed that ENOSPC errors seem to happen a lot after big runs of skipped tests, perhaps because these can run faster and therefore the console buffer (or whatever it is that runs out of space) has a greater chance of filling up at these times.

comment:18 Changed 4 years ago by Jean-Paul Calderone

Another problem here is that the colorizers don't use SafeStream - they write straight to sys.stdout. Whether this is part of the problem here, I don't know, but it's at least a possibility.

comment:19 Changed 4 years ago by Jean-Paul Calderone

Something I don't understand is that the gnu mailing list post talks about non-blocking pipes. Is stdout a non-blocking pipe on Windows? Why would it be? I don't see any code in trial that sets it non-blocking.

comment:20 Changed 4 years ago by Jean-Paul Calderone

Whether this is part of the problem here, I don't know, but it's at least a possibility.

Buildbot runs trial with --bwverbose which skips colorizers so I guess this is unrelated.

comment:21 in reply to:  19 Changed 4 years ago by Glyph

Replying to exarkun:

Something I don't understand is that the gnu mailing list post talks about non-blocking pipes. Is stdout a non-blocking pipe on Windows? Why would it be? I don't see any code in trial that sets it non-blocking.

Keep in mind that Buildbot uses Twisted, and spawns processes on Windows with spawnProcess. Perhaps the non-blocking-ness of a pipe is even more global on Windows than on Linux, and it actually makes both ends non-blocking? Or perhaps the way we're setting up the pipe is just stupidly setting both to be non-blocking?

comment:22 Changed 4 years ago by Jean-Paul Calderone

Indeed. I still don't quite understand why it's so hard to reproduce, though (considering I tried using actual stdout and ran the attempt using BuildBot).

comment:23 Changed 4 years ago by Glyph

Just to keep the tally updated, this came up again in this review.

Note: See TracTickets for help on using tickets.