Opened 10 years ago
Last modified 8 years ago
#5921 defect new
Trial fails flushing reporter stream with ENOSPC on Windows
Reported by: | Itamar Turner-Trauring | Owned by: | Itamar Turner-Trauring |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | core | Keywords: | |
Cc: | Tom Prince, Jean-Paul Calderone | Branch: |
branches/ENOSPC-5921-4
branch-diff, diff-cov, branch-cov, buildbot |
Author: | itamarst, itamar, exarkun, glyph |
Description
When running under a buildslave on Windows, trial is writing to stdout, attempts to flush, and sometimes gets ENOSPC:
... File "C:\Users\Build Slave\bot-glyph-6\windows7-64-py2.7-select\Twisted\twisted\trial\unittest.py", line 752, in run result.startTest(self) File "C:\Users\Build Slave\bot-glyph-6\windows7-64-py2.7-select\Twisted\twisted\trial\reporter.py", line 765, in startTest self._write('%s ... ', tm.id()) File "C:\Users\Build Slave\bot-glyph-6\windows7-64-py2.7-select\Twisted\twisted\trial\reporter.py", line 436, in _write untilConcludes(self._stream.flush) File "C:\Users\Build Slave\bot-glyph-6\windows7-64-py2.7-select\Twisted\twisted\python\util.py", line 776, in untilConcludes return f(*a, **kw) IOError: [Errno 28] No space left on device
Here's something that may be relevant: https://lists.gnu.org/archive/html/bug-gnulib/2011-04/msg00169.html
It suggests that untilConcludes
or maybe trial's SafeStream
should treat ENOSPC on writes to pipes the same as EAGAIN, at first glance.
Change History (23)
comment:1 Changed 10 years ago by
Summary: | Trial fails flushing log with ENOSPC on Windows → Trial fails flushing reporter stream with ENOSPC on Windows |
---|
comment:2 Changed 10 years ago by
Author: | → itamarst |
---|---|
Branch: | → branches/ENOSPC-5921 |
comment:3 Changed 10 years ago by
Author: | itamarst → itamar |
---|---|
Cc: | Tom Prince Jean-Paul Calderone added |
The attempted fix I checked in seemed to work (http://buildbot.twistedmatrix.com/builders/windows7-64-py2.7/builds/1636 passed with select()
reactor). At this point it needs to be expanded to cover write()
as well as flush()
. And, of course, limited to Windows pipes (an infinite loop on ENOSPC
on the filesystem isn't a good idea).
comment:4 Changed 10 years ago by
Owner: | set to Itamar Turner-Trauring |
---|
comment:5 Changed 10 years ago by
Other thoughts: this may need to be fixed in stdio and process support as well. And of course we should find and read relevant MSDN docs before proceeding.
comment:6 Changed 10 years ago by
Relevant references: _write, _doserrno, don't have anything to say about this. ENOSPC is listed as just meaning that there's no disk space left.
comment:8 Changed 10 years ago by
Based on my research just now I think that this behavior may just be an undocumented bug, and it would be okay to treat the GNU mailing list message as authoritative.
comment:9 Changed 10 years ago by
Did you manage to reproduce it in a test script? I'm planning to include one as part of the review submission.
comment:10 Changed 10 years ago by
Author: | itamar → itamarst, itamar |
---|---|
Branch: | branches/ENOSPC-5921 → branches/ENOSPC-5921-2 |
(In [35852]) Branching to 'ENOSPC-5921-2'
comment:11 Changed 10 years ago by
It occurs to me that calling untilConcludes(file.write, ...)
is a bug: file.write
may call os.write
multiple times, and only the last one that raised the EINTR/EAGAIN
should be retried. This suggests that ENOSPC
handling should be moved into untilConcludes
, and SafeStream
refactored to use os.write
.
First step is getting a test to reproduce the underlying problem.
comment:12 Changed 10 years ago by
http://buildbot.twistedmatrix.com/builders/winxp32-py2.7/builds/1466/steps/select/logs/stdio demonstrates the brokenness of current solution. The end looks like this:
=============================================================================== [TODO] Reason: 'This may be impossible but it sure would be nice.' Traceback (most recent call last): File "C:\buildslave\twisted\winxp32-py2.7\Twisted\twisted\test\test_modules.py", line 357, in test_underUnderPathNotAlreadyImported self._underUnderPathTest(False) File "C:\buildslave\twisted\winxp32-py2.7\Twisted\twisted\test\test_modules.py", line 341, in _underUnderPathTest importPackages=doImport), File "C:\buildslave\twisted\winxp32-py2.7\Twisted\twisted\python\test\modules_helpers.py", line 26, in findByIteration self.fail("Unable to find module %r through iteration." % (modname,)) twisted.trial.unittest.FailTest: Unable to find module 'pyspacetests3.foozle' through iteration. twisted.test.test_modules.PathModificationTest.test_underUnderPathNotAlreadyImported =============================================================================== [TODO] twisted.test.test_modules.RebindingTest.test_underUnderPathNotAlreadyImported [TODO] [TODO] [TODO] program finished with exit code 1 elapsedTime=306.156000
comment:13 Changed 9 years ago by
Author: | itamarst, itamar → itamarst, itamar, glyph |
---|---|
Branch: | branches/ENOSPC-5921-2 → branches/ENOSPC-5921-3 |
(In [36863]) Branching to 'ENOSPC-5921-3'
comment:14 Changed 9 years ago by
#1305 looks related which also points to http://bugs.python.org/issue11395
comment:15 Changed 9 years ago by
(In [38233]) test cleanups
- split up a test into two to exercise each case separately
- be explicit about bytes in the SafeStream tests
- wrap long lines
- minor docstring improvements
- make the error parameter to BrokenStream required for explicitness
- factor out IOError creation to remove redundancy
- fix some possible fd leaks
refs #5921
comment:16 Changed 9 years ago by
Author: | itamarst, itamar, glyph → itamarst, itamar, exarkun, glyph |
---|---|
Branch: | branches/ENOSPC-5921-3 → branches/ENOSPC-5921-4 |
(In [38241]) Branching to 'ENOSPC-5921-4'
comment:17 Changed 9 years ago by
Well I thought I had some ideas, but they didn't come to anything. Basically just churned the code a little bit. I 'hope' the test coverage is at least a little better overall now, but I made it at least a little worse by getting rid of the ENOSPC/flush coverage because I thought that wasn't a real case (but it turns out it is a real case).
The branch basically fails the same way now as it did when I started, with the weird semi-formed summary that itamar quoted in his comment above. I also noticed other problems with the output, such as a big grouping of empty lines seemingly randomly inserted in the middle of a list of skips. I'm beginning to think naive retry on encountering ENOSPC is not a correct strategy (nor even the retry I implemented, where each write is attempted with fewer and fewer bytes). It's hard to explain how the observed behavior could be happening without the underlying write discarding some of the bytes even when it claims success.
I also observed that test.log appears truncated in the buildbot build report. It ends at twisted.test.test_twistd.TapFileTest.test_createOrGetApplicationWithTapFile
but this is not the last test in the suite. I noticed this after adding some debug logging to SafeStream
.
I also noticed that ENOSPC errors seem to happen a lot after big runs of skipped tests, perhaps because these can run faster and therefore the console buffer (or whatever it is that runs out of space) has a greater chance of filling up at these times.
comment:18 Changed 8 years ago by
Another problem here is that the colorizers don't use SafeStream
- they write straight to sys.stdout
. Whether this is part of the problem here, I don't know, but it's at least a possibility.
comment:19 follow-up: 21 Changed 8 years ago by
Something I don't understand is that the gnu mailing list post talks about non-blocking pipes. Is stdout a non-blocking pipe on Windows? Why would it be? I don't see any code in trial that sets it non-blocking.
comment:20 Changed 8 years ago by
Whether this is part of the problem here, I don't know, but it's at least a possibility.
Buildbot runs trial with --bwverbose which skips colorizers so I guess this is unrelated.
comment:21 Changed 8 years ago by
Replying to exarkun:
Something I don't understand is that the gnu mailing list post talks about non-blocking pipes. Is stdout a non-blocking pipe on Windows? Why would it be? I don't see any code in trial that sets it non-blocking.
Keep in mind that Buildbot uses Twisted, and spawns processes on Windows with spawnProcess
. Perhaps the non-blocking-ness of a pipe is even more global on Windows than on Linux, and it actually makes both ends non-blocking? Or perhaps the way we're setting up the pipe is just stupidly setting both to be non-blocking?
comment:22 Changed 8 years ago by
Indeed. I still don't quite understand why it's so hard to reproduce, though (considering I tried using actual stdout and ran the attempt using BuildBot).
comment:23 Changed 8 years ago by
Just to keep the tally updated, this came up again in this review.
(In [35395]) Branching to 'ENOSPC-5921'