Opened 15 years ago

Last modified 9 years ago

#1964 enhancement new

Cleaning up the reactor in unit tests is hard

Reported by: moof Owned by:
Priority: normal Milestone:
Component: trial Keywords:
Cc: spiv, davidsarah Branch:
Author:

Description

After a discussion with jml tonight, I've discovered that cleaning up the reactor after a unit test is hard. The stuff in http://blackjml.livejournal.com/23029.html makes me think that I'm having to write far too much code just to accommodate my test suite, which should really just automate stuff for me.

I don't mind having to explicitly close the listening port, or the connecting client in a client-server interaction. Or waiting for their deferreds. But closing the server protocols is hard, especially if you're using somebody else's code that doesn't seem to be developed for this, such as twisted.spread.pb

As stated on that web page, trial should make it easy for me to clean the reactor of things that I need to clean it of. I agree that a dirty reactor shoudl raise an error.

Possible approaches to consider:

  • Give IListeningPort.stopListening a switch to clean up any protocols open before firing off a returned deferred. It shoudl be calling ConnectionLost at somepoint anyway, surely?
  • Have a wrapper for reactor.listen* that copes with such problems
  • Give trial a way to explicitly clean the reactor of certain types of object, or at least wait til the objects themselves time out.

Change History (10)

comment:1 Changed 15 years ago by spiv

Cc: spiv added

comment:2 Changed 15 years ago by Jonathan Lange

#1334 would probably go some way to fixing this.

comment:3 Changed 15 years ago by Glyph

Summary: Cleaning up the reactorin unit tests is hardCleaning up the reactor in unit tests is hard

comment:4 Changed 11 years ago by <automation>

Owner: Jonathan Lange deleted

comment:5 Changed 9 years ago by davidsarah

Cc: davidsarah added

A common reason why tests fail with an unclean reactor error is that there are DelayedCalls that would be triggered just after the test returns. If the delay is short, the test will fail nondeterministically, because sometimes the call will have been triggered before Trial checks for an unclean reactor, and sometimes not.

It arguably isn't a problem with the code under test that it does asynchronous cleanup. Indeed, it may be intentionally returning a result before doing cleanup for performance reasons. In that case, the changes needed to make the test wait long enough may be ugly and intrusive to the code under test. In Tahoe-LAFS we had this problem in several of our tests:

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1852

We fixed it by simply waiting for any short DelayedCalls to have completed. But to do that we had to read non-public attributes of the reactor. Trial could do that automatically for all tests (in a less hacky way that would reliably exclude the DelayedCall needed to implement the test timeout).

comment:6 Changed 9 years ago by zooko

See also pyutil's clean_pending(), which I think accomplishes a similar result without the delay. I've used it successfully in the past, but not recently.

comment:7 Changed 9 years ago by Jean-Paul Calderone

It arguably isn't a problem with the code under test that it does asynchronous cleanup.

Yes, that's certainly arguable. However, it's *simpler* if all event sources still in the reactor after the test completes are treated as errors. This seems very clear-cut to me, and I don't expect much disagreement. How does it seem to you?

If we agree it's simpler not to have special handling for event sources which will trigger "soon", then I would argue that we should stick with the simpler thing. Applications that don't want to deal with the additional complexity of cleaning up these event sources themselves have several options:

  1. Don't create the extra event sources in the first place (ie, exercise less code in your tests)
  2. Create the event sources in a throw-away reactor (eg a Clock instance or some other test double)
  3. Develop some re-usable cleanup logic and then use it. Given trial's support of tearDown, addCleanup, and simply return d.addBoth(doCleanup), this doesn't seem unreasonable.

What I would love to see is for the two reactor.iterate(0) calls trial imposes after every test method be eliminated. This would make trial's behavior simpler than it is now (which I maintain is a good thing), and isn't very difficult to do - but likely requires a deprecation period and for someone to actually undertake the work.

Another trial change that would be really excellent is to separate all of the global-reactor-related functionality from the basic testing functionality so that the unclean reactor errors/failures come from subclassing some TestCase-like class, but perhaps not twisted.trial.unittest.TestCase (or, similarly, so that trial provides a TestCase-like class that is not twisted.trial.unittest.TestCase which has all the great features of trial but not the unclean reactor error/failure behavior).

comment:8 in reply to:  7 ; Changed 9 years ago by Glyph

Replying to exarkun:

Yes, that's certainly arguable. However, it's *simpler* if all event sources still in the reactor after the test completes are treated as errors. This seems very clear-cut to me, and I don't expect much disagreement. How does it seem to you?

I agree with this pretty strongly.

Another trial change that would be really excellent is to separate all of the global-reactor-related functionality from the basic testing functionality so that the unclean reactor errors/failures come from subclassing some TestCase-like class, but perhaps not twisted.trial.unittest.TestCase (or, similarly, so that trial provides a TestCase-like class that is not twisted.trial.unittest.TestCase which has all the great features of trial but not the unclean reactor error/failure behavior).

This sounds like a fantastic idea. Can you write it up in a new ticket if it isn't already written down somewhere, and link to it if it is? (I don't see one in any of my searches, so I'm guessing there isn't.) Thanks.

comment:9 in reply to:  8 Changed 9 years ago by davidsarah

Replying to glyph:

Replying to exarkun:

Yes, that's certainly arguable. However, it's *simpler* if all event sources still in the reactor after the test completes are treated as errors. This seems very clear-cut to me, and I don't expect much disagreement. How does it seem to you?

It isn't simpler to have to change the code under test and/or the test (often both) in order to avoid spurious test failures for something that wasn't a bug.

I wouldn't be so bothered by this if the resulting failures were deterministic. But in practice, when the DelayedCalls are short, they may or may not complete before Trial checks for an unclean reactor. So the test may or may not pass. In the worst case, it does not fail often enough to notice the problem immediately, which makes it difficult to isolate which change (to the test or the code under test) caused it. This is unnecessary debugging work, which takes away time from doing more productive things.

Another trial change that would be really excellent is to separate all of the global-reactor-related functionality from the basic testing functionality so that the unclean reactor errors/failures come from subclassing some TestCase-like class, but perhaps not twisted.trial.unittest.TestCase (or, similarly, so that trial provides a TestCase-like class that is not twisted.trial.unittest.TestCase which has all the great features of trial but not the unclean reactor error/failure behavior).

I don't want to remove the unclean reactor errors (or at least, that's orthogonal to this bug). They're an invaluable debugging aid, and I would enable them for all tests even if there was an option not to. I just want them not to trigger spuriously in cases where Trial could easily detect that there's no real problem.

exarkun wrote:

Applications that don't want to deal with the additional complexity of cleaning up these event sources themselves have several options:

  1. Don't create the extra event sources in the first place (ie, exercise less code in your tests)
  2. Create the event sources in a throw-away reactor (eg a Clock instance or some other test double)

These are fine for unit tests, but for system/integration tests, the whole point is to exercise the code as it will actually run.

  1. Develop some re-usable cleanup logic and then use it. Given trial's support of tearDown, addCleanup, and simply return d.addBoth(doCleanup), this doesn't seem unreasonable.

That's what we've done, but Zooko and I ended up doing it independently, even though we were trying to solve the same problem. To me it makes more sense to address false-positives in the unclean reactor detection in the code that actually does that detection.

comment:10 Changed 9 years ago by Jean-Paul Calderone

It isn't simpler to have to change the code under test and/or the test (often both) in order to avoid spurious test failures for something that wasn't a bug.

You say it isn't simpler. I say it is. Neither can present objective evidence supporting one conclusion over the other, so I guess I'll just drop this part of the conversation (while continuing to hold on to my subjective assessment).

I wouldn't be so bothered by this if the resulting failures were deterministic. But in practice, when the DelayedCalls are short, they may or may not complete before Trial checks for an unclean reactor.

The plan is to get rid of the code in trial that makes "short" DelayedCalls not trigger this error. It's just that no one has gotten around to it yet.

So the test may or may not pass.

The best solution is to not have any DelayedCalls in the global reactor at all, by not even using the global reactor in unit tests. This also entirely eliminates the problem.

I just want them not to trigger spuriously in cases where Trial could easily detect that there's no real problem.

trial can never detect this, and certainly not easily. It can detect whether there are event sources left in the reactor at the end of a test, it has no idea what might subdivide these into "real problem" vs "no real problem".

These are fine for unit tests, but for system/integration tests, the whole point is to exercise the code as it will actually run.

I guess you have (correctly, in my opinion) discovered that trial is a better unit testing tool than a system/integration testing tool.

That's what we've done, but Zooko and I ended up doing it independently, even though we were trying to solve the same problem. To me it makes more sense to address false-positives in the unclean reactor detection in the code that actually does that detection.

We can fix the false *negatives* fairly easily by removing the trial code that lets "short" DelayedCalls run at the end of a test. At the risk of repeating myself, I don't see how it can identify any DelayedCall as a false positive and not report it as an error.

Reading the ticket history, I'm not sure what change you're proposing anyway. I see two possibilities. Either:

I don't think you're actually interested in the first of these, and I don't see how the second can seriously be considered as a general solution, since 60 seconds is a purely arbitrary limit. This conclusion reinforces my idea that trial can't actually distinguish between different DelayedCalls (with respect to whether they should fail a test if they are still pending), and so the only sensible solution is for any DelayedCall to fail the test.

Note: See TracTickets for help on using tickets.