Ticket #5194 defect new

Opened 23 months ago

Last modified 16 months ago

log.msg can fail if someone removes an observer at the wrong time

Reported by: zseil Owned by: zseil
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch: branches/log-observer-removal-5194
Author: zseil Launchpad Bug:

Description

I saw this while working on Win32Reactor:

Traceback (most recent call last):
  File ".\twisted\internet\base.py", line 1181, in mainLoop
    self.doIteration(t)
  File ".\twisted\internet\win32eventreactor.py", line 164, in doWaitForMultipleEvents
    log.msg(channel='system', event='iteration', reactor=self)
  File ".\twisted\python\log.py", line 291, in msg
    observer = self.observers[i]
exceptions.IndexError: list index out of range

In this case, the helper reactor was trying to log something in its own thread while the test in main reactor was trying to clean up an added log observer.

The attached test shows the same problem, but without using threads.

Attachments

observer-removal-test.patch Download (1.1 KB) - added by zseil 23 months ago.
test showing the problem
log-observer-removal.patch Download (3.8 KB) - added by zseil 23 months ago.
patch fixing the problem
log-observer-removal-2.patch Download (7.9 KB) - added by zseil 23 months ago.
patch fixing both problems and #3307
log-observer-removal-5194.patch Download (6.3 KB) - added by zseil 23 months ago.
Presumably threadsafe patch depending on patch for #3307
log-observer-removal-5194-2.patch Download (9.2 KB) - added by zseil 20 months ago.
additional patch against the log-observer-removal-5194 branch
snub.patch Download (1.9 KB) - added by exarkun 20 months ago.

Change History

Changed 23 months ago by zseil

test showing the problem

Changed 23 months ago by zseil

patch fixing the problem

1

Changed 23 months ago by zseil

  • keywords review added

The attached patch seems to fix the problem. Please review!

2

Changed 23 months ago by exarkun

  • owner set to zseil
  • keywords review removed

Thanks!

There are actually two problems here. You described both of them, but the patch only tries to address one of them. For clarity, the two problems are:

  1. removeObserver is not reentrant with respect to msg
  2. removeObserver is not threadsafe with respect to msg

Reentrancy is a weaker condition, since it involves interference at well-defined times (for example, removeObserver being called from a log observer). Threadsafety has to take care of cases like the observers list being modified after the observers.index call in the patch (line 312) succeeds but before the replacement is inserted into position (line 315).

The ticket description talks about the threading case, but it only tests and attempts to fix the reentrancy case.

I'm happy to see the two issues fixed separately, but depending on the solution it may make sense to fix them simultaneously.

As for the patch, the approach seems to make sense for fixing the reentrancy case. It slows down the error case a little, but it's the error case so that's okay.

  1. The test method names could maybe be a little more descriptive. Only so much information fits into the method name, of course, but test_removeObserver sounds like a test for a much simpler case than this one. ;)
  2. It seems both tests would benefit from an assertion about what observers are left in the log publisher after the log.msg call. Otherwise there seems to be no assurance that the code operates on the proper observers.

If #3307 were resolved, then these tests could use a LogPublisher instance instead of the global, which would be a bit nicer. :)

Having thought about this for a while now, it's not obvious how the thread-safety and reentrancy issues might be solved with a single change. It seems like removeObserver (and maybe addObserver?) need to be synchronized along with msg to make the whole thing thread-safe. Perhaps we should update the description of this ticket to describe the reentrancy issue and file another one for the thread-safety issue.

Also, after thinking about this for a while, I feel like it's a duplicate of some other ticket, but I can't find the other ticket.

3

Changed 23 months ago by zseil

  • keywords review added
  • owner zseil deleted

OK. Note that the previous patch did fix thread safety in the common case, when there is no buggy observer. That was the case that caused the traceback in the description.

Anyway, here is an attempt to make LogPublisher.msg both reentrant and threadsafe wrt observer removal. It does that by logging each buggy observer with a separate LogPublisher, fixing #3307 at the same time. The order of the message delivery is also fixed, see the change in test_erroneousErrors.

The new tests are converted to use a separate LogPublisher instance, hopefully the new names are also an improvement.

The changes to the global log.err are not strictly needed and slightly change the semantics of that function (exceptions passed to it are now also checked if they should be ignored), but I felt bad copying just the docstring without doing any cleanup. Let me know if I should drop that part.

Changed 23 months ago by zseil

patch fixing both problems and #3307

4

Changed 23 months ago by exarkun

It'd be nice to fix #3307 independently, instead of mixing the change in with these other fixes (the trunk revisions will be more understandable, and if there's a problem with one part of the fix it can be reverted independently of the other).

Changed 23 months ago by zseil

Presumably threadsafe patch depending on patch for #3307

5

Changed 23 months ago by zseil

OK, thanks for a quick review. I've split the fix for #3307 into a separate patch and updated this patch. Unfortunately the updated patch here depends on the one in #3307, it would be hard to make them independent.

I've added a few more tests, but there are still no direct tests for thread safety. I don't know how a reliable test involving threads should be written.

6

Changed 20 months ago by glyph

  • branch set to branches/log-observer-removal-5194
  • branch_author set to glyph

(In [32558]) Branching to 'log-observer-removal-5194'

7

Changed 20 months ago by glyph

  • keywords review removed
  • owner set to zseil

Hi zseil,

Again, thanks for your contribution, sorry for the long latency on some of these. I applied the patch and tried to run the tests locally, but unfortunately I got this traceback:

Traceback (most recent call last):
  File ".../twisted/test/test_log.py", line 159, in test_doubleErrorDoesNotRemoveObserver
    self.assertEqual(len(errors), 1)
  File ".../twisted/trial/unittest.py", line 270, in assertEqual
    % (msg, pformat(first), pformat(second)))
twisted.trial.unittest.FailTest: not equal:
a = 0
b = 1

I think that this may be an interaction with a branch that was merged in the interim. (Perhaps even #3307?)

Did I screw up the patch application somehow? I checked in the code to the branch that is now linked to this ticket, so you can examine that to make sure, and attach future patches against that branch.

I've seen your name on a couple of branches lately, and you're consistently on the high score charts for the last few months - are you interested in applying for a commit bit so you can expedite this process, commit to SVN, force builds, etc?

8

Changed 20 months ago by glyph

I should also note that, other than this traceback, this patch looks pretty good and I think it could be close to merging. It will bear some final scrutiny once it's ready though, since logging is pretty central and widely used :).

Changed 20 months ago by zseil

additional patch against the log-observer-removal-5194 branch

9

Changed 20 months ago by zseil

  • owner zseil deleted
  • keywords review added

Thanks for the review and don't worry about the latency, mine is no better :)

Sorry about the traceback, it showed a real bug in the patch (publisher.msg call at the end of the msg method was not protected by a try ... except block). I should have caught it before submitting patch for review by running all of the tests.

I attached an additional patch on top of the branch that fixes the bug and adds a few more tests.

About the commit rights, I really appreciate the invitation, I'd love to become a committer. What are the next steps to get the commit bit?

10

Changed 20 months ago by exarkun

About the commit rights, I really appreciate the invitation, I'd love to become a committer. What are the next steps to get the commit bit?

Jump on IRC sometime and talk to someone about your SSH keys and stuff. :)

11

Changed 20 months ago by exarkun

(In [32832]) apply log-observer-removal-5194-2.patch

refs #5194

12

Changed 20 months ago by exarkun

I'm not sure about those new except statements. It seems like the logging system shouldn't be going into infinite recursion and then silently proceeding as if nothing happened. If the code is more aggressive about snubbing observers that raise an exception, the likelihood of a stack overflow seems reduced.

I propose the attached adjustment to the dispatch loop. It breaks some tests, notably test_multipleBuggyObservers, but maybe that's okay. I understand the motivation to want to let other observers handle/report errors, but if they've already fail themselves then I think they've lost their chance.

Changed 20 months ago by exarkun

13

Changed 20 months ago by exarkun

  • owner set to zseil
  • keywords review removed
  • branch_author changed from glyph to zseil

14

Changed 19 months ago by zseil

(In [33012]) Apply snub.patch.

refs #5194

15

Changed 19 months ago by zseil

(In [33013]) Improve test_noObserverSkipped and make it pass.

refs #5194

16

Changed 19 months ago by zseil

(In [33014]) Resource exhaustion should not remove the failed observer.

log.msg will now fail if some operating system limit is hit while trying to collect the information about why the observer failed.

refs #5194

17

Changed 19 months ago by zseil

(In [33015]) Adjust test_multipleBuggyObservers so that it passes.

refs #5194

18

Changed 19 months ago by zseil

(In [33016]) Adjust test_doubleErrorDoesNotRemoveObserver so that it passes.

refs #5194

19

Changed 19 months ago by zseil

  • keywords review added
  • owner zseil deleted

OK, I think this is ready for another review. Buildbot results will be here:  http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/log-observer-removal-5194

api-documentation failures look unrelated and should go away when the branch is merged with trunk.

20

Changed 16 months ago by itamar

  • status changed from new to assigned
  • owner set to itamar

21

Changed 16 months ago by itamar

  • owner changed from itamar to zseil
  • keywords review removed
  • status changed from assigned to new

The solution is convincing, and the tests are pretty thorough. Remaining issues (thanks for working through all this):

  1. You should add a comment explaining why you use id in _publishMessage.
  2. You should add a comment explicitly pointing out that reversed() copies the observer list, which is the whole point of the exercise. And explain why it runs in reverse order ("legacy reasons" is OK if that's why).
  3. You should explain why test_removeObserverDuringMsg adds and removes the observer twice (saying "reproduce original error reported in #5194" may be sufficient, since this will end up referring to no longer applicable implementation details otherwise).
  4. test_removeObserverAndRaiseDuringMsg should assert that the *correct* RuntimeError was logged.
  5. test_multipleBuggyObservers should have a comment at end explaining why it expects that particular order of logged errors. In general I am not quite sure this test tests what it claims to be testing.
  6. test_noObserverSkipped is too tricksy for me to follow what's going on.
  7. Maybe I missed it, but I didn't see anywhere where the why message ("Log observer %s failed.") is actually checked in the tests.
Note: See TracTickets for help on using tickets.