Opened 3 years ago

Last modified 3 years ago

#5194 defect new

log.msg can fail if someone removes an observer at the wrong time

Reported by: zseil Owned by: zseil
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch: branches/log-observer-removal-5194
(diff, github, buildbot, log)
Author: zseil Launchpad Bug:

Description

I saw this while working on Win32Reactor:

Traceback (most recent call last):
  File ".\twisted\internet\base.py", line 1181, in mainLoop
    self.doIteration(t)
  File ".\twisted\internet\win32eventreactor.py", line 164, in doWaitForMultipleEvents
    log.msg(channel='system', event='iteration', reactor=self)
  File ".\twisted\python\log.py", line 291, in msg
    observer = self.observers[i]
exceptions.IndexError: list index out of range

In this case, the helper reactor was trying to log something in its own thread while the test in main reactor was trying to clean up an added log observer.

The attached test shows the same problem, but without using threads.

Attachments (6)

observer-removal-test.patch (1.1 KB) - added by zseil 3 years ago.
test showing the problem
log-observer-removal.patch (3.8 KB) - added by zseil 3 years ago.
patch fixing the problem
log-observer-removal-2.patch (7.9 KB) - added by zseil 3 years ago.
patch fixing both problems and #3307
log-observer-removal-5194.patch (6.3 KB) - added by zseil 3 years ago.
Presumably threadsafe patch depending on patch for #3307
log-observer-removal-5194-2.patch (9.2 KB) - added by zseil 3 years ago.
additional patch against the log-observer-removal-5194 branch
snub.patch (1.9 KB) - added by exarkun 3 years ago.

Download all attachments as: .zip

Change History (27)

Changed 3 years ago by zseil

test showing the problem

Changed 3 years ago by zseil

patch fixing the problem

comment:1 Changed 3 years ago by zseil

  • Keywords review added

The attached patch seems to fix the problem. Please review!

comment:2 Changed 3 years ago by exarkun

  • Keywords review removed
  • Owner set to zseil

Thanks!

There are actually two problems here. You described both of them, but the patch only tries to address one of them. For clarity, the two problems are:

  1. removeObserver is not reentrant with respect to msg
  2. removeObserver is not threadsafe with respect to msg

Reentrancy is a weaker condition, since it involves interference at well-defined times (for example, removeObserver being called from a log observer). Threadsafety has to take care of cases like the observers list being modified after the observers.index call in the patch (line 312) succeeds but before the replacement is inserted into position (line 315).

The ticket description talks about the threading case, but it only tests and attempts to fix the reentrancy case.

I'm happy to see the two issues fixed separately, but depending on the solution it may make sense to fix them simultaneously.

As for the patch, the approach seems to make sense for fixing the reentrancy case. It slows down the error case a little, but it's the error case so that's okay.

  1. The test method names could maybe be a little more descriptive. Only so much information fits into the method name, of course, but test_removeObserver sounds like a test for a much simpler case than this one. ;)
  2. It seems both tests would benefit from an assertion about what observers are left in the log publisher after the log.msg call. Otherwise there seems to be no assurance that the code operates on the proper observers.

If #3307 were resolved, then these tests could use a LogPublisher instance instead of the global, which would be a bit nicer. :)

Having thought about this for a while now, it's not obvious how the thread-safety and reentrancy issues might be solved with a single change. It seems like removeObserver (and maybe addObserver?) need to be synchronized along with msg to make the whole thing thread-safe. Perhaps we should update the description of this ticket to describe the reentrancy issue and file another one for the thread-safety issue.

Also, after thinking about this for a while, I feel like it's a duplicate of some other ticket, but I can't find the other ticket.

comment:3 Changed 3 years ago by zseil

  • Keywords review added
  • Owner zseil deleted

OK. Note that the previous patch did fix thread safety in the common case, when there is no buggy observer. That was the case that caused the traceback in the description.

Anyway, here is an attempt to make LogPublisher.msg both reentrant and threadsafe wrt observer removal. It does that by logging each buggy observer with a separate LogPublisher, fixing #3307 at the same time. The order of the message delivery is also fixed, see the change in test_erroneousErrors.

The new tests are converted to use a separate LogPublisher instance, hopefully the new names are also an improvement.

The changes to the global log.err are not strictly needed and slightly change the semantics of that function (exceptions passed to it are now also checked if they should be ignored), but I felt bad copying just the docstring without doing any cleanup. Let me know if I should drop that part.

Changed 3 years ago by zseil

patch fixing both problems and #3307

comment:4 Changed 3 years ago by exarkun

It'd be nice to fix #3307 independently, instead of mixing the change in with these other fixes (the trunk revisions will be more understandable, and if there's a problem with one part of the fix it can be reverted independently of the other).

Changed 3 years ago by zseil

Presumably threadsafe patch depending on patch for #3307

comment:5 Changed 3 years ago by zseil

OK, thanks for a quick review. I've split the fix for #3307 into a separate patch and updated this patch. Unfortunately the updated patch here depends on the one in #3307, it would be hard to make them independent.

I've added a few more tests, but there are still no direct tests for thread safety. I don't know how a reliable test involving threads should be written.

comment:6 Changed 3 years ago by glyph

  • Author set to glyph
  • Branch set to branches/log-observer-removal-5194

(In [32558]) Branching to 'log-observer-removal-5194'

comment:7 Changed 3 years ago by glyph

  • Keywords review removed
  • Owner set to zseil

Hi zseil,

Again, thanks for your contribution, sorry for the long latency on some of these. I applied the patch and tried to run the tests locally, but unfortunately I got this traceback:

Traceback (most recent call last):
  File ".../twisted/test/test_log.py", line 159, in test_doubleErrorDoesNotRemoveObserver
    self.assertEqual(len(errors), 1)
  File ".../twisted/trial/unittest.py", line 270, in assertEqual
    % (msg, pformat(first), pformat(second)))
twisted.trial.unittest.FailTest: not equal:
a = 0
b = 1

I think that this may be an interaction with a branch that was merged in the interim. (Perhaps even #3307?)

Did I screw up the patch application somehow? I checked in the code to the branch that is now linked to this ticket, so you can examine that to make sure, and attach future patches against that branch.

I've seen your name on a couple of branches lately, and you're consistently on the high score charts for the last few months - are you interested in applying for a commit bit so you can expedite this process, commit to SVN, force builds, etc?

comment:8 Changed 3 years ago by glyph

I should also note that, other than this traceback, this patch looks pretty good and I think it could be close to merging. It will bear some final scrutiny once it's ready though, since logging is pretty central and widely used :).

Changed 3 years ago by zseil

additional patch against the log-observer-removal-5194 branch

comment:9 Changed 3 years ago by zseil

  • Keywords review added
  • Owner zseil deleted

Thanks for the review and don't worry about the latency, mine is no better :)

Sorry about the traceback, it showed a real bug in the patch (publisher.msg call at the end of the msg method was not protected by a try ... except block). I should have caught it before submitting patch for review by running all of the tests.

I attached an additional patch on top of the branch that fixes the bug and adds a few more tests.

About the commit rights, I really appreciate the invitation, I'd love to become a committer. What are the next steps to get the commit bit?

comment:10 Changed 3 years ago by exarkun

About the commit rights, I really appreciate the invitation, I'd love to become a committer. What are the next steps to get the commit bit?

Jump on IRC sometime and talk to someone about your SSH keys and stuff. :)

comment:11 Changed 3 years ago by exarkun

(In [32832]) apply log-observer-removal-5194-2.patch

refs #5194

comment:12 Changed 3 years ago by exarkun

I'm not sure about those new except statements. It seems like the logging system shouldn't be going into infinite recursion and then silently proceeding as if nothing happened. If the code is more aggressive about snubbing observers that raise an exception, the likelihood of a stack overflow seems reduced.

I propose the attached adjustment to the dispatch loop. It breaks some tests, notably test_multipleBuggyObservers, but maybe that's okay. I understand the motivation to want to let other observers handle/report errors, but if they've already fail themselves then I think they've lost their chance.

Changed 3 years ago by exarkun

comment:13 Changed 3 years ago by exarkun

  • Author changed from glyph to zseil
  • Keywords review removed
  • Owner set to zseil

comment:14 Changed 3 years ago by zseil

(In [33012]) Apply snub.patch.

refs #5194

comment:15 Changed 3 years ago by zseil

(In [33013]) Improve test_noObserverSkipped and make it pass.

refs #5194

comment:16 Changed 3 years ago by zseil

(In [33014]) Resource exhaustion should not remove the failed observer.

log.msg will now fail if some operating system limit is hit while trying to
collect the information about why the observer failed.

refs #5194

comment:17 Changed 3 years ago by zseil

(In [33015]) Adjust test_multipleBuggyObservers so that it passes.

refs #5194

comment:18 Changed 3 years ago by zseil

(In [33016]) Adjust test_doubleErrorDoesNotRemoveObserver so that it passes.

refs #5194

comment:19 Changed 3 years ago by zseil

  • Keywords review added
  • Owner zseil deleted

OK, I think this is ready for another review. Buildbot results will be here:
http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/log-observer-removal-5194

api-documentation failures look unrelated and should go away when the branch is merged with trunk.

comment:20 Changed 3 years ago by itamar

  • Owner set to itamar
  • Status changed from new to assigned

comment:21 Changed 3 years ago by itamar

  • Keywords review removed
  • Owner changed from itamar to zseil
  • Status changed from assigned to new

The solution is convincing, and the tests are pretty thorough. Remaining issues (thanks for working through all this):

  1. You should add a comment explaining why you use id in _publishMessage.
  2. You should add a comment explicitly pointing out that reversed() copies the observer list, which is the whole point of the exercise. And explain why it runs in reverse order ("legacy reasons" is OK if that's why).
  3. You should explain why test_removeObserverDuringMsg adds and removes the observer twice (saying "reproduce original error reported in #5194" may be sufficient, since this will end up referring to no longer applicable implementation details otherwise).
  4. test_removeObserverAndRaiseDuringMsg should assert that the *correct* RuntimeError was logged.
  5. test_multipleBuggyObservers should have a comment at end explaining why it expects that particular order of logged errors. In general I am not quite sure this test tests what it claims to be testing.
  6. test_noObserverSkipped is too tricksy for me to follow what's going on.
  7. Maybe I missed it, but I didn't see anywhere where the why message ("Log observer %s failed.") is actually checked in the tests.
Note: See TracTickets for help on using tickets.