Opened 8 years ago

Closed 3 years ago

Last modified 2 years ago

#1918 enhancement closed fixed (fixed)

Revive KQueue Reactor

Reported by: dialtone Owned by: oberstet
Priority: normal Milestone:
Component: core Keywords:
Cc: therve, thijs, psykidellic, tobias.oberstein@… Branch: branches/kqueue-1918
(diff, github, buildbot, log)
Author: oberstet, exarkun, therve Launchpad Bug:

Description

foom has been working a new version of pykqueue that works better than version 1.3 and is distributed and maintained by arg here: www.python-hpio.net.

Under this new circumstances reviving the kqueue reactor should be a good aim to work for.

Attachments (16)

kqueue.diff (15.1 KB) - added by dialtone 8 years ago.
Working KQueue reactor patch
kqueue_1.1.diff (12.9 KB) - added by dialtone 8 years ago.
New version
kqreactor_patch.diff (4.0 KB) - added by psykidellic 5 years ago.
Patch for kqreactor using select26 module.
issue_1918.patch (6.6 KB) - added by psykidellic 4 years ago.
kqueue using the Python select module.
twisted_errors.txt (44.7 KB) - added by psykidellic 4 years ago.
Cases those failed.
new_kqueue.log (647.1 KB) - added by oberstet 3 years ago.
new kqueue trial
select.log (644.1 KB) - added by oberstet 3 years ago.
select trial (on same system)
select_vs_new_kqueue_diff.log (6.3 KB) - added by oberstet 3 years ago.
diff select.log new_kqueue.log > select_vs_new_kqueue_diff.log
kqueue.patch (10.4 KB) - added by oberstet 3 years ago.
select.kqueue based reactor
kqueue.log (706.4 KB) - added by oberstet 3 years ago.
trial -r kqueue --rterrors twisted > kqueue.log
kqueue.2.patch (12.1 KB) - added by oberstet 3 years ago.
Revised patch after review
kqueue.3.patch (14.3 KB) - added by oberstet 3 years ago.
Revised patch : daemonization, correct news file
kqueue.4.patch (16.5 KB) - added by oberstet 3 years ago.
Revised patch: docstring, IReactorDaemonize
kqueue.5.patch (21.2 KB) - added by oberstet 3 years ago.
Revised patch: unit tests
kqueue.6.patch (20.9 KB) - added by oberstet 3 years ago.
revised for review comments of exarkun
kqueue.7.patch (20.9 KB) - added by oberstet 3 years ago.
issue 5.: use new style class

Change History (109)

Changed 8 years ago by dialtone

Working KQueue reactor patch

comment:1 Changed 8 years ago by jknight

I don't like platform.usingKQueue. Also, I'd like to see all tests pass on FreeBSD, or else an explanation as to why they don't.

Changed 8 years ago by dialtone

New version

comment:2 Changed 8 years ago by dialtone

Removed usingKQueue.

This patch was tested against MacOSX 10.4.7 and FreeBSD 6.1-stable.

stdio and PTY tests unfortunately still fail and need more testing. Probably _doReadOrWrite method needs some extra corner cases to deal with stdio and processes errors since kqueue/kevent deal with those with small but decisive differences.

comment:3 Changed 8 years ago by glyph

  • Owner changed from glyph to dialtone

comment:4 Changed 8 years ago by glyph

  • Priority changed from normal to low

comment:5 Changed 7 years ago by therve

  • Cc therve added

python-hpio.net is down for a while now, can you provide the latest tarball? The best would probably to ship it with Twisted, if license is compatible.

comment:6 Changed 7 years ago by jknight

The latest source is here:
http://twistedmatrix.com/trac/browser/sandbox/foom/pykqueue

As far as I know, the thing on python-hpio was unchanged from this.

License is BSD.

comment:7 Changed 7 years ago by jknight

Also, there is no way that kqueue will work with PTYs (or any "device" kind of fd) on osx < 10.5 (which hasn't been released yet). This is a limitation in the OS. Poll doesn't work for device fds either, btw.

Just to calibrate expectations here...

comment:8 Changed 7 years ago by dreid

This patch no longer applies cleanly to trunk.

comment:9 Changed 7 years ago by therve

Note that the libevent reactor (at #1930) is supposed to support kqueue, but I didn't have the opportunity to test.

comment:10 Changed 7 years ago by therve

I have started some things in the kqreactor-1918 branch.

comment:11 Changed 6 years ago by therve

  • author set to therve
  • Branch set to branches/kqreactor-1918-2

(In [23105]) Branching to 'kqreactor-1918-2'

comment:12 Changed 6 years ago by thijs

  • Cc thijs added

comment:13 Changed 5 years ago by psykidellic

While trying to port dialtone's inotify to Mac, seems I need to use kqueue(). The kqreactor in Twisted uses kqsyscall. The Python module for it seems only to be maintained with Python 1.5. Also, it does not compile on my Leapord even with the patch provided by Itamar at http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.kqreactor.html.

Meanwhile, Python 2.6 is bringing kqueue() support in its select module. A backport of it for 2.5 is available at: http://pypi.python.org/pypi/select26. Using the module and patching up kqreactor.py with the attached diff, I was able to run:

kqreactor.install().

I would really appreciate if somebody can look into the patch and let me know if this is the right way to proceed.

I need folderwatcher support for Mac at my work so I am willing to spend time to get this thing working so any thoughts, comments are welcome.

Changed 5 years ago by psykidellic

Patch for kqreactor using select26 module.

comment:14 Changed 5 years ago by psykidellic

One thing to note.

Following the example at:
http://julipedia.blogspot.com/2004/10/example-of-kqueue.html (which
works perfectly as it tells all events), I tried to port it to Python:

http://www.bpaste.net/show/25/

Not sure where I am wrong but the poller only returns *one* event
ever. Adding new text, deleting the file does not return any event.

Looks like it might be related to the bug at: http://bugs.python.org/issue5910 but I tried to implement that patch in the current select26 without success.

I will try to contact the original author cheimes to see if he can help me out.

comment:15 Changed 5 years ago by psykidellic

  • Cc psykidellic added

comment:16 Changed 5 years ago by glyph

  • Owner changed from dialtone to glyph
  • Status changed from new to assigned

comment:17 Changed 4 years ago by glyph

#3114 also talks about kqueue.

comment:18 Changed 4 years ago by psykidellic

Hi

Here is my first attempt to use the Python 2.6 select module and Christian Heimes backport to update the kqueue reactor.

Initially, I had some issues with test cases related to timerservice but after exarkun pointed out at IRC, the timeout should be in seconds rather than milliseconds.

Though I am still seeing errors when I run the testcases and they primarily seemed to be grouped in processes or pty/tty which seems to follow dialtone and jknight observations.

I am attaching a patch file which is applicable to trunk at revision: r28802.

Let me know what else I can do?

Changed 4 years ago by psykidellic

kqueue using the Python select module.

comment:19 follow-up: Changed 4 years ago by jknight

Are you testing on OSX? What version?

I guess all the tests which use devices should be marked as known failures on OSX/kqueuereactor, since apple seems in no hurry to fix it...

Is the patch for select26 in your patch only required for the select26 backport? That is: does the reactor work properly against released python 2.6's select module?

comment:20 Changed 4 years ago by psykidellic

  • Keywords review added

Wow. That was quick.

Yes. I am testing on Mac OS X v10.5 now. I have a friend who has 10.6 so I can ask him to test it on Snow Leapord also. I guess, I can install FreeBSD on my virtual box and run the test cases.

As an addition, I am also attaching the error log of the complete test run.

The following tests seem to be taking longer (when using kqueue rather than the default select) but eventually they succeeded.

test_outputWithErrorIgnored ...
HalfCloseTestCase

testCloseWriteCloser ...

Changed 4 years ago by psykidellic

Cases those failed.

comment:21 in reply to: ↑ 19 Changed 4 years ago by psykidellic

  • Keywords review removed

Replying to jknight:

Is the patch for select26 in your patch only required for the select26 backport? That is: does the reactor work properly against released python 2.6's select module?

This works on both. It first tries to import select26 module and if it fails tries to import the select module.

comment:22 Changed 4 years ago by psykidellic

  • Keywords review added

Ooops, my bad. I should not have removed the review keyword.

comment:23 Changed 4 years ago by jknight

  • Keywords review removed

"This works on both" -- well, it doesn't work on an unpatched select26 -- it requires a patched copy to work properly. So I was wondering if it worked on an out-of-the-box python 2.6. Anyways, I looked at the source for selectmodule.c in python 2.6: it looks like only Python 2.6.5 or later will work, because that's the first version to include the fix for:

#5910: fix kqueue for calls with more than one event.

On earlier versions of Python 2.6, I suppose users will need to install the patched select26 module also. kqueuereactor should probably check that the version of python it's running against is acceptable before letting you use it.

As for the failing tests, the test_process and test_stdio failures are just because the kqueue syscall in OSX is broken for ptys: those failures are expected. But do retest on FreeBSD to make sure it works there.

I don't think the half-close tests in test_tcp should be failing, however. That /probably/ indicates an actual bug in kqueuereactor.

Also, as a note on the process here: if you think there are still things you need to work on, removing the review keyword yourself is an entirely appropriate thing to do.

comment:24 Changed 4 years ago by exarkun

  • Owner changed from glyph to psykidellic
  • Status changed from assigned to new

comment:25 Changed 4 years ago by psykidellic

My bad about the interpretation of your question. Yes, you need to apply the patch for select26 module.

Is there any standard way to check versions when you code in Twisted? I can put those checks.

I guess the next step is to look into the non test_process/test_stdio test cases to make sure there are no errors in kqreactor.

I will also try to install FreeBSD on my VirtualBox to run the testcases.

comment:26 follow-up: Changed 4 years ago by psykidellic

Alright. I am kind of stuck now. An extra eye would be much appreciated.

http://www.bpaste.net/show/5236/ - I have pasted the output of trial run with -e and the resulting test.log.

Basically, in the testcases: testCloseWriteCloser and testWriteCloseNotification, the method readConnectionLost method is never called.

comment:27 in reply to: ↑ 26 Changed 4 years ago by glyph

Replying to psykidellic:

http://www.bpaste.net/show/5236/ - I have pasted the output of trial run with -e and the resulting test.log.

In the future, could you attach logs like this to tickets as attachments? Pastebins tend to expire things pretty quickly, so when someone goes to diagnose it, it's OK.

Basically, in the testcases: testCloseWriteCloser and testWriteCloseNotification, the method readConnectionLost method is never called.

You mentioned on IRC that this was run under Leopard. Any chance you could test it on a more recent MacOS, or a recent FreeBSD, and see if the results differ? Thanks!

comment:28 Changed 4 years ago by exarkun

You mentioned on IRC that this was run under Leopard. Any chance you could test it on a more recent MacOS, or a recent FreeBSD, and see if the results differ? Thanks!

We have a Snow Leopard slave and a FreeBSD slave. Someone should put the code in a branch and run it on our buildbot.

comment:29 follow-up: Changed 4 years ago by jknight

So I tried select.kqueue on python on my mac running OSX 10.6.2. It still doesn't support PTYs (or /dev/null), and furthermore, it crashed my mac with a kernel panic when I exited the python process.

I had run something like this at an interactive python prompt:

import select, os
kq = select.kqueue()
devnull = os.open("/dev/null", O_WRONLY)
kq.control([select.kevent(devnull, select.KQ_FILTER_WRITE, select.KQ_EV_ADD)], 1, 0)

and it returned a KQ_EV_ERROR with EINVAL error code, as it always has...and kernel panic'd on control-D. I didn't just run the above, though, it was some interactive fiddling, so I don't know exactly what set of things caused the panic. I haven't tried (and am not going to try) to reproduce it.

It's nearly certainly kqueue's fault, though.

Anyhow, given that behavior, it doesn't seem like we should try to support kqueue reactor on OSX (nor even attempt to run it on the buildslave). It's clear Apple doesn't really test kqueue at all, nor care if it works (it's been broken since it was added, in 10.3 or so). They're probably too busy making sure that people can't run software on the iPhone to get around to fixing kernel panics in OSX.

And BTW, poll on OSX is broken just as badly as kqueue...but apparently python on OSX is built without poll support, so you can't even try that. :)

Reviving kqueuereactor for freebsd to use is still a good idea, though.

comment:30 in reply to: ↑ 29 Changed 4 years ago by glyph

Replying to jknight:

So I tried select.kqueue on python on my mac running OSX 10.6.2. It still doesn't support PTYs (or /dev/null), and furthermore, it crashed my mac with a kernel panic when I exited the python process.

I had run something like this at an interactive python prompt:

Unfortunately I can't reproduce this particular panic :-\. Maybe this would be helpful, if you experience it again? How to log a kernel panic.

and it returned a KQ_EV_ERROR with EINVAL error code, as it always has...and kernel panic'd on control-D. I didn't just run the above, though, it was some interactive fiddling, so I don't know exactly what set of things caused the panic. I haven't tried (and am not going to try) to reproduce it.

It's nearly certainly kqueue's fault, though.

Anyhow, given that behavior, it doesn't seem like we should try to support kqueue reactor on OSX (nor even attempt to run it on the buildslave). It's clear Apple doesn't really test kqueue at all, nor care if it works (it's been broken since it was added, in 10.3 or so). They're probably too busy making sure that people can't run software on the iPhone to get around to fixing kernel panics in OSX.

If you can reproduce this kernel panic, report it at bugreport.apple.com. I assure you someone will be interested :).

Calendar Server uses kqueue, albeit not with /dev/null or PTYs, so I'm personally motivated to help out with this where I can.

comment:31 Changed 4 years ago by psykidellic

Wow. So many replies. I thought I would be notified if I am in the CC list. Anyway, we had a mid-release coming up so I had to divert resources to office work and could not reply here.

Status update:

-- I was able to sneak in couple of hours over the weekdays to run do couple of testruns. Due to my oversight I completely missed that branch 1918-2 already had the patch from dialtone. After correction from exarkun, I ran the test again using Python 2.5 and select26 backport on my laptop. Then I went ahead and ported the changes to the trunk but unfortunately some extra test cases (related to process fails) even though the same test case dont fail in 1918-2 branch.

I am not sure what might be the reason. If somebody with more knowledge can help me out, it would be much appreciated. I see the same comment from dreid so I am not sure what changes broke those tests.

As for FreeBSD, I had VirtualBox and after some fiddling around and Googling, it seems VirtualBox does not really support FreeBSD out of the box. Thankfully though I was able to get one of the extra licenses for VMWare Fusion from office. According to the docs: http://www.freebsd.org/doc/en/books/handbook/virtualization-guest.html, VMWare Fusion should support FreeBSD. I am installing it as I write this message.

Regarding the kernel panic, even I was unable to reproduce the error. I had a kernel panic only once when I was trying to run the test suite over SSH but that probably was bad mixup of code/branches so I am not even sure of the steps to reproduce the error.

I will upload my test report soon as they are on my laptop.


comment:32 Changed 4 years ago by exarkun

  • Author changed from therve to exarkun, therve
  • Branch changed from branches/kqreactor-1918-2 to branches/kqreactor-1918-3

(In [29762]) Branching to 'kqreactor-1918-3'

comment:33 Changed 4 years ago by zectbumo

  • Cc alfred@… added

I have confirmed that the unit tests fail on the PTY test for both OSX 10.6.4 (Python 2.6) and FreeBSD 6.1 (Python 2.7).

BTW, I had to change a line in reactormixins.py from "kqueuereactor" to "kqreactor" so I could use the command trial -r kqueue

comment:34 Changed 3 years ago by oberstet

  • Cc tobias.oberstein@… added

comment:35 Changed 3 years ago by oberstet

I've prepared an adjusted kqueue reactor (based on the current one in Twisted) which uses the module select.kqueue/kevent which comes per default since Python 2.6.

https://github.com/oberstet/txkqreactor/blob/master/txkqreactor/kqreactor.py

I ran the trial against it on a FreeBSD 8.2 32-Bit (VM under VirtualBox). I'll attach the results, as well as those for the select reactor on that system.

Changed 3 years ago by oberstet

new kqueue trial

Changed 3 years ago by oberstet

select trial (on same system)

Changed 3 years ago by oberstet

diff select.log new_kqueue.log > select_vs_new_kqueue_diff.log

Changed 3 years ago by oberstet

select.kqueue based reactor

Changed 3 years ago by oberstet

trial -r kqueue --rterrors twisted > kqueue.log

comment:36 Changed 3 years ago by exarkun

  • Keywords review added
  • Owner psykidellic deleted
  • Priority changed from low to normal

I think this is the state intended.

comment:37 Changed 3 years ago by glyph

  • Owner set to glyph
  • Status changed from new to assigned

Reviewing.

comment:38 Changed 3 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to oberstet
  • Status changed from assigned to new

Thanks for getting this branch started again. I'm very happy to see this getting some momentum; I'd love to stop maintaining the version in Calendar Server :).

  1. First a couple of procedural comments. These (and more) are covered on the web site, on the "Contributing to Twisted Labs" page:
    1. In the future, to make it clear what's being submitted for review, please (A) submit for review by attaching the review keyword (as exarkun did in this case), and (B) when you attach the review keyword, indicate what it is that you want to review.
    2. If you're using a foreign revision control system (such as git), please include easy instructions to produce a complete diff so that a reviewer can examine it. Better yet, just attach such a patch yourself :). After about an hour of trying to get you sensible instructions, I still have no idea how to do this in the general case with git. I can't comment on your specific case, since txkqreactor isn't a branch of Twisted. If you want to use a revision control system that is easier to develop for Twisted with, you might consider Bazaar, for which the relevant command is "bzr diff -r ancestor:". If you'd like commit access so that you can work on this without sending patches around, please feel free to ask.
    3. When you re-submit this as a diff to respond to this review, please include a NEWS file as per the review process documentation.
  2. There are a number of pyflakes errors:
    26: Error 'failure' imported but unused
    85: Error local variable 'e' is assigned to but never used
    100: Error local variable 'e' is assigned to but never used
    130: Error local variable 'e' is assigned to but never used
    157: Error local variable 'e' is assigned to but never used
    
    It would be great to get these cleaned up. You can get the current Pyflakes from PyPI.
  3. Due to the way _disconnectSelectable works, the except: clause in _doWriteOrRead will create the Failure twice. This is an error condition, but still, creating Failure objects can be kinda expensive. Certainly more expensive than not creating them :-). It might be better to write this:
    except:
        why = Failure()
        log.err(why)
    
  4. I get a test timeout on my MacOS X 10.7.2 machine, from trial --rterrors --reactor kqueue twisted.internet. Here are the resulting errors:
    ===============================================================================
    [ERROR]
    Traceback (most recent call last):
      File ".../twisted/internet/test/test_fdset.py", line 335, in test_lostFileDescriptor
        self.runReactor(reactor)
      File ".../twisted/internet/test/reactormixins.py", line 206, in runReactor
        "reactor still running after %s seconds" % (timeout,))
    twisted.internet.defer.TimeoutError: reactor still running after 120.0 seconds
    
    twisted.internet.test.test_fdset.ReactorFDSetTestsBuilder_KQueueReactor.test_lostFileDescriptor
    ===============================================================================
    [ERROR]
    Traceback (most recent call last):
      File ".../twisted/python/log.py", line 84, in callWithLogger
        return callWithContext({"system": lp}, func, *args, **kw)
      File ".../twisted/python/log.py", line 69, in callWithContext
        return context.call({ILogContext: newCtx}, func, *args, **kw)
      File ".../twisted/python/context.py", line 118, in callWithContext
        return self.currentContext().callWithContext(ctx, func, *args, **kw)
      File ".../twisted/python/context.py", line 81, in callWithContext
        return func(*args,**kw)
      File ".../twisted/internet/test/test_fdset.py", line 318, in connectionLost
        reactor.stop()
      File ".../twisted/internet/base.py", line 577, in stop
        "Can't stop reactor that isn't running.")
    twisted.internet.error.ReactorNotRunning: Can't stop reactor that isn't running.
    
    twisted.internet.test.test_fdset.ReactorFDSetTestsBuilder_KQueueReactor.test_lostFileDescriptor
    
  5. Since this is all being so heavily modified, _doWriteOrRead should get a docstring as well. (Even private methods should have docstrings, to help future maintainers.)
  6. Similarly, _updateRegistration doesn't document its parameters or return type.
  7. Just nitpicking here now, but addReader and addWriter take an IReadDescriptor and IWriteDescriptor, respectively. Interestingly, these methods don't require a thorough docstring, since they're just implementations of methods which are already documented on an interface, and pydoctor will already link them up for you. I'd replace them with something like "Implement L{IReactorFDSet.addReader}".
  8. Hrm... the except: in removeReader makes me a little uneasy, since it just throws away the error. But I see that PollReactor does basically the same thing, so I guess the consequences can't be that bad. So you can ignore this point, if you want to.
  9. Line 22 is >80 characters. Since you have another line importing stuff from select this should be easy to fix :).
  10. Line 209 is >80 characters. May I suggest:
    (filter, flags, data, fflags) = (
        event.filter, event.flags, event.data, event.fflags)
    
  11. As long as you're in here, the module docstring uses an ancient convention: prefixing code-block lines with "|" is no longer required or recommended by the coding standard; the :: before the block will do the job just fine. Also, the module docstring should really mention twistd ... --reactor kqueue ... before it mentions kqreactor.install(), since hopefully that will be the more common form of installing the reactor.

Additionally, I'd appreciate it if you'd update https://github.com/oberstet/txkqreactor to point to this ticket, while we're waiting for it to get landed.

Thanks again for your work! This review looks pretty big, but a lot of the things I mentioned were really tiny changes, so I am hopeful that it will land on trunk soon.

comment:39 follow-up: Changed 3 years ago by zectbumo

  • Cc alfred@… removed

comment:40 Changed 3 years ago by oberstet

Hello Glyph,

thanks for your detailed review. And: I'm sorry causing you trouble/time by me missing the review tag (not properly linking to the attached patch). Please bare with me if I'm doing things wrong: first time I contribute to Twisted.

Then, I've added a link to this ticket on https://github.com/oberstet/txkqreactor and a prominent note that it's only "temporary" until this ticket gets resolved.

Details:

1.1/1.2: I'm sorry this caused you trouble/time .. will do in future.

1.3: done

2,3: done

4: this is "fixed" by the patch to "twisted/internet/test/test_fdset.py" which deactives the test, like for epoll, since there is no event generated for a lost FD by kqueue (like for epoll)

5,6,7: done

9,10,11: done

===

This leaves 8: yes, I'm uneasy about this also. I first tried to circumvent the issue by being more strict (the wasLost flag). This helped, but not fully.

When you take out the except catch all, those tests will fail:

twisted.conch.test.test_conch

CmdLineClientTestCase

test_exec ... [OK]
test_localToRemoteForwarding ... XX - addWriter 12 32 [Errno 32] Broken pipe

XX - removeReader 12 2 [Errno 2] No such file or directory

[OK]

test_remoteToLocalForwarding ... XX - removeReader 12 2 [Errno 2] No such file or directory

[OK]

OpenSSHClientForwardingTestCase

test_exec ... [OK]
test_localToRemoteForwarding ... XX - removeReader 12 2 [Errno 2] No such file or directory

[OK]

test_remoteToLocalForwarding ... XX - removeReader 12 2 [Errno 2] No such file or directory

[OK]

Sprinkled in is log output from the place where it fails, the FD and the errno from OSError.

Does this point to something specific which needs to be looked at, or is it safe to ignore those?

===

The trial output for the new patch does not change .. it still succeeds on all but

twisted.internet.test.test_posixprocess.FileDescriptorTests.test_expectedFDs
twisted.internet.test.test_process.PTYProcessTestsBuilder_KQueueReactor.test_openFileDescriptors
twisted.internet.test.test_process.PTYProcessTestsBuilder_PollReactor.test_openFileDescriptors
twisted.internet.test.test_process.PTYProcessTestsBuilder_SelectReactor.test_openFileDescriptors
twisted.internet.test.test_process.ProcessTestsBuilder_KQueueReactor.test_openFileDescriptors
twisted.internet.test.test_process.ProcessTestsBuilder_PollReactor.test_openFileDescriptors
twisted.internet.test.test_process.ProcessTestsBuilder_SelectReactor.test_openFileDescriptors

which are the open FD issue unrelated to reactor.

===

Since I didn't expect review to be so detailed, I'd like to get detailed also and mention a few things;)

The Mac PTY issue. It seems kqueue on Mac isn't able to to Pipes .. I've prepared a small test which shows this:

https://github.com/oberstet/txkqreactor/blob/master/test/test_pty.py

It'll work on FreeBSD, but not on my Mac.

I see 3 options:

  1. keep the patch as is
  2. deactivate the failing tests on platform Mac for kqueue
  3. make a reactor that uses kqueue, but runs select on background thread for PTY support

I'd be happy with a., would supply time to do b. if really desired, but not c. - don't know if thats feasible at all.

===

I've added docstring text which mentions that you need 2.6.5 due to a Python bug (also discussed previsouly on this ticket).

Should be check the exact py version during import in kqreactor?

Should we fallback to select26 module for Py versions lower?

I'm not sure if it's worth .. if I wanna run kqueue, use 2.6.5 or later seems pretty clear.

===

The timeout parameter handling (doKEvent). This is modeled after epoll reactor: take 1s if timeout == None. However, i.e. poll reactor will just pass through timeout, and select reactor will also pass through unless on win32 where it does fancy things.

What would be the right thing?

===

The kqueue reactor on calendarserver does event registration little bit different

https://trac.calendarserver.org/browser/CalendarServer/branches/users/cdaboo/clustering-3012/kqreactor.py?rev=3027#L98

It'll add event filters for both read and write (upon _first_ Reader _or_ Writer add), and then only selectively enables/disables those as Readers/Writers are added/removed.

Filters in the kqueue will remain for reactor lifetime. So if I add Readers for 10000 FDs, those will get added to kqueue, be enabled. Then, when I remove those Readers, the 10000 FDs will stay as filter in kqueue, but disabled. My guess would be that enable/disable only affects event delivery, but filtering is done previously.

I cannot see a reason for doing so .. it might affect performance negatively. It complicates stuff. So why do it?

Changed 3 years ago by oberstet

Revised patch after review

comment:42 Changed 3 years ago by thijs

  • Owner oberstet deleted

Putting patch up for review (as described on ReviewProcess#Authors:Howtogetyourchangereviewed).

comment:43 Changed 3 years ago by oberstet

  • Owner set to glyph

Ok, so review process = add patch, add "review", remove owner OR assign someone who agreed to review. Hopefully I now get this right. And hopefully it's ok if I shamelessly assign this review to Glyph;)

I want to come back to the point 8 of Glyph's review: the except catch all.

It now has bitten me .. I did more testing with an application starting a couple of network services.

When I start the app non-daemonized:

twistd --reactor kqueue --logfile autobahn.log -n -y autobahnhub.tac

everything is fine and works. Now, when I remove from above the "-n" making the app daemonize, it'll break:

In addReader(), in

self._updateRegistration(fd, KQ_FILTER_READ, KQ_EV_ADD)

and from

self._kq.control([kevent(fd, filter, op)], 0, 0)

raises OSError with bad FD.

Later, when the doKEvent() event loop is entered, it'll raise OSError with bad FD all the time in

l = self._kq.control([], len(self._selectables), timeout)

==

As said, this happens only when running daemonized.

Obviously, this needs to be fixed. The patch is not ready for prime time as things stand.

My problem is: since the whole trial runs fine and non-daemonized works also .. I just don't known where to look / proceed?

Any ideas?

comment:44 Changed 3 years ago by exarkun

Daemonization involves a fork and an exec. The kqueue descriptor is not inherited by the child process created by the fork call.

comment:45 Changed 3 years ago by oberstet

Ah, ok. This explains the behavior.

The man page

http://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2

says:

"The queue is not inherited by a child created with fork(2).
However, if rfork(2) is called without the RFFDG flag, then the descrip-
tor table is shared, which will allow sharing of the kqueue between two
processes."

Is it possible to make Twisted use rfork() for daemonization?

comment:46 follow-up: Changed 3 years ago by oberstet

After reading http://lwn.net/Articles/430684/ (which in the middle part has some discussion comparing epoll and kqueue), I am wondering why Twisted seems to register readers/writers on the reactor _before_ forking for daemonization anyway.

Why isn't the fork done first, and then on the forked process the readers/writers are registered?

Or do I get things wrong?

comment:47 in reply to: ↑ 46 Changed 3 years ago by glyph

Replying to oberstet:

After reading http://lwn.net/Articles/430684/ (which in the middle part has some discussion comparing epoll and kqueue), I am wondering why Twisted seems to register readers/writers on the reactor _before_ forking for daemonization anyway.

Why isn't the fork done first, and then on the forked process the readers/writers are registered?

Because we want to report at least basic errors (like "I couldn't bind the port") to the terminal before daemonizing, so that if someone does twistd web blah and then goes on with their day, they'll notice if the web server isn't actually running. Actually there are a lot more errors that we need to report, since the design of that system is super old and therefore somewhat haphazard, but we shouldn't make it even worse :).

It's hypothetically possible to have a server which does the listen() in the parent process and the monitoring-for-read in the forked process, but Twisted's API (e.g. listenTCP) implies both.

I think it would probably be easier to have an atfork-style API to re-create the kqueue FD than to try to re-orient our entire API to this one obscure platform detail.

comment:48 Changed 3 years ago by glyph

Another option: rewrite the daemonization code in twistd so that it fork() happens super early in the process's lifetime, then have the parent process stay alive and report errors that it receives from the forked child where all user code is executed. I don't know if this would break anything though.

comment:49 follow-up: Changed 3 years ago by exarkun

I think it would probably be easier to have an atfork-style API to re-create the kqueue FD than to try to re-orient our entire API to this one obscure platform detail.

Except it can only be atfork-style. It would never work reliably, and people would forever be doing their own forking (which is not supported, but mostly works) and then asking us why.

Another option: rewrite the daemonization code in twistd so that it fork() happens super early in the process's lifetime, then have the parent process stay alive and report errors that it receives from the forked child where all user code is executed. I don't know if this would break anything though.

This would be a lot better than what we do now (and what we do now is sort of an accident that has a couple nice behaviors and a lot of bad ones). I think therve may have even started an implementation of this (but I don't remember for what ticket, and possibly I'm thinking of something different).

comment:50 in reply to: ↑ 49 Changed 3 years ago by therve

This would be a lot better than what we do now (and what we do now is sort of an accident that has a couple nice behaviors and a lot of bad ones). I think therve may have even started an implementation of this (but I don't remember for what ticket, and possibly I'm thinking of something different).

You may be thinking of #823.

comment:51 Changed 3 years ago by oberstet

It's probably fair to call it an "implementation detail":

people.freebsd.org/~jlemon/papers/kqueue.pdf

"The current implementation does not pass kqueue de-
scriptors to children unless the new child will share its
file table with the parent via rfork(RFFDG). This may be
viewed as an implementation detail; fixing this involves
making a copy of all knote structures at fork() time, or
marking them as copy on write."

However it might also apply to NetBSD, OpenBSD and Darwin.

On the other hand, when I daemonize, I'm prepared/expect
to have to look into log files for any errors. I might
even don't want to have any errors/output replicated to
terminal at all.

But if the current terminal output behavior must be held
up, then the variant of "early fork in twistd and report
back to parent process" sounds easier/cleaner ..

Are we then talking about the code in

Twisted/twisted/scripts/_twistd_unix.py

or is this sprinkled more broadly?

Changed 3 years ago by oberstet

Revised patch : daemonization, correct news file

comment:52 follow-up: Changed 3 years ago by oberstet

After looking at the code for twistd for a while, I now think that the 2nd option Glyph outlined (doing fork very early, report back to terminal controlling process until that goes away) would involve considerable refactorings. In particular, the reactor is already created during parsing (!) of options.

Anyway. I've implemented the 1st option outlined by Glyph. Its done by _beforeFork and _afterFork on the reactor called from the daemonization code. Please see the comments in the code. Daemonization now works.

I think this should be acceptable: it's minimally invasive, does it's job, and has zero impact on other stuff.

I've also fixed the NEWS file (recognized that I must include a new one).

The only thing still left is PTY on Mac .. which I can't do anything about .. Apple must fix their kqueue implementation.

Thus, I think the patch now really is ready for final review.

comment:53 in reply to: ↑ 52 Changed 3 years ago by glyph

  • Keywords review removed
  • Owner changed from glyph to oberstet

Replying to oberstet:

Anyway. I've implemented the 1st option outlined by Glyph. Its done by _beforeFork and _afterFork on the reactor called from the daemonization code. Please see the comments in the code. Daemonization now works.

I think this should be acceptable: it's minimally invasive, does it's job, and has zero impact on other stuff.

Broadly speaking I agree.

However,

  1. daemonize needs a docstring. All new or changed code needs to get a docstring.
  2. There should be direct unit tests for the new functionality; these will likely have to completely fake most of the things that daemonize depends on, but that's OK; this is pretty low-level stuff.
  3. The existence of private methods is a dicey way to test for this, especially the dual-check "does it have the attribute", "is the attribute callable". I'd rather have the reactor explicitly opt in, via an interface that we can check with providedBy. That way there's no risk that someone will have a custom reactor that accidentally had such a private method. I think we should also call these methods something like beforeDaemonize and afterDaemonize because fork implies that it will also be called at spawnProcess time.

As to exarkun's criticism, "people would forever be doing their own forking", that's true; but, this isn't going to break anyone who is already doing that (since it's only for kqueue, and nobody can use kqueue right now because it's broken anyway), and I do keep telling everyone who tries this to just use twistd :).

In the long run we should definitely do the bigger refactor though.

I've also fixed the NEWS file (recognized that I must include a new one).

Thanks :).

The only thing still left is PTY on Mac .. which I can't do anything about .. Apple must fix their kqueue implementation.

You are correct about this.

Thus, I think the patch now really is ready for final review.

Changed 3 years ago by oberstet

Revised patch: docstring, IReactorDaemonize

comment:54 Changed 3 years ago by oberstet

  • Branch changed from branches/kqreactor-1918-3 to trunk
  • Keywords review added
  • Owner changed from oberstet to glyph

1., 3.: Done. Rerun trial. All as before. And daemonization works.

2.: What I think is: IReactorDaemonize does not provide new functionality, but introduces a liability for daemonize(). Consequently, I think the unit test needs to provide a fake reactor that implements IReactorDaemonize, then run twistd while using that reactor and run daemonized and then check whether the hooks are actually called. I do think this should somehow reside in twisted/test/test_twistd.py.

However, I just don't know how to orchestrate and simply do above with trial. I need help on that. It sounds trivial for someone who knows trial, and likely would cost me hours of experimenting and learning.

The other look at IReactorDaemonize would be: a reactor that announces IReactorDaemonize. But since it doesn't change the outward functionality of the reactor, what to test? The only thing: if reactor announces IReactorDaemonize, and we do NOT call the hooks, will it break? But thats not a unit test ..

comment:55 Changed 3 years ago by oberstet

2 more notes (unrelated to the daemonization stuff):

a) Is it allowed to add a reader and a writer for a FD which are different? I guess not, since the reactor code here and in other reactors rely on that being not done.

b) The timeout parameter of doIteration. What is allowed, or where is the interface for doIteration defined anyway?

My previous comment:
"The timeout parameter handling (doKEvent). This is modeled after epoll reactor: take 1s if timeout == None. However, i.e. poll reactor will just pass through timeout, and select reactor will also pass through unless on win32 where it does fancy things.

What would be the right thing? "

comment:56 Changed 3 years ago by exarkun

  • Branch trunk deleted

Thanks for your continuing efforts on this oberstet. Just a note about tracker workflow. The "branch" field is supposed to name the branch which holds the code related to the ticket. It looks like your latest patch is against trunk, but trunk does not have the code. So the branch field should probably be empty now. It should pretty much never contained trunk.

Please correct me if I have misinterpreted the last few comments on this ticket.

comment:57 Changed 3 years ago by exarkun

  • Keywords review removed
  • Owner changed from glyph to oberstet

However, I just don't know how to orchestrate and simply do above with trial.

One thing that will make this easier is to not use the global reactor inside of daemonize. Instead, accept the reactor as a parameter. Then you can easily supply first a reactor that does not implement IReactorDaemonize and then one which does implement it - just by passing them as an argument.

Next you'll need to prevent daemonize from actually daemonizing. The solution here is quite similar: pass in an alternate implementation of the os module which has the necessary methods, but which implements them to not actually exit, fork, close descriptors, etc. We have something like this already, twisted.test.test_process.MockOS. It looks like it implements much of what daemonize wants. It's not my favorite test helper, but at least it exists.

what to test? The only thing: if reactor announces IReactorDaemonize, and we do NOT call the hooks, will it break? But thats not a unit test ..

I'm not sure I really follow what you're saying here. The unit test which passes an IReactorDaemonize-providing reactor to daemonize should assert that the hooks are called. If they are not called, the test should fail. This is testing exactly what we want daemonize to do: call the hooks if they're present.

Is it allowed to add a reader and a writer for a FD which are different? I guess not, since the reactor code here and in other reactors rely on that being not done.

Indeed. Since it won't work, saying it's not allowed makes sense. :) If someone really needed this functionality, they could request it (and perhaps supply a patch implementing it for all our reactors, including kqueue). That's something to worry about when someone needs it, though.

The timeout parameter of doIteration. What is allowed, or where is the interface for doIteration defined anyway?

doIteration isn't part of any interface. It's part of the common factoring of our reactor implementations. It's generally called by reactor.mainLoop, which is beneath reactor.run. The timeout value is the maximum number of seconds to wait for events before returning. It is used to implement delayed calls, which may need to run before any I/O otherwise wakes up the reactor.

On Windows we wake up sooner than the timeout because Ctrl-C will not interrupt the Windows I/O notification APIs as we currently use them. You shouldn't need to emulate this for kqueue.

As far as the interface goes.... we should be able to eliminate it after #823 is resolved (because then the reactor should not be instantiated until after daemonization). I guess that just means we'll deprecate it and then get rid of it once #823 happens, though.

Thanks!

Changed 3 years ago by oberstet

Revised patch: unit tests

comment:58 Changed 3 years ago by oberstet

  • Keywords review added
  • Owner changed from oberstet to exarkun

ok, I added unit tests as outlined above.

I reran the whole trial against new kqueue and select.

As before, both fail on

twisted.internet.test.test_posixprocess.FileDescriptorTests.test_expectedFDs
twisted.internet.test.test_process.PTYProcessTestsBuilder_KQueueReactor.test_openFileDescriptors
twisted.internet.test.test_process.PTYProcessTestsBuilder_PollReactor.test_openFileDescriptors
twisted.internet.test.test_process.PTYProcessTestsBuilder_SelectReactor.test_openFileDescriptors
twisted.internet.test.test_process.ProcessTestsBuilder_KQueueReactor.test_openFileDescriptors
twisted.internet.test.test_process.ProcessTestsBuilder_PollReactor.test_openFileDescriptors
twisted.internet.test.test_process.ProcessTestsBuilder_SelectReactor.test_openFileDescriptors

which is related to FreeBSD and unrelated to this patch.

However, there is one strange thing

twisted.test.test_iutils.ProcessUtilsTests.test_outputWithErrorIgnored

works when run as part of trial test "twisted.test.test_iutils", but will fail when run _as part of the whole trial_ "twisted".

=====

[autobahn@autobahnhub ~/temp]$ trial -r kqueue --rterrors twisted.test.test_iutils
twisted.test.test_iutils

ProcessUtilsTests

test_getProcessOutputAndValueDefaultPath ... [OK]
test_getProcessOutputAndValuePath ... [OK]
test_getProcessOutputDefaultPath ... [OK]
test_getProcessOutputPath ... [OK]
test_getProcessValueDefaultPath ... [OK]
test_getProcessValuePath ... [OK]
test_output ... [OK]
test_outputAndValue ... [OK]
test_outputSignal ... [OK]
test_outputWithErrorCollected ... [OK]
test_outputWithErrorIgnored ... [OK]
test_value ... [OK]

WarningSuppression

testSuppressWarnings ... [OK]


Ran 13 tests in 0.758s

PASSED (successes=13)
[autobahn@autobahnhub ~/temp]$

=====

test_outputWithErrorIgnored ... Traceback (most recent call last):

Failure: twisted.internet.defer.TimeoutError: <twisted.test.test_iutils.ProcessUtilsTests testMethod=test_outputWithErrorIgnored> (test_outputWithErrorIgnored) still running at 120.0 secs
[ERROR]
Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
Selectables:
<twisted.internet.process.ProcessWriter object at 0x296176cc>
[ERROR]

...

===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: twisted.internet.defer.TimeoutError: <twisted.test.test_iutils.ProcessUtilsTests testMethod=test_outputWithErrorIgnored> (test_outputWithErrorIgnored) still running at 120.0 secs

twisted.test.test_iutils.ProcessUtilsTests.test_outputWithErrorIgnored
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
Selectables:
<twisted.internet.process.ProcessWriter object at 0x296176cc>

twisted.test.test_iutils.ProcessUtilsTests.test_outputWithErrorIgnored


Ran 7068 tests in 255.442s

FAILED (skips=614, expectedFailures=11, failures=1, errors=8, successes=6435)

comment:59 Changed 3 years ago by exarkun

  • Owner exarkun deleted

comment:60 Changed 3 years ago by oberstet

  • Owner set to glyph

Is there anything missing still / should I do something to finalize this?

comment:61 Changed 3 years ago by glyph

  • Owner glyph deleted

You'll have to wait for someone to get around to reviewing it :).

I might be the one to do it, but assigning it to me just means that other folks might think that it needs specifically to be done by me and not even try. So I'll assign it back to nobody.

comment:62 Changed 3 years ago by exarkun

  • Keywords review removed
  • Owner set to oberstet

Thanks for sticking with this one!

  1. twisted/scripts/_twistd_unix.py
    1. The parameters daemonize_reactor and daemonize_os are not named according to the Twisted naming convention. Use camelCase not under_scores. Although in this case, the parameters would be better named just reactor and os.
    2. These two parameters also need documentation.
  2. twisted/test/test_twistd.py
    1. FakeNonDaemonizingReactor should be new-style
    2. FakeDaemonizingReactor doesn't need to re-document the instance attributes it inherits
    3. Test methods named like test_DaemonizationHooksCalled should instead be named like test_daemonizationHooksCalled.
    4. Test method docstrings shouldn't be written in the style "Check that X." Instead, just write them like "X."
    5. Local variables like test_reactor should be named according to the naming convention, or simply like reactor. It's a test method, after all. It's more or less implied that any variables are going to be related to testing.
    6. Don't combine two booleans in an assertTrue. If the test fails, this hides the specific manner in which the assertion is failing. Alternatively, combine them but add a custom message which gives all of the necessary information to understand the failure.
  3. twisted/internet/interfaces.py
    1. I want to somehow warn people away from IReactorDaemonize. I don't know how, though. This isn't really an interface for applications to call. It's not even a very good interface for new reactor implementations to implement (because they should just avoid initializing themselves until run is called). It's just there to make the current kqueue reactor implementation work until we improve it. Maybe those points are worth including in its docstring somehow.
  4. twisted/internet/kqreactor.py
    1. Use ()-style import line continuations, instead of \-style
    2. in _doWriteOrRead, the log.err call should include a second positional argument to describe the source of the error (other reactors/Twisted code are not very good about this, but they should be).
  5. PTY support may be out of our hands, but we need to do something about the unit tests for it. The failure mode at the moment seems to be to hang (indefinitely or for two minutes, I'm not sure). This will make it difficult to keep a builder running with this reactor, which will hamper our efforts to maintain and improve it. Some sort of skip for these unit tests is fine, I think - perhaps we could even split out an IReactorProcess base class that does not include PTY support, and have kqueue reactor implement that (on OS X).

I didn't measure code coverage, since the only BSD-like machine I have access to is an OS X Snow Leopard machine, which has both the stdlib kqueue one-fd bug (Python 2.6.1) and the PTY bugs, preventing a full run of the test suite.

Thanks again.

comment:63 Changed 3 years ago by oberstet

  • Keywords review added
  • Owner changed from oberstet to exarkun

Ok, another patch:

1.1 done
1.2 done
2.1 I don't understand
2.2 done
2.3 done
2.4 done
2.5 done
2.6 done
3.1 done
4.1 done
4.2 done

  1. see below

Regarding 2.1, whats wrong with the current code? It has a ctor and 2 methods. Should be fine.

5.: I've just checked on Mac: the first test trial will hang is

twisted.conch.test.test_manhole

ManholeLoopbackStdio

testClassDefinition


It'll hang for 120s. Then go on.

How do I find all PTY related tests? And how do I skip those when reactor == kqueue and os = Mac?

In any case: why is a Mac builder for Twisted kqueue needed? Or at least: can't we defer issue 5? The patch seems to be useful/working, and honestly I already invested much more time than expected getting from a working patch to having a "Twisted conforming" one.

Changed 3 years ago by oberstet

revised for review comments of exarkun

comment:64 Changed 3 years ago by oberstet

FWIW, here is how to detect whether kqueue implementation supports PTY:

https://github.com/oberstet/txkqreactor/blob/master/test/test_pty.py

comment:65 Changed 3 years ago by exarkun

> > 2. twisted/test/test_twistd.py
> >   1. FakeNonDaemonizingReactor should be new-style 
> I don't understand

FakeNonDaemonizingReactor just needs to subclass object (as all newly introduced classes in Twisted should, so we stop accumulating new "classic" classes).

How do I find all PTY related tests?

One trick for this might be to just make spawnProcess raise a bogus exception when someone tries to use a pty on os x. This should make all such tests fail quickly, and then we can decide how to handle the resulting list of tests (hopefully they'll appear in a few clusters so they don't all need to be handled individually).

In any case: why is a Mac builder for Twisted kqueue needed?

OS X is the only BSD-like platform we have a slave for. We have an offer of some hosting resources for a FreeBSD slave, but someone with FreeBSD experience needs to set it up and perhaps offer a minimal level of maintenance (making sure it stays running, fixing any platform-specific issues that might arise that interfere with the slave itself (although helping out with platform-specific issues in the test suite is also very helpful :), etc). Am I right in guessing that if you're not too interested in OS X, you are interested in FreeBSD instead?

An alternative to figuring out all the OS X/PTY test failures would be to only set up the FreeBSD slave and only claim kqueue is supported on FreeBSD. Someone else with more interest in OS X can deal with getting a clean test suite run on OS X later.

comment:66 follow-up: Changed 3 years ago by oberstet

  1. : ah, ok. now I see. done: kqueue.7.patch
  1. : yes, you are right;) I don't care about OSX, only FBSD. Instead of investing time into osx, I'd rather provide resources for a FreeBSD slave. Can be time or hosting. Could do the latter also. Is there any document that describes whats needed for a slave and how it works? How do I get started?

Changed 3 years ago by oberstet

issue 5.: use new style class

comment:67 in reply to: ↑ 66 ; follow-up: Changed 3 years ago by glyph

Replying to oberstet:

  1. : ah, ok. now I see. done: kqueue.7.patch
  1. : yes, you are right;) I don't care about OSX, only FBSD. Instead of investing time into osx, I'd rather provide resources for a FreeBSD slave. Can be time or hosting. Could do the latter also. Is there any document that describes whats needed for a slave and how it works? How do I get started?

I care about OS X, but I don't care a whole lot about OSX+kqueue+spawnProcess; there are enough other ways to get code which requires some subset of those features to work. Ideally we could just add a slave to our existing builder but have some conditional test skips for PTY-based tests.

comment:68 Changed 3 years ago by glyph

(oops. I meant OSX+kqueue+spawnProcess+PTYs. Actually i do care about OSX+kqueue+spawnProcess ;-))

comment:69 in reply to: ↑ 67 ; follow-up: Changed 3 years ago by oberstet

Replying to glyph:

Replying to oberstet:

  1. : ah, ok. now I see. done: kqueue.7.patch
  1. : yes, you are right;) I don't care about OSX, only FBSD. Instead of investing time into osx, I'd rather provide resources for a FreeBSD slave. Can be time or hosting. Could do the latter also. Is there any document that describes whats needed for a slave and how it works? How do I get started?

I care about OS X, but I don't care a whole lot about OSX+kqueue+spawnProcess; there are enough other ways to get code which requires some subset of those features to work. Ideally we could just add a slave to our existing builder but have some conditional test skips for PTY-based tests.

Do I get this right? :

  1. currently there is neither a OS X nor a FreeBSD slave
  2. ideally there should be both of those
  3. the OS X slave requires skipping PTY tests for kqueue. The detection of unsupported PTY could be done by simply detecting the OS X platform, or - IMHO better - using feature detection (see my comment # 64)

As said, I would help with the FreeBSD slave.


Is 3. then still required to have the patch accepted, or could that be done in _another_ ticket (and by someone actually interested in the OS X platform ;) ??

I don't wanna sound silly/frustrated, but my motivation of doing 3. approaches zero ...

P.S.: any docs for builder slaves?

comment:70 in reply to: ↑ 69 ; follow-ups: Changed 3 years ago by exarkun

Replying to oberstet:

Replying to glyph:

I care about OS X, but I don't care a whole lot about OSX+kqueue+spawnProcess; there are enough other ways to get code which requires some subset of those features to work. Ideally we could just add a slave to our existing builder but have some conditional test skips for PTY-based tests.

Do I get this right? :

  1. currently there is neither a OS X nor a FreeBSD slave

There is an OS X slave, but it isn't testing kqueue reactor.

  1. ideally there should be both of those

Since OS X mostly supports kqueue, yes - ideally, at some point, a kqueue-based reactor would be available on both OS X and FreeBSD (who knows, perhaps even on some other BSDs as well :).

  1. the OS X slave requires skipping PTY tests for kqueue. The detection of unsupported PTY could be done by simply detecting the OS X platform, or - IMHO better - using feature detection (see my comment # 64)

Yes. As long as tests are failing (consistently), we don't consider a platform supported. Relatedly, we don't introduce changes that make tests fail (consistently) on a supported platform. We do testing on unsupported platforms, but they get less attention.

As said, I would help with the FreeBSD slave.


Is 3. then still required to have the patch accepted, or could that be done in _another_ ticket (and by someone actually interested in the OS X platform ;) ??

OS X issues can absolutely be addressed in another ticket. Just adding kqueue support for FreeBSD is a major feature addition, and is worthwhile independent of any OS X issues.

I don't wanna sound silly/frustrated, but my motivation of doing 3. approaches zero ...

I understand. Thanks very much for sticking with it for this long. :)

P.S.: any docs for builder slaves?

Yep - http://twistedmatrix.com/trac/wiki/ContinuousIntegration

I'm trying to dig up credentials for an existing FreeBSD machine we should have access to. If I succeed, and you're interested in setting up/maintaining a slave on it, I can email you the details.

comment:71 in reply to: ↑ 70 Changed 3 years ago by oberstet

Replying to exarkun:

P.S.: any docs for builder slaves?

Yep - http://twistedmatrix.com/trac/wiki/ContinuousIntegration

Ah, ok. BuildBot. Used that before ..

I'm trying to dig up credentials for an existing FreeBSD machine we should have access to. If I succeed, and you're interested in setting up/maintaining a slave on it, I can email you the details.

Thanks! Either that, or I can host a virtualized one on machine in a datacenter.

So, more important for me: can you give me (tobias.oberstein@…) a slave name/password and configure that on the build master?

comment:72 Changed 3 years ago by oberstet

ok, full email: tobias.oberstein at tavendo.de

another thing: I would setup a slave running FreeBSD 8.2 i386 .. so slave name could be something like freebsd82_i386 ..

comment:73 in reply to: ↑ 70 ; follow-up: Changed 3 years ago by glyph

Replying to exarkun:

Replying to oberstet:

Replying to glyph:

Is 3. then still required to have the patch accepted, or could that be done in _another_ ticket (and by someone actually interested in the OS X platform ;) ??

OS X issues can absolutely be addressed in another ticket. Just adding kqueue support for FreeBSD is a major feature addition, and is worthwhile independent of any OS X issues.

Just to reinforce this, if no tests fail on the existing OS X builder I'm perfectly happy for that work to happen on a different ticket. I'm sorry if my earlier comment gave the impression otherwise.

Also, I'd be happy to take this on, so if you wouldn't mind describing things in detail on that ticket (maybe copy some of comment 63, to remind me) and assign it to me, please do so.

I don't wanna sound silly/frustrated, but my motivation of doing 3. approaches zero ...

I understand. Thanks very much for sticking with it for this long. :)

Ditto. This has been a significant chunk of work, and a feature I'm personally happy to see land, so your perseverance is noted and very much appreciated.

comment:74 in reply to: ↑ 73 ; follow-up: Changed 3 years ago by oberstet

Replying to glyph:

Replying to exarkun:

Replying to oberstet:

Replying to glyph:

Is 3. then still required to have the patch accepted, or could that be done in _another_ ticket (and by someone actually interested in the OS X platform ;) ??

OS X issues can absolutely be addressed in another ticket. Just adding kqueue support for FreeBSD is a major feature addition, and is worthwhile independent of any OS X issues.

Just to reinforce this, if no tests fail on the existing OS X builder I'm perfectly happy for that work to happen on a different ticket. I'm sorry if my earlier comment gave the impression otherwise.

I am trying to verify if "no tests fail on the existing OS X builder".

When I do the following on OS X (using trunk with the patch applied)

trial -r select --rterrors twisted

I get 11 errors in total. Here is the breakdown:

E1:
1 failed test case:

twisted.trial.test.test_loader

=> Tries to create a directory under the Twisted installation directory .. permission denied.

This is unrelated to the patch.

E2:
2 failed test cases:

twisted.test.test_enterprise
twisted.test.test_reflector

=> "from twisted.enterprise.adbapi import _safe" fails .. "cannot import name _safe"

This is unrelated to the patch.

E3:

1 failed test case:

TCPClientTestsBuilder_KQueueReactor

test_protocolGarbageAfterLostConnection


"Can't stop a reactor that isn't running"

E4:

7 failed test cases:

PTYProcessTestsBuilder_KQueueReactor

test_openFileDescriptors
...


===

So as far as I can see only E3 and E4 are kqueue related.

However, I am wondering why those cases are even run for kqueue _although_ I started trial with option "-r select".

How does a BuildBot slave run the trial? Similar to how I did it manually?

===

Anyway, would you agree that fixing E3 and E4 (that is make those tests be skipped on OS X) should make the patch finally acceptable?

===

PS: I'll create a new ticket ("Skip PTY tests on OS X when running kqueue") when this patch is done and it's clear whats being left to do for OS X.

comment:75 follow-up: Changed 3 years ago by oberstet

For completeness, here is the breakdown for FreeBSD when running the same

trial -r select --rterrors twisted

E1 and E2 are the same.

Further, the following cases fail/err:

twisted.internet.test.test_posixprocess.FileDescriptorTests.test_expectedFDs
twisted.internet.test.test_process.PTYProcessTestsBuilder_KQueueReactor.test_openFileDescriptors
twisted.internet.test.test_process.PTYProcessTestsBuilder_PollReactor.test_openFileDescriptors
twisted.internet.test.test_process.PTYProcessTestsBuilder_SelectReactor.test_openFileDescriptors
twisted.internet.test.test_process.ProcessTestsBuilder_KQueueReactor.test_openFileDescriptors
twisted.internet.test.test_process.ProcessTestsBuilder_PollReactor.test_openFileDescriptors
twisted.internet.test.test_process.ProcessTestsBuilder_SelectReactor.test_openFileDescriptors

This is due to platform restrictions (related to listing the open FDs of a process), but unrelated to the patch.

To make FreeBSD a "supported" platform, I'd create a separate ticket for those and fix them.

comment:76 in reply to: ↑ 75 ; follow-up: Changed 3 years ago by glyph

Replying to oberstet:

This is due to platform restrictions (related to listing the open FDs of a process), but unrelated to the patch.

To make FreeBSD a "supported" platform, I'd create a separate ticket for those and fix them.

This looks suspiciously similar to #4881. Are you using current trunk?

comment:77 in reply to: ↑ 74 Changed 3 years ago by glyph

Replying to oberstet:

Replying to glyph:

Replying to exarkun:

Replying to oberstet:

Replying to glyph:

Is 3. then still required to have the patch accepted, or could that be done in _another_ ticket (and by someone actually interested in the OS X platform ;) ??

OS X issues can absolutely be addressed in another ticket. Just adding kqueue support for FreeBSD is a major feature addition, and is worthwhile independent of any OS X issues.

Just to reinforce this, if no tests fail on the existing OS X builder I'm perfectly happy for that work to happen on a different ticket. I'm sorry if my earlier comment gave the impression otherwise.

I am trying to verify if "no tests fail on the existing OS X builder".

When I do the following on OS X (using trunk with the patch applied)

E2:
2 failed test cases:

twisted.test.test_enterprise
twisted.test.test_reflector

=> "from twisted.enterprise.adbapi import _safe" fails .. "cannot import name _safe"

This is unrelated to the patch.

There's no such module in current trunk: http://twistedmatrix.com/trac/browser/trunk/twisted/test/test_enterprise.py

Did you fail to clean up PYCs maybe?

E3:

1 failed test case:

TCPClientTestsBuilder_KQueueReactor

test_protocolGarbageAfterLostConnection


"Can't stop a reactor that isn't running"

This looks suspicious, like it might actually be related to the patch. If this is going to be run by the existing buildbot it might need fixing.

So as far as I can see only E3 and E4 are kqueue related.

However, I am wondering why those cases are even run for kqueue _although_ I started trial with option "-r select".

These are the reactor-builder tests. They test all the reactors using tests which build the reactor, then run very specific tests against that created reactor, then clean it up. What -r select says to trial is to use the select reactor as the global reactor, i.e. the one that gets spun up if you return a Deferred from your test case.

Generally, the reactor builder tests are a lot better, as they demonstrate more isolated failures, and we are moving to test all reactors in this way as much as possible.

How does a BuildBot slave run the trial? Similar to how I did it manually?

Pretty close, but if you want to know exactly, here's the builder: http://buildbot.twistedmatrix.com/builders/osx10.6-py2.6-select

Anyway, would you agree that fixing E3 and E4 (that is make those tests be skipped on OS X) should make the patch finally acceptable?

As long as there's an accompanying ticket to fix it later :). E3 looks like a potentially real problem, but given that this is new functionality, skipping it on OS X for now and fixing it separately is OK.

PS: I'll create a new ticket ("Skip PTY tests on OS X when running kqueue") when this patch is done and it's clear whats being left to do for OS X.

Sounds good, although any failing reactor builder tests need to be skipped before landing in order to not break the existing builder, as noted above.

comment:78 in reply to: ↑ 76 ; follow-up: Changed 3 years ago by oberstet

Replying to glyph:

Replying to oberstet:

This is due to platform restrictions (related to listing the open FDs of a process), but unrelated to the patch.

To make FreeBSD a "supported" platform, I'd create a separate ticket for those and fix them.

This looks suspiciously similar to #4881. Are you using current trunk?

Yep, I did use trunk.

And yes, the ticket is exactly about this issue.

In the meantime (with the help of exarkun), I've setup a FreeBSD build slave.

After mounting

mount -t fdescfs null /dev/fd

those cases also succeed:

http://buildbot.twistedmatrix.com/builders/freebsd-8.2-i386/builds/3

Is #4881 already merged on trunk? If yes, then the skipping of the open FD tests when "fdescfs" is NOT mounted doesn't seem to work.

But I will automount the fdescfs on the slave anyway .. now that I know how to.

comment:79 in reply to: ↑ 78 Changed 3 years ago by glyph

Replying to oberstet:

Replying to glyph:

Replying to oberstet:

This is due to platform restrictions (related to listing the open FDs of a process), but unrelated to the patch.

To make FreeBSD a "supported" platform, I'd create a separate ticket for those and fix them.

This looks suspiciously similar to #4881. Are you using current trunk?

Yep, I did use trunk.

Make sure to update, clean up PYCs, etc, to deal with any local configuration issues.

And yes, the ticket is exactly about this issue.

In the meantime (with the help of exarkun), I've setup a FreeBSD build slave.

After mounting

mount -t fdescfs null /dev/fd

those cases also succeed:

http://buildbot.twistedmatrix.com/builders/freebsd-8.2-i386/builds/3

OK, great.

Is #4881 already merged on trunk? If yes, then the skipping of the open FD tests when "fdescfs" is NOT mounted doesn't seem to work.

It was merged to trunk here: http://twistedmatrix.com/trac/ticket/4881#comment:24

It sounds like you're saying the fix didn't actually work. When fdescfs is mounted, even earlier versions of Twisted would pass tests on FreeBSD.

But I will automount the fdescfs on the slave anyway .. now that I know how to.

That does seem like a reasonable solution as far as this ticket is concerned, but possibly you should re-open #4881 for further investigation.

comment:80 Changed 3 years ago by exarkun

That does seem like a reasonable solution as far as this ticket is concerned, but possibly you should re-open #4881 for further investigation.

Or file a new ticket. I think only reverts should re-open tickets.

comment:81 Changed 3 years ago by oberstet

  • Author changed from exarkun, therve to oberstet, exarkun, therve
  • Branch set to branches/kqueue-1918

(In [33458]) Create feature branch for #1918 ("revive kqueue reactor").

comment:82 Changed 3 years ago by oberstet

Ok, the new branch "branches/kqueue-1918" now contains the above "kqueue.7.patch".

I'd like to discuss the results of running trial against the branch on OS X and FreeBSD. Need some help;)

OS X:

http://buildbot.twistedmatrix.com/builders/osx10.6-py2.6-select/builds/1791

It has 8 errors. 7 of those are PTY issues. As already discussed, those need to be skipped on OS X. I will open a new ticket for that when the branch is merged.

The 1 other error is in:

twisted.internet.test.test_tcp.TCPClientTestsBuilder_CFReactor.test_protocolGarbageAfterLostConnection

I suspect CF = Core Foundation? How is that related to kqueue .. what is happening. I'm a bit puzzled by that one.

FreeBSD:

http://buildbot.twistedmatrix.com/builders/freebsd-8.2-i386/builds/5

It has 8 erros and 1 failure.

Those are issues I have not seen in my testing before, I guess because the buildslave I've setup has everything (but GTK) installed, and thus skips less than my manual testing before.

All 9 issue are in

twisted.test.test_stdio.StandardInputOutputTestCase

and as far as I could analyze, they are related to handling of "temp" files produced during the test cases.

I have no clue why those pop up when running kqueue, but not when running select.

Any hints on that one would be great also .. whats going on?

comment:83 follow-up: Changed 3 years ago by exarkun

  • Keywords review removed
  • Owner changed from exarkun to oberstet

As already discussed, those need to be skipped on OS X.

A simple way to do this for now might be to remove kqueue reactor from the ReactorBuilder._reactors list on OS X. There's already some platform-specific inspection code there, should it should be an easy change.

The 1 other error is in:

twisted.internet.test.test_tcp.TCPClientTestsBuilder_CFReactor.test_protocolGarbageAfterLostConnection

I suspect CF = Core Foundation? How is that related to kqueue .. what is happening. I'm a bit puzzled by that one.

CF is Core Foundation, yes. I'm not sure what's going on here. Possibly state from one of the failing KQueue reactor tests is interfering. I would try to get rid of the rest of the failures and see if this one remains. If this one stops failing, it would then also be interesting to investigate how the KQueue failures were causing this one to fail (but that doesn't need to block this ticket).

All 9 issue are in

twisted.test.test_stdio.StandardInputOutputTestCase

These may be environment setup issues, caused by a bug in the unit tests, resulting in the wrong version of Twisted being used in the child process. I would investigate what PYTHONPATH ends up being set to in the launched process and see if that leads anywhere. I suspect this is the problem because:

  1. The PYTHONPATH initialization code in StandardInputOutputTestCase._spawnProcess looks complicated and perhaps even buggy (constructing "/foo/bar:" as a value sometimes).
  2. One of the tests fails with output including "ImportError: No module named kqsyscall"

comment:84 in reply to: ↑ 83 Changed 3 years ago by oberstet

Replying to exarkun:

All 9 issue are in

twisted.test.test_stdio.StandardInputOutputTestCase

These may be environment setup issues, caused by a bug in the unit tests, resulting in the wrong version of Twisted being used in the child process. I would investigate what PYTHONPATH ends up being set to in the launched process and see if that leads anywhere. I suspect this is the problem because:

  1. The PYTHONPATH initialization code in StandardInputOutputTestCase._spawnProcess looks complicated and perhaps even buggy (constructing "/foo/bar:" as a value sometimes).
  2. One of the tests fails with output including "ImportError: No module named kqsyscall"

Thanks! You were right .. problem is module search path is not correct.

The PYTHONPATH of the spawned process _does_ point to the locally checked out code tree (and thus, to the new kqueue), but the PYTHONPATH ends up _behind_ the one pointing to the system installed Twisted within sys.path.

I can get the cases working by doing something like this:

[oberstet@txbuild ~/Twisted-1918]$ svn diff
Index: twisted/test/stdio_test_consumer.py
===================================================================
--- twisted/test/stdio_test_consumer.py (revision 33468)
+++ twisted/test/stdio_test_consumer.py (working copy)
@@ -8,7 +8,8 @@
 that process transports implement IConsumer properly.
 """

-import sys
+import sys, os
+sys.path = os.environ.get("PYTHONPATH", "").split(":") + sys.path

 from twisted.python import log, reflect
 from twisted.internet import stdio, protocol
[oberstet@txbuild ~/Twisted-1918]$

Is is acceptable to do that for all 9 files? Or do you prefer something different?

comment:85 Changed 3 years ago by oberstet

As discussed on IRC with exarkun, I did:

svn cp bin/_preamble.py twisted/test

and added

import _preamble

to the twisted/test/stdio_test_* files.

This solves the problem. FreeBSD now runs the whole trial without any error/unexpected failure:

http://buildbot.twistedmatrix.com/builders/freebsd-8.2-i386/builds/8

==

I want to note that at least 2 more files contain roughly similar code like _preamble.py:

./twisted/test/process_twisted.py

./twisted/conch/test/test_conch.py

a) The code is not the same, and it is inferior (i.e. one will only work when the local Twisted base directory is actually called "Twisted").

b) The code is replicated.

Thus, I think we should clean that up also. But probably not in this ticket. This ticket is already touching a couple of places. Agreed?

comment:86 Changed 3 years ago by oberstet

  • Keywords review added
  • Owner changed from oberstet to exarkun

Ok, the OS X platform also succeeds, after doing like suggested by exarkun in #83.

I have run the full set of builders using force-builds.py:

http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/kqueue-1918

Is there anything more to do for final merge?

comment:87 Changed 3 years ago by exarkun

(In [33478]) Cosmetic fixes

refs #1918

comment:88 Changed 3 years ago by exarkun

  • Keywords review removed
  • Owner changed from exarkun to oberstet

I made a few further changes, very minor; one is linked above, the others were r33479 and r33480.

Latest build results look fairly good (I think that's as good as Windows gets at the moment).

This looks good to me to merge now. Since this is your first merge, maybe it would be good if you found someone on IRC to walk you through it, or at least confirm your ideas of how it works if you have some already. :)

Thanks very much for your work on this!

comment:89 in reply to: ↑ 39 Changed 3 years ago by zectbumo

Replying to zectbumo:
It says I was removed from the CC but I still get emails. Will someone please remove me from the CC?

comment:90 Changed 3 years ago by oberstet

  • Resolution set to fixed
  • Status changed from new to closed

(In [33481]) Merge kqueue-1918: Revive kqueue reactor

Author: oberstet
Reviewer: exarkun, glyph
Fixes: #1918

A kqueue()/kevent() based implementation of the Twisted main loop.

This implementation depends on Python 2.6 or higher which has kqueue support
built in the select module.

Note, that you should use Python 2.6.5 or higher, since previous implementations
of select.kqueue had U{http://bugs.python.org/issue5910} not yet fixed.

Note, that on OS X, PTYs will (currently) NOT work due to platform restrictions.

comment:91 in reply to: ↑ 70 ; follow-up: Changed 2 years ago by building team

I cant get this to work with Python 2.6.5

I'm using Linux 64bit, Python 2.6.5.

The following script, causes my CPUs to go 100% after 10 seconds.

If I use gtk.main() instead of reactor.run(), there will be no problem, so I wonder if it is caused by the new version of pykqueue or a bug of twisted?

#!/usr/bin/python
#import gtk
from twisted.internet import gtk2reactor
gtk2reactor.install()
from twisted.internet import reactor
import subprocess
subprocess.Popen(['sleep', '10'])
#gtk.main()
reactor.run()

/Århus


(In [33481]) Merge kqueue-1918: Revive kqueue reactor

Author: oberstet Reviewer: exarkun, glyph Fixes: #1918

A kqueue()/kevent() based implementation of the Twisted main loop.

This implementation depends on Python 2.6 or higher which has kqueue support built in the select module.

Note, that you should use Python 2.6.5 or higher, since previous implementations of select.kqueue had U{ http://bugs.python.org/issue5910} not yet fixed.

Note, that on OS X, PTYs will (currently) NOT work due to platform restrictions.


Replying to glyph:

Replying to oberstet:

Replying to glyph:

Replying to oberstet:

This is due to platform restrictions (related to listing the open FDs of a process), but unrelated to the patch.

To make FreeBSD a "supported" platform, I'd create a separate ticket for those and fix them.

This looks suspiciously similar to #4881. Are you using current trunk?

Yep, I did use trunk.

Make sure to update, clean up PYCs, etc, to deal with any local configuration issues.

And yes, the ticket is exactly about this issue.

In the meantime (with the help of exarkun), I've setup a FreeBSD build slave.

After mounting

mount -t fdescfs null /dev/fd

those cases also succeed:

http://buildbot.twistedmatrix.com/builders/freebsd-8.2-i386/builds/3 Teambuilding århus

OK, great.

Is #4881 already merged on trunk? If yes, then the skipping of the open FD tests when "fdescfs" is NOT mounted doesn't seem to work.

It was merged to trunk here: http://twistedmatrix.com/trac/ticket/4881#comment:24 Teambuilding

It sounds like you're saying the fix didn't actually work. When fdescfs is mounted, even earlier versions of Twisted would pass tests on FreeBSD.

But I will automount the fdescfs on the slave anyway .. now that I know how to.

That does seem like a reasonable solution as far as this ticket is concerned, but possibly you should re-open #4881 for further investigation.

comment:92 Changed 2 years ago by exarkun

I cant get this to work with Python 2.6.5

Please use the mailing list or the IRC channel for support.

comment:93 in reply to: ↑ 91 Changed 2 years ago by glyph

Replying to building team:

I'm using Linux 64bit, (...) I wonder if it is caused by the new version of pykqueue or a bug of twisted?

For posterity, I just want to make it clear that this question doesn't really make sense, since kqueue is a feature of BSD-family OSes (FreeBSD, NetBSD, Mac OS X) and can't be used on Linux. Also, the kqueue reactor wouldn't be used in conjunction with the GTK reactor; they're different event-loop APIs.

As exarkun already suggested, if you don't know what your issue actually relates to, ask the question on the mailing list or IRC first and we'll attempt to steer you to the correct bug (or a new bug).

Note: See TracTickets for help on using tickets.