[Twisted-Python] doWrite on twisted.internet.tcp.Port

Barry Scott barry.scott at forcepoint.com
Tue Sep 15 11:05:47 MDT 2020


On Friday, 11 September 2020 19:28:13 BST Jean-Paul Calderone wrote:
> On Fri, Sep 11, 2020 at 1:34 PM <chris at cmsconstruct.com> wrote:
> 
> > Hey guys,
> >
> >
> >
> > Last year I hit a condition discussed in this ticket:
> > https://twistedmatrix.com/trac/ticket/4759 for doWrite called on a
> > twisted.internet.tcp.Port.
> >
> >
> >
> > I ignored it at the time since it was just on Linux, and my main platform
> > was Windows.  Now I’m coming back to it.  I’ll add context on the problem
> > below, but first I want to ask a high-level, design-type question with
> > multiprocessing and Twisted:
> >
> >
> >
> > Referencing Jean-Paul’s comment at the end of ticket 4759, I read you
> > shouldn’t fork a process (multiprocessing module) that already has a
> > Twisted reactor.  Understood.  But what about a parent process (not doing
> > anything Twisted) forking child processes, where each child process starts
> > its own Twisted reactor?  Is that intended to work from the Twisted
> > perspective?
> >
> 
> To answer the asked question, I don't think there is rigorous (or even
> casual) testing of very much of Twisted in the context of "some Twisted
> code has been loaded into memory and then the process forked".  So while it
> seems like a reasonable thing, I wouldn't say there's currently much effort
> being put towards making it a *supported* usage of Twisted.  Of course this
> can change at almost any moment if someone decides to commit the effort.
> 
> To dig a bit further into the specific problem, even if you only *import* the
> reactor in the parent process and then fork a child and try to start the
> reactor in the child, I strongly suspect epollreactor will break.  This is
> because the epoll object is created by reactor instantiation (as opposed to
> being delayed until the reactor is run).  epoll objects have a lot of weird
> behavior.  See the *Questions and Answers* section of the epoll(7) man page.

Suspect no longer. It breaks very badly.

I guess by weird you mean that epoll uses an FD. That sounds like a
great feature to me: the state is managed in the kernel. Of course you
have to be aware of this, and that the FD is inherited by children,
where the reactor will then fail in various ways.

Each child that adds and removes FDs is then doing so on the one
shared epoll FD, and events get delivered to all the children.
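
To illustrate the shared-FD point, here is a minimal, Linux-only sketch
that uses just the standard library (no Twisted). The function name
shared_epoll_demo is made up for this example. It creates an epoll
object before forking, lets the child register a pipe on it, and then
polls in the parent, which should see an event for an FD it never
registered itself.

```python
import os
import select


def shared_epoll_demo():
    """Show that an epoll object created before fork() is shared with the child.

    Returns (read_fd, events) as seen by the parent. Linux only.
    """
    ep = select.epoll()      # created before the fork, like epollreactor's poller
    r, w = os.pipe()
    try:
        pid = os.fork()
        if pid == 0:
            # Child: register the pipe's read end on the *inherited* epoll FD.
            ep.register(r, select.EPOLLIN)
            os._exit(0)

        # Parent: it never called register(), but the kernel-side epoll
        # state is the same object the child just modified.
        os.waitpid(pid, 0)
        os.write(w, b"x")
        events = ep.poll(1.0)
        return r, events
    finally:
        ep.close()
        os.close(r)
        os.close(w)
```

The registration outlives the child because the parent still holds an FD
for the same pipe; see the Questions and Answers section of epoll(7).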

What we did is use the PollReactor, not the EPollReactor.
Before we fork, we call reactor.removeReader on each of
the ports (sockets) we set up in the parent.

Then in the children we add the ports back in with reactor.addReader.
After that the children run just fine.
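
A rough sketch of that sequence, not our actual code. The helper names
(detach_ports, attach_ports, fork_workers) are invented for
illustration; the only Twisted calls assumed are reactor.removeReader,
reactor.addReader and reactor.run. It relies on the poll object living
in process memory, so each child gets its own copy after the fork.

```python
import os


def detach_ports(reactor, ports):
    """Parent, before fork: stop the reactor watching each listening port."""
    for port in ports:
        reactor.removeReader(port)


def attach_ports(reactor, ports):
    """Child, after fork: have the child's reactor watch the ports again."""
    for port in ports:
        reactor.addReader(port)


def fork_workers(reactor, ports, count):
    """Fork `count` children sharing `ports`; returns child PIDs in the parent."""
    detach_ports(reactor, ports)
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            attach_ports(reactor, ports)
            reactor.run()      # each child runs its own reactor loop
            os._exit(0)
        pids.append(pid)
    return pids


# Usage sketch (needs Twisted; port 8080 and MyFactory are placeholders):
#   from twisted.internet import pollreactor
#   pollreactor.install()    # before twisted.internet.reactor is imported
#   from twisted.internet import reactor
#   ports = [reactor.listenTCP(8080, MyFactory())]
#   fork_workers(reactor, ports, 4)
```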

We do this so that we can open privileged ports that the children will
share. We drop privileges after the privileged ports are opened.

If you do not need to listen on sockets in the parent then simply
avoid importing reactor before you fork.
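
One way to check this, along the lines of the sys.modules assertion
Jean-Paul suggests, is a small guard called just before the fork. This
is only a sketch; the function name assert_reactor_not_imported is
made up for the example.

```python
import sys


def assert_reactor_not_imported():
    """Raise if the global Twisted reactor is already loaded in this process.

    Importing twisted.internet.reactor (directly or transitively), or
    installing a reactor, adds this key to sys.modules.
    """
    if "twisted.internet.reactor" in sys.modules:
        raise RuntimeError(
            "twisted.internet.reactor was imported before fork"
        )


# Call it in the parent immediately before starting child processes, e.g.
#   assert_reactor_not_imported()
#   pid = os.fork()
```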

Barry


> 
> I don't know if this is the cause of your particular expression of these
> symptoms (it certainly doesn't apply to the *original* bug report which is
> on FreeBSD where there is no epoll) but it's at least *a possible* cause.
> 
> This could probably be fixed in Twisted by only creating the epoll object
> when run is called.  There's nothing particularly difficult about that
> change but it does involve touching a lot of the book-keeping logic since
> that all assumes it can register file descriptors before the reactor is
> started (think reactor.listenTCP(...); reactor.run()).
> 
> I'm not sure but it may also be the case that only delaying creation of the
> *waker* until the reactor starts would also fix this.  This is because as
> long as the epoll object remains empty a lot of the weird behavior is
> avoided and the waker is probably the only thing that actually gets added
> to it if you're just *importing* the reactor but not *running* it before
> forking.
> 
> Alternatively, your application *should* be able to fix it by studiously
> avoiding the import of twisted.internet.reactor (directly or transitively,
> of course).  You could add some kind of assertion about the state of
> sys.modules immediately before your forking code to develop some confidence
> you've managed this.
> 
> And if this is really an epoll problem then switching to poll or select
> reactor would also presumably get rid of the issue.
> 
> Jean-Paul
> 
> 
> >
> >
> >
> > Context:
> >
> > I only hit this problem on Linux, not Windows.
> >
> >
> >
> > The software project (github.com/opencontentplatform/ocp) has been a lot
> > of fun, especially with walking the tightrope of using multiprocessing,
> > multithreading, and Twisted reactors.  The main controller process kicks
> > off about 10 child processes, each doing different types of work.  In
> > general though, the child processes individually start a Twisted reactor,
> > connect to Kafka, connect to a database, use a shared REST API, and some
> > listen for connecting clients to accomplish work.
> >
> >
> >
> > I test on Linux about once a year, so there are too many changes to roll
> > back and figure it out that way.  It was working on Linux 2 years ago, but
> > both last year’s testing and current testing hit the doWrite error.  It continues
> > running fine on Windows.  I’ve gone back about 2 years of versions with
> > Python3, Twisted, and dependent libs… on both Windows and Linux.  Every
> > version change yields the same result - continues to work on Windows and
> > continues to hit the error on Linux.  So something I added has caused Linux
> > to throw that error.
> >
> >
> >
> > I’m not explicitly sharing much between the main process and the sub
> > processes.  They are spun up with (1) a shared multiprocessing.Event() - to
> > have the main process shut the children down, (2) their own unique
> > multiprocessing.Event() - to have the child processes notify back to the
> > parent, and (3) a deep copy of a dictionary (containing a bunch of settings
> > that remain constant).  The main process uses twisted.logger, but for
> > testing I strip that out to remove any twisted imports in the main
> > process.  So I’m not importing anything Twisted in the main process, and I
> > don’t believe I’m explicitly sharing something I shouldn’t.  It seems like
> > something is implicitly being exposed/shared across child processes on
> > Linux that isn’t on Windows.
> >
> >
> >
> > The tracebacks come through on Linux (sometimes randomly), on the console
> > of the parent controller process.  No need to paste here, since it’s the
> > same as the ticket shows.  I can’t reliably reproduce the problem, but I
> > know if I stop/start client connections
> > (ServerFactory/twisted.internet.interfaces.IReactorTCP and
> > twisted.internet.protocol.ReconnectingClientFactory) then it will
> > eventually happen.  I need to devote time to whittling down the code and
> > attempting to create a reliable test case… if that is even possible.
> >
> >
> >
> > The error is slightly different when running HTTP vs HTTPS, but the story
> > is the same.  It cripples whichever child process hits it, keeping it from
> > doing much of anything thereafter.  Not much luck with troubleshooting.  The
> > tracebacks do not include a calling function from my code, to tell me where
> > to start. And it happens across different child process types, so not the
> > same one each time.  When I throw debuggers on the child processes, the
> > problem seems to mask itself.  Well, at least I didn’t hit the problem over
> > the last 3 days using pudb and stepping through code at breakpoints.
> >
> >
> >
> > I’m absolutely open to suggestions for troubleshooting, but first wanted
> > to take a HUGE step back and ask a design question regarding Twisted and
> > multiprocessing.
> >
> >
> >
> > Thanks!
> >
> 





