[Twisted-Python] Experimenting with tubes

Jan Pobrislo ccx at webprojekty.cz
Tue Aug 19 16:58:19 MDT 2014


On Thu, 14 Aug 2014 14:29:56 -0700
Glyph Lefkowitz <glyph at twistedmatrix.com> wrote:

> On Aug 11, 2014, at 6:51 AM, ccx at webprojekty.cz wrote:
> 
> > Hello, I've been playing with the new tubes that are being
> > implemented:
> > http://comments.gmane.org/gmane.comp.python.twisted/27248
> > https://twistedmatrix.com/trac/ticket/1956
> 
> Thanks so much for taking the time to play with it, and taking some
> time to write feedback.
> 
> > Here are a few things that I did with it. I won't publish the full
> > code now, as in its current shape it could implode the eyeballs of
> > Twisted devs and possibly make them summon some of the elder gods,
> > but I'll see if I can produce something less vile as I merge the
> > ongoing changes to the tubes branch.
> 
> I'd be interested to see the code nevertheless.  If you had to do
> eyeball-imploding antics to get Tubes to work well for your use-case,
> being able to have a look at that would help us evaluate whether
> those antics were required by the code, encouraged by misfeatures of
> the API design, or just issues with lack of documentation.

It's mostly me not really documenting anything, not writing tests and
littering it with debug statements (which will go away as soon as I
find time to improve my debugging module so it can monkeypatch them).

http://wpr.cz/ccx/bzr/tubes7/ for my changes only
http://wpr.cz/ccx/bzr/tubes7-merge-2/ for changes on top of the bzr
mirror of the svn branch

example usage:
http://wpr.cz/ccx/paste/2014-08-19/2/
http://wpr.cz/ccx/paste/2014-08-19/3/
http://wpr.cz/ccx/paste/2014-08-19/4/
http://wpr.cz/ccx/paste/2014-08-19/5/


> > So far I've written a relatively simple app that reads logfiles,
> > parses them and inserts what it gets out of them into a database.
> 
> If it's actually reading a file, another nice to-do would be an
> IFount provider that provides the contents of a file with appropriate
> flow control, and maybe a thread or process in the background to do
> the file I/O.  Another thing you could contribute to the branch,
> possibly?  :-)  How did you implement this?

At the moment I don't mind that the calls block. I did, however, write
a ThreadReader and ThreadWriter for my earlier tubes-like code, which
used a Queue-based loop.
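For anyone unfamiliar with the pattern, a thread-backed reader with a
Queue-based loop looks roughly like the following. This is a minimal,
standard-library-only sketch, not the actual code: `ThreadReader` here
is a simplified, hypothetical reconstruction with no reactor
integration at all.

```python
import io
import queue
import threading

_EOF = object()  # sentinel marking the end of input


class ThreadReader:
    """Hypothetical sketch: read lines from a blocking file object in a
    background thread and hand them over through a bounded Queue."""

    def __init__(self, fileobj, maxsize=100):
        self._file = fileobj
        # A bounded queue gives crude flow control: the reader thread
        # blocks in put() whenever the consumer falls behind.
        self._queue = queue.Queue(maxsize=maxsize)
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        try:
            for line in self._file:
                self._queue.put(line)
        finally:
            # Always signal the end, even if reading raised.
            self._queue.put(_EOF)

    def __iter__(self):
        # Consumer side: pull lines until the sentinel arrives.
        while True:
            item = self._queue.get()
            if item is _EOF:
                return
            yield item


reader = ThreadReader(io.StringIO("first\nsecond\n")).start()
lines = list(reader)
```

The blocking put() on a bounded queue is the only backpressure here;
an IFount-based version would instead translate pauseFlow() into the
reader thread stopping.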

A more interesting challenge (and we discussed this earlier on IRC)
would be a generic async file API. I suggested implementing 9P2000
back then and I still think it is a good starting point... but it's
nothing I have spare time for at the moment.


> I'm not sure I totally understand the case that you're describing
> right now.  Can you perhaps contribute a unit test which demonstrates
> why this line of code is necessary?

I'd love to, but alas I'll be a bit preoccupied with some more urgent
matters for the next week or two. The short version is "flowStopped
just didn't get passed through the series otherwise".


> Are you running into <https://twistedmatrix.com/trac/ticket/7546>?

Most probably, as far as I can tell from the vague description.


> That ... definitely sounds kind of gross.  As does actually setting
> the nextFount attribute directly on the fan.Out.

Indeed. The point of the experiment was not to produce nice code but
to see whether there are any major pitfalls in using the tubes API.


> twisted.web.client.Agent has a solution to this where there's a
> multi-failure object that aggregates multiple errors into one thing.
> I think we have to do something similar.  Unfortunately this is a
> very confusing interface in addition to being poorly documented and
> relies on private classes that expose ostensibly public attributes.
> We need to very carefully document this within fan.In.

A nice abstraction of multiple failures would indeed be handy. I'm
pretty sure DeferredList could use one too.
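As a rough illustration of what such an abstraction might look like,
here is a plain-Python sketch of an exception that aggregates several
underlying errors, plus a helper that runs a set of callables and
collects every failure instead of stopping at the first one. The names
`MultipleFailures` and `call_all` are hypothetical; this is neither
Agent's internal wrapper nor DeferredList's API.

```python
class MultipleFailures(Exception):
    """Hypothetical aggregate error carrying every underlying exception."""

    def __init__(self, reasons):
        self.reasons = list(reasons)
        summary = "; ".join(repr(r) for r in self.reasons)
        super().__init__(f"{len(self.reasons)} failure(s): {summary}")


def call_all(callables):
    """Call each function and collect the results; if any of them
    failed, raise one aggregate error listing all the failures."""
    results, errors = [], []
    for func in callables:
        try:
            results.append(func())
        except Exception as exc:
            errors.append(exc)
    if errors:
        raise MultipleFailures(errors)
    return results


def bad_value():
    raise ValueError("bad value")


def bad_key():
    raise KeyError("missing")


try:
    call_all([lambda: 1, bad_value, bad_key])
except MultipleFailures as e:
    caught = e
```

Something shaped like this inside fan.In would let a downstream drain
see every upstream failure rather than only whichever arrived first.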

 
> > As for the data representation that I chose to pass between tubes,
> > I started with simple namedtuples and then built
> > a simple "datatype" class somewhat reminiscent of
> > https://github.com/hynek/characteristic
> > which I learned of a few moments after I finished polishing my own
> > implementation. What I have there is an added layer above namedtuples
> > that autogenerates zope Interfaces (so I can have adaptation), does
> > field type and value validation/adaptation and possibly (as a
> > future extension) provides an easy way to turn them into AMP commands
> > so the series can be split into communicating processes as needed.
> > (What would be interesting IMO is something like ampoule for tubes,
> > or perhaps a ThreadTube and SubprocessTube for performing blocking
> > operations.)
> 
> I think it's likely we'll acquire a dependency on Characteristic
> sometime soon, I have promised to look at the issues on
> <https://github.com/hynek/characteristic/pull/13> and try to address
> them already :).
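
To make the "datatype" idea above concrete, a stripped-down version
could look like this: a factory that builds a namedtuple subclass and
type-checks its fields on construction. This is a hypothetical sketch;
the zope Interface generation (and therefore adaptation) is left out to
keep it dependency-free, and the `datatype` name and signature are
illustrative only.

```python
from collections import namedtuple


def datatype(name, **field_types):
    """Hypothetical factory: a namedtuple subclass whose fields are
    type-checked whenever an instance is created."""
    base = namedtuple(name, list(field_types))

    def __new__(cls, *args, **kwargs):
        self = base.__new__(cls, *args, **kwargs)
        for field, expected in field_types.items():
            value = getattr(self, field)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{name}.{field} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return self

    # type() turns the plain __new__ function into a staticmethod.
    return type(name, (base,), {"__new__": __new__, "__slots__": ()})


LogEntry = datatype("LogEntry", host=str, status=int)
entry = LogEntry(host="example.org", status=200)
```

Because the result is still a tuple, it stays cheap and immutable;
validation happens once at construction rather than on every access.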

What I keep pondering is how to handle multiple types of messages
being passed through. Traditionally in Twisted one would use a
separate method for handling each one, e.g. IRCClient has userJoined,
userLeft, and so on. If we keep tubes as they are, with a single
received() method, then we somehow need to be able to tell those
messages apart, deconstruct them, and above all document them and test
for proper handling of all cases.
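One way to keep a single received() entry point while still handling
each message kind in its own method is to dispatch on the message's
type. A minimal sketch, assuming messages are namedtuples and using a
hypothetical `received_<TypeName>` naming convention (not part of the
tubes API):

```python
from collections import namedtuple

UserJoined = namedtuple("UserJoined", ["channel", "nick"])
UserLeft = namedtuple("UserLeft", ["channel", "nick"])


class IRCEventHandler:
    """Hypothetical tube-like object with one received() entry point
    that forwards each message to a per-type method."""

    def __init__(self):
        self.log = []

    def received(self, item):
        handler = getattr(self, "received_" + type(item).__name__, None)
        if handler is None:
            # Fail loudly rather than silently dropping unknown messages.
            raise TypeError(f"no handler for {type(item).__name__}")
        return handler(item)

    def received_UserJoined(self, msg):
        self.log.append(f"{msg.nick} joined {msg.channel}")

    def received_UserLeft(self, msg):
        self.log.append(f"{msg.nick} left {msg.channel}")


handler = IRCEventHandler()
handler.received(UserJoined("#twisted", "ccx"))
handler.received(UserLeft("#twisted", "ccx"))
```

This makes a missing case visible at runtime, but only when that
message actually arrives; it does nothing for documentation or static
checking.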

Instinctively I started looking for algebraic data types, but making
those work in Python takes high-level metaprogramming magic, which
implies either Python 3.3+ or AST rewriting:
https://github.com/lihaoyi/macropy

Perhaps what would be bearable is an AST-based checker (perhaps
integrated into test cases) that would do exhaustiveness and field-name
checking for such complex data, so that all users of a tube/fount
producing some type would be flagged whenever its type signature
changes.
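A full AST-based checker is a bigger project, but a cheaper
approximation can run inside a test case: given the set of message
types a fount may produce, check that a consumer defines a handler for
every one of them. Note this swaps AST analysis for runtime
introspection and only checks method names, not field usage;
`missing_handlers`, `IRC_MESSAGE_TYPES` and the `received_<TypeName>`
convention are all hypothetical.

```python
import unittest
from collections import namedtuple

UserJoined = namedtuple("UserJoined", ["channel", "nick"])
UserLeft = namedtuple("UserLeft", ["channel", "nick"])
Topic = namedtuple("Topic", ["channel", "text"])

# Hypothetical declaration of everything an IRC fount may emit.
IRC_MESSAGE_TYPES = (UserJoined, UserLeft, Topic)


def missing_handlers(consumer_cls, message_types):
    """Return the names of message types the consumer cannot handle."""
    return [
        t.__name__
        for t in message_types
        if not callable(getattr(consumer_cls, "received_" + t.__name__, None))
    ]


class PartialConsumer:
    def received_UserJoined(self, msg):
        pass

    def received_UserLeft(self, msg):
        pass


class ExhaustivenessTest(unittest.TestCase):
    def test_partial_consumer_is_flagged(self):
        # Topic was added to the fount's output, but this consumer was
        # never updated; a real test would assert the list is empty.
        self.assertEqual(
            missing_handlers(PartialConsumer, IRC_MESSAGE_TYPES), ["Topic"]
        )
```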

Another possible resolution is to maintain the multi-method approach
and make tubes into a pausing mechanism only. I think it could work
somewhat like this:

@pauseable
def lineReceived(self, line):
    ...
    # Get a reference to the downstream object providing the given
    # interface, waiting until it is unpaused.
    (yield IIRCClient).userJoined(...)

The first obvious downside I see with this approach is that we would
then need proxy objects for generic fan-in/fan-out.
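To check that the idea hangs together, here is a toy, synchronous
model of what @pauseable could do: the decorated method is a
generator, and a hypothetical `Router` either sends back the provider
for the yielded interface immediately or parks the generator until
resume() is called. Everything here (`Router`, `pauseable`, the
plain-class `IIRCClient` stand-in) is illustrative and independent of
both zope.interface and the tubes branch.

```python
import functools


class IIRCClient:
    """Plain-class stand-in for a zope Interface (illustrative only)."""


class Router:
    """Hypothetical registry of downstream providers keyed by interface,
    plus a paused flag and a list of parked generators."""

    def __init__(self):
        self.providers = {}
        self.paused = False
        self._waiting = []  # (generator, interface it is waiting for)

    def _step(self, gen, value=None):
        # Run the generator, answering each yielded interface with its
        # provider, until it finishes or the router is paused.
        try:
            iface = gen.send(value)
            while not self.paused:
                iface = gen.send(self.providers[iface])
        except StopIteration:
            return
        self._waiting.append((gen, iface))

    def resume(self):
        self.paused = False
        # Resume parked generators in order, stopping early if one of
        # them causes the router to pause again.
        while self._waiting and not self.paused:
            gen, iface = self._waiting.pop(0)
            self._step(gen, self.providers[iface])


def pauseable(method):
    """Hypothetical decorator: drive a generator method via self.router."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self.router._step(method(self, *args, **kwargs))

    return wrapper


class Recorder:
    """Downstream object playing the IIRCClient provider."""

    def __init__(self):
        self.events = []

    def userJoined(self, nick):
        self.events.append(nick)


class LineParser:
    def __init__(self, router):
        self.router = router

    @pauseable
    def lineReceived(self, line):
        nick = line.split()[0]
        # Wait (if needed) until downstream is unpaused, then call it.
        (yield IIRCClient).userJoined(nick)


router = Router()
recorder = Recorder()
router.providers[IIRCClient] = recorder
parser = LineParser(router)

parser.lineReceived("alice JOIN")  # delivered immediately
router.paused = True
parser.lineReceived("bob JOIN")  # parked until resume()
events_while_paused = list(recorder.events)
router.resume()
```

The proxying problem shows up directly here: `providers` holds exactly
one object per interface, so fan-out would need a provider that
forwards each call to several targets.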

> > Also maybe of note is the implementation of Pipes in Async library
> > for OCaml which I've been examining lately. What they seem to do
> > there is that they push values downstream and the function called
> > in each processing step may return a Deferred, signifying that a
> > pause is requested until that Deferred fires. For those interested in the
> > details you can refer to:
> > https://ocaml.janestreet.com/ocaml-core/111.25.00/doc/async/#Std.Pipe
> > and the relevant section of Real World OCaml book (available
> > online).
> 
> Creating a token for every single call to .receive() makes life
> hard.  Deferred could go to some trouble to be a cheaper token to
> pass around (especially on PyPy) but doing it this way is also
> error-prone as a mistaken error-handler in the Deferred chain means
> that the default behavior of buggy code un-hooks your loop and leaves
> idle data sources that will never be cleaned up.

How does the current approach prevent that? From what I see, an
unhandled exception in a poorly written drain can do much the same.
Tubes are handled specially, so it can be prevented there.
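To illustrate that last point: if the framework, rather than user
code, owns the loop that delivers values, it can guarantee cleanup no
matter how the drain misbehaves. A toy sketch; `Source`, `pump` and
`buggy_drain` are hypothetical and unrelated to the actual tubes
classes.

```python
class Source:
    """Hypothetical data source that must be stopped when abandoned."""

    def __init__(self, items):
        self._items = list(items)
        self.stopped = False

    def __iter__(self):
        return iter(self._items)

    def stop(self):
        self.stopped = True


def pump(source, drain):
    """Framework-owned delivery loop: whatever the drain does, the
    source is stopped afterwards, so it is never left idle forever."""
    try:
        for item in source:
            drain(item)
    finally:
        source.stop()


received = []


def buggy_drain(item):
    if item == 2:
        raise RuntimeError("drain bug")
    received.append(item)


source = Source([1, 2, 3])
try:
    pump(source, buggy_drain)
except RuntimeError:
    pass  # the error still propagates, but cleanup already happened
```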


> I worked quite a bit with the 'Streams' interface in web2 on Calendar
> Server, and my conclusion there is that while this is better than
> nothing (it was very nice to be able to just return a Stream rather
> than cobble together something that returned NOT_DONE_YET every time)
> it was (A) slow and (B) error prone.  Tubes are designed specifically
> to avoid this error.  Although you can return Deferreds internally,
> no consumer ever needs to write the callback-loop that calls .read()
> again from a callback on .read().

I agree that something like tubes is needed, but it could be a
higher-level layer over something as simple as flow-signalling
callbacks.

Anyway, linked mostly for inspiration.
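As a rough illustration of the lower layer I have in mind, here is the
Async Pipe push style approximated with asyncio, with awaitables
standing in for Deferreds: each handler may return None to continue
immediately, or an awaitable to request a pause until it completes.
`push_through` and the handlers are hypothetical; this is neither how
tubes nor Twisted work today.

```python
import asyncio
import inspect


async def push_through(values, handlers):
    """Push each value to every handler in order.  A handler may return
    None to continue at once, or an awaitable to request a pause that
    lasts until the awaitable completes."""
    for value in values:
        for handler in handlers:
            result = handler(value)
            if inspect.isawaitable(result):
                await result  # backpressure: wait before continuing


seen = []


def record(value):
    seen.append(value)  # synchronous handler: never pauses


def slow_every_other(value):
    if value % 2 == 0:
        # Hypothetical slow sink: ask the pipeline to pause briefly.
        return asyncio.sleep(0.01)
    return None


asyncio.run(push_through(range(4), [record, slow_every_other]))
```

Every pause here allocates a fresh awaitable, which is the per-call
token cost Glyph raised above; a layer like tubes could sit on top and
hide the explicit pausing from most consumers.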

What I'd really like to see, though, is some rationale for the current
design choices of tubes, e.g. a list of reasons the previous attempts
failed and how each successive one addresses those issues. :-)

- ccxcz



