[Twisted-Python] Experimenting with tubes

Thu Aug 21 01:36:52 MDT 2014

On Aug 19, 2014, at 3:58 PM, Jan Pobrislo <ccx at webprojekty.cz> wrote:

> On Thu, 14 Aug 2014 14:29:56 -0700
> Glyph Lefkowitz <glyph at twistedmatrix.com> wrote:
> 
>> On Aug 11, 2014, at 6:51 AM, ccx at webprojekty.cz wrote:
>> 
>>> Hello, I've been playing with the new tubes that are being
>>> implemented:
>>> http://comments.gmane.org/gmane.comp.python.twisted/27248
>>> https://twistedmatrix.com/trac/ticket/1956
>> 
>> Thanks so much for taking the time to play with it, and taking some
>> time to write feedback.
>> 
>>> Here are a few things that I did with it. I won't publish the full
>>> code now, as in its current shape it could implode eyeballs of
>>> twisted devs and possibly make them summon some of the elder gods,
>>> but I'll see if I can produce something less vile as I merge the
>>> ongoing changes to the tubes branch.
>> 
>> I'd be interested to see the code nevertheless.  If you had to do
>> eyeball-imploding antics to get Tubes to work well for your use-case,
>> being able to have a look at that would help us evaluate whether
>> those antics were required by the code, encouraged by misfeatures of
>> the API design, or just issues with lack of documentation.
> 
> It's mostly me not really documenting anything, not writing tests and
> littering it with debug statements (which will go away as soon as I
> find time to improve my debugging module so it can monkeypatch them).

You should be able to debug tubes mostly with composition to analyze flows.  If you can't do what you want with that, we should talk :-).

> http://wpr.cz/ccx/bzr/tubes7/ for my changes only
> http://wpr.cz/ccx/bzr/tubes7-merge-2/ for changes on top of the bzr
> mirror of the svn branch
> 
> example usage:
> http://wpr.cz/ccx/paste/2014-08-19/2/
> http://wpr.cz/ccx/paste/2014-08-19/3/
> http://wpr.cz/ccx/paste/2014-08-19/4/
> http://wpr.cz/ccx/paste/2014-08-19/5/

I think I don't understand the purpose of all of these.  Particularly, what is the purpose of TypedTube, since Tube already supports specification of input and output types?

>>> So far I wrote a relatively simple app that reads logfiles, parses
>>> them and inserts what it got out of them into a database.
>> 
>> If it's actually reading a file, another nice to-do would be an
>> IFount provider that provides the contents of a file with appropriate
>> flow control, and maybe a thread or process in the background to do
>> the file I/O.  Another thing you could contribute to the branch,
>> possibly?  :-)  How did you implement this?
> 
> At the moment I don't mind the blockingness of the calls. I did write a
> ThreadReader and ThreadWriter though for my earlier tubes-alike with
> Queue-based loop.
> 
> A more interesting challenge (and we discussed this earlier on
> irc) would be a generic async file API. I suggested implementing 9p2000
> back then and I still think it is a good starting point... but it is
> nothing I have spare time for at the moment.

The point is not to use a specific implementation.  Rather, the point is to get a single well-documented entry point within Twisted for asynchronously reading a file so that people can start using it.  Frankly, this entry point could be a total lie and actually do the I/O synchronously on the main thread, as long as it could be transparently upgraded to being the truth without exposing the change to applications in the future :-).  If we make everyone implement their own read-a-file fount, then there's no hope that future Twisted maintenance could improve their performance.
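
To make the "total lie" idea concrete, here is a minimal sketch using only the standard library.  The name readFile and its onChunk / onDone / chunkSize parameters are hypothetical; nothing like this exists in Twisted or the tubes branch.  It does blocking reads on the calling thread, but callers only see callbacks, so the body could later be replaced by a threaded or genuinely asynchronous implementation.  It does no flow control, which a real fount would need.

```python
def readFile(path, onChunk, onDone, chunkSize=65536):
    """
    Hypothetical entry point: deliver the contents of the file at C{path}
    to C{onChunk} in pieces of at most C{chunkSize} bytes, then call
    C{onDone} with no arguments.

    This version is the "lie": it reads synchronously before returning.
    Callers must not rely on that, so that a later version is free to
    invoke the callbacks after readFile has returned.
    """
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunkSize), b""):
            onChunk(chunk)
    onDone()
```

An application would call something like readFile(path, chunks.append, finished) and put all of its follow-up logic in the finished callback rather than after the call.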

>> I'm not sure I totally understand the case that you're describing
>> right now.  Can you perhaps contribute a unit test which demonstrates
>> why this line of code is necessary?
> 
> I'd love to, alas I'll be a bit preoccupied with some more urgent matters
> for the following week or two. The short version is "flowStopped just
> didn't get passed through the series otherwise".

I hope I'll have some time later in the week to investigate this. 

>> Are you running into <https://twistedmatrix.com/trac/ticket/7546>?
> 
> Most probably, as far as I can tell from the vague description.

Well, you'll be glad to know that's also on my to-do list :-).

>> That ... definitely sounds kind of gross.  As does actually setting
>> the nextFount attribute directly on the fan.Out.
> 
> Indeed. The point of the experiment was not to produce nice code but to
> see if there are any major pitfalls using the tubes API.

As long as it's clear that this is not really necessary :).

>> twisted.web.client.Agent has a solution to this where there's a
>> multi-failure object that aggregates multiple errors into one thing.
>> I think we have to do something similar.  Unfortunately this is a
>> very confusing interface in addition to being poorly documented and
>> relies on private classes that expose ostensibly public attributes.
>> We need to very carefully document this within fan.In.
> 
> Some nice abstraction of multiple failures would be indeed handy. I'm
> pretty sure DeferredList could use one too.

Yeah, uh, maybe.  Also DeferredList should go away and be replaced with something that doesn't inherit from Deferred, and instead is just a function that returns a new regular-old-Deferred, since the subclassing is entirely unnecessary.  But I digress.

>>> As for the data representation that I chose to pass between each tube,
>>> I started with simple namedtuples and following that I've built
>>> a simple "datatype" class somewhat reminiscent of
>>> https://github.com/hynek/characteristic
>>> which I learned of a few moments after I finished polishing my own
>>> implementation. What I have there is an added layer above namedtuples
>>> that autogenerates zope Interfaces (so I can have adaptation), does
>>> field type and value validation/adaptation and possibly (as a
>>> future extension) provides an easy way to make them into AMP commands
>>> so the series can be split into communicating processes as needed.
>>> (What would be interesting imo is something like ampoule for tubes,
>>> or perhaps a ThreadTube and SubprocessTube for performing blocking
>>> operations)
>> 
>> I think it's likely we'll acquire a dependency on Characteristic
>> sometime soon, I have promised to look at the issues on
>> <https://github.com/hynek/characteristic/pull/13> and try to address
>> them already :).
> 
> What makes me ponder is how to work with multiple types of messages
> being passed through. Traditionally in twisted one would use different
> methods for handling each one, eg. IRCClient has userJoined, userLeft,
> and so on. If we keep tubes as they are with a single received() method
> then somehow we need to be able to tell those messages apart,
> deconstruct them and mainly document them and test for proper handling
> of all cases.

Dispatching from a single "received" message to multiple distinct methods based on type is a pretty well solved problem in Python :-).  There is a whole class of design patterns for this which we could apply to Tubes.  I don't think we need this in the first release though; it's easy to implement yourself, there are a few different styles which might be a good idea that we'll need to try out, and many of the examples in the documentation that we've written so far don't require them.
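
As one illustration of the styles I mean, here is a minimal sketch that routes a single received() call to a method chosen by the item's class name.  Everything in it (TypeDispatcher, ChannelLog, the received_ prefix, the message types) is hypothetical and made up for this example; it is not part of the tubes API.

```python
from collections import namedtuple

# Hypothetical message types, standing in for whatever flows through a tube.
UserJoined = namedtuple("UserJoined", "nick channel")
UserLeft = namedtuple("UserLeft", "nick channel")


class TypeDispatcher(object):
    """
    Mixin that dispatches received(item) to received_<ClassName>(item).
    """
    def received(self, item):
        handler = getattr(self, "received_" + type(item).__name__, None)
        if handler is None:
            return self.receivedUnknown(item)
        return handler(item)

    def receivedUnknown(self, item):
        # Fail loudly by default; subclasses may override to ignore or log.
        raise TypeError("no handler for %s" % (type(item).__name__,))


class ChannelLog(TypeDispatcher):
    def __init__(self):
        self.lines = []

    def received_UserJoined(self, item):
        self.lines.append("%s joined %s" % (item.nick, item.channel))

    def received_UserLeft(self, item):
        self.lines.append("%s left %s" % (item.nick, item.channel))
```

This keeps the familiar one-method-per-event feel of IRCClient while leaving the tube itself with a single entry point.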

> Instinctively I started looking for algebraic data types, but making
> those work on python is high-level metaprogramming magic and that
> either implies python3.3+ or AST rewriting:
> https://github.com/lihaoyi/macropy

Yeesh.

> Perhaps what would be bearable is an AST-based checker (perhaps
> integrated into test cases) that would do exhaustiveness and field name
> checking for such complex data - so all users of a tube/fount producing
> some type would be flagged whenever its type signature changes.

Is this really a substantial enough advantage over, say, a dictionary with types as keys and callables as values, that it would be worth the (frankly insane-sounding) level of complexity involved in its implementation?
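
For comparison, a minimal sketch of the dictionary version, with the exhaustiveness check done as an ordinary construction-time check rather than AST analysis.  The names (makeDispatcher, MESSAGE_TYPES, LineParsed, ParseFailed) are hypothetical and only for illustration.

```python
from collections import namedtuple

# Hypothetical messages a log-parsing tube might emit.
LineParsed = namedtuple("LineParsed", "timestamp message")
ParseFailed = namedtuple("ParseFailed", "raw reason")

# The full set of types a producer may emit, declared in one place.
MESSAGE_TYPES = (LineParsed, ParseFailed)


def makeDispatcher(handlers, expectedTypes=MESSAGE_TYPES):
    """
    Build a received(item) callable from a {type: callable} dictionary,
    refusing to build it if any expected type has no handler.
    """
    missing = [t.__name__ for t in expectedTypes if t not in handlers]
    if missing:
        raise ValueError("unhandled message types: " + ", ".join(missing))

    def received(item):
        # Exact-type lookup; subclasses would need their own entries.
        return handlers[type(item)](item)
    return received


received = makeDispatcher({
    LineParsed: lambda m: "ok " + m.message,
    ParseFailed: lambda m: "bad " + m.reason,
})
```

Adding a new type to MESSAGE_TYPES makes every makeDispatcher call that lacks a handler for it fail immediately, which covers the "flag all users when the set of types changes" case.  It does not check field names.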

> Another possible resolution is to maintain the multi-method approach and
> make tubes into a pausing mechanism only. I think it could work somewhat
> like:
> 
> @pauseable
> def lineReceived(line):
>    ...
>    # get reference object of specified interface
>    # and wait until it is unpaused
>    (yield IIRCClient).userJoined(...)
> 
> The first obvious downside of this approach that I see is that we now
> need proxy objects for generic fan-in/out.

Yeah, I, uh, don't quite understand what you're getting at here.

>>> Also maybe of note is the implementation of Pipes in the Async library
>>> for OCaml, which I've been examining lately. What they seem to do
>>> there is that they push values downstream and the function called
>>> in each processing step may return a deferred signifying that a pause
>>> is requested until this deferred is fired. For those interested in the
>>> details you can refer to:
>>> https://ocaml.janestreet.com/ocaml-core/111.25.00/doc/async/#Std.Pipe
>>> and the relevant section of Real World OCaml book (available
>>> online).
>> 
>> Creating a token for every single call to .receive() makes life
>> hard.  Deferred could go to some trouble to be a cheaper token to
>> pass around (especially on PyPy) but doing it this way is also
>> error-prone as a mistaken error-handler in the Deferred chain means
>> that the default behavior of buggy code un-hooks your loop and leaves
>> idle data sources that will never be cleaned up.
> 
> How does the current approach prevent that? From what I can see, an
> unhandled exception in a poorly written drain can do very much the
> same. Tubes are handled specially so it can be prevented there.

That's exactly the point - applications should very rarely need to create new drains or founts; they should be working mostly in terms of tubes, fan.In, fan.Out, protocol founts and drains, and process founts and drains.

But in the case of a buggy drain, protocol founts and tubes can be written to handle the error _and cleanly shut down the whole flow_.

In the Streams (i.e. every-read-returns-a-Deferred) approach, you don't know who your caller is except that they might have added a callback to you.  There's no way to propagate other notifications or inspect the chain for debugging in case of errors.
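
A toy sketch of the difference, assuming nothing beyond the standard library.  The method names mirror the fount/drain vocabulary (flowTo, flowingFrom, receive, flowStopped, stopFlow), but ListFount and BuggyDrain are made up for this example, delivery is synchronous, and the stop reason is a bare exception rather than a Failure; this is not how the tubes branch is implemented.  The point is only that the fount holds a reference to its drain, so when the drain raises, the fount can both release its own resources and tell the drain the flow is over.

```python
class ListFount(object):
    """
    Toy fount that synchronously delivers a list of items to one drain.
    """
    def __init__(self, items):
        self.items = list(items)
        self.stopped = False

    def flowTo(self, drain):
        drain.flowingFrom(self)
        for item in self.items:
            try:
                drain.receive(item)
            except Exception as e:
                # The drain is buggy: clean up our side, then tell it why
                # the flow ended instead of silently going idle.
                self.stopFlow()
                drain.flowStopped(e)
                return
        drain.flowStopped(None)

    def stopFlow(self):
        # Stand-in for closing files, sockets, and so on.
        self.stopped = True
        self.items = []


class BuggyDrain(object):
    """
    Toy drain that cannot cope with the value 2.
    """
    def __init__(self):
        self.received = []
        self.stopReason = "not stopped"

    def flowingFrom(self, fount):
        self.fount = fount

    def receive(self, item):
        if item == 2:
            raise ValueError("cannot handle 2")
        self.received.append(item)

    def flowStopped(self, reason):
        self.stopReason = reason
```

With a Deferred-per-read loop, the equivalent bug just drops the next .read() on the floor, and nothing upstream finds out.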

>> I worked quite a bit with the 'Streams' interface in web2 on Calendar
>> Server, and my conclusion there is that while this is better than
>> nothing (it was very nice to be able to just return a Stream rather
>> than cobble together something that returned NOT_DONE_YET every time)
>> it was (A) slow and (B) error prone.  Tubes are designed specifically
>> to avoid this error.  Although you can return Deferreds internally,
>> no consumer ever needs to write the callback-loop that calls .read()
>> again from a callback on .read().
> 
> I agree that something like tubes is needed, but it can be an
> upper-level layer over something as simple as flow-signalling callbacks.
> 
> Anyway, linked mostly for inspiration.
> 
> What I'd really like to see though is some rationale for the current
> design choices of tubes - e.g. a list of reasons the previous attempts
> failed and how each successor addresses the issues. :-)

Hmm.  It's tough to document these, because there was a lot of experimenting in tubes, a lot of backtracking, some influence from other projects and a lot of parallel invention.  Trying to outline all the things we tried and why they did or didn't work would be extremely time-consuming for us and probably pretty confusing and unhelpful for the reader.

What would you be looking to get from such a write-up?

-g