[Twisted-Python] Question on push/pull producers inter-workings, was : "Is there a simple Producer/Consumer example or tutorial?"

Mon Apr 21 10:41:19 EDT 2008

On Mon, 21 Apr 2008 09:52:56 +0200, Gabriel Rossetti <mailing_lists at evotex.ch> wrote:
>Jean-Paul Calderone wrote:
>>On Fri, 18 Apr 2008 09:57:35 +0200, Gabriel Rossetti 
> [snip]
>What exactly do you mean by :
>
>"there's no events which will signal that you can send some more of the 
>string *except* for
>the reactor deciding that it is ready for some more"?
>
>When I looked at Twisted's code, the difference that I saw was that if a 
>push producer is used, and if the data to be sent is bigger than a certain 
>length, it calls producer.pauseProducing()

This is true.  Let's back up for a moment, though.

A pull producer is one which only produces data when it is asked for data.
The ask-for-data API is resumeProducing.  This means that a consumer which
is given a pull producer must ask it for data repeatedly until there is
none left.  The consumer is free to do this at its own pace, and a typical
efficient way to do this is to ask for more data each time the application
buffer is empty.
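
To make the pull case concrete, here is a minimal sketch in plain Python.
It does not import Twisted: StringPullProducer, ToyConsumer, chunk_size and
drain are hypothetical names made up for illustration.  Only the method
names (registerProducer, unregisterProducer, write, resumeProducing,
stopProducing) follow Twisted's producer/consumer interfaces.  The drain
loop stands in for the reactor noticing, again and again, that the buffer
is empty.

```python
class StringPullProducer:
    """Hypothetical pull producer: one chunk per resumeProducing call."""

    def __init__(self, data, consumer, chunk_size=4):
        self.data = data
        self.consumer = consumer
        self.chunk_size = chunk_size  # illustrative value, not a Twisted default
        self.offset = 0

    def resumeProducing(self):
        # The consumer is asking for data: hand over exactly one chunk.
        chunk = self.data[self.offset:self.offset + self.chunk_size]
        self.offset += len(chunk)
        if chunk:
            self.consumer.write(chunk)
        if self.offset >= len(self.data):
            # Nothing left, so tell the consumer to stop asking.
            self.consumer.unregisterProducer()

    def stopProducing(self):
        # The consumer is going away: discard whatever has not been produced.
        self.offset = len(self.data)


class ToyConsumer:
    """Stand-in for a transport; this is not Twisted's implementation."""

    def __init__(self):
        self.producer = None
        self.received = []

    def registerProducer(self, producer, streaming):
        assert not streaming, "this toy only handles pull producers"
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.received.append(data)

    def drain(self):
        # Pretend the buffer empties over and over; ask for more each time.
        while self.producer is not None:
            self.producer.resumeProducing()


consumer = ToyConsumer()
producer = StringPullProducer(b"hello, world", consumer)
consumer.registerProducer(producer, streaming=False)
consumer.drain()
```

Note that the producer never acts on its own here.  Every chunk is the
direct result of the consumer calling resumeProducing.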

A push producer produces data all the time, until it is asked to stop.  It
does this at whatever pace it wishes; it might produce a byte each second
or it might produce a chunk of bytes each time a user interacts with a UI
somehow or it might produce whatever it reads out of some socket whenever
it happens to do that.  The consumer is free to ask it to stop at any time
though.  The API for that is pauseProducing, and in this circumstance,
resumeProducing delivers the opposite message: it tells the producer that
it can go back to whatever it was doing.
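
A similar sketch for the push case, again in plain Python without Twisted.
EventPushProducer, eventHappened and ListConsumer are hypothetical names;
eventHappened stands for whatever external event (timer, UI, socket read)
gives the producer something to deliver.  For simplicity this toy queues
data that arrives while it is paused.  A real producer would more likely
stop listening to its source instead.

```python
class EventPushProducer:
    """Hypothetical push producer driven by outside events."""

    def __init__(self, consumer):
        self.consumer = consumer
        self.paused = False
        self.pending = []

    def eventHappened(self, data):
        # Called by the event source, not by the consumer.
        if self.paused:
            # Simplification: hold the data until we are resumed.
            self.pending.append(data)
        else:
            self.consumer.write(data)

    def pauseProducing(self):
        # The consumer wants no more data for a while.
        self.paused = True

    def resumeProducing(self):
        # The consumer says we may go back to what we were doing.
        self.paused = False
        while self.pending and not self.paused:
            self.consumer.write(self.pending.pop(0))

    def stopProducing(self):
        # The consumer is gone for good: stop and drop anything held back.
        self.paused = True
        self.pending = []


class ListConsumer:
    """Stand-in consumer that just records what it is given."""

    def __init__(self):
        self.received = []

    def write(self, data):
        self.received.append(data)


sink = ListConsumer()
producer = EventPushProducer(sink)
producer.eventHappened(b"a")      # delivered immediately
producer.pauseProducing()
producer.eventHappened(b"b")      # held back while paused
held_while_paused = list(sink.received)
producer.resumeProducing()        # the held data is delivered now
```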

Does it make sense why only the push producer case has a pauseProducing
call in it?

>>So that's how you
>>should decide which of these you want to write - if the consumer is the
>>only event source involved, as in the large string case, then you want a
>>pull producer (streaming = False);
>
>How can the consumer be an event source? The producer is the one sending the 
>data, maybe I don't get what you mean by "event source".

For example, if the consumer is a socket, then there are at least two events
which it can generate which are potentially interesting: application-level
buffer empty and application-level buffer full.  These are good indicators
that more data should be produced and that no more data should be produced
(for a while), respectively.
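
Here is a rough sketch of a consumer turning those two events into producer
calls.  It is plain Python and only an approximation of the idea, not
Twisted's transport code.  BufferingConsumer, RecordingProducer, high_water
and flush are hypothetical names, and the limit of 8 bytes is arbitrary.
flush stands in for the socket having accepted everything that was buffered.

```python
class BufferingConsumer:
    """Hypothetical consumer with an application-level buffer."""

    def __init__(self, high_water=8):
        self.high_water = high_water  # arbitrary illustrative limit
        self.buffer = []
        self.buffered = 0
        self.sent = []
        self.producer = None
        self.streaming = False
        self.producer_paused = False

    def registerProducer(self, producer, streaming):
        self.producer = producer
        self.streaming = streaming

    def write(self, data):
        self.buffer.append(data)
        self.buffered += len(data)
        # "Buffer full" event: only a push producer needs to be told to stop.
        if (self.streaming and not self.producer_paused
                and self.buffered > self.high_water):
            self.producer_paused = True
            self.producer.pauseProducing()

    def flush(self):
        # Pretend the socket took everything we had buffered.
        self.sent.extend(self.buffer)
        self.buffer = []
        self.buffered = 0
        # "Buffer empty" event: time to get more data.
        if self.producer is None:
            return
        if self.streaming:
            if self.producer_paused:
                self.producer_paused = False
                self.producer.resumeProducing()
        else:
            # A pull producer is asked for data every time the buffer empties.
            self.producer.resumeProducing()


class RecordingProducer:
    """Stand-in producer that only records which calls it receives."""

    def __init__(self):
        self.calls = []

    def pauseProducing(self):
        self.calls.append("pause")

    def resumeProducing(self):
        self.calls.append("resume")

    def stopProducing(self):
        self.calls.append("stop")


producer = RecordingProducer()
consumer = BufferingConsumer(high_water=8)
consumer.registerProducer(producer, streaming=True)
consumer.write(b"12345")   # 5 bytes buffered, under the limit
consumer.write(b"67890")   # 10 bytes buffered, over the limit: pause
consumer.flush()           # buffer empty: resume
```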

>>if the producer itself is event-driven
>>in its ability to provide data, then you want a push producer.
>>
>
>I thought the push producer worked like this : if the data is too big, send 
>part of it and pause the rest, let the reactor breathe some, and repeat. I 
>thought the pull producer was basically like if no producer was used, one 
>has to take care of any data splitting and send small parts when the 
>consumer is ready. Is this not correct?

It's often the case that a producer doesn't have all of the data it is going
to produce when it is first registered with the consumer.  In these cases,
it is less a matter of splitting up the data and more a matter of knowing
whether to keep trying to gather more data to give to the consumer.  If the
consumer has indicated that it wants no more data (via pauseProducing), then
the producer can chill out for a while.  Only when the consumer issues the
resumeProducing call does the producer need to start getting data again.  For
TCP connections, this is a pretty good reflection of what goes on at a lower
level.  If you stop reading from a TCP socket, the remote side has to stop
sending shortly afterwards.  This is more efficient than letting an unbounded
amount of data pile up in memory.

If you _do_ already have all of the data that is going to be produced (that
is, in-memory and as a Python string or other byte buffer object which can
be used with socket.send), then the only reasons to use a producer are that
some object that you want to give the data to only supports the producer/
consumer API so you have no choice but to use a producer, or that you want
to know when the data has been cleared out of the application-level buffer
(not necessarily sent over the network, and certainly not necessarily
received by the peer, but at least no longer buffered in your userspace
process).  If neither of these applies, you may as well just write the one
string to the transport all at once.  Since you already had all the data
in memory, you already paid the resource allocation penalty, so there's
not really much lost by ignoring P/C.

Hope this helps,

Jean-Paul