[Twisted-Python] Question on push/pull producers inter-workings, was : "Is there a simple Producer/Consumer example or tutorial?"

Mon Apr 21 12:06:58 EDT 2008

Jean-Paul Calderone wrote:
> On Mon, 21 Apr 2008 09:52:56 +0200, Gabriel Rossetti 
> <mailing_lists at evotex.ch> wrote:
>> Jean-Paul Calderone wrote:
>>> On Fri, 18 Apr 2008 09:57:35 +0200, Gabriel Rossetti 
>> [snip]
>> What exactly do you mean by :
>>
>> "there's no events which will signal that you can send some more of 
>> the string *except* for
>> the reactor deciding that it is ready for some more"?
>>
>> When I looked at Twisted's code, the difference that I saw was that 
>> if a push producer is used, and if the data to be sent is bigger than 
>> a certain length, it calls producer.pauseProducing()
>
> This is true.  Let's back up for a moment, though.
>
> A pull producer is one which only produces data when it is asked for
> data.  The ask-for-data API is resumeProducing.  This means that a
> consumer which is given a pull producer must ask it for data repeatedly
> until there is none left.  The consumer is free to do this at its own
> pace, and a typical efficient way to do this is to ask for more data
> each time the application buffer is empty.
>
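The pull-producer contract described above can be sketched without Twisted
installed. This is a toy illustration only: the method names
(registerProducer, unregisterProducer, write, resumeProducing,
stopProducing) follow Twisted's producer/consumer interfaces, but
ChunkPullProducer, ToyConsumer and drain() are hypothetical names, and
drain() merely stands in for the reactor noticing that the buffer is empty.

```python
class ChunkPullProducer:
    """Pull producer: writes one chunk each time resumeProducing is called."""

    def __init__(self, data, consumer, chunk_size=4):
        self.data = data
        self.consumer = consumer
        self.chunk_size = chunk_size
        self.offset = 0

    def resumeProducing(self):
        # The consumer asked for data: hand over the next chunk, if any.
        chunk = self.data[self.offset:self.offset + self.chunk_size]
        self.offset += len(chunk)
        if chunk:
            self.consumer.write(chunk)
        # Nothing left to produce: tell the consumer we are done.
        if self.offset >= len(self.data):
            self.consumer.unregisterProducer()

    def stopProducing(self):
        # The consumer will never ask again; forget the remaining data.
        self.offset = len(self.data)


class ToyConsumer:
    """Hypothetical stand-in for a transport (not Twisted's implementation)."""

    def __init__(self):
        self.producer = None
        self.received = []

    def registerProducer(self, producer, streaming):
        assert not streaming, "this toy consumer only drives pull producers"
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.received.append(data)

    def drain(self):
        # Stand-in for repeated "buffer is empty" events: keep asking the
        # producer for more until it unregisters itself.
        while self.producer is not None:
            self.producer.resumeProducing()
```

Note that the producer never decides on its own to write anything; every
write is a direct response to a resumeProducing call from the consumer.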
> A push producer produces data all the time, until it is asked to stop.
> It does this at whatever pace it wishes; it might produce a byte each
> second, or it might produce a chunk of bytes each time a user interacts
> with a UI somehow, or it might produce whatever it reads out of some
> socket whenever it happens to do that.  The consumer is free to ask it
> to stop at any time, though.  The API for that is pauseProducing, and in
> this circumstance, resumeProducing delivers the opposite message: it
> tells the producer that it can go back to whatever it was doing.
>
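For contrast, here is a minimal sketch of the push-producer side, again in
plain Python rather than real Twisted code. The method names mirror
Twisted's push-producer interface, while TickPushProducer, tick(),
BoundedConsumer, its limit parameter and flush() are assumptions made for
illustration. tick() plays the role of whatever outside event makes the
producer generate data.

```python
class TickPushProducer:
    """Push producer: produces on its own events unless it has been paused."""

    def __init__(self, consumer):
        self.consumer = consumer
        self.paused = False
        self.count = 0

    def tick(self):
        # Hypothetical external event (timer, UI action, socket read).
        if self.paused:
            return
        self.count += 1
        self.consumer.write(b"tick %d\n" % self.count)

    def pauseProducing(self):
        self.paused = True

    def resumeProducing(self):
        self.paused = False

    def stopProducing(self):
        self.paused = True


class BoundedConsumer:
    """Hypothetical consumer that pauses its producer when its buffer is full."""

    def __init__(self, limit):
        self.limit = limit
        self.buffer = []
        self.producer = None

    def registerProducer(self, producer, streaming):
        assert streaming, "this toy consumer only handles push producers"
        self.producer = producer

    def write(self, data):
        self.buffer.append(data)
        # "Buffer full" event: ask the producer to stop for a while.
        if sum(len(d) for d in self.buffer) > self.limit:
            self.producer.pauseProducing()

    def flush(self):
        # "Buffer empty" event: hand the data on and let the producer resume.
        data = b"".join(self.buffer)
        self.buffer = []
        self.producer.resumeProducing()
        return data
```

Here the producer drives the conversation; the consumer only ever says
"stop" (pauseProducing) or "carry on" (resumeProducing).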
> Does it make sense why only the push producer case has a pauseProducing
> call in it?
yes
>
>>> So that's how you
>>> should decide which of these you want to write - if the consumer is the
>>> only event source involved, as in the large string case, then you want
>>> a pull producer (streaming = False);
>>
>> How can the consumer be an event source? The producer is the one 
>> sending the data, maybe I don't get what you mean by "event source".
>
> For example, if the consumer is a socket, then there are at least two
> events which it can generate which are potentially interesting:
> application-level buffer empty and application-level buffer full.  These
> are good indicators that more data should be produced and that no more
> data should be produced (for a while), respectively.
>
>>> if the producer itself is event-driven
>>> in its ability to provide data, then you want a push producer.
>>>
>>
>> I thought the push producer worked like this: if the data is too
>> big, send part of it and pause the rest, let the reactor breathe some,
>> and repeat. I thought the pull producer was basically as if no
>> producer were used: one has to take care of any data splitting and
>> send small parts when the consumer is ready. Is this not correct?
>
> It's often the case that a producer doesn't have all of the data it is
> going to produce when it is first registered with the consumer.  In these
> cases, it is less a matter of splitting up the data and more a matter of
> knowing whether to keep trying to gather more data to give to the
> consumer.  If the consumer has indicated that it wants no more data (via
> pauseProducing), then the producer can chill out for a while.  Only when
> the consumer issues the resumeProducing call does the producer need to
> start getting data again.  For TCP connections, this is a pretty good
> reflection of what goes on at a lower level.  If you stop reading from a
> TCP socket, the remote side has to stop sending shortly afterwards.  This
> is more efficient than letting an unbounded amount of data pile up in
> memory.
>
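The memory argument above can be made concrete with a toy simulation. It
is not Twisted code and models no real socket: peak_buffer and all of its
parameters are hypothetical, chosen only so that the producer is faster
than the consumer. It compares the peak amount of buffered data when the
producer honors pauseProducing with the peak when it ignores it.

```python
def peak_buffer(honor_pause, ticks=100, chunk=10, drain_every=5, limit=30):
    """Simulate a fast producer feeding a slow consumer.

    Returns the largest number of bytes ever held in the consumer's buffer.
    """
    buffered = 0
    peak = 0
    paused = False
    for t in range(1, ticks + 1):
        # The producer generates one chunk per tick unless it is paused
        # and actually respects the pause request.
        if not (honor_pause and paused):
            buffered += chunk
            peak = max(peak, buffered)
            if buffered >= limit:
                paused = True  # consumer would call pauseProducing here
        # The slow consumer only gets rid of one chunk every few ticks.
        if t % drain_every == 0:
            buffered = max(0, buffered - chunk)
            if buffered == 0:
                paused = False  # consumer would call resumeProducing here
    return peak
```

Under these assumed numbers, peak_buffer(True) stays at the configured
limit, while peak_buffer(False) keeps growing with the number of ticks.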
> If you _do_ already have all of the data that is going to be produced
> (that is, in-memory and as a Python string or other byte buffer object
> which can be used with socket.send), then the only reasons to use a
> producer are that some object that you want to give the data to only
> supports the producer/consumer API so you have no choice but to use a
> producer, or that you want to know when the data has been cleared out of
> the application-level buffer (not necessarily sent over the network, and
> certainly not necessarily received by the peer, but at least no longer
> buffered in your userspace process).  If neither of these applies, you
> may as well just write the one string to the transport all at once.
> Since you already had all the data in memory, you already paid the
> resource allocation penalty, so there's not really much lost by
> ignoring P/C.
>
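The second reason given above (learning when the data has left the
application-level buffer) might be sketched as follows. This is an
illustration under stated assumptions, not Twisted's implementation:
NotifyWhenDrained, ToyTransport, its flush() method and the on_drained
callback are hypothetical names. The toy transport asks a newly registered
pull producer for data right away and asks again each time its buffer has
been emptied.

```python
class NotifyWhenDrained:
    """Pull producer that writes everything at once, then treats the next
    request for data as "the consumer's buffer has been emptied"."""

    def __init__(self, data, consumer, on_drained):
        self.data = data
        self.consumer = consumer
        self.on_drained = on_drained

    def resumeProducing(self):
        if self.data is not None:
            # First request: hand over the whole string in one write.
            data, self.data = self.data, None
            self.consumer.write(data)
        else:
            # Second request: the buffer emptied, so we are finished.
            self.consumer.unregisterProducer()
            self.on_drained()

    def stopProducing(self):
        # The consumer went away: drop the data.
        self.data = None


class ToyTransport:
    """Hypothetical stand-in for a transport with a userspace buffer."""

    def __init__(self):
        self.producer = None
        self.buffer = []
        self.sent = []

    def registerProducer(self, producer, streaming):
        self.producer = producer
        if not streaming:
            producer.resumeProducing()  # ask a pull producer for data now

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.buffer.append(data)

    def flush(self):
        # Pretend the socket accepted everything that was buffered.
        self.sent.extend(self.buffer)
        self.buffer = []
        if self.producer is not None:
            self.producer.resumeProducing()
```

As the quoted text stresses, the callback only means the data is no longer
buffered in this process, not that the peer has received it.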
> Hope this helps,
>
> Jean-Paul
>
Thank you Jean-Paul, yes it helps a lot.

In my application, I send XML strings through a server, and some may have
rather large data embedded in them, so the idea behind using the
producer/consumer paradigm was to avoid congesting the server, as it acts
like a proxy, if you wish. I thought that if I did that, then other clients
could send data through it while the producer pauses. The server and the
clients are both using server factories (see
http://twistedmatrix.com/pipermail/twisted-python/2008-February/016879.html).
Since the client-to-client communication isn't direct, the server needs
to be able to connect to the end/destination client. To send data, I use
single-use clients, as described in the Twisted documentation. In this
case, my producer was supposed to be the single-use client and the
consumer the transport (TCP/IP) of the server factory's protocol instance
(whether that is the server or one of the clients).

I guess the problem is that, like you said, I already have all the data
in the source client and thus there is no need to use the p/c paradigm.
I must ask, though: when I do transfer large amounts of data, if I
understood correctly, the reactor is busy doing that, and thus no other
clients can send data until it is done, correct? How must one correctly
deal with this problem? What happens to the other clients' data that
they try to send?

Thank you,
Gabriel