[Twisted-web] [Athena] Is ReliableMessageDelivery really necessary?

glyph at divmod.com
Wed Jul 1 18:04:19 EDT 2009


On 10:15 am, spongelavapaul at googlemail.com wrote:
>I've hit a problem as my app has got bigger (about 30-40 widgets now, 
>all chattering roughly once every 2 seconds) where the reliable 
>message delivery mechanism is spiralling out of control. It seems that 
>the constant back and forth means that large 'baskets' of messages are 
>resent. The more this happens, the busier everything gets until the 
>browser becomes unresponsive.

This is unfortunate, but I'm sure it's fixable.  At least, partially. 
Client-server communication, especially in JavaScript, isn't free.

>There's a fix for it: [Divmod-dev] athena duplicate messages issue but 
>I'm slightly concerned about the potential for lost messages - and 
>also confused about how this could happen. Given that HTTP is a 
>reliable connection-oriented transport, where is the gap that messages 
>can fall through?

HTTP is neither reliable nor connection-oriented :).  TCP is reliable 
and connection-oriented, but HTTP builds on top of it to produce 
something which is neither.

"Reliable" in this case doesn't mean that the transport is perfect and 
will deliver everything, but that if you send messages "1, 2, 3", you 
will get messages "1, 2, 3" in that order or you will get nothing at 
all.  (Of course you may also get just "1", or "1, 2", but you will 
never get "3, 1, 2".)
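
To make that guarantee concrete, here is a minimal sketch in Python of a 
receiver that turns sequence-numbered messages into an in-order, 
duplicate-free stream. The class name and the idea of numbering messages 
from 1 are assumptions for illustration only; this is not Athena's actual 
code.

```python
class OrderedReceiver:
    """Hypothetical receiver: delivers numbered messages in order, once each."""

    def __init__(self):
        self.next_seq = 1    # sequence number we expect to deliver next
        self.pending = {}    # out-of-order messages held back, keyed by seq
        self.delivered = []  # what the application has actually seen

    def receive(self, seq, payload):
        # A message we already delivered or already hold is a duplicate
        # (for example, a re-sent one), so it is ignored.
        if seq < self.next_seq or seq in self.pending:
            return
        self.pending[seq] = payload
        # Release every message that is now contiguous with what has
        # already been delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

If "3" arrives first it is held back; the application sees nothing until 
"1" arrives, and it never sees "3, 1, 2".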

Even if HTTP had a way to initiate the delivery of a message over a 
channel that was already busy receiving the response to another message 
(it doesn't) we'd have to contend with the browser APIs for issuing HTTP 
requests, which leave out significant portions of the actual protocol. 
For example, browser JavaScript may never issue more than two concurrent 
requests to the same host, since the HTTP/1.1 spec (RFC 2616) says that's 
all a client should do.

So, what is happening here is that Nevow attempts to implement a 
protocol on top of HTTP, treating each HTTP message as an individual, 
unreliable message 
which may be eaten by beasts like transparent proxies and browser 
runtime bugs, and present to your application a stream of messages which 
are always in order and never dropped.  This is, as it happens, 
*exactly* what Orbited does, and Nevow could potentially be implemented 
on top of Orbited.  However, Nevow's implementation has a bug: it 
overzealously re-delivers messages when, frequently, re-delivery is not 
required.  This is rarely a problem apart from the noise that it 
generates in your log files, unless your message queue starts to back 
up, at which point it creates the performance problems you've noticed.
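
As a rough illustration of why baskets grow, here is a sketch in Python of 
a sender that keeps every unacknowledged message and puts all of them in 
each outgoing request. It assumes a simple cumulative-acknowledgement 
scheme; the class and method names are made up for this example and are 
not taken from Nevow.

```python
class BasketSender:
    """Hypothetical sender: re-sends everything not yet acknowledged."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = []  # (seq, payload) pairs the peer has not confirmed

    def queue(self, payload):
        # Number each message so the receiver can order and de-duplicate.
        self.unacked.append((self.next_seq, payload))
        self.next_seq += 1

    def basket(self):
        # Everything that would go out in the next HTTP request.
        return list(self.unacked)

    def acknowledge(self, ack):
        # The peer confirmed receipt of everything up to and including `ack`.
        self.unacked = [(s, p) for (s, p) in self.unacked if s > ack]
```

If requests keep getting interrupted before an acknowledgement comes back, 
`unacked` never shrinks, so each new basket is larger than the last.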

So, my suggestion to you would be to read through the relevant 
JavaScript code for delivering "baskets" to the server, and try to 
figure out what exactly is happening, and write a patch to correct this 
behavior.  It's not trivial, but it's not rocket science either.  If I 
recall correctly, the problem is that the client will overzealously 
interrupt its own connection to the server where it is sending a basket 
of collected messages, in order to free up the HTTP connection to send a 
*new* message which it has generated.  It would be better if the client 
allowed a brief grace period (and "brief" probably needs to be pretty 
long, in the wild) for the HTTP request to be fully received and 
responded to before piling on more work.
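
The grace-period idea could look something like the following sketch. It 
is written in Python to model the logic only; the real fix would live in 
the client-side JavaScript, and the class, its methods, and the explicit 
`now` argument are all assumptions made for illustration.

```python
class GracefulSender:
    """Hypothetical sender that waits before replacing an in-flight request."""

    def __init__(self, grace_period):
        self.grace_period = grace_period  # seconds to leave a request alone
        self.in_flight_since = None       # start time of current request
        self.waiting = []                 # messages queued behind it
        self.sent_baskets = []            # record of what was sent, in order

    def send(self, message, now):
        self.waiting.append(message)
        self._maybe_flush(now)

    def response_received(self, now):
        # The outstanding request completed, so the connection is free.
        self.in_flight_since = None
        self._maybe_flush(now)

    def tick(self, now):
        # Called periodically so an expired grace period is noticed.
        self._maybe_flush(now)

    def _maybe_flush(self, now):
        busy = self.in_flight_since is not None
        if busy and now - self.in_flight_since < self.grace_period:
            return  # give the outstanding request time to complete
        if self.waiting:
            # A real implementation would also have to re-send whatever
            # the abandoned request had not gotten acknowledged.
            self.sent_baskets.append(self.waiting)
            self.waiting = []
            self.in_flight_since = now
```

Messages generated while a request is outstanding are batched instead of 
interrupting it, and only go out once the response arrives or the grace 
period runs out.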

Part of the problem here, of course, is that the crappy JavaScript 
browser HTTP API won't let us tell how much of our request has been 
uploaded or process the response as it arrives.  So we have to guess 
what a reasonable timeout would be, rather than have the algorithm 
operate on actual data.

In other words, you're right: the messages are not actually disappearing 
into a black hole :).

As far as what you should do: I think you should try to write a patch. 
It's not trivial, but it's not rocket science either: it's just computer 
science.  Hopefully my description of the problem is accurate enough to 
get you started; I'm sure that if you ask for help on this list or on 
IRC as you're working on it, you will find no shortage of it.  Lots of 
people have reported this problem over the years but nobody has (as far 
as I can tell from searching right now) thought to even report the bug 
as a ticket on divmod.org, let alone contribute a fix for it.

>I think I can cope with lost messages in most cases, so would it be 
>useful to add a kind of 'sendRemote' that was like 'callRemote' but 
>didn't care about a response? Or maybe this already exists and I've 
>missed it?

Could you cope with these messages arriving arbitrarily out of order?  I 
am willing to bet not; it would just make your application extremely 
difficult to test, and it would start spewing exceptions when it started 
to get more heavily loaded, rather than making the browser unresponsive.

>P.S. this app is likely to get more noisy - is it likely that I'll 
>have to abandon Athena for Orbited or similar? I mean, are there 
>architectural differences that will prevent Athena scaling?


