[Twisted-Python] Multicast XMLRPC

Phil Mayers p.mayers at imperial.ac.uk
Sat Aug 26 07:53:13 MDT 2006


Chaz. wrote:

> 
> I started out using Spread some time ago (more than 2 years ago). The 
> implementation was limited to a hundred or so nodes (that is in the 
> notes on the Spread implementation). Secondly, it isn't quite so 
> lightweight as you think (I've measured the performance).
> 
> It is a very nice system but when it gets to 1000s of machines very 
> little work has been done on solving many of the problems. My research 
> on it goes back almost a decade starting out with Horus.

I must admit to not having attempted to scale it that far, but I was 
under the impression that only the more expensive delivery modes were 
that costly. But by the sounds of it, you don't need me to tell you that.

> 
> Actually I am part of the IRTF group on P2P, E2E and SAM. I know the 
> approaches that are being tossed about. I have tried to implement some 
> of them. I just am not of the opinion that smart people can't find 
> solutions to tough problems.


Ok, in which case my apologies. My reading of your posts had led me to 
believe, incorrectly, that you might not be familiar with the various 
issues. In that case, you can (and should) disregard most of it.

> 
> Is multicast or broadcast the right way? I don't know, but I do know 
> that without trying we will never know. Having been part of the IETF 

It's clearly right for some things - I'm just not sure how much 
bi-directional distribution would be helped by it, since at some point 
you've got to get the replies back.

> community for a lot of years (I was part of the group that worked on 
> SNMP v1 and the WinSock standard), I know that when the "pedal meets the 
> metal" sometimes you discover interesting things.

I didn't realise WinSock went near the IETF. You learn something new 
every day.

>>
>> Knuth and his comments on early optimisation apply here. Have you 
>> tried it? You might be surprised.
>>
> 
> I am sorry to say I don't know the paper or research you are referring 
> to. Can you point me to some references?

Sorry, it's a phrase from Donald Knuth's (excellent) three-volume 
programming book, "The Art of Computer Programming". Highly recommended.

> 
> Thanks for the information. This is what makes me think that I want 
> something based on UDP and not TCP! And if I can do RMT (or some variant 
> of it) I might be able to get better performance. But, as I said it is 
> the nice thing about not having someone telling me I need to get a 
> product out the door tomorrow! I have time to experiment and learn.

When I wrote my reply I hadn't seen your comment on the app being 
distributed storage.

>> How many calls per second are you doing, and approximately what volume 
>> of data will each call exchange?
>>
> This is information I can't provide, since the system I have been 
> designing has no equivalent in the marketplace today (either commercial 
> or open source). All I know is that the first version of the system I 
> built - using C/C++ and a traditional architecture (a few dozen 
> machines) - was able to handle 200 transactions/minute (using SOAP). 
> While there were some "short messages" (less than a normal MTU), I had 
> quite a few that topped out at 50K bytes and some up to 100 Mbytes.

Oh. In which case more or less everything I wrote is useless!

>>
>> Successful transmission is really the easy bit for multicast. There 
>> are IGMP snooping problems, IGMP querier misbehaviour, loss of 
>> forwarding on an upstream IGP flap, flooding due to global MSDP 
>> issues, and so forth.
>>
> 
> I agree about the successful transmission. You've lost me on the IGMP 
> part. Can you elaborate as to your thoughts?

Well, my experience of large multicast IPv4 networks is that short 
interruptions in multicast connectivity are not uncommon. There are a 
number of reasons for this, which can be broadly broken down into 
first-hop and subsequent-hop issues.

Basically, in a routed-multicast environment, I've seen the subnet IGMP 
querier (normally the gateway) get pre-empted by badly configured or 
plain broken OS stacks (e.g. someone running Linux with the early IGMPv3 
patches). I've also seen confusion on highly-available subnets (e.g. 
VRRPed networks) where the IGMP querier and the multicast designated 
forwarder are different; this can cause problems with IGMP snooping on 
the downstream layer-2 switches when the DF is no longer on the path 
that the layer-2 snooping builds.

You also get issues with upstream changes in the unicast routing 
topology affecting PIM.

Most of these are only issues with routed multicast. Subnet-local is a 
lot simpler, though you do still need an IGMP querier and switches with 
IGMP snooping.
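
For what it's worth, subnet-local multicast is also the easy case to 
experiment with from Twisted. A minimal sketch - the group address and 
port are arbitrary examples, the rest is the normal Twisted multicast 
API - would look something like this:

    from twisted.internet import reactor
    from twisted.internet.protocol import DatagramProtocol

    class SubnetLocal(DatagramProtocol):
        # Join a multicast group and print anything heard on it.
        group = "228.0.0.5"     # arbitrary example group
        port = 8005             # arbitrary example port

        def startProtocol(self):
            # TTL 1 keeps datagrams on the local subnet, so the routed
            # multicast (PIM/MSDP) issues above don't apply; you still
            # depend on an IGMP querier and sane IGMP snooping.
            self.transport.setTTL(1)
            self.transport.joinGroup(self.group)
            self.transport.write(b"hello", (self.group, self.port))

        def datagramReceived(self, datagram, address):
            print("received %r from %s" % (datagram, address))

    reactor.listenMulticast(SubnetLocal.port, SubnetLocal(),
                            listenMultiple=True)
    reactor.run()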

> 
> I do know I need EXACTLY-ONCE semantics, but how and where I implement 
> them is the unknown. When you use TCP you assume the network provides 
> the bulk of the solution. I have been thinking that if I use a less 
> reliable network - one with low overhead - I can provide the server 
> part to do the EXACTLY-ONCE piece.
> 
> As to why I need EXACTLY-ONCE, well if I have to store something I know 
> I absolutely need to store it. I can't be in the position that I don't 
> know it has been stored - it must be there.
> 
> Thanks for the great remarks....I look forward to reading more.

This makes a lot more sense now that I know it's storage-related.

You're right, this is a tricky and uncommon problem.
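
On the exactly-once point: one common way to get exactly-once *effects* 
on top of an at-least-once transport (retransmits, duplicated datagrams) 
is to tag every write with a client-chosen unique ID and deduplicate at 
the receiving node. A rough sketch of the idea - all of the names here 
are made up, and a real version would have to persist and expire the ID 
table:

    class DedupingWriter(object):
        """Apply each uniquely-identified write exactly once."""

        def __init__(self, store):
            self.store = store
            self.results = {}   # msg_id -> result of the original attempt

        def handle_write(self, msg_id, payload):
            if msg_id in self.results:
                # Duplicate delivery: don't re-apply, just replay the reply.
                return self.results[msg_id]
            result = self.store.apply(payload)   # hypothetical store API
            self.results[msg_id] = result
            return result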

Let me see if I've got this right:

You're building some kind of distributed storage service. Clients will 
access the storage via a "normal" protocol to one of the nodes. Reads 
from the store are relatively easy, but writes to the store will need to 
be distributed to all or a subset of the nodes. Obviously you'll have a 
mix of lots of small writes and some very large writes.

Hmm.

Are you envisioning that you might have more than one storage set on the 
nodes, and that you'd use a different multicast group per storage set to 
build optimal distribution?

You might be able to perform some tricks depending on whether this 
service provides block- or filesystem-level semantics. If it's the 
latter, you could import some techniques from the distributed version 
control arena - broadly speaking, tag each file with a node + version 
number, "broadcast" (in the application sense) just the file name plus 
the new node + new version to the other store nodes, and have them lock 
their local copy and initiate a pull from the updated node.
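
Very roughly, that announce-then-pull idea might look like this on a 
receiving node. Everything here - the index, puller and lock - is 
hypothetical, just to show the shape of it:

    def handle_announcement(path, origin_node, version, index, puller):
        # 'index' tracks (path -> version) locally; 'puller' fetches file
        # contents from a named node over unicast.  Only the small
        # announcement travels over multicast; the bulk data is pulled.
        current = index.version_of(path)
        if current is not None and current >= version:
            return  # already have this version (or newer); ignore
        with index.lock(path):
            data = puller.fetch(origin_node, path, version)
            index.store(path, version, data)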

For block-level storage, that's going to be a lot harder.

For the multicast, something like NORM, which as you probably know is 
basically a forward-error-corrected transmit channel with 
receiver-triggered re-transmits, would probably work. An implementation 
would likely be non-trivial, but it would be a fascinating project.
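
Just to illustrate the receiver-driven half of that: the core of a 
NORM-style scheme is spotting gaps in the sequence space and asking the 
sender to repair them. A heavily stripped-down sketch (all names are 
made up; real NORM adds FEC parity blocks, NACK suppression timers, flow 
control and more):

    class RepairingReceiver(object):
        def __init__(self, send_nack, deliver):
            self.expected = 0
            self.send_nack = send_nack   # callable(seq): ask sender to resend
            self.deliver = deliver       # callable(seq, data): hand to app

        def packet_received(self, seq, data):
            if seq > self.expected:
                # Gap detected - NACK every missing sequence number.
                for missing in range(self.expected, seq):
                    self.send_nack(missing)
            if seq >= self.expected:
                self.expected = seq + 1
            self.deliver(seq, data)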
