[Twisted-Python] Multicast XMLRPC

Fri Aug 25 18:59:23 EDT 2006

Please forgive the top post...I just felt it better this time since the 
discussion has become one of heartbeat.

Well if you look at things like the linux clustering software the 
approach they take is brain dead - each machine pings all the others. If 
you had 2 machines, you would have 2 pings per cycle. With 3 machines, 
you would have 6 and so on.

If you do put an overlay network on top of the physical topology you 
would definitely have something different. And a hub-and-spoke layout 
would give you much more than 2N.

What I did was combine two approaches into a single mechanism. I use 
"gossip" to pass around the state of system as I know it (actually the 
changes in state). I use a probabilistic approach to find the machine to 
poll - I pick one randomly. If that machine doesn't answer I pick P 
other machines asking them to poll the original machine and tell me what 
  they found (with the idea it might be congestion between me and the 
original machine). I find this approach converges to the true state of 
the system within a few polling cycles.

Why do I need to know the state of all the machines? Actually the system 
does "self-repair" and "self-healing". When a node goes down and comes 
back up each node will check the information it knows about the node. 
Some nodes will recognize that the machine that just came back has to 
hold certain data, and tell it. That's the 5 cent answer.

Chaz.

glyph at divmod.com wrote:
> On Fri, 25 Aug 2006 16:33:52 -0400, "Chaz." <eprparadocs at gmail.com> wrote:
>> glyph at divmod.com wrote:
> 
>>> I'm still not really clear on what the application is.
>>
>> The application is a massively scalable data storage system.
> 
> Okay!  Now I know what you're getting at :).
> 
>> This is not a "home application" but an enterprise and/or SSP 
>> application. Most likely it sits behind a firewall and if remote 
>> offices need to access it, they will get access via VPN portals.
>>
>> I guess this is sort of a topology! Doh. All I know is that I have 
>> multicast and with some effort broadcast support.
> 
> OK, that makes more sense.
> 
>>> At any rate, producing a functioning multiunicast prototype with, 
>>> e.g. PB, would be the easiest way to get started if you need to fall 
>>> back to that sort of topology anyway in the case where a multicast 
>>> solution doesn't work.  Then you can collect data and determine how 
>>> much bandwidth is going to be saved in a realistic scenario...
>>
>> So I can use PB with multicast support? How would I deal with all the 
>> target machines getting responses back>
> 
> My point here was really not anything about PB specifically.  Using PB 
> with multicast would require some tricks; you'd have to have a different 
> Broker implementation, probably, and a datagram-based API.  You could 
> still use the underlying message serialization format though.
> 
>> I had thought of the hub-and-spoke model and I designed the system 
>> that way, originally. But I have to respond to instantaneous demands 
>> which caused me to change the design of the system. Each of the 
>> servers can run as both servers (providing a service to a client app) 
>> and an end point (providing storage features). So a hub-and-spoke 
>> architecture are really out of the picture for me (at least I can't 
>> see an easy way).
> 
> I don't see that it's out of the picture - your network topology allows 
> you to fairly effortlessly connect between machines (no need for NAT 
> traversal or "home servers" or any of that garbage: just give an IP on 
> the intranet) - just include the "hub" and "spoke" code in the same 
> process, and then any process can act as a hub... dynamic load-balancing 
> is never easy, but it is certainly a possibility.
> 
>> I could probably do a self-organizing overlay network on top of the 
>> machines taking advantage of how they are connected together (the real 
>> physical topology) but even that presents me with an issue: I want the 
>> system to sort of be self-configuring. As such I don't have a way to 
>> auto-detect connection speeds.
> 
> You can detect connection speeds on the fly; just start doing some work, 
> gather statistics on each connection, and reconfigure if it's not going 
> fast enough.  No need for clock synchronization.
> 
>>>> Today the underlying transport and tomorrow the marshallings.
>>>
>>> It still seems to me like this is backwards.
>>>
>>> The application can be complete, end-to-end, if you start marshalling 
>>> data and sending it over a simplistic (but possibly too-expensive) 
>>> mechanism. (...)
>> That is certainly one way. I tend to think all my hard problems are 
>> going to be transport issues and work up the stack. I have had a share 
>> of algorithm issues too; nothing is quite obvious when you have a 1000 
>> or 10,000 machines to deal with!
> 
> Working up the stack is difficult because you can't measure the working 
> system at any point to decide what you need to optimize.  I prefer to 
> work downwards.  If your highest level of code can remain unchanged 
> while you refactor the underlying layers, then you can run the same 
> tests for the same high-level code with different underlying layers to 
> get an idea of their relative performance.  If you start optimizing at 
> the bottom of the stack before the top is done, then you can easily end 
> up with something which is optimized in the wrong direction, and which 
> requires rewriting when the top layer is done anyway.
> 
> I guess this doesn't really have much bearing on your other questions 
> though.
> 
>>>> Now let me address the issue of TCP. It is a pretty heavy protocol 
>>>> to use. 
> 
>>> I still don't know what you mean by "resources", and as compared to 
>>> what. 
> 
>> By resources I mean memory and time. Granted on a 1GB system with 3 GB 
>> of virtual, memory isn't a big deal, most of the times. But I have 
>> seen memory leaks kill this sucker more times than I care to recall. 
>> Once I ran the application for a few days and saw all my swap being 
>> used! It was very subtle memory leak in one of the libraries (in fact 
>> one library leak consumed 584M in less than one hour!).
> 
> I notice you don't specifically refer to features of TCP here, but 
> instead of the perils of writing any software at all in C/C++ :).  Of 
> course, Python can have memory leaks, but I wouldn't base your 
> architecture around bugs in libraries which will hopefully be 
> unnecessary in the future :).
> 
>> Yes, I completely forgot that I would see them all in parallel. I tend 
>> to overlook Twisted's state machine architecture when I think of 
>> solutions. I am getting better but not quite there yet...
> 
> It might not solve your problem.  But Twisted may be doing quite a lot 
> more work in "parallel" than you're used to.  I can't really say, but 
> I'd be curious to hear about it if you measure it.
> 
>>> (Threads are bad)
>> Yes...see my mea culpa above....it is hard to stop thinking in terms 
>> of threads and processes!
> 
> Yeah, it took me a while to get out of that habit when I started writing 
> Twisted in the first place :).  (The thing that preceded it was a 
> blocking, multithreaded abomination.)
> 
>> Do the games use TCP or UDP? I would have thought they save state 
>> about each of the players in the server and use UDP for message 
>> passing. I thought that was part of the reason most game developers 
>> where interested in STUN?
> 
> They ... vary.  A general rule of thumb is that they use TCP (or 
> something like it) for control messages and data transfer, and then an 
> *unreliable* most-recent-first UDP protocol for transmitting information 
> about physical position, orientation and movement.  Game protocols are 
> incredibly involved because they're typically communicating information 
> about a dozen systems at once.  Game performance is different than 
> typical application performance because quite often you only care about 
> the most recent state of something, and you can happily throw away any 
> old messages.
> 
> The games that are interested in STUN are not MMPs; the reason they are 
> using it is to establish P2P connections so that players don't have to 
> receive their updates from a central server, and you don't need to 
> configure your firewall to play.
> 
>> Bandwidth is a very important issue in this system. No one would run 
>> this on their network if it could bring down their network (or congest 
>> it so badly ...the old packet-storm issue).
> 
> This is another good reason to use TCP.  There are congestion control 
> mechanisms for TCP; you would have to implement something yourself for UDP.
> 
>> As I mentioned earlier, if I was to do normal heartbeat messages with 
>> N machines, I have N! messages moving around. So long as N is small - 
>> a few hundred machines (and I have built machines in the telecom world 
>> that have had 100 machines), the load on the network is reasonable. 
>> But once you have 1000 machines, you have a 1,000,000 messages flying 
>> around. Through the work I did I got the number down to 2,000! And 
>> over 15 seconds, that isn't too bad.
> 
> Why N! messages?  Using a naive hub-and-spoke model it seems like it 
> would just be 2N.  It's only if every node needs to know about every 
> other node that you get up to N!... why would you need that?
> 
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python at twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
>