[Twisted-Python] Multicast XMLRPC

Sat Aug 26 19:25:40 EDT 2006

glyph at divmod.com wrote:
> On Sat, 26 Aug 2006 10:14:48 -0400, "Chaz." <eprparadocs at gmail.com> wrote:
> 
>> Right now I am trying to find a solution to an interesting problem: 
>> how to find a file without knowing exactly where it exists in the 
>> network. You have to do this to make the system scale nicely.
> 
>> Basically each node holds information about the files (aka objects) it 
>> stores. I do this so that I don't have a central database any where 
>> (this allows the system to scale differently. With a central database 
>> I would have that set of servers scale differently than the storage 
>> nodes).
> 
>> Now I can build a set of machines that are the distributed database 
>> machines - each storing something - and querying them for where the 
>> file lives; this would narrow the machines I have to directly talk to, 
>> but it feels wrong. This is sort of a variation of the hub-and-spoke 
>> that Glyph talked about. But having said that I am trying to determine 
>> if I can get away from that and just go to a very unstructured 
>> environment (without intermediate database nodes).
> 
> This sounds an awful lot like a distributed hashtable.  It does 
> implicitly use an overlay network, but not a hub-and-spoke overlay network.
> 
> I'm not intimately familiar with the algorithms involved, so rather than 
> try to describe them, I'll just refer you to the relatively nice 
> wikipedia page on the topic:
> 
>    http://en.wikipedia.org/wiki/Distributed_hash_table
> 
> There is also a project in Python (not Twisted though) which may serve 
> as an example:
> 
>    http://thecircle.org.au/
> 
> Are these ideas useful?  Have you looked at them before?
> 

As I understand DHT, the concept is to create a hash identifier,
partition it into "chunks", and use the chunks to locate the file. It is
an interesting idea and certainly one approach, and I am keeping it in my
back pocket. Still, there are several reasons I don't like it.

First, with a poorly segmented hash, you can go through a few levels of
indirection before reaching the file. You can see this in a lot of p2p
file sharing systems. I would like to see if I can overcome this
performance penalty. A related problem is that a DHT works well in a
very sparse environment, so the hash keys have to be pretty big, and
bigger keys mean more intermediate nodes.
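
A rough sketch of the chunked-hash idea as described above, to make the
indirection cost concrete. This is not how any particular DHT does it:
the names chunk_key and max_hops, the choice of SHA-1, and the 8-bit
chunk size are all assumptions for illustration. It treats each chunk as
one routing step, so the chunk count is an upper bound on the hops.

```python
import hashlib

def chunk_key(name, chunk_bits=8, key_bits=32):
    """Hash a file name and split the leading key_bits into fixed-size chunks."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    # Keep only the top key_bits of the 160-bit SHA-1 digest.
    key = int.from_bytes(digest, "big") >> (len(digest) * 8 - key_bits)
    n_chunks = key_bits // chunk_bits
    mask = (1 << chunk_bits) - 1
    # Most significant chunk first: the order a lookup would consume them.
    return [(key >> (chunk_bits * (n_chunks - 1 - i))) & mask
            for i in range(n_chunks)]

def max_hops(key_bits, chunk_bits):
    """Worst case in this model: one level of indirection per chunk."""
    return key_bits // chunk_bits

print(chunk_key("report.txt"))
print(max_hops(32, 8), max_hops(160, 8))   # 4 20
```

Under this model a bigger key with the same chunk size means more
chunks, and so more intermediate nodes in the worst case.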

The second issue is one unique to data storage systems: I need to have
multiple copies of the file around. So I had thought that if I do a DHT
I will just keep copies all along the path. That should solve both
problems: quick access and multiple copies.

The third issue - and this one I had more difficulty grasping - is that
once an intermediate node disappears, its contents have to be passed on
to someone else. The link from the prior node to the one going away also
has to be adjusted. What is the problem? It is quite possible that the
node would have millions of files on it, so copying everything at the
moment it leaves is not feasible. That means I have to keep exact copies
at multiple sites at the same time (a set of sites definitely smaller
than the entire space of all the peers).

But the real problem is that in a network of 1000s of machines it is
quite possible that the two I am using to store indices can disappear
at the same time (a small chance, granted, but still a problem). So I
opted to look at another approach, the one that I started talking
about: using broadcast or multicast with some sort of RPC-like mechanism
and a lightweight protocol applied over a lot of machines.
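
A minimal in-memory sketch of that unstructured lookup, with no real
network involved. The Node class, on_query and multicast_lookup are
hypothetical names, and the loop over nodes only stands in for a single
multicast send; the point is that each node consults nothing but its own
local index and only the holders answer.

```python
class Node:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)   # each node only knows what it stores itself

    def on_query(self, filename):
        # Stay silent unless we hold the file, to keep reply traffic small.
        return self.name if filename in self.files else None

def multicast_lookup(nodes, filename):
    """Stand-in for a multicast send: every node sees the query once."""
    replies = (node.on_query(filename) for node in nodes)
    return [r for r in replies if r is not None]

nodes = [Node("n1", ["a.txt"]), Node("n2", ["b.txt", "a.txt"]), Node("n3", [])]
print(multicast_lookup(nodes, "a.txt"))   # ['n1', 'n2']
```

There are no intermediate index nodes here, so losing any one machine
only loses the answers for the files that machine itself held.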

This approach hasn't been well researched; it is almost always dismissed
out of hand. I decided it was at least worth investigating. It solves
some problems, like scalability and easy management. The downside is
that I have to worry about building a lightweight protocol and handling
RPC with AT LEAST ONCE semantics instead of EXACTLY ONCE.
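
One common way to live with AT LEAST ONCE delivery is to tag each
request with an id and have the receiver remember the ids it has already
handled, so a retransmitted request is answered from a cache instead of
being executed again. The sketch below assumes that technique; every
name in it (Responder, call_at_least_once, make_lossy_send) is
hypothetical, and the lossy send function simulates dropped replies
rather than using a real transport.

```python
import uuid

class Responder:
    """Handles requests that may arrive more than once."""
    def __init__(self):
        self.seen = {}        # request id -> cached reply
        self.executions = 0   # how many times real work was done

    def handle(self, request_id, filename):
        if request_id in self.seen:
            return self.seen[request_id]   # duplicate: replay the old reply
        self.executions += 1
        reply = "have:" + filename
        self.seen[request_id] = reply
        return reply

def call_at_least_once(send, filename, max_tries=5):
    """Resend the same request id until a reply arrives or tries run out."""
    request_id = uuid.uuid4().hex
    for _ in range(max_tries):
        reply = send(request_id, filename)
        if reply is not None:
            return reply
    return None

def make_lossy_send(responder, drop_first=2):
    """Deliver every request but lose the first drop_first replies."""
    state = {"dropped": 0}
    def send(request_id, filename):
        reply = responder.handle(request_id, filename)
        if state["dropped"] < drop_first:
            state["dropped"] += 1
            return None
        return reply
    return send

responder = Responder()
reply = call_at_least_once(make_lossy_send(responder), "a.txt")
print(reply, responder.executions)   # have:a.txt 1
```

The request reaches the responder three times but the work is done only
once. A real version would also need to expire old entries from the
cache, which this sketch does not attempt.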

Glyph, thanks for the references. I will definitely look up the
'thecircle' stuff. That one I didn't know about!

Peace,
Chaz