[Twisted-Python] Advice Request: Under what circumstances should I use AMP's Command Response field?

Burak Nehbit burak at nehbit.net
Wed Aug 21 12:09:25 MDT 2013


(Only the latter part of this post is related to Twisted. I apologise in advance for the former; you can skip directly to the second half for the gist.)

Hi Laurens,

> I think you're doing fine. Distributed systems are just kind of hard :-)

This is encouraging to hear!

> It sounds like you're fundamentally building an eventually consistent distributed database. We have a few of those already: it might be significantly less work to just use one of them. I suppose it depends why you're trying to make it distributed. 

I have done some superficial research, and so far it points to implementing my own solution. Or rather, what I am producing is not a full-fledged eventually consistent database at all; it's a loosely knit bunch of people. There is no 'database' in the classical sense.

> Is this about reliability in the face of e.g. hardware failure, or is this about being able to disseminate data when someone tries to stop you from doing so? Are you trying to protect against byzantine failures too?

It's about the right to free speech, so I would say the latter. I'm from Turkey and currently living in the United States, so while I am protected under the First Amendment, my compatriots are not. This is significant. As you might have heard, two months ago we had a wave of peaceful protests against the demolition of Gezi Park, the last green space in the heart of Istanbul, which was to be turned into a shopping mall. The riot police attacked unarmed, stationary people with tear gas canisters, which led to a series of much larger protests (a few million people at their height) and an international outcry. Thousands were arrested, five people died, and the state apparatus is pressing charges against everyone from high schoolers to suburban moms. Turkey is not a banana republic, and that is what made all this so shocking: it is the 15th largest economy in the world, larger than South Korea in terms of GDP, and in the process of joining the European Union. 

The reason I'm telling you this is what happened afterwards. After EU and US pressure forced the government to stop the police violence, it quietly began a witch-hunt against Twitter and Facebook users and bloggers it deemed to be 'promoting armed insurgency against the state', and around 20-25 people are currently in detention awaiting trial. Of course, Twitter and Facebook do not give the Turkish government any IP addresses: the authorities just look at people's profiles and grab the most likely person with the same name. Our very broken legal system allows for up to five years (!) of contempt of court without any charges being pressed, something the Islamist government really loves to use against its people. 

I am building a tool that allows people to express their opinions without necessarily revealing their identities. It's called Aether: a distributed network whose users (all anonymous and unregistered by default) can exercise their right to free speech without being endangered by state violence. Everything within it is public, and everything posted on Aether is in the public domain. (And please excuse the holier-than-thou copywriting on the webpage; this was my thesis project, and that was one of the requirements.)

The backend process of this application runs entirely on Twisted. The business rules are simple. Consider Alice the local node, Bob a remote node, and Carol another remote node. When Bob connects to Alice and gets the list of posts Alice has, he requests the posts he does not have. The posts Alice makes publicly available are the ones she has either a) created or b) upvoted. If Bob at some point also likes a post he got from Alice, he also starts distributing that post publicly. From then on, it is impossible to tell whether Alice or Bob created the post; Alice might just as well have gotten it from a third party Bob is not aware of. Being distributed from two nodes, the post now has a higher chance of being found by Carol. Extrapolate this to a thousand people who have all upvoted the post, and determining its origin becomes practically impossible. 'Liking' something is exactly the same act as sharing it, as is 'creating' it; there is absolutely no difference, and every node only knows the IP address of the last link in the chain.

The nodes simply count how many times they encounter the same post digest to determine how many upvotes the post has gotten, and use that count to order the posts the app shows its user. Other than that, there is no global database and no global state, and no one in the network is aware of all the rest; I just strive to distribute the maximum amount of popular data to the maximum number of people possible. The client application pieces all this information together into a coherent whole of topics, subjects and posts. Aether also distributes user addresses so that people can find new nodes to connect to.
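The sharing and counting rules can be sketched in memory, with the networking left out. This is a hypothetical model, not Aether's code: the `Node` class and `post_digest` helper are made-up names, posts are identified by BLAKE2b digests (the hash Laurens mentions further down), and an upvote is counted once per distinct peer advertising a digest, which is only one possible reading of "count how many times they encounter the same post digest". `seen_from` records only the immediate peer, since a node never sees past the previous hop.

```python
import hashlib
from typing import Dict, List, Set


def post_digest(body: str) -> str:
    """Identify a post by a 64-byte BLAKE2b digest of its body, hex encoded."""
    return hashlib.blake2b(body.encode("utf-8"), digest_size=64).hexdigest()


class Node:
    def __init__(self, name: str):
        self.name = name                          # stand-in for the peer's address
        self.store: Dict[str, str] = {}           # every post held locally
        self.published: Set[str] = set()          # created or upvoted: redistributed
        self.seen_from: Dict[str, Set[str]] = {}  # digest -> peers advertising it

    def create(self, body: str) -> str:
        digest = post_digest(body)
        self.store[digest] = body
        self.published.add(digest)  # to peers, creating looks exactly like upvoting
        return digest

    def upvote(self, digest: str) -> None:
        if digest in self.store:
            self.published.add(digest)

    def advertise(self) -> List[str]:
        """The digest list a peer receives on connect."""
        return sorted(self.published)

    def sync_from(self, peer: "Node") -> List[str]:
        """Fetch the published posts we lack from `peer`; return their digests."""
        fetched = []
        for digest in peer.advertise():
            self.seen_from.setdefault(digest, set()).add(peer.name)
            if digest not in self.store:
                self.store[digest] = peer.store[digest]
                fetched.append(digest)
        return fetched

    def upvotes(self, digest: str) -> int:
        return len(self.seen_from.get(digest, ()))

    def ranking(self) -> List[str]:
        """Most-advertised posts first; ties broken by digest for stability."""
        return sorted(self.seen_from, key=lambda d: (-self.upvotes(d), d))
```

Note that Bob storing a post does not make him a distributor of it; only `upvote` (or `create`) adds it to what he advertises, which is what keeps "liked" and "created" indistinguishable from the outside.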

There are also other details, such as cryptography to defeat a global passive adversary, business rules to detect abuse and hide it from the local user, and many other things. This is a large project I have been working on alone for a very long time, so some parts of it are rightly esoteric. And I have already been off-topic for far longer than is acceptable. The entire local application is finished, and the only remaining part is networking, which is why I am trying so hard to figure out Twisted. Here are a few screenshots for your perusal: Image 1, Image 2, Image 3, Image 4.

> While you can't rely on synchronized clocks (in the wall-clock time sense) in a distributed system, you *can* rely on timestamps of your immutable messages. You could send only message IDs in the preceding time window, for example. You can use hash chains to guarantee that the boards share the same history.

I'm simply discarding posts whose UNIX timestamp is ahead of the local UNIX time; for this specific purpose, it works well. 
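Burak's timestamp rule and Laurens's two suggestions quoted above (send only the IDs from a preceding time window, and use hash chains to confirm a shared history) fit together in a few lines. This is a minimal sketch under assumptions: a post is represented as a `(unix_timestamp, hex_digest)` pair, and `accept_post`, `window_digests` and `chain_head` are hypothetical helper names.

```python
import hashlib
import time
from typing import List, Optional, Sequence, Tuple

# Assumed representation: a post is (unix_timestamp, hex_digest).
Post = Tuple[float, str]


def accept_post(timestamp: float, now: Optional[float] = None) -> bool:
    """Burak's rule: discard posts stamped later than the local clock."""
    if now is None:
        now = time.time()
    return timestamp <= now


def window_digests(posts: Sequence[Post], window_seconds: float,
                   now: Optional[float] = None) -> List[str]:
    """Digests of posts from the preceding time window, in a deterministic order."""
    if now is None:
        now = time.time()
    start = now - window_seconds
    recent = sorted(p for p in posts if start <= p[0] <= now)  # by time, then digest
    return [digest for _, digest in recent]


def chain_head(digests: Sequence[str]) -> str:
    """Fold an ordered digest list into a hash chain and return the last link.

    h_0 = b"", h_i = BLAKE2b(h_{i-1} + d_i). Barring a hash collision, two
    peers with the same head hold the same ordered history. An empty list
    yields "".
    """
    head = b""
    for digest in digests:
        head = hashlib.blake2b(head + bytes.fromhex(digest), digest_size=32).digest()
    return head.hex()
```

Two peers could exchange only `chain_head(...)` for a window and fall back to sending the digest list when the heads differ. Because every node's clock is different, the sketch assumes both sides use the same explicit `now` and window (for example, the requester states the boundaries); otherwise posts near the edges make the heads disagree spuriously.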

All of those commands are handled through the AMP protocol, and so far I have been treating AMP like a local protocol with no chance of failure, which won't be the case on a real network. I can serve errors over AMP, but it gets very complex very fast when there are no guarantees about the order in which things arrive. There are certain actions I want to forbid until a certain sequence has been completed with that peer, but otherwise the protocol is remarkably flexible, and likewise remarkably painful to implement. I guess I just want to know whether I am using Twisted in the sanest way possible for this project; I have enough insanity going on in it to last a lifetime already!

Sorry about the semi-off-topic wall of text; it won't happen again.

Thanks,
Burak





On Aug 21, 2013, at 3:28 PM, Laurens Van Houtven <_ at lvh.io> wrote:

> Hi Burak,
> 
> 
> I think you're doing fine. Distributed systems are just kind of hard :-)
> 
> It sounds like you're fundamentally building an eventually consistent distributed database. We have a few of those already: it might be significantly less work to just use one of them. I suppose it depends why you're trying to make it distributed. Is this about reliability in the face of e.g. hardware failure, or is this about being able to disseminate data when someone tries to stop you from doing so? Are you trying to protect against byzantine failures too?
> 
> That said, you might want to consider how you communicate posts. Six months' worth of posts is a lot. Even with ten posts per day, you'd end up with ~10*30*6 = 1800 hash values. The digest size of BLAKE2 is variable, but if you're using 512-bit digests, that's 64 bytes, or 112.5 kibibytes for the whole thing. That's probably more than you want to send in a single message.
> 
> While you can't rely on synchronized clocks (in the wall-clock time sense) in a distributed system, you *can* rely on timestamps of your immutable messages. You could send only message IDs in the preceding time window, for example. You can use hash chains to guarantee that the boards share the same history.
> 
> cheers
> lvh
> _______________________________________________
> Twisted-Python mailing list
> Twisted-Python at twistedmatrix.com
> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
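Laurens's size estimate matters even more with AMP specifically: Twisted's AMP caps each value in a box at 65,535 bytes (`twisted.protocols.amp.MAX_VALUE_LENGTH`), so ~112.5 KiB of digests does not fit in a single AMP argument at all, and hex encoding would double it. Below is a minimal sketch of the arithmetic and of splitting a digest list into batches that each fit in one value. The 2-byte per-item length prefix reflects my understanding of how `amp.ListOf` frames its elements and is worth checking against the Twisted source; the function names are hypothetical.

```python
from typing import List, Sequence

MAX_VALUE_LENGTH = 0xFFFF  # AMP's per-value limit, mirrored here as a constant
LENGTH_PREFIX = 2          # assumed 2-byte length prefix per ListOf element


def digest_list_bytes(posts_per_day: int, days: int, digest_size: int = 64) -> int:
    """Raw size of a digest list, ignoring any framing overhead."""
    return posts_per_day * days * digest_size


def batch_digests(digests: Sequence[bytes], max_value: int = MAX_VALUE_LENGTH,
                  prefix: int = LENGTH_PREFIX) -> List[List[bytes]]:
    """Greedily split digests into batches whose framed size fits one AMP value."""
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    size = 0
    for digest in digests:
        cost = len(digest) + prefix
        if current and size + cost > max_value:
            batches.append(current)
            current, size = [], 0
        current.append(digest)
        size += cost
    if current:
        batches.append(current)
    return batches
```

With 1,800 raw 64-byte digests this gives two batches (992 and 808 digests). Each batch could be one round trip of a paginated command (for example, a hypothetical `offset` argument on a digest-listing command), or the sender could stick to the time window Laurens suggests and avoid sending six months of digests at all.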
