Opened 16 years ago

Closed 15 years ago

Last modified 5 years ago

#1715 enhancement closed fixed (fixed)

move vertex.juice into twisted core, as twisted.protocols.amp (asynchronous messaging protocol)

Reported by: Glyph Owned by:
Priority: highest Milestone:
Component: core Keywords:
Cc: spiv, jknight, Jean-Paul Calderone Branch:
Author:

Description

Juice is an incredibly simple protocol for sending request/response pairs over a wire connection. It is basically the Twisted world's answer to incredibly simple protocols like XMLRPC, except it's connection-oriented (and therefore stateful), and has explicit support for negotiated TLS.

Since this is basically the simplest protocol one can reasonably wrap a complete two-way Deferred around, it would be good for it to be in Twisted rather than Divmod. It is an extremely handy protocol for doing subprocess control messages, for example.

This depends on #302.

Change History (29)

comment:1 Changed 16 years ago by radix

This definitely should not go into twisted.internet. Perhaps twisted.protocols.juice.

Twisted Internet is [source:/trunk/twisted/internet/__init__.py Asynchronous I/O and Events].

Twisted Protocols is [source:/trunk/twisted/protocols/__init__.py A collection of Internet Protocol implementations].

comment:2 Changed 16 years ago by Glyph

Status: newassigned
Summary: move vertex.juice into twisted.internetmove vertex.juice into twisted core

I wish that the packages were just twisted.python ("twisted base") and twisted.internet ("twisted core"), instead of the smattering of stuff that is "core" in persisted/protocols etc.

However! You are correct and changing this status quo would be a bunch of unnecessary busy work, so I've changed the topic appropriately.

comment:3 Changed 16 years ago by Glyph

I've also had the idea to rename Juice when we do this to something a bit more exciting-sounding and indicative of what it does: "AMP", the Asynchronous Messaging Protocol.

comment:4 Changed 15 years ago by Glyph

Summary: move vertex.juice into twisted coremove vertex.juice into twisted core, as twisted.protocols.amp (asynchronous messaging protocol)

comment:5 Changed 15 years ago by spiv

Cc: spiv added

comment:6 Changed 15 years ago by Glyph

In a last-minute coup, Itamar (and exarkun, and radix) declared that including a line-delimited protocol in Twisted which was designed (as AMP is) to be "The Twisted Protocol", i.e. the suggested protocol construction kit, was absolutely unacceptable and it would have to be changed to use length prefixing. As I was unable to convince him otherwise, and buy-in from the Twisted core team is really important to me on this, that conversion is now part of this ticket.

comment:7 Changed 15 years ago by jknight

Cc: jknight added

This seems like a great idea. Something mentioning what distinguishes it from PB/jelly/bannana/etc would be good. I'm not saying I think it isn't distinct, but just that I'm sure that will be the first thing most people knowledgable in twisted will want to know.

A brand new general purpose protocol absolutely needs a protocol specification (more than what's in the class docstring right now), of course.

I'm also confused by all the EncodedString/ListOf/etc stuff. Is that actually part of Amp or is it just random serialization helper functions? And if the second, should it really be in there?

comment:8 Changed 15 years ago by jknight

Also, while I agree that that length prefixing is much better than escaping, I'm worried about the change to <2 byte length>. Do you _really_ want to restrict everything to be at most 65535 bytes? That seems a bit restrictive.

How about something like NetStringReceiver instead?

comment:9 Changed 15 years ago by Jean-Paul Calderone

NetStringReceiver was considered and avoided due to the extra burden of implementation. Int 16 strings can be implemented with a static buffer. NetStrings require heap allocation or more careful bounds checking.

I would love it if this were a stupid pointless concern but I'm not sure it is yet.

comment:10 Changed 15 years ago by jknight

I'm not sure quite what you mean.

if (len > BUF_SIZE) DIE_DIE_DIE() is difficult?

Don't forget this is a protocol development kit. If some _particular_ protocol built on it wants to document a maximum length of attributes, that's of course no problem, but it shouldn't be a base property of this generic protocol family.

comment:11 Changed 15 years ago by Jean-Paul Calderone

if (len > BUF_SIZE) DIE_DIE_DIE() is difficult?

Yes, that's all I'm saying.

Glyph, Chris and I discussed protocol level support for streaming last night. I think this covers cases where one would have otherwise wanted a larger key/value length limit. The gyst is simply that the Python API will have a producer argument type and do the necessary extra work of passing its data, chunked up, after the initial message is accepted.

comment:12 Changed 15 years ago by Jean-Paul Calderone

I didn't explain that very well. Maybe Glyph will add a better description, since he's the one actually implementing it.

comment:13 Changed 15 years ago by jknight

I can't see any upsides to designing a fundamental restriction of this type into the protocol. Saying you can't write a simple if check to see if the length of the data is too long is a really poor argument, and really I don't see how you can even be seriously arguing that position, especially seeing as the exact same check is needed to enforce the arbitrary 256 byte header-name restriction.

Whether or not a producer API is layered on top is a secondary concern. That's certainly a good idea, but doesn't make the above less true.

comment:14 Changed 15 years ago by radix

TooLong should be two different classes, one for keys and one for values. The line count spent on documenting how you're supposed to check whether it's one or the other and why some values are only available sometimes, and the code in repr to do that itself, would probably equal a more sensible separation that didn't have conditionally-set values. And end-user code that handles the different cases in different ways would be shorter.

comment:15 Changed 15 years ago by Jean-Paul Calderone

Hmm, I hadn't considered the point about key (they're keys not header names there's no headers on amp ;) length restriction. I think you're right, these concerns basically balance each other out.

glyph seems to think the presence of nuls throughput the stream is useful, as a means of detecting clients speaking the wrong protocol which connect to a server (they probably won't send nul as the first byte, which a juice client will always currently do) and with nuls in various other places in the stream serving to help avoid damaging buffer overflows. I'm not sure if I agree that either of these is really a useful feature, but he might want to argue for them.

Otherwise I guess I am +0 on switching to netstrings from int16 strings.

comment:16 Changed 15 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added

comment:17 Changed 15 years ago by Glyph

spiv:

  • EncodedString/ListOf were garbage, and they're gone. ;)

radix:

  • Why not two classes, one for local and one for remote? Or five classes, one for local key, one for local value, one for remote key, one for remote value (reserved for future use and different encodings) and two for use by Arguments? Or ... one dynamically generated along with every Argument class! In other words: no. Keeping it as one class was an intentional decision. One exception works fine, and there really isn't anywhere you can handle it anyway; if it's the remote end, the connection's already closed, and if it's local, you should have made sure your values were the right size beforehand anyway.

James, Exarkun:

  • This is a protocol for asynchronous messaging. If you're implementing streaming or some other large-data-moving application, you should be able to interleave unrelated messages over the same connection. Large packets are just not a good idea. Enforcing this at the level of encoding reduces the possibility of non-compliant implementations screwing this up. (I mean, how many _really_ compliant HTTP implementations are there? 3?)
  • if(len > BUF_SIZE) { DIE(); } is easy enough (and, as you say, required by the key length check anyway), but for a protocol like netstrings, you have to do that twice; once with the length prefix and once with the payload. Also you have to check the range of the characters, and whether you're looking at a ":" or not. The fewer places you can make mistakes, the better.
  • That thing exarkun said about NULs. I do think these are useful features, especially the protocol detection. People do write protocols badly out there, and accomodating their foibles is an important part of protocol design. (This accomodates a different set of foibles than the design I originally wanted to go with, but "secure by default" is probably better than "misleadingly easy to get wrong quickly".)
  • If there is quick, unanimous, emphatic support of netstrings here I will not press the point. But this is getting ridiculous: first, it was unacceptable to go with an ASCII encoding because its simplicity is misleading; now, it's bad to go with binary length-prefixing because it's too limiting. I would like support from the core team but I don't want a bike shed here.
  • Just to keep the discussion short: I now consider the defined max value length of 0xffff as non-negotiable. There has to be _some_ maximum, so let's keep it large enough that the overhead of re-encoding the length every so often isn't too great (.003% isn't too bad, or if we must use netstrings, 0.01%) but small enough that anything recognizable as a "file" is chunked appropriately. Anything else is just as arbitrary but less likely to fall on an interesting existing arbitrary boundary.

comment:18 Changed 15 years ago by radix

Why not two classes, one for local and one for remote? Or five classes, one for local key, one for local value, one for remote key, one for remote value (reserved for future use and different encodings) and two for use by Arguments?

You made this spurious and egregious comment yesterday on IRC, and I explained how it was a straw-man: the local and remote cases don't actually change what state *is available* on the class, and don't require extra strange conditionals anywhere you want to access a particular piece of state. isKey and keyName, however, are obviously weird: Look at your own implementation of repr and you can see what I mean.

comment:19 Changed 15 years ago by Glyph

After posting it I realized my last point wasn't too clear. Here's where my reasoning comes from on that:

PB's limitation of 640k is too big. People only hit it once they start putting seriously large, deeply structured data through their applications. It would be far better if you saw size limitation errors once you started putting anything non-trivial; the average photo .JPG is smaller than 640k but bigger than 64k.

90% of users of AMP are not going to read the documentation, they are just going to start using it and expect that it will magically do everything no matter what they pass it. The sooner they interactively experience the size limit, the better. When I was (heh) "commercially supporting" PB, the first thing I would do would be to lower the packet size maximum to around 64k (I think I actually used something ridiculous like 64000 bytes) to see if any size limitation explosions were on the horizon, and they were in every case where I did that.

comment:20 Changed 15 years ago by Glyph

Owner: changed from Glyph to washort
Status: assignednew

You are the LUCKY WINNER.

comment:21 Changed 15 years ago by Glyph

Keywords: review added
Priority: normalhighest

comment:22 Changed 15 years ago by Glyph

(On a more serious note on the TooLong exception-hierarchy issue: I really don't want any more exposed errors than there already are. I do think there is some validity to the subclass-rather-than-attribute idea, and I tried it that way first, but ultimately more deeply nested exceptions are more work, because there needs to be more code and more docs. TooLong is really not the sort of exception you should want fine-grained error reporting on. If there is actually an application that needs better control of this later, we can add in the exception hierarchy and use class vars to maintain compatibility with the current API.)

comment:23 Changed 15 years ago by washort

Owner: changed from washort to Glyph

Looks nice.

Isn't IOPump from test_pb? maybe the version in there should be deleted?

s/propogate/propagate/ in AMPTest.test_unknownCommandLow.

Why's this commented out in AMPTest.test_helloFatalErrorHandling?

        # It's fatal, so we want to log it...
        # self.failUnless(log.flushErrors(DeathThreat))

the docstring for Argument.toBox says:

        @param strings: the dictionary to write to, an AmpBox mapping argument
        names to string values.
        @type strings: AmpBox

        @param objects: the dictionary to read from, an AmpBox mapping argument
        names to Python objects.
        @type objects: dict

I appreciate that AmpBoxes mostly act like dicts, but conflating them like this is confusing. As I understand it, dicts are used for stuff coming from/going to callbacks and responders, and AmpBoxes are for the wire side. So perhaps these should say "The AmpBox mapping argument names to strings to write to" and "The dictionary mapping argument names to Python objects to read from".

Also Argument.fromBox doesn't have @type entries for objects and strings, FWIW.

Other than this stuff, I think this branch looks really good. Once these details are fixed, I'd say it's ready to merge.

comment:24 Changed 15 years ago by Glyph

IOPump is, in fact, a copy of test_pb's IOPump, but it works subtly (read: completely) differently. Changing test_pb's over makes those tests fail.

In fact, the whole sad story of IOPump and its family is the subject of a ticket of its own, #1936, which could probably use a companion ticket, "make everything in Twisted use ONE live-fire and ONE simulated event loop"; exarkun and I can think of 7 implementations of this that have existed in various repositories at various points, not counting implementations of it for web functional testing.

So I'm not going to be dealing with that in this branch.

Some spelling errors fixed.

The comment you mention was deleted, and the docstring for the test clarified.

Your issue with fromBox and toBox's docstrings is right on target: it's not just misleading, it's a clear error. The objects argument is not an AmpBox and never should be.

Thanks for the review. I'm really glad to be merging this, finally.

comment:25 Changed 15 years ago by Glyph

Resolution: fixed
Status: newclosed

(In [17708]) AMP

This is AMP, the Asynchronous Messaging Protocol.

Derived from the Divmod message protocol "JUICE", AMP is a much improved version of this idea. It sports considerably better test coverage (100% lines covered), a number of bufixes, a more efficient (and simpler) wire format using length prefixing rather than newline separators, vastly improved documentation, examples, a better way of specifying responders for commands, a "callRemote"-compatible API, and more.

This change also includes a new test utility module, "twisted.test.iosim", which is a re-hash of the unfortunately repetitive theme throughout the Twisted tests of semi-fake reactors which allow two protocols to be connected. However, this new addition does sport a simple API, producer and consumer support, as well as improved StartTLS negotiation support.

Fixes #1715

Author: glyph

Reviewer: washort

Cast of Thousands: itamar, exarkun, radix, jknight, spiv

comment:26 Changed 15 years ago by jknight

I'll note you never responded to my comment that it needs protocol-level documentation, so I made a new ticket for that. #1957.

comment:27 Changed 15 years ago by spiv

I realise that I'm somewhat late to the party here, but I also agree that supporting arbitrary-size messages, rather than 64k max, is a better trade-off. So you want to support multiplexing multiple asynchronous messages. Great; but don't penalise clients that want don't need this, but may want to transfer largish files or what have you.

Simple clients should be able to send one logical message as one logical message.

Complex clients that want multiple large messages in flight concurrently, and insist on doing so on a single socket, will need to do the chunking into small pieces regardless of whether the protocol has silly restrictions on message size, so having the restriction built into the protocol buys them nothing.

I think the 64k-limit is a poorly chosen tradeoff.

comment:28 Changed 11 years ago by <automation>

Owner: Glyph deleted

comment:29 Changed 5 years ago by hawkowl

Keywords: review removed

[mass edit] Removing review from closed tickets.

Note: See TracTickets for help on using tickets.