[Twisted-Python] AMP message length limit

Oon-Ee Ng ngoonee.talk at gmail.com
Mon Nov 23 18:16:59 MST 2015


On Mon, Nov 23, 2015 at 8:54 AM, Glyph Lefkowitz
<glyph at twistedmatrix.com> wrote:
> I'm sorry that this was an unpleasant surprise.  I wish that we had a better
> way of getting this across up-front :-).  However, it seems like the length
> limit is doing its job in terms of constraining your protocol design to not
> have individual messages "hog" the wire...

Yes, that it did.

> Definitely the latter if you have a short time frame.  How big are your
> messages?  If your limit is still fairly small (5M, let's say) but much
> bigger than 64k there are other options you can use.

I don't foresee it getting over an MB or so (the data is being read
from disk, so network I/O is unlikely to be the biggest bottleneck
in this case).

>> Questions:-
>> 1. ID is so the client can be sure not to concatenate different lists
>
> This... is correct, but doesn't sound like a question.  Is it meant to be?

Sorry, the real question is whether an ID is required at all. I'm not
using threads, and the concurrent AMP messages will be sent from a
single server process in a loop. Each client is guaranteed to have
only one server. In this situation, do I even need an ID?
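Under the conditions described (one server, one connection, each list's pages sent back to back from a single loop), a receiver could in principle reassemble pages without an ID, because a TCP connection delivers bytes in order. The minimal sketch below assumes each page carries a hypothetical `last` flag marking the final page of a list; `PageCollector` and `receive_page` are illustrative names, not Twisted APIs. It would break as soon as pages of two different lists were ever interleaved.

```python
class PageCollector:
    """Reassembles a paged list without a list ID.

    Assumes pages of one list are never interleaved with pages of another,
    i.e. a single sender emits each list's pages back to back over one
    ordered connection.
    """

    def __init__(self, on_complete):
        self._on_complete = on_complete  # called with each fully reassembled list
        self._pending = []

    def receive_page(self, items, last):
        # Append this page; hand off the whole list once the final page arrives.
        self._pending.extend(items)
        if last:
            complete, self._pending = self._pending, []
            self._on_complete(complete)


results = []
collector = PageCollector(results.append)
collector.receive_page(["a", "b"], last=False)
collector.receive_page(["c"], last=True)
collector.receive_page(["x"], last=True)
print(results)  # [['a', 'b', 'c'], ['x']]
```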

> No.  You can tell AMP not to bother generating the protocol-level response
> by setting the requiresAnswer flag on your Command to False:
> <https://twistedmatrix.com/documents/15.4.0/api/twisted.protocols.amp.Command.html#requiresAnswer>

Thanks. Right now I just have plenty of 'return {}' everywhere. Does
requiresAnswer=False mean less bandwidth usage (no need to transmit an
empty dict)?
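One rough way to gauge the saving is to count bytes under AMP's wire framing, in which every key and value is preceded by a 2-byte big-endian length and a box ends with a zero-length key. The sketch below frames boxes by hand; `encode_box` is an illustrative helper rather than Twisted's serializer, and the one-byte tag `b"1"` is an assumption (real tags are generated by the protocol).

```python
import struct


def encode_box(box):
    """Frame a dict of bytes -> bytes the way AMP lays boxes out on the wire."""
    out = bytearray()
    for key, value in box.items():
        # AMP caps keys at 255 bytes and values at 65,535 bytes.
        if len(key) > 0xFF or len(value) > 0xFFFF:
            raise ValueError("key or value too long for AMP")
        out += struct.pack("!H", len(key)) + key
        out += struct.pack("!H", len(value)) + value
    out += b"\x00\x00"  # a zero-length key terminates the box
    return bytes(out)


tag = b"1"  # hypothetical request tag

# An empty response still carries the "_answer" key echoing the tag.
empty_answer = encode_box({b"_answer": tag})
# Extra bytes a request carries only when it expects an answer.
ask_overhead = len(encode_box({b"_ask": tag})) - len(encode_box({}))

print(len(empty_answer))  # 14
print(ask_overhead)       # 9
```

By this count, an empty reply to a one-character tag is 14 bytes and the request's "_ask" entry adds another 9, so requiresAnswer=False saves roughly 23 bytes per message, plus the work of building and parsing the reply.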

>> 3. Should I attempt to plug as many list items as possible into each
>> page (requires length checking of json-encoded strings and repeated
>> encoding/checks) or just choose a suitable limit of list items (my
>> current max length is about 200 characters and average is 71) of maybe
>> 300 list items per message? My current list is about 1k items in all,
>> and it's only going to get bigger.
>
>
> Why are you encoding as _both_ JSON and AMP?
>
> I'd say you should do the length-checking, because you still might end up
> with list items that are larger than expected if they're variable size.
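Length-checking need not mean re-encoding whole pages. A minimal sketch, assuming each page is sent as a compact JSON array: a compact array's size is 2 bytes for the brackets, plus each item's encoding, plus one byte per comma, so each item can be encoded once and pages filled greedily. `pack_pages` and the 60,000-byte default are illustrative, chosen only to stay under AMP's 65,535-byte value limit.

```python
import json


def pack_pages(items, limit=60000):
    """Split items into compact JSON arrays whose encoding fits in `limit` bytes."""
    pages, current, size = [], [], 2  # 2 bytes for the enclosing "[" and "]"
    for item in items:
        encoded = json.dumps(item, separators=(",", ":")).encode("utf-8")
        if 2 + len(encoded) > limit:
            raise ValueError("a single item exceeds the page limit")
        extra = len(encoded) + (1 if current else 0)  # +1 for the separating comma
        if size + extra > limit:
            # Current page is full: close it and start a new one with this item.
            pages.append(current)
            current, size = [], 2
            extra = len(encoded)
        current.append(encoded)
        size += extra
    if current:
        pages.append(current)
    # Join the pre-encoded items; each result is valid JSON of the tracked size.
    return [b"[" + b",".join(page) + b"]" for page in pages]


pages = pack_pages(["x" * 70 for _ in range(1000)], limit=1000)
```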

I'm sending classes over the wire by json-encoding their __dict__.
Although now that you mention it, I started doing that because I
believed AMP to be constrained to ASCII strings (before I found
amp.Unicode()) and my classes will almost always have unicode data.
Looks like I can skip a step then; I'll test that out.
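If the JSON step is dropped in favour of amp.Unicode, the size limit applies to encoded bytes rather than characters: amp.Unicode sends UTF-8, so non-ASCII text takes more than one byte per character. Conversely, `json.dumps` with its default `ensure_ascii=True` escapes every non-ASCII character as `\uXXXX`, which keeps the output ASCII but inflates it. A small stdlib comparison on an arbitrary sample string:

```python
import json

text = "caf\u00e9 \u00fcn\u00efcode"  # "café ünïcode": 12 characters, 3 non-ASCII

chars = len(text)
utf8_bytes = len(text.encode("utf-8"))  # size as UTF-8, as amp.Unicode would send it
json_ascii = len(json.dumps(text))      # default ensure_ascii=True: \u00e9 etc.
json_utf8 = len(json.dumps(text, ensure_ascii=False).encode("utf-8"))

print(chars, utf8_bytes, json_ascii, json_utf8)  # 12 15 29 17
```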

I'm trying not to do length-checking simply because I'm lazy (and
because I'm abstracting out all the Twisted parts into an SPClient and
SPServer which handle this data conversion transparently to the
working code). In particular, the 'best' ways I can think of to do
length-checking are either:-
1. Binary search for an 'optimal' size just under a limit (50k for
the sake of argument)
2. A single check which halves the page whenever it is too long
(300>150>75 etc.)
Both would clutter up the transmission code more than I would like at
this point, and either could probably be added later on the
transmission side without any change to the recipient-side code. So
it's on the backburner.
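For comparison, option 2 (halving) fits in a short recursive function. This is a minimal sketch assuming each page is a JSON-encoded list; `split_until_fits` is a hypothetical name, and the 50,000-byte default is just the 50k figure used for the sake of argument.

```python
import json


def split_until_fits(items, limit=50000):
    """Return JSON-encoded pages, halving any list whose encoding exceeds limit."""
    encoded = json.dumps(items).encode("utf-8")
    if len(encoded) <= limit:
        return [encoded]
    if len(items) == 1:
        raise ValueError("a single item exceeds the limit")
    mid = len(items) // 2
    # Recurse on each half; halves that already fit are returned unchanged.
    return split_until_fits(items[:mid], limit) + split_until_fits(items[mid:], limit)


pages = split_until_fits(["x" * 70] * 300, limit=5000)
```

Because a page just over the limit is split into two roughly half-size pages, halving tends to produce pages well below the limit, whereas greedy packing fills each page close to it. The receiving side sees the same lists either way.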

> I would love it if you would help me test out and develop Tubes.  If it is a
> small homegrown app it might be a good use-case.  There are pros and cons:
> Tubes has higher test coverage and cleaner code since it was developed much
> more recently; but, it still has very limited functionality, badly broken
> areas, and no compatibility guarantees, because it's still somewhat
> experimental.
>
> However, Tubes is a way of implementing protocols, whereas AMP is an
> implementation of a request/response protocol.  If you use Tubes, you'll
> need to do an implementation of AMP (or something like it) in order to issue
> requests and give responses.  If I were you, especially since you've already
> figured out paging, I would probably just stick with AMP and Twisted as-is.

That's polite =). I'll keep it in mind. If there's a quick link
somewhere on the 'badly broken areas' I'd be interested, because without
knowing that, it's hard to justify spending time there when I already
have something working with AMP. I especially like the idea of
streaming, but that'd require writing my code to accept data piecemeal
on the other end, and I can foresee that getting very messy very fast.



