Opened 4 years ago

Last modified 4 years ago

#6558 enhancement reopened

LineOnlyReceiver doesn't pass the entire contents of its buffer to lineLengthExceeded()

Reported by: zooko Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: zooko, Lukasz Dobrzanski Branch: branches/line-only-receiver-buffer-remnants-6558
branch-diff, diff-cov, branch-cov, buildbot
Author: tomprince

Description

This is analogous to #6536, but this is for LineOnlyReceiver, not for LineReceiver.

Attachments (1)

pass-whole-buffer.patch (2.7 KB) - added by zooko 4 years ago.

Download all attachments as: .zip

Change History (9)

Changed 4 years ago by zooko

Attachment: pass-whole-buffer.patch added

comment:1 Changed 4 years ago by zooko

Keywords: review added

comment:2 Changed 4 years ago by Lukasz Dobrzanski

Cc: Lukasz Dobrzanski added

+1 Tests pass, code looks ok (maybe just break lines test_basic.py:430 test_basic.py:435).

Not a twisted expert so leaving the 'review' label for another pair of eyes.

comment:3 Changed 4 years ago by Tom Prince

Author: tomprince
Branch: branches/line-only-receiver-buffer-remnants-6558

(In [39136]) Branching to line-only-receiver-buffer-remnants-6558.

comment:4 Changed 4 years ago by Tom Prince

(In [39137]) Apply pass-whole-buffer.patch from zooko.

Refs: #6558

comment:5 Changed 4 years ago by Tom Prince

Keywords: review removed
Resolution: invalid
Status: newclosed

I'm not sure if this behavior is compatible. The documentation just says

Called when the maximum line length has been reached. Override if it needs to be dealt with in some special way.

and the current implementation passes in the offending line, or as much of it has been received (without, unfortunately, indicating which case it is). So, it seems to me that passing the entire buffer instead is a signficant change in behavior.

I'm going to close this as invalid, since I don't think this is a bug, as currently stated. I do think the current behavior isn't particularly useful.

Feel free to make a case that this change can be made compatably and re-open this ticket.

Here are some review points that I had written, prior to concluding that this change isn't compatible.

  1. test_lineLengthExceeded seems to be testing multiple cases, and so has multiple assertions.
    • Each test method should test only one case, and should perfarbly only have a single assertion.
    • Please split it into multiple test methods
  2. LineOnlyReceiver.lineLengthExceeded doesn't currently document the behavior your are implementing. Please expand its documentation to reflect the new behavior.
    • lineLengthExceed is called with the entire buffered input.
    • the buffer is cleared before it gets called, so that future data will start a new line.

comment:6 Changed 4 years ago by zooko

Resolution: invalid
Status: closedreopened

tom.prince: Thanks for the review! I think this ticket should be the same as #6536, and that one has more detailed discussion on the ticket, so maybe we should review that one first.

In any case, if we're not going to change [source:trunk/twisted/protocols/basic.py?annotate=blame&rev=38148#L508 LineReceiver] and LineOnlyReceiver to pass the entire contents of its buffer to lineLengthExceeded, then we should change the docstring of [source:trunk/twisted/protocols/basic.py?annotate=blame&rev=38148#L638 lineLengthExceeded] so that it doesn't say that we will.

I approve of this decision! I didn't want to have to remember the entire unconsumed contents of the buffers in my optimization patch (intended for #6357). This will allow that patch to be simpler (and a maybe a microsecond faster too).

Re-opening this so that we can fix the docstrings, so that programmers don't incorrectly think that they're going to get the entire remnants of the buffer as the argument to lineLengthExceeded.

comment:7 Changed 4 years ago by Tom Prince

Well, there are two lineLengthExceed:

  • The one for LineReceiver documents that it sends the entire buffer (and does modulo a missing delimiter if it has received a complete line).
  • The one for LineOnlyReceiver which doesn't document its argument, but always passes the current line as that argument.

So, since the current behaviour is (unfortunately) different, the behaviour going forward should be different (at least without proper deprecation).

comment:8 Changed 4 years ago by zooko

Wait a minute, I'm all for the Twisted policy that we don't change behavior without a deprecation period, but I'm unclear on what the Twisted policy is when the behavior contradicts the docs. Especially when the behavior does almost what the docs that it will do, except for in some cases it doesn't quite. To me, that makes it sound like this is just a bug that nobody wants, rather than a behavior that some user could be depending on.

Let me see if I understand the facts:

  • LineReceiver.lineLengthExceeded says: "The argument 'line' contains the remainder of the buffer, starting with (at least some part) of the line which is too long. This may be more than one line, or may be only the initial portion of the line."
  • LineOnlyReceiver says: "This is purely a speed optimisation over LineReceiver, for the cases that raw mode is known to be unnecessary.", meaning that in those cases the behavior is the same as that of LineReceiver.
  • LineReceiver.lineLengthExceeded does not pass the remainder of the buffer, as it is documented to do, but instead in certain situations omits some characters (the line separator characters), i.e. if the lines were "powergen\nitalia\ncom", then it might pass "powergenitalia\ncom" in certain circumstances.
  • LineOnlyReceiver.lineLengthExceeded does not pass the remainder of the buffer either, but instead passes the current line.

Are we agreed on the above facts?

It sounds like to me there are a couple of reasonable paths forward. We can choose one of these options:

1.a) Keep LineReceiver omitting the separator chars in certain circumstances, and change the documentation to explain that it does so, or

1.b) Change LineReceiver to include the separator chars in all circumstances.

If we choose a), then code which was dependent on on the documentation (i.e. code which depends on LineReceiver delivering the unmangled buffer) may find that in certain circumstances the documentation is wrong, which could cause a bug in that code. If we choose b), then code which was dependent on that behavior (i.e. code which depends on the occasional appearance of the "powergenitalia\ncom" pattern) may find that the newer release of Twisted makes that behavior stop happening, which could cause a bug in that code.

To my mind, we should definitely choose 1.b). Choosing 1.a) would simply be documenting a bug instead of fixing it. That might be appropriate if we think more people are depending on that bug than people are depending on the stated behavior, but in this case it is surely the opposite, right? Nobody needs their buffer to be mangled in this way. If anyone needs anything out of that buffer, then what they need is for it to obey its documented interface of being unmangled.

Then there's a separate question (which doesn't have to have the same answer even though it has the same form):

2.a) Keep LineOnlyReceiver behaving differently from LineReceiver and change the documentation to explain that they behave differently, or

2.b) Change LineOnlyReceiver to behave the same as LineReceiver.

If we choose 2.a), then code which was dependent on on the documentation (i.e. code which depends on LineOnlyReceiver and LineReceiver having the same behavior in the absence of raw mode) may find that in certain circumstances the documentation is wrong, which could cause a bug in that code. If we choose 2.b), then code which was dependent on that behavior (i.e. code which depends on LineOnlyReceiver sending only the line while LineReceiver sends the whole buffer) may find that a newer release of Twisted changes that behavior, which could cause a bug in that code.

In my opinion, this is less clear-cut than the first one, because it is possible that someone actually does depend on LineOnlyReceiver to send just-the-line, and at the same time that same person or some other person may depend on LineReceiver to send the full buffer. On the other hand, it also seems plausible to me that people are depending on the stated contract that LineReceiver and LineOnlyReceiver have the same behavior as one another in the absence of raw mode.

So it sounds like to me we need to choose a) or b) from each of these two issues and then move forward with either changing or documenting the behavior.

Who has the authority to decide what the answers are to those two questions?

Note: See TracTickets for help on using tickets.