Opened 7 years ago

Closed 7 years ago

#4378 defect closed fixed (fixed)

Error handling in NetstringReceiver broken

Reported by: Albert Brandl Owned by:
Priority: normal Milestone:
Component: core Keywords: netstrings
Cc: itamarst, jesstess, Albert Brandl Branch: branches/netstring-errors-4378-2
branch-diff, diff-cov, branch-cov, buildbot
Author: exarkun, brandl

Description (last modified by Jean-Paul Calderone)

The NetstringReceiver should raise a NetstringParseError when it receives an invalid netstring. Valid netstrings have the form '<number>:<data>,', where <number> corresponds with the number of characters in <data>.

This error handling does not work in several cases: The receiver does not raise a NetstringParseError for strings like '3:aa,', '1:a,a' and 'aaa'.

There is another (minor) problem: The doComma method allows to raise a NetstringParseError with additional information about the parsed string. This information (self.__data) does _not_ contain the complete string, since it was truncated by the do_length method.

I'll attach a file with some doctests.

Attachments (7)

test_netstring_receiver.py (1022 bytes) - added by Albert Brandl 7 years ago.
Doctests for NetstringReceiver
ticket_4378.patch (12.2 KB) - added by Albert Brandl 7 years ago.
Here is my attempt to patch the NetstringReceiver.
ticket_4378.2.patch (17.8 KB) - added by Albert Brandl 7 years ago.
Oops - last second change broke a unit test :-(
ticket_4378.3.patch (49.0 KB) - added by Albert Brandl 7 years ago.
Included feedback from comment #12
ticket_4378.4.patch (46.6 KB) - added by Albert Brandl 7 years ago.
Fixed conflicts
ticket_4378.5.patch (1.6 KB) - added by Albert Brandl 7 years ago.
Added tests for _consumeLength, fixed method name, fixed docstring
ticket_4378.6.patch (14.6 KB) - added by Albert Brandl 7 years ago.
Fixed border case for _MAX_LENGTH; improved readability, documentation and unit tests

Download all attachments as: .zip

Change History (45)

Changed 7 years ago by Albert Brandl

Attachment: test_netstring_receiver.py added

Doctests for NetstringReceiver

comment:1 Changed 7 years ago by Glyph

Branch: Twisted-9.0.0

comment:2 Changed 7 years ago by Glyph

Thanks for figuring this out and attaching an executable example :).

In the future, it would be even more helpful if you could attach tests in unit-test format, using any of the numerous examples in twisted/.../test directories as examples. While the example is still quite helpful in quickly reproducing the problem, we can't integrate doctests into twisted itself.

comment:3 Changed 7 years ago by Jean-Paul Calderone

Description: modified (diff)

Fixing description markup.

Changed 7 years ago by Albert Brandl

Attachment: ticket_4378.patch added

Here is my attempt to patch the NetstringReceiver.

comment:4 Changed 7 years ago by Albert Brandl

Owner: changed from Glyph to Albert Brandl

I've added a patch for the Netstring Receiver and its unittests. This is my first attempt to contribute to twisted, so it's likely that I have broken some rules. Please review and comment.

comment:5 Changed 7 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Albert Brandl deleted

comment:6 Changed 7 years ago by jesstess

Author: jesstess
Branch: branches/netstring-errors-4378

(In [28788]) Branching to 'netstring-errors-4378'

comment:7 Changed 7 years ago by jesstess

(In [28789]) Apply ticket_4378.patch

refs #4378

comment:8 Changed 7 years ago by Itamar Turner-Trauring

Cc: itamarst added
Keywords: review removed
Owner: set to jesstess

This is not a full review, just a quick one over:

  1. I'm pretty sure this patch, in its current form, will break in the (still supported) Python 2.4, since it uses the 2.5 generator feature where you pass in data.
  1. Why not initialize the parser in the Protocol's init? Less code in a frequently called path.
  1. stringReceived in docstring violates coding standard, should be a L{} or C{} or something, I forget which. So does received_string variable name.
  1. In dataReceived, when would we expect to get StopIteration to be raised? Quick read of the generator doesn't show anywhere it actually returns, it seems to be in infinite loop.
  1. Using long() is unnecessary, you can just use int().
  1. Hooray for more and better tests! I am uncomfortable about use of transport.disconnecting, but we've already gone down the path of using it elsewhere, and not sure what else you could do, so it's probably fine.
  1. In test_receiveTwoNetstrings(self), perhaps the second string should be different than the first, e.g. "a" and "woo", to catch theoretical bugs where parser repeatedly spits out first thing it parsed.

comment:9 Changed 7 years ago by jesstess

Cc: jesstess added
Owner: changed from jesstess to Albert Brandl

Thanks for the ticket and patch, brandl! I've put the patch in branch netstring-errors-4378. Please make subsequent diffs against that branch.

You can see the build results for the branch here.

Can you address Itamar's review points and then resubmit for review?

comment:10 Changed 7 years ago by Albert Brandl

Cc: Albert Brandl added
Keywords: review added
Owner: changed from Albert Brandl to Itamar Turner-Trauring

I've attached a new patch that does not use new Python 2.5 generator features. The code is not quite as compact as before, but I understand that Python 2.4 support is still necessary.

Please review.

Changed 7 years ago by Albert Brandl

Attachment: ticket_4378.2.patch added

Oops - last second change broke a unit test :-(

comment:11 Changed 7 years ago by Jean-Paul Calderone

(In [29414]) Apply ticket_4378.2.patch

refs #4378

comment:12 Changed 7 years ago by Jean-Paul Calderone

Author: jesstessbrandl
Keywords: review removed
Owner: changed from Itamar Turner-Trauring to Albert Brandl

Thanks for working on this, brandl. Here's a bunch more review feedback. :) Please let me know if anything is unclear.

  1. There are some backwards compatibility issues with the patch:
    1. twisted.protocols.basic had LENGTH, DATA, COMMA, and NUMBER attributes before. These need to be deprecated before they can be removed. twisted.python.deprecate.deprecatedModuleAttribute should be of use here (also note that deprecations must be covered by unit tests as well). Another option is to leave the attributes there, unused, and file a separate ticket for deprecating them.
    2. NetstringReceiver previously had no __init__. Since it is a classic class, this means subclasses could not call an inherited __init__ at all. So if someone has a subclass overriding __init__, it will stop working with this change. Putting the code into connectionMade (or even makeConnection - the former should be called by a subclass if it was overridden, the latter is almost an absolute requirement) instead of __init__ should address this.
  2. Thinking about future backwards compatibility issues:
    1. Do any of the new NETSTRING_... module attributes really need to be public? Also, they might be nicer as NetstringReceiver attributes with the NETSTRING prefix dropped.
    2. _readerState used to be private. This conveniently allowed the implementation change to go from three states to two without even requiring us to stop and think about the compatibility consequences. I think the state attribute should go back to being something private.
    3. Likewise for currentData, payload, and expectedLength.
  3. Miscellaneous style/coding standard issues
    1. A couple places raise IncompleteNetstring, they should probably raise IncompleteNetstring() instead.
    2. Method definitions should be separated from each other by two blank lines
    3. The unit tests should all have docstrings (in a declarative style, test_sendNonString gives a good example) explaining what behavior is being tested. This is to help future maintainers understand why the test exists.
    4. too_long in test_maxReceiveLimit should be tooLong
    5. In _payloadComplete, the second and third lines of the @return docstring aren't indented properly.
    6. Docstrings should start on the line following the opening triple quote. IncompleteNetstring copies the older style used by NetstringParseError and NetstringReceiver. All three should be updated.
  4. Other points
    1. The OverflowError case in _extractLength is untested (I suspect because it cannot happen in Python 2.4 or newer).
    2. I think the > instead _payloadComplete should be a >= (if so, presumably there's also a unit test for the equal case missing).
    3. I wish we had some performance tests for NetstringReceiver. Then I could say something better than I feel like the new parser does a lot more string copying and method calling than the old one and is probably noticably slower. Are interested in writing any kind of benchmarks for this code to determine if this is true or not? This could be done as a separate ticket, since the benchmarks should apply just as well to any implementation of NetstringReceiver.
    4. The TestNetstring class seems to provide a list of received strings. It might be better to make assertions against this list, rather than against what was written to the transport, since then you can tell where one string ends and the next begins.
    5. We track changes in "news fragments", described on <http://twistedmatrix.com/trac/wiki/ReviewProcess#Newsfiles>, if you can add one for this branch, that'd be great.

comment:13 Changed 7 years ago by Albert Brandl

Owner: changed from Albert Brandl to Jean-Paul Calderone

Thanks for the detailed review. I'll try to incorporate your change requests.

Most of your remarks are obvious, but I have some questions:

  1. (ad 3.2): I am not sure how to handle whitespace for the first method definition.

Is it better to write

class A:
    """
    docstring
    """

    def f(self):
    ...

(one blank line between docstring and method definition), or

class A:
    """
    docstring
    """


    def f(self):
    ...

(two blank lines between docstring and first method definition)? The style guide does not specify this special case.

  1. I'll fix the style issues you mentioned throughout the modules that I've changed, not only in the classes that I have touched / created; hope that is ok.
  1. (ad 4.3): Testing the performance is certainly interesting. Is there a framework for this kind of tests? I could obviously create a unittest that fails unless a long string is received in less than <num> milliseconds, but since I don't know the hardware that will be used, determining a value for <num> is not trivial.

I could also implement a script that takes the string length and chunk size as arguments, instantiates a NetstringReceiver and calls its dataReceived until the string is complete. It could then print the time needed for receiving the data. If this is the way to go: Where would such a script be placed?

BTW: Parsing a netstring with 4194305 characters split into 2050 chunks (one for the length specification plus colon, the rest for data and the trailing comma) is finished after 0.07 seconds on my machine (a Pentium 4 with 3.00 GHz) with the patched version. The original version takes about 7.87 seconds. At least in this case, the performance has definitely _not_ decreased ;-).

comment:14 Changed 7 years ago by Jean-Paul Calderone

(ad 3.2): I am not sure how to handle whitespace for the first method definition.

The former.

I'll fix the style issues you mentioned throughout the modules that I've changed, not only in the classes that I have touched / created; hope that is ok.

As long as this doesn't snowball too much. :) It's good to update the old code, but bad to make the patch bigger and harder to review and increase the chances of conflicts with other people's work.

(ad 4.3): Testing the performance is certainly interesting. Is there a framework for this kind of tests?

Not such much, alas. You can find a few feeble attempts in doc/core/benchmarks/ and doc/conch/benchmarks/. There's also lp:twisted-benchmarks which is a bit more frameworky. It's more oriented towards network benchmarks, but not exclusively (the iterations and threads benchmarks don't touch the network, for example).

I could obviously create a unittest that fails unless a long string is received in less than <num> milliseconds, but since I don't know the hardware that will be used, determining a value for <num> is not trivial.

Yea, I'd stay away from that. It's a recipe for spurious failures. :) For now, if we can just have a program that reports a time, that's sufficient. Someday we might set up something like http://speed.pypy.org/ to actually collect and monitor the results of such things.

Thanks again.

comment:15 Changed 7 years ago by Albert Brandl

One more question about 2.1 (moving constants like NETSTRING_LENGTH to NetstringReceiver): Should these instance variables be uppercased, like MAX_LENGTH in NetstringReceiver? Or should I use names like missingLength?

Presumably, uppercased instance variables are used to define "constants", but I didn't find any mention of this in the style guide.

comment:16 Changed 7 years ago by Jean-Paul Calderone

Presumably, uppercased instance variables are used to define "constants", but I didn't find any mention of this in the style guide.

This is correct. Twisted defers to PEP 8 for things not explicitly covered in the style guide.

comment:17 Changed 7 years ago by Jean-Paul Calderone

Owner: changed from Jean-Paul Calderone to Albert Brandl

Changed 7 years ago by Albert Brandl

Attachment: ticket_4378.3.patch added

Included feedback from comment #12

comment:18 Changed 7 years ago by Albert Brandl

Keywords: review added
Owner: changed from Albert Brandl to Jean-Paul Calderone

I've changed my patch according to your feedback (I hope).

  • (1.1) The module attributes LENGTH, DATA, COMMA, and NUMBER are still in the module. I'll create a separate ticket for deprecating them.
  • (1.2) The contents of __init__ were moved to makeConnection, as suggested.
  • (2.1) The new module attributes (NETSTRING_...) were moved to the class.
  • (2.2, 2.3) state was renamed to _state and is thus "private". The same for the other instance variables.
  • (3.1) I raise IncompleteNetstring() now.
  • (3.2, 3.6) The whitespace issues (separation of methods / docstrings) should be fixed. I have also touched the other code in basic.py and test_protocols.py - please check if these fixes collide with other changes on the trunk.
  • (3.3) The unittests have docstrings - please review if they are meaningful.
  • (3.4) too_long was replaced by tooLong.
  • (3.5) The indentation of the docstring for _payloadComplete was fixed.
  • (4.1) I've removed the OverflowError, since Python 2.4 is the oldest supported version. Instead, I check if the string size of the length specification exceeds the string size of self.MAX_LENGTH. If this is the case, I don't even try to convert the string to an integer. This should prevent a client from causing a MemoryError by sending an absurdly long length specification string. I think this resembles the behavior of the old version, which calculated the length one character at a time and raised an NetstringParseError as soon as the digits received so far formed an integer bigger than self.MAX_LENGTH.
  • (4.2)_payloadComplete now uses >= instead of >. Two of the existing unit tests were enhanced to check if this method returns the correct value (and one of them triggers an assertion when >= is replaced by >).
  • (4.3) The module doc.core.benchmarks.netstringreceiver contains a performance test for dataReceived. It takes a number as first argument and a filename for the report as second argument. The number is used for calculating the maximal chunk size and number of chunks (both become 2 <number>). Be careful with values > 15 - my machine started to behave strangely. The remark in comment #13 about the better performance seems still to hold valid, especially for larger values. I think this is due to the use of a cStringIO buffer instead of manipulating plain strings.
  • (4.4) TestNetstring now asserts against received, which really improves the readability of the tests.
  • (4.5) I've tried to create a news fragment, twisted/topfiles/4378.bugfix.

Please check.

(BTW: Is it OK to reassign this issue to you? Or should I only add the 'review' keyword and let others decide who will review the issue?)

comment:19 Changed 7 years ago by Jean-Paul Calderone

I'm a little confused about what the latest patch is against. I tried applying it to the netstring-errors-4378 branch but encountered several conflicts in basic.py and test_protocols.py.

Changed 7 years ago by Albert Brandl

Attachment: ticket_4378.4.patch added

Fixed conflicts

comment:20 Changed 7 years ago by Albert Brandl

No wonder - I didn't do "svn update" before creating the patch :-(. You applied patch #2 a week ago, and this caused the conflict.

Sorry, I assumed that the branch would only be updated once the patch is accepted. Next time I'll know better...

I've attached a new diff, this time against the current version of the branch.

comment:21 Changed 7 years ago by Jean-Paul Calderone

(In [29615]) Apply ticket_4378.4.patch

refs #4378

comment:22 Changed 7 years ago by Jean-Paul Calderone

(In [29616]) Apply the other half of ticket_4378.4.patch

refs #4378

comment:23 Changed 7 years ago by Jean-Paul Calderone

(In [29617]) Fix a few mistakes

  1. Make PARSING_* attributes of NetstringReceiver
  2. Remove the duplicate definition of IncompleteNetstring
  3. Make NetstringReceiver.MAX_LENGTH_SIZE private
  4. Correct the NetstringReceiver docstring to document private things by their proper name
  5. Remove an unused local in NetstringReceiver.dataReceived
  6. Grab length from self._expectedLength in NetstringReceiver._consumeLength

refs #4378

comment:24 Changed 7 years ago by Jean-Paul Calderone

(In [29618]) Use assertTrue instead of failUnless

refs #4378

comment:25 Changed 7 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from Jean-Paul Calderone to Albert Brandl

Thanks. I applied the new patch to the branch and made a few more changes, mostly minor. The _consumeLength change in r29617 was a bit more than minor, so if you could look that over and make sure it makes sense, that'd be very helpful.

If it does make sense, then there's a bit of test coverage missing. The raise on line 212 doesn't appear to be exercised by any of the unit tests.

The only other issue I see now is that I can't actually get the benchmark to run. It fails like this:

exarkun@boson:~/Projects/Twisted/trunk$ python doc/core/benchmarks/netstringreceiver.py 1 foo
Disabled Garbage Collector.
Traceback (most recent call last):
  File "doc/core/benchmarks/netstringreceiver.py", line 244, in <module>
    main(*sys.argv[1:3])
  File "doc/core/benchmarks/netstringreceiver.py", line 236, in main
    npt.testPerformance(int(number))
  File "doc/core/benchmarks/netstringreceiver.py", line 60, in testPerformance
    self.performTest(iteration)
  File "doc/core/benchmarks/netstringreceiver.py", line 168, in performTest
    self.testCombination(chunkSize, numberOfChunks)
  File "doc/core/benchmarks/netstringreceiver.py", line 184, in testCombination
    elapsed = self.receiveData(chunk, numberOfChunks, dataSize)
  File "doc/core/benchmarks/netstringreceiver.py", line 224, in receiveData
    assert self.netstringReceiver.received, "Didn't receive string!"
AssertionError: Didn't receive string!
exarkun@boson:~/Projects/Twisted/trunk$ 

Changed 7 years ago by Albert Brandl

Attachment: ticket_4378.5.patch added

Added tests for _consumeLength, fixed method name, fixed docstring

comment:26 Changed 7 years ago by Albert Brandl

Owner: changed from Albert Brandl to Jean-Paul Calderone

The fix of _consumeLength is fine. It looks as if the merge between my changes and the current branch version re-introduced some old code; I definitely should get accustomed to run the unit tests before submitting patches... :-/

Patch version 5 introduces one more unit test that checks _consumeLength and also triggers the exception by attempting to receive more than MAX_LENGTH data.

Could you give me more information about the problems with the benchmark? It seems to break on the trunk; does it work on the branch? Does it produce meaningful results if you comment out line 224 (the assertion)? Is the assertion also raised if you run the script with a number > 1?

comment:27 Changed 7 years ago by Albert Brandl

BTW: When I run the benchmark on the trunk, I don't encounter any problems. This version uses the "old" NetstringReceiver from the trunk, so the numbers are rather large.


brandl@home:MyTwistedTrunk$ svn info Pfad: . URL: svn://svn.twistedmatrix.com/svn/Twisted/trunk Basis des Projektarchivs: svn://svn.twistedmatrix.com/svn/Twisted UUID des Projektarchivs: bbbe8e31-12d6-0310-92fd-ac37d47ddeeb Revision: 29646 Knotentyp: Verzeichnis Plan: normal Letzter Autor: exarkun Letzte geänderte Rev: 29597 Letztes Änderungsdatum: 2010-07-05 17:51:39 +0200 (Mon, 05. Jul 2010)

brandl@home:MyTwistedTrunk$ time python doc/core/benchmarks/netstringreceiver.py 12 foo Disabled Garbage Collector. The report was written to foo.

real 0m12.026s user 0m4.228s sys 0m7.784s

brandl@home:MyTwistedTrunk$ cat foo +------------+------------------+------------+-----------------+ | Chunk size | Number of chunks | Total size | Time to receive | +------------+------------------+------------+-----------------+ | 1 | 1 | 1 | 0.0001 | | 2 | 1 | 2 | 0.0000 | | 2 | 2 | 4 | 0.0000 | | 4 | 1 | 4 | 0.0000 | | 4 | 2 | 8 | 0.0000 | | 4 | 4 | 16 | 0.0001 | | 8 | 1 | 8 | 0.0000 | | 8 | 2 | 16 | 0.0000 | | 8 | 4 | 32 | 0.0001 | | 8 | 8 | 64 | 0.0001 | | 16 | 1 | 16 | 0.0000 | | 16 | 2 | 32 | 0.0000 | | 16 | 4 | 64 | 0.0000 | | 16 | 8 | 128 | 0.0001 | | 16 | 16 | 256 | 0.0001 | | 32 | 1 | 32 | 0.0000 | | 32 | 2 | 64 | 0.0000 | | 32 | 4 | 128 | 0.0000 | | 32 | 8 | 256 | 0.0001 | | 32 | 16 | 512 | 0.0001 | | 32 | 32 | 1024 | 0.0002 | | 64 | 1 | 64 | 0.0000 | | 64 | 2 | 128 | 0.0000 | | 64 | 4 | 256 | 0.0001 | | 64 | 8 | 512 | 0.0001 | | 64 | 16 | 1024 | 0.0002 | | 64 | 32 | 2048 | 0.0002 | | 64 | 64 | 4096 | 0.0004 | | 128 | 1 | 128 | 0.0000 | | 128 | 2 | 256 | 0.0000 | | 128 | 4 | 512 | 0.0001 | | 128 | 8 | 1024 | 0.0001 | | 128 | 16 | 2048 | 0.0001 | | 128 | 32 | 4096 | 0.0002 | | 128 | 64 | 8192 | 0.0005 | | 128 | 128 | 16384 | 0.0011 | | 256 | 1 | 256 | 0.0000 | | 256 | 2 | 512 | 0.0000 | | 256 | 4 | 1024 | 0.0001 | | 256 | 8 | 2048 | 0.0001 | | 256 | 16 | 4096 | 0.0001 | | 256 | 32 | 8192 | 0.0003 | | 256 | 64 | 16384 | 0.0006 | | 256 | 128 | 32768 | 0.0016 | | 256 | 256 | 65536 | 0.0041 | | 512 | 1 | 512 | 0.0000 | | 512 | 2 | 1024 | 0.0000 | | 512 | 4 | 2048 | 0.0001 | | 512 | 8 | 4096 | 0.0001 | | 512 | 16 | 8192 | 0.0001 | | 512 | 32 | 16384 | 0.0003 | | 512 | 64 | 32768 | 0.0007 | | 512 | 128 | 65536 | 0.0019 | | 512 | 256 | 131072 | 0.0082 | | 512 | 512 | 262144 | 0.0738 | | 1024 | 1 | 1024 | 0.0000 | | 1024 | 2 | 2048 | 0.0000 | | 1024 | 4 | 4096 | 0.0001 | | 1024 | 8 | 8192 | 0.0001 | | 1024 | 16 | 16384 | 0.0002 | | 1024 | 32 | 32768 | 0.0004 | | 1024 | 64 | 65536 | 0.0013 | | 1024 | 128 | 131072 | 0.0047 | | 1024 | 256 | 262144 | 0.0329 | | 1024 | 512 | 524288 | 0.1866 | | 1024 | 1024 | 1048576 | 0.9501 | | 2048 | 1 | 2048 | 0.0000 | | 2048 | 2 | 4096 | 0.0000 | | 2048 | 4 | 8192 | 0.0001 | | 2048 | 8 | 16384 | 0.0001 | | 2048 | 16 | 32768 | 0.0002 | | 2048 | 32 | 65536 | 0.0008 | | 2048 | 64 | 131072 | 0.0022 | | 2048 | 128 | 262144 | 0.0124 | | 2048 | 256 | 524288 | 0.0851 | | 2048 | 512 | 1048576 | 0.4483 | | 2048 | 1024 | 2097152 | 1.9377 | | 2048 | 2048 | 4194304 | 7.9230 | +------------+------------------+------------+-----------------+

comment:28 Changed 7 years ago by Albert Brandl

AARGL - this looked _much_ better in the editor window :-(. Could someone with editing rights please add some braces?

comment:29 Changed 7 years ago by Jean-Paul Calderone

trac comments are immutable :/

comment:30 Changed 7 years ago by Albert Brandl

OK, here is the "quoted" output.

brandl@home:MyTwistedTrunk$ svn info
 Pfad: .
 URL: svn://svn.twistedmatrix.com/svn/Twisted/trunk
 Basis des Projektarchivs: svn://svn.twistedmatrix.com/svn/Twisted
 UUID des Projektarchivs: bbbe8e31-12d6-0310-92fd-ac37d47ddeeb
 Revision: 29646
 Knotentyp: Verzeichnis
 Plan: normal
 Letzter Autor: exarkun
 Letzte geänderte Rev: 29597
 Letztes Änderungsdatum: 2010-07-05 17:51:39 +0200 (Mon, 05. Jul 2010)

 brandl@home:MyTwistedTrunk$ time python
 doc/core/benchmarks/netstringreceiver.py 12 foo
 Disabled Garbage Collector.
 The report was written to foo.

 real    0m12.026s
 user    0m4.228s
 sys     0m7.784s

 brandl@home:MyTwistedTrunk$ cat foo
 +------------+------------------+------------+-----------------+
 | Chunk size | Number of chunks | Total size | Time to receive |
 +------------+------------------+------------+-----------------+
 |          1 |                1 |          1 |          0.0001 |
 |          2 |                1 |          2 |          0.0000 |
 |          2 |                2 |          4 |          0.0000 |
 |          4 |                1 |          4 |          0.0000 |
 |          4 |                2 |          8 |          0.0000 |
 |          4 |                4 |         16 |          0.0001 |
 |          8 |                1 |          8 |          0.0000 |
 |          8 |                2 |         16 |          0.0000 |
 |          8 |                4 |         32 |          0.0001 |
 |          8 |                8 |         64 |          0.0001 |
 |         16 |                1 |         16 |          0.0000 |
 |         16 |                2 |         32 |          0.0000 |
 |         16 |                4 |         64 |          0.0000 |
 |         16 |                8 |        128 |          0.0001 |
 |         16 |               16 |        256 |          0.0001 |
 |         32 |                1 |         32 |          0.0000 |
 |         32 |                2 |         64 |          0.0000 |
 |         32 |                4 |        128 |          0.0000 |
 |         32 |                8 |        256 |          0.0001 |
 |         32 |               16 |        512 |          0.0001 |
 |         32 |               32 |       1024 |          0.0002 |
 |         64 |                1 |         64 |          0.0000 |
 |         64 |                2 |        128 |          0.0000 |
 |         64 |                4 |        256 |          0.0001 |
 |         64 |                8 |        512 |          0.0001 |
 |         64 |               16 |       1024 |          0.0002 |
 |         64 |               32 |       2048 |          0.0002 |
 |         64 |               64 |       4096 |          0.0004 |
 |        128 |                1 |        128 |          0.0000 |
 |        128 |                2 |        256 |          0.0000 |
 |        128 |                4 |        512 |          0.0001 |
 |        128 |                8 |       1024 |          0.0001 |
 |        128 |               16 |       2048 |          0.0001 |
 |        128 |               32 |       4096 |          0.0002 |
 |        128 |               64 |       8192 |          0.0005 |
 |        128 |              128 |      16384 |          0.0011 |
 |        256 |                1 |        256 |          0.0000 |
 |        256 |                2 |        512 |          0.0000 |
 |        256 |                4 |       1024 |          0.0001 |
 |        256 |                8 |       2048 |          0.0001 |
 |        256 |               16 |       4096 |          0.0001 |
 |        256 |               32 |       8192 |          0.0003 |
 |        256 |               64 |      16384 |          0.0006 |
 |        256 |              128 |      32768 |          0.0016 |
 |        256 |              256 |      65536 |          0.0041 |
 |        512 |                1 |        512 |          0.0000 |
 |        512 |                2 |       1024 |          0.0000 |
 |        512 |                4 |       2048 |          0.0001 |
 |        512 |                8 |       4096 |          0.0001 |
 |        512 |               16 |       8192 |          0.0001 |
 |        512 |               32 |      16384 |          0.0003 |
 |        512 |               64 |      32768 |          0.0007 |
 |        512 |              128 |      65536 |          0.0019 |
 |        512 |              256 |     131072 |          0.0082 |
 |        512 |              512 |     262144 |          0.0738 |
 |       1024 |                1 |       1024 |          0.0000 |
 |       1024 |                2 |       2048 |          0.0000
 |       1024 |                4 |       4096 |          0.0001 |
 |       1024 |                8 |       8192 |          0.0001 |
 |       1024 |               16 |      16384 |          0.0002 |
 |       1024 |               32 |      32768 |          0.0004 |
 |       1024 |               64 |      65536 |          0.0013 |
 |       1024 |              128 |     131072 |          0.0047 |
 |       1024 |              256 |     262144 |          0.0329 |
 |       1024 |              512 |     524288 |          0.1866 |
 |       1024 |             1024 |    1048576 |          0.9501 |
 |       2048 |                1 |       2048 |          0.0000 |
 |       2048 |                2 |       4096 |          0.0000 |
 |       2048 |                4 |       8192 |          0.0001 |
 |       2048 |                8 |      16384 |          0.0001 |
 |       2048 |               16 |      32768 |          0.0002 |
 |       2048 |               32 |      65536 |          0.0008 |
 |       2048 |               64 |     131072 |          0.0022 |
 |       2048 |              128 |     262144 |          0.0124 |
 |       2048 |              256 |     524288 |          0.0851 |
 |       2048 |              512 |    1048576 |          0.4483 |
 |       2048 |             1024 |    2097152 |          1.9377 |
 |       2048 |             2048 |    4194304 |          7.9230 |
 +------------+------------------+------------+-----------------+


Looks a bit more readable ;-).

comment:31 Changed 7 years ago by Jean-Paul Calderone

Owner: changed from Jean-Paul Calderone to Albert Brandl

Indeed. Nice formatting. :)

I applied the latest patch to the branch. If I run the benchmark against trunk, then I see results similar to the above. I only get the error when I try running the benchmark in the branch. Sorry I didn't specify that before; even though the paths in the traceback look like trunk, the source being used had the branch merged into it.

I traced through the code a little bit and this appears to be caused by an error in MAX_LENGTH handling. _processLength sets _expectedLength to the length prefix received from the network plus 1. The plus 1 is correct since it accounts for the comma, but it means that if the length prefix received from the network is exactly the MAX_LENGTH, then _expectedLength will exceed MAX_LENGTH and the message will be rejected by _consumeLength. Easily fixed by adding a +1 or a -1 to the right place. Needs a test, though.

One other thing I noticed while looking through the code this time, changes to MAX_LENGTH won't be reflected in _MAX_LENGTH_SIZE, so it's no longer actually possible to raise MAX_LENGTH. Maybe _MAX_LENGTH_SIZE should be a property?

Thanks.

Changed 7 years ago by Albert Brandl

Attachment: ticket_4378.6.patch added

Fixed border case for _MAX_LENGTH; improved readability, documentation and unit tests

comment:32 Changed 7 years ago by Albert Brandl

Owner: changed from Albert Brandl to Jean-Paul Calderone

Thanks for finding and describing this bug. I've added some unit tests for _consumeLength that handle this border case.

I have also renamed some of the instance variables. self._currentData was renamed to self._remainingData, since it holds the portion of the data that have not yet been processed. self._currentDataSize was renamed accordingly.

Calculating _MAX_LENGTH_SIZE on demand is a good idea. I introduced a method for this purpose (this is an old-style class, so I could not use a property). The method configureCombination in the benchmark was adapted accordingly.

The code for extracting the length from the match object was quite ugly. I simplified the regular expression to '(0|[1-9]\d*)(:)', so that there are only two groups that are always non-empty. The method _processLength is much simpler now, because I don't have to check if a group is empty before setting the variables endOfNumber and startOfData.

The definition of _LENGTH_PREFIX now also checks if the length specification starts with more than one '0' character. This allows to catch erroneous netstrings at the earliest point.

Please check.

comment:33 Changed 7 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

comment:34 Changed 7 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to Jean-Paul Calderone
  1. The length check at the end of _consumeLength appears to be redundant. _consumeLength calls _processLength calls _extractLength. _extractLength does a length check and raises NetstringParseError if it fails, so _consumeLength's check can never fail. A coverage report confirms that the test suite never manages to execute the raise statement in _consumeLength. So I just deleted the check (r29778)
  2. A couple tests have upper case letters where they should have lower case letters. Fixed that in (r29779).

The rest looks good to me. If I get a clean build, I'll merge. Thanks a lot!

comment:35 Changed 7 years ago by Jean-Paul Calderone

Author: brandlexarkun, brandl
Branch: branches/netstring-errors-4378branches/netstring-errors-4378-2

(In [29780]) Branching to 'netstring-errors-4378-2'

comment:36 Changed 7 years ago by Jean-Paul Calderone

Make that clean build.

comment:37 Changed 7 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [29782]) Merge netstring-errors-4378-2

Author: brandl Reviewer: exarkun Fixes: #4378

Fix several parsing bugs in NetstringReceiver which would cause it to accept invalid inputs as valid. Additionally, greatly improve the performance of NetstringReceiver by buffering with a cStringIO instead of a str.

Also add a benchmark for NetstringReceiver which demonstrates the performance improvement.

comment:38 Changed 6 years ago by <automation>

Owner: Jean-Paul Calderone deleted
Note: See TracTickets for help on using tickets.