Opened 16 years ago

Closed 13 years ago

#1493 enhancement closed fixed (fixed)

static File web module doesn't support byte ranges

Reported by: John-Mark Gurney Owned by:
Priority: normal Milestone:
Component: web Keywords:
Cc: Jean-Paul Calderone, jknight, John-Mark Gurney, brubelsabs Branch: branches/static-file-byte-range-1493-3
branch-diff, diff-cov, branch-cov, buildbot
Author: exarkun

Description


Attachments (5)

twisted.web.static.patch (3.9 KB) - added by John-Mark Gurney 16 years ago.
twisted.web.static.diff (7.0 KB) - added by brubelsabs 14 years ago.
Simple Range-Header support with a simple unit-test.
twisted.web.static.2.diff (13.1 KB) - added by brubelsabs 14 years ago.
Due to the review from jknight, some improvments of the previous patch
twisted.web.static.3.diff (16.8 KB) - added by brubelsabs 13 years ago.
beautified docstrings and testnames as well 2 blanks between methods, and new MalformedHeader Exception derived from Error
twisted.web.static.4.diff (24.4 KB) - added by brubelsabs 13 years ago.
improvements for the suggestions from exarkun

Download all attachments as: .zip

Change History (50)

Changed 16 years ago by John-Mark Gurney

Attachment: twisted.web.static.patch added

comment:1 Changed 16 years ago by John-Mark Gurney

The File class in web/static.py doesn't support byte servering.  The code is
commented out because it is broken.  Attached is a patch that in my testing
fixes it (at least the basic continue an aborted download).

comment:2 Changed 16 years ago by John-Mark Gurney

oh, I forgot to say the patch is against Web 0.5.0 that was part of
TwistedSumo-2005-11-06.

comment:3 Changed 16 years ago by Jean-Paul Calderone

This isn't likely to get applied without a unit test, and since most development
is going towards twisted.web2, a unit test isn't likely to get written unless
you write it.

comment:4 Changed 16 years ago by jknight

Exarkun is correct.

Also, priority -> feature.

comment:5 Changed 14 years ago by therve

#2986 was closed as a duplicate of this one.

Changed 14 years ago by brubelsabs

Attachment: twisted.web.static.diff added

Simple Range-Header support with a simple unit-test.

comment:6 Changed 14 years ago by brubelsabs

Cc: brubelsabs added

I reused the patch above and added a simple unit test the result is in the attachment

Basically, I think the position of some of this code (start stop vars) belongs more to the FilePath class than to the File class, because in the resumeProducing method the real data is read an written. Additionally the request object is passed to this class too.

comment:7 Changed 14 years ago by brubelsabs

Keywords: review added

comment:8 Changed 14 years ago by jknight

Keywords: review removed

You haven't fixed at least two of the serious issues in the original code:

The parsing is fragile (and then protected via a catch-all except, very bogus), and the range returned is wrong (off-by-one on the endpoint).

Changed 14 years ago by brubelsabs

Attachment: twisted.web.static.2.diff added

Due to the review from jknight, some improvments of the previous patch

comment:9 Changed 14 years ago by brubelsabs

Keywords: review added

Thanks jknight! I hope this one is more done the right way. I moved the chunky test from test_web to test_static, where are already some basic test (which I should have seen earlier ;)). I (hopefully) fixed the off-by-one error, and the except clause catching invalid byte-range-specs is just generic to ValueErrors (handling non integer ranges like "bytes=seom-abc"). So I set it back to review..

comment:10 Changed 13 years ago by Glyph

Keywords: review removed
Owner: jknight deleted

Thanks for the patch!

  • SyntaxError is a scary exception class to use for this; it really means that some Python syntax was invalid. It should probably be a custom exception that specifically deals with this case.
  • The failure is logged (good) but there's no test for this logging (not good). You can do this by adding a log observer for the duration of your test. Also, the repr()'d invalid range header (or at least a part of it) should always be included in the log message so that we can see if there is a parsing case we've missed.

The twisted.web code is kind of old, and you followed it consistently which I appreciate, but our coding-standard preferences have shifted since it was originally written:

  • Please put 2 blank lines between methods
  • Test methods should be named test_fooBar not testFooBar.
  • docstring quotes should be on a line by themselves, i.e.
    """
    Like this
    """
    
    not
    """Like this"""
    

Trac doesn't allow me to assign this to you for some reason so I'm reassigning to nobody...

comment:11 Changed 13 years ago by therve

Owner: set to brubelsabs

Changed 13 years ago by brubelsabs

Attachment: twisted.web.static.3.diff added

beautified docstrings and testnames as well 2 blanks between methods, and new MalformedHeader Exception derived from Error

comment:12 Changed 13 years ago by brubelsabs

Keywords: review added

Hmm, though I've to improve again and again, I see this patch gets better and better :-) So set it back to review.

comment:13 Changed 13 years ago by Jean-Paul Calderone

Keywords: review removed

Cool, thanks! :)

Here's some more feedback:

  • in test_web.py, FakeRequest.getHeader has no docstring - could you add one?
  • FakeRequest doesn't document any of its instance attributes. It should document all of them. Could you add documentation for at least the new ones you're adding?
  • in test_static.py, Range should have a class docstring, and probably be named RangeTests to indicate it's a test case.
  • the payload assignment line is wider than 79 cols, it should be split up (you can do this with some parens and string literal concatenation, eg
    foo = ('foo'
           'bar')
    
  • Instead of tempfile.NamedTemporaryFile(), you may want to use file(self.mktemp()). mktemp creates files beneath _trial_temp where it's easy to clean them up, or leave them behind for inspection after the test run is done.
  • You might find self.addCleanup nicer than tearDown. eg, after opening the temp file, self.addCleanup(self.file.close) and after adding the log observer, self.addCleanup(log.removeObserver, self.catcher.append).
  • Also, there's some trailing whitespace in the setUp method
  • _assert_logged should be _assertLogged (or it could be public - there's no API stability guarantees in the test suite)
  • test methods should be test_foo rather than test_Foo
  • test methods should all have docstrings explaining the case they're testing, for example, test_BodyLength might have a docstring like this:
    """
    A correctly formatted range request consisting of a single range beginning at 0 and ending at a length less than the length of the page is responded to with the substring of the render results defined by that range.
    """
      * "... is correct ..." doesn't need to be in any test docstrings, since it is implied that the test is for correct behavior. ;)
      * There's a bit more trailing whitespace in some of the tests
      * in `error.py`,
        * thanks for adding the docstring to `Error.__init__` :)
        * Can you give `MalformedHeader` a class docstring?
        * there should be direct tests for the cases which raise `MalformedHeader` - have some unit tests that call `File._handleRangeRequest` directly and assert it raises the right exception with the right code
      * in `static.py,
        * `_handleRangeRequest` is a bit long:
          * the `ValueError` it handles covers a lot of code - which failure is this trying to catch?  I guess it's covering all the `int` calls?  Is there a unit test for each of these? (I'm being a bit lazy since I need to go eat - I should be checking to find the answer to that :P)
    
    Thanks for your work on this!  I look forward to the next version of the patch. (when you add the review keyword, re-assign the ticket to nobody so everyone recognizes that they could review it)
    
    

Changed 13 years ago by brubelsabs

Attachment: twisted.web.static.4.diff added

improvements for the suggestions from exarkun

comment:14 Changed 13 years ago by brubelsabs

Keywords: review added
Owner: brubelsabs deleted

comment:15 Changed 13 years ago by Jean-Paul Calderone

author: exarkun
Branch: branches/static-file-byte-range-1493

(In [23686]) Branching to 'static-file-byte-range-1493'

comment:16 Changed 13 years ago by Jean-Paul Calderone

(In [23687]) Apply twisted.web.static.4.diff

refs #1493

comment:17 Changed 13 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to brubelsabs

I applied the latest patch to the branch referenced above. I made some docstring adjustments and other minor changes.

  • twisted/web/test/test_static.py
    • Where did that payload come from? Is it just random bytes?
    • test_statusCodeRequestedRangeNotSatisfiable and test_invalidStartBytePos don't seem to be correct based on my reading of the RFC. I think they are testing syntactically invalid ranges and so should be returning 200, not 416.
    • is test_multipleRanges just demonstrating an implementation limitation? Presumably some future enhancement can support responding to multiple ranges?
  • twisted/web/static.py
    • variables should be named camelCase instead of with_underscores.
    • It'd probably make sense to do all of the parsing in a separate function from _doRangeRequest.

comment:18 Changed 13 years ago by brubelsabs

Sorry for beeing so long absent :-/, here the answers as far as I can remember...

  • twisted/web/test/test_static.py
    • Yes the payload is just randomly choosen
    • Yes again, you are right. The RFC 2616 just says, if the current extend is overlapped with the bytes ranges, nothing about invalid byte ranges e.g. 20-13, but if bytes:1023-1024 is requested then if the ressource has not so much bytes, only then 416 should be returned. AFAIU
    • Yes the multipleRanges thing demonstrates an implementation limitation. I though a while over the problem, how one could implement multiple byte ranges, but I think then the class FileTransfer cannot be used that easy way as it is done in the moment. I think this is really hard to implement.
  • twisted/web/static.py
    • Sorry for violating still the coding standards, I hope I can do any further updates
    • Yes _doRangeRequest is grown very big, I remember on some coding styles which told: that a method should fill at maximum one screen ;), and this is near the limit, so you right again ;)

ATM I really struggle on my work here, so I've little time right now. I would really appreaciate if some one could do the "rest". If not it will take some time for me to carry on, but I won't let the work be for nothing. (oh my english is.. well you know already ;)) Exarkun, thanks for your review! Maybe next month, I've a little time to do any further work.

comment:19 Changed 13 years ago by Jean-Paul Calderone

Owner: changed from brubelsabs to Jean-Paul Calderone
Status: newassigned

Thanks for your work on this so far brubelsabs, this ticket would be in much worse shape without your help. :) I hope things at work get smoother for you soon.

comment:20 Changed 13 years ago by Jean-Paul Calderone

Branch: branches/static-file-byte-range-1493branches/static-file-byte-range-1493-2

(In [24198]) Branching to 'static-file-byte-range-1493-2'

comment:21 Changed 13 years ago by Jean-Paul Calderone

(In [24200]) comment about this pile of randomness

refs #1493

comment:22 Changed 13 years ago by Jean-Paul Calderone

(In [24201]) change behavior required by test_statusCodeRequestedRangeNotSatisfiable and test_invalidStartBytePos

these are both examples of syntactically invalid byte ranges and should be treated as any other syntactically invalid range header.

refs #1493

comment:23 Changed 13 years ago by Jean-Paul Calderone

(In [24202]) note that really, multiple ranges *should* be supported; this test is just to make sure the single range behavior works when multiple ranges are present in the request.

refs #1493

comment:24 Changed 13 years ago by Jean-Paul Calderone

(In [24204]) simplify the test suite by directly testing parsing logic; remove some redundant tests; refactor the parsing logic into a separate method; remove some other unused stuff

refs #1493

comment:25 Changed 13 years ago by Jean-Paul Calderone

Branch: branches/static-file-byte-range-1493-2branches/static-file-byte-range-1493-3

(In [24205]) Branching to 'static-file-byte-range-1493-3'

comment:26 Changed 13 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted
Status: assignednew

Alright, ready for someone else to take a look.

comment:27 Changed 13 years ago by jknight

Keywords: review removed
Owner: set to Jean-Paul Calderone

Is the comment you modified in test_multipleRanges ("") even correct at all? Where does the RFC say it's acceptable to return a single range when multiple are asked for? It says you can totally ignore the Range header, but I don't see where it says you can ignore parts of it.

test_invalidStartBytePos is testing completely wrong behavior. Firstly, a range with start greater than resource length isn't syntactically invalid, secondly the RFC is clear about what you're supposed to do with an unsatisfiable range request (return a 416 response).

The _parseRangeHeader doesn't follow HTTP RFC parsing rules w.r.t. whitespace. After every split, you need .strip. Or, use a generic tokenizer.

It seems a little silly to have a 3-line log entry whenever someone sends a random bad http header.

comment:28 Changed 13 years ago by Jean-Paul Calderone

(In [24209]) send 416 when a range starts after the resource ends, fixing test_invalidStartBytePos

refs #1493

comment:29 Changed 13 years ago by Jean-Paul Calderone

(In [24210]) just give up on multiple byte ranges instead of pretending there was juist one

refs #1493

comment:30 Changed 13 years ago by Jean-Paul Calderone

(In [24211]) a test for ranges with whitespace (which passes without changes to the implementation, although only by accident)

refs #1493

comment:31 Changed 13 years ago by Jean-Paul Calderone

(In [24212]) add a test for null elements in the bytes range and make it pass by ignoring all null elements

refs #1493

comment:32 Changed 13 years ago by Jean-Paul Calderone

(In [24213]) shorten the log message for ignored range headers to 1 line

refs #1493

comment:33 Changed 13 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Thanks for the review!

I think I addressed everything. 'test_multipleRanges doesn't exist anymore (unless I've managed to confuse myself), so I interpreted your comment about that function as meaning that I should get rid of test_multiRange and add that case to test_invalidRanges` or that I should implement real multiple range support (I did the former).

comment:34 Changed 13 years ago by Jean-Paul Calderone

(In [24215]) Add a couple more whitespace-related range parsing tests

<foom> exarkun: self.assertEqual(self.resource._parseRangeHeader(' bytes = 1 - 2 '), (1, 2))

refs #1493

comment:35 Changed 13 years ago by ghazel

Looking at static-file-byte-range-1493-3:

File._parseRangeHeader's docstring says "Return a three-tuple of the type, start, and stop value" but it only returns (start, end).

comment:36 Changed 13 years ago by Jean-Paul Calderone

(In [24239]) Correct docstring for _parseTangeHeader

refs #1493

comment:37 Changed 13 years ago by therve

Keywords: review removed
Owner: set to Jean-Paul Calderone

DummyRequest has a 2 getHeader methods

+        range = request.getHeader('range')
+        if range is not None:

range is a builtin, another name would be better.

+        self.assert_(len(self.catcher) == 0,

It should use assertEquals

+        content_length = stop = self.getFileSize()

Please rename it contentLenght.

As all the points are minor, please merge it when corrected. Thanks!

comment:38 Changed 13 years ago by Jean-Paul Calderone

(In [24315]) address review comments

refs #1493

comment:39 Changed 13 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [24318]) Merge static-file-byte-range-1493-3

Author: brubelsabs, exarkun Reviewer: exarkun, jknight, ghazel, therve Fixes: #1493

Add support for simple Range header requests to twisted.web.static.File. Single byte-range requests will now be honored by that resource. Other range requests will be ignored, as before.

comment:40 Changed 13 years ago by Jean-Paul Calderone

Resolution: fixed
Status: closedreopened

(In [24322]) Revert r24318 - test suite regression on Windows

A number of tests from twisted.web.test.test_static.RangeTests fail on Windows with this revision.

Reopens #1493

comment:41 Changed 13 years ago by Jean-Paul Calderone

(In [24323]) make sure the file gets closed before we try to look at what's in it, because Windows has very weak guarantees about when you'll actually be able to see your data

refs #1493

comment:42 Changed 13 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted
Status: reopenednew

comment:43 Changed 13 years ago by Colin Alston

Keywords: review removed
Owner: set to Jean-Paul Calderone

Works for me. Tested with wget and FireFox 3.0.1.

It does raise a question about how precise logging will be with this change though. Unless the range start is deducted from the file size for CLF or a combined log format, I don't see how any of Apache's logging standards are accurate...

comment:44 Changed 13 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [24361]) Merge static-file-byte-range-1493-3

Author: brubelsabs, exarkun Reviewer: exarkun, jknight, ghazel, therve, colin Fixes: #1493

Add support for simple Range header requests to twisted.web.static.File. Single byte-range requests will now be honored by that resource. Other range requests will be ignored, as before.

This re-merge changes the handling of files in the unit tests so that it is correct on Windows, where data does necessarily show up in files after it has been written but before the file has been closed.

comment:45 Changed 11 years ago by <automation>

Owner: Jean-Paul Calderone deleted
Note: See TracTickets for help on using tickets.