Opened 11 years ago

Last modified 12 months ago

#585 enhancement new

Include sendfile(2) support into defaultreactor

Reported by: PenguinOfDoom Owned by: therve
Priority: normal Milestone:
Component: core Keywords:
Cc: radix, exarkun, spiv, itamarst, PenguinOfDoom, therve, tobias.oberstein@… Branch: branches/sendfile-585-11
(diff, github, buildbot, log)
Author: therve Launchpad Bug:

Description


Change History (61)

comment:1 Changed 11 years ago by PenguinOfDoom

I've done a quick hack to this effect. It seems that sendfile(2)-enhanced
twisted.web is significantly closer to Apache speeds on large (5MB) files.
Benchmarks will be posted once somebody figures out how to properly measure the
speed.

comment:2 Changed 11 years ago by glyph

can you upload a patch, even if it's a nasty hack?

comment:3 Changed 10 years ago by spiv

Where's the patch?

comment:4 Changed 8 years ago by exarkun

Yea where's the patch?

comment:5 Changed 8 years ago by therve

  • Cc therve added

comment:6 Changed 8 years ago by therve

  • Owner changed from PenguinOfDoom to therve

comment:7 Changed 8 years ago by therve

For people interested, I've try to update PenguinOfDoom's work in sendfile-585, and it seems to work fine.

comment:8 Changed 8 years ago by jknight

Not having looked at the patch yet, some test suggestions:

1) If the sendfile syscall is present, but fails, the code deals sanely (fallback to read/send, I suppose, or else some kind of notification?)

1b) what if it fails because the file is unreadable or has an I/O error of some kind?

2) SSL sockets don't support sendfile

3) it works, or at least, doesn't blow up on *BSD (which has a sendfile, with a different signature)

comment:9 Changed 8 years ago by exarkun

From a quick skim of the branch, a couple thoughts:

  • there should be a new interface including sendfile (if having a sendfile method is the right way to expose this). hasattr/getattr calls like the one in static.py result in bugs
  • fallback should be provided if the syscall is missing too (in addition to the case where it's present but fails), so code doesn't end up with checks littered all over.

comment:10 Changed 8 years ago by therve

Replying to jknight:

Not having looked at the patch yet, some test suggestions:

Thanks.

1) If the sendfile syscall is present, but fails, the code deals sanely (fallback to read/send, I suppose, or else some kind of notification?)

For now, that closes the connection.

1b) what if it fails because the file is unreadable or has an I/O error of some kind?

Same as above.

I'm interested to know what behavior we could choose.

2) SSL sockets don't support sendfile

Yes that's disabled, even thought I'm not really happy about the mecanism (value of the sendfile attribute)

3) it works, or at least, doesn't blow up on *BSD (which has a sendfile, with a different signature)

I didn't do that yet, but it's interesting.

comment:11 Changed 7 years ago by therve

  • author set to therve
  • Branch set to branches/sendfile-585-2

(In [22235]) Branching to 'sendfile-585-2'

comment:12 Changed 7 years ago by therve

(In [22239]) Docs and cleanups.

Refs #585

comment:13 Changed 7 years ago by therve

  • Branch changed from branches/sendfile-585-2 to branches/sendfile-585-3

(In [23312]) Branching to 'sendfile-585-3'

comment:14 Changed 7 years ago by exarkun

  • Component changed from web to core

This is cool. I am going to call it a core ticket, though, since it's about extending the reactor interface.

comment:15 Changed 5 years ago by therve

  • Branch changed from branches/sendfile-585-3 to branches/sendfile-585-4

(In [28626]) Branching to 'sendfile-585-4'

comment:16 Changed 4 years ago by <automation>

  • Owner therve deleted

comment:17 Changed 3 years ago by therve

  • Branch changed from branches/sendfile-585-4 to branches/sendfile-585-5

(In [34562]) Branching to 'sendfile-585-5'

comment:18 Changed 3 years ago by therve

  • Owner set to therve

comment:19 Changed 2 years ago by therve

  • Keywords review added
  • Owner therve deleted

I believe this is ready to review. The tests are meant to be relatively generic, and could be integrated later on with ITCPTransport tests. The main thing I don't like are the FileDescriptor changes, but I don't see a way around it.

Here are the test builds:

http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/sendfile-585-5

comment:20 Changed 2 years ago by exarkun

  • Keywords review removed
  • Owner set to therve

Thanks. Interesting stuff. Some thoughts:

  1. write, writeSequence, sendfile. One of these things is not like the others...
  2. How about just depending on http://code.google.com/p/pysendfile/ instead of shipping a new extension module?
  3. twisted/internet/_sendfile.py, in SendfileInfo.__init__, 0 is rejected as a file descriptor, but it's a completely valid file descriptor.
  4. The cork changes look somewhat unrelated? Maybe that's good for a separate ticket (although I think cork is for C programmers where buffering in user space is hard... but I could be missing something). The tests should definitely be in twisted/internet/test/, in any case.
  5. Perhaps the interface doesn't need to specify sendfile(2). For example, Windows has TransmitFile.
  6. There's some missing test coverage in abstract.FileDescriptor.doWrite. It's not exactly at the points where you made changes, but it's close enough that it worries me.
    1. Also, there's no reason to support all of the insane return values that writeSomeData is allowed to have for the new doSendfile method. It should return something much simpler instead.
  7. I want to suggest something about the factoring the fallback code in sendfile.
  8. SendfileProducer is a lot like FileSender with an extra feature. Perhaps there's an opportunity for some code re-use here. Also, SendfileProducer has no direct unit tests.
  9. What if doSendfile were factored into another producer implementation too, instead of being on a FileDescriptor subclass?

comment:21 Changed 2 years ago by therve

  • Keywords review added
  • Owner therve deleted

Thanks for the review! I think it helped fixing some major inconsistencies and cleaning it up.

  1. Renamed to writeFile
  1. I have several reasons not to like it. Embedding it works out of the box. There is only one release of that project, so I don't want to end up like our old kqueue support. There are several sendfile libraries out there, so it may be hard to find the proper one.
  1. Fixed.
  1. Removed and opened #5782.
  1. Removed the mention. I'll open a bug about Windows if you care.
  1. I removed a line. with 6.1 refactoring the sendfile section quite a bit.

7.1 I removed SendfileProducer and the extra feature that I didn't use nor test.

7.2 I moved it under the SendfileInfo class.

comment:22 Changed 2 years ago by therve

  • Summary changed from Include sendfile(2) support into defaultreactor and make twisted.web use it to Include sendfile(2) support into defaultreactor

comment:23 Changed 2 years ago by therve

  • Branch changed from branches/sendfile-585-5 to branches/sendfile-585-6

(In [35063]) Branching to 'sendfile-585-6'

comment:24 Changed 2 years ago by rodrigc

I am new to Twisted, and am participating in the Python sprint in San Francisco,
on August 19, 2012.

sendfile() has different implementations on different platforms.
The code you wrote uses the 4-argument version of sendfile() which is on Linux.
You should make it clear that this implementation is Linux-only.

For your reference, here are the man pages for sendfile on Linux, MacOS X, and FreeBSD.

Linux, sendfile has 4 arguments
http://man7.org/linux/man-pages/man2/sendfile.2.html

MacOS X, sendfile has 6 arguments
https://developer.apple.com/library/mac/#documentation/Darwin/Reference/Manpages/man2/sendfile.2.html

FreeBSD sendfile(2), sendfile has 7 arguments
http://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2

In addition, on Windows, there is a TransmitFile
http://msdn.microsoft.com/en-us/library/windows/desktop/ms740565%28v=vs.85%29.aspx

If you want to look at a sendfile() compatibility layer, you can take
a look at the ACE C++ library which has some code for sendfile() on Linux
and Windows:

https://svn.dre.vanderbilt.edu/viewvc/Middleware/trunk/ACE/ace/OS_NS_sys_sendfile.inl
https://svn.dre.vanderbilt.edu/viewvc/Middleware/trunk/ACE/ace/OS_NS_sys_sendfile.cpp
https://svn.dre.vanderbilt.edu/viewvc/Middleware/trunk/ACE/ace/OS_NS_sys_sendfile.h

That implementation has code which falls back to using mmap(). You may not want
to do that, but at least it gives you an idea.

If you add comments to your code to indicate that the Linux sendfile() is being
used, then we can file other tickets to get this code to work on
MacOS X, FreeBSD, and Windows, later on.

Other than that, I think we should just commit your code to Twisted, after
getting approval from the other Twisted reviewers.

comment:25 Changed 2 years ago by therve

The branch already implements the sendfile compatibility between Linux, OS X and FreeBSD. We can possibly implement the Windows optimization later on.

comment:26 Changed 2 years ago by dreid

  • Owner set to dreid
  • Status changed from new to assigned

comment:27 Changed 2 years ago by dreid

  • Keywords review removed
  • Owner changed from dreid to therve
  • Status changed from assigned to new

Thanks for working on this. I'm probably not the best qualified to review it but you can blame glyph for that.

  • twisted/internet/abstract.py does not merge cleanly.
  • DragonFly is unsupported. There is no easy way to determine if the DragonFly checks are correct and the Extension condition doesn't even check for sys.platform == 'dragonfly' so this will never even get built on DragonFly.
  • _SendfileMarker = Exception() - Wat.
  • writeFile just gets added to ITCPTransport. You can't change interfaces. You can only add new interfaces. ISendFileTransport perhaps.
  • writeFile doesn't document it's argument or return value.
  • SendfileIntegrationMixin's docstring references C{writeFile} this should be an L{} to writeFile instead.

comment:28 Changed 2 years ago by therve

  • Branch changed from branches/sendfile-585-6 to branches/sendfile-585-7

(In [35787]) Branching to 'sendfile-585-7'

comment:29 Changed 2 years ago by therve

  • Keywords review added
  • Owner therve deleted

Alright, everything handled I think.

comment:30 Changed 2 years ago by rodrigc

I attended the Twisted Sprint in San Francisco on October 20.
As part of my learning process for Twisted, I looked at this ticket.

Since I have access to a FreeBSD 9 machine, I tested the following on FreeBSD 9.
I did the following on my FreeBSD 9 box:

(1) Checked out this branch: svn://svn.twistedmatrix.com/svn/Twisted/branches/sendfile-585-7

(2) Ran: python setup.py build

(3) Ran: python build/scripts-2.7/trial twisted

(4) Output of results are here: http://home.comcast.net/~rodrigc/twisted/trial-sendfile-585-7.freebsd-9.1-PRERELEASE-i386-2.7.txt

I then did the following:

(1) Looked to see when sendfile-585-7 branched from trunk: svn log --stop-on-copy svn://svn.twistedmatrix.com/svn/Twisted/branches/sendfile-585-7

(2) Saw that sendfile-585-7 branched from trunk at GRN 35786

(3) Checked out trunk@35786

(4) Ran: python setup.py build

(5) Ran: python build/scripts-2.7/trial twisted

(6) Results are here: http://home.comcast.net/~rodrigc/twisted/trial-freebsd-9.1-PRERELEASE-i386-2.7_trunk@38786.txt

So basically it looks like on FreeBSD, no new test failures were introduced in the sendfile branch.
Also, the sendfile tests seem to run properly and pass.

This looks good.

comment:31 Changed 2 years ago by rodrigc

To learn about running Twisted under Windows 7, I did the following:

SETUP STEPS

(1) Followed instructions at http://blog.ying.li/2012/07/twisted-on-windows-using-activestate.html

to install ActiveState Python.

(2) Followed instructions from same blog posting, but installed Microsoft Visual C++ 2008 Express Edition.

The compiler version MUST match the compiler version used to
compile the Python interpreter. There is code in Python's distutils package
which needs this, otherwise setup.py will not work when compiling native code.

(3) Downloaded SVN client from http://sliksvn.com

TESTING STEPS

(1) Checked out this branch: svn://svn.twistedmatrix.com/svn/Twisted/branches/sendfile-585-7

(2) Ran: python setup.py build

(3) Ran: python build/scripts-2.7/trial.py twisted

(4) Output of results are here: http://home.comcast.net/~rodrigc/twisted/trial-windows7_sendfile-585-7.txt

I then did the following:

(1) Looked to see when sendfile-585-7 branched from trunk: svn log --stop-on-copy svn://svn.twistedmatrix.com/svn/Twisted/branches/sendfile-585-7

(2) Saw that sendfile-585-7 branched from trunk at GRN 35786

(3) Checked out trunk@35786

(4) Ran: python setup.py build

(5) Ran: python build/scripts-2.7/trial.py twisted

(6) Results are here: http://home.comcast.net/~rodrigc/trial-windows7_trunk@38786.txt

Looks good on Windows. No new regressions introduced.
Some of the sendfile tests on Windows are executed:

twisted.internet.test.test_sendfile

SendfileInfoTestCase

test_invalidFile ... [OK]
test_lengthDetection ... [OK]

I was confused by this at first, because the code in this branch does not
implement the TransmitFile() system call under Windows (which is the Windows equivalent to sendfile()). However,
after looking at the testcase code with Ying and Glyph's help at the Sprint,
I learned that this test only executes the "platform-independent" tests which
do not actually use the sendfile() system call.

This all looks really good, you have done a really good job.

You have gone through a few iterations of code reviews, and I will look at the code next.

This will take me some time though, since I am new to Twisted, and need to
become more familiar with the Twisted coding stanards and code review procedures.

Thanks.


comment:32 Changed 2 years ago by rodrigc

After running the tests on FreeBSD and Windows,
I looked over the changes in branches/sendfile-585-7 . Everything looks quite good. Thanks for doing this work.
I have some review comments, but they fall into the category of "typo",
so nothing major.

Here are my review comments:

twisted/python/test/test_sendfile.py
=====================================
(1) Take the date out of the Copyright, and make it

like other copyright notices in the code base.

(2) In def test_afterSend(self): , change the comment from:

Verify that a send after sendfile behaves correctly.

to:


Verify that a send followed by sendfile behaves correctly.

because in the test you are actually calling send(), then
sendfile().

twisted/internet/test/test_sendfile.py
======================================

(1) In def_testinvalidFile(self): , fix typo:

file, and a C{ValueError} when passwed a closed file.

to:

file, and a C{ValueError} when passed a closed file.

twisted/python/_sendfile.c
==========================

You have this:

#if defined(FreeBSD)
....
#else
#if defined(APPLE)
....
#else
...
#endif
#endif

You should change it to something like:

#if defined(FreeBSD)
...
#elif defined(APPLE)
...
#elif defined(linux)
...
#else
#error "This platform does not support sendfile."
#endif

This makes the conditional a bit easier to read.


comment:33 Changed 2 years ago by exarkun

  • Owner set to therve

I forced a build. It looks like there are perhaps some Windows issues.

comment:34 Changed 2 years ago by exarkun

  • Keywords review removed

comment:35 Changed 2 years ago by therve

  • Branch changed from branches/sendfile-585-7 to branches/sendfile-585-8

(In [36398]) Branching to 'sendfile-585-8'

comment:36 Changed 23 months ago by therve

  • Branch changed from branches/sendfile-585-8 to branches/sendfile-585-9

(In [37102]) Branching to 'sendfile-585-9'

comment:37 Changed 23 months ago by therve

  • Keywords review added
  • Owner therve deleted

Back for another round, with Python3 being happier. Builds here: http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/sendfile-585-9

comment:38 Changed 22 months ago by tom.prince

  1. The test module docstrings could be improved. (One is testing the transport interface and one the extension module.
  2. It appears that this doesn't work on python3.3:
    % python3.3
    Python 3.3.0 (default, Feb 11 2013, 17:58:23) 
    [GCC 4.6.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import twisted.python._sendfile
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: ./twisted/python/_sendfile.cpython-33.so: undefined symbol: Py_InitModule
    
  3. (optional) Consider using
       path = filepath.FilePath(self.mktemp())
       path.setContent(...)
       file = path.open("r")
    
    to create files for testing.
  4. I'm not convinced that SendfileClientProtocol.doneDeferred is going to be reliably called.
    • On further consideration, I'm not sure why AccumulatingProtocol can't be used for these test (and .data checked after the connection is closed).
  5. I'm not clear what test_error and test_errorAtFirstSight are supposed to be testing, and it looks like the later is partly copied from the former:
    client.doneDeferred = doneDeferred = Deferred()
    client.expected = 100000
    
    doesn't seem to be used in the later.
  6. I think it would be useful to have test that sendfile actually sends the right data. This would be more than a unit test, but I'd rather know that a supported platform has a broken sendfile (or that we call it in a broken way) on the buildbot, than later.

I'm going to need some more time to digest FileDescriptor.doWrite, before I can finish this review.

comment:39 Changed 22 months ago by tom.prince

  • Keywords review removed
  • Owner set to therve
  1. twisted.internet.test.test_sendfile.
    1. test_invalidFile
      • This would be better as separate tests (one assert per test, ideally).
      • The specific exceptions that are being checked here look like they are accidental implementation details. It would be better to check for these errors, and then raise something that is meaningful to the calling code.
      • It looks like passing a StringIO will be accepted by the constructor, then fail later.
  2. The use of deferreds with SendfileInfo seems to be spread over at least twisted.internet._sendfile and twisted.internet.tcp
  3. I'm not sure why twisted.internet.tcp.Connection is calling abstract.FileDescriptor._sendfile as opposed to self._sendfile.
  4. I'll reiterate:

    _SendfileMarker = Exception() - Wat.

    object() seems to be idiomatic for this.
    • I guess this is because this.
  5. I'm not sure how to test this (especially reliably), but I think that that if an incomplete write followed by a writeFile is done, but the write ends up being a partial write, that the sendfile will be inserted before the rest of the initial write is comepleted.
  6. The fallback mechanism seems like it will fail if there is already a producer registered with the transport.

A general comment: The control flow here seems fairly subtle and not easy to follow. I'm not certain that I've reviewed, but I think there is enough here to warrant a followup.

comment:40 Changed 22 months ago by therve

  • Keywords review added
  • Owner therve deleted

Thanks a lot for the detailed review. I think it ended up with some significant cleanups:

  1. Done.
  1. I think we can postpone 3.3 support? I mostly cared about the fallback, which works.
  1. Done.
  1. Correct, I removed SendfileClientProtocol.
  1. Cleaned up a bit. The docstring ought to differentiate the 2 tests though.
  1. We already have tests which check what's sent, I'm not sure what you're thinking of.
  1. I removed the test for now, I can't think of a nice way to do that.
  1. Yeah... There are not that many places, and I'm not sure how to do it without introducing several weird APIs.
  1. Fixed.
  1. Removed.
  1. You're right, there was a missing check. I managed to somehow test it in test_writeFilePendingData.
  1. Indeed. I made sure the deferred fired properly with the error.

comment:41 Changed 22 months ago by glyph

Re: point 12: I just looked at this changeset and I was wondering, why 'except Exception'? Why not BaseException? Do you not want to catch KeyboardInterrupt and SystemExit for some reason? Also: why are you throwing away the traceback? This is what the 0-argument form of Deferred.errback() is for. Consider this:

>>> from twisted.internet.defer import Deferred
>>> d = Deferred()
>>> try:
...   1/0
... except ZeroDivisionError as z:
...   d.errback(z)
... 
>>> d2 = Deferred()
>>> try:
...  1/0
... except ZeroDivisionError:
...  d2.errback()
... 
>>> 
>>> d2.result.printTraceback()
Traceback (most recent call last):
--- <exception caught here> ---
  File "<stdin>", line 2, in <module>
    
exceptions.ZeroDivisionError: integer division or modulo by zero
>>> d.result.printTraceback()
Traceback (most recent call last):
Failure: exceptions.ZeroDivisionError: integer division or modulo by zero

comment:42 Changed 22 months ago by tom.prince

  1. I think we can postpone 3.3 support? I mostly cared about the fallback, which works.

I guess setup3.py doesn't build modules, so that's fine.

  1. Indeed. I made sure the deferred fired properly with the error.

If you are going to fail when there is a producer, you should document that writeFile can't be used when there is a producer registered with the transport.

  1. I removed the test for now, I can't think of a nice way to do that.

It would be nice to handle this gracefully, at some point. File a ticket?

  1. You're right, there was a missing check. I managed to somehow test it in test_writeFilePendingData.

This test does catch it, but it depends on the OS not accepting all the data at once. Add a comment to the effect that it depends on that?

comment:43 Changed 22 months ago by tom.prince

  • Keywords review removed
  • Owner set to therve

Applying this diff causes the test to fail

  • twisted/internet/test/test_sendfile.py

    diff --git a/twisted/internet/test/test_sendfile.py b/twisted/internet/test/test_sendfile.py
    index e35dc62..18525bb 100644
    a b class SendfileIntegrationMixin(object): 
    229229            clients.append(client) 
    230230            fileObject = self.createFile() 
    231231            doneDeferred = server.transport.writeFile(fileObject) 
     232            server.transport.write("y") 
    232233            return doneDeferred.addBoth(finished, server) 
    233234 
    234235        def finished(passthrough, server): 
    class SendfileIntegrationMixin(object): 
    242243        d.addErrback(log.err) 
    243244        self.runReactor(reactor) 
    244245 
    245         self.assertEqual(1000000, len(clients[0].data)) 
     246        self.assertEqual(1000001, len(clients[0].data)) 
     247        self.assertEqual(1000000, clients[0].data.rindex("y")) 
    246248 
    247249    if sendfileSkip: 
    248250        test_writeFileErrorAtFirstSight.skip = sendfileSkip 

because when switching to the fallback, the writes get pushed to the end.

I think fixing this could also address 8, if SendfileInfo just directly handled the fallback.

Depending on how this is done, it might also allow .writeFile to be used from a producer.

Lastly, I was wondering if it might make sense to make FileSender use writeFile, if its consumer supports (which would play merry hell with using to implement the fallback.


That a rather disjointed and incomplete review, sorry. I'll perhaps try to add some more coherent thoughts later.

comment:44 Changed 22 months ago by therve

  • Branch changed from branches/sendfile-585-9 to branches/sendfile-585-10

(In [37485]) Branching to 'sendfile-585-10'

comment:45 Changed 22 months ago by therve

  • Keywords review added
  • Owner therve deleted

The new branch changes the approach by putting the logic in FileSender: it's backward compatible, doesn't change the reactor internals at all, and overall limit where the changes are made. The negative side is that some checks are a bit ugly.

http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/sendfile-585-10 for the build results before cosmetic changes.

comment:46 Changed 21 months ago by exarkun

  • Owner set to exarkun
  • Status changed from new to assigned

comment:47 follow-up: Changed 21 months ago by exarkun

  • Keywords review removed
  • Owner changed from exarkun to therve
  • Status changed from assigned to new

Woo! Thanks.

  1. Sad Python 3.3 is sad
  2. Some twistedchecker errors to look at as well I guess
  3. Some new compiler warnings are showing up on OS 10.6 (but not OS 10.8 it seems)
  4. in twisted/python/_sendfile.c
    1. I think identifiers of the form _sendfile are not a great idea in C (this page suggests it is a form of reserved identifier, I think).
    2. It looks like offset could potentially overflow. off_t seems to be a signed type, so the result might be negative after adding sbytes to it. Since offset and sbytes are both off_t, I think that means their sum definitely fits into a Py_ssize_t, so you could add them into a variable of that type to avoid the overflow. This seems to be true for the FreeBSD and Apple code paths. Maybe not relevant for the Linux codepath.
  5. in twisted/protocols/basic.py
    1. Some new attributes aren't documented. Maybe don't bother documenting them if you're going to delete them in a refactoring I ask for below, though. :)
    2. Since beginFileTransfer is separate from FileSender construction, you could probably make sendfile an attribute of the FileSender instance. Then the tests won't need to patch the basic module.
    3. Non-seekable files will be a problem. I wonder if there are any such things supported by sendfile? I do not know. However, I guess perhaps the case needs to be considered by _trySendFile one way or another.
    4. maybe self.transport is None inside _trySendFile - instead of not self.transport.
    5. If you want, os.fstat will probably tell you the size of the file without seeking back and forth a bunch of times. As a bonus, it will raise an exception if the fileno is not valid.
    6. I guess the factoring would be a little bit nicer if the logic represented by _resumeProducingNoOp and _resumeProducingSendfile were factored into one helper class and the logic represented by _resumeProducingWrite were factored into another helper class. It seems like this would allow FileSender.resumeProducing to be a simple delegation to the resumeProducing of another object (without instance method patching). Actually, making NoOp its own object as well would remove even more of the conditionals and flags, I think, so maybe that's a good idea. Alternatively, keeping everything as one class but re-organizing the code as something more like an explicit state machine might be another way to make things less messy.
    7. Oh no : from twisted.internet.tcp import EWOULDBLOCK :/ What will we do? That dependency goes the wrong way. Maybe since Windows is not supported, errno.EWOULDBLOCK is good enough?
    8. It's a bit funny how there are addWriter calls in this code but there are no removeWriter calls. I see that is because twisted.internet.abstract.FileDescriptor calls removeWriter before resuming any producer it has. So this seems like it will work and be correct, but I wonder if there will be any problems caused by the close coupling of these two pieces of code. Perhaps at least add some documentation, both to FileSender's use of addWriter explaining that it depends on FileDescriptor's removeWriter behavior, and to FileDescriptor.doWrite explaining its removeWriter usage? Could be part of docstrings or comments as you think makes sense.
    9. You did the deferred, self.deferred = self.deferred, None / deferred.errback() dance in most places, in beginFileTransfer spelled it a little differently. Probably not a problem, but maybe consistent spelling would be nicer.
  6. in twisted/protocols/test/test_basic.py
    1. To be absolutely safe, test_transfer (and other tests that check the last byte value) should probably use a test string that ends with a byte that appears nowhere else in the string.
    2. The last assertTrue in test_error should provide a better error message to the assertTrue method.

I guess another round of a review is probably a good idea. Thanks!

comment:48 in reply to: ↑ 47 Changed 21 months ago by therve

  • Keywords review added
  • Owner changed from therve to i love speedbumps

Replying to exarkun:

Woo! Thanks.

  1. Sad Python 3.3 is sad

I think that was already fixed in the branch. Seems better now anyway. I also used the new os.sendfile.

  1. Some twistedchecker errors to look at as well I guess

Fixed.

  1. Some new compiler warnings are showing up on OS 10.6 (but not OS 10.8 it seems)

I'm not sure what to do about that. I'm wondering if we can't just copy http://code.google.com/p/pysendfile/ in our tree. It seems to do smart stuff with
HAVE_LARGEFILE_SUPPORT.

  1. in twisted/python/_sendfile.c
    1. I think identifiers of the form _sendfile are not a great idea in C (this page suggests it is a form of reserved identifier, I think).

Fixed.

  1. It looks like offset could potentially overflow. off_t seems to be a signed type, so the result might be negative after adding sbytes to it. Since offset and sbytes are both off_t, I think that means their sum definitely fits into a Py_ssize_t, so you could add them into a variable of that type to avoid the overflow. This seems to be true for the FreeBSD and Apple code paths. Maybe not relevant for the Linux codepath.

I changed the interface to match os.sendfile, so it shouldn't be relevant anymore.

  1. in twisted/protocols/basic.py
    1. Some new attributes aren't documented. Maybe don't bother documenting them if you're going to delete them in a refactoring I ask for below, though. :)
    2. Since beginFileTransfer is separate from FileSender construction, you could probably make sendfile an attribute of the FileSender instance. Then the tests won't need to patch the basic module.
    3. Non-seekable files will be a problem. I wonder if there are any such things supported by sendfile? I do not know. However, I guess perhaps the case needs to be considered by _trySendFile one way or another.

I'd need some guidance on that if you have an idea.

  1. maybe self.transport is None inside _trySendFile - instead of not self.transport.

I think you meant transform.

  1. If you want, os.fstat will probably tell you the size of the file without seeking back and forth a bunch of times. As a bonus, it will raise an exception if the fileno is not valid.

Done.

  1. I guess the factoring would be a little bit nicer if the logic represented by _resumeProducingNoOp and _resumeProducingSendfile were factored into one helper class and the logic represented by _resumeProducingWrite were factored into another helper class. It seems like this would allow FileSender.resumeProducing to be a simple delegation to the resumeProducing of another object (without instance method patching). Actually, making NoOp its own object as well would remove even more of the conditionals and flags, I think, so maybe that's a good idea. Alternatively, keeping everything as one class but re-organizing the code as something more like an explicit state machine might be another way to make things less messy.

Done. I have some concerns about attributes accessing but otherwise it splits things up in a nicer way.

  1. Oh no : from twisted.internet.tcp import EWOULDBLOCK :/ What will we do? That dependency goes the wrong way. Maybe since Windows is not supported, errno.EWOULDBLOCK is good enough?

Yes.

  1. It's a bit funny how there are addWriter calls in this code but there are no removeWriter calls. I see that is because twisted.internet.abstract.FileDescriptor calls removeWriter before resuming any producer it has. So this seems like it will work and be correct, but I wonder if there will be any problems caused by the close coupling of these two pieces of code. Perhaps at least add some documentation, both to FileSender's use of addWriter explaining that it depends on FileDescriptor's removeWriter behavior, and to FileDescriptor.doWrite explaining its removeWriter usage? Could be part of docstrings or comments as you think makes sense.

I tried something.

  1. You did the deferred, self.deferred = self.deferred, None / deferred.errback() dance in most places, in beginFileTransfer spelled it a little differently. Probably not a problem, but maybe consistent spelling would be nicer.

I'm not sure where you saw inconsistency. At any rate, I added a helper method.

  1. in twisted/protocols/test/test_basic.py
    1. To be absolutely safe, test_transfer (and other tests that check the last byte value) should probably use a test string that ends with a byte that appears nowhere else in the string.

Fixed

  1. The last assertTrue in test_error should provide a better error message to the assertTrue method.

Fixed.

Thanks!

comment:49 Changed 21 months ago by Julian

  • Owner i love speedbumps deleted

comment:50 follow-up: Changed 21 months ago by tom.prince

  • Keywords review removed
  • Owner set to therve
  1. The interface that is implemented is actually IPullProducer. So pauseProducing is still there simply for compatibility. It would be good to mention this in the docstring.
  2. The docstring for beginFileTransfer could use some L{}ve.
  3. The code now behaves differently depending on whether sendfile is used or not. If it is, the file is streamed from the beginning, and its position isn't updated. If it isn't, the file is streamed from the current position, and the position is updated.
  4. I think passing _count after the first call to sendfile is the wrong thing to do.
  5. The C binding appears to be depending on coincidence of C types. For example, "L" is documented as being long long, But you seem to be using it with off_t.
  6. The C binding doesn't support passing None for offset. This is relevant to addressing point 3 above.

Oh no : from twisted.internet.tcp import EWOULDBLOCK :/ What will we do? That dependency goes the wrong way. Maybe since Windows is not supported, errno.EWOULDBLOCK is good enough?

twisted.protocols can depend on twisted.internet, no?

comment:51 follow-up: Changed 21 months ago by exarkun

twisted.protocols can depend on twisted.internet, no?

Yes, but that import is actually in twisted.python (which can't depend on twisted.internet).

comment:52 in reply to: ↑ 51 Changed 21 months ago by therve

Replying to exarkun:

twisted.protocols can depend on twisted.internet, no?

Yes, but that import is actually in twisted.python (which can't depend on twisted.internet).

It wasn't afaict. But it's removed anyway.

comment:53 in reply to: ↑ 50 Changed 21 months ago by therve

  • Keywords review added
  • Owner therve deleted

Replying to tom.prince:

  1. The interface that is implemented is actually IPullProducer. So pauseProducing is still there simply for compatibility. It would be good to mention this in the docstring.

I added something.

  1. The docstring for beginFileTransfer could use some L{}ve.

OK.

  1. The code now behaves differently depending on whether sendfile is used or not. If it is, the file is streamed from the beginning, and its position isn't updated. If it isn't, the file is streamed from the current position, and the position is updated.

Fixed.

  1. I think passing _count after the first call to sendfile is the wrong thing to do.

I don't understand what you mean.

  1. The C binding appears to be depending on coincidence of C types. For example, "L" is documented as being long long, But you seem to be using it with off_t.

Right. I ended copying the code from http://code.google.com/p/pysendfile/ as it was doing the right thing.

  1. The C binding doesn't support passing None for offset. This is relevant to addressing point 3 above.

This behavior is platform dependant afaik.

Thanks!

comment:54 Changed 20 months ago by rwall

  • Owner set to rwall
  • Status changed from new to assigned

comment:55 follow-up: Changed 20 months ago by rwall

  • Keywords review removed
  • Owner changed from rwall to therve
  • Status changed from assigned to new

Code Review

Hi Therve,

Thanks for the branch. It's been interesting reading about sendfile
and it looks like this will give some exciting performance
improvements.

I'm not really qualified to comment on the code changes and especially
not the C code so this is only a cosmetic review, but I do notice some
test failures which you probably need to investigate and fix and I've
spotted one or two typos and missing epytext links.

Some notes:

  1. Test failures
    1. http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/sendfile-585-10
      [FAIL]
      Traceback (most recent call last):
        File "/var/lib/buildbot/bot-glyph-1/py-select-gc/Twisted/twisted/test/test_pb.py", line 903, in test_emptyFilePaging
          self.fail('getAllPages timed out')
      twisted.trial.unittest.FailTest: getAllPages timed out
      
      twisted.test.test_pb.PagingTestCase.test_emptyFilePaging
      ===============================================================================
      [FAIL]
      Traceback (most recent call last):
        File "/var/lib/buildbot/bot-glyph-1/py-select-gc/Twisted/twisted/test/test_tpfile.py", line 51, in testSendingEmptyFile
          'producer unregistered with deferred being called')
        File "/var/lib/buildbot/bot-glyph-1/py-select-gc/Twisted/twisted/trial/_synctest.py", line 308, in assertTrue
          raise self.failureException(msg)
      twisted.trial.unittest.FailTest: producer unregistered with deferred being called
      
      twisted.test.test_tpfile.FileSenderTestCase.testSendingEmptyFile
      ===============================================================================
      [ERROR]
      Traceback (most recent call last):
      Failure: twisted.internet.defer.TimeoutError: <twisted.test.test_ftp.FTPReadWriteTestCase testMethod=test_read> (test_read) still running at 120.0 secs
      
      twisted.test.test_ftp.FTPReadWriteTestCase.test_read
      
    2. The same tests fail on my fedora 18 x86_64 Python2.7 laptop
  1. branches/sendfile-585-10/twisted/protocols/test/test_basic.py
    1. typo "producter"
    2. C{Deferred} should be L{Deferred}
  2. branches/sendfile-585-10/twisted/protocols/basic.py
    1. C{}s which should be L{}s
      1. C{IFileDescriptor}
      2. C{IReactorFDSet}
      3. C{self._producer}
      4. C{IPullProducer}
      5. C{beginFileTransfer}'s C{Deferred}
    2. typos
      1. "as we're don't"
      2. "and change the producer"
      3. "calling use immediately"
      4. "systme call"
    3. Should there be new tickets created for these comments
      1. call startWriting again (or reactor.addWriter). It'd be nice if the producer had a way to tell us that we shouldn't call stopWriting.
      2. descriptor is already registered, so changing the behavior underneath wouldn't harm anything, but it would make things cleaner here.
    4. branches/sendfile-585-10/twisted/python/_sendfile.c
      1. Worth adding a link to the location of the original file?
  1. Performance
    1. PenguinOfDoom's original patch in this ticket was about speeding up twisted.web. Will this branch have any impact on eg t.w.static.File? If not, should there be a ticket for that?
    2. Are there going to be new speed.twistedmatrix.com benchmarks for this? If so, is it worth doing those before merging this branch so that the performance improvements can be clearly seen?

Please address or answer the points above and paste a link to some
clean buildbot results.

Also may be worth merging forward so that the buildbot produces sane
"new error" reports.

-RichardW.

comment:56 Changed 19 months ago by oberstet

  • Cc tobias.oberstein@… added

comment:57 Changed 18 months ago by therve

  • Branch changed from branches/sendfile-585-10 to branches/sendfile-585-11

(In [39061]) Branching to 'sendfile-585-11'

comment:58 in reply to: ↑ 55 Changed 18 months ago by therve

Thanks a lot for the review Richard!

Replying to rwall:

Some notes:

  1. Test failures

That was interesting, fixed.

  1. branches/sendfile-585-10/twisted/protocols/test/test_basic.py
    1. typo "producter"
    2. C{Deferred} should be L{Deferred}
  2. branches/sendfile-585-10/twisted/protocols/basic.py
    1. C{}s which should be L{}s
      1. C{IFileDescriptor}
      2. C{IReactorFDSet}
      3. C{self._producer}
      4. C{IPullProducer}
      5. C{beginFileTransfer}'s C{Deferred}

Fixed, except self._producer because pydoctor doesn't like it.

  1. typos
    1. "as we're don't"
    2. "and change the producer"
    3. "calling use immediately"
    4. "systme call"

All of those fixed.

  1. Should there be new tickets created for these comments
    1. call startWriting again (or reactor.addWriter). It'd be nice if the producer had a way to tell us that we shouldn't call stopWriting.
    2. descriptor is already registered, so changing the behavior underneath wouldn't harm anything, but it would make things cleaner here.

Yeah, I'd be happy to create those once this one is closed.

  1. branches/sendfile-585-10/twisted/python/_sendfile.c
    1. Worth adding a link to the location of the original file?

Done.

  1. Performance
    1. PenguinOfDoom's original patch in this ticket was about speeding up twisted.web. Will this branch have any impact on eg t.w.static.File? If not, should there be a ticket for that?

static.File uses FileSender, so gets the benefit for free.

  1. Are there going to be new speed.twistedmatrix.com benchmarks for this? If so, is it worth doing those before merging this branch so that the performance improvements can be clearly seen?

There is a "File Sender Throughput" benchmark which has been running for a little it now. It's about 3 times faster in the branch.

Please address or answer the points above and paste a link to some
clean buildbot results.

There are some errors on FreeBSD, checking that.

Also may be worth merging forward so that the buildbot produces sane
"new error" reports.

We really shouldn't need that :/

comment:59 Changed 18 months ago by therve

  • Keywords review added
  • Owner therve deleted

Okay, buildbot seems better (spurious pyflakes and twistedchecker errors, sigh). I removed OS X and FreeBSD support for now: at least on OS X it fails spuriously, and I don't have time to debug some global state for now. We can open a ticket to support those later on.

comment:60 Changed 16 months ago by Alex

Some notes:

  • The method name _trySendFile is suboptimal, something like _shouldSendFile would be better, _trySendFile makes it sound like it actually begins sending the file.
  • In stopProducing, a more precise exception should be raised than Exception (because Failure.trap and Failure.check use isinstance semantics, a more precise type doesn't break backwards compatibility)
  • It'd be pretty awesome if this used cffi, instead of a C extension, so it works in a high performance fashion on pypy.

comment:61 Changed 12 months ago by Alex

  • Keywords review removed
  • Owner set to therve
  • protocols/basic.py:1128 use the os.SEEK_ constant, instead of "2"
  • _sendfile.py:32 you probably want to use "OSError(ffi.errno, os.strerror(ffi.errno))" that way the exception's "errno" attribute works
  • _sendfile.py:52 same
  • _sendfile.py:73 same :-)

Finally, does the logic handle not building on windows correctly?

Should the root setup.py include cffi in the install_requires when setuptools is present (it basically always is) so users will mostly get this speedup transparently?

Besides these items this patch looks good to me!

Note: See TracTickets for help on using tickets.