Opened 9 years ago
Last modified 2 weeks ago
I've done a quick hack to this effect. It seems that sendfile(2)-enhanced twisted.web is significantly closer to Apache speeds on large (5MB) files. Benchmarks will be posted once somebody figures out how to properly measure the speed.
can you upload a patch, even if it's a nasty hack?
Where's the patch?
Yea where's the patch?
The branch is here: http://twistedmatrix.com/trac/browser/branches/pahan/sendfile/sandbox/pahan?rev=10492
The original sendfile wrapper is from here: http://snakefarm.org/.
For people interested, I've try to update PenguinOfDoom's work in sendfile-585, and it seems to work fine.
Not having looked at the patch yet, some test suggestions:
1) If the sendfile syscall is present, but fails, the code deals sanely (fallback to read/send, I suppose, or else some kind of notification?)
1b) what if it fails because the file is unreadable or has an I/O error of some kind?
2) SSL sockets don't support sendfile
3) it works, or at least, doesn't blow up on *BSD (which has a sendfile, with a different signature)
From a quick skim of the branch, a couple thoughts:
Replying to jknight:
Thanks.
For now, that closes the connection.
Same as above.
I'm interested to know what behavior we could choose.
Yes that's disabled, even thought I'm not really happy about the mecanism (value of the sendfile attribute)
I didn't do that yet, but it's interesting.
(In [22235]) Branching to 'sendfile-585-2'
(In [22239]) Docs and cleanups.
Refs #585
(In [23312]) Branching to 'sendfile-585-3'
This is cool. I am going to call it a core ticket, though, since it's about extending the reactor interface.
(In [28626]) Branching to 'sendfile-585-4'
(In [34562]) Branching to 'sendfile-585-5'
I believe this is ready to review. The tests are meant to be relatively generic, and could be integrated later on with ITCPTransport tests. The main thing I don't like are the FileDescriptor changes, but I don't see a way around it.
Here are the test builds:
http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/sendfile-585-5
Thanks. Interesting stuff. Some thoughts:
Thanks for the review! I think it helped fixing some major inconsistencies and cleaning it up.
1. Renamed to writeFile
2. I have several reasons not to like it. Embedding it works out of the box. There is only one release of that project, so I don't want to end up like our old kqueue support. There are several sendfile libraries out there, so it may be hard to find the proper one.
3. Fixed.
4. Removed and opened #5782.
5. Removed the mention. I'll open a bug about Windows if you care.
6. I removed a line. with 6.1 refactoring the sendfile section quite a bit.
7.1 I removed SendfileProducer and the extra feature that I didn't use nor test.
7.2 I moved it under the SendfileInfo class.
(In [35063]) Branching to 'sendfile-585-6'
I am new to Twisted, and am participating in the Python sprint in San Francisco, on August 19, 2012.
sendfile() has different implementations on different platforms. The code you wrote uses the 4-argument version of sendfile() which is on Linux. You should make it clear that this implementation is Linux-only.
For your reference, here are the man pages for sendfile on Linux, MacOS X, and FreeBSD.
Linux, sendfile has 4 arguments http://man7.org/linux/man-pages/man2/sendfile.2.html
MacOS X, sendfile has 6 arguments https://developer.apple.com/library/mac/#documentation/Darwin/Reference/Manpages/man2/sendfile.2.html
FreeBSD sendfile(2), sendfile has 7 arguments http://www.freebsd.org/cgi/man.cgi?query=sendfile&sektion=2
In addition, on Windows, there is a TransmitFile http://msdn.microsoft.com/en-us/library/windows/desktop/ms740565%28v=vs.85%29.aspx
If you want to look at a sendfile() compatibility layer, you can take a look at the ACE C++ library which has some code for sendfile() on Linux and Windows:
https://svn.dre.vanderbilt.edu/viewvc/Middleware/trunk/ACE/ace/OS_NS_sys_sendfile.inl https://svn.dre.vanderbilt.edu/viewvc/Middleware/trunk/ACE/ace/OS_NS_sys_sendfile.cpp https://svn.dre.vanderbilt.edu/viewvc/Middleware/trunk/ACE/ace/OS_NS_sys_sendfile.h
That implementation has code which falls back to using mmap(). You may not want to do that, but at least it gives you an idea.
If you add comments to your code to indicate that the Linux sendfile() is being used, then we can file other tickets to get this code to work on MacOS X, FreeBSD, and Windows, later on.
Other than that, I think we should just commit your code to Twisted, after getting approval from the other Twisted reviewers.
The branch already implements the sendfile compatibility between Linux, OS X and FreeBSD. We can possibly implement the Windows optimization later on.
Thanks for working on this. I'm probably not the best qualified to review it but you can blame glyph for that.
(In [35787]) Branching to 'sendfile-585-7'
Alright, everything handled I think.
I attended the Twisted Sprint in San Francisco on October 20. As part of my learning process for Twisted, I looked at this ticket.
Since I have access to a FreeBSD 9 machine, I tested the following on FreeBSD 9. I did the following on my FreeBSD 9 box:
(1) Checked out this branch: svn://svn.twistedmatrix.com/svn/Twisted/branches/sendfile-585-7
(2) Ran: python setup.py build
(3) Ran: python build/scripts-2.7/trial twisted
(4) Output of results are here: http://home.comcast.net/~rodrigc/twisted/trial-sendfile-585-7.freebsd-9.1-PRERELEASE-i386-2.7.txt
I then did the following:
(1) Looked to see when sendfile-585-7 branched from trunk: svn log --stop-on-copy svn://svn.twistedmatrix.com/svn/Twisted/branches/sendfile-585-7
(2) Saw that sendfile-585-7 branched from trunk at GRN 35786
(3) Checked out trunk@35786
(4) Ran: python setup.py build
(5) Ran: python build/scripts-2.7/trial twisted
(6) Results are here: http://home.comcast.net/~rodrigc/twisted/trial-freebsd-9.1-PRERELEASE-i386-2.7_trunk@38786.txt
So basically it looks like on FreeBSD, no new test failures were introduced in the sendfile branch. Also, the sendfile tests seem to run properly and pass.
This looks good.
To learn about running Twisted under Windows 7, I did the following:
SETUP STEPS
(1) Followed instructions at http://blog.ying.li/2012/07/twisted-on-windows-using-activestate.html
to install ActiveState Python.
(2) Followed instructions from same blog posting, but installed Microsoft Visual C++ 2008 Express Edition.
The compiler version MUST match the compiler version used to compile the Python interpreter. There is code in Python's distutils package which needs this, otherwise setup.py will not work when compiling native code.
(3) Downloaded SVN client from http://sliksvn.com
TESTING STEPS
(3) Ran: python build/scripts-2.7/trial.py twisted
(4) Output of results are here: http://home.comcast.net/~rodrigc/twisted/trial-windows7_sendfile-585-7.txt
(5) Ran: python build/scripts-2.7/trial.py twisted
(6) Results are here: http://home.comcast.net/~rodrigc/trial-windows7_trunk@38786.txt
Looks good on Windows. No new regressions introduced. Some of the sendfile tests on Windows are executed:
twisted.internet.test.test_sendfile
SendfileInfoTestCase test_invalidFile ... [OK] test_lengthDetection ... [OK]
SendfileInfoTestCase
test_invalidFile ... [OK] test_lengthDetection ... [OK]
I was confused by this at first, because the code in this branch does not implement the TransmitFile() system call under Windows (which is the Windows equivalent to sendfile()). However, after looking at the testcase code with Ying and Glyph's help at the Sprint, I learned that this test only executes the "platform-independent" tests which do not actually use the sendfile() system call.
This all looks really good, you have done a really good job.
You have gone through a few iterations of code reviews, and I will look at the code next.
This will take me some time though, since I am new to Twisted, and need to become more familiar with the Twisted coding stanards and code review procedures.
After running the tests on FreeBSD and Windows, I looked over the changes in branches/sendfile-585-7 . Everything looks quite good. Thanks for doing this work. I have some review comments, but they fall into the category of "typo", so nothing major.
Here are my review comments:
twisted/python/test/test_sendfile.py ===================================== (1) Take the date out of the Copyright, and make it
like other copyright notices in the code base.
(2) In def test_afterSend(self): , change the comment from:
Verify that a send after sendfile behaves correctly.
to:
Verify that a send followed by sendfile behaves correctly.
because in the test you are actually calling send(), then sendfile().
twisted/internet/test/test_sendfile.py ======================================
(1) In def_testinvalidFile(self): , fix typo:
file, and a C{ValueError} when passwed a closed file.
file, and a C{ValueError} when passed a closed file.
twisted/python/_sendfile.c ==========================
You have this:
#if defined(FreeBSD) .... #else #if defined(APPLE) .... #else ... #endif #endif
You should change it to something like:
#if defined(FreeBSD) ... #elif defined(APPLE) ... #elif defined(linux) ... #else #error "This platform does not support sendfile." #endif
This makes the conditional a bit easier to read.
I forced a build. It looks like there are perhaps some Windows issues.
(In [36398]) Branching to 'sendfile-585-8'
(In [37102]) Branching to 'sendfile-585-9'
Back for another round, with Python3 being happier. Builds here: http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/sendfile-585-9
% python3.3 Python 3.3.0 (default, Feb 11 2013, 17:58:23) [GCC 4.6.3] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import twisted.python._sendfile Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: ./twisted/python/_sendfile.cpython-33.so: undefined symbol: Py_InitModule
path = filepath.FilePath(self.mktemp()) path.setContent(...) file = path.open("r")
client.doneDeferred = doneDeferred = Deferred() client.expected = 100000
I'm going to need some more time to digest FileDescriptor.doWrite, before I can finish this review.
A general comment: The control flow here seems fairly subtle and not easy to follow. I'm not certain that I've reviewed, but I think there is enough here to warrant a followup.
Thanks a lot for the detailed review. I think it ended up with some significant cleanups:
1. Done.
2. I think we can postpone 3.3 support? I mostly cared about the fallback, which works.
3. Done.
4. Correct, I removed SendfileClientProtocol.
5. Cleaned up a bit. The docstring ought to differentiate the 2 tests though.
6. We already have tests which check what's sent, I'm not sure what you're thinking of.
7. I removed the test for now, I can't think of a nice way to do that.
8. Yeah... There are not that many places, and I'm not sure how to do it without introducing several weird APIs.
9. Fixed.
10. Removed.
11. You're right, there was a missing check. I managed to somehow test it in test_writeFilePendingData.
12. Indeed. I made sure the deferred fired properly with the error.
Re: point 12: I just looked at this changeset and I was wondering, why 'except Exception'? Why not BaseException? Do you not want to catch KeyboardInterrupt and SystemExit for some reason? Also: why are you throwing away the traceback? This is what the 0-argument form of Deferred.errback() is for. Consider this:
>>> from twisted.internet.defer import Deferred >>> d = Deferred() >>> try: ... 1/0 ... except ZeroDivisionError as z: ... d.errback(z) ... >>> d2 = Deferred() >>> try: ... 1/0 ... except ZeroDivisionError: ... d2.errback() ... >>> >>> d2.result.printTraceback() Traceback (most recent call last): --- <exception caught here> --- File "<stdin>", line 2, in <module> exceptions.ZeroDivisionError: integer division or modulo by zero >>> d.result.printTraceback() Traceback (most recent call last): Failure: exceptions.ZeroDivisionError: integer division or modulo by zero
I guess setup3.py doesn't build modules, so that's fine.
If you are going to fail when there is a producer, you should document that writeFile can't be used when there is a producer registered with the transport.
It would be nice to handle this gracefully, at some point. File a ticket?
This test does catch it, but it depends on the OS not accepting all the data at once. Add a comment to the effect that it depends on that?
Applying this diff causes the test to fail
diff --git a/twisted/internet/test/test_sendfile.py b/twisted/internet/test/test_sendfile.py index e35dc62..18525bb 100644
because when switching to the fallback, the writes get pushed to the end.
I think fixing this could also address 8, if SendfileInfo just directly handled the fallback.
Depending on how this is done, it might also allow .writeFile to be used from a producer.
Lastly, I was wondering if it might make sense to make FileSender use writeFile, if its consumer supports (which would play merry hell with using to implement the fallback.
That a rather disjointed and incomplete review, sorry. I'll perhaps try to add some more coherent thoughts later.
(In [37485]) Branching to 'sendfile-585-10'
The new branch changes the approach by putting the logic in FileSender: it's backward compatible, doesn't change the reactor internals at all, and overall limit where the changes are made. The negative side is that some checks are a bit ugly.
http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/sendfile-585-10 for the build results before cosmetic changes.
Woo! Thanks.
I guess another round of a review is probably a good idea. Thanks!
Replying to exarkun:
Woo! Thanks. 1. Sad Python 3.3 is sad
I think that was already fixed in the branch. Seems better now anyway. I also used the new os.sendfile.
1. Some twistedchecker errors to look at as well I guess
Fixed.
1. Some new compiler warnings are showing up on OS 10.6 (but not OS 10.8 it seems)
I'm not sure what to do about that. I'm wondering if we can't just copy http://code.google.com/p/pysendfile/ in our tree. It seems to do smart stuff with HAVE_LARGEFILE_SUPPORT.
1. in twisted/python/_sendfile.c 1. I think identifiers of the form _sendfile are not a great idea in C ( this page suggests it is a form of reserved identifier, I think).
1. It looks like offset could potentially overflow. off_t seems to be a signed type, so the result might be negative after adding sbytes to it. Since offset and sbytes are both off_t, I think that means their sum definitely fits into a Py_ssize_t, so you could add them into a variable of that type to avoid the overflow. This seems to be true for the FreeBSD and Apple code paths. Maybe not relevant for the Linux codepath.
I changed the interface to match os.sendfile, so it shouldn't be relevant anymore.
1. in twisted/protocols/basic.py 1. Some new attributes aren't documented. Maybe don't bother documenting them if you're going to delete them in a refactoring I ask for below, though. :) 1. Since beginFileTransfer is separate from FileSender construction, you could probably make sendfile an attribute of the FileSender instance. Then the tests won't need to patch the basic module. 1. Non-seekable files will be a problem. I wonder if there are any such things supported by sendfile? I do not know. However, I guess perhaps the case needs to be considered by _trySendFile one way or another.
I'd need some guidance on that if you have an idea.
1. maybe self.transport is None inside _trySendFile - instead of not self.transport.
I think you meant transform.
1. If you want, os.fstat will probably tell you the size of the file without seeking back and forth a bunch of times. As a bonus, it will raise an exception if the fileno is not valid.
Done.
1. I guess the factoring would be a little bit nicer if the logic represented by _resumeProducingNoOp and _resumeProducingSendfile were factored into one helper class and the logic represented by _resumeProducingWrite were factored into another helper class. It seems like this would allow FileSender.resumeProducing to be a simple delegation to the resumeProducing of another object (without instance method patching). Actually, making NoOp its own object as well would remove even more of the conditionals and flags, I think, so maybe that's a good idea. Alternatively, keeping everything as one class but re-organizing the code as something more like an explicit state machine might be another way to make things less messy.
Done. I have some concerns about attributes accessing but otherwise it splits things up in a nicer way.
1. Oh no : from twisted.internet.tcp import EWOULDBLOCK :/ What will we do? That dependency goes the wrong way. Maybe since Windows is not supported, errno.EWOULDBLOCK is good enough?
Yes.
1. It's a bit funny how there are addWriter calls in this code but there are no removeWriter calls. I see that is because twisted.internet.abstract.FileDescriptor calls removeWriter before resuming any producer it has. So this seems like it will work and be correct, but I wonder if there will be any problems caused by the close coupling of these two pieces of code. Perhaps at least add some documentation, both to FileSender's use of addWriter explaining that it depends on FileDescriptor's removeWriter behavior, and to FileDescriptor.doWrite explaining its removeWriter usage? Could be part of docstrings or comments as you think makes sense.
I tried something.
1. You did the deferred, self.deferred = self.deferred, None / deferred.errback() dance in most places, in beginFileTransfer spelled it a little differently. Probably not a problem, but maybe consistent spelling would be nicer.
I'm not sure where you saw inconsistency. At any rate, I added a helper method.
1. in twisted/protocols/test/test_basic.py 1. To be absolutely safe, test_transfer (and other tests that check the last byte value) should probably use a test string that ends with a byte that appears nowhere else in the string.
Fixed
1. The last assertTrue in test_error should provide a better error message to the assertTrue method.
Thanks!
Oh no : from twisted.internet.tcp import EWOULDBLOCK :/ What will we do? That dependency goes the wrong way. Maybe since Windows is not supported, errno.EWOULDBLOCK is good enough?
twisted.protocols can depend on twisted.internet, no?
Yes, but that import is actually in twisted.python (which can't depend on twisted.internet).
twisted.protocols can depend on twisted.internet, no? Yes, but that import is actually in twisted.python (which can't depend on twisted.internet).
It wasn't afaict. But it's removed anyway.
Replying to tom.prince:
1. The interface that is implemented is actually IPullProducer. So pauseProducing is still there simply for compatibility. It would be good to mention this in the docstring.
I added something.
2. The docstring for beginFileTransfer could use some L{}ve.
OK.
3. The code now behaves differently depending on whether sendfile is used or not. If it is, the file is streamed from the beginning, and its position isn't updated. If it isn't, the file is streamed from the current position, and the position is updated.
4. I think passing _count after the first call to sendfile is the wrong thing to do.
I don't understand what you mean.
5. The C binding appears to be depending on coincidence of C types. For example, "L" is documented as being long long, But you seem to be using it with off_t.
Right. I ended copying the code from http://code.google.com/p/pysendfile/ as it was doing the right thing.
6. The C binding doesn't support passing None for offset. This is relevant to addressing point 3 above.
This behavior is platform dependant afaik.
Hi Therve,
Thanks for the branch. It's been interesting reading about sendfile and it looks like this will give some exciting performance improvements.
I'm not really qualified to comment on the code changes and especially not the C code so this is only a cosmetic review, but I do notice some test failures which you probably need to investigate and fix and I've spotted one or two typos and missing epytext links.
Some notes:
[FAIL] Traceback (most recent call last): File "/var/lib/buildbot/bot-glyph-1/py-select-gc/Twisted/twisted/test/test_pb.py", line 903, in test_emptyFilePaging self.fail('getAllPages timed out') twisted.trial.unittest.FailTest: getAllPages timed out twisted.test.test_pb.PagingTestCase.test_emptyFilePaging =============================================================================== [FAIL] Traceback (most recent call last): File "/var/lib/buildbot/bot-glyph-1/py-select-gc/Twisted/twisted/test/test_tpfile.py", line 51, in testSendingEmptyFile 'producer unregistered with deferred being called') File "/var/lib/buildbot/bot-glyph-1/py-select-gc/Twisted/twisted/trial/_synctest.py", line 308, in assertTrue raise self.failureException(msg) twisted.trial.unittest.FailTest: producer unregistered with deferred being called twisted.test.test_tpfile.FileSenderTestCase.testSendingEmptyFile =============================================================================== [ERROR] Traceback (most recent call last): Failure: twisted.internet.defer.TimeoutError: <twisted.test.test_ftp.FTPReadWriteTestCase testMethod=test_read> (test_read) still running at 120.0 secs twisted.test.test_ftp.FTPReadWriteTestCase.test_read
Please address or answer the points above and paste a link to some clean buildbot results.
Also may be worth merging forward so that the buildbot produces sane "new error" reports.
-RichardW.