Opened 5 years ago

Closed 3 years ago

Last modified 2 years ago

#3896 defect closed fixed (fixed)

Passing a unicode object to request.write corrupts the entire response

Reported by: radix Owned by: exarkun
Priority: normal Milestone:
Component: core Keywords:
Cc: terry@…, msabramo@…, msabramo Branch: branches/request-write-unicode-3896-2
(diff, github, buildbot, log)
Author: msabramo, exarkun Launchpad Bug:

Description

I passed a unicode object to request.write, and my HTTP response came out like this:

'H\x00\x00\x00T\x00\x00\x00T\x00\x00\x00P\x00\x00\x00/\x00\x00\x001\x00\x00\x00.\x00\x00\x001\x00\x00\x00'

and so on. (That's the string "HTTP/1.1" encoded in, apparently, UCS-4).

idnar points out that this may be because of cStringIO usage:

<idnar> >>> StringIO(u'abc').getvalue()
<idnar> 'a\x00\x00\x00b\x00\x00\x00c\x00\x00\x00'

In my opinion, either an exception should be immediately raised when unicode is passed to request.write, or the data should be encoded with the response's encoding, if one has been set.

Attachments (5)

web-sux.tac (461 bytes) - added by radix 5 years ago.
reproducer
unicodeinrequesthandler.py (416 bytes) - added by msabramo 5 years ago.
twisted.internet.ticket_3896.patch (604 bytes) - added by msabramo 5 years ago.
Proposed patch (output from "diff -F def -u abstract.py.orig abstract.py")
test_filedescriptor.py (946 bytes) - added by msabramo 4 years ago.
twisted/internet/test/test_filedescriptor.py
twisted.internet.test.test_tcp.WriteSequenceTests.test_writeSequenceWithUnicodeRaisesException.patch (1.6 KB) - added by msabramo 3 years ago.
Add a ReactorBuilder test for unicode given to writeSequence

Download all attachments as: .zip

Change History (41)

Changed 5 years ago by radix

reproducer

comment:1 Changed 5 years ago by terrycojones

  • Cc terry@… added

Funnily enough, I ran into exactly this issue today. In my case I inadvertently put some unicode into the value passed to request.setHeader before calling request.write with a body. I don't think it's (only) unicode in the response payload/body that causes this problem, as the output shown above is produced by self.transport.writeSequence(l) in request.write as self.startedWriting is 0.

It would be good to get this fixed, as it's non-trivial to figure out what's going on. My client (with very slightly tweaked protocol and factory subclassing those in twisted.web.client) got a failure from its getPage with a twisted.web.error.Error / ConnectionDone with message 'connection closed cleanly'. The server thought it was done and closed the connection. The client protocol and factory saw very little of the action though as the incoming data was a mess and so lots of methods that you'd expect to be called, like handleHeader, handleStatus, etc were not called, and yet failed was 0 in the HTTPPageGetter instance.

Sorry to be a little vague on the client details - they're not too important seeing as the issue will be fixed in the server. I'm just trying to highlight that the problem takes some tracking down (at least when using twisted.web.client tools to process the response).

comment:2 Changed 5 years ago by msabramo

I just ran into this issue as well and it took me a LONG time to figure out what was going on.

In my case, I switched my program from reading data from MySQL to sqlite3 (via adbapi.ConnectionPool) and didn't realize that sqlite3 was returning Unicode strings.
The symptom I saw was that Firefox would hang forever while trying to load the page.

After hours of looking at the issue, I figured out what was happening. As mentioned in the ticket, writing any Unicode causes the response to get written in some kind of multi-byte Unicode encoding:

$ curl -i http://localhost:8000/ > output.bin
$ hexdump -c output.bin | head -2 | cut -c5-
000   H  \0   T  \0   T  \0   P  \0   /  \0   1  \0   .  \0   1  \0
010      \0   2  \0   0  \0   0  \0      \0   O  \0   K  \0  \r  \0

This of course confuses the heck out of any browser, whether it's Firefox or curl.

Interestingly, this problem only seems to occur when the response uses HTTP/1.1 chunked encoding. If the client requests HTTP/1.0, then attempting to write Unicode causes Twisted to raise TypeError("Data must not be unicode"), which seems to be the expected behavior (according to the Twisted FAQ).

The difference between behavior for non-chunked vs. chunked seems to come down to the twisted.internet.abstract.FileDescriptor class.

For non-chunked data, the write method is called and that checks if isinstance(data, unicode). For chunked data, the writeSequence method is called, which does no such check. Later on, the response gets garbled in twisted.internet.tcp.Connection.writeSomeData, which does:

return self.socket.send(buffer(data, 0, self.SEND_LIMIT))

Now I know that I'm not supposed to write Unicode, but it's nice for the framework to let me know when I'm doing something wrong. It already does that if the response is unchunked, so I propose to have the same behavior for when the response is chunked.

Changed 5 years ago by msabramo

Changed 5 years ago by msabramo

Proposed patch (output from "diff -F def -u abstract.py.orig abstract.py")

comment:3 Changed 5 years ago by msabramo

  • Cc msabramo@… added

comment:4 Changed 4 years ago by msabramo

Any update on this? Is my patch acceptable or does it need some tweaking?

comment:5 Changed 4 years ago by thijs

  • Keywords review added
  • Owner jknight deleted

Looks like the 'review' keyword was not added to this ticket, fixing that now. See ReviewProcess for more info.

comment:6 Changed 4 years ago by exarkun

  • Keywords review removed
  • Owner set to msabramo

Could you also add a unit test for this change? The fix itself seems reasonable (unless I let myself worry about performance). It should be easy to add a test to twisted/internet/test/test_tcp.py. And such a test will automatically be applied to all reactor implementations.

Changed 4 years ago by msabramo

twisted/internet/test/test_filedescriptor.py

comment:7 Changed 4 years ago by msabramo

OK, so I added a test (see the attachment:"test_filedescriptor.py") and here are the results of running it before and after patching with attachment:"twisted.internet.ticket_3896.patch". This is with a fresh checkout of Twisted from svn://svn.twistedmatrix.com/svn/Twisted/trunk:

.../twisted/trunk/twisted/internet$ trial test_filedescriptor
test_filedescriptor
  FileDescriptorTests
    test_writeSequenceWithUnicodeRaisesException ...                     [FAIL]
    test_writeWithUnicodeRaisesException ...                               [OK]

===============================================================================
[FAIL]
Traceback (most recent call last):
  File ".../twisted/trunk/test_filedescriptor.py", line 27, 
in test_writeSequenceWithUnicodeRaisesException
    self.assertRaises(TypeError, fileDescriptor.writeSequence, [u'foo', u'bar', u'baz'])
twisted.trial.unittest.FailTest: TypeError not raised (None returned)

test_filedescriptor.FileDescriptorTests.test_writeSequenceWithUnicodeRaisesException
-------------------------------------------------------------------------------
Ran 2 tests in 0.011s

FAILED (failures=1, successes=1)
.../twisted/trunk/twisted/internet$ patch < twisted.internet.ticket_3896.patch 
patching file abstract.py
Hunk #1 succeeded at 300 (offset 93 lines).
.../twisted/trunk/twisted/internet$ trial test_filedescriptor
test_filedescriptor
  FileDescriptorTests
    test_writeSequenceWithUnicodeRaisesException ...                       [OK]
    test_writeWithUnicodeRaisesException ...                               [OK]

-------------------------------------------------------------------------------
Ran 2 tests in 0.294s

PASSED (successes=2)

I know that twisted/internet/test/test_tcp.py was mentioned as a place to add the test but I looked there and it looked like that file had more TCP-specific tests and this seemed more general as doing input and output of unicode without a specified encoding on any file descriptor seems nonsensical. In fact, it was already disallowed in the write method; just not in writeSequence. In any case, if twisted/internet/test/test_tcp.py or any other place is a better place for this test, then let me know and I'll revise my test.

comment:8 follow-up: Changed 4 years ago by exarkun

See also #4782.

comment:9 Changed 4 years ago by msabramo

  • Author set to msabramo
  • Component changed from web to core
  • Keywords review added
  • Owner msabramo deleted

comment:10 in reply to: ↑ 8 Changed 4 years ago by msabramo

Replying to exarkun:

See also #4782.

I just tested with

>>> twisted.version
Version('twisted', 10, 1, 0)  # (SVN r30361)

along with my patch and both the redirect and the cookie with unicode cases mentioned in #4782 lead to an exception as desired:

exceptions.TypeError: Data must not be unicode

The patch deals with data at the TCP level rather than the HTTP level, so it should find all sorts of improper uses of unicode. Theoretically it should be useful for other protocols besides HTTP.

comment:11 Changed 4 years ago by exarkun

  • Author changed from msabramo to msabramo, exarkun
  • Branch set to branches/request-write-unicode-3896

(In [30463]) Branching to 'request-write-unicode-3896'

comment:12 Changed 4 years ago by exarkun

(In [30464]) Apply patch and add test module

refs #3896

comment:13 Changed 4 years ago by exarkun

  • Keywords review removed
  • Owner set to msabramo

Thanks for your work on this msabramo. I agree that this applies to more than just TCP. A downside of not making a reactorbuilder-style test for these, though, is that any reactor that happens not to use twisted.internet.abstract to implement writeSequence won't get any test coverage from the test. For example, IOCP reactor has its own writeSequence implementation which also incorrectly accepts unicode. This test won't catch that bug though. So I think it's probably a good idea to write this in the reactorbuilder style and against TCP (or perhaps against TCP, UNIX, and SSL, if you want to be very thorough).

I checked your code into the branch mentioned a couple comments above. Please make future patches against that branch. Thanks!

comment:14 Changed 3 years ago by exarkun

See also #5249

comment:15 Changed 3 years ago by msabramo

  • Cc msabramo added

Looking at adding a test to twisted/internet/test/test_tcp.py, but I think I need a bit of guidance. That file has reactor tests? Reactors by and large don't seem to have a writeSequence method (maybe the IOCP reactor does but those tests are skipped on my system, probably because I don't have the proper dependencies installed). No testable reactor on my system has a writeSequence method. My understanding is that reactors have protocol factories registered, which create protocol objects, which have transport objects, which have write and writeSequence methods. Not sure if I'm supposed to set up all that machinery to test writeSequence or if there's something simpler that just tests the few reactors that have their own writeSequence?

Hope that made some semblance of sense? :-)

comment:16 Changed 3 years ago by msabramo

  • Keywords review added
  • Owner changed from msabramo to exarkun

comment:17 Changed 3 years ago by msabramo

Oh, duh, IOCP is Windows-only.

comment:18 Changed 3 years ago by exarkun

  • Author changed from msabramo, exarkun to msabramo
  • Keywords review removed
  • Owner changed from exarkun to msabramo

It sounds like your understanding is correct. The conclusion you reached, that you have to set up a bunch of stuff in order to get a transport, which actually has the writeSequence method, to use in the test is correct.

It's true that that's a lot of setup for an otherwise quite simple test (call one method, assert that it raises an exception). It's the best we've managed to invent, though. :) So far, the programming overhead has seemed worthwhile for the feature provided - the test can be written once and then run against an arbitrary reactor implementation.

Fortunately this is a bit easier now, as someone has added twisted.internet.test.test_tcp.WriteSequenceTests which should at least serve as an example of what to do to set up the transport, and which may even offer some re-usable code.

comment:19 follow-up: Changed 3 years ago by msabramo

twisted.internet.test.test_tcp.WriteSequenceTests is present on trunk but not the branch. Do you think we could merge from trunk to branches/request-write-unicode-3896 and not get a ton of breakage?

comment:20 in reply to: ↑ 19 Changed 3 years ago by msabramo

Replying to msabramo:

twisted.internet.test.test_tcp.WriteSequenceTests is present on trunk but not the branch. Do you think we could merge from trunk to branches/request-write-unicode-3896 and not get a ton of breakage?

My subversion-fu is likely to be rusty, but this looks less it might be less than straightforward:

 svn merge --dry-run svn://svn.twistedmatrix.com/svn/Twisted/trunk svn://svn.twistedmatrix.com/svn/Twisted/branches/request-write-unicode-3896
--- Merging differences between repository URLs into '.':
   C win32
   C twisted/test/test_extensions.py
   C twisted/test/test_zshcomp.py
   C twisted/python/dxprofile.py
   C twisted/python/dispatch.py
   C twisted/python/_twisted_zsh_stub
   C twisted/python/test/test_shellcomp.py
   C twisted/python/_shellcomp.py
   C twisted/python/twisted-completion.zsh
   C twisted/python/test/modules_helpers.py
   C twisted/python/test/test_zshcomp.py
   C twisted/python/zsh/_websetroot
...

comment:21 Changed 3 years ago by msabramo

$ svn merge --dry-run svn://svn.twistedmatrix.com/svn/Twisted/trunk
...
Summary of conflicts:
  Text conflicts: 50
  Property conflicts: 1
  Tree conflicts: 333
$ svn merge --dry-run svn://svn.twistedmatrix.com/svn/Twisted/trunk/twisted/internet/test/test_tcp.py Twisted/twisted/internet/test/test_tcp.py
--- Merging r23274 through r23491 into 'Twisted/twisted/internet/test/test_tcp.py':
C    Twisted/twisted/internet/test/test_tcp.py
--- Merging r23492 through r23632 into 'Twisted/twisted/internet/test/test_tcp.py':
C    Twisted/twisted/internet/test/test_tcp.py
--- Merging r23633 through r32926 into 'Twisted/twisted/internet/test/test_tcp.py':
C    Twisted/twisted/internet/test/test_tcp.py
Summary of conflicts:
  Text conflicts: 3
...

I could manually copy the WriteSequenceTests stuff into the branch, but that might make the history messy and might make later merges harder.

comment:22 Changed 3 years ago by exarkun

Oops, sorry. I should have realized WriteSequenceTests wasn't in the branch and that that would make things tricky.

What we do with svn for this case is effectively rebase the branch onto trunk. I'll do that now.

comment:23 Changed 3 years ago by exarkun

  • Author changed from msabramo to msabramo, exarkun
  • Branch changed from branches/request-write-unicode-3896 to branches/request-write-unicode-3896-2

(In [32927]) Branching to 'request-write-unicode-3896-2'

comment:24 Changed 3 years ago by exarkun

(In [32928]) Merge forward to pick up new WriteSequenceTests from trunk

refs #3896

Changed 3 years ago by msabramo

Add a ReactorBuilder test for unicode given to writeSequence

comment:25 Changed 3 years ago by msabramo

  • Keywords review added
  • Owner changed from msabramo to exarkun

Have a look at twisted.internet.test.test_tcp.WriteSequenceTests.test_writeSequenceWithUnicodeRaisesException.patch

On OS X 10.6.8:

$ patch -p0 -R < twisted.internet.ticket_3896.patch 
patching file twisted/internet/abstract.py
Hunk #1 succeeded at 324 (offset 117 lines).

$ trial twisted/internet/test/test_tcp.py
twisted.internet.test.test_tcp
...
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/Users/marc/src/twisted_svn/request-write-unicode-3896-2/Twisted/twisted/internet/test/test_tcp.py", line 973, in serverConnected
    self.fail("Reactor %r: Did not raise a TypeError exception for unicode given to writeSequence" % reactor.__class__)
twisted.trial.unittest.FailTest: Reactor <class 'twisted.internet.cfreactor.CFReactor'>: Did not raise a TypeError exception for unicode given to writeSequence

twisted.internet.test.test_tcp.WriteSequenceTests_CFReactor.test_writeSequenceWithUnicodeRaisesException
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/Users/marc/src/twisted_svn/request-write-unicode-3896-2/Twisted/twisted/internet/test/test_tcp.py", line 973, in serverConnected
    self.fail("Reactor %r: Did not raise a TypeError exception for unicode given to writeSequence" % reactor.__class__)
twisted.trial.unittest.FailTest: Reactor <class 'twisted.internet.selectreactor.SelectReactor'>: Did not raise a TypeError exception for unicode given to writeSequence

twisted.internet.test.test_tcp.WriteSequenceTests_SelectReactor.test_writeSequenceWithUnicodeRaisesException
-------------------------------------------------------------------------------
Ran 165 tests in 1.866s

FAILED (skips=94, errors=2, successes=69)

$ patch -p0 < twisted.internet.ticket_3896.patch 
patching file twisted/internet/abstract.py
Hunk #1 succeeded at 324 (offset 117 lines).

$ trial twisted/internet/test/test_tcp.py
twisted.internet.test.test_tcp
...
-------------------------------------------------------------------------------
Ran 165 tests in 0.948s

PASSED (skips=94, successes=71)

I don't have a Windows machine with a working compiler so that I can install Twisted, so I haven't tested the IOCPReactor, which was one of the main reasons for adding the ReactorBuilder test (my understanding is that the IOCPReactor test should fail).

comment:26 Changed 3 years ago by msabramo

I wonder if the Twisted Buildbot supports Buildbot try for non-committers? That would be a nice way for folks to test their patches against all the supported platforms.

comment:27 Changed 3 years ago by exarkun

Sorry, it doesn't. We can't accept arbitrary executable code from the internet to run on our slaves.

comment:28 Changed 3 years ago by msabramo

Yeah Try with no Auth would be not so good. I guess it could maybe work if there was some way to give temporary credentials. Like maybe something that gets created by the combinator tool when a ticket branch is created. But maybe this isn't enough of an issue to warrant the complexity. Just thinking out loud here. Though it would be much easier and safer to just have a committer do the try.

A totally different approach would be some kind of VM appliance. Windows is tricky - it seems to be a bit of work to get a Twisted dev environment going on Windows (e.g.: my Windows VM doesn't have Visual Studio 2003 or even disk space to install it). But having a VM appliance is probably a licensing issue.

I guess it might be best to have a Windows-focused dev try the patch or commit it to the branch and see what Buildbot says.

I could try to set up a Windows env, but that will take me a while.

comment:29 Changed 3 years ago by exarkun

  • Owner exarkun deleted

I could try to set up a Windows env, but that will take me a while.

You can let a committer do the Windows testing. I know how unpleasant getting a Windows environment set up is.

comment:30 Changed 3 years ago by antoine

For the record, when you witness a bug in Python or its standard library, don't hesitate to report it at http://bugs.python.org

In this case, the bug was first filed and fixed 5 years ago, but the fix was then reverted for a compatibility issue:
http://bugs.python.org/issue1548891

comment:31 Changed 3 years ago by exarkun

(In [33000]) Apply twisted.internet.test.test_tcp.WriteSequenceTests.test_writeSequenceWithUnicodeRaisesException.patch

refs #3896

comment:32 Changed 3 years ago by exarkun

  • Keywords review removed
  • Owner set to exarkun

Thanks for sticking with this one msabramo.

  1. in twisted/internet/test/test_filedescriptor.py
    1. We don't put dates in copyright statements anymore
    2. Since these are whitebox tests, bypassing the real public APIs normally used to create a FileDescriptor, it's probably worth noting that in the module docstring.
    3. FileDescriptorTests itself should have a docstring, and maybe narrow its focus a little - say, to just deal with writeSequence.
    4. In the test method docstrings, don't begin with "Test that". Just write about the behavior being tested.
    5. Keep lines under 80 columns when possible.
  2. in twisted/internet/test/test_tcp.py
    1. TestCase.assertRaises seems like it applies pretty well to serverConnected in test_writeSequenceWithUnicodeRaisesException - seems like it should be used, rather than duplicating its functionality with try/except. In case you weren't sure, assertRaises returns the exception, so you can still make an assertion about its string representation.
    2. The test should add an explicit log.err to the callback chain so that it isn't relying on garbage collection to have the Deferred failure be reported (particularly since that is the only thing that will cause the test to fail).
  3. The new test_tcp.py test does indeed fail on Windows with IOCP. The fix is simple though, it's the same as the fix for twisted/internet/abstract.py, added to twisted/internet/iocpreactor/abstract.py.
  4. There should be a news fragment describing the enhancement

These are pretty minor points - the basic change and the tests look good to me. I'll make the changes and, if buildbot is happy, merge to trunk. Thanks again.

comment:33 Changed 3 years ago by exarkun

(In [33001]) Coding standard tweaks. Fix IOCP too.

refs #3896

comment:34 Changed 3 years ago by exarkun

(In [33001]) Coding standard tweaks. Fix IOCP too.

refs #3896

comment:35 Changed 3 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [33003]) Merge request-write-unicode-3896-2

Author: msabramo
Reviewer: exarkun
Fixes: #3896

Check for unicode in the list passed to ITransport.writeSequence and raise TypeError,
as the ITransport.write implementations do, if any is found.

comment:36 Changed 2 years ago by exarkun

This was sort of a duplicate of #1510.

Note: See TracTickets for help on using tickets.