[Twisted-Python] Twisted FTP: Data must not be unicode

exarkun at twistedmatrix.com
Thu Nov 24 07:07:07 MST 2011


On 01:49 pm, tobias.oberstein at tavendo.de wrote:
>Should I file a bug? If so, any guidelines what to do?

This report isn't sufficiently complete to decide if this is a bug in 
Twisted or in something else.

You really cannot send unicode over a socket without encoding it.  The 
question to consider here is the question of whose responsibility it 
should be to do that encoding in this case.

>[snip]
>
>[autobahn at autobahnhub ~/Twisted]$ svn diff twisted/protocols/ftp.py
>Index: twisted/protocols/ftp.py
>===================================================================
>--- twisted/protocols/ftp.py    (revision 33225)
>+++ twisted/protocols/ftp.py    (working copy)
>@@ -382,7 +382,7 @@
>             self._onConnLost.callback(None)
>
>     def sendLine(self, line):
>-        self.transport.write(line + '\r\n')
>+        self.transport.write(str(line) + '\r\n')

This isn't the correct fix, even if the bug is in Twisted's FTP support.
`str(line)` is the least reliable way to encode a unicode string into a 
byte string.  It has unpredictable behavior (it relies on the 
action-at-a-distance API, `sys.setdefaultencoding`, which doesn't even 
exist most of the time, but which can be used to completely change what 
`str(unicode)` does).
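
To illustrate the failure mode: on Python 2, `str(some_unicode)` encodes 
with `sys.getdefaultencoding()`, which is normally 'ascii'.  The sketch 
below spells that implicit step out as an explicit `.encode("ascii")` so 
that it behaves the same on Python 2 and 3.  The reply text and file name 
are made up for the example.

```python
# A made-up FTP reply containing one non-ASCII character (e with acute).
line = u"226 Transfer complete: caf\u00e9.txt"

# What str(line) does implicitly on Python 2 with the default encoding:
try:
    line.encode("ascii")
    failed = False
except UnicodeEncodeError:
    # Raised as soon as the text contains anything outside ASCII.
    failed = True

# Pure-ASCII text encodes fine, which is why the patch appears to work
# until the first non-ASCII path or message shows up.
ascii_only = u"226 Transfer complete".encode("ascii")
```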

A more correct solution would be `line.encode(someencoding)`.  However, 
looking at `sendLine`, it's clear that the value of `someencoding` is 
not easily decided upon.  Should it be UTF-8?  ASCII with an error 
replacement policy?  cp1252?  Does it depend on the client, or the 
server, or the filesystem encoding, or a user preference?
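
A small sketch of why the choice matters: the same text produces 
different bytes on the wire under each of the candidate encodings 
mentioned above.  The file name is invented for the example, and none of 
these is being proposed as the right answer for `sendLine`.

```python
name = u"caf\u00e9.txt"

utf8 = name.encode("utf-8")                       # bytes: caf\xc3\xa9.txt
cp1252 = name.encode("cp1252")                    # bytes: caf\xe9.txt
ascii_replaced = name.encode("ascii", "replace")  # bytes: caf?.txt

# Three encodings, three different byte strings for the same text.
distinct = len(set([utf8, cp1252, ascii_replaced]))
```

A client that decodes with a different encoding than the server used 
would see a different (or undecodable) file name.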

An even more correct solution would be for `line` to have been encoded 
properly already before it was passed to `sendLine`.  Where did the data 
come from, and why wasn't it encoded already?
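
As a rough sketch of that last option, the code that produces the text 
can encode it and hand `sendLine` bytes only.  Everything here is a 
stand-in, not Twisted's actual code: `FakeTransport` and `LineSender` are 
hypothetical classes that mimic the behavior in the diff, the reply text 
is invented, and UTF-8 is an assumption made purely for illustration.

```python
class FakeTransport(object):
    """Hypothetical stand-in for a transport; records written bytes."""

    def __init__(self):
        self.written = []

    def write(self, data):
        # Refuse text, much like the error in the subject line.
        if not isinstance(data, bytes):
            raise TypeError("Data must not be unicode")
        self.written.append(data)


class LineSender(object):
    """Hypothetical protocol whose sendLine expects bytes, unpatched."""

    def __init__(self, transport):
        self.transport = transport

    def sendLine(self, line):
        self.transport.write(line + b"\r\n")


transport = FakeTransport()
sender = LineSender(transport)

# The caller knows where the path came from, so it decides the encoding.
path = u"/pub/caf\u00e9.txt"
reply = u'257 "%s" created' % (path,)
sender.sendLine(reply.encode("utf-8"))  # assumed encoding

# Passing unencoded text is rejected instead of being silently coerced.
try:
    sender.sendLine(u"200 ok")
    rejected = False
except TypeError:
    rejected = True
```

Keeping the encode step at the point where the text originates means 
`sendLine` never has to guess an encoding.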

Jean-Paul



