[Twisted-Python] Twisted FTP: Data must not be unicode
exarkun at twistedmatrix.com
Thu Nov 24 09:07:07 EST 2011
On 01:49 pm, tobias.oberstein at tavendo.de wrote:
>Should I file a bug? If so, any guidelines what to do?
This report isn't sufficiently complete to decide if this is a bug in
Twisted or in something else.
You cannot send unicode over a socket without first encoding it to
bytes. The question to consider here is whose responsibility that
encoding should be in this case.
>[snip]
>
>[autobahn at autobahnhub ~/Twisted]$ svn diff twisted/protocols/ftp.py
>Index: twisted/protocols/ftp.py
>===================================================================
>--- twisted/protocols/ftp.py (revision 33225)
>+++ twisted/protocols/ftp.py (working copy)
>@@ -382,7 +382,7 @@
> self._onConnLost.callback(None)
>
> def sendLine(self, line):
>- self.transport.write(line + '\r\n')
>+ self.transport.write(str(line) + '\r\n')
This isn't the correct fix, even if the bug is in Twisted's FTP support.
`str(line)` is the least reliable way to encode a unicode string into a
byte string. Its behavior is unpredictable: it depends on the
interpreter-wide default encoding, which the action-at-a-distance API
`sys.setdefaultencoding` can change. That API is normally removed from
`sys` at startup, but where it is used it completely changes what
`str(unicode)` does.
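To make the hazard concrete, here is a minimal sketch that spells out the implicit step. It assumes the default encoding has been left at ASCII, and `implicit_str` is a hypothetical name used only for illustration:

```python
def implicit_str(line, default_encoding="ascii"):
    # Hypothetical stand-in for what str(unicode) does on Python 2:
    # a strict encode using the interpreter-wide default encoding.
    return line.encode(default_encoding)

# Pure-ASCII text happens to work, which hides the problem.
ascii_result = implicit_str(u"226 Transfer complete")

try:
    implicit_str(u"caf\u00e9.txt")
    non_ascii_failed = False
except UnicodeEncodeError:
    # Non-ASCII text fails under the ASCII default.
    non_ascii_failed = True

# If something elsewhere changed the default, the same call would
# produce different bytes instead of failing.
latin1_result = implicit_str(u"caf\u00e9.txt", default_encoding="latin-1")
```

The result of the same call depends on state set far away from the call site, which is why it cannot be relied on.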
A more correct solution would be `line.encode(someencoding)`. However,
looking at `sendLine`, it's clear that the value of `someencoding` is
not easily decided upon. Should it be UTF-8? ASCII with an error
replacement policy? cp1252? Does it depend on the client, or the
server, or the filesystem encoding, or a user preference?
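As a rough illustration of why the choice matters, the sketch below encodes one made-up file name three ways. The sample string is an assumption chosen for the example, not data from the original report:

```python
# A made-up file name containing an e-acute and a euro sign.
line = u"caf\u00e9 \u20ac.txt"

# UTF-8 can represent every character, using multi-byte sequences.
as_utf8 = line.encode("utf-8")

# ASCII with a replacement policy never fails, but loses information.
as_ascii = line.encode("ascii", "replace")

# cp1252 uses one byte per character, but only covers its own repertoire.
as_cp1252 = line.encode("cp1252")
```

All three calls succeed, yet each puts different bytes on the wire, so the peer has to be expecting the same encoding for the name to survive.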
An even more correct solution would be for `line` to have been encoded
properly already before it was passed to `sendLine`. Where did the data
come from, and why wasn't it encoded already?
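A sketch of that last option, with the encoding done by the caller before the line reaches the sending code. `BytesOnlyTransport`, `send_line` and `to_wire` are hypothetical names for illustration, not Twisted APIs, and UTF-8 is only an assumed choice:

```python
class BytesOnlyTransport(object):
    """Hypothetical stand-in for a transport that refuses text."""

    def __init__(self):
        self.written = []

    def write(self, data):
        if not isinstance(data, bytes):
            raise TypeError("Data must not be unicode")
        self.written.append(data)


def send_line(transport, line):
    # Has the same shape as the sendLine in the diff: it expects bytes
    # and does no encoding of its own.
    transport.write(line + b"\r\n")


def to_wire(text, encoding):
    # The caller, which knows where the text came from, picks the
    # encoding explicitly.
    return text.encode(encoding)


transport = BytesOnlyTransport()
send_line(transport, to_wire(u"caf\u00e9.txt", "utf-8"))
```

Keeping the encoding decision in the caller leaves the low-level sending code free of any guess about which encoding is appropriate.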
Jean-Paul