[Twisted-web] nevow 0.3 + encoding

James Y Knight foom at fuhm.net
Mon Sep 27 21:05:19 MDT 2004


On Sep 28, 2004, at 12:47 AM, Bartek Bargiel wrote:
> My first comment after having installed Nevow 0.3: my Polish cp1250
> encoding chars are not displayed:
>
> - htmlfile replaces them all with question marks (it works OK when
> using UTF-8 encoding)
>
> - xmlfile works fine when reading the text from file but again it
> fails to display national charset when it gets it returned from Python
> code
>
> Maybe it's me doing something wrong somewhere, I'm getting more and more
> confused with that encoding stuff :)

Firstly, Nevow is and always will be designed to use only unicode 
internally. Doing anything else at this point in time is complete 
madness.

This has a few consequences:
1) You should always use unicode strings in your Python code if they
contain any non-ASCII characters,
   e.g. u"새카만 커피 oh no~ 새하얀 우유 oh yes~"
Additionally, you have to make sure your source file's encoding is
declared properly <http://www.python.org/peps/pep-0263.html>, or else use
unicode escapes instead of the actual characters,
   e.g. u"\uc0c8\uce74\ub9cc \ucee4\ud53c oh no~ \uc0c8\ud558\uc580 \uc6b0\uc720 oh yes~"
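A small sketch of both options, assuming Python 3 (where the u prefix is accepted but optional; the Python 2 interpreters of the Nevow 0.3 era required it). The first line is a PEP 263 coding declaration; with it in place, the literal and the escaped form produce the same unicode string.

```python
# -*- coding: utf-8 -*-
# The line above is the PEP 263 declaration: it tells Python how the
# bytes of this source file are encoded.

# Literal form: only correct if the coding declaration matches the file.
literal = u"새카만 커피 oh no~ 새하얀 우유 oh yes~"

# Escaped form: pure-ASCII source, independent of the file's encoding.
escaped = u"\uc0c8\uce74\ub9cc \ucee4\ud53c oh no~ \uc0c8\ud558\uc580 \uc6b0\uc720 oh yes~"

print(literal == escaped)  # True
```

The escaped form is more verbose but survives any editor or transfer that mangles non-ASCII bytes.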

2) xmlfile and htmlfile must decode from the file's encoding to
unicode. However, htmlfile is completely broken in this regard: it does
not decode the file's contents at all. If the file happens to be in UTF-8
already, it will appear to "work", but only because htmlfile returns byte
strings, which are passed through to the client without being encoded.
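A minimal model of that failure, assuming the response is labelled as UTF-8 (item 3 notes that is currently the only option). The passthrough function is hypothetical and stands in for htmlfile's "never decode" behaviour; it is not Nevow code. UTF-8 bytes survive the trip, while cp1250 bytes are not valid UTF-8 and decode to replacement characters, which is one plausible source of the question marks reported above.

```python
# Hypothetical stand-in for htmlfile's behaviour: the file's bytes are
# never decoded, so they reach the client exactly as stored on disk.
def passthrough(file_bytes: bytes) -> bytes:
    return file_bytes  # no decode, no encode

text = "zażółć"  # Polish sample text

utf8_file = text.encode("utf-8")
cp1250_file = text.encode("cp1250")

# The client is told the body is UTF-8, so it decodes it that way.
print(passthrough(utf8_file).decode("utf-8", errors="replace"))    # zażółć
print(passthrough(cp1250_file).decode("utf-8", errors="replace"))  # garbled: contains U+FFFD
```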

This really ought to be fixed; people have lots of pre-existing files
in strange encodings, and UTF-8 editor support isn't quite all there
yet, either. htmlfile should sniff the META content-type tag (as a
browser would), and also allow the developer to specify a default
encoding in the htmlfile constructor.
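One way the proposed behaviour could look, as a rough sketch: decode_html is a hypothetical helper, not part of Nevow, and the 1024-byte scan window and the regular expression are assumptions rather than a full browser-grade sniffer. It prefers a META charset Python recognises and otherwise falls back to a developer-supplied default, which is the role the suggested constructor argument would play.

```python
import codecs
import re

# Finds charset=NAME inside a <meta> tag, covering both
#   <meta http-equiv="Content-Type" content="text/html; charset=cp1250">
#   <meta charset="cp1250">
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def decode_html(data: bytes, default_encoding: str = "utf-8") -> str:
    """Hypothetical helper (not part of Nevow): decode raw HTML bytes.

    Uses the META charset if one appears in the first 1024 bytes and
    Python knows the codec; otherwise falls back to default_encoding.
    """
    encoding = default_encoding
    match = META_CHARSET.search(data[:1024])
    if match:
        candidate = match.group(1).decode("ascii")
        try:
            codecs.lookup(candidate)  # raises LookupError for unknown names
            encoding = candidate
        except LookupError:
            pass  # unrecognised charset name: keep the default
    return data.decode(encoding)


source_html = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=windows-1250"></head>'
    "<body>Zażółć gęślą jaźń</body></html>"
)
page = source_html.encode("cp1250")  # bytes as they would sit on disk
print(decode_html(page))
```

A file without any META tag is decoded with default_encoding, so a developer with a directory of cp1250 templates could pass "cp1250" once instead of editing every file.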

Fortunately, xmlfile does work right: put a standard
<?xml version="1.0" encoding="cp1250"?> declaration at the top of the
file and it'll do the right thing.
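To illustrate why the declaration is enough, here is the same idea with Python's own xml.etree.ElementTree rather than Nevow's xmlfile: an XML parser reads the encoding from the declaration and hands back unicode text. This is an analogy for the mechanism, not a claim about xmlfile's internals.

```python
import xml.etree.ElementTree as ET

text = "Zażółć gęślą jaźń"

# Raw bytes as they might sit on disk: a cp1250 file whose XML
# declaration names its encoding.
raw = ('<?xml version="1.0" encoding="cp1250"?><p>%s</p>' % text).encode("cp1250")

root = ET.fromstring(raw)  # parser decodes cp1250 using the declaration
print(root.text == text)  # True
```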

3) When writing the response to the client, nevow must encode from 
unicode into the proper response encoding.  Currently there is no way 
to specify any response encoding besides UTF-8. I do not believe this 
needs to be (or even should be) fixed: any browser that cannot handle 
UTF-8 encoding is utterly worthless, and I don't think there are any 
browsers that worthless still in use. At least I hope there aren't.
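A minimal sketch of that final step, assuming a plain function in place of Nevow's real output machinery (encode_response is hypothetical): the page stays unicode until the very end, is encoded exactly once to UTF-8, and the Content-Type header says so.

```python
def encode_response(page: str) -> tuple[dict, bytes]:
    """Hypothetical sketch (not Nevow's code) of the output step.

    The page exists as unicode (str) until this point and is encoded
    exactly once, to UTF-8, with a matching charset in the header.
    """
    body = page.encode("utf-8")
    headers = {
        # Declaring the charset tells the browser how to decode the body.
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),  # length in bytes, not characters
    }
    return headers, body


headers, body = encode_response("<p>Zażółć gęślą jaźń</p>")
print(headers["Content-Type"])  # text/html; charset=utf-8
```

Content-Length is computed after encoding because non-ASCII characters take more than one byte in UTF-8.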

James



