[Twisted-Python] Encoding patch for twisted.web.microdom.Document

Cory Dodt corydodt at yahoo.com
Mon Aug 18 12:14:20 EDT 2003

Francisco Miguel Colaço wrote:

>  I have patched that class, in order to make the xml opening as:
>  <?xml version="1.0" encoding="utf-8" ?>
This patch sets the header without checking the actual character 
encoding of the document, which basically turns the current problem 
inside out: today the character encoding can be utf-8 while the header 
doesn't say so, and with the patch the header can say utf-8 while the 
content is something else.  Microdom does support utf-8 output, by 
checking whether the strings it contains are UnicodeType.  That check 
leaks all over the place, and you would get into trouble with your 
patch if you passed in encoding='utf-8' and then provided microdom 
with an 8-bit string.
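
To make the failure concrete, here is a rough sketch in present-day 
Python, where an "8-bit string" is a bytes object.  It does not touch 
microdom at all; the header, the element name and the is_valid_utf8 
helper are made up for illustration.  It only shows that a header 
claiming utf-8 is false as soon as the body bytes came from some other 
8-bit encoding.

```python
# Hypothetical illustration only; none of this is microdom code.

def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as utf-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

name = "Cola\u00e7o"  # contains one non-ASCII character (c with cedilla)
header = b'<?xml version="1.0" encoding="utf-8" ?>'

# An 8-bit string in a legacy encoding, glued under a utf-8 header.
mislabeled = header + b"<author>" + name.encode("latin-1") + b"</author>"

# The same document with the body actually encoded as utf-8.
consistent = header + b"<author>" + name.encode("utf-8") + b"</author>"

mislabeled_ok = is_valid_utf8(mislabeled)
consistent_ok = is_valid_utf8(consistent)
```

Only the second document matches its own header; a parser that trusts 
the declaration would reject the first one.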

There are a few ways to make this less leaky:

- if encoding parameter is something unicodey, make sure self.data is 
UnicodeType anywhere it is written to (i.e. anywhere it appears on the 
left side of an assignment) by raising an exception if it's not
- make sure you can't change encoding to a non-unicode encoding or pass 
a non-unicode encoding to any methods once the document has nodes
- internally convert everything to Unicode even if it is passed in as 
8-bit string, and then use only the encoding parameter to determine how 
writexml should work.
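
The last option could look roughly like the sketch below, written in 
present-day Python.  TextHolder and its input_encoding argument are 
hypothetical stand-ins, not microdom's real classes or signatures; the 
only name borrowed from microdom is writexml.  The assumption is that 
8-bit input arrives with a known encoding, so it can be decoded once on 
the way in and the encoding argument alone decides what is written out.

```python
import io
from xml.sax.saxutils import escape


class TextHolder:
    """Hypothetical stand-in for a text node; not the real microdom class."""

    def __init__(self, data, input_encoding="utf-8"):
        # Normalize on the way in: bytes are decoded exactly once,
        # text strings are stored unchanged.
        if isinstance(data, bytes):
            data = data.decode(input_encoding)
        self.data = data

    def writexml(self, stream, encoding="utf-8"):
        # The encoding argument drives both the declaration and the
        # bytes written, so the two cannot disagree.
        header = '<?xml version="1.0" encoding="%s" ?>' % encoding
        body = "<text>%s</text>" % escape(self.data)
        stream.write(header.encode(encoding))
        stream.write(body.encode(encoding))


# Usage: latin-1 bytes go in, a self-consistent utf-8 document comes out.
buf = io.BytesIO()
TextHolder(b"Cola\xe7o", input_encoding="latin-1").writexml(buf, "utf-8")
output = buf.getvalue()
```

The stricter options above would instead raise an exception in 
__init__ when handed bytes, rather than decoding them.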

As it happens, there's already a bug open on this issue.  Please sign up 
on roundup and continue discussion of this issue here:


Thanks!  I have already pasted your original email and my reply there.

