[Twisted-Python] raw xml to element, char encoding/decoding error : OT

Gabriel Rossetti gabriel.rossetti at arimaz.com
Wed Feb 18 08:41:05 EST 2009


Gabriel Rossetti wrote:
> Hello,
>
> I wrote some code to transform a raw XML string into a domish.Element, 
> and I keep on getting char encoding/decoding errors :
>
>    class __RawXmlToElement(object):
>
>        def __call__(self, s):
>            self.result = None
>            def onStart(el):
>                self.result = el
>            def onEnd():
>                pass
>            def onElement(el):
>                self.result.addChild(el)
>
>            parser = domish.elementStream()
>            parser.DocumentStartEvent = onStart
>            parser.ElementEvent = onElement
>            parser.DocumentEndEvent = onEnd
>            tmp = domish.Element(("", "s"))
>            tmp.addRawXml(s)
>            parser.parse(tmp.toXml())
>
>            return self.result.firstChildElement()
>
>    rawXmlToElement = __RawXmlToElement()
>
>
> Here's a test raw XML string :
>
>     >>> u"<t>reçu</t>"
>    u'<t>re\xe7u</t>'
>
>     >>> u"<t>reçu</t>".encode("utf-8")
>    '<t>re\xc3\xa7u</t>'
>
>     >>> "<t>reçu</t>"
>    '<t>re\xc3\xa7u</t>'
>
>
> As you can see, my system encodes strings in UTF-8. I tried the
> following, but I keep getting errors:
>
>     >>> rawXmlToElement("<t>reçu</t>")
>    raw xml adder error : 'ascii' codec can't decode byte 0xc3 in
>    position 5: ordinal not in range(128)
>
>     >>> rawXmlToElement(u"<t>reçu</t>")
>    parser error : 'ascii' codec can't encode character u'\xe7' in
>    position 8: ordinal not in range(128)
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>      File "<stdin>", line 26, in __call__
>    AttributeError: 'NoneType' object has no attribute 'firstChildElement'
>
>     >>> rawXmlToElement(unicode("<t>reçu</t>", "utf-8"))
>    parser error : 'ascii' codec can't encode character u'\xe7' in
>    position 8: ordinal not in range(128)
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>      File "<stdin>", line 26, in __call__
>    AttributeError: 'NoneType' object has no attribute 'firstChildElement'
>
>
> If I try it with ASCII-encodable characters, it works correctly:
>
>     >>> rawXmlToElement("<t>toto</t>").toXml()
>    u'<t>toto</t>'
>
>     >>> rawXmlToElement(u"<t>toto</t>").toXml()
>    u'<t>toto</t>'
>
>     >>> rawXmlToElement(unicode("<t>toto</t>", "utf-8")).toXml()
>    u'<t>toto</t>'
>
>
> Does anyone have an idea of what I'm doing wrong here? Thank you!
>
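Both messages quoted above name the 'ascii' codec, which suggests that a byte string and a unicode string are being mixed somewhere and converted implicitly with the interpreter's default encoding. A minimal sketch of making that conversion explicit follows. The helper name to_text is hypothetical, and whether feeding its result to the code above avoids the parser error has not been verified here.

```python
# -*- coding: utf-8 -*-

# UTF-8 encoded bytes, the same sequence shown in the quoted session.
raw = b"<t>re\xc3\xa7u</t>"


def to_text(data, encoding="utf-8"):
    """Return text, decoding byte strings with an explicit codec
    instead of relying on the interpreter's default encoding."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


# Explicit UTF-8 decoding succeeds and yields the expected text.
text = to_text(raw)

# Decoding the same bytes as ASCII fails on the first byte of the
# two-byte sequence for the c-cedilla character.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
```

The ASCII failure is reported at position 5, the same byte and offset as in the "raw xml adder error" message above.
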
I think this is a Python environment problem and not a Twisted problem. 
If I run the attached example in Eclipse, it works; if I run it from a 
terminal, it doesn't. This is now off-topic, but if anyone has an idea, 
I'd be grateful. I'm also going to post this on the Python mailing list.

Thank you,
Gabriel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xml_parser_test.py
Type: text/x-python
Size: 812 bytes
Desc: not available
Url : http://twistedmatrix.com/pipermail/twisted-python/attachments/20090218/b55f0a66/attachment.py 

