[Twisted-Python] raw xml to element, char encoding/decoding error

Gabriel Rossetti gabriel.rossetti at arimaz.com
Wed Feb 18 06:14:01 EST 2009


Hello,

I wrote some code to transform a raw XML string into a domish.Element,
but I keep getting character encoding/decoding errors:

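    # import needed by the snippet
    from twisted.words.xish import domish

    class __RawXmlToElement(object):

        def __call__(self, s):
            self.result = None
            def onStart(el):
                # root element of the wrapper document
                self.result = el
            def onEnd():
                pass
            def onElement(el):
                # each direct child of the root, once it is complete
                self.result.addChild(el)

            parser = domish.elementStream()
            parser.DocumentStartEvent = onStart
            parser.ElementEvent = onElement
            parser.DocumentEndEvent = onEnd
            # wrap the raw XML in a dummy <s> element, serialize it,
            # then parse the serialized form back into elements
            tmp = domish.Element(("", "s"))
            tmp.addRawXml(s)
            parser.parse(tmp.toXml())

            return self.result.firstChildElement()

    rawXmlToElement = __RawXmlToElement()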


Here's a raw XML test string:

    >>> u"<t>reçu</t>"
    u'<t>re\xe7u</t>'

    >>> u"<t>reçu</t>".encode("utf-8")
    '<t>re\xc3\xa7u</t>'

    >>> "<t>reçu</t>"
    '<t>re\xc3\xa7u</t>'
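
For reference, the first value above is text and the other two are the UTF-8 bytes of that text. Here is a minimal sketch of that relationship, written with explicit encode/decode calls so that it should behave the same on Python 2.6+ and Python 3. The variable names are only illustrative.

```python
# Text: the c-cedilla is one code point, U+00E7.
text = u"<t>re\xe7u</t>"

# Bytes: UTF-8 stores that one code point as two bytes, 0xc3 0xa7.
raw = text.encode("utf-8")

text_length = len(text)   # counted in characters
raw_length = len(raw)     # counted in bytes, one more than the text

# Decoding with the right codec gives the original text back.
round_trip = raw.decode("utf-8")
assert round_trip == text
```

The one-unit difference in length is why the two error messages further down report different positions for the same character.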


As you can see, my system encodes strings in UTF-8. I tried the
following, but I keep getting errors:

    >>> rawXmlToElement("<t>reçu</t>")
    raw xml adder error : 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)

    >>> rawXmlToElement(u"<t>reçu</t>")
    parser error : 'ascii' codec can't encode character u'\xe7' in position 8: ordinal not in range(128)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 26, in __call__
    AttributeError: 'NoneType' object has no attribute 'firstChildElement'

    >>> rawXmlToElement(unicode("<t>reçu</t>", "utf-8"))
    parser error : 'ascii' codec can't encode character u'\xe7' in position 8: ordinal not in range(128)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 26, in __call__
    AttributeError: 'NoneType' object has no attribute 'firstChildElement'
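
Both messages look like what Python 2 prints when it converts between str and unicode implicitly with the ASCII codec. That is an assumption about where they come from, not something checked against the Twisted source. The sketch below raises the same two exceptions with explicit calls, so it also runs on Python 3. `describe` is a hypothetical helper, and the `<s>` wrapper is assumed to be what tmp.toXml() returns.

```python
# UTF-8 bytes, as passed in the first failing call.
raw = u"<t>re\xe7u</t>".encode("utf-8")

# Text wrapped in the dummy <s> element (assumed shape of tmp.toXml()).
wrapped = u"<s><t>re\xe7u</t></s>"

def describe(func):
    """Call func; return (error name, offending position) or None."""
    try:
        func()
    except (UnicodeDecodeError, UnicodeEncodeError) as exc:
        return (type(exc).__name__, exc.start)
    return None

# Bytes -> text through ASCII fails on the first non-ASCII byte.
decode_error = describe(lambda: raw.decode("ascii"))

# Text -> bytes through ASCII fails on the first non-ASCII character.
encode_error = describe(lambda: wrapped.encode("ascii"))

assert decode_error == ("UnicodeDecodeError", 5)
assert encode_error == ("UnicodeEncodeError", 8)
```

The positions 5 and 8 match the ones in the messages above, which fits the idea that the first failure happens on the raw input and the second on the wrapped, serialized form.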


If I try it with ASCII-encodable characters, it works correctly:

    >>> rawXmlToElement("<t>toto</t>").toXml()
    u'<t>toto</t>'

    >>> rawXmlToElement(u"<t>toto</t>").toXml()
    u'<t>toto</t>'

    >>> rawXmlToElement(unicode("<t>toto</t>", "utf-8")).toXml()
    u'<t>toto</t>'
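
A likely reason the ASCII-only input goes through is that ASCII text encodes to the same bytes under both ASCII and UTF-8, so any implicit ASCII conversion along the way is harmless. A small sketch of that check, where `is_ascii_safe` is a hypothetical helper and not part of Twisted:

```python
ascii_text = u"<t>toto</t>"
accented_text = u"<t>re\xe7u</t>"

def is_ascii_safe(text):
    """Return True if text can pass through the ASCII codec unchanged."""
    try:
        return text.encode("ascii").decode("ascii") == text
    except UnicodeEncodeError:
        return False

ascii_ok = is_ascii_safe(ascii_text)        # True: nothing to lose
accented_ok = is_ascii_safe(accented_text)  # False: c-cedilla is not ASCII

# For ASCII-only text the ASCII and UTF-8 encodings are byte-identical.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
```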


Does anyone have an idea on what I'm doing wrong here? Thank you!



