Opened 10 years ago

Last modified 11 months ago

#2490 enhancement new

xpath.queryForString fails on unicode XML

Reported by: akrherz Owned by:
Priority: low Milestone:
Component: words Keywords: unicode xpath
Cc: ralphm Branch:
Author:

Description (last modified by therve)

I'm getting this traceback:

  File "./bot-twisted.py", line 133, in processMessage
    bstring = xpath.queryForString('/message/body', elem)
  File "/usr/lib/python2.3/site-packages/twisted/words/xish/xpath.py", line 275, in queryForString
    return internQuery(xpathstr).queryForString(elem)
  File "/usr/lib/python2.3/site-packages/twisted/words/xish/xpath.py", line 242, in queryForString
    self.baseLocation.queryForString(elem, result)
  File "/usr/lib/python2.3/site-packages/twisted/words/xish/xpath.py", line 115, in queryForString
    self.childLocation.queryForString(c, resultbuf)
  File "/usr/lib/python2.3/site-packages/twisted/words/xish/xpath.py", line 117, in queryForString
    resultbuf.write(str(elem))
exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 177: ordinal not in range(128)

I'm using Twisted 2.5.0 (TwistedWords-0.5.0). Thanks!
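
The last frame is the relevant one: under Python 2, calling `str()` on a unicode object encodes it implicitly with the ASCII codec. Below is a minimal sketch of the same failure, written for Python 3, where the encode step has to be spelled out. The sample text is an assumption; only the `\xa0` character and the `ascii` codec come from the traceback.

```python
# Python 2's str(u"...") implicitly did the equivalent of .encode("ascii").
# Python 3 has no implicit step, so the encode is written out here.
body = "price:\xa0100"  # hypothetical message body containing U+00A0

try:
    body.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Slice out the character the codec rejected, using the exception's offsets.
failed_char = body[error.start:error.end]
```

Any text node containing a character outside the ASCII range would presumably trigger the same error on that code path.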

Attachments (2)

test_xpath_unicode.py (649 bytes) - added by Yury Yurevich 10 years ago.
unit-test, tests unicode in content and path
xpath.diff (1.6 KB) - added by Yury Yurevich 10 years ago.
quick&dirty patch for bug #2490


Change History (14)

comment:1 Changed 10 years ago by Yury Yurevich

Keywords: unicode xpath added
Priority: normal → high

I'm getting this bug too.

Changed 10 years ago by Yury Yurevich

Attachment: test_xpath_unicode.py added

unit-test, tests unicode in content and path

Changed 10 years ago by Yury Yurevich

Attachment: xpath.diff added

quick&dirty patch for bug #2490

comment:2 Changed 10 years ago by ralphm

Cc: ralphm added

comment:3 Changed 10 years ago by akrherz

thanks j2a, this patch seems to be working for me :)

comment:4 Changed 9 years ago by akrherz

Any updates on fixing this bug for the next Twisted release? Thanks!

comment:5 Changed 9 years ago by therve

Description: modified (diff)

comment:6 Changed 9 years ago by therve

xpathparser.py must not be modified; xpathparser.g should be changed instead. Also, the provided patch only fixes one of the tests for me.

I have no idea if the approach is correct or not.

comment:7 Changed 9 years ago by ralphm

Milestone: Twisted-8.2

comment:8 Changed 8 years ago by ralphm

Milestone: Twisted-8.2
Priority: high → low
Type: defect → enhancement

As therve said, this would require changing the xpathparser.g file from which xpathparser.py is generated. However, YAPPS 2 does not support unicode in its scanner, so we would need to find a way around that. A possibility might be patching the scanner class later on in the file.

Since PyXML also uses YAPPS for parsing XPath queries, we might learn from that. They couldn't use the generated Scanner as-is, either, by the way.

comment:9 Changed 8 years ago by powdahound

I noticed that another (ghetto) workaround for this is to call xpath.queryForStringList('/message/body', stanza)[0] since that flow never casts the data using str().
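
To illustrate why the list-based call sidesteps the error, here is a toy model of the two code paths. It is not Twisted code: the function names `query_for_string` and `query_for_string_list` and the sample text are hypothetical stand-ins, and the explicit `.encode("ascii")` imitates what Python 2's `str()` did implicitly.

```python
import io


def query_for_string(text_nodes):
    """Model of the failing path: each node is coerced to ASCII before
    being written to a buffer, as Python 2's str() did implicitly."""
    buf = io.BytesIO()
    for node in text_nodes:
        buf.write(node.encode("ascii"))  # raises on non-ASCII text
    return buf.getvalue()


def query_for_string_list(text_nodes):
    """Model of the workaround path: nodes are collected unchanged."""
    return list(text_nodes)


nodes = ["caf\xe9\xa0au lait"]  # hypothetical <body> text

try:
    query_for_string(nodes)
    string_path_ok = True
except UnicodeEncodeError:
    string_path_ok = False

# Taking the first list entry returns the text with no ASCII coercion.
first_body = query_for_string_list(nodes)[0]
```

In this model the string path fails on the first non-ASCII character, while the list path hands the text back untouched. Note that indexing with `[0]` assumes the query matched at least one node.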

comment:10 Changed 6 years ago by <automation>

Owner: ralphm deleted

comment:11 Changed 11 months ago by Ralph Meijer <ralphm@…>

In 5980a7b:

Resolve str/unicode ambiguities, add tests, exceptions on syntax errors.

This also addresses #2490 for the cases where non-ascii code points are
used in string literals with attribute or function matches. Matching
element names that are non-ascii still isn't supported.

comment:12 Changed 11 months ago by Ralph Meijer <ralphm@…>

In 3c7ef07:

Resolve str/unicode ambiguities, add tests, exceptions on syntax errors.

This also addresses #2490 for the cases where non-ascii code points are
used in string literals with attribute or function matches. Matching
element names that are non-ascii still isn't supported.
