[Twisted-web] Encoding bug in Safari's XMLHttpRequest

David Remahl chmod007 at gmail.com
Sat Feb 12 23:24:39 MST 2005


On Fri, 11 Feb 2005 13:53:50 -0800, Donovan Preston <dp at ulaluma.com> wrote:
> 
> +1 on this; liveevil.js should abstract all of these problems away from
> the developer. If nobody else generates a patch, I will do one in the
> manner which James suggests some weekend soon.
> 
> dp

I've created a patch now. The problem turned out to be rather more
difficult than originally anticipated. I chose to go with the "magic"
method suggested by James.

The first time nevow_liveOutput is requested, a second argument,
magicEcho, is passed. It is the URI-encoded version of "\u9b54\u8853"
(Japanese for "magic", clever huh? ;-). nevow_liveOutput prefixes its
reply with magicEcho, so the client can tell whether the characters
survived the trip. No extra round trips, and very little overhead
since the magic is passed only on the first liveOutput query.
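
To make the handshake concrete, here is a rough Python sketch of the
idea. It is not the patch itself: live_output_reply, client_sees and
decoding_is_broken are hypothetical names, and the "client" is
simulated by decoding the reply bytes with a chosen charset.

```python
from urllib.parse import quote, unquote

MAGIC = "\u9b54\u8853"      # the two-character magic string
MAGIC_ECHO = quote(MAGIC)   # URI-encoded form the client sends along


def live_output_reply(payload, magic_echo=None):
    """Hypothetical server side: build the UTF-8 reply body.

    When the client passed magicEcho (first request only), the decoded
    magic is put in front of the payload.
    """
    prefix = unquote(magic_echo) if magic_echo else ""
    return (prefix + payload).encode("utf-8")


def client_sees(raw, charset):
    """Hypothetical client side: decode the raw bytes as the browser would."""
    return raw.decode(charset, errors="replace")


def decoding_is_broken(text):
    """True if the reply does not start with the intact magic string."""
    return not text.startswith(MAGIC)


reply = live_output_reply("hello", MAGIC_ECHO)
print(decoding_is_broken(client_sees(reply, "utf-8")))   # well-behaved client
print(decoding_is_broken(client_sees(reply, "cp1252")))  # Safari-like client
```

Only the first reply carries the prefix; later calls to
live_output_reply would simply omit magic_echo.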

The problems started when I realized that AppleWebKit does not simply
interpret each byte in the stream as \xXX. The encoding it defaults to
is not iso-8859-1 but Windows Latin-1 (cp1252). This means that, for
example, \x91 becomes \u2018 (left single quotation mark).
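
The difference between the two encodings is easy to see with Python's
own codecs. This is only an illustration of the two charsets, assuming
Python's cp1252 table matches what AppleWebKit does for the defined
bytes:

```python
raw = b"\x91"

as_latin1 = raw.decode("iso-8859-1")  # byte value maps straight to the code point
as_cp1252 = raw.decode("cp1252")      # cp1252 remaps the 0x80-0x9f range

print(hex(ord(as_latin1)))
print(hex(ord(as_cp1252)))

# Bytes on which the two encodings disagree. Gaps in cp1252 are decoded
# as U+FFFD here so that they can be compared as well.
differing = [
    b for b in range(256)
    if bytes([b]).decode("iso-8859-1")
    != bytes([b]).decode("cp1252", errors="replace")
]
print([hex(b) for b in differing])
```

The two charsets agree everywhere except 0x80-0x9f, which is exactly
the range UTF-8 continuation bytes share.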

I ended up creating a lookup table for going back to something
resembling the original stream (which can then be passed to
from_utf8). Unfortunately, five bytes (\x81, \x8d, \x8f, \x90 and
\x9d) are undefined in cp1252 and all map to the same character,
\ufffd (the replacement character). This makes it impossible to
perfectly reconstruct the original stream if it contained one of
those bytes. This affects roughly 10% of Unicode characters smaller
than 0x10000.
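
For illustration, the reverse lookup can be sketched in Python roughly
as follows. This is not the table from the patch: FORWARD, REVERSE,
recover_bytes and repair are hypothetical names, and the sketch assumes
that decoding with errors="replace" mimics AppleWebKit's handling of
the five undefined bytes.

```python
# Forward table: what a cp1252 decoder makes of bytes 0x80-0x9f.
FORWARD = {
    b: bytes([b]).decode("cp1252", errors="replace")
    for b in range(0x80, 0xA0)
}

# Reverse table, leaving out U+FFFD because several bytes produce it.
REVERSE = {ch: b for b, ch in FORWARD.items() if ch != "\ufffd"}
AMBIGUOUS = sorted(b for b, ch in FORWARD.items() if ch == "\ufffd")


def recover_bytes(text):
    """Map mis-decoded text back to the original bytes, or None on failure."""
    out = bytearray()
    for ch in text:
        if ch in REVERSE:
            out.append(REVERSE[ch])
        elif ord(ch) < 0x100:
            out.append(ord(ch))   # cp1252 and iso-8859-1 agree here
        else:
            return None           # U+FFFD or anything else unexpected
    return bytes(out)


def repair(text):
    """Undo the cp1252 misreading and decode the result as UTF-8."""
    raw = recover_bytes(text)
    return None if raw is None else raw.decode("utf-8")


garbled = "\u9b54\u8853".encode("utf-8").decode("cp1252", errors="replace")
print(repair(garbled) == "\u9b54\u8853")

# U+00C1 is c3 81 in UTF-8, and 0x81 is one of the five lost bytes.
lossy = "\u00c1".encode("utf-8").decode("cp1252", errors="replace")
print(repair(lossy))
```

Returning None for the ambiguous case is a choice made for this
sketch; a real client could equally substitute a placeholder
character.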

In any case, allowing Safari to process 90% of all characters is
better than getting erroneous output for 99.9% of them...

The only other workaround I can think of is for the client to request
a re-send of the latest message in some 7-bit encoded form (base64 or
something like that). The advantages are that the interpretation would
always be accurate and that we wouldn't have to include the cp1252
conversion table. The disadvantages are that the server has to
remember the latest message, that an extra XMLHttpRequest round trip
is needed, that the encoding is relatively space-inefficient, and that
liveevil.js would have to include a base64 decoding function on top of
from_utf8(). Magic would still be required to determine whether a
re-transmission is necessary (i.e. whether the JS implementation is
buggy).
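
A small Python sketch of the 7-bit alternative, to show why it is
immune to the charset confusion and what it costs in size. The names
encode_7bit and decode_7bit are hypothetical, and base64 is just one
possible 7-bit encoding:

```python
import base64


def encode_7bit(message):
    """Hypothetical server side: UTF-8 encode, then base64 to plain ASCII."""
    return base64.b64encode(message.encode("utf-8"))


def decode_7bit(wire_text):
    """Hypothetical client side: reverse the two steps."""
    return base64.b64decode(wire_text).decode("utf-8")


# U+00C1 contains the byte 0x81, which the lookup table cannot recover.
message = "\u00c1 \u9b54\u8853"
wire = encode_7bit(message)

# ASCII bytes decode identically under cp1252, iso-8859-1 and UTF-8,
# so a client that guesses the wrong charset still sees the same text.
seen_by_client = wire.decode("cp1252")
print(decode_7bit(seen_by_client) == message)

# Size cost: base64 emits 4 output bytes for every 3 input bytes.
print(len(message.encode("utf-8")), len(wire))
```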

Does this seem like a reasonable compromise? If so, I'll clean up the
patch, create some unit tests and submit it for consideration.

/ Sincerely, David Remahl


