Opened 19 years ago

Closed 13 years ago

Last modified 6 years ago

#414 defect closed fixed (fixed)

Lore incorrectly throws away text nodes which are all whitespace, omitting them from the output document

Reported by: Nafai Owned by:
Priority: normal Milestone:
Component: lore Keywords:
Cc: Glyph, Nafai, spiv, itamarst, Moshe Zadka, Jean-Paul Calderone Branch: branches/whitespace-preservation-414-3
branch-diff, diff-cov, branch-cov, buildbot
Author: exarkun

Description (last modified by Glyph)

If a lore input document contains a fragment such as

<b>hello</b> <i>world</i>

then the output document will contain a fragment such as

<b>hello</b><i>world</i>

This is incorrect, and the output should be the same as the input in this case: nodes which are entirely whitespace are significant.

The root cause of this problem is that microdom erroneously throws away whitespace nodes. The current proposed fix is to stop using microdom for lore, and to replace it with minidom, which implements correct handling of whitespace in XML documents.

Change History (18)

comment:1 Changed 19 years ago by Nafai

For example, look at the Components howto and look for this:
<q>smart</q> <code>HairDryer</code>

And then look at the corresponding output

comment:2 Changed 19 years ago by Moshe Zadka

This is a problem in MicroDOMParser: it ignores gotText if it's all space
and the element is not shouldPreserveSpace. Perhaps it would be better
in that case to pretend we got exactly one space:

 def gotText(....)
     if ...:
         ....
     else:
+        self.gotStandAlone(Text, ' ')

comment:3 Changed 18 years ago by itamarst

empty space is dropped in order to up performance.

comment:4 Changed 18 years ago by jknight

Performance without correctness is not good. This is a bug. XML documents are usually space-
insensitive, but that only means you can translate multiple spaces into one space, not eat them entirely.

comment:5 Changed 15 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added
Keywords: review added
Owner: spiv deleted
Priority: highhighest

I agree with James.

The branch whitespace-preservation-414 removes a test for the invalid behavior, adds a couple tests for the valid behavior, and changes the parser to collapse multiple spaces into one space, rather than eating them entirely.

comment:6 Changed 15 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to Jean-Paul Calderone

There are some failing tests in lore still which I guess I'll need to fix.

comment:7 Changed 14 years ago by Jean-Paul Calderone

Priority: highestnormal

This is pretty hard. It's not actually a high priority for me now. I hope this ticket can be resolved someday, but it'll probably be a while before I get back to it again.

comment:8 Changed 13 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/whitespace-preservation-414-3

(In [25530]) Branching to 'whitespace-preservation-414-3'

comment:9 Changed 13 years ago by Jean-Paul Calderone

Resolution: wontfix
Status: newclosed

The code in the branch now doesn't fix microdom, it makes lore not use microdom anymore. This resolves the functional issue which prompted the creation of this ticket, but it doesn't fix microdom. Since microdom's purpose is to be an XML library which doesn't change, changing it is probably a bad idea, so let's not. Instead, let's encourage people to not use it.

comment:10 Changed 13 years ago by Glyph

For what it's worth, microdom's purpose was to be an XML DOM which didn't change so that lore would stop breaking when people installed different versions of the XML DOM implementation. I think it's OK for it to change a little bit to make lore less broken.

On the other hand, it's OK if it stops changing because it just isn't worth the effort to maintain it; switching to microdom is an OK resolution to the lore issue here. Is there a new ticket for that issue, though, the lore bug (which the branch now fixes) rather than the microdom bug? It sounds like something should be in review, but I don't see any other tickets mentioned.

comment:11 Changed 13 years ago by Glyph

Or should I reopen and change the component to 'lore'?

comment:12 Changed 13 years ago by Jean-Paul Calderone

I filed #3560 and put the code into a branch for that ticket.

comment:13 Changed 13 years ago by Glyph

Component: weblore
Description: modified (diff)
Summary: Missing space between </q> and <code> even though there exists a space in the .xhtml fileLore incorrectly throws away text nodes which are all whitespace, omitting them from the output document

I thought about this for a while, and I think that the bug being reported here is actually in lore, not microdom. The component said "web" but the summary refers to .xhtml files, which are not processed by microdom directly, they are loaded by lore. I am guessing this component was chosen because the microdom and lore issues were (understandably, but incorrectly) conflated.

This ticket itself isn't too interesting, but it's an interesting exmplar of a particular type of debugging process pattern: a functional issue is discovered, the underlying library is discovered to be at fault, the underlying library can't or shouldn't be fixed trivially (due to compatibility concerns) and the functional issue gets dismissed because the underlying library issue will be fixed a different way. In this case, we've actually committed to fix the functional issue but the status of this ticket is misleading.

I think that in general we should try to keep tickets which describe functional issues open until the issue is actually fixed. Underlying library problems, when they are discovered to be distinct from the functional issue that they are the root cause of, should be shuffled off into separate tickets. To make it clear here I've refined the summary and added a description explaining the behavior in detail.

In this case I'm not shuffling the library issue off into another ticket because I don't think anybody wants to fix it :). I briefly considered fixing it myself, but I should probably find better things to do.

The branch in #3560 is one strategy for fixing the functional issue in lore, but we should leave this ticket open in case someone (say, a contributor to thinkcspy) is interested in tracking the resolution of this problem, in the sense that they want to know what Twisted version actually shipped with a fixed lore.

comment:14 Changed 13 years ago by Glyph

Resolution: wontfix
Status: closedreopened

comment:15 Changed 13 years ago by Jean-Paul Calderone

Resolution: fixed
Status: reopenedclosed

(In [26264]) Merge minidom-lore-3560-4

Author: exarkun Reviewer: radix, glyph Fixes: #3560, #414

Switch Lore from using twisted.web.microdom to using xml.dom.

This explicitly makes the default Lore input document DTD XHTML1-Strict, but allows either that DTD or XHTML1-Transitional (but not XHTML1-Frameset or any other DTD).

comment:16 Changed 11 years ago by <automation>

Owner: Jean-Paul Calderone deleted

comment:17 Changed 6 years ago by GitHub <noreply@…>

In 2d4b239e:

Error: Processor CommitTicketReference failed
 does not appear to be a Git repository. See the log for more information.

comment:18 Changed 6 years ago by Glyph

This pull request namespace conflict is hilarious but maybe we should start thinking about separating them somehow.

Note: See TracTickets for help on using tickets.