Opened 9 years ago

Closed 8 years ago

#3560 enhancement closed fixed (fixed)

remove usage of microdom from twisted.lore

Reported by: Jean-Paul Calderone Owned by:
Priority: normal Milestone:
Component: lore Keywords:
Cc: Thijs Triemstra Branch: branches/minidom-lore-3560-4
branch-diff, diff-cov, branch-cov, buildbot
Author: exarkun

Description

microdom has lots of bugs and they won't be fixed. These have negative consequences for lore (eg #414). Lore should start using a better XML library instead (ie, xml.dom.minidom).

Change History (23)

comment:1 Changed 9 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/minidom-lore-3560

(In [25545]) Branching to 'minidom-lore-3560'

comment:2 Changed 9 years ago by Glyph

As noted on ticket 3561, switching to minidom has the disadvantage of no longer letting users writing lore documents know where their mismatched tags have gone.

I remember when I added this feature, in r2645 - it vastly reduced the amount of time I spent mismatched-tag-hunting in Emacs and elsewhere.

comment:3 Changed 9 years ago by Jean-Paul Calderone

Even though this feature had no unit tests, I intend to preserve it in the switch to minidom. :)

comment:4 Changed 9 years ago by Glyph

It's worthwhile, I think, to link to the monthly TSF report since it contains a list of related problems with minidom, each of which is being dealt with by this branch.

comment:5 Changed 9 years ago by Jean-Paul Calderone

Branch: branches/minidom-lore-3560branches/minidom-lore-3560-2

(In [25747]) Branching to 'minidom-lore-3560-2'

comment:6 Changed 8 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Don't review this until #3600 has been reviewed. All of the domhelpers changes in this branch are duplicated in the branch for #3600. #3600 is intended to be the authoritative definition of those changes.

Aside from that, I guess this is ready for some eyes. Some things which may be easier to understand if I explain them:

  • I've deleted most of the data files used by the lore tests. Their contents are now inline in the unit tests which I find makes the tests easier to understand and maintain.
  • Many of the strings are slightly changed in either irrelevant or important ways, primarily in the amount and placement of whitespace. microdom's behavior was incorrect, so the strings were all "incorrect". minidom's behavior is better, many of the strings are different and better now.
  • In some places (for example, twisted/lore/test/template.tpl) I've deleted part of an input document being used in the tests. I've done this where the thing deleted from the document was optional and the test using the document/template was not directly for the aspect of lore behavior that thing was required for.
  • In order to avoid changing huge chunks of lore in a relatively trivial way, I've imported xml.dom.minidom as microdom. This may result in somewhat confusing code, I'm not sure. I could change it and update all the callers if the consensus is that that would be more readable.

comment:8 Changed 8 years ago by Glyph

Keywords: review removed
Owner: set to Jean-Paul Calderone

For the record, I remain unconvinced that the quality of maintenance of the standard library minidom module is substantially better than that of minidom (browsing the revision log doesn't show a dramatically better history), and the issues you spotted suggest that it may change in unpleasant ways in the future. For example, we don't have an "internet disconnected" buildbot to make sure that we don't need to turn off yet another new switch for DTD verification to not make random blocking internet connections.

However, I'm only ambivalent about that part of the change. The majority of the change here is an unambiguously awesome improvement to both the test data and the structure of the unit tests.

  1. One of my original design goals for lore was to provide a friendly, error-tolerant frontend, needing to include a DOCTYPE turd and ?xml PI at the beginning of every file seems like a regression. Especially given that the error when you do so ends up being "undefined entity", and it's not entirely clear that the output file wasn't generated. Even disregarding my personal opinion here, it seems likely that other users of lore (okay "the other user of lore") has documents that are invalid in this fashion, and it would be good not to give them incentives to stay forked. It seems like this could be trivially fixed by just sticking in a default DOCTYPE if one wasn't included (potentially only if an unknown entity is encountered) and emitting a warning (that includes the appropriate DOCTYPE).
  2. Please file a (low-priority) ticket to do the 'microdom' to 'minidom' renaming. Leaving it this way was the right call for this branch; it eliminated a lot of noise. Reviewing that particular branch should be easy :).
  3. pyflakes issues, although I'm sure not all of these are your fault:
    ~/Projects/Twisted/trunk$ svn st | cut -c 8- | grep '.py$' | xargs pyflakes
    twisted/python/test/test_release.py:21: redefinition of unused 'Popen4' from line 16
    twisted/python/test/test_release.py:30: redefinition of unused 'pydoctor' from line 26
    twisted/python/test/test_release.py:57: redefinition of unused 'lore' from line 55
    twisted/python/test/test_release.py:1997: undefined name 'DistributionBuilderTests'
    twisted/lore/test/test_lore.py:46: redefinition of unused 'FilePath' from line 38
    twisted/lore/tree.py:12: 'copyright' imported but unused
    twisted/lore/tree.py:18: 'InsensitiveDict' imported but unused
    twisted/lore/tree.py:77: redefinition of unused 'text' from line 13
    twisted/lore/latex.py:35: redefinition of unused 'text' from line 14
    twisted/lore/slides.py:47: redefinition of unused 'os' from line 47
    

If you can deal with the no-DOCTYPE regression, I think this is ready to merge.

comment:9 Changed 8 years ago by Jean-Paul Calderone

Branch: branches/minidom-lore-3560-2branches/minidom-lore-3560-3

(In [26033]) Branching to 'minidom-lore-3560-3'

comment:10 in reply to:  8 Changed 8 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Replying to glyph:

For the record, I remain unconvinced that the quality of maintenance of the standard library minidom module is substantially better than that of minidom (browsing the revision log doesn't show a dramatically better history), and the issues you spotted suggest that it may change in unpleasant ways in the future. For example, we don't have an "internet disconnected" buildbot to make sure that we don't need to turn off yet another new switch for DTD verification to not make random blocking internet connections.

Your point here makes a lot of sense to me. I hope we won't regret using minidom. We may have to do some maintenance on it ourselves, but that's not necessarily doomed to failure, right?

However, I'm only ambivalent about that part of the change. The majority of the change here is an unambiguously awesome improvement to both the test data and the structure of the unit tests.

Hooray.

  1. One of my original design goals for lore was to provide a friendly, error-tolerant frontend, needing to include a DOCTYPE turd and ?xml PI at the beginning of every file seems like a regression. Especially given that the error when you do so ends up being "undefined entity", and it's not entirely clear that the output file wasn't generated. Even disregarding my personal opinion here, it seems likely that other users of lore (okay "the other user of lore") has documents that are invalid in this fashion, and it would be good not to give them incentives to stay forked. It seems like this could be trivially fixed by just sticking in a default DOCTYPE if one wasn't included (potentially only if an unknown entity is encountered) and emitting a warning (that includes the appropriate DOCTYPE).

The DOCTYPE is now optional. It's necessary to load the XHTML DTD in order to know what the replacement text for the character entities is, so that's what lore does now.

  1. Please file a (low-priority) ticket to do the 'microdom' to 'minidom' renaming. Leaving it this way was the right call for this branch; it eliminated a lot of noise. Reviewing that particular branch should be easy :).

Filed - #3619

  1. pyflakes issues, although I'm sure not all of these are your fault:
    ~/Projects/Twisted/trunk$ svn st | cut -c 8- | grep '.py$' | xargs pyflakes
    twisted/python/test/test_release.py:21: redefinition of unused 'Popen4' from line 16
    twisted/python/test/test_release.py:30: redefinition of unused 'pydoctor' from line 26
    twisted/python/test/test_release.py:57: redefinition of unused 'lore' from line 55
    twisted/python/test/test_release.py:1997: undefined name 'DistributionBuilderTests'
    twisted/lore/test/test_lore.py:46: redefinition of unused 'FilePath' from line 38
    twisted/lore/tree.py:12: 'copyright' imported but unused
    twisted/lore/tree.py:18: 'InsensitiveDict' imported but unused
    twisted/lore/tree.py:77: redefinition of unused 'text' from line 13
    twisted/lore/latex.py:35: redefinition of unused 'text' from line 14
    twisted/lore/slides.py:47: redefinition of unused 'os' from line 47
    

Fixed in r26035

If you can deal with the no-DOCTYPE regression, I think this is ready to merge.

I think that due to the character of the changes required to address the DOCTYPE issue, another review is merited.

comment:11 Changed 8 years ago by radix

Keywords: review removed
Owner: set to Jean-Paul Calderone
  1. +from xml.sax.handler import ErrorHandler, feature_external_ges, feature_external_pes, feature_validation

the feature_external_* names here aren't used, and this line is longer than 80 characters.

  1. I'm really unhappy that so much code here is changed without docstrings being added. This is making it really hard for me to review. I want to make it clear that if there is a huge problem in the branch, my share of the blame, as a review, should be reduced. Instead, that blame should be directed at moshez.
  2. However, thanks very much for the *new* tests you've written, that are actual unit tests and well-documented.
  3. + # here, and then neuter reset so that the parser created is the one

less graphic comment please

also, is there a reason that maybe we should actually monkeypatch reset and move the _parser setup stuff into there? That would allow it to actually do whatever original reset logic exists, along with our DTD/entity hacks.

  1. I don't think it was necessary for you to put all of those class definitions in parseFileAndReport. I guess I won't require you to refactor that, but it seems like it would be better at module level. (With _-prefixes, of course).
  1. Yay for assertXMLEqual! Thanks.
  1. So here's the real problem: It looks like entities (or at least some entities) aren't coming out in the output because they're being parsed into their unicode character form and then being written out with an ASCII codec, and thus blowing up. Either the output should be utf-8, or they should be converted back to their entities.

comment:12 Changed 8 years ago by Jean-Paul Calderone

(In [26142]) address review feedback

refs #3560

  • move nested classes to module scope, make them private
  • remove graphic language from comment
  • expand comment about why .reset is being ganked
  • remove unused imports

comment:13 Changed 8 years ago by Jean-Paul Calderone

Branch: branches/minidom-lore-3560-3branches/minidom-lore-3560-4

(In [26158]) Branching to 'minidom-lore-3560-4'

comment:14 Changed 8 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Points 1-6 addressed in the revision linked in the previous message. Point 7 addressed in several later revisions. The diff is available here: <http://twistedmatrix.com/trac/changeset?new=branches%2Fminidom-lore-3560-3%4026154&old=branches%2Fminidom-lore-3560-3%4026142>. Several other unicode-related issues came up while fixing this, so the diff includes fixes for them as well. I also switched several more documents from xhtml1-strict to xhml1-transitional, as the former is no longer supported.

There was also a conflict merging to trunk, so I've merged forward. The only change in the new branch is the conflict resolution in test_release.py.

Several

comment:15 Changed 8 years ago by Thijs Triemstra

Cc: Thijs Triemstra added
Keywords: review removed
Owner: set to Jean-Paul Calderone

Make sure to update the copyright headers for 2009 in the files you modified.

comment:16 Changed 8 years ago by Jean-Paul Calderone

Keywords: review added
Owner: Jean-Paul Calderone deleted

Updated in r26180

comment:17 Changed 8 years ago by Jean-Paul Calderone

(In [26255]) Allow (and make the default) XHTML1 Strict

refs #3560

comment:18 Changed 8 years ago by Jean-Paul Calderone

(In [26256]) revert changes from strict to transitional

refs #3560

comment:19 Changed 8 years ago by Jean-Paul Calderone

radix asked for xhtml1 strict to remain supported, so the previous two changes are for that.

comment:20 Changed 8 years ago by Jean-Paul Calderone

(In [26259]) Give test_mungeAuthors a real docstring

refs #3560

comment:21 Changed 8 years ago by radix

Keywords: review removed
Owner: set to Jean-Paul Calderone

Okay, thanks for fixing those issues. It looks goodish now. +1

for the record, there were two review points I told exarkun in real-space which he fixed:

  • support xhtml1 strict again for BC purposes
  • give test_mungeAuthors a real docstring

comment:22 Changed 8 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [26264]) Merge minidom-lore-3560-4

Author: exarkun Reviewer: radix, glyph Fixes: #3560, #414

Switch Lore from using twisted.web.microdom to using xml.dom.

This explicitly makes the default Lore input document DTD XHTML1-Strict, but allows either that DTD or XHTML1-Transitional (but not XHTML1-Frameset or any other DTD).

comment:23 Changed 6 years ago by <automation>

Owner: Jean-Paul Calderone deleted
Note: See TracTickets for help on using tickets.