Opened 11 years ago
Closed 11 years ago
#3560 enhancement closed fixed (fixed)
remove usage of microdom from twisted.lore
Reported by: | Jean-Paul Calderone | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | lore | Keywords: | |
Cc: | Thijs Triemstra | Branch: |
branches/minidom-lore-3560-4
branch-diff, diff-cov, branch-cov, buildbot |
Author: | exarkun |
Description
microdom has lots of bugs and they won't be fixed. These have negative consequences for lore (eg #414). Lore should start using a better XML library instead (ie, xml.dom.minidom).
Change History (23)
comment:1 Changed 11 years ago by
Author: | → exarkun |
---|---|
Branch: | → branches/minidom-lore-3560 |
comment:2 Changed 11 years ago by
As noted on ticket 3561, switching to minidom has the disadvantage of no longer letting users writing lore documents know where their mismatched tags have gone.
I remember when I added this feature, in r2645 - it vastly reduced the amount of time I spent mismatched-tag-hunting in Emacs and elsewhere.
comment:3 Changed 11 years ago by
Even though this feature had no unit tests, I intend to preserve it in the switch to minidom. :)
comment:4 Changed 11 years ago by
It's worthwhile, I think, to link to the monthly TSF report since it contains a list of related problems with minidom, each of which is being dealt with by this branch.
comment:5 Changed 11 years ago by
Branch: | branches/minidom-lore-3560 → branches/minidom-lore-3560-2 |
---|
(In [25747]) Branching to 'minidom-lore-3560-2'
comment:6 Changed 11 years ago by
Keywords: | review added |
---|---|
Owner: | Jean-Paul Calderone deleted |
Don't review this until #3600 has been reviewed. All of the domhelpers changes in this branch are duplicated in the branch for #3600. #3600 is intended to be the authoritative definition of those changes.
Aside from that, I guess this is ready for some eyes. Some things which may be easier to understand if I explain them:
- I've deleted most of the data files used by the lore tests. Their contents are now inline in the unit tests which I find makes the tests easier to understand and maintain.
- Many of the strings are slightly changed in either irrelevant or important ways, primarily in the amount and placement of whitespace. microdom's behavior was incorrect, so the strings were all "incorrect". minidom's behavior is better, many of the strings are different and better now.
- In some places (for example, twisted/lore/test/template.tpl) I've deleted part of an input document being used in the tests. I've done this where the thing deleted from the document was optional and the test using the document/template was not directly for the aspect of lore behavior that thing was required for.
- In order to avoid changing huge chunks of lore in a relatively trivial way, I've imported
xml.dom.minidom
asmicrodom
. This may result in somewhat confusing code, I'm not sure. I could change it and update all the callers if the consensus is that that would be more readable.
comment:7 Changed 11 years ago by
Oh yea, build results: http://buildbot.twistedmatrix.com/boxes-supported?branch=/branches/minidom-lore-3560-2
comment:8 follow-up: 10 Changed 11 years ago by
Keywords: | review removed |
---|---|
Owner: | set to Jean-Paul Calderone |
For the record, I remain unconvinced that the quality of maintenance of the standard library minidom module is substantially better than that of minidom (browsing the revision log doesn't show a dramatically better history), and the issues you spotted suggest that it may change in unpleasant ways in the future. For example, we don't have an "internet disconnected" buildbot to make sure that we don't need to turn off yet another new switch for DTD verification to not make random blocking internet connections.
However, I'm only ambivalent about that part of the change. The majority of the change here is an unambiguously awesome improvement to both the test data and the structure of the unit tests.
- One of my original design goals for lore was to provide a friendly, error-tolerant frontend, needing to include a
DOCTYPE
turd and?xml
PI at the beginning of every file seems like a regression. Especially given that the error when you do so ends up being "undefined entity", and it's not entirely clear that the output file wasn't generated. Even disregarding my personal opinion here, it seems likely that other users of lore (okay "the other user of lore") has documents that are invalid in this fashion, and it would be good not to give them incentives to stay forked. It seems like this could be trivially fixed by just sticking in a defaultDOCTYPE
if one wasn't included (potentially only if an unknown entity is encountered) and emitting a warning (that includes the appropriateDOCTYPE
). - Please file a (low-priority) ticket to do the 'microdom' to 'minidom' renaming. Leaving it this way was the right call for this branch; it eliminated a lot of noise. Reviewing that particular branch should be easy :).
- pyflakes issues, although I'm sure not all of these are your fault:
~/Projects/Twisted/trunk$ svn st | cut -c 8- | grep '.py$' | xargs pyflakes twisted/python/test/test_release.py:21: redefinition of unused 'Popen4' from line 16 twisted/python/test/test_release.py:30: redefinition of unused 'pydoctor' from line 26 twisted/python/test/test_release.py:57: redefinition of unused 'lore' from line 55 twisted/python/test/test_release.py:1997: undefined name 'DistributionBuilderTests' twisted/lore/test/test_lore.py:46: redefinition of unused 'FilePath' from line 38 twisted/lore/tree.py:12: 'copyright' imported but unused twisted/lore/tree.py:18: 'InsensitiveDict' imported but unused twisted/lore/tree.py:77: redefinition of unused 'text' from line 13 twisted/lore/latex.py:35: redefinition of unused 'text' from line 14 twisted/lore/slides.py:47: redefinition of unused 'os' from line 47
If you can deal with the no-DOCTYPE regression, I think this is ready to merge.
comment:9 Changed 11 years ago by
Branch: | branches/minidom-lore-3560-2 → branches/minidom-lore-3560-3 |
---|
(In [26033]) Branching to 'minidom-lore-3560-3'
comment:10 Changed 11 years ago by
Keywords: | review added |
---|---|
Owner: | Jean-Paul Calderone deleted |
Replying to glyph:
For the record, I remain unconvinced that the quality of maintenance of the standard library minidom module is substantially better than that of minidom (browsing the revision log doesn't show a dramatically better history), and the issues you spotted suggest that it may change in unpleasant ways in the future. For example, we don't have an "internet disconnected" buildbot to make sure that we don't need to turn off yet another new switch for DTD verification to not make random blocking internet connections.
Your point here makes a lot of sense to me. I hope we won't regret using minidom. We may have to do some maintenance on it ourselves, but that's not necessarily doomed to failure, right?
However, I'm only ambivalent about that part of the change. The majority of the change here is an unambiguously awesome improvement to both the test data and the structure of the unit tests.
Hooray.
- One of my original design goals for lore was to provide a friendly, error-tolerant frontend, needing to include a
DOCTYPE
turd and?xml
PI at the beginning of every file seems like a regression. Especially given that the error when you do so ends up being "undefined entity", and it's not entirely clear that the output file wasn't generated. Even disregarding my personal opinion here, it seems likely that other users of lore (okay "the other user of lore") has documents that are invalid in this fashion, and it would be good not to give them incentives to stay forked. It seems like this could be trivially fixed by just sticking in a defaultDOCTYPE
if one wasn't included (potentially only if an unknown entity is encountered) and emitting a warning (that includes the appropriateDOCTYPE
).
The DOCTYPE is now optional. It's necessary to load the XHTML DTD in order to know what the replacement text for the character entities is, so that's what lore does now.
- Please file a (low-priority) ticket to do the 'microdom' to 'minidom' renaming. Leaving it this way was the right call for this branch; it eliminated a lot of noise. Reviewing that particular branch should be easy :).
Filed - #3619
- pyflakes issues, although I'm sure not all of these are your fault:
~/Projects/Twisted/trunk$ svn st | cut -c 8- | grep '.py$' | xargs pyflakes twisted/python/test/test_release.py:21: redefinition of unused 'Popen4' from line 16 twisted/python/test/test_release.py:30: redefinition of unused 'pydoctor' from line 26 twisted/python/test/test_release.py:57: redefinition of unused 'lore' from line 55 twisted/python/test/test_release.py:1997: undefined name 'DistributionBuilderTests' twisted/lore/test/test_lore.py:46: redefinition of unused 'FilePath' from line 38 twisted/lore/tree.py:12: 'copyright' imported but unused twisted/lore/tree.py:18: 'InsensitiveDict' imported but unused twisted/lore/tree.py:77: redefinition of unused 'text' from line 13 twisted/lore/latex.py:35: redefinition of unused 'text' from line 14 twisted/lore/slides.py:47: redefinition of unused 'os' from line 47
Fixed in r26035
If you can deal with the no-DOCTYPE regression, I think this is ready to merge.
I think that due to the character of the changes required to address the DOCTYPE issue, another review is merited.
comment:11 Changed 11 years ago by
Keywords: | review removed |
---|---|
Owner: | set to Jean-Paul Calderone |
- +from xml.sax.handler import ErrorHandler, feature_external_ges, feature_external_pes, feature_validation
the feature_external_* names here aren't used, and this line is longer than 80 characters.
- I'm really unhappy that so much code here is changed without docstrings being added. This is making it really hard for me to review. I want to make it clear that if there is a huge problem in the branch, my share of the blame, as a review, should be reduced. Instead, that blame should be directed at moshez.
- However, thanks very much for the *new* tests you've written, that are actual unit tests and well-documented.
- + # here, and then neuter reset so that the parser created is the one
less graphic comment please
also, is there a reason that maybe we should actually monkeypatch reset and move the _parser setup stuff into there? That would allow it to actually do whatever original reset logic exists, along with our DTD/entity hacks.
- I don't think it was necessary for you to put all of those class definitions in parseFileAndReport. I guess I won't require you to refactor that, but it seems like it would be better at module level. (With _-prefixes, of course).
- Yay for assertXMLEqual! Thanks.
- So here's the real problem: It looks like entities (or at least some entities) aren't coming out in the output because they're being parsed into their unicode character form and then being written out with an ASCII codec, and thus blowing up. Either the output should be utf-8, or they should be converted back to their entities.
comment:12 Changed 11 years ago by
comment:13 Changed 11 years ago by
Branch: | branches/minidom-lore-3560-3 → branches/minidom-lore-3560-4 |
---|
(In [26158]) Branching to 'minidom-lore-3560-4'
comment:14 Changed 11 years ago by
Keywords: | review added |
---|---|
Owner: | Jean-Paul Calderone deleted |
Points 1-6 addressed in the revision linked in the previous message. Point 7 addressed in several later revisions. The diff is available here: <http://twistedmatrix.com/trac/changeset?new=branches%2Fminidom-lore-3560-3%4026154&old=branches%2Fminidom-lore-3560-3%4026142>. Several other unicode-related issues came up while fixing this, so the diff includes fixes for them as well. I also switched several more documents from xhtml1-strict to xhml1-transitional, as the former is no longer supported.
There was also a conflict merging to trunk, so I've merged forward. The only change in the new branch is the conflict resolution in test_release.py
.
Several
comment:15 Changed 11 years ago by
Cc: | Thijs Triemstra added |
---|---|
Keywords: | review removed |
Owner: | set to Jean-Paul Calderone |
Make sure to update the copyright headers for 2009 in the files you modified.
comment:16 Changed 11 years ago by
Keywords: | review added |
---|---|
Owner: | Jean-Paul Calderone deleted |
Updated in r26180
comment:17 Changed 11 years ago by
comment:18 Changed 11 years ago by
comment:19 Changed 11 years ago by
radix asked for xhtml1 strict to remain supported, so the previous two changes are for that.
comment:21 Changed 11 years ago by
Keywords: | review removed |
---|---|
Owner: | set to Jean-Paul Calderone |
Okay, thanks for fixing those issues. It looks goodish now. +1
for the record, there were two review points I told exarkun in real-space which he fixed:
- support xhtml1 strict again for BC purposes
- give test_mungeAuthors a real docstring
comment:22 Changed 11 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [26264]) Merge minidom-lore-3560-4
Author: exarkun Reviewer: radix, glyph Fixes: #3560, #414
Switch Lore from using twisted.web.microdom
to using xml.dom
.
This explicitly makes the default Lore input document DTD XHTML1-Strict, but allows either that DTD or XHTML1-Transitional (but not XHTML1-Frameset or any other DTD).
comment:23 Changed 9 years ago by
Owner: | Jean-Paul Calderone deleted |
---|
(In [25545]) Branching to 'minidom-lore-3560'