Opened 12 years ago

Closed 12 years ago

Last modified 5 years ago

#4050 enhancement closed fixed (fixed)

Structural issues in Lore XHTML documents

Reported by: khorn Owned by:
Priority: low Milestone:
Component: core Keywords: documentation
Cc: Jean-Paul Calderone Branch:
Author:

Description

I've been looking over the xhtml documents used to generate the twisted documentation, and I've noticed a number of issues:

  • some docs do not have a DOCTYPE declaration, I think they should all have one
  • of those documents that do have DOCTYPEs, some are using xhmtl-strict, and some are using xhtml-transitional, which is preferred? I think they should all use the same one
  • some of the docs are lacking an xml namespace attribute in their root <html> element...I think they should either all have one, or none of them should
  • according to: http://twistedmatrix.com/projects/lore/documentation/howto/lore.html all of the docs should have the same text in both their <title> element and their <h1> element...this is not the case

Attachments (2)

lorelint-failures.txt (4.6 KB) - added by khorn 12 years ago.
List of files that fail basic xml tests
docs_namespaces_titleh1match.patch (6.3 KB) - added by khorn 12 years ago.
fixes missing xhtml namespaces, makes sure title and h1 tags match, wraps a paragraph with a p tag

Download all attachments as: .zip

Change History (15)

comment:1 Changed 12 years ago by Glyph

Owner: changed from Glyph to khorn
Priority: normallow

Please feel free to submit a patch.

comment:2 Changed 12 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added
Keywords: documentation added

The DOCTYPE for lore documents is explicitly optional. If it is not specified, XHTML1 strict is assumed. Only XHTML1 strict and XHTML1 transitional are allowed. A fix for the documentation explaining this would be great.

It would be useful if you enumerated the documents which have title and h1 elements which disagree, so as not to force anyone else who wants to help with this to grovel through all the documents. You clearly already collected the information; sharing it makes the most value of that effort.

comment:3 in reply to:  2 Changed 12 years ago by khorn

Replying to exarkun:

The DOCTYPE for lore documents is explicitly optional. If it is not specified, XHTML1 strict is assumed. Only XHTML1 strict and XHTML1 transitional are allowed. A fix for the documentation explaining this would be great.

Hmmm...I'm not sure making the DOCTYPE optional is such a good idea. This kind of thing can cause issues with resolving entities (since whatever parser you're using doesn't know what doctype it should us). I guess you could do a preprocessing step where you stuck a doctype in there before you sent it to the parser, but it feels icky to me.

Also, doesn't XHTML require a DOCTYPE be present?

From my reading, if it has external entity references, and no doctype, it's not even well-formed XML... http://www.w3.org/TR/REC-xml/#sec-references

It would be useful if you enumerated the documents which have title and h1 elements which disagree, so as not to force anyone else who wants to help with this to grovel through all the documents. You clearly already collected the information; sharing it makes the most value of that effort.

Yeah, was planning to do that yesterday, but got distracted. I'll try to get it up here soonish.

Changed 12 years ago by khorn

Attachment: lorelint-failures.txt added

List of files that fail basic xml tests

comment:4 Changed 12 years ago by khorn

I've attached a file listing which lore documents do not pass the following tests:

  • check that doctype exists
  • check that xhtml namespace exists
  • check that <title> and <h1> node match

This comes from a (currently very ugly) script that makes these checks automatically. I'll post the script itself, once I have it cleaned up a bit.

comment:5 Changed 12 years ago by Jean-Paul Calderone

Use of named entities (I forget the actual name for the concept I'm referring to, but I think everyone knows what "named entities" are - and if not, then to be clear, I'm talking about things like &mdash;) isn't recommended in lore documents. In fact, I would even like to say that it is discouraged or perhaps even forbidden. :) I'm not sure if that's documented anywhere. So, for the purposes of validity, I don't think it's necessary to worry about named entities. I'm also not terribly worried about the documents not being valid (or well-formed?) XHTML. Perhaps I can be convinced that there is some reason to care about this. Feel free to try if you think it's important. Note however that these are all issues relating to lore *input* documents. Lore output documents are another matter entirely. The primary use of lore input documents is to be processed by the lore toolchain to produce lore output documents. Secondarily, a web browser should be able to render the lore input documents into something vaguely readable. This is a development convenience, though, not much else.

Separately, with respect to your "lorelint" tool, are you aware of "lore -olint"?

comment:6 in reply to:  5 Changed 12 years ago by khorn

Replying to exarkun:

Use of named entities (I forget the actual name for the concept I'm referring to, but I think everyone knows what "named entities" are - and if not, then to be clear, I'm talking about things like &mdash;) isn't recommended in lore documents. In fact, I would even like to say that it is discouraged or perhaps even forbidden. :) I'm not sure if that's documented anywhere. So, for the purposes of validity, I don't think it's necessary to worry about named entities.

If it is documented someplace, I can't find it. And there are a few entities (&mdash; might be the only one) in the existing lore docs. If this isn't supposed to be the case, then it certainly needs to be documented, and the relevant files changed.

I'm also not terribly worried about the documents not being valid (or well-formed?) XHTML. Perhaps I can be convinced that there is some reason to care about this. Feel free to try if you think it's important.

Well, when reading about the Lore input format, it says that "Lore documents are a special subset of XHTML documents". If they aren't valid XHTML, then that's not the case, and the documentation is misleading. Obviously this isn't that big a deal for most people, until they try to process the files with some other toolchain, expecting valid XHTML, and it goes all wonky on them.

In any case, I don't see any reason *not* to go ahead and put doctypes in and require proper namespaces. It's easy to detect, and easy to fix. What's the downside?

(as I said on the mailing list...and I just noticed I forgot to mention here...I'm happy to produce a patch to fix these issues, once it's decided what should be done. I just haven't gotten to it yet)

SNIP

Separately, with respect to your "lorelint" tool, are you aware of "lore -olint"?

Er, I wasn't, but after you mentioned it, I tried it out, and I can't seem to get it to work for me. I get tracebacks like this:

conch/index.xhtml:0:0: title and h1 text differch/index)
conch/index.xhtml:0:0: only list items allowed in lists
conch/index.xhtml:0:0: only list items allowed in lists
conch/index.xhtml:0:0: only list items allowed in lists
Traceback (most recent call last):
  File "C:\Python26\scripts\lore.py", line 21, in <module>
    run()
  File "c:\documents and settings\hornk\desktop\twisted-dev\twisted\lore\scripts
\lore.py", line 149, in run
    result = runGivenOptions(opt)
  File "c:\documents and settings\hornk\desktop\twisted-dev\twisted\lore\scripts
\lore.py", line 131, in runGivenOptions
    walker.generate()
  File "c:\documents and settings\hornk\desktop\twisted-dev\twisted\lore\process
.py", line 53, in generate
    self.df(fullpath, linkrel)
  File "c:\documents and settings\hornk\desktop\twisted-dev\twisted\lore\default
.py", line 54, in <lambda>
    return lambda file, linkrel: lint.doFile(file, checker)
  File "c:\documents and settings\hornk\desktop\twisted-dev\twisted\lore\lint.py
", line 218, in doFile
    checker.check(doc, file)
  File "c:\documents and settings\hornk\desktop\twisted-dev\twisted\lore\lint.py
", line 24, in check
    method(dom, filename)
  File "c:\documents and settings\hornk\desktop\twisted-dev\twisted\lore\lint.py
", line 103, in check_80_columns
    if node.getAttribute('class', '').endswith('listing'):
TypeError: getAttribute() takes exactly 2 arguments (3 given)

So either I'm doing it wrong or there's a bug in this part of lore. It looks like a bug to me, assuming that node is an ordinary xml.dom.Element. Or maybe it's a difference between python versions?

comment:7 Changed 12 years ago by Jean-Paul Calderone

In any case, I don't see any reason *not* to go ahead and put doctypes in and require proper namespaces. It's easy to detect, and easy to fix. What's the downside?

The only real downside is that doctype declarations are a pain in the ass. As a rule, I like to avoid things that are a pain in the ass unless there is a compelling reason not to. :)

As far as I can tell, the only reason to include the doctype declarations is to satisfy alternate xhtml-processing toolchains which someone might want to apply to lore input documents. This may be a compelling reason, or not. I'm not going to pass judgment on that, but I will repeat the fact that the only xhtml tool I've ever used with a lore input document other than lore itself is firefox, and firefox doesn't care about the doctype. I'm open to the possibility that there are other nice tools one might want to get involved, but someone will have to introduce me to them.

I'd rather someone else make the call about what to actually do here (unless more information is presented, in which case I may feel informed enough to make the decision myself). It just doesn't make that much difference to me right now either way.

As to the behavior of "lore -olint", that looks like a bug in lore. It probably merits a bug report. I don't really think this kind of "lint" tool is a good idea. I'd rather the normal lore processing told you about stuff you did wrong; requiring a separate tool to be invoked is a good way to help people never learn about these problems.

comment:8 in reply to:  7 Changed 12 years ago by khorn

Replying to exarkun:

The only real downside is that doctype declarations are a pain in the ass. As a rule, I like to avoid things that are a pain in the ass unless there is a compelling reason not to. :)

I guess we have to agree to disagree here. I find *not* having doctype declarations to be a pain in the ass. :)

As far as I can tell, the only reason to include the doctype declarations is to satisfy alternate xhtml-processing toolchains which someone might want to apply to lore input documents. This may be a compelling reason, or not.

Fair enough.

I'd rather someone else make the call about what to actually do here (unless more information is presented, in which case I may feel informed enough to make the decision myself). It just doesn't make that much difference to me right now either way.

As to the behavior of "lore -olint", that looks like a bug in lore. It probably merits a bug report.

I'll open a ticket...#4051

I don't really think this kind of "lint" tool is a good idea. I'd rather the normal lore processing told you about stuff you did wrong; requiring a separate tool to be invoked is a good way to help people never learn about these problems.

As I said, I wasn't aware of the -olint flag when I started down this road. It's only mentioned at the veeeeery bottom of this page and I hadn't scrolled down quite far enough. My fault. It also doesn't show up in the output of lore --help. It looks like there's a reference or two in the man page, but if you want to read the man page online (I'm on Windows at the moment) you have to really want it.

Anyway, since I didn't know about it, I just "rolled my own". Now I know better. :)

Changed 12 years ago by khorn

fixes missing xhtml namespaces, makes sure title and h1 tags match, wraps a paragraph with a p tag

comment:9 Changed 12 years ago by khorn

Attached a patch which fixes the missing namespaces and title/h1 mismatches.

Also fixed a "naked" paragraph...now wrapped in a <p> tag.

Left out the DOCTYPE declarations for now, in case anyone else has objections to including them.

comment:10 Changed 12 years ago by khorn

Keywords: review added
Owner: khorn deleted

sending to review

comment:11 Changed 12 years ago by Jean-Paul Calderone

Resolution: fixed
Status: newclosed

(In [27528]) Apply docs_namespaces_titleh1match.patch fixing title/h1 mismatches

Author: khorn Reviewer: exarkun Fixes: #4050

Fix a number of mismatches between title and h1 contents in lore input documents. Also add a missing <p> wrapping a paragraph which was previously bare.

comment:12 Changed 11 years ago by <automation>

comment:13 Changed 5 years ago by hawkowl

Keywords: review removed

[mass edit] Removing the review tag from closed tickets

Note: See TracTickets for help on using tickets.