[Twisted-Python] Lore, Sphinx, and getting to the finish line (was: re: lore and tickets and other stuff)

Kevin Horn kevin.horn at gmail.com
Thu Mar 7 12:01:29 EST 2013

Sorry it's taken me so long to get back to this.  But it's gotten to be a
Looong email.

On Sat, Mar 2, 2013 at 3:14 AM, Glyph <glyph at twistedmatrix.com> wrote:

> On Mar 1, 2013, at 9:35 PM, Kevin Horn <kevin.horn at gmail.com> wrote:
> That "never-ending" series of Lore source fixes took place over the course
> of a couple of weeks.  Doing things that way was not my idea, though it
> seemed reasonable at the time because the idea was that we would do the
> cutover at the end of it.
> Well, let's go to the video tape. Based on this comment -  <
> http://twistedmatrix.com/trac/ticket/4500#comment:12> - these tickets
> were closed over a period ranging from 2010/07 to 2011/03. 6 months isn't
> quite "weeks", but okay I guess it wasn't "never-ending" either :).
Hmmm.  I recall it as being much shorter.  Probably most of the work took
place in two "spurts" around the beginning and end of that time, and that's
why I remember it that way.  But I'm not interested in digging through a
bunch of old dates to find out for sure.

> (As an aside, lore2sphinx is in no way a "broken pile of regexes".  Not to
> say that it isn't broken in some really significant ways, because it is,
> but it doesn't use regexes at all.  Just sayin'.)
> Actually yeah, "regex" is just a curse-word here :).  It's the emitter I'm
> complaining about, anyway, not the parser, so deriding it as a "regex" is
> in no way accurate.

I figured that was the case, I just wanted to say something so others
reading this didn't get the wrong impression about how lore2sphinx is
implemented.  I mean it's not code I'm very proud of, but it's not _that_
bad :)

<<< snip a bunch of stuff about who said what when, why I thought what I
thought, etc. >>>

It boils down to the fact that a bunch of the conversations happened either
in person or on IRC.  This was mostly because I was in a hurry at the time,
usually because I wanted to do something before additions were made to the
documentation, which was in a somewhat "known" state (as in I knew how it
was going to behave when run through lore2sphinx) at the time.

Also, please elaborate on what you mean by "do *everything* in one big
> bang".  My intention was never to do anything but get the SphinxBuilder
> working on that branch.  Was there something else you thought I was doing?
>  Was there something else I should (or should not) have been doing?
> My reasoning goes like this: the ticket for the release tools is still not
> in review, so you must be waiting for something to re-submit it.  It looks
> like you responded to the code, so the only thing I could think you were
> still waiting for would be for the lore sources themselves to be ready.
It's been long enough that I can't fully recall my reasoning on this.  But
_probably_ I decided that if I finished the release tools ticket, someone
might use it.  Which would be great, except that I think I had decided that
before that actually happened I needed to figure out a way to emit nicer
output from lore2sphinx.  So I left it alone until I had figured out how to
do that.

At least, that _might_ have been part of my thought process.  It really was
ages ago.

[the fixed-up Lore sources] got left alone because of the release tools
> hangup.  Ideally the release tools would have been done before the whole
> lore-source-tweaking process, but they weren't.  I'll admit my frustration
> played a part in this, but so did the deafening silence I got when I asked
> for anyone to comment on the ticket.
> Where and how did you ask people to comment on the ticket?  I don't recall
> being asked, and I tend to be pretty good about leaving prompts like that
> in my inbox until I've done what was asked.  (Not *perfect*, of course, and
> if you asked a list then there might have been some bystander effect.)  It
> seems like we might have avoided this whole mess if you had just attached
> the 'review' keyword :).


> My perception has been that I would say "what do we need to do to make
> this happen"?  There would be some hemming and hawing (and at least several
> times long discussions about how documentation didn't really fit the
> regular UQDS process) and a sort of plan would be invented.  I would
> proceed according to the plan as I understood it.  I would then say "OK,
> we're ready"!  And then be told that some other thing not in the plan
> needed to be done.  The cycle would then repeat.
> The only "cycle" I can either see on the tickets or recall here is where
> the release tools didn't come in to the initial plan.

This was the latest of several (3 or 4) according to my
recollection/perception.  It doesn't really matter now.

> No [the need for release automation] was not brought up until well into
> the process. I (sort of) understand the desire for this, but it seems
> pretty weird to be building what is essentially a wrapper for an existing
> tool, along with tests for said wrapper,
> OK.  I can believe that this did not happen.  One problem is that we (the
> inner-circle old-school Twisted developers) tend to engage in conversations
> about how a thing might be done while at the same time we discuss what must
> be done.  And we also tend to discuss what policy is (or what all or some
> of us believe it *ought to be* in some case, further confusing the issue)
> without making explicit what the *purpose* of that requirement is.
> I would ask the community to help us with this by doing a couple of things.
> If somebody says "X is policy", always ask for a link to it.  If there is
> a link, it'll help you understand it better.  If there *isn't* a link,
> then the authority telling you it's "policy" might just be remembering that
> it's the way we've done things since forever and of course it's a good
> idea.  There are definitely things that I have thought were in the coding
> standard that are not actually written down anywhere, on more than one
> occasion.
> If a meandering discussion is happening - here, on the mailing list, on
> the ticket - never be afraid to break it up and separate out the different
> concerns which are being discussed: what is necessary for compliance with
> our development process, what would be a good idea from a design point of
> view, how the work might be broken up to get through review more
> manageably, what other concerns are in play.
> Especially, if you ever see a code review where a reviewer says "I
> think..." without making it clear what you should *do*, you should always
> ask, 'is this a requirement of the review or just some thoughts you have'.
And when we ask, we should ask on the ticket, and put it back into review,
yes?  Because I think this was the part (or at least _A_ part) I was really
missing here.

> There's also the problem of "I think you should..." being interpreted as
> "You must...".  It is *very* hard to consistently separate design
> feedback from code review, although we try very hard; but, it's hard to
> separate it out when reading it as well.  So one important point to keep in
> mind is that, as the author of a proposed change, outside the things that
> are agreed upon policy consensus, you always have some degree of discretion
> to disagree with a reviewer.  And you should freely do so when submitting
> anything for re-review.  It's best to just do this as quickly as possible,
> so that it gets back to the reviewer without a whole lot of delay, and they
> can respond with either "I still disagree, but you're doing the work, so OK
> go ahead" or "No, you really have to do this, it's required by policy
> document X, here's a link" ;-).
>>    1. The documentation itself needs to be able to be generated from any
>>    version of trunk.  While one or two formatting snafus are acceptable to be
>>    fixed after the fact, the documentation needs to be in a comprehensible
>>    state in every revision of trunk, which means that in order to land on
>>    trunk, the ReST output.
>> So...you didn't finish that sentence.  I realize you apologized for
> errors at the end of your mail, but I have a feeling you were going to say
> something rather important there...
> Well yes, that was the point of the apology.  That was a rather important
> thing.  What I was probably going to say was just:
> The ReST output needs to be in good enough shape to be generally readable,
> with a manageable number of errors.  But, we need to be able to *verify*
> that it has not too many errors.
> And I'd already discussed that somewhat above.
> Now that I've replied to all of that, let me give you a rundown of what
> I've been thinking and planning, so that you have an idea of where I'm
> coming from.
> Here are the various things that I have perceived to be necessary/required
> in order to get the conversion to happen:
> a) The conversion process needs to be able to be run concurrently with
> Lore for an extended period of time.  In other words, Lore would be the
> "official" version of the docs, and the Sphinx docs would be built in some
> form of automated fashion until everyone was happy with them and/or ready
> to deprecate/abandon Lore.
> Your understanding of this requirement is slightly off, I think, although
> possibly the consequences are the same.  As per the difficulties I laid out
> above, about separating the requirements from the strategies for satisfying
> said requirements.

I've been told that almost verbatim, several times.  This is basically what
led to the Sphinx buildbot happening.  Perhaps I wasn't clear about what I

> The thing that we weren't going to tolerate was any message saying that
> people should hold off on writing documentation, even for "a little while"
> while we fixed up the lore conversion, because without a contractual
> obligation for someone to finish this work, there's really no telling how
> long "a little while" would be :).

Well, when I originally was pushing it, my plan was for that little while
to be "today" (this was at PyCon during the only day of sprints I was able
to attend), and if it didn't get done, we'd abandon that particular
attempt.  You and exarkun managed to convince me that even this was
probably not a very good idea though.

> Since the whole point of this sphinx conversion is to appeal to
> documentation authors who prefer the ReST format as input (it's definitely
> not to make the docs look nicer, writing a new stylesheet for Lore would
> have taken 1/100th of the effort and nobody has expressed interest in doing
> that), creating a period where things were even *less* appealing to
> documentation authors would defeat the purpose.

I actually considered the stylesheet thing, but it was really only a
passing thought.  My personal motivation started with not being able to
find things in the documentation.  So I started looking at the various Lore
tickets to see whether there was something to clean up that would help.
 And a bunch of them seemed to be asking for things that Sphinx already
did.  Sphinx was starting to become a common tool, and I had used it on
several other projects, and found it pleasant to work with.  Also, when I
asked about Lore on IRC, I got a lot of "I'm not sure anyone knows how that
works these days" and "oh man, I wish we didn't have to support that any
more", etc.  So I started looking into how to convert the docs over to use

> Another possible solution to this problem would be to modify Lore so it
> could process ReST sources, so that we could convert the documentation
> within the repository piecemeal, and start writing any new docs in ReST,
> but still have a coherent whole of documentation produced, eventually
> switching the documentation processor from Lore to Sphinx.

This would require someone smarter than me.  Or at least more versed in
formal parsing theory/techniques.  Or something.  And that would be just to
read the docutils sources.  I find them...alien. (though less so that when
I first started looking at them...I'm not sure if they've improved, or I

> Yet another possible solution would be to modify Sphinx, adding a plugin
> to process the Lore sources.

This is more reasonable, but still has problems.  Actually the reasonable
thing would be to create a docutils piece to process Lore sources, and then
maybe some Sphinx extensions on top of that.  Or something.  Still, it
might have been doable.  However, I think Lore would have had to be
modified as well, and possibly the Lore format expanded
to accommodate certain constructs that it just doesn't do right now (mostly
I'm thinking of the toctree directive and related stuff).

> As an aside: this is the part of the process which has been so frustrating
> to me, personally.  The two alternate solutions I proposed here (and have
> proposed before) seem far saner and more manageable in terms of effort, to
> me.  But, everyone I have spoken to about docutils and ReST has told me in
> no uncertain terms that they are both a pile of heinous hacks that resist
> any attempt at sensible software-engineering solutions to problems, so we
> need to resort to hackish system-integration stuff like what we've done.
>  This worries me.

Ooookaaaaay....I don't know how to respond to that exactly.

> I know that Sphinx's output is well-loved by the Python community, but if
> it's so hard to call into that we can't reasonably modify it to get an XML
> DOM that looks like Lore source to Lore, and it's so hard to plug in to it
> that we can't give it a data structure that it likes from Lore's XML DOM,
> then how the heck is it being maintained?  And if it actually *isn't* that
> bad, then why haven't I managed to find someone that knows its code well
> enough to do one or the other of these things?

It would be possible to make Sphinx emit Lore sources, though I'm not sure
what that buys.  You could do this either through a custom Sphinx
"builder", or possibly even just using a custom html template with the html
builder.  But you'd need ReST sources to feed into Sphinx, so...

You could write a docutils "parser" which parses a document and returns a
"nodetree" data structure.  This would get you as far as docutils, but
AFAIK there is no existing way to get Sphinx to use any parser other than
the default ReST one.  You could probably create such a thing, which would
almost certainly involve modifications to Sphinx, though that's not
necessarily a big deal.  It might not even be hard.  I think this would
actually be a lot easier now than when I started down this path, mostly
because docutils seems to have better documentation on the nodes that can
go in the "nodetree" I mentioned above.  Note that I said "seems" because
I'm not sure if it's that docutils documentation has gotten more complete,
or just that I've bounced around in it enough times to find things.  The
Docutils docs have the same problem that the Twisted docs have, which is
that they are nigh un-navigable.  (I also think that the docutils docs
should start using Sphinx, but I'm not sure how well that would go over in
that camp...)

The main problem with creating such a parser, is that Sphinx uses a bunch
of docutils extensions to tie together the disparate documents in your
project, and Lore, like vanilla docutils, doesn't have much of a concept of
being one document among many (at least not from within a document).  For
example, it has things to handle tables of contents, cross document links
(with the ability to link to a document section, rather than a specific
document, so if it gets moved to a different document, the link gets
adjusted), compilation for glossaries and index entries from across the
docs project, etc.  So you'd need to add some stuff to Lore to account for
this (some is already there).  And then we'd have to go through and modify
a bunch of the Lore sources anyway.

Like I said, this looks a lot more feasible now than it did when I first
looked at it, though I'm not sure whether it's me or docutils/Sphinx that's
changed.  Probably some of each.

At any rate, back then it seemed awfully difficult, and less interesting.

Hmmm.  And you'd also need to make some changes to the way Sphinx picks up
files.  And probably some other stuff I haven't thought of.

I have no direct knowledge of any of this stuff, because my main interest
> here is improving the experience of working on Twisted, both for you,
> Kevin, and for the people who will arguably be helped by the use of Sphinx.
>  Maybe I'm completely wrong and Sphinx is beautifully architected and we
> could have done this from day 1.  But I faintly hope that some Docutils and
> Sphinx contributor hears that I said "sphinx is garbage" and makes a fool
> of me by contributing either a lore modification or a sphinx plugin which
> solves this whole problem so we can do the format or tool migration
> incrementally :).
> b) Because of a), there needs to be tooling to run lore2sphinx (or
> whatever) on a regular basis.  (This was sort of being done via the
> Sphinx-building buildbot, but in a very ad-hockery sort of way, which was
> brittle, broke a couple of times, and needed to be improved.)
> Hmm. I wasn't aware of that. But it seems like it's running like a charm now.

I think this is because a) exarkun fixed it a couple of times, and b) I
stopped making changes to the lore2sphinx repo (which the buildbot pulls
from).  I'm also referring here to something which is completely
non-obvious to anyone who hasn't actually run lore2sphinx by hand, which is
that the command line tool was fairly terrible in several ways.  This made
it harder to use for development than it should have been.

> c) There needs to be release management tooling to build the Sphinx docs
> from ReST into whatever formats we want to publish (HTML and PDF to start,
> maybe others later on)
> Yup.  (ePub?  PDF is so last-century... :))
> d) Convert the Lore sources to better ReST documents without all the
> problems that the current lore2sphinx output has.
> So, this wasn't *necessary*.  If we had gotten through the release
> automation stuff - and I still don't understand why that's stuck - we could
> have merged it.

Well, I decided it was.  Or at least really really desirable.

> I at one time thought this was pretty impractical.  My first attempt at a
> conversion tool tried to use an intermediate object model, but I ran into
> trouble when trying to combine the various objects.  So I abandoned the
> effort and created what became lore2sphinx, which basically just combined a
> bunch of strings.  I then figured out a way of making the intermediate
> object thing work, and that was lore2sphinx-ng.  Then it became convenient
> to split out the intermediate object model from the document processing
> code, so I put all of that into a library and that became rstgen.
> It seems the saving grace here is that rstgen might be a generally useful
> tool in its own right, with more of a long-term future than lore2sphinx
> would have had.

I admit that I have become more interested in the actual problem of
"generating ReST" than I once was.  And I hope that it will become a
generally useful tool.

And probably one of the reasons I have been making such relatively slow
progress on it is _because_ I'm trying to solve a more general problem
than I once was.  The original lore2sphinx (the one running on the buildbot
now) was very much a minimal-thing-that-could-possibly-work kind of
solution.  It tried to do just enough to get the job done.  It sort of did
get the job done, but I was never very satisfied with it.

> (For anyone who is curious, the lore2sphinx-ng repo is forked off from the
> lore2sphinx repo, primarily because I didn't want to break the Sphinx
> buildbot by making drastic changes.)
> Have a link?

I've posted it a couple of times in this thread, though I can hardly blame
you for either missing it or losing track of it.

original: https://bitbucket.org/khorn/lore2sphinx
extra-crispy: https://bitbucket.org/khorn/lore2sphinx-ng

> Here's what my plan was prior to this whole discussion getting started
> again.
> 1) Finish rstgen, where "finished" in this instance is defined as "is
> capable of generating all the vanilla docutils and sphinx-specific ReST
> elements that we need for converting the Twisted documentation."
> Sounds like a worthy goal, although I don't think this is necessarily
> required.  Have you been working on it for the last 2 years?  Do you have
> any idea when it might be done?  It might be worthwhile to write a
> *smaller* .

I started on rstgen a bit more than a year ago.  I was hung up on the
problem of how to combine various parts of a document for a while without
having the crazy space-handling issues.  And also I've been trying to come
up with a relatively friendly API, and enough generality that it will end
up useful outside of the lore2sphinx context.

I really started on l2s-ng last July during "Julython".  I've been working
on it in fits and starts a few times since then.

> 2) Finish lore2sphinx-ng (which would probably have ended with merging it
> back into the lore2sphinx repo), where "finished" means that it would be
> capable of processing all the XHTML Lore tags that were defined in the Lore
> documentation and used in the Twisted documentation, and generating a tree
> of rstgen elements, which could then be rendered into ReST.
> Cool.
> While this would be handy, especially for people working on documentation
> branches, it's not necessarily necessary.
> (this would also serve to satisfy b) above, as the CLI in lore2sphinx-ng
> is less...well, let's just call it broken than lore2sphinx's was/is.)
> OK.
> 3) Go back and finish SphinxBuilder (release tooling for building a sphinx
> project, which is basically a wrapper for sphinx-build, plus some vague
> "version feature").
> This is really the crux; this is the thing you should work on first, I
> think, even if you're going to keep working on lore2sphinx-ng.  Basically
> the only reason that I was keen to get the lore to sphinx conversion
> improved in the first place was that creating this tool seemed to be
> dragging on for quite a while after the "chunk tickets" were done.  But
> now, this tool is almost done, and we could re-do the lore-source review if
> you wanted to do that.  The current lore2sphinx might well be good enough
> to just go with, especially if the next-generation version is going to take
> another six months to finish.

I'll take a look at this again soonish (a week?  this month? don't know.).
 Probably it's a matter of:

- merge forward (it has been a while)
- figure out how the other tools guess/determine the Twisted version in the
checkout, and make SphinxBuilder do that.
- get it reviewed
- commit
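
For readers unfamiliar with the idea, a "wrapper for sphinx-build" that is
handed a version string might look roughly like the following.  This is only
a minimal sketch under stated assumptions, not the SphinxBuilder code from
the ticket: the function names, the directory arguments and the example
version are all hypothetical, and it does not try to show how the Twisted
release tools determine the version.  The only real interface used is the
sphinx-build command line with its -b (builder) and -D (config override)
options.

```python
import subprocess
from pathlib import Path


def sphinx_build_command(source_dir, output_dir, version, builder="html"):
    """Return the sphinx-build argument list for one build.

    Hypothetical helper: the caller is assumed to already know the version.
    """
    return [
        "sphinx-build",
        "-b", builder,                # which Sphinx builder to run (html, latex, ...)
        "-D", f"version={version}",   # override the short version from conf.py
        "-D", f"release={version}",   # override the full release string
        str(Path(source_dir)),
        str(Path(output_dir)),
    ]


def build_docs(source_dir, output_dir, version, builder="html"):
    """Run sphinx-build; raises CalledProcessError if the build fails."""
    command = sphinx_build_command(source_dir, output_dir, version, builder)
    subprocess.run(command, check=True)
```

Keeping the command construction separate from the call that runs it means
the wrapper can be unit-tested without Sphinx installed, which is most of
what "tests for said wrapper" would need to cover.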

But I'll have to remember how to use combinator again (which will be much
easier now that the combinator "docs" are on the Twisted wiki...thanks to
whoever did that!)

Yes, I could probably use Bazaar, but so far every time I've tried that,
I've ended up spending waaaaaay too much time just on the VCS.  I guess I
have some kind of mental block with bzr.  I'll get over it someday I

> 4) Get someone to use something less hackish than what's currently
> building the Sphinx docs on the buildbot, and preferably in such a way that
> the results of those builds could be published somewhere and have
> persistent links.  Currently the results of what the Sphinx buildbot does
> are stored for a time, and then go away, so you'll see links to build
> results in some trac tickets that go nowhere, which is decidedly unhelpful.
>  My plan was that we'd set up something where the Sphinx docs would get
> generated and published someplace for every buildbot build so that we could
> always have the current results for the lore to sphinx conversion for the
> tip of each branch.  I have no idea whether this is actually feasible or
> practical, but it seemed like it would be useful.
> OK, *this* sounds like really unnecessary turd-polishing ;-).  This
> builder is an interim step; the more interesting step is the builder that
> just builds the sphinx docs, in the same way that the current builder
> builds the lore docs.  Furthermore, it seems to be working fine.  Build
> results links that go nowhere are a known problem with buildbot, since it
> does eventually lose most history, and this type of history takes up a fair
> bit of disk space.

Well, it was mostly motivated by the fact that we were doing a lot of
linking to build results that would then cease to exist for a while, and it
really annoyed me.  It doesn't seem nearly as "necessary" to me now as it
once did.

> 5) Proceed with Sphinx docs being built from lore sources, making tweaks
> as necessary to lore2sphinx(ng) for as long as it took for the generated
> docs to be good enough to justify switching to Sphinx entirely.
> 6) Switch to Sphinx entirely.
> I really wasn't planning on trying to get people excited about switching
> to Sphinx again until 1) and 2) were at least "mostly" done (for certain
> values of done) and I had gone back to finish 3).
> So.  I guess at this point the question is whether to try and go with
> what's there (lore2sphinx) or finish up the "new stuff" (lore2sphinx-ng +
> rstgen).  I think 3-6 in my above plan need to happen in any case, and I
> think those will be much easier with lore2sphinx-ng+rstgen.
> This decision is really determined by time estimates.
> In any case, work out the sphinx release automation tool first, since we
> need that regardless of how we switch over

Got it.

> IIRC, rstgen has support for most of the vanilla docutils elements, with
> the notable exception of tables (and maybe definition lists...can't recall
> whether I finished those).  It has a basic level of test coverage (of
> course you can never have too many tests) for rendering the elements
> individually, and some tests for elements in combination (particularly
> nested lists).  Footnotes and Citations I think also need some work, which
> I have a plan for, but haven't implemented yet (I don't think).
> The "new" lore2sphinx CLI tool needs more work, but is relatively
> straightforward.  Like the old tool, it's basically an elementtree
> processor, except instead of spitting out strings that get joined together
> (which granted was an unholy mess), it generates rstgen elements, which all
> have a .render() method.  After processing a Lore document, you should end
> up with a rstgen.Document object.  You call its render() method, which
> calls its children's render() methods, etc. and it's turtles all the way
> down.
> The framework is there for the new CLI tool, it's mostly a matter of
> writing a bunch of short methods that take elementtree elements as input
> and return appropriate rstgen objects.
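
The design described above (short handler methods that turn elementtree
elements into objects, each of which knows how to render itself) can be
illustrated with a small sketch.  Everything here is hypothetical: the class
names, the handle_* naming scheme and the tiny set of tags are assumptions
made for illustration and are not rstgen's or lore2sphinx-ng's actual API.
It also assumes un-namespaced tags, which real Lore XHTML may not have.

```python
import xml.etree.ElementTree as ET


class Node:
    """Base class: a node renders itself by rendering its children."""

    def __init__(self, children=()):
        self.children = list(children)

    def render_children(self):
        return "".join(child.render() for child in self.children)

    def render(self):
        return self.render_children()


class Text(Node):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def render(self):
        return self.text


class Emphasis(Node):
    def render(self):
        return "*" + self.render_children().strip() + "*"


class Paragraph(Node):
    def render(self):
        # Whitespace is normalized once, at the block level.
        return " ".join(self.render_children().split())


class Heading(Node):
    def render(self):
        title = " ".join(self.render_children().split())
        return title + "\n" + "-" * len(title)


class Document(Node):
    def render(self):
        return "\n\n".join(child.render() for child in self.children) + "\n"


class LoreConverter:
    """Dispatch each element to a short handle_<tag> method."""

    def convert(self, element):
        handler = getattr(self, "handle_" + element.tag, self.handle_unknown)
        return handler(element)

    def inline_children(self, element):
        nodes = [Text(element.text)] if element.text else []
        for child in element:
            nodes.append(self.convert(child))
            if child.tail:
                nodes.append(Text(child.tail))
        return nodes

    def handle_body(self, element):
        return Document(self.convert(child) for child in element)

    def handle_h2(self, element):
        return Heading(self.inline_children(element))

    def handle_p(self, element):
        return Paragraph(self.inline_children(element))

    def handle_em(self, element):
        return Emphasis(self.inline_children(element))

    def handle_unknown(self, element):
        # Unrecognized tags fall back to rendering their contents only.
        return Node(self.inline_children(element))


LORE = "<body><h2>Using  Deferreds</h2><p>Call\n   <em>addCallback</em> first.</p></body>"
document = LoreConverter().convert(ET.fromstring(LORE))
print(document.render())
```

Because whitespace is cleaned up by the block-level objects rather than by
whatever code happens to be concatenating strings, this kind of structure
avoids much of the space-handling trouble that a string-joining converter
runs into.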
> Obviously these tools aren't finished, but they produce much better output
> than the old version of lore2sphinx w.r.t. whitespace handling, paragraph
> wrapping, etc.
> Aesthetically, this appeals to me a lot more than going with the messiness
> of lore2sphinx.

Me too.

> But it is _not_ a requirement.

Understood.  Though I think it might be a practical requirement, even if it
isn't a policy requirement.  If that makes sense.

> Some of the code is still pretty messy, but nowhere near the train wreck
> that the current/old version of lore2sphinx is.  By which I mean it _can_
> be cleaned up, it just hasn't been yet.  In particular there's some places
> in rstgen where the API is (to me at least) obviously awful, but I haven't
> gotten around to fixing it yet.
> Please review the code.  Please feel free to ask questions if you're
> interested.
> Personally, I've gotten over being in a hurry about all this, and I think
> a robust tool is more likely to succeed in the long run, though finishing
> it may make the run a bit longer.  So I'm for finishing
> lore2sphinx-ng+rstgen.
> I think a little false urgency might not hurt here :-).  I'm not going to
> work on the tool - just writing these emails probably blew my Twisted
> development budget for the next two months ;-)

I can relate... :)

> - but I will do my best to quickly clear up any procedural
> what-needs-to-be-done questions unambiguously.  Please ping if anything
> gets you stuck.

I'll let you know.

Kevin Horn