[Twisted-Python] Lore, Sphinx, and getting to the finish line (was: re: lore and tickets and other stuff)

Glyph glyph at twistedmatrix.com
Sat Mar 2 04:14:39 EST 2013

On Mar 1, 2013, at 9:35 PM, Kevin Horn <kevin.horn at gmail.com> wrote:

> That "never-ending" series of Lore source fixes took place over the course of a couple of weeks.  Doing things that way was not my idea, though it seemed reasonable at the time because  the idea was that we would do the cutover at the end of it.

Well, let's go to the video tape. Based on this comment -  <http://twistedmatrix.com/trac/ticket/4500#comment:12> - these tickets were closed over a period ranging from 2010/07 to 2011/03. Roughly 8 months isn't quite "weeks", but okay, I guess it wasn't "never-ending" either :).

> I never "wandered off".  Been here the whole time.  I've been in #twisted almost continually for about the last 3 years, and in #twisted-dev for about a year (I didn't realize it existed before that).  I just got tired of (my perception) talking to myself about doing the conversion.  So I was being quiet.  Granted, I shouldn't have been, and that's on me.  But it's not like I'm hard to get a hold of.

Fair enough.  I had the inaccurate impression that you "weren't around" but you were just being quiet.  You never actually failed to respond, so that's not a fair impression.

>> OK.  Let's move things along then.
> Yes, let's.

Right on.

> The last day or two have probably not been the best to try and get my attention, especially yesterday, as I essentially worked a 14 hr day trying to meet a deadline.

24 hours is a perfectly reasonable response latency, don't worry about it :).

> But I see the conversation on IRC.  I'll note that no one seems to have considered asking me anything about it.  Looks like it was about 4am, though, so perhaps that wouldn't have done much good, as I was asleep. :)

> But hey...I have email!  Ask me!  I'll talk your ear off about it!

This email was written after said conversation as an explicit attempt to ask about just that, so, there you go :).

> (As an aside, lore2sphinx is in no way a "broken pile of regexes".  Not to say that it isn't broken in some really significant ways, because it is, but it doesn't use regexes at all.  Just sayin'.)

Actually yeah, "regex" is just a curse-word here :).  It's the emitter I'm complaining about, anyway, not the parser, so deriding it as a "regex" is in no way accurate.

> I got tired of complaining.  And arguing.

>> And since we didn't break the toolchain, I've been in no particular hurry.  I've accepted that this will take approximately a billion years.  So no rush.
> It does not have to take a billion years.  The criteria ought to be clear - and if they aren't, you should have asked for clarification :).
> I have asked for clarification more times than I can count about more aspects of this than I can possibly keep track of.


On <http://twistedmatrix.com/trac/ticket/5312> I see exactly one un-answered question in your review response, "Is re-raising the exception enough here? Or should I do something entirely different?"  Except it never actually got back into review, so it never got bubbled back up to get attention for an official response.

> Let's be specific: <http://twistedmatrix.com/trac/ticket/5312> is in need of some final code-review.  Despite several reviews and an apparently extensive final response pass, it's not currently in review, which means it's still in your court for some reason.  There is no reason to hold back on this and try to do *everything* in one big bang: this code just needs to be production-quality and land on trunk _before_ the ReST sources themselves are ready to go.
> Despite numerous attempts to prod someone into responding to my requests for clarification ;) on the ticket, I never got any response.

Like I said, I see one un-answered question.  On a ticket which is not in review: according to the development process, that means you still think you have some stuff to do on it, and it's not ready for anyone to take a look at it yet.  If you want a response, put it into review and someone will look at it as soon as time allows.  Or post here.  A comment on a ticket doesn't necessarily show up in anyone's in-box and won't necessarily get a response that isn't a code-review.

> Specifically, I could never get an answer on whether the sphinx build tool should require whomever was running it to specify a version or whether the tool should guess.  The existing tools (at the time, I haven't looked at the current state of these) do/did both, in different places.

The word "version" does not appear on 4500 at all, and on 5312 the only comment you make related to versions is you saying "Not sure which direction to go here. Deferring to sometime not the middle of the night".  It's exarkun asking the question about the versions though, not you :).

Sorry to be overly pedantic here: I'm not trying to assign blame, since that is fairly pointless now.  I'm just meaning to say that, based on what I see here, I am wondering what we could have improved.  I know we chatted on the mailing list, and in person, as well as on the tickets, so not all of this is necessarily public or even written down, but it really seems like you developed an impression of having to repeatedly ask questions and argue about things far more than you actually did :).

> And I admit, my impetus for immediacy kind of crashed when I had spent several weeks (I thought) getting everything ready to switch over the docs (in 4500) and then being told "oh we have some release stuff, we need to have a tool for that too".  My impression prior to this was that sphinx-build would be used to build the sphinx docs, which turned out to be erroneous.  I didn't even know that those tools (twisted.python._release) even existed prior to that point.

The release stuff was new-ish at the time, and is obviously not super publicly documented (it's for "internal" use only on Twisted itself right now).  So it's understandable that it didn't get communicated well, but it hardly seems like a reason to tank the whole process.

> Anyway, after a while it looked like fixing the lore sources would have to be done all over again, so I started looking into whether the conversion process itself could be improved, so that we didn't have to keep doing that.

That part of the conversation, at least, jibes with my understanding :).

> Also, please elaborate on what you mean by "do *everything* in one big bang".  My intention was never to do anything but get the SphinxBuilder working on that branch.  Was there something else you thought I was doing?  Was there something else I should (or should not) have been doing?

My reasoning goes like this: the ticket for the release tools is still not in review, so you must be waiting for something to re-submit it.  It looks like you responded to the code, so the only thing I could think you were still waiting for would be for the lore sources themselves to be ready.

> I have no idea about how the buildbots are configured.  But the linked buildbot log looks like part of the official release process.
> http://twistedmatrix.com/trac/wiki/ReleaseProcess#Buildhowtodocumentsforwebsite

Yeah. Ugh. I hate that part of that wiki page.  But that part can be Tom's problem, since he's responsible for the buildbot :).

> [the fixed-up Lore sources] got left alone because of the release tools hangup.  Ideally the release tools would have been done before the whole lore-source-tweaking process, but they weren't.  I'll admit my frustration played a part in this, but so did the deafening silence I got when I asked for anyone to comment on the ticket.

Where and how did you ask people to comment on the ticket?  I don't recall being asked, and I tend to be pretty good about leaving prompts like that in my inbox until I've done what was asked.  (Not *perfect*, of course, and if you asked a list then there might have been some bystander effect.)  It seems like we might have avoided this whole mess if you had just attached the 'review' keyword :).

> You keep saying that I wanted to "abandon the development process", and I'm not sure what you mean by that.

As I recall, we discussed this process in person at PyCon and you were quite keen to just check in the documentation in a broken state, and fix it all up in one gigantic branch while nobody did any Lore work.  To be fair, when I described the problems this would create, you did agree that we shouldn't do it that way.

> My perception has been that I would say "what do we need to do to make this happen"?  There would be some hemming and hawing (and at least several times long discussions about how documentation didn't really fit the regular UQDS process) and a sort of plan would be invented.  I would proceed according to the plan as I understood it.  I would then say "OK, we're ready"!  And then be told that some other thing not in the plan needed to be done.  The cycle would then repeat.

The only "cycle" I can either see on the tickets or recall here is where the release tools didn't come in to the initial plan.

> No [the need for release automation] was not brought up until well into the process. I (sort of) understand the desire for this, but it seems pretty weird to be building what is essentially a wrapper for an existing tool, along with tests for said wrapper, 

OK.  I can believe that this did not happen.  One problem is that we (the inner-circle old-school Twisted developers) tend to engage in conversations about how a thing might be done while at the same time we discuss what must be done.  And we also tend to discuss what policy is (or what all or some of us believe it ought to be in some case, further confusing the issue) without making explicit what the purpose of that requirement is.

I would ask the community to help us with this by doing a couple of things.

If somebody says "X is policy", always ask for a link to it.  If there is a link, it'll help you understand it better.  If there isn't a link, then the authority telling you it's "policy" might just be remembering that it's the way we've done things since forever and of course it's a good idea.  There are definitely things that I have thought were in the coding standard that are not actually written down anywhere, on more than one occasion.

If a meandering discussion is happening - here, on the mailing list, on the ticket - never be afraid to break it up and separate out the different concerns which are being discussed: what is necessary for compliance with our development process, what would be a good idea from a design point of view, how the work might be broken up to get through review more manageably, what other concerns are in play.

Especially, if you ever see a code review where a reviewer says "I think..." without making it clear what you should do, you should always ask, 'is this a requirement of the review or just some thoughts you have'.

There's also the problem of "I think you should..." being interpreted as "You must...".  It is very hard to consistently separate design feedback from code review, although we try very hard; but, it's hard to separate it out when reading it as well.  So one important point to keep in mind is that, as the author of a proposed change, outside the things that are agreed upon policy consensus, you always have some degree of discretion to disagree with a reviewer.  And you should freely do so when submitting anything for re-review.  It's best to just do this as quickly as possible, so that it gets back to the reviewer without a whole lot of delay, and they can respond with either "I still disagree, but you're doing the work, so OK go ahead" or "No, you really have to do this, it's required by policy document X, here's a link" ;-).
> The documentation itself needs to be able to be generated from any version of trunk.  While one or two formatting snafus are acceptable to be fixed after the fact, the documentation needs to be in a comprehensible state in every revision of trunk, which means that in order to land on trunk, the ReST output.
> So...you didn't finish that sentence.  I realize you apologized for errors at the end of your mail, but I have a feeling you were going to say something rather important there...

Well yes, that was the point of the apology.  That was a rather important thing.  What I was probably going to say was just:

The ReST output needs to be in good enough shape to be generally readable, with a manageable number of errors.  But we also need to be able to *verify* that it does not have too many errors.

And I'd already discussed that somewhat above.
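
One way to make "verify" concrete, offered only as a rough sketch and not as anything the Twisted release tools actually do: have the build write its warnings to a file (sphinx-build has a -w option for that) and compare the number of reported problems against an agreed budget. The function names, the marker strings, and the default budget of 50 are all assumptions made up for illustration.

```python
# Substrings that are assumed to mark a problem line in a sphinx-build
# warning log, e.g. "howto/clients.rst:12: WARNING: Title underline too short."
PROBLEM_MARKERS = ("WARNING:", "ERROR:", "SEVERE:")


def count_problems(log_text):
    """Count the lines of a warning log that report a problem."""
    return sum(
        1
        for line in log_text.splitlines()
        if any(marker in line for marker in PROBLEM_MARKERS)
    )


def within_budget(log_text, budget=50):
    """Return True if the log reports no more problems than the budget."""
    return count_problems(log_text) <= budget
```

A check like this could run on every build, so that "too many errors" becomes a number that a reviewer can point at instead of a matter of opinion.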

> Experience shows that it's unlikely to be surprisingly close.  I like your optimism though.

Experience just teaches us that it's not done yet.  And experience has taught us that about every change, and it was right up until the exact moment when it wasn't right any more ;-).

> Now that I've replied to all of that, let me give you a rundown of what I've been thinking and planning, so that you have an idea of where I'm coming from.
> Here are the various things that I have perceived to be necessary/required in order to get the conversion to happen:
> a) The conversion process needs to be able to be run concurrently with Lore for an extended period of time.  In other words, Lore would be the "official" version of the docs, and the Sphinx docs would be built in some form of automated fashion until everyone was happy with them and/or ready to deprecate/abandon Lore.

Your understanding of this requirement is slightly off, I think, although possibly the consequences are the same.  This is an instance of the difficulty I laid out above: separating the requirements from the strategies for satisfying said requirements.

The thing that we weren't going to tolerate was any message saying that people should hold off on writing documentation, even for "a little while" while we fixed up the lore conversion, because without a contractual obligation for someone to finish this work, there's really no telling how long "a little while" would be :).  Since the whole point of this sphinx conversion is to appeal to documentation authors who prefer the ReST format as input (it's definitely not to make the docs look nicer, writing a new stylesheet for Lore would have taken 1/100th of the effort and nobody has expressed interest in doing that), creating a period where things were even *less* appealing to documentation authors would defeat the purpose.

Another possible solution to this problem would be to modify Lore so it could process ReST sources, so that we could convert the documentation within the repository piecemeal, and start writing any new docs in ReST, but still have a coherent whole of documentation produced, eventually switching the documentation processor from Lore to Sphinx.

Yet another possible solution would be to modify Sphinx, adding a plugin to process the Lore sources.

As an aside: this is the part of the process which has been so frustrating to me, personally.  The two alternate solutions I proposed here (and have proposed before) seem far saner and more manageable in terms of effort, to me.  But, everyone I have spoken to about docutils and ReST has told me in no uncertain terms that they are both a pile of heinous hacks that resist any attempt at sensible software-engineering solutions to problems, so we need to resort to hackish system-integration stuff like what we've done.  This worries me.

I know that Sphinx's output is well-loved by the Python community, but if it's so hard to call into that we can't reasonably modify it to get an XML DOM that looks like Lore source to Lore, and it's so hard to plug in to it that we can't give it a data structure that it likes from Lore's XML DOM, then how the heck is it being maintained?  And if it actually *isn't* that bad, then why haven't I managed to find someone that knows its code well enough to do one or the other of these things?

I have no direct knowledge of any of this stuff, because my main interest here is improving the experience of working on Twisted, both for you, Kevin, and for the people who will arguably be helped by the use of Sphinx.  Maybe I'm completely wrong and Sphinx is beautifully architected and we could have done this from day 1.  But I faintly hope that some Docutils and Sphinx contributor hears that I said "sphinx is garbage" and makes a fool of me by contributing either a lore modification or a sphinx plugin which solves this whole problem so we can do the format or tool migration incrementally :).

> b) Because of a), there needs to be tooling to run lore2sphinx (or whatever) on a regular basis.  (This was sort of being done via the Sphinx-building buildbot, but in a very ad-hockery sort of way, which was brittle, broke a couple of times, and needed to be improved.)

Hmm. I wasn't aware of that. But it seems like it's running like a charm now.

> c) There needs to be release management tooling to build the Sphinx docs from ReST into whatever formats we want to publish (HTML and PDF to start, maybe others later on)

Yup.  (ePub?  PDF is so last-century... :))

> d) Convert the Lore sources to better ReST documents without all the problems that the current lore2sphinx output has.

So, this wasn't *necessary*.  If we had gotten through the release automation stuff - and I still don't understand why that's stuck - we could have merged it.

> I at one time thought this was pretty impractical.  My first attempt at a conversion tool tried to use an intermediate object model, but I ran into trouble when trying to combine the various objects.  So I abandoned the effort and created what became lore2sphinx, which basically just combined a bunch of strings.  I then figured out a way of making the intermediate object thing work, and that was lore2sphinx-ng.  Then it became convenient to split out the intermediate object model from the document processing code, so I put all of that into a library and that became rstgen.

It seems the saving grace here is that rstgen might be a generally useful tool in its own right, with more of a long-term future than lore2sphinx would have had.

> (For anyone who is curious, the lore2sphinx-ng repo is forked off from the lore2sphinx repo, primarily because I didn't want to break the Sphinx buildbot by making drastic changes.)

Have a link?

> Here's what my plan was prior to this whole discussion getting started again.
> 1) Finish rstgen, where "finished" in this instance is defined as "is capable of generating all the vanilla docutils and sphinx-specific ReST elements that we need for converting the Twisted documentation".

Sounds like a worthy goal, although I don't think this is necessarily required.  Have you been working on it for the last 2 years?  Do you have any idea when it might be done?  It might be worthwhile to write a *smaller* .

> 2) Finish lore2sphinx-ng (which would probably have ended with merging it back into the lore2sphinx repo), where "finished" means that it would be capable of processing all the XHTML Lore tags that were defined in the Lore documentation and used in the Twisted documentation, and generating a tree of rstgen elements, which could then be rendered into ReST.


While this would be handy, especially for people working on documentation branches, it's not necessarily necessary.

> (this would also serve to satisfy b) above, as the CLI in lore2sphinx-ng is less...well, let's just call it broken than lore2sphinx's was/is.)


> 3) Go back and finish SphinxBuilder (release tooling for building a sphinx project, which is basically a wrapper for sphinx-build, plus some vague "version feature").

This is really the crux; this is the thing you should work on first, I think, even if you're going to keep working on lore2sphinx-ng.  Basically the only reason that I was keen to get the lore to sphinx conversion improved in the first place was that creating this tool seemed to be dragging on for quite a while after the "chunk tickets" were done.  But now, this tool is almost done, and we could re-do the lore-source review if you wanted to do that.  The current lore2sphinx might well be good enough to just go with, especially if the next-generation version is going to take another six months to finish.
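
To give a sense of how small such a wrapper can be, here is a minimal sketch. It is not the code on the ticket and not the twisted.python._release API; the function names are hypothetical. It relies only on sphinx-build's documented -b (builder) and -D (config override) options. The sketch requires an explicit version purely to keep it short; whether the tool should require one or guess it is the open question discussed earlier in this message.

```python
import subprocess


def sphinx_build_command(source_dir, output_dir, version, builder="html"):
    """Return the argument list for one sphinx-build run.

    Nothing is executed here, which keeps the function easy to unit test.
    """
    if not version:
        # Assumption for this sketch only: the caller must pass a version.
        raise ValueError("version must be given explicitly")
    return [
        "sphinx-build",
        "-b", builder,
        # Override the 'version' and 'release' values from conf.py.
        "-D", "version=" + version,
        "-D", "release=" + version,
        source_dir,
        output_dir,
    ]


def build_docs(source_dir, output_dir, version, builder="html"):
    """Run sphinx-build; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(
        sphinx_build_command(source_dir, output_dir, version, builder)
    )
```

Splitting the command construction from the subprocess call means the tests for the wrapper do not need Sphinx installed, which speaks to the "tests for said wrapper" concern.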

> 4) Get someone to use something less hackish than what's currently building the Sphinx docs on the buildbot, and preferably in such a way that the results of those builds could be published somewhere and have persistent links.  Currently the results of what the Sphinx buildbot does are stored for a time, and then go away, so you'll see links to build results in some trac tickets that go nowhere, which is decidedly unhelpful.  My plan was that we'd set up something where the Sphinx docs would get generated and published someplace for every buildbot build so that we could always have the current results for the lore to sphinx conversion for the tip of each branch.  I have no idea whether this is actually feasible or practical, but it seemed like it would be useful.

OK, *this* sounds like really unnecessary turd-polishing ;-).  This builder is an interim step; the more interesting step is the builder that just builds the sphinx docs, in the same way that the current builder builds the lore docs.  Furthermore, it seems to be working fine.  Build results links that go nowhere are a known problem with buildbot, since it does eventually lose most history, and this type of history takes up a fair bit of disk space.

> 5) Proceed with Sphinx docs being built from lore sources, making tweaks as necessary to lore2sphinx(ng) for as long as it took for the generated docs to be good enough to justify switching to Sphinx entirely.
> 6) Switch to Sphinx entirely.
> I really wasn't planning on trying to get people excited about switching to Sphinx again until 1) and 2) were at least "mostly" done (for certain values of done) and I had gone back to finish 3).
> So.  I guess at this point the question is whether to try and go with what's there (lore2sphinx) or finish up the "new stuff" (lore2sphinx-ng + rstgen).  I think 3-6 in my above plan need to happen in any case, and I think those will be much easier with lore2sphinx-ng+rstgen.

This decision is really determined by time estimates.

In any case, work out the sphinx release automation tool first, since we need that regardless of how we switch over.

> I think I have some changes to lore2sphinx and rstgen which I haven't pushed yet.  I'll try to get those out there soonish (sometime over the weekend) in case people want to look at them.

You might want to send a considerably shorter message just enticing other list members to have a look and maybe help out with that stuff :).

> IIRC, rstgen has support for most of the vanilla docutils elements, with the notable exception of tables (and maybe definition lists...can't recall whether I finished those).  It has a basic level of test coverage (of course you can never have too many tests) for rendering the elements individually, and some tests for elements in combination (particularly nested lists).  Footnotes and Citations I think also need some work, which I have a plan for, but haven't implemented yet (I don't think).
> The "new" lore2sphinx CLI tool needs more work, but is relatively straightforward.  Like the old tool, it's basically an elementtree processor, except instead of spitting out strings that get joined together (which granted was an unholy mess), it generates rstgen elements, which all have a .render() method.  After processing a Lore document, you should end up with a rstgen.Document object.  You call its render() method, which calls its children's render() methods, etc. and it's turtles all the way down.
> The framework is there for the new CLI tool, it's mostly a matter of writing a bunch of short methods that take elementtree elements as input and return appropriate rstgen objects.
> Obviously these tools aren't finished, but they produce much better output than the old version of lore2sphinx w.r.t. whitespace handling, paragraph wrapping, etc.

Aesthetically, this appeals to me a lot more than going with the messiness of lore2sphinx.  But it is _not_ a requirement.
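
For readers who have not looked at either tool, the design Kevin describes (short handlers that turn ElementTree elements into objects, each with a render() method) might look roughly like the sketch below. This is not rstgen's actual API: every class and function name here is invented for illustration, only three tags are handled, inline markup is ignored, and the input is assumed to use plain, un-namespaced tag names.

```python
import xml.etree.ElementTree as ET


class Text:
    def __init__(self, text):
        self.text = text

    def render(self):
        # Collapse runs of whitespace so source line breaks do not leak out.
        return " ".join(self.text.split())


class Paragraph:
    def __init__(self, children):
        self.children = children

    def render(self):
        return "".join(child.render() for child in self.children)


class Section:
    def __init__(self, title):
        self.title = title

    def render(self):
        title = self.title.render()
        return title + "\n" + "=" * len(title)


class Document:
    def __init__(self, children):
        self.children = children

    def render(self):
        # Each child renders itself; the document only joins the results.
        return "\n\n".join(child.render() for child in self.children) + "\n"


def handle_body(element):
    return Document([convert(child) for child in element])


def handle_h2(element):
    return Section(Text(element.text or ""))


def handle_p(element):
    return Paragraph([Text(element.text or "")])


# One short handler per tag, as described in the quoted text.
HANDLERS = {"body": handle_body, "h2": handle_h2, "p": handle_p}


def convert(element):
    """Map one ElementTree element to an object with a render() method."""
    return HANDLERS[element.tag](element)


if __name__ == "__main__":
    source = "<body><h2>Intro</h2><p>Hello\n   world.</p></body>"
    print(convert(ET.fromstring(source)).render())
```

The appeal over joining strings is that whitespace and wrapping decisions live in one render() method per element type instead of being scattered through the converter.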

> Some of the code is still pretty messy, but nowhere near the train wreck that the current/old version of lore2sphinx is.  By which I mean it _can_ be cleaned up, it just hasn't been yet.  In particular there are some places in rstgen where the API is (to me at least) obviously awful, but I haven't gotten around to fixing it yet.
> Please review the code.  Please feel free to ask questions if you're interested.
> Personally, I've gotten over being in a hurry about all this, and I think a robust tool is more likely to succeed in the long run, though finishing it may make the run a bit longer.  So I'm for finishing lore2sphinx-ng+rstgen.

I think a little false urgency might not hurt here :-).  I'm not going to work on the tool - just writing these emails probably blew my Twisted development budget for the next two months ;-) - but I will do my best to quickly clear up any procedural what-needs-to-be-done questions unambiguously.  Please ping if anything gets you stuck.

> What are others' opinions?  Make the "old" tool work?  Or make the "new" tool work?
> Damn.  Talk about long emails.

