[Twisted-Python] Lore, Sphinx, and getting to the finish line (was: re: lore and tickets and other stuff)
abdulraufhaseeb at gmail.com
Sat Mar 16 06:22:47 EDT 2013
I am very interested in completing this project. I applied to GSoC
last time with this project and was rejected (there were better
students), and I would be happy to take it on this time. Let me
know your thoughts.
On 3/7/13, Kevin Horn <kevin.horn at gmail.com> wrote:
> Sorry it's taken me so long to get back to this. But it's gotten to be a
> Looong email.
> On Sat, Mar 2, 2013 at 3:14 AM, Glyph <glyph at twistedmatrix.com> wrote:
>> On Mar 1, 2013, at 9:35 PM, Kevin Horn <kevin.horn at gmail.com> wrote:
>> That "never-ending" series of Lore source fixes took place over the
>> course of a couple of weeks. Doing things that way was not my idea, though it
>> seemed reasonable at the time because the idea was that we would do the
>> cutover at the end of it.
>> Well, let's go to the video tape. Based on this comment - <
>> http://twistedmatrix.com/trac/ticket/4500#comment:12> - these tickets
>> were closed over a period ranging from 2010/07 to 2011/03. 6 months isn't
>> quite "weeks", but okay I guess it wasn't "never-ending" either :).
> Hmmm. I recall it as being much shorter. Probably most of the work took
> place in two "spurts" around the beginning and end of that time, and that's
> why I remember it that way. But I'm not interested in digging through a
> bunch of old dates to find out for sure.
>> (As an aside, lore2sphinx is in no way a "broken pile of regexes". Not
>> to say that it isn't broken in some really significant ways, because it is,
>> but it doesn't use regexes at all. Just sayin'.)
>> Actually yeah, "regex" is just a curse-word here :). It's the emitter
>> I'm complaining about, anyway, not the parser, so deriding it as a "regex" is
>> in no way accurate.
> I figured that was the case, I just wanted to say something so others
> reading this didn't get the wrong impression about how lore2sphinx is
> implemented. I mean it's not code I'm very proud of, but it's not _that_
> bad :)
> <<< snip a bunch of stuff about who said what when, why I thought what I
> thought, etc. >>>
> It boils down to the fact that a bunch of the conversations happened either
> in person or on IRC. This was mostly because I was in a hurry at the time,
> usually because I wanted to do something before additions were made to the
> documentation, which was in a somewhat "known" state (as in I knew how it
> was going to behave when run through lore2sphinx) at the time.
>> Also, please elaborate on what you mean by "do *everything* in one big
>> bang". My intention was never to do anything but get the SphinxBuilder
>> working on that branch. Was there something else you thought I was
>> doing? Was there something else I should (or should not) have been doing?
>> My reasoning goes like this: the ticket for the release tools is still
>> in review, so you must be waiting for something to re-submit it. It seems
>> like you responded to the code, so the only thing I could think you were
>> still waiting for would be for the lore sources themselves to be ready.
> It's been long enough that I can't fully recall my reasoning on this. But
> _probably_ I decided that if I finished the release tools ticket, someone
> might use it. Which would be great, except that I think I had decided that
> before that actually happened I needed to figure out a way to emit nicer
> output from lore2sphinx. So I left it alone until I had figured out how to
> do that.
> At least, that _might_ have been part of my thought process. It really was
> ages ago.
> [the fixed-up Lore sources] got left alone because of the release tools
>> hangup. Ideally the release tools would have been done before the whole
>> lore-source-tweaking process, but they weren't. I'll admit my
>> played a part in this, but so did the deafening silence I got when I
>> asked for anyone to comment on the ticket.
>> Where and how did you ask people to comment on the ticket? I don't
>> recall being asked, and I tend to be pretty good about leaving prompts like
>> that in my inbox until I've done what was asked. (Not *perfect*, of course,
>> and if you asked a list then there might have been some bystander effect.)
>> It seems like we might have avoided this whole mess if you had just attached
>> the 'review' keyword :).
> On IRC.
>> My perception has been that I would say "what do we need to do to make
>> this happen"? There would be some hemming and hawing (and at least
>> times long discussions about how documentation didn't really fit the
>> regular UQDS process) and a sort of plan would be invented. I would
>> proceed according to the plan as I understood it. I would then say "OK,
>> we're ready"! And then be told that some other thing not in the plan
>> needed to be done. The cycle would then repeat.
>> The only "cycle" I can either see on the tickets or recall here is where
>> the release tools didn't come in to the initial plan.
> This was the latest of several (3 or 4) according to my
> recollection/perception. It doesn't really matter now.
>> No [the need for release automation] was not brought up until well into
>> the process. I (sort of) understand the desire for this, but it seems
>> pretty weird to be building what is essentially a wrapper for an existing
>> tool, along with tests for said wrapper,
>> OK. I can believe that this did not happen. One problem is that we (the
>> inner-circle old-school Twisted developers) tend to engage in discussions
>> about how a thing might be done while at the same time we discuss what
>> should be done. And we also tend to discuss what policy is (or what all or some
>> of us believe it *ought to be* in some case, further confusing the issue)
>> without making explicit what the *purpose* of that requirement is.
>> I would ask the community to help us with this by doing a couple of things:
>> If somebody says "X is policy", always ask for a link to it. If there is
>> a link, it'll help you understand it better. If there *isn't* a link,
>> then the authority telling you it's "policy" might just be remembering
>> it's the way we've done things since forever and of course it's a good
>> idea. There are definitely things that I have thought were in the coding
>> standard that are not actually written down anywhere, on more than one
>> occasion.
>> If a meandering discussion is happening - here, on the mailing list, on
>> the ticket - never be afraid to break it up and separate out the
>> concerns which are being discussed: what is necessary for compliance with
>> our development process, what would be a good idea from a design point of
>> view, how the work might be broken up to get through review more
>> manageably, what other concerns are in play.
>> Especially, if you ever see a code review where a reviewer says "I
>> think..." without making it clear what you should *do*, you should always
>> ask, 'is this a requirement of the review or just some thoughts you had?'
> And when we ask, we should ask on the ticket, and put it back into review,
> yes? Because I think this was the part (or at least _A_ part) I was really
> missing here.
>> There's also the problem of "I think you should..." being interpreted as
>> "You must...". It is *very* hard to consistently separate design
>> feedback from code review, although we try very hard; but, it's hard to
>> separate it out when reading it as well. So one important point to keep in
>> mind is that, as the author of a proposed change, outside the things that
>> are agreed upon policy consensus, you always have some degree of latitude
>> to disagree with a reviewer. And you should freely do so when submitting
>> anything for re-review. It's best to just do this as quickly as possible
>> so that it gets back to the reviewer without a whole lot of delay, and they
>> can respond with either "I still disagree, but you're doing the work, so
>> go ahead" or "No, you really have to do this, it's required by policy
>> document X, here's a link" ;-).
>>> 1. The documentation itself needs to be able to be generated from any
>>> version of trunk. While one or two formatting snafus are acceptable
>>> to be
>>> fixed after the fact, the documentation needs to be in a
>>> state in every revision of trunk, which means that in order to land
>>> trunk, the ReST output.
>> So...you didn't finish that sentence. I realize you apologized for
>> errors at the end of your mail, but I have a feeling you were going to say
>> something rather important there...
>> Well yes, that was the point of the apology. That was a rather important
>> thing. What I was probably going to say was just:
>> The ReST output needs to be in good enough shape to be generally
>> with a manageable number of errors. But, we need to be able to *verify*
>> that it has not too many errors.
>> And I'd already discussed that somewhat above.
>> Now that I've replied to all of that, let me give you a rundown of what
>> I've been thinking and planning, so that you have an idea of where I'm
>> coming from.
>> Here are the various things that I have perceived to be necessary
>> in order to get the conversion to happen:
>> a) The conversion process needs to be able to be run concurrently with
>> Lore for an extended period of time. In other words, Lore would be the
>> "official" version of the docs, and the Sphinx docs would be built in some
>> form of automated fashion until everyone was happy with them and/or ready
>> to deprecate/abandon Lore.
>> Your understanding of this requirement is slightly off, I think, although
>> possibly the consequences are the same. As per the difficulties I laid out
>> above, about separating the requirements from the strategies for satisfying
>> said requirements.
> I've been told that almost verbatim, several times. This is basically what
> led to the Sphinx buildbot happening. Perhaps I wasn't clear about what I
> meant.
>> The thing that we weren't going to tolerate was any message saying that
>> people should hold off on writing documentation, even for "a little
>> while" while we fixed up the lore conversion, because without a contractual
>> obligation for someone to finish this work, there's really no telling how
>> long "a little while" would be :).
> Well, when I originally was pushing it, my plan was for that little while
> to be "today" (this was at PyCon during the only day of sprints I was able
> to attend), and if it didn't get done, we'd abandon that particular
> attempt. You and exarkun managed to convince me that even this was
> probably not a very good idea though.
>> Since the whole point of this sphinx conversion is to appeal to
>> documentation authors who prefer the ReST format as input (it's
>> not to make the docs look nicer, writing a new stylesheet for Lore would
>> have taken 1/100th of the effort and nobody has expressed interest in
>> that), creating a period where things were even *less* appealing to
>> documentation authors would defeat the purpose.
> I actually considered the stylesheet thing, but it was really only a
> passing thought. My personal motivation started with not being able to
> find things in the documentation. So I started looking at the various Lore
> tickets to see whether there was something to clean up that would help.
> And a bunch of them seemed to be asking for things that Sphinx already
> did. Sphinx was starting to become a common tool, and I had used it on
> several other projects, and found it pleasant to work with. Also, when I
> asked about Lore on IRC, I got a lot of "I'm not sure anyone knows how that
> works these days" and "oh man, I wish we didn't have to support that any
> more", etc. So I started looking into how to convert the docs over to use
> Sphinx.
>> Another possible solution to this problem would be to modify Lore so it
>> could process ReST sources, so that we could convert the documentation
>> within the repository piecemeal, and start writing any new docs in ReST,
>> but still have a coherent whole of documentation produced, eventually
>> switching the documentation processor from Lore to Sphinx.
> This would require someone smarter than me. Or at least more versed in
> formal parsing theory/techniques. Or something. And that would be just to
> read the docutils sources. I find them...alien. (though less so than when
> I first started looking at them...I'm not sure if they've improved, or I
> have.)
>> Yet another possible solution would be to modify Sphinx, adding a plugin
>> to process the Lore sources.
> This is more reasonable, but still has problems. Actually the reasonable
> thing would be to create a docutils piece to process Lore sources, and then
> maybe some Sphinx extensions on top of that. Or something. Still, it
> might have been doable. However, I think Lore would have had to be
> modified as well, and possibly the Lore format expanded
> to accommodate certain constructs that it just doesn't do right now (mostly
> I'm thinking of the toctree directive and related stuff).
>> As an aside: this is the part of the process which has been so
>> to me, personally. The two alternate solutions I proposed here (and have
>> proposed before) seem far saner and more manageable in terms of effort, to
>> me. But, everyone I have spoken to about docutils and ReST has told me in
>> no uncertain terms that they are both a pile of heinous hacks that resist
>> any attempt at sensible software-engineering solutions to problems, so we
>> need to resort to hackish system-integration stuff like what we've done.
>> This worries me.
> Ooookaaaaay....I don't know how to respond to that exactly.
>> I know that Sphinx's output is well-loved by the Python community, but if
>> it's so hard to call into that we can't reasonably modify it to get an
>> XML DOM that looks like Lore source to Lore, and it's so hard to plug in to
>> that we can't give it a data structure that it likes from Lore's XML DOM,
>> then how the heck is it being maintained? And if it actually *isn't* that
>> bad, then why haven't I managed to find someone that knows its code well
>> enough to do one or the other of these things?
> It would be possible to make Sphinx emit Lore sources, though I'm not sure
> what that buys. You could do this either through a custom Sphinx
> "builder", or possibly even just using a custom html template with the html
> builder. But you'd need ReST sources to feed into Sphinx, so...
> You could write a docutils "parser" which parses a document and returns a
> "nodetree" data structure. This would get you as far as docutils, but
> AFAIK there is no existing way to get Sphinx to use any parser other than
> the default ReST one. You could probably create such a thing, which would
> almost certainly involve modifications to Sphinx, though that's not
> necessarily a big deal. It might not even be hard. I think this would
> actually be a lot easier now than when I started down this path, mostly
> because docutils seems to have better documentation on the nodes that can
> go in the "nodetree" I mentioned above. Note that I said "seems" because
> I'm not sure if it's that docutils documentation has gotten more complete,
> or just that I've bounced around in it enough times to find things. The
> Docutils docs have the same problem that the Twisted docs have, which is
> that they are nigh un-navigable. (I also think that the docutils docs
> should start using Sphinx, but I'm not sure how well that would go over in
> that camp...)
> The main problem with creating such a parser, is that Sphinx uses a bunch
> of docutils extensions to tie together the disparate documents in your
> project, and Lore, like vanilla docutils, doesn't have much of a concept of
> being one document among many (at least not from within a document). For
> example, it has things to handle tables of contents, cross document links
> (with the ability to link to a document section, rather than a specific
> document, so if it gets moved to a different document, the link gets
> adjusted), compilation for glossaries and index entries from across the
> docs project, etc. So you'd need to add some stuff to Lore to account for
> this (some is already there). And then we'd have to go through and modify
> a bunch of the Lore sources anyway.
> Like I said, this looks a lot more feasible now than it did when I first
> looked at it, though I'm not sure whether it's me or docutils/Sphinx that's
> changed. Probably some of each.
> At any rate, back then it seemed awfully difficult, and less interesting.
> Hmmm. And you'd also need to make some changes to the way Sphinx picks up
> files. And probably some other stuff I haven't thought of.
>> I have no direct knowledge of any of this stuff, because my main interest
>> here is improving the experience of working on Twisted, both for you,
>> Kevin, and for the people who will arguably be helped by the use of Sphinx.
>> Maybe I'm completely wrong and Sphinx is beautifully architected and we
>> could have done this from day 1. But I faintly hope that some Docutils or
>> Sphinx contributor hears that I said "sphinx is garbage" and makes a fool
>> of me by contributing either a lore modification or a sphinx plugin which
>> solves this whole problem so we can do the format or tool migration
>> incrementally :).
>> b) Because of a), there needs to be tooling to run lore2sphinx (or
>> whatever) on a regular basis. (This was sort of being done via the
>> Sphinx-building buildbot, but in a very ad-hockery sort of way, which was
>> brittle, broke a couple of times, and needed to be improved.)
>> Hmm. I wasn't aware of that. But it seems like it's running by a charm
> I think this is because a) exarkun fixed it a couple of times, and b) I
> stopped making changes to the lore2sphinx repo (which the buildbot pulls
> from). I'm also referring here to something which is completely
> non-obvious to anyone who hasn't actually run lore2sphinx by hand, which is
> that the command line tool was fairly terrible in several ways. This made
> it harder to use for development than it should have been.
>> c) There needs to be release management tooling to build the Sphinx docs
>> from ReST into whatever formats we want to publish (HTML and PDF to
>> start, maybe others later on)
>> Yup. (ePub? PDF is so last-century... :))
>> d) Convert the Lore sources to better ReST documents without all the
>> problems that the current lore2sphinx output has.
>> So, this wasn't *necessary*. If we had gotten through the release
>> automation stuff - and I still don't understand why that's stuck - we
>> could have merged it.
> Well, I decided it was. Or at least really really desirable.
>> I at one time thought this was pretty impractical. My first attempt at a
>> conversion tool tried to use an intermediate object model, but I ran into
>> trouble when trying to combine the various objects. So I abandoned the
>> effort and created what became lore2sphinx, which basically just combined
>> a bunch of strings. I then figured out a way of making the intermediate
>> object thing work, and that was lore2sphinx-ng. Then it became necessary
>> to split out the intermediate object model from the document processing
>> code, so I put all of that into a library and that became rstgen.
>> It seems the saving grace here is that rstgen might be a generally useful
>> tool in its own right, with more of a long-term future than lore2sphinx
>> would have had.
> I admit that I have become more interested in the actual problem of
> "generating ReST" than I once was. And I hope that it will become a
> generally useful tool.
> And probably one of the reasons I have been making such relatively slow
> progress on it is _because_ I'm trying to solve a more general problem
> than I once was. The original lore2sphinx (the one running on the buildbot
> now) was very much a minimal-thing-that-could-possibly-work kind of
> solution. It tried to do just enough to get the job done. It sort of did
> get the job done, but I was never very satisfied with it.
>> (For anyone who is curious, the lore2sphinx-ng repo is forked off from the
>> lore2sphinx repo, primarily because I didn't want to break the Sphinx
>> buildbot by making drastic changes.)
>> Have a link?
> I've posted it a couple of times in this thread, though I can hardly blame
> you for either missing it or losing track of it.
> original: https://bitbucket.org/khorn/lore2sphinx
> extra-crispy: https://bitbucket.org/khorn/lore2sphinx-ng
>> Here's what my plan was prior to this whole discussion getting started
>> 1) Finish rstgen, where "finished" in this instance is defined as "is
>> capable of generating all the vanilla docutils and sphinx-specific ReST
>> elements that we need for converting the
>> Twisted documentation".
>> Sounds like a worthy goal, although I don't think this is necessarily
>> required. Have you been working on it for the last 2 years? Do you have
>> any idea when it might be done? It might be worthwhile to write a
>> *smaller* .
> I started on rstgen a bit more than a year ago. I was hung up on the
> problem of how to combine various parts of a document for a while without
> having the crazy space-handling issues. And also I've been trying to come
> up with a relatively friendly API, and enough generality that it will end
> up useful outside of the lore2sphinx context.
> I really started on l2s-ng last July during "Julython". I've been working
> on it in fits and starts a few times since then.
>> 2) Finish lore2sphinx-ng (which would probably have ended with merging it
>> back into the lore2sphinx repo), where "finished" means that it would be
>> capable of processing all the XHTML Lore tags that were defined in the
>> documentation and used in the Twisted documentation, and generating a
>> of rstgen elements, which could then be rendered into ReST.
>> While this would be handy, especially for people working on documentation
>> branches, it's not necessarily necessary.
>> (this would also serve to satisfy b) above, as the CLI in lore2sphinx-ng
>> is less...well, let's just call it broken than lore2sphinx's was/is.)
>> 3) Go back and finish SphinxBuilder (release tooling for building a Sphinx
>> project, which is basically a wrapper for sphinx-build, plus some vague
>> "version feature").
>> This is really the crux; this is the thing you should work on first, I
>> think, even if you're going to keep working on lore2sphinx-ng. Basically
>> the only reason that I was keen to get the lore to sphinx conversion
>> improved in the first place was that creating this tool seemed to be
>> dragging on for quite a while after the "chunk tickets" were done. But
>> now, this tool is almost done, and we could re-do the lore-source review if
>> you wanted to do that. The current lore2sphinx might well be good enough
>> to just go with, especially if the next-generation version is going to take
>> another six months to finish.
> I'll take a look at this again soonish (a week? this month? don't know.).
> Probably it's a matter of:
> - merge forward (it has been a while)
> - figure out how the other tools guess/determine the Twisted version in the
> checkout, and make SphinxBuilder do that.
> - get it reviewed
> - commit
> But I'll have to remember how to use combinator again (which will be much
> easier now that the combinator "docs" are on the Twisted wiki...thanks to
> whomever did that!)
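
For reference, the kind of wrapper discussed above (SphinxBuilder as a thin layer over sphinx-build that also supplies the project version) might look roughly like the sketch below. This is not the SphinxBuilder code from the ticket. The function names, and the choice to pass the version with -D overrides, are assumptions made for illustration; only the sphinx-build options -b (builder) and -D (config override) come from Sphinx itself.

```python
import subprocess


def sphinx_build_command(source_dir, output_dir, builder="html", version=None):
    """Return the argv list for one sphinx-build run (hypothetical helper)."""
    command = ["sphinx-build", "-b", builder]
    if version is not None:
        # Override the "version" and "release" values from conf.py so the
        # generated docs match the checkout being released.
        command += ["-D", "version=" + version, "-D", "release=" + version]
    command += [source_dir, output_dir]
    return command


def build_docs(source_dir, output_dir, builder="html", version=None):
    """Run sphinx-build and return its exit code (0 means success)."""
    return subprocess.call(
        sphinx_build_command(source_dir, output_dir, builder, version)
    )
```

Keeping the command construction separate from the subprocess call means the wrapper can be unit-tested without Sphinx installed, which matters if the release tools need tests of their own.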
> Yes, I could probably use Bazaar, but so far every time I've tried that,
> I've ended up spending waaaaaay too much time just on the VCS. I guess I
> have some kind of mental block with bzr. I'll get over it someday I
>> 4) Get someone to use something less hackish than what's currently
>> building the Sphinx docs on the buildbot, and preferably in such a way that
>> the results of those builds could be published somewhere and have
>> persistent links. Currently the results of what the Sphinx buildbot does
>> are stored for a time, and then go away, so you'll see links to build
>> results in some trac tickets that go nowhere, which is decidedly
>> My plan was that we'd set up something where the Sphinx docs would get
>> generated and published someplace for every buildbot build so that we'd
>> always have the current results for the lore to sphinx conversion for the
>> tip of each branch. I have no idea whether this is actually feasible or
>> practical, but it seemed like it would be useful.
>> OK, *this* sounds like really unnecessary turd-polishing ;-). This
>> builder is an interim step; the more interesting step is the builder that
>> just builds the sphinx docs, in the same way that the current builder
>> builds the lore docs. Furthermore, it seems to be working fine. Build
>> results links that go nowhere are a known problem with buildbot, since it
>> does eventually lose most history, and this type of history takes up a
>> bit of disk space.
> Well, it was mostly motivated by the fact that we were doing a lot of
> linking to build results that would then cease to exist for a while, and it
> really annoyed me. It doesn't seem nearly as "necessary" to me now as it
> once did.
>> 5) Proceed with Sphinx docs being built from lore sources, making tweaks
>> as necessary to lore2sphinx(ng) for as long as it took for the generated
>> docs to be good enough to justify switching to Sphinx entirely.
>> 6) Switch to Sphinx entirely.
>> I really wasn't planning on trying to get people excited about switching
>> to Sphinx again until 1) and 2) were at least "mostly" done (for certain
>> values of done) and I had gone back to finish 3).
>> So. I guess at this point the question is whether to try and go with
>> what's there (lore2sphinx) or finish up the "new stuff" (lore2sphinx-ng +
>> rstgen). I think 3-6 in my above plan need to happen in any case, and I
>> think those will be much easier with lore2sphinx-ng+rstgen.
>> This decision is really determined by time estimates.
>> In any case, work out the sphinx release automation tool first, since we
>> need that regardless of how we switch over
> Got it.
>> IIRC, rstgen has support for most of the vanilla docutils elements, with
>> the notable exception of tables (and maybe definition lists...can't
>> remember whether I finished those). It has a basic level of test coverage (of
>> course you can never have too many tests) for rendering the elements
>> individually, and some test for elements in combination (particularly
>> nested lists). Footnotes and Citations I think also need some work, which
>> I have a plan for, but haven't implemented yet (I don't think).
>> The "new" lore2sphinx CLI tool needs more work, but is relatively
>> straightforward. Like the old tool, it's basically an elementtree
>> processor, except instead of spitting out strings that get joined
>> (which granted was an unholy mess), it generates rstgen elements, which
>> have a .render() method. After processing a Lore document, you should end
>> up with a rstgen.Document object. You call its render() method, which
>> calls its children's render() methods, etc. and it's turtles all the way
>> down.
>> The framework is there for the new CLI tool, it's mostly a matter of
>> writing a bunch of short methods that take elementtree elements as input
>> and return appropriate rstgen objects.
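
The design described above (short handler functions that take elementtree elements and return objects with a render() method, collected under a document object) can be sketched as follows. This is not rstgen's actual API: the classes Paragraph, BulletList and Document, the handler names, and the HANDLERS table are all hypothetical stand-ins, and only two tags are handled.

```python
import xml.etree.ElementTree as ET


class Paragraph:
    def __init__(self, text):
        self.text = text

    def render(self):
        # Collapse runs of whitespace so the source layout does not leak out.
        return " ".join(self.text.split())


class BulletList:
    def __init__(self, items):
        self.items = items

    def render(self):
        return "\n".join("- " + item.render() for item in self.items)


class Document:
    def __init__(self, children):
        self.children = children

    def render(self):
        # Each child renders itself; the document only joins the results.
        return "\n\n".join(child.render() for child in self.children) + "\n"


def handle_p(element):
    return Paragraph("".join(element.itertext()))


def handle_ul(element):
    return BulletList([handle_p(li) for li in element.findall("li")])


# One short handler per tag, looked up by tag name.
HANDLERS = {"p": handle_p, "ul": handle_ul}


def convert(xhtml):
    body = ET.fromstring(xhtml)
    return Document(
        [HANDLERS[child.tag](child) for child in body if child.tag in HANDLERS]
    )


source = "<body><p>Hello\n   world.</p><ul><li>one</li><li>two</li></ul></body>"
print(convert(source).render())
```

Because whitespace decisions live in each element's render() method rather than in string concatenation spread across the converter, spacing problems can be fixed in one place per element type.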
>> Obviously these tools aren't finished, but they produce much better output
>> than the old version of lore2sphinx w.r.t. whitespace handling, paragraph
>> wrapping, etc.
>> Aesthetically, this appeals to me a lot more than going with the current
>> version of lore2sphinx.
> Me too.
>> But it is _not_ a requirement.
> Understood. Though I think it might be a practical requirement, even if it
> isn't a policy requirement. If that makes sense.
>> Some of the code is still pretty messy, but nowhere near the train wreck
>> that the current/old version of lore2sphinx is. By which I mean it _can_
>> be cleaned up, it just hasn't been yet. In particular there are some places
>> in rstgen where the API is (to me at least) obviously awful, but I haven't
>> gotten around to fixing it yet.
>> Please review the code. Please feel free to ask questions if you're
>> Personally, I've gotten over being in a hurry about all this, and I think
>> a robust tool is more likely to succeed in the long run, though finishing
>> it may make the run a bit longer. So I'm for finishing
>> I think a little false urgency might not hurt here :-). I'm not going to
>> work on the tool - just writing these emails probably blew my Twisted
>> development budget for the next two months ;-)
> I can relate... :)
>> - but I will do my best to quickly clear up any procedural
>> what-needs-to-be-done questions unambiguously. Please ping if anything
>> gets you stuck.
> I'll let you know.
> Kevin Horn
Abdul Rauf (haseeb)