[Twisted-Python] The Twisted 14.0 Release Pre-Post-Mortem, and Where To From Here

HawkOwl hawkowl at atleastfornow.net
Wed May 7 16:08:10 MDT 2014


On 8 May 2014, at 5:40, Glyph Lefkowitz <glyph at twistedmatrix.com> wrote:

> 
> On May 7, 2014, at 7:07 AM, HawkOwl <hawkowl at atleastfornow.net> wrote:
> 
>> Hi everyone,
> 
> Hi HawkOwl,
> 
>> I’m sure that some of you have been following the past seven or so weeks of Twisted 14.0 release shenanigans, and this email hopes to explain what went wrong,
> 
> Given that there does not appear to be a 14.0 final, shouldn't this be "what is still going wrong"?  This is more like a death rattle, not a post mortem ;-).

Pre-post-mortem! :)

> 
>> what we can do better next time, and where we can go from here.
> 
> Thanks so much for doing this.  I'm sorry the 14.0 release process has been a tough one, and that its toughness has been partially my fault.
> 
> However, I'm glad that this has provoked some reflection and discussion.  The fact that you've done such a thorough analysis almost makes a challenging release cycle worth it :).
> 
>> Problem 1: Twisted 14.0.0pre1 had a regression. This was not noticed in the prerelease stage because the ticket was not marked as a regression, and the RM's check at that stage looks for open regressions on the milestone.
> 
> When you say it was "not noticed in the prerelease stage", do you just mean it didn't show up before the pre-release was made?
> 
> Also, in the future, can you always include specific links to the tickets involved in the problems encountered?  I'm not exactly sure which regressions we're talking about in pre1.

This regression was https://twistedmatrix.com/trac/ticket/6926 - i.e. that all our docs would be wrong.

> 
>> What we can do better next time: Tickets that are regressions need to be marked as regressions and applied to the release milestone. If you think it might be a regression - even slightly - mark it as such, and comment that you are not sure. It’s easier to find the ticket later and decide it is not actually a regression than have to abort a release because it’s come up after a prerelease.
> 
> At the same time, I feel like I should stress that this, by itself, was not a huge problem.  Specifically, rolling a second pre-release is okay.  It's a bit unfortunate that the regression was not tagged in advance of the release, but discovering issues and fixing them is exactly what the pre-release process is for.
> 
>> Problem 2: The fix for the regression was not merged into pre1; instead, the release was rerolled from trunk. This meant some pyOpenSSL and TLS improvements got into the 14.0 release from pre2 onwards, but they also introduced new regressions.
>> What we can do better next time: Do not reroll from trunk to get bug fixes - merge them into the release branch. 
> 
> Another problem here, that I can take full blame for, was that the communication involved was fragmented and not terribly consistent.  HawkOwl would ask a question on IRC, I would give an answer, then a couple of hours later someone else would give an apparently contradictory answer to a follow-up question.  I don't think that we were actually disagreeing all that much, but at a number of points, it became a game of telephone.  Also, I'd sometimes ask a question about the release process, and someone would tell me something they thought HawkOwl had said or a guess as to what might come next, which I took to be the actual plan.
> 
> Particularly, I was very confused at various points as to whether the next prerelease was going to have things backported, which things were going to be backported, or whether we were re-rolling from trunk.  I think that, similarly, HawkOwl was very confused as to what I _wanted_ to happen.
> 
> In the future, when we're communicating about the release process, we should probably try harder than usual to have all the discussion in a persistent forum so that it's obvious where the state of things is.  Maybe that means the mailing list, maybe the release ticket, but IRC has proven to be a particularly inappropriate and unreliable channel for this kind of discussion.
> 
> If we _do_ have a discussion on IRC, then following the precedent that some more responsible members of the community have set, by copying a summary or trimmed transcript of the relevant conclusions into the ticket or to the list, should be a requirement.
> 
> To get a head start on this, I have put a link to this very discussion on the ticket. <https://twistedmatrix.com/trac/ticket/7039#comment:23>
> 
> And a final point on communication: on release branches, sensible commit messages are particularly important.  On most branches, individual commit messages can be a bit less than helpful because they're eventually all bundled up into a squash commit (hopefully one day a proper merge commit) with its own useful commit message.  That commit message can fill in any gaps left by unhelpful individual commits.
> 
> On release branches, however, every individual commit has release implications, so explaining why things are being done is extra important.  For example, this sequence of events is confusing: <https://twistedmatrix.com/trac/changeset/42616> <https://twistedmatrix.com/trac/changeset/42617>.  Which merge is being reverted?  (I can kinda guess it's the immediately preceding commit, but...) Did a build fail or something?  Which build?  Were some commits merged incorrectly?  Not hypothetical questions, by the way, I am seriously wondering what happened there :-).

That was me screwing up the merge of 7097 - which was causing conflicts and all sorts of weirdness.

> 
>> Problem 3: The fixes for the regressions were finished after some delay, since the fixes had to be written and reviewed. This introduced delays into the 14.0 release cycle.
>> What we can do better next time: Rather than fix regressions introduced, the ticket that introduced them should be reverted.
> 
> Yup.
> 
>> Problem 4: The fixes for the regressions did not merge cleanly with the release branch. Some 35+ tickets were merged between pre1 and the release of the regression fix into trunk.
> 
> The fact that PyCon was happening at the same time definitely did not help.  For what it's worth, I _really_ tried as hard as I could to finish that stuff before the sprints.  But 14.0 probably should have just come out before then anyway :-).
> 
>> What we can do better next time: Bug fixes should be based off the release branch, not trunk. This reduces the likelihood of code churn or unknown dependencies causing problems during the merge.
> 
> This was one of the aforementioned problems with communication.
> 
>> Problem 5: There was mixed communication whether one of the regression fixes was to be introduced in 14.0 or in a bug fix release (14.0.1).
>> What we can do better: If a fix is intended for merging into a prerelease, it should be raised on the mailing list, so that there is more visibility for its intentions.
> 
> There should probably also be a comment on the release ticket.
> 
>> Problem 6: I personally made several mistakes along the way - from screwing up svn merges to interpreting the “abort the release and incorporate the bugfix” to apply the initial regression fix. Since the TLS changes were topical, I decided that having them out ASAP would be better than not.
> 
> Again: communication, communication, communication.  I didn't know about any screwed-up SVN merges and wasn't super clear on when releases were aborted.  I would have tried to help more if I knew about the issues with the release branch as they were occurring.

The merge problems were why we have four 14.0 release branches, remember? :)

> 
>> What we can do better: Improved docs/automation to reduce the margin for RM error, and better automation so that making a new release to get important features out is really easy.
> 
> The release process _is_ getting easier and easier, but sometimes we still act like it's really hard and thereby introduce additional complexity and difficulties.
> 
>> These are the major problems which I have identified - I’m sure there’s plenty more, and I would like people to list them if I have not - even if they make me look like an idiot ;). We can learn from it, I’m sure.
>> 
>> So, this leaves where to from now. I see a few options, with my estimates for work and risk that it’ll explode:
>> 
>> 1 - Most work, high risk - Work on making the regression fixes merge cleanly with 14.0.0pre5. This is a big-ish task with room for error, since there was some underlying code churn.
> 
> Just to be clear, "the regression" that we're talking about is <https://twistedmatrix.com/trac/ticket/7097>, right?

Yes.

> 
>> 2 - Some work, medium risk - Release 14.0.0pre5 as 14.0 final,
> 
> I would most prefer this option.  Embarrassing as the errors in the message fixed by 7097 are, I think it's acceptable to say that this is not a particularly meaningful regression.  For me personally it stretches the definition of "regression" a little bit, because it's information about new functionality, not a change or break in old functionality.  And emitting a new warning is (pretty much by definition) never a "regression" because part of our compatibility policy contract is that your code has to be tolerant to warnings being emitted.
> 
> To be fair, it stretches the definition, but it still technically adheres to it.  Importing twisted's TLS support without service_identity installed is a supported thing, it used to do something "correct", it's moved to do something "incorrect" because there is incorrect text emitted.  Still, if I had to classify it without input from anyone else I'd probably call it a "new bug".
> 
> Critically, users' applications won't be broken by this.  They'll see some ugly or possibly incorrect text which will be fixed in an update which will hopefully follow on pretty quickly.  Not to mention that there's an easy fix for this by installing the relevant dependency.

Now that I’ve slept on it, I’m thinking #2 might actually be the best way forward.

> 
>> and I (or another RM if I’m no longer trusted ;) )
> 
> Honestly, at this point, I trust you a bit more with the release process.  Up until this point, you've had only easy successes, which (as you can see!) is a little dangerous ;-).  An experience of a failure that you have clearly articulated the reasons for strikes me as a very useful skill-building exercise.
> 

Hopefully a skill I won’t have to use again, but… ;)

>> initiate the 14.1 release immediately.
> 
> More releases are always better!
> 

True!

>> 3 - Least work, highish risk - Scrap 14.0 and begin the 14.1 release immediately, since 14.0 tags become 14.1 tags and we just have to hope that there are no regressions in the 39 tickets fixed between pre1 and now. This may introduce issues (since 14.0 is an un-release, and there are questions about what this does to our deprecation windows).
> 
> I think that trying to cram in more features to 14.0 got us into a mess in the first place, so throwing our hands up at this point and trying to shepherd 39 _more_ features into this release, potentially delaying things even longer, does not strike me as a good idea.
> 
>> If I am to be honest, I much prefer option #3, but I would like opinions from other developers, before I go causing more problems than I already have :)
> 
> I can see why #3 is tempting, but trunk has got a lot of churn on it right now and I'm relieved we didn't attempt to re-roll post-PyCon despite the merge difficulties.
> 
> More than I'd prefer option 2 though, I'd prefer that everyone interested weigh in and we make a decision quickly so that the release process doesn't drag on further; I should reiterate that I still trust our glorious release manager HawkOwl to make this decision and be responsible for it, so I'm providing input but I'm not giving any orders here.

Agreed.

I’m going to give this another work day for people to weigh in on. Otherwise, I will go with option #2, get pre5-as-14.0 out the door, cut a 14.1 prerelease, and get that ball rolling. Now that I’ve had some rest between worrying about how much I’ve screwed up the release, that seems like the best way forward :)

But for now, I’m off to play Ingress in the rain before work! :)

- hawkie

> 
> -glyph
> 
