[Twisted-Python] resolving release management conflict

Mon Feb 1 13:59:47 MST 2021

Hello Twisted community!

Recently we've had some emotional language around the release process.  On https://github.com/twisted/twisted/pull/1464 <https://github.com/twisted/twisted/pull/1464>, Craig said several times that he found Adi's work and process changes "very aggressive".  Obviously, to have suggested this, Craig would have found some things Adi said to be emotionally loaded as well.

I think we need to resolve this conflict in order to move forward productively and avoid wasting more volunteer energy on negative feelings.  In order to do that, I think we need to centralize the discussion here.

At this point I think we need to all begin with a presumption of good faith.  Adi, Thomas Grainger, Tom Most, Maarten, Povilas and Craig are all trying to address big problems in Twisted.  Nobody's trying to be aggressive, even if it's coming across that way.

(Personally, I should acknowledge that I don't see Adi's actions as aggressive, but I think we have to address Craig's perception that they are.  Part of the presumption of good faith is presuming that everyone has valid reasons for their reactions.)

Here's the status as I understand it:

Adi, Thomas, and Maarten are trying to address release automation issues to make future releases go faster.  They've merged several changes, including https://github.com/twisted/twisted/pull/1464 <https://github.com/twisted/twisted/pull/1464>, to that end.
Craig is trying to get out a release according to the current process, which means addressing https://twistedmatrix.com/trac/ticket/10069 <https://twistedmatrix.com/trac/ticket/10069> as it is a release blocker.  He has a fix for this, but that fix is blocked.

While we've been waiting on #2, the changes in #1 proceed apace.  Craig's also noted several times that I've given permission for him to do the release and so he feels that breaking things mid-way is reneging on a promise from the project.

I believe the proximate cause of the conflict here is that we're dealing with this regression incorrectly.  When a regression is introduced, there's a process for dealing with it, originally documented here: https://twistedmatrix.com/trac/wiki/ReviewProcess#Revertingachange <https://twistedmatrix.com/trac/wiki/ReviewProcess#Revertingachange>. [1]  The principle we need to outline clearly is this: if we have a regression, the change that introduced it should be reverted as soon as possible.  If we can't revert it directly because of changes to trunk in the meanwhile that introduce conflicts, ideally the change that we would make would be the closest possible thing to reverting it, even if that reintroduces a known bug that we have to reopen.

In the case where a lot of work has happened in the interim, and it's less work to just "fix forward" on top of trunk, it's fine to do a small fix as an end run around this process.  The goal of this process is to minimize the work required to restore trunk to a releasable state.

Right now we have three competing fixes which have all been flagged as incorrect in various ways:

from Tom: https://github.com/twisted/twisted/pull/1499 <https://github.com/twisted/twisted/pull/1499>
Craig rejected this one by saying it will break "certain versions" of Python, but I don't understand why it was closed rather than reviewed / reassigned as normal.  (Or for that matter which versions are the problems, or how it breaks them.)
from Adi: https://github.com/twisted/twisted/pull/1501 <https://github.com/twisted/twisted/pull/1501>
Craig rejected this one by saying "please stop working on this" but it also doesn't have any technical reason that this solution is incorrect.
from Craig: https://github.com/twisted/twisted/pull/1502 <https://github.com/twisted/twisted/pull/1502>
This was originally given a passing review by Povilas from the Buildbot project, but twm added an additional blocking review shortly thereafter explaining that it introduced an additional regression.  It seems as though this one is now actively being worked on?

Craig: you closed two of these issues on the basis that you were "already working on a fix", but that's not a valid reason to close a PR.  Or, for that matter, to give it a failing review.  In the future, please comment on PRs that you don't want merged by pointing out their technical insufficiency, so that we can take the opportunity to learn for the benefit of future maintenance.  In this case, the underlying issue is clearly quite nuanced and requires expertise that touches platform differences in Python's core APIs, so having multiple people working on different fixes is totally fine; this effort is not wasted if everybody learns a little bit about the code in question, and it's definitely not wasted if we have to synthesize our approaches in the end anyway because they address different cases.

If you are pressed for time and don't have the ability to enumerate a large number of subtle problems, it's fine to post a changes-required review saying something high-level, such as "this does not address a number of issues that I've taken into account in my fix in #NNNN; please see the code on that PR for more detail, I can answer specific questions later".

All of these fixes have each been outstanding for weeks, which strongly suggests that this is way too hard to fix forward and we should have stopped trying a while ago.  We should fall back to the previous one if in any way possible, then reintroduce the changes which caused it.  If we're really just about to land 1502, then feel free to complete the work there, but in the future we should regard this kind of delay in reverting as a process failure.

If there are too many conflicts to address in a revert, and reverting means we need to revert 3 or 4 tickets which have built on top of that work, then let's just do that and get those re-merged later as well.  It's unfortunate to ship without features but it's better than multi-month release hold-ups.

(Note: a straight revert doesn't require a full code review.  If you can revert a merge and CI is happy with it, feel free to get an administrator to smash the red button to merge immediately.)

I think a secondary problem is that we have not traditionally been very clear about the role of "release manager".  My intent for the role was originally that it last 1-2 weeks at most, and the responsibility is supposed to begin just a bit before the first prerelease is uploaded, and end when the final release is completed and the release announcement is written.

Given that it's the release manager's responsibility to ensure no regressions are present in the release, this can come along with a little bit of soft power where we should all promptly honor the current RM's request to revert things or to prioritize review of fixes for release blockers which might not be straightforward code regressions and therefore can't be fixed by a revert (for example, 3rd party services like readthedocs or github actions breaking our integrations or CI due to no change of our own).  However the RM has no special authority to circumvent processes, land code, prevent or code from landing, and should ideally never be making changes to the process in-line with the manual process of making the release.

The release manager really isn't supposed to do a bunch of development on trunk.  This has been confounded in the past by the fact that in the past Amber has been both a prolific contributor and the release manager, often at the same time; and, as a way to make the manual process easier, she often would do a bunch of automation work right before a relase.  But the role, inasmuch as it exists, ought to be about executing whatever manual steps remain after we've automated as much as we can.  Similarly, Amber's faithful execution of this role for years was a great benefit to everyone working on the project, but it also obscured that the RM is supposed to be a hat that one person wears for one to two weeks.  My personal feeling (it certainly would overstate this to call it a "process") is that if someone takes on the release manager role and can't get a prerelease out within a week, then they should step down and we should have a bigger community discussion about why we're stuck. 

My follow-up questions are:

Can we revert the change that introduced 10069 so that we can follow the process to re-introduce it after a release has been made, and have the conversation about which PR to accept and why post-release?
Do we need to revert the release process changes from https://github.com/twisted/twisted/pull/1464 <https://github.com/twisted/twisted/pull/1464> for this release?  If so, what is the specific technical need this addresses?
Is there some problem with the release process changes that ought to make this revert more permanent? They look really good to me, but I don't want to discount that possibility either.  A reviewer can always request that code be pulled back out of trunk if there's a problem that was not forseen during the review process?

-g

[1]: I'm not sure if this has been propagated forward to our newer process docs in git; it would be good to delete most of this doc and replace it with a bunch of links so we don't have duplicate and outdated docs.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: </pipermail/twisted-python/attachments/20210201/06a7e396/attachment-0001.htm>