Opened 10 years ago

Last modified 9 years ago

#4138 enhancement new

A fresh Twisted checkout should support "setup.py sdist"

Reported by: TimAllen Owned by: Screwtape
Priority: normal Milestone:
Component: core Keywords:
Cc: Jean-Paul Calderone, radix, Thijs Triemstra, tarek, khorn Branch: branches/sdist-support-4138-3
branch-diff, diff-cov, branch-cov, buildbot
Author: TimAllen, glyph

Description

This ticket is a stepping-stone towards #1696, but this is an independent, self-contained sub-goal that I expect will involve lots of discussion and exploration of alternatives, so I thought I'd file a new ticket.

Steps to reproduce:

  1. Download the official Twisted 9.0.0 source tarball
  2. Run the following commands:
    tar tjf Twisted-9.0.0.tar.bz2 | sort > /tmp/official-tarball-files
    tar xjf Twisted-9.0.0.tar.bz2
    cd Twisted-9.0.0
    ./setup.py sdist
    tar tzf dist/Twisted-9.0.0.tar.gz | sort > /tmp/sdist-tarball-files
    
  3. Compare /tmp/official-tarball-files and /tmp/sdist-tarball-files with your favourite diff tool.

Expected results:

  • File lists should be identical, at least as far as executable code and tests go.

Actual results:

  • The sdist tarball does not include Pyrex source files for Twisted's extension modules, or various test-data files required by test suites, which rather cripples the resulting tarball.
  • The sdist tarball also does not include documentation, but I'm not so fussed about that (see "as far as executable code and tests" above).

Why this is a problem:

  • People (including but not limited to me) want to build RPMs of Twisted so they can depend on newer versions of Twisted than are shipped with their production platform (say, CentOS or RHEL).
  • Building RPMs with bdist_rpm is superior to building RPMs with admin/twisted.spec because:
    • Nearly every Python project that uses distutils can be built as an RPM this way, reducing documentation and training needs.
    • For those whose production systems depend on changes to Twisted that will be pushed upstream, running bdist_rpm from the branch is much, much simpler than writing a build-system to download official tarballs and bundle a patch-set and custom twisted.spec.
  • Working sdist is a precondition for working bdist_rpm.

Attachments (4)

sdist-support-4138.2.patch (140.1 KB) - added by TimAllen 9 years ago.
Address Glyph's review comments.
sdist-support-4138.3.patch (141.7 KB) - added by TimAllen 9 years ago.
New patch against current Twisted trunk, without regressions.
sdist-support-4138-2.patch (2.0 KB) - added by TimAllen 9 years ago.
Address review comments; patch against latest branch
sdist-support-4138.4.patch (141.0 KB) - added by TimAllen 9 years ago.
New patch against current Twisted trunk.

Download all attachments as: .zip

Change History (56)

comment:1 Changed 10 years ago by TimAllen

The traditional solution for messing with the files sdist includes is by using a MANIFEST or MANIFEST.in file, however in #1696 I mentioned an IRC conversation with radix in which he mentioned he would argue against any patch that duplicated the what-files-go-where logic in the existing Twisted release scripts. Thus, I have a proof-of-concept patch for hooking sdist up to the innards of those release scripts.

The changes are all glommed together into one patch, so here's an overview of what's changed:

  • In previous code-reviews, people have pointed out that Twisted's code style requires three blank lines between top-level constructs and two blank lines between methods of a class. Some of the files I've touched didn't conform to this, so I fixed up whatever whitespace problems I found in those files.
  • The global variable twisted_subprojects has moved from t.p.dist (where it wasn't used) to t.p._release where it's used extensively. A later change makes t.p.dist import from t.p._release and moving this variable across prevents a cyclic import dependency.
  • t.p.dist.get_setup_args() now adds its custom cmdclass values to the ones supplied by the setup.py file, so a setup.py can redefine one without having to redefine them all.
  • t.p._release.DistributionBuilder.buildTwisted() previously walked the Twisted directory tree and added things to a tarball. I've broken out the 'walk the directory tree' code into a new method iterateTwistedPaths(), so that other things that want to get the exact list of files to package can do so without having to build a tarball.
  • t.p.dist now defines an sdist_twisted command that builds its list of files by calling iterateTwistedPaths() instead of reading a MANIFEST or MANIFEST.in. The root setup.py uses this implementation instead of the distutils default sdist.

I've included unit-tests for most of the code changes, but there are no tests for t.p.dist.sdist_twisted because it depends on knowing the path to the root of the Twisted checkout, and I'm not sure how to find that reliably. I'd pass it in, except that I can't modify the class interface because it must conform to what distutils expects.

Now that I think about it, perhaps I could include an optional parameter to the constructor that distutils wouldn't call, but which I could call from the test suite...

Anyway, this patch is more to provoke ideas and discussion than seriously for review.

comment:2 Changed 10 years ago by TimAllen

I forgot to mention: After the attached patch is applied, the differences between an official tarball and an sdist tarball are reduced to:

  • sdist tarball includes entries for directories, not just files
  • sdist tarball includes documentation with a .xhtml extension, not .html
  • sdist tarball includes man pages in troff but not in HTML.
  • sdist tarball includes some .pyc files (not a problem for release tarballs because the checkout that's packaged is not the checkout where the release-script is run).

comment:3 Changed 10 years ago by Glyph

Thanks for this wonderfully detailed description.

comment:4 Changed 10 years ago by Jean-Paul Calderone

Cc: Jean-Paul Calderone added

What if setup.py sdist just invoked (probably some sub-set of) the code which is currently invoked by bin/admin/build-tarballs? In particular, it seems like calling DistributionBuilder.buildTwisted is almost exactly the right thing to do.

It seems like having two ways to build a source distribution tarball which produce different results is a good way to get into trouble.

Is the tarball produced by buildTwisted not suitable as a result of setup.py sdist?

comment:5 Changed 10 years ago by TimAllen

If you'd asked me that a week ago, I might have thought it impossible. Having learned a bunch about distutils innards in the intervening time, I thought I'd have a go.

There's one major difference between the base assumptions of buildTwisted and the base assumptions of distutils' sdist: buildTwisted assumes it will be packaging a pristine, temporary SVN checkout, and so it defaults to 'include everything apart from this blacklist'. sdist assumes it will be packaging the package it's running from, and so it defaults to 'exclude everything apart from this whitelist'.

However, I've managed to bodge together an even hackier (i.e. no tests) proof-of-concept that includes properly-built documentation and manpages. It doesn't just call buildTwisted directly, because that would neuter sdist's rather useful --formats option.

The changes between an official 9.0.0 tarball and the one produced by sdist with this new patch include:

  • Official tarballs are always .tar.bz2, sdist defaults to .tar.gz but can be configured to build any combination of .tar, .tar.Z, .tar.gz, .tar.bz2, .zip.
  • sdist includes a Twisted-9.0.0/PKG-INFO file, which contains the setup.py metadata in some RFC822-based format. It's explicitly included by the standard sdist implementation, so I felt I should include it too.
  • sdist has dropped the executable bit from various files; if there's a better tool than t.p.filepath.FilePath.copyTo() I'd like to hear about it.
  • sdist includes 52 .pyc files and any Vim swap files, etc. that are lying around.
  • sdist includes entries for Twisted-9.0.0/, Twisted-9.0.0/bin/, Twisted-9.0.0/doc/, Twisted-9.0.0/twisted/, Twisted-9.0.0/twisted/plugins/.
  • The official tarballs include Twisted-9.0.0/twisted/trial/test/directory/, which is detritus from the test suite (see #3101).
  • There are strange differences in the generated documentation, but they don't seem to be significant. For example:
    --- a/tmp/official-twisted/Twisted-9.0.0/doc/core/development/policy/coding-standard.html
    +++ b/tmp/patched-sdist/Twisted-9.0.0/doc/core/development/policy/coding-standard.html
    @@ -114,9 +114,9 @@ $ bin/trial twisted/mail/pop3.py
     
     <pre class="python"><p class="py-linenumber">1
     2
    -</p><span class="py-src-comment"># Copyright (c) 2009 Twisted Matrix Laboratories.</span>
    -<span class="py-src-comment"># See LICENSE for details.</span>
    -</pre>
    +</p><span class="py-src-comment"># Copyright (c) 2009 Twisted Matrix Laboratories.
    +</span><span class="py-src-comment"># See LICENSE for details.
    +</span></pre>
     
         <p>When you update existing files, make sure the year in the copyright
         header is up to date as well. You should add a new copyright header when
    

This is slightly less efficient than both the existing sdist and existing Twisted distribution builders:

  • Unlike sdist, this code never takes advantage of hard-links to speed up building the directory tree to be archived.
  • Unlike the previous incarnation of buildTwisted, this code does not create a tarball directly, it just creates a full directory tree with the desired contents and archives it.

comment:6 Changed 10 years ago by TimAllen

Cc: radix added
Keywords: review added
Owner: Glyph deleted

I still don't expect this to be ready for merging, but I would like feedback on whether there might be better approaches for integrating with Twisted's distribution-building code, for copying files and preserving permissions, and why the Lore docs are reformatted.

Also, Cc'ing radix because this patch messes rather intimately with his release-making code.

comment:7 Changed 10 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: set to TimAllen

As you say, it's not done, but I do like the direction the new patch is taking. We should be careful around the overridden sdist methods, be sure to test that functionality well, document whatever assumptions we're making, etc, and maybe get Tarek Ziade's opinion on it, but I think it could work.

As far as the Lore reformatting goes, what were you comparing? Lore uses xml.dom.minidom now; could differences in the version of that used be introducing these differences?

Thanks for your work on this!

comment:8 Changed 10 years ago by TimAllen

Status: newassigned

I've spent an hour or two tinkering with code trying to make the test suite set the executable bit on files in its test directory structures, and then check the executable bit packaging, but it's pretty maddening; I fear I shall have to add a "copyStat" parameter to FilePath.copyTo() that makes it call shutil.copystat() on each file it copies.

As for the Lore changes, my code is directly based on the 9.0.0 release; I assume radix built the 9.0.0 tarball using the version of Lore from the 9.0.0 release, as did I - but I haven't changed anything Lore related (as far as I know) and I get the subtly different output I pasted above. On the other hand, I've been doing my testing with Python 2.4, and I guess radix is using 2.5 or 2.6 - if minidom has changed in some way, I guess that might explain it.

comment:9 Changed 10 years ago by Jean-Paul Calderone

I fear I shall have to add a "copyStat" parameter to FilePath.copyTo() that makes it call shutil.copystat() on each file it copies.

Also, don't be afraid to add a private helper to dist.py or _release.py or wherever that does exactly what you want without any concerns for generality.

On the other hand, I've been doing my testing with Python 2.4, and I guess radix is using 2.5 or 2.6 - if minidom has changed in some way, I guess that might explain it.

Yea. What if you generate the 9.0.0 docs yourself w/o your changes and compare that to your generated docs w/ your changes (using the same version of Python each time, of course)?

comment:10 Changed 10 years ago by TimAllen

Another checkpoint - the newly attached patch adds t.p._release._stageFile that largely replicates what distutils code did - it hard-links files (if it can) or copies them with metadata (otherwise). This means all the files that should be executable in the resulting tarball, are.

Things I still intend to do before review:

  • filtering out .pyc and perhaps .pyo files. Black-listing particular file extensions is a slippery slope (how many editor-backup-file extensions are there?), but I've already discovered (from #1696) that Twisted has too many random files to make white-listing feasible.
  • Try and build an official Twisted tarball on my workstation and compare the Lore output.

Things I will do if asked, or for bonus points:

  • Track down Tarek Ziade, try and explain what my code does, see if he has an opinion.
  • Port the other distribution-building methods to use the same structure as buildTwisted - no performance or compatibility benefit, just consistency within the release code.

About generating the 9.0.0 docs myself on the trunk: it seems the bin/admin/build-tarballs script requires an SVN checkout, and I've been using git-svn all this time. I'll try an SVN checkout and try to get it working later, but not today.

comment:11 Changed 10 years ago by Thijs Triemstra

Keywords: review added
Owner: TimAllen deleted
Status: assignednew

Putting it in the review queue.

comment:12 Changed 10 years ago by TimAllen

I used SVN to check out today's trunk, and ran the bin/admin/build-tarballs script to produce Twisted-9.0.0.tar.bz2. Then, I rebased my changes onto the trunk and used sdist to produce Twisted-9.0.0.tar.gz. I untarred them both side-by-side, removed the .pyc files from the sdist tree, and compared the two.

Results: The only differences were the changes in my patch. The Lore differences I observed must be a Python/minidom thing, as exarkun surmised.

comment:13 Changed 10 years ago by TimAllen

OK, the latest attachment adds a quick hack (+ tests, naturally) to make _stageFile() quietly ignore any attempts to stage .pyc or .pyo files. With that, the only differences between the sdist tarball and one produced by the official Twisted script are:

  • sdist packages include a PKG-INFO file in the root directory. Apparently this is a distutils thing, and I am loathe to change it.
  • Official tarballs include twisted/trial/test/directory/ due to #3101.
  • sdist packages include entries for various directories, which shouldn't affect anything.
  • sdist packages default to .tar.gz although you can build .tar.bz2 as well.
  • sdist packages built with this patch *include* this patch, tarballs built from the trunk do not.

And with that, I believe this patch is ready for the usual in-depth review. Lay on, MacDuff! ;)

comment:14 Changed 10 years ago by radix

Keywords: review removed
Owner: set to Screwtape

Wow, I'm happy that you were able to do this without doing anything with private distutils APIs. Unless I missed something...

+        _, extension = src.splitext()

I prefer "extension = src.splitext()[1]". In fact, since the variable is only used once, you could just in-line the expression in the conditional on the following line.

  1. Can you compare to errno.EXDEV instead of the literal number 18?
  1. I think it's worth being explicit in _stageFile's docstring that hard links will be made if possible.
-        if sys.version_info[:3] < (2, 3, 0):
-            kw['cmdclass']['build_py'] = build_py_twisted
+    if sys.version_info[:3] < (2, 3, 0):
+        defaultCmdClasses['build_py'] = build_py_twisted

This code can probably be deleted, since we don't even support Python 2.3 any more, let alone anything less than that.

  1. When I run "python setup.py sdist" from an SVN checkout, the resulting tarball contains .svn directories. These should be filtered out.

comment:15 Changed 10 years ago by TimAllen

Keywords: review added
Owner: changed from Screwtape to TimAllen

I don't think I've used any private distutils stuff; I've messed with undocumented things, but then almost everything in distutils is undocumented.

I've attached a new version of the patch with the following changes:

  1. Added a module constant to t.p._release.py named EXTENSION_BLACKLIST, containing the extensions of bytecode files and of various VCS tools that people might use with Twisted.
  2. Updated _stageFile to ignore files or directories with extensions in EXTENSION_BLACKLIST.
  3. Updated _stageFile's docstring, as requested.
  4. Merged the splitext() line into the if statement, as requested.
  5. I have now discovered the errno module in the standard library, thank you. :)
  6. I've removed the build_py_twisted class from t.p.dist and the various places it was used.

Back to review!

comment:16 Changed 10 years ago by Thijs Triemstra

Cc: Thijs Triemstra added

Is #3912 a duplicate of this one?

comment:17 Changed 10 years ago by Jean-Paul Calderone

Hard to say yet. However, even if it turns out to be, the main blocking issue for #3912 is test coverage, which probably involves some buildbot configuration. So we shouldn't close #3912 until it's CIS-tested somehow.

comment:18 Changed 10 years ago by TimAllen

I would say that #3912 is not a duplicate of this ticket, but merging this branch would partially fix #3912 by happy accident. A proper fix for #3912 would probably include doing the same fix to all the subproject setup.py files too.

comment:19 Changed 10 years ago by TimAllen

Owner: TimAllen deleted

comment:20 Changed 10 years ago by Thijs Triemstra

Could you cover a fix for #3912 in this ticket so we can use that other ticket for the subproject's? Because you say #3912 would be partially fixed by this ticket already.. So a unit test for the main setup.py version nr issue here would be nice because doesn't tap2rpm kinda depend on a consistent version nr anyway? Or is this ticket even blocked by #3912..

comment:21 Changed 10 years ago by Thijs Triemstra

Cc: tarek added

comment:22 Changed 10 years ago by TimAllen

I'd rather not absorb #3912 here, honestly. I don't know that this ticket would fix #3912, I just know that it does the same kind of "override sdist" trick that the patch in #3912 does. I took a look at setuptools' patched sdist and I can't see any evidence that it messes with the version string at all, so I don't understand how the patch in #3912 could possibly work.

Digging up enough information on setuptools to find out what *would* work is a bit far removed from what the rest of the work in this branch, and I think really belongs as a separate ticket.

comment:23 Changed 10 years ago by Jean-Paul Calderone

I'd rather not absorb #3912 here, honestly.

That's sufficient reason for me. :)

comment:24 Changed 10 years ago by Glyph

Keywords: review removed
Owner: set to TimAllen

Thanks again for plugging away on these gnarly edge-cases!

  1. I don't really like how EXTENSION_BLACKLIST works. For example, why .git and .bzr, but not .hg? Why not .DS_Store? Why not .nautilus_preview_media_turd or whatever that's going to be called in the next version? For one thing, I think that all hidden files should be ignored, but more importantly I think that the entry point to the should-this-file-be-included logic should be in a fucnction. "if src.splitext()[1] in EXTENSION_BLACKLIST" is repeated twice, and that's just on the border of an acceptable amount and complexity of code to repeat :).
  2. The blacklisted-ness should be checked before trying to stage the file. If, for example, you have an Emacs buffer open, that will create a dead symbolic link as a lock file. Those dead symbolic links are hidden, but if you check to see if they're files or directories first, you'll blow up with "can only stage files or directories" despite the fact that it should have been ignored in the first place.
  3. sdist_twisted's name violates the coding standard, I think. Does it need to have an underscore in it for some reason? If not, SDistTwisted would be the right thing to call it. The other stuff in that file also seems to be violating the coding standard but I can't see that it needs to; new stuff should be done right.
  4. Should sdist_twisted actually be public? It seems like it might be generally useful (for the Divmod projects, at least) but maybe it would be better if it began its life in _release? I could be wrong about this: I'm not really clear on the intended division of labor in these modules. But I'd err on the side of making it private.
  5. It also needs a better docstring. What is 'our way'?
  6. get_file_list and make_release_tree look like they're overriding functions from sdist.sdist, but you should probably mention that in order to avoid getting flack from another review about coding-standard compliance on their names :).
  7. We're trying to promote a more assertive style in test docstrings. For example, instead of "_stageFile should duplicate the content and metadata of a single file", "_stageFile duplicates the content and metadata of a single file." (Also: which file?)
  8. Wow. Hard links, really? This kind of stuff should be in FilePath. Especially since you're duplicating the EXDEV-handling logic that moveTo has, and you're accessing '.path' a lot. I think I'd start off with a simple .copyTo, and have a separate ticket for optimizing it with hard links when that's possible. Moreover: your EXDEV handling codepath is untested, which makes this even dodgier.
  9. In test_release, "if "/bin" in child.path" almost certainly wants to be "if "bin" in child.segmentsFrom(root):". I know these tests are UNIX-only so we don't need to worry about the literal slash, but if your Twisted checkout happens to be in a directory called "/media/bingo" you are likely to see some bad results from that test. Moral of the story: use FilePath, it will save you from yourself.
  10. makeAPIBaseURL needs a @param and a @return and maybe some @rtypes. It's especially important to be clear about that stuff when you're passing around strings (which could be text or bytes) that represent something else (and could therefore be URLPaths or FilePaths or whatever).

Otherwise I think it's looking pretty good.

comment:25 in reply to:  24 Changed 10 years ago by TimAllen

Replying to glyph:

  1. makeAPIBaseURL needs a @param and a @return and maybe some @rtypes.

Fixed.

  1. sdist_twisted's name violates the coding standard, I think.

Yeah, I copied the name from the surrounding examples; I've changed it to SDistTwisted.

  1. Should sdist_twisted actually be public?

I can see an argument for SDistTwisted being private, since it's not really useful for any other project (or anything outside Twisted's SVN tree), but that argument would probably apply to a bunch of other things in t.p.dist like get_setup_args(). If there is a division between dist and _release, I'd say it's that dist contains modifications to distutils, and _release contains Twisted's home-grown release system, so SDistTwisted should stay in dist. I wouldn't regard either of them as public modules, though.

  1. It also needs a better docstring. What is 'our way'?
  1. get_file_list and make_release_tree look like they're overriding functions from sdist.sdist
  1. We're trying to promote a more assertive style in test docstrings.

I've updated my various test docstrings accordingly.

  1. For one thing, I think that all hidden files should be ignored, but more importantly I think that the entry point to the should-this-file-be-included logic should be in a function.

I have just discovered an almost identical system in t.p.dist in the EXCLUDE_NAMES and EXCLUDE_PATTERNS globals, which exclude .svn and CVS and {arch} and _darcs but not .git or .bzr or .hg. I think some centralisation is in order.

I've removed t.p.dist._filterNames and its associated blacklists and combined them with the EXTENSION_BLACKLIST stuff into a new method called t.p._release.isDistributable(), which is a lot simpler. So far as I can tell, the only user of _filterNames was t.p.dist.getDataFiles(), which now has a fairly comprehensive test suite and has been rewritten to use isDistributable().

The related function t.p.dist.relativeTo() is a shambling horror that is required to make getDataFiles() do its thing. We should probably get radix to check whether relativeTo()'s... eccentric behaviour is actually required, and remove it if we can. At the very least, I've updated the docstring to show what it *does* do, rather the tragically misleading content it had before.

  1. The blacklisted-ness should be checked before trying to stage the file.

Fixed.

  1. Wow. Hard links, really? This kind of stuff should be in FilePath. Especially since you're duplicating the EXDEV-handling logic that moveTo has, and you're accessing '.path' a lot. I think I'd start off with a simple .copyTo, and have a separate ticket for optimizing it with hard links when that's possible.

Although it was a while ago now, I seem to recall a conversation with exarkun on #twisted where he recommended writing a custom, specific function rather than trying to fit something into FilePath. In particular, a .copyTo method could not be optimized to use hard-links (because copies should be independant) nor could a hypothetical .linkTo method fall back to copying (links should be... linked). Therefore, I'd need a custom function for my quirky use-case. Since I need a custom function anyway, I might as well make my custom function do the whole operation (copying content, copying metadata, hard-linking where possible).

Moreover: your EXDEV handling codepath is untested, which makes this even dodgier.

I've added a test for EXDEV handling.

Also, during testing I noticed that trying to stage one file on top of an existing file raised different errors depending on whether it used hard-links or copying, so I've taught _stageFile to paper over that difference.

  1. In test_release, "if "/bin" in child.path" almost certainly wants to be "if "bin" in child.segmentsFrom(root):". I know these tests are UNIX-only so we don't need to worry about the literal slash, but if your Twisted checkout happens to be in a directory called "/media/bingo" you are likely to see some bad results from that test. Moral of the story: use FilePath, it will save you from yourself.

You're right; if I hadn't had FilePath I probably should have split on os.sep or something. It turns out that because createStructure() is recursive, "child.segmentsFrom(root)" didn't do exactly what I wanted, but I changed things about a bit to make it work.

comment:26 Changed 10 years ago by TimAllen

Keywords: review added
Owner: TimAllen deleted

comment:27 Changed 10 years ago by Jean-Paul Calderone

Author: exarkun
Branch: branches/sdist-support-4138

(In [27953]) Branching to 'sdist-support-4138'

comment:28 Changed 10 years ago by Jean-Paul Calderone

Author: exarkunTimAllen
Keywords: review removed
Owner: set to TimAllen

So, there's one big issue that needs to be resolved here. It's about which way the various release-related module dependencies go, and what platforms they're all supported on. Currently, in trunk, twisted.python._release claims to only work on Linux. Its test suite only runs on Linux (and if it weren't skipped on other platforms, some of its tests would fail on at least a one of them). So, obviously this makes it problematic for setup.py to import things from twisted.python._release, either directly or indirectly via twisted.python.dist. There seem to be a few reasons the dependency goes in the direction it does:

  1. DistributionBuilder - Basically the core of SDistTwisted, and implemented in _release with all the other builders (and dependent itself on ManBuilder and DocBuilder). ManBuilder and DocBuilder may actually work on Windows (though I'm not positive) and I suspect they work fine on non-Linux POSIXs.
  2. makeAPIBaseURL and isDistributable - nothing really complicated here

The solution here may just be to make the skips in the tests for twisted.python._release more specific so that the parts of that module which need to be cross-platform are actually be tested across multiple platforms. I did really like the idea of having the entire module be Linux-only, though. It meant it was really easy to tell if some code that needed to be cross platform actually would be.

Some other minor points which I've taken care of (r27955):

  1. The multi-line import in setup.py could use () instead of \ now (yay, Python 2.4) and there's a stray period at the end of the second line
  2. There's some trailing whitespace near the data_files argument in setup.py
  3. The cmdclass parameter is still defined in terms of sdist_twisted instead of SDistTwisted.
  4. twisted/python/dist.py has a multi-line import that could change too.
  5. assertEquals yay, assertEqual boo. fail* boo!
  6. Some uses of path.open("r").read() (in test_release.py) would be better as path.getContent() (which will make sure the file is closed, too).

Some other small-ish issues that I didn't want to do myself:

  1. The NotImplementedError at the end of _stageFile is untested; neither is the non-EXDEV error case for the copyfunc call.
  2. twisted.python.dist.getDataFiles is now iterating over a list as it removes things from that list. Surely this will cause some problem.
  3. Still a "/bin" in ... in test_release.py
  4. The makeAPIBaseURL test could verify that the result is a str, or that it can be parsed as an URL or something like that.

comment:29 Changed 10 years ago by Jean-Paul Calderone

Also, twisted.python.test.test_dist.GetDataFilesTests.test_directoryRecursion fails on Windows.

comment:30 Changed 10 years ago by TimAllen

Man, I'd whined about this in #twisted, but never actually thought it through to its logical conclusion like that before. Twisted's current packaging system has the basic assumption that only Twisted devs will ever make a Twisted package, so the code doesn't need to be made generalised or portable, while distutils has the basic assumption that anyone anywhere should be able to make a package. These two assumptions are completely at odds. Trying to work around this conflict just leads to redundancy, trying to resolve it requires a fairly fundamental rework or rewrite of Twisted's packaging code.

I expect I'll have to think about this a while longer. Suggestions welcome.

comment:31 in reply to:  30 Changed 10 years ago by Glyph

Replying to TimAllen:

Man, I'd whined about this in #twisted, but never actually thought it through to its logical conclusion like that before. Twisted's current packaging system has the basic assumption that only Twisted devs will ever make a Twisted package, so the code doesn't need to be made generalised or portable, while distutils has the basic assumption that anyone anywhere should be able to make a package. These two assumptions are completely at odds. Trying to work around this conflict just leads to redundancy, trying to resolve it requires a fairly fundamental rework or rewrite of Twisted's packaging code.

This isn't quite right.

distutils' code is generalized so that anyone anywhere should be able to make a distribution; Twisted's code is specialized because only Twisted developers need to make a release. Distutils doesn't do most of the stuff that Twisted's release code needs to, and it doesn't need to be taught to. Even twisted devs need to be able to take the output of the release process and build packages on other platforms; the Windows toolchain, for example, is a big pain in the ass to get working on other platforms, and even if we could get it to work it would be impossible to test.

The thing that needs to be able to do 'sdist' can't depend on the code which does the release-building, but it can depend on work that is done by the thing that makes the release, including building API documentation, incrementing version numbers, building manpages, generating LaTeX ouptut (which, if I understand correctly, is the main reason that this code is platform-specific), and building news files.

Now that I understand the distinction, my suggestion is to put the new distutils code into twisted.python.dist (well, actually, I'd prefer twisted.python._dist), and move any functions that distutils might want to call from _release (which may be entirely platform-specific) to _dist (which must be supported cross-platform). I strongly suspect that most of them would not be a significant amount of effort to support cross-platform since I believe they're mostly must moving files around, not spawning subprocesses.

How does that sound?

comment:32 Changed 10 years ago by khorn

Cc: khorn added

comment:33 Changed 9 years ago by TimAllen

I have begun tinkering with moving things from t.p._release to a new t.p._dist module, but pulling apart the classes in _release to separate the "build Twisted combined package" logic from "build Twisted subprojects" logic will take some careful thought.

Also, t.p._release keeps its tests in t.p.test.test_release, so presumably t.p._dist should keep its tests in t.p.test.test_dist — except that name is already used by the tests for t.p.dist. Currently, I've stuck the tests in t.p.test.test__dist, but I'm open to suggestions.

comment:34 Changed 9 years ago by TimAllen

Keywords: review added
Owner: TimAllen deleted

The attached patch (against the branch) is a savage hack-job of t.p._release along the lines that Glyph outlined in comment:31. It also include a few deletions of unused imports and variables. Among my various changes, I wound up messing with an import that used to bear the comment "This import is an example of why you shouldn't use this module unless you're radix", but no thunderbolts have yet struck me; I'll try to keep under shelter on the way home.

The split is not exact - for example, all of DistributionBuilder has migrated to t.p._dist, even though it includes methods for creating tarballs of Twisted Core and other subprojects, even though such code isn't required for the main setup.py.

All the tests pass, and additionally when one runs ./setup.py sdist the resulting tarball also passes all of its tests. I haven't compared the results to a Twisted 9.0 tarball as with previous patches, because I suspect that's a bit out-of-date by now.

comment:36 Changed 9 years ago by Glyph

Keywords: review removed
Owner: set to TimAllen
  1. You're calling makeAPIBaseURL twice in SDistTwisted, and the invocation that is bound to a local variable is ignored.
  2. As long as you're removing unused stuff, there's an unused 'sys' import.
  3. Let's try to avoid new apparently-public names in twisted.python.dist; please call it _SDistTwisted.
  4. In test docstrings, you should use epytext L{} syntax to link to the Python names being tested.
  5. "CVS"? Really? I think we can count that out, unless there's some little-known 'svn2cvs' I'm unaware of.
  6. test_makeAPIBaseURLIsSubstitutable doesn't contain any assertions. You should really assertEquals(url, something) at the end.
  7. Please add a docstring for IsDistributableTest
    1. "isDistributable denies names Python bytecode files." doesn't really make sense grammatically, please reword it.

Otherwise, this actually looks pretty good to me. Unfortunately I don't feel really confident in my ability to evaluate modifications to the release code, so it would be good if at least one of radix or exarkun had a look at it too.

comment:37 Changed 9 years ago by TimAllen

You're calling makeAPIBaseURL twice in SDistTwisted, and the invocation that is bound to a local variable is ignored.

Well, this is embarrassing; I seem to have made this change on my local machine, but not included it in the patch I submitted.

As long as you're removing unused stuff, there's an unused 'sys' import.

...this one too.

"CVS"? Really?

Removed RCS and SCCS too; I don't recall where I got that list, but it's now less encyclopaedic.

test_makeAPIBaseURLIsSubstitutable doesn't contain any assertions.

There doesn't appear to be an "assertDoesNotRaise()" assertion. One of the reasons I factored it out into its own helper function was so that the URL could be easily customized, and having strict tests about the returned value would rather defeat that.

I could do the substitution, then test that the result was acceptable input to urlparse.urlparse(), but I don't think urlparse.urlparse() is very strict, and the test still wouldn't call any assert methods.

I'll attach a new patch addressing the above concerns.

I should note, however, that in the past three months I've become less sure that this is, in fact, the correct approach to take. Twisted's "source" tarballs are so different from the content checked into SVN that I'm beginning to think SVN shouldn't have a setup.py at all. I soliloquised about it on the Python Packaging mailing list a while ago, and probably the Twisted mailing list would be a better place to discuss the idea, but I thought I should mention it.

Changed 9 years ago by TimAllen

Attachment: sdist-support-4138.2.patch added

Address Glyph's review comments.

comment:38 Changed 9 years ago by TimAllen

Keywords: review added
Owner: TimAllen deleted

comment:39 Changed 9 years ago by Glyph

Branch updated in r29074

comment:40 in reply to:  37 ; Changed 9 years ago by Glyph

Keywords: review removed
Owner: set to TimAllen

Replying to TimAllen:

test_makeAPIBaseURLIsSubstitutable doesn't contain any assertions.

There doesn't appear to be an "assertDoesNotRaise()" assertion.

And for good reason! Asserting that things don't raise is a pretty silly thing to do. You should assert about properties of their results; asserting that they don't raise anything is somewhat redundant.

One of the reasons I factored it out into its own helper function was so that the URL could be easily customized, and having strict tests about the returned value would rather defeat that.

No it wouldn't. It would merely add a safeguard to make sure that the person changing the function was intentionally changing its output value, and would provide that person a place to change before changing the implementation, to make sure that the new output value was correct. It's OK to edit tests - they don't have to be set in stone forever, especially not tests for private utility functionality.

I could do the substitution, then test that the result was acceptable input to urlparse.urlparse(), but I don't think urlparse.urlparse() is very strict, and the test still wouldn't call any assert methods.

Just assert on the final value you expect, after doing the second substitution. Or maybe make the function take 2 arguments, and do both substitutions?

I'll attach a new patch addressing the above concerns.

Thanks. Unfortunately, the branch now conflicts with (possibly among other things) the fix for #4282. Since you're using git anyway, can you maybe just do some DVCS magic and produce a nice, clean, conflict-free patch against trunk? (Perhaps even a git repository I can merge from, if that's a thing that makes sense? I'd prefer bzr if there's a sane mapping between the two, but whatever's easiest.)

Due to the conflicts I haven't quite done a full review, but one thing I did notice is that the docstring for twisted.python._dist should really mention why distutils can't safely depend on all the release-making code in twisted.python._release; it would be super helpful to maintainers if the distinction between "release" and "distribution" that we spelled out in the comments on this ticket were included in the code somewhere.

I should note, however, that in the past three months I've become less sure that this is, in fact, the correct approach to take. Twisted's "source" tarballs are so different from the content checked into SVN that I'm beginning to think SVN shouldn't have a setup.py at all. I soliloquised about it on the Python Packaging mailing list a while ago, and probably the Twisted mailing list would be a better place to discuss the idea, but I thought I should mention it.

There should definitely be a setup.py in trunk. I use it somewhat frequently, and it's immensely helpful for testing. Plus, we should be shooting for making the whole process more integrated, not less. Not every installation requires a release-quality tarball with all the docs (or even all the C modules!)

Nevertheless, that's a lot of interesting stuff in your soliloquy for distutils to take on - but in the present state of things, with eggs and debs and RPMs being what they are, and building software being as complex as it is, I think that this approach is our best shot. It actually seems to be working out pretty well in the branch that you've put together thus far, modulo conflicts.

comment:41 in reply to:  40 Changed 9 years ago by TimAllen

Replying to glyph:

Replying to TimAllen:

test_makeAPIBaseURLIsSubstitutable doesn't contain any assertions.

I could do the substitution, then test that the result was acceptable input to urlparse.urlparse(), but I don't think urlparse.urlparse() is very strict, and the test still wouldn't call any assert methods.

Just assert on the final value you expect, after doing the second substitution. Or maybe make the function take 2 arguments, and do both substitutions?

Unfortunately, the output of makeAPIBaseURL() is used as a command-line parameter to Lore, so making the function do both substitutions at once is not very easy.

I've added a call to self.assertEquals().

I'll attach a new patch addressing the above concerns.

Thanks. Unfortunately, the branch now conflicts with (possibly among other things) the fix for #4282. Since you're using git anyway, can you maybe just do some DVCS magic and produce a nice, clean, conflict-free patch against trunk? (Perhaps even a git repository I can merge from, if that's a thing that makes sense? I'd prefer bzr if there's a sane mapping between the two, but whatever's easiest.)

It turns out no; git's magic merging didn't do the right thing after all. I've created a new patch against the trunk with my changes in, and manually checked every patch to twisted.python._release since the last time I branched from the trunk to make sure these changes don't cause regressions.

Due to the conflicts I haven't quite done a full review, but one thing I did notice is that the docstring for twisted.python._dist should really mention why distutils can't safely depend on all the release-making code in twisted.python._release; it would be super helpful to maintainers if the distinction between "release" and "distribution" that we spelled out in the comments on this ticket were included in the code somewhere.

I've put a summary in the docstring of the new twisted.python._dist module; please check that I've covered all the important bits (and tell me if there's a better place to put it).


I should note, however, that in the past three months I've become less sure that this is, in fact, the correct approach to take. Twisted's "source" tarballs are so different from the content checked into SVN that I'm beginning to think SVN shouldn't have a setup.py at all. I soliloquised about it on the Python Packaging mailing list a while ago, and probably the Twisted mailing list would be a better place to discuss the idea, but I thought I should mention it.

There should definitely be a setup.py in trunk. I use it somewhat frequently, and it's immensely helpful for testing. Plus, we should be shooting for making the whole process more integrated, not less. Not every installation requires a release-quality tarball with all the docs (or even all the C modules!)

Nevertheless, that's a lot of interesting stuff in your soliloquy for distutils to take on - but in the present state of things, with eggs and debs and RPMs being what they are, and building software being as complex as it is, I think that this approach is our best shot. It actually seems to be working out pretty well in the branch that you've put together thus far, modulo conflicts.

My problem with the current branch is that it takes the old, organically grown, highly-coupled _release.py and doesn't clean it up so much as unevenly smear it across two modules (for example, the way that half the *Builder classes are in one file and half are in the other, with no obvious division beyond "what's necessary to make the module dependencies work out"). Also, if "not every installation requires a release-quality tarball with all the docs (or even all the C modules!)" it's kind of a shame I spent all this time trying to make the SVN root setup.py exactly replicate the content of the Sumo tarball produced by the release process. :(

If I were designing Twisted's release and distribution code from scratch right now, it would probably look something like this:

  • No setup.py in the SVN root, or a deliberately crippled and minimalist one that defines a package named "TwistedDev" and just has enough logic to handle "setup.py build_ext --inplace". Using this code to build a "distribution" should not be supported, because the only supported way to distribute code from SVN should be direct from SVN (or an official bzr mirror, etc.).
  • There should be a portable and supported script (that contributors who want to work on the distribution code can run on any platform) that picks through an SVN checkout (or an official bzr mirror, etc.) and constructs a tarball for each distribution that can be built (Twisted Sumo, Twisted Core, Twisted Web, etc.). This would include building whatever documentation should be included in these distributions, but shouldn't do release tasks like building NEWS files, bumping version numbers or uploading docs to twistedmatrix.com (it might generate a version number like "10.0.0-r23456").
  • The tarballs constructed by the above script should be proper distributions with a setup.py; if you extract them and run "setup.py sdist" you should get another tarball containing exactly the same files as the first, they should support "setup.py bdist_rpm" and so forth.

It's my suspicion that a good deal of the complexity of the current distutils goo is because the same setup.py is used to handle the subtly different layouts of an SVN checkout versus a Twisted Sumo tarball.

On the other hand, this branch is already written, tested, mostly code-reviewed, and doesn't radically rewrite the battle-scarred release code. In the short term, this branch is probably the better choice, but in the longer term (especially now that the Twisted release process is being documented and the project has a better idea what the concrete requirements of the process are) it might be worth re-thinking. Something that makes it possible for Win32 users to "easy_install Twisted" would be popular.

Changed 9 years ago by TimAllen

Attachment: sdist-support-4138.3.patch added

New patch against current Twisted trunk, without regressions.

comment:42 Changed 9 years ago by TimAllen

Keywords: review added
Owner: TimAllen deleted

comment:43 Changed 9 years ago by Glyph

Author: TimAllenTimAllen, glyph
Branch: branches/sdist-support-4138branches/sdist-support-4138-2

(In [29102]) Branching to 'sdist-support-4138-2'

comment:44 Changed 9 years ago by Glyph

Changes available in a branch, waiting on build results.

comment:45 Changed 9 years ago by Glyph

Keywords: review removed
Owner: set to TimAllen
  1. Build results indicate that there is a failing test on Windows.
  2. pyflakes is unhappy. Removing the irrelevant warnings, we've still got
    twisted/python/_release.py:19: 'tarfile' imported but unused
    twisted/python/_release.py:20: 'errno' imported but unused
    twisted/python/_release.py:21: 'shutil' imported but unused
    twisted/python/_release.py:30: 'DocBuilder' imported but unused
    twisted/python/_release.py:30: 'ManBuilder' imported but unused
    twisted/python/_release.py:30: 'NoDocumentsFound' imported but unused
    twisted/python/_release.py:31: 'isDistributable' imported but unused
    twisted/python/test/test_release.py:61: 'Popen4' imported but unused
    
    It's a good thing that that last one is unused, because we recently went to some trouble to get rid of all uses of popen2.
  3. You might want to look at the release process documentation to see if those little code snippets can be simplified by (or need to be modified to deal with) these changes.

Feel free to attach a patch against either trunk or the branch to address these, just say which one it is. Or, again, some kind of git thing that someone might use to review it.

Changed 9 years ago by TimAllen

Attachment: sdist-support-4138-2.patch added

Address review comments; patch against latest branch

comment:46 Changed 9 years ago by TimAllen

Keywords: review added
Owner: TimAllen deleted

The attached patch includes the following changes:

  1. Modified the failing test to not make assumptions about path delimiters; should make test pass on Windows.
  2. Removed unused imports.

I looked at the release process documentation page. The first example, "Build API docs for website", doesn't seem to need any changes (It didn't work for me because I don't have pydoctor installed, but it didn't have any import errors). In the second example, ManBuilder and DocBuilder have moved from t.p._release to t.p._dist. After making those changes, I ran the script and (after a little whirring of gears) wound up with a bunch of HTML documentation and a file called book.pdf that appears to have most of the Twisted Core documentation, but nothing from doc/conch, doc/web or doc/words - I'm not sure if that's intentional.

I don't think there's any simplification that can be done to those scripts. I'd update the wiki page, but it wouldn't make sense to do so until after this branch has landed.

comment:47 Changed 9 years ago by Jean-Paul Calderone

(In [29204]) Apply sdist-support-4138-2.patch

refs #4138

Changed 9 years ago by TimAllen

Attachment: sdist-support-4138.4.patch added

New patch against current Twisted trunk.

comment:48 Changed 9 years ago by TimAllen

The recent landing of #4316 introduced changes to _release.py that conflicted with my previous patch; this current patch should apply cleanly to today's trunk. No functional changes.

comment:49 Changed 9 years ago by Jean-Paul Calderone

Author: TimAllen, glyphexarkun, TimAllen, glyph
Branch: branches/sdist-support-4138-2branches/sdist-support-4138-3

(In [29446]) Branching to 'sdist-support-4138-3'

comment:50 Changed 9 years ago by Jean-Paul Calderone

(In [29447]) Apply sdist-support-4138.4.patch

refs #4138

comment:51 Changed 9 years ago by Jean-Paul Calderone

Author: exarkun, TimAllen, glyphTimAllen, glyph
Branch: branches/sdist-support-4138-3branches/sdist-support-4138-2
Keywords: review removed
Owner: set to Screwtape
  1. twisted/python/test/test__dist.py
    1. Needs a module docstring
    2. Twisted imports at the top should be separated from stdlib imports by a blank line
    3. StructureAssertingMixin.createStructure should probably document its new script-chmoding behavior.
    4. Likewise for the new assertions made by assertStructure
    5. assertStructure still has this line: if "/bin" in child.path:
    6. The pattern of visiting each leaf node in the dirDict is present in createStructure and assertStructure. The code might be a little clearer if that were factored out into a helper iterator.
    7. Whitespace isn't used consistently around |, eg in test_stageFileStagesFile.
    8. There are 11 new occurrences of a failUnless... method. Please us the assert... forms.
  2. twisted/python/_dist.py
    1. Since twisted_subprojects is being redefined here, take the opportunity to make its name comply with the naming convention (ie, lose the underscore).
    2. EXCLUDE_PATTERNS should drop the arch and darcs entries. I was going to say that there should just be a comment explaining why they were present, but since it's only arch and darcs, I'm just gonna say we're never gonna care about such configurations (we're certainly not gonna help out arch uses and ignore bzr or hg users).
    3. DistributionBuilder re-used PROJECT_BLACKLIST to define its blacklist attribute. Now it duplicates the two-element list defined in twisted/python/_release.py. Shouldn't this list still be shared?

The change as a whole is too large to fit into my head, so I don't know if I missed some other things. Aim for smaller changes in the future. :)

comment:52 Changed 9 years ago by Glyph

Branch: branches/sdist-support-4138-2branches/sdist-support-4138-3

(Fixing branch, the review accidentally mangled it.)

Note: See TracTickets for help on using tickets.