Opened 15 years ago

Closed 15 years ago

Last modified 15 years ago

#1940 enhancement closed fixed (fixed)

make Twisted tests runnable from within a zipfile

Reported by: Glyph Owned by:
Priority: highest Milestone:
Component: core Keywords: trial
Cc: Branch:
Author:

Description

I have one project on hand that needs to load Twisted from a zipfile, and reliability is important, so I want to be able to run unit tests in this deployed configuration for the behavior I'm going to be using.

This creates several problems. First of all, a few tests that I'd really like to run in the context of this project use file to refer to sibling files, and that means that there needs to either (A) be a resource-loader in Twisted so that the tests can be converted to a different idiom, or (B) the trial test-discovery mechanism needs to honor __path__. Ideally both of these things would be true.

I'm going to be starting on a branch for this tonight, and I believe it will also provide a slightly more general solution to #1276 than James wrote in his branch (since it wasn't addressing zip files). I've reviewed the issue as carefully as I can, because there are a bajillion extremely subtle issues involved.

First, I'm going to create a new twisted.python module, "pyspace", which abstractly represents the Python sys.path object space that it seems like most Twisted developers think in terms of these days. There are lots of ad-hoc ways to inspect this, and trial's TestLoader is probably the most complete, but there's no abstract object to look at. I looked at setuptools to make sure that I understood if I was replicating any behavior, and it doesn't appear to have anything like this either; it has ways to inspect the "distribution" space, but not so much just plain old sys.path.

Here's my proposed API. pyspace.top will be a singleton with __getitem__(packagename), __iter__, and walk() methods. top['twisted'] will return an object with this same interface (although quite likely a different implementation), __iter__ will yield a sequence of these objects, etc. Each will also have several utility attributes giving metadata about the represented module (or package), and a "load" method which returns the imported thing.

Ideally this would start with a very simple implementation based on the filesystem, in this branch grow a zip-compatible extension, and eventually get very sophisticated, extracting all available metadata (lists of classes, functions, methods, attributes) without evaluating anything, using similar techniques to what pydoctor does.

To give a simple example, disentangled from the mess of complexity that is trial, imagine a hypothetical 'module lint' tool, designed to provide you with a list of non-stdlib, non-packaged modules on your python path.

import os
stdlibdir = os.path.dirname(os.__file__) 

from twisted.python.pyspace import top
for module in top:
    if module.sysPathEntry != stdlibdir and '.' not in module.name:
        print 'Oops, unpackaged module: ', module

JP and I went through all the myriad use-cases for supporting __path__ and eventually decided that it wasn't worth it in the general case, so we are going to support it only for already-loaded packages, which is to say Twisted, which is to say, the only module that we are ever likely to care about __path__ for in any practical scenario (after the repository is internally split up).

This might be revised, but implementing support for some form of __path__ can easily be done under the covers of the implementation; the interface, and trial (and friends') use of it can remain unchanged.

Change History (25)

comment:1 Changed 15 years ago by jknight

It looks to me like this proposal essentially duplicates: http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access

Or, if not duplicates, at least *should use* as its underlying API.

In particular, you can write a "walk all the modules under this module" on top of listdir and isdir.

comment:2 Changed 15 years ago by jknight

(please insert "parts of" before "this proposal" in above comment). Clearly there are many parts of this proposal that are not in any way covered by the pkg_resources API.

comment:3 Changed 15 years ago by Glyph

I disagree for the following reasons, but you're right, I should have addressed pkg_resources right off the bat because it's superficially similar.

  • I am talking about a module which mainly manages dealing with Python objects and Python hierarchies. pkg_resources deals primarily with resources. Now, it would be possible to implement this in terms of the pkg_resources API rather than a trivial application of filesystem and zipfile operations, but I don't feel that it's going to be a win, since the additional dependency is a pain for anyone. For example:
    >>> pkg_resources.resource_exists("twisted", "internet/serialport")
    False
    >>> pkg_resources.resource_exists("twisted", "internet/serialport.py")
    True
    >>> pkg_resources.resource_exists("twisted", "internet/serialport.pyc")
    True
    >>> pkg_resources.resource_exists("twisted", "internet/serialport.pyo")
    False
    glyph@alastor>>> pkg_resources.resource_listdir("twisted", "internet")
    ['.svn', 'cfsupport', (...)]
    
    That last bit is the most telling. I don't know whether cfsupport is a package or not without a dozen other method calls; ".svn" is not something meaningful to me in this context. The output isn't worse than listdir, but it's not much better, either. I feel that this is the most compelling reason.
  • For the same reason, pkg_resources's listing methods do not seem to honor __path__ in the normal case. While our own support of it will initially be fairly hacky and ad-hoc, and work only for one particular use-case (which, oddly, happens to be the same on my weird-o little linux machine and for the Twisted split, e.g. #1276). While I don't have to solve that problem at the same time, but it seems like it would be nice if I could. Also, pkg_resources can't really add support for other types of Python objects, which is where I'd really like this to go.
  • The motivation for me to start thinking about this involved a project for deployment on an embedded device, where installing all of setuptools and dealing with the additional import overhead of eggs and a good deal of metadata that I don't need is actually a problem. If pkg_resources were actually the right way to deal with this in a "normal" Python installation situation then I'd probably write something else for this particular project and propose something using pkg_resources for Twisted, but it isn't for other reasons. I am probably going to end up re-implementing a trivial subset of the functionality offered by eggs, but I don't see that as a bad thing.
    • it's a trivial subset; a good deal of setuptools is geared towards doing things like dependency verification.
    • it's a non-conflicting subset; the package listing stuff I'm working on can be (and probably will be, in short order) made to work with the egg importer as well as the zip importer, which will likely make twisted more amenable to packaging as an egg.
  • Twisted has its own zipfile-stream implementation which would be easy to use to implement an equivalent to get_resource_stream which did not stuff the entire contents of a zip chunk into a StringIO. Again, mumble, embedded device, limited memory, IO bandwith etc, I don't want to read any more data than I have to. Honestly we should probably contribute this to the standard library so that setuptools can eventually do the same thing, but it will be a good 1.5 years before it would be realistic for me to use it in that way, and I'd really like the streaming functionality before then.

To reiterate, although I like inventing things, this isn't NIH. It will be used for radically different things from pkg_resources, although it may also be used for a few of the same things. Eggs will work unmodified with the code I plan to write with no additional changes, and I consider that a good thing. They are using the same underlying APIs (the zip importer API and the new import hooks API) and so I don't forsee any risk of divergence.

Ultimately the only use for pkg_resources in this implementation would be to remove about 10 lines of code that called functions on importer objects; and if you take a look at help(sys.path_hooks[0]) on any python interpreter you will see that most of the interesting methods are just exposed there anyway.

comment:4 Changed 15 years ago by jknight

The API pkg_resources provides is not all trivially implementable on top of the python loader API; it provides additional capabilities. If you write another loader, you can also register a pkg_resources provider to provide those capabilities for your loader. This is much more likely to happen than for somebody to add that capability to twisted's copy of the API, especially so once it becomes included with python.

Anyhow, even if we don't want to depend on it until it _is_ part of python, I think it would be at least a good idea to use the same API, from all twisted code that needs to access resources. That's mostly tests, and also this thing you're writing. Whether or not the API actually comes from the "pkg_resources" module can be modified later.

Also, there's one part of that is somewhat non-trivial, and that we almost certainly will need: get_resource_filename and the related extraction of stuff from the zipfile into a temporary directory hierarchy.

Finally, please look at pkgutil.iter_modules, and pkgutil.walk_modules. Which _does_ handle path.

comment:5 Changed 15 years ago by Glyph

The de-facto python loader API is the extended one provided by the zip importer; some of its methods are even mentioned (although oddly not required) by the PEP. I am just going to call the zip importer's methods for now, and perhaps optionally call down into pkg_resources' API later, to avoid the problem you describe - I don't want people to have to write implementations of the backend for Twisted, but going through pkg_resources isn't any easier, it just provides a layer of indirection if there were a bunch of really interesting things that people were doing with importers (which they aren't, at least currently).

The pkgutil functions you mention are available only in Python 2.5, and in any event they walk around importing every package they find, which I explicitly want to avoid, since it caused problems with the last implementation of #1276. I do wish that get_importer existed in 2.3, though.

For now, I'm going to consider the resources stuff a separate issue and focus on making the object-space-traversal right. I don't like the pkg_resources API for that either, but for completely different reasons which may or may not be spurious and it would be a waste of time to discuss right now. If __path__ support works right and I can put the test-cases in a filesystem directory, I'll be perfectly happy for my current use-cases.

comment:6 Changed 15 years ago by jknight

We can backport python functionality; we've done it before, we should do it again if useful. If you want get_importer in 2.3, simply add it to pkgutil in 2.3.

I know it's not as sexy as an object space thingamabob, but to accomplish the title of this bug, all that's really *needed* is walk_modules (to find the modules), and a resource API (to allow tests to find associated data files).

So if that's not what you want to work on, which it sounds like it's not, maybe you should change the title. :)

comment:7 Changed 15 years ago by Glyph

All that's needed is:

  • a different implementation of walk_modules that does not use __import__, but tries to respect __path__.
  • a re-implementation of TestLoader on top of that.

and that is basically what I'm doing, except I'm cleaning up TestLoader at the same time to make it usable for non-trial cases as well.

After looking at pkgutil, I am on the fence right now about backporting ImpImporter. That might be a big help; I will grab it if I need it.

comment:8 Changed 15 years ago by jknight

Summary of my main points, persisted from irc conversation:

  • I think your module sounds like it does have useful novel functionality
  • It should be implemented on top of functionality in python, or slated to be added to python, where possible.
  • My default position is against unrelated-to-async-networking functionality being added twisted just cause it seems like a convenient place, but that's sway-able.
  • If a python core module is untested or undocumented, the solution isn't writing a new module that does the same thing only different.
  • I haven't seen anything to show that the novel functionality is actually necessary for twisted (vs. the non-novel functionality already available in pkgutil and pkg_resources)

comment:9 Changed 15 years ago by jknight

Re: a different implementation of walk_modules that does not use import, but tries to respect path.

I don't think that's really necessary. Sensible error handling upon failure to import the subpackage is fine. For trial, that means printing a warning or a [SKIP], for other use-cases that may simply mean silently ignoring it.

comment:10 Changed 15 years ago by Glyph

Keywords: review trial added
Owner: changed from Glyph to Jonathan Lange

Thanks for offering to review.

comment:11 Changed 15 years ago by Glyph

I also updated #1276 with a relevant comment since this branch affects it.

comment:12 Changed 15 years ago by Jonathan Lange

Keywords: review removed
Owner: changed from Jonathan Lange to Glyph

General

  • Docstring text should begin immediately after the triple quote, not on a newline (see PEP 8, example in Coding Standard and the vast majority of docstrings in Twisted)
  • File a ticket for Trial's crazy sorting behaviour
  • twisted.trial.test.test_loader.PackageOrderingTest.test_sortPackagesSillyOrder failed when I ran it on the 2.3 buildbot
  • The tests in test_module could be more unit-y. However, given they cover the public api pretty well, I'm ok.

twisted/python/filepath.py

  • Document how _PathHelper.walk traverses the tree
  • _PathHelper.segmentsFrom says it takes a FilePath. Does this actually mean FilePath or should it 'instances of _PathHelper subclasses'. If the latter, consider defining an interface

twisted/test/test_paths.py

  • test_paths.AbstractFilePathTestCase.test_segmentsFromNegative docstring should say """Verify that segmentsFrom notices when the ancestor isn't an ancestor""" or something
  • "appropriate introspection qualities" doesn't mean anything

twisted/trial/test/test_loader.py

  • testInvalidSubdir, testValidFiles and testGetAndSet should be test_invalidSubdir, test_validFiles and test_getAndSet respectively.
  • Remove commented out debug stuff in loadSortedPackages

twisted/trial/runner.py

  • runner.suiteVisit should be runner.visitSuite
  • remove # raise AttributeError from runner.suiteVisit
  • PyUnitTestCase.id does exactly what pyunit.TestCase.id does, afaict. Remove the override.
  • Write tests for ErrorHolder.visit (or else remove comment, or else file bug)
  • In TestLoader, remove moduleGlob, replace with modulePrefix, and use self.modulePrefix in loadPackage
  • Shouldn't _findTestClasses use t.p.modules?

twisted/python/modules.py

  • t.p.modules docstring refers to 'top', which has since been replaced with methods (good move btw)
  • Do we really have to import generators from the future still?
  • _isPackagePath(FP('foo/bar/suckedininit.py')) returns True. It probably shouldn't.
  • _ModuleIteratorHelper.iterModules docstring says 'yield nothing'. Does this mean 'not yield anything' or 'yield t.p.modules.nothing'?
  • I'm a little bleary right now. What's the difference between _ModuleIteratorHelper.walkModules and _ModuleIteratorHelper.iterModules?
  • docstring of getitem refers to 'pyspace', should say 'modules'
  • Hurrah for iter's docstring & implementation
  • PythonAttribute.init's docstring should so "do not construct me directly, use <foo>"
  • Should PythonAttribute.load()'s default default be nothing?
  • PythonModule.init's docstring should say "do not construct this directly".
  • It's unclear what I should expect to happen if I try to load a bad module. If it's _deliberately_ unclear, then there should be a docstring somewhere
  • Boy, getitem makes for some non-obvious recursion.
  • I don't really understand "# feh something with importers... "
  • Likewise, everything from !IPathImportMapper to _ZipMapImpl is a mystery to me.

twisted/test/test_modules.py

  • test_modules has inconsistent vertical whitespace. Recommend 1 lines after methods, 2 lines after classes
  • PathModificationTest docstring should say "These tests share" rather than "Test stuff that shares"
  • test_modules._serialnum deserves its own comment

This API is well overdue. Thank you very much for doing the hard work of getting this stuff as right as it can possibly be.

The code looks good. I can only provide such detailed feedback because it works and is (mostly) clean.

comment:13 Changed 15 years ago by Glyph

Status: newassigned

Since there's so much here I'm going to respond to each point-by-point and tell you what I'm going to do, rather than just hope you catch it all in a re-review. I'm only mentioning stuff with open issues, anything not mentioned I will fix.

General

  • Beginning docstring text immdiately after the quote breaks a few things; paragraph filling in emacs, for starters, and it makes the first ~60 characters "special" in various tools, which I don't like. It is in the coding standard though, so I'll change it for now - but I think this is probably a discussion that needs to be reopened (elsewhere) based on what pydoctor expects, or should expect.
  • You take care of the sorting ticket. I don't know how it should work for real, only that it was badly broken before, and this branch just makes sure there is some sort order, not that it's the right sort order.
  • I'll fix 2.5 ( http://twistedmatrix.com/buildbot/full-2.5-debian/builds/95/step-shell/0 ) and 2.3 ( http://twistedmatrix.com/buildbot/full-2.3/builds/1671/step-shell/2 ) test failures. Although maybe Python shouldn't change so damn much.
  • I like the level of integration covered in test-module; I wrote them white-box on purpose because I want internal refactorings to be easy. I don't particularly like the internal structure of the objects right now, but the API they expose is good.

filepath

  • I'll check the tests and either (A) move segmentsFrom back to FilePath if nothing's using it, or (B) document it differently otherwise.

runner

  • I changed the name to suiteVisit because visitSuite means something different; it's a method of SuiteVisitor. I think I'll leave that as-is unless you have a strong objection.
  • _findTestClasses should, yes, and other things (isPackage et. al. come to mind) should probably start using it too, but I was writing this in support of a particular goal - zipfile support - and I didn't make that change. I don't want the refactoring to be bigger than necessary and it is already pretty freakin' big.
  • ErrorHolder needs a whole slew of tests, and they're getting out of scope. How about this: a .skipped, empty test which describes what needs to be tested, but remove the comment? And an issue, too.

modules

  • "nothing" shouldn't have a non-private name, so it "yields nothing" means the normal thing, i.e. "StopIteration after no results"
  • iterModules is always shallow, walkModules is deep. Maybe the docstrings should use those words.
  • loading bad modules: it's unfortunately the same weird, undocumented error behavior as namedAny. Specifying or documenting this in a way that makes sense is just out of scope here: one potential outcome (which I actually see on a semi-regular basis) is an interpreter segfault or abort. Just don't load bad modules. I will try to address it at least a little better in the docs though, by making it sound scary :). BTW: this is one of the reasons I commented on #1276 the way that I did.

I'm glad you appreciate this :).

comment:14 Changed 15 years ago by Jonathan Lange

Cool. Assume "yes" for things I haven't replied to.

  • So nothing => _nothing? (I think 'yes please')
  • Loading bad modules: Trial has to do this, because people are stupid and trial needs to be stupid-proof. Definitely update the docstring.
  • I'll file a ticket re ErrorHolder. No other action required.
  • I'll file a ticket for _findTestClasses et al. No other action required.

No need to re-assign to me when done -- just merge, provided tests pass on buildbot.

comment:15 Changed 15 years ago by Jonathan Lange

See #1946 and #1947 for ErrorHolder and _findTestClasses tickets

comment:16 Changed 15 years ago by Jonathan Lange

#1948 for the load sorting ticket

comment:17 Changed 15 years ago by Glyph

OK, I'm about to merge.

Thanks for filing tickets.

You might want to give this one last look since the satisfy-the-review diff was bigger than I thought it would be; there's a lot of stuff bridging the 2.3->2.4->2.5 triple that we now support.

The tests seem to be passing on 2.3 and 2.4 for me, but 2.5 I had to build myself, which is uncovering other issues (I will file a ticket on that in a little bit).

If I remove 'id' and the override, the doctest test starts failing again. I am deeming that an involved enough issue to get its own branch at this point.

IPathImportMapper is just the way that we decide what the string entry on sys.path means, based on the PEP 302 importer protocol. Given that the importer protocol is in flux, it's an internal detail and not terribly important to understand :).

As far as vertical whitespace; you were right that it's inconsistent, but i'm going for 2 lines between methods, 3 between classes, because I want to reserve single blank lines for discretionary separators between sections of a method.

comment:18 Changed 15 years ago by Jonathan Lange

Status: assignednew

Looks good. Merge away.

comment:19 Changed 15 years ago by Glyph

Resolution: fixed
Status: newclosed

(In [17651]) Make it possible to run trial on tests located within a zip file.

Fixes #1940

Fixes #1276

Author: glyph

Reviewer: jml

This change also introduces a major new piece of functionality, twisted.python.modules, as well as twisted.python.zippath, which allows for FilePath-like access to zipfiles.

As part of changing trial to support zipfiles, another issue was discovered and fixed, which is that trial's sort support was previously broken. This is now fixed, and trial will, by default, sort tests alphabetically now, by fully qualified name.

comment:20 Changed 15 years ago by Jean-Paul Calderone

Resolution: fixed
Status: closedreopened

Reverted merge because of #1952. Re-opening.

comment:21 Changed 15 years ago by jknight

I'm still _quite_ skeptical about the use for this in twisted.

As far as the functionality trial needs is concerned, this module has pretty much exactly duplicated the functions pkgutil provides. What is the point in that?

comment:22 Changed 15 years ago by Jean-Paul Calderone

Having a pkgutil-based implementation in a branch (apart from a complete Twisted repo reorganization) would make it easier to compare these two solutions.

comment:23 Changed 15 years ago by Glyph

Resolution: fixed
Status: reopenedclosed

(In [17672]) Re-merge branch for #1940, plus fix for Windows and additional docs.

Fixes #1952 Fixes #1940 Fixes #1276

Author: glyph

Reviewer: jerub

This is a do-over of r17651, hopefully this time without the test failure on the Windows buildslave.

comment:24 Changed 15 years ago by Glyph

"What is the point in that?"

pkgutil, while ostensibly eventually the right way to do this, is:

  • untested
  • undocumented
  • buggy
  • changing
  • only available in a beta version of python

This is a core piece of functionality that trial depends on intimately, and that other functionality (the plugin system) also implements. (see #1951).

We have backported Python functionality in the past, but never from beta versions of Python, and we explicitly do not make local modifications to it, or write our own test suites.

And personally, after evaluating them endlessly while I was considering your objections to this branch, I have to say that even if it were documented, tested, stable, and available everywhere, I find pkgutil's API is still gross. It yields strings with no metadata aside from an 'importer' (an API which is also still in flux), and its API doesn't extend into non-module python objects, such as classes and methods and attributes. With twisted.python.modules, twisted (trial plus plugin) can eventually use a unified API for all its discovery; with pkgutil, it has to use walk_packages for some stuff, inspect for other stuff, and pkg_resources for yet other stuff, passing strings between the three and sometimes cutting them up for various reasons.

I think that the right thing to do here is, once python2.5 has been out for a little while, propose modules, zippath, zipstream and filepath for inclusion in 2.6. Perhaps as part of doing so, we will re-implement several things in terms of different APIs, but I think the public API they're presenting now is about right.

comment:25 Changed 11 years ago by <automation>

Owner: Glyph deleted
Note: See TracTickets for help on using tickets.