[Twisted-Python] Bloody Twisted Tree (VFS)
glyph at divmod.com
Sun Jul 6 11:12:09 EDT 2008
On 07:32 am, jml at mumak.net wrote:
>On Sat, Jul 5, 2008 at 3:27 AM, Andy Gayton <andy at thecablelounge.com> wrote:
>>On Mon, Jun 30, 2008 at 9:06 PM, Jonathan Lange <jml at mumak.net> wrote:
>> * The primitive interface for IO should be producer/consumers,
>>replacing readChunk, writeChunk. This interface is primitive enough
>>to express all other interfaces, while still providing the opportunity
>>to optimize streaming performance. The producer/consumer interface
>>will need to take an offset to allow readChunk and writeChunk to be
>>implemented on top of it.
>It would be nice to have things so that readChunks and writeChunks
>(plural) could be implemented, to avoid potato programming.
I don't think this is actually going to be a practical consideration, if
I correctly understand what you mean. For one thing, the
producer/consumer interface is going to be something (very vaguely) like
    remoteFile.writeFrom(producer[, offset, length])
    remoteFile.readInto(consumer[, offset, length])
This means that if you've got a really giant file, the implementation
could pretty trivially optimize delivering it to you in the most
efficient possible way, keeping all the relevant buffers full at every
opportunity. Given that stream-based I/O is somewhat inherently serial,
it's difficult to get less potato-y than that.
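A minimal synchronous sketch of what that might look like, in plain Python. Only the names writeFrom and readInto and their (offset, length) parameters come from the sketch above; InMemoryFile, ListConsumer, BytesProducer, produceTo() and the readChunk/writeChunk helpers are hypothetical. A real implementation would be asynchronous and would use Twisted's actual producer/consumer machinery (registerProducer, pauseProducing and friends) instead of this simplified push loop. It also shows why the offset matters: readChunk and writeChunk become thin wrappers.

```python
class ListConsumer:
    """Hypothetical consumer that simply collects the chunks written to it."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


class BytesProducer:
    """Hypothetical producer that pushes a bytes object out in fixed-size chunks."""

    def __init__(self, data, chunkSize=4):
        self.data = data
        self.chunkSize = chunkSize

    def produceTo(self, consumer):
        for start in range(0, len(self.data), self.chunkSize):
            consumer.write(self.data[start:start + self.chunkSize])


class _OffsetSink:
    """Consumer that writes incoming chunks into a bytearray at a given offset."""

    def __init__(self, buffer, offset, length):
        self.buffer = buffer
        self.position = offset
        self.remaining = length  # None means "no limit"

    def write(self, data):
        if self.remaining is not None:
            data = data[:self.remaining]
            self.remaining -= len(data)
        end = self.position + len(data)
        if end > len(self.buffer):
            # Writing past the end of the file: pad the gap with zero bytes.
            self.buffer.extend(b"\0" * (end - len(self.buffer)))
        self.buffer[self.position:end] = data
        self.position = end


class InMemoryFile:
    """Toy stand-in for a remote file; a real backend would be asynchronous."""

    def __init__(self, data=b"", chunkSize=4):
        self.data = bytearray(data)
        self.chunkSize = chunkSize

    def readInto(self, consumer, offset=0, length=None):
        end = len(self.data) if length is None else min(len(self.data), offset + length)
        # Stream the requested range to the consumer one chunk at a time.
        for start in range(offset, end, self.chunkSize):
            consumer.write(bytes(self.data[start:min(start + self.chunkSize, end)]))

    def writeFrom(self, producer, offset=0, length=None):
        producer.produceTo(_OffsetSink(self.data, offset, length))


# readChunk/writeChunk fall out of the streaming primitives once they take an offset.
def readChunk(remoteFile, offset, length):
    consumer = ListConsumer()
    remoteFile.readInto(consumer, offset, length)
    return b"".join(consumer.chunks)


def writeChunk(remoteFile, offset, data):
    remoteFile.writeFrom(BytesProducer(data), offset, len(data))
```

For example, readChunk(InMemoryFile(b"hello, world"), 7, 5) returns b"world", delivered internally as two chunks.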
writeChunks, if I understand it, would be pretty trivially implementable
on top of writeFrom.
Mapping 'readChunks' and 'writeChunks' to readv and writev in my head,
I'm not really sure what a 'readChunks' would actually do, since we copy
memory every time we sneeze in Python anyway. We're not going to have
preallocated buffers to read into.
>> * we're still using getMetadata and setMetadata - it's likely we want
>>a layer on top of using arbitrary key/value dicts for metadata, but
>>this can be introduced in a backwards compatible way.
Hmm. I don't remember agreeing to layering anything on top of
"arbitrary key/value dicts"; I'd really like to see a completely
different layer that specifically separates out optional features
(xattrs, symlinks, POSIX ACLs(?)) into separate interfaces with specific
methods that don't necessarily need to retrieve all the metadata at
once, which is sort of an inherent property of having a key/value dict.
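To illustrate the difference, here is a sketch of optional features as separate interfaces, each with narrowly scoped methods. The interface and method names are hypothetical, and Twisted itself would declare these with zope.interface; typing.Protocol is used here only to keep the sketch dependency-free.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class IExtendedAttributes(Protocol):
    """Hypothetical optional interface: one attribute at a time, no bulk fetch."""

    def getExtendedAttribute(self, name: str) -> bytes: ...

    def setExtendedAttribute(self, name: str, value: bytes) -> None: ...


@runtime_checkable
class ISymlinks(Protocol):
    """Hypothetical optional interface for backends that have symbolic links."""

    def readLink(self) -> str: ...

    def createSymlink(self, target: str) -> None: ...


class XattrOnlyNode:
    """Hypothetical node whose backend supports xattrs but not symlinks."""

    def __init__(self):
        self._xattrs = {}

    def getExtendedAttribute(self, name):
        # Retrieves exactly one attribute, not the node's entire metadata dict.
        return self._xattrs[name]

    def setExtendedAttribute(self, name, value):
        self._xattrs[name] = value
```

A backend that only supports xattrs simply doesn't provide ISymlinks, so isinstance(node, ISymlinks) is False for it, and reading one attribute never means fetching all the metadata.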
I'm OK with "still using getMetadata and setMetadata", though, since as
you say, it can be introduced in a backwards-compatible way. I do think
that we should keep that discussion open (for later, after the rest of
this work has been completed).
>This reminds me, it would be good for VFS to have an exception for
>"this operation isn't supported" (say with symlinks on fat32) and
>another exception for "supportable, but not actually implemented yet".
I don't think it's useful to distinguish between these two types of
exception at *runtime*. The use-case I can see for distinguishing is
letting a programmer know that they should figure out something that
might be tricky to implement and write some wrappers or submit some
patches. Perhaps a separate error message, rather than a separate
exception type? Do you have a different use-case?
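A sketch of the "one exception type, different message" idea. NotSupportedError, notSupported(), Fat32Node and tryOperation() are hypothetical names, and the setModificationTime gap is an invented example; the point is only that runtime code needs a single except clause while the message still tells the programmer which case they hit.

```python
class NotSupportedError(Exception):
    """Hypothetical single exception type for any operation a backend can't do."""


def notSupported(feature, backend, implementable=False):
    # The distinction lives in the message, which is aimed at the programmer.
    if implementable:
        hint = ("the backend could support this, but it has not been "
                "implemented yet; wrappers or patches welcome")
    else:
        hint = "the underlying filesystem cannot provide this"
    return NotSupportedError("%s on %s: %s" % (feature, backend, hint))


class Fat32Node:
    """Hypothetical node for a FAT32 backend."""

    def createSymlink(self, target):
        raise notSupported("symlinks", "FAT32")

    def setModificationTime(self, when):
        # Invented example of a gap in the implementation, not in the platform.
        raise notSupported("setting mtime", "FAT32", implementable=True)


def tryOperation(operation, *args):
    """At runtime, callers handle both cases with one except clause."""
    try:
        operation(*args)
    except NotSupportedError as e:
        return str(e)
    return None
```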
One related thing that we spoke about in person was pushing this
negotiation of filesystem features backwards to the initialization
step, so that applications which needed unusual filesystem attributes
could fail quickly with a clear error message if they weren't supported
by the underlying platform. ("WebDAV requires extended filesystem
attributes, and your backend, SFTP, does not provide that feature.",
"txGnuStow requires symbolic links, and your backend, the Microsoft
Windows filesystem, does not provide that feature.")
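The fail-fast check could be as simple as the following sketch. Backend, requireFeatures, MissingFeatureError and the feature-name strings are hypothetical; the error text mirrors the examples above.

```python
class MissingFeatureError(Exception):
    """Hypothetical error raised at startup, before the application does any work."""


class Backend:
    """Hypothetical description of a backend and the optional features it offers."""

    def __init__(self, name, features):
        self.name = name
        self.features = frozenset(features)


FEATURE_DESCRIPTIONS = {
    "xattrs": "extended filesystem attributes",
    "symlinks": "symbolic links",
}


def requireFeatures(appName, backend, required):
    """Check every required feature up front and fail with a readable message."""
    for feature in required:
        if feature not in backend.features:
            raise MissingFeatureError(
                "%s requires %s, and your backend, %s, does not provide that feature."
                % (appName, FEATURE_DESCRIPTIONS[feature], backend.name))
```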
The nice thing about this is that the default interface to the backend
would be the one that masked everything but the most common subset of
filesystem features, so that you couldn't *accidentally* depend on a
feature that wasn't present everywhere, without specifically requesting
it. In order to get more obscure features you'd have to specify a
longer list of interfaces.
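One way to get that default is a proxy that only forwards the common subset plus whatever the application explicitly asks for. This is a sketch under the assumption that an interface can be treated as a set of method names; RestrictedNode, COMMON, SYMLINKS, XATTRS and LocalNode are hypothetical.

```python
# Hypothetical method-name sets standing in for real interface declarations.
COMMON = frozenset(["children", "child", "getMetadata"])
SYMLINKS = frozenset(["readLink", "createSymlink"])
XATTRS = frozenset(["getExtendedAttribute", "setExtendedAttribute"])


class RestrictedNode:
    """Proxy exposing only the common subset plus explicitly requested interfaces."""

    def __init__(self, node, *extraInterfaces):
        self._node = node
        allowed = set(COMMON)
        for interface in extraInterfaces:
            allowed |= interface
        self._allowed = frozenset(allowed)

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        if name in self._allowed:
            return getattr(self._node, name)
        raise AttributeError(
            "%r is not in the requested interfaces; ask for it explicitly" % (name,))


class LocalNode:
    """Hypothetical full-featured node (e.g. on a POSIX filesystem)."""

    def children(self):
        return ["a", "b"]

    def child(self, name):
        return name

    def getMetadata(self):
        return {}

    def readLink(self):
        return "target"
```

With this, RestrictedNode(LocalNode()) has no readLink at all, while RestrictedNode(LocalNode(), SYMLINKS) does, so depending on symlinks is always a visible, deliberate choice.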
>> * we still need to decide whether path resolution should be moved to
>>a separate interface, instead of being part of the node's interface.
>I'm not 100% sure what this means? Does this relate to possibly
>combining with FilePath?
The tongue-in-cheek name that radix gave to this interface was
'filepath.pathdelta'. It's related to filepath in the sense that
FilePath, ZipPath, et al. could benefit from using the same interface
to talk about relative pathnames rather than manipulating lists of
strings. One can, after all, abstractly do operations like "child()"
and "parent()" without knowing a lot about the base implementation of
the filesystem in question.
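A rough sketch of what such a relative-path object might look like. PathDelta and applyTo() are hypothetical names; the only assumption about the base is that it has child() and parent() methods, which FilePath does provide. ToyPath stands in for a real path object so the example runs without Twisted.

```python
class PathDelta:
    """Hypothetical relative path: a tuple of segments, no filesystem required."""

    def __init__(self, segments=()):
        self.segments = tuple(segments)

    def child(self, name):
        if "/" in name or name in (".", ".."):
            raise ValueError("invalid path segment: %r" % (name,))
        return PathDelta(self.segments + (name,))

    def parent(self):
        if self.segments and self.segments[-1] != "..":
            return PathDelta(self.segments[:-1])
        # Going above the starting point is recorded as an explicit "..".
        return PathDelta(self.segments + ("..",))

    def applyTo(self, base):
        """Resolve against any object offering child() and parent()."""
        for segment in self.segments:
            base = base.parent() if segment == ".." else base.child(segment)
        return base

    def __eq__(self, other):
        return isinstance(other, PathDelta) and self.segments == other.segments

    def __hash__(self):
        return hash(self.segments)

    def __repr__(self):
        return "PathDelta(%r)" % (self.segments,)


class ToyPath:
    """Minimal stand-in for FilePath/ZipPath: just child() and parent()."""

    def __init__(self, parts):
        self.parts = tuple(parts)

    def child(self, name):
        return ToyPath(self.parts + (name,))

    def parent(self):
        return ToyPath(self.parts[:-1])
```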
>> * there's concern over the package name. twisted.tree has
>>considerable support :)
>I kind of like that. I'm not sure what the concern is with 'vfs'.
"twisted.vfs" sounds incredibly boring and unpronounceable. It would be
the first twisted.<acronym> package, and it's not really related to any
other technology ambiguously named "vfs".
However, this reminds me about another concern which I did not remember
to raise while Andy was here. Should this really be twisted.<anything>
at all? I'd like twisted <x> "dot products" to generally be an
application which does something <x>-ish. I'm aware that not every
package follows this rule, but the ones that don't are either (A)
unmaintained and slated for removal, or (B) part of the core, not
independent subprojects, as "vfs" seems slated to be.
Put a different way: what should 'twistd tree' do? My suggestion would
be a simple multi-protocol file server: HTTP, FTP (although probably
disabled by default), SFTP, maybe a "native" protocol for providing
a generalized backend for any Twisted application that uses the 'tree'
API, so that we can write a proxy that exposes every arbitrary
combination of features from the protocols it's talking to.
If everyone agrees with this, then great. However, if we never intend
for this to go beyond providing an API that other systems hook into,
maybe it should go somewhere subordinate to another project.
To be clear: I don't mind doing a release that does not include this
tool; I don't think anything should block on it. I just want it to be
in the cards eventually if this is the way we're going to release it.
>>I'll try and make these changes in the next week or so. If you are
>>interested in shaping how this goes, you can track what's going on in
>>http://twistedmatrix.com/trac/ticket/2815 - just weigh in once the
>>ticket goes back to review.
>Here's some random stuff that I wanted to at least mention:
>- Error translation. This should translate the exception types, but it
>should also translate values, so the error contains the virtual path.
This sounds like a specific enough thing that you could file a ticket
that described the exact behavior that you wanted. It doesn't sound
contentious at all to me, so unless you think there's some hidden
confusion there... go ahead?
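For concreteness, a sketch of the behaviour as I understand the request: map the OS-level exception to a VFS exception type, and rewrite the value so that it carries the virtual path instead of the real one. All class and function names here are hypothetical, and only a few errno values are mapped.

```python
import errno
import os


class VFSError(Exception):
    """Hypothetical base class; carries the *virtual* path, never the real one."""

    def __init__(self, virtualPath, message):
        super().__init__("%s: %s" % (virtualPath, message))
        self.virtualPath = virtualPath


class NotFoundError(VFSError):
    pass


class PermissionDeniedError(VFSError):
    pass


_ERRNO_MAP = {
    errno.ENOENT: NotFoundError,
    errno.EACCES: PermissionDeniedError,
    errno.EPERM: PermissionDeniedError,
}


def translateOSError(error, virtualPath):
    """Translate both the type and the value of an OSError."""
    errorClass = _ERRNO_MAP.get(error.errno, VFSError)
    return errorClass(virtualPath, error.strerror or "unknown error")


def openVirtual(root, virtualPath):
    """Hypothetical backend call: open a file below root by its virtual path."""
    realPath = os.path.join(root, virtualPath.lstrip("/"))
    try:
        return open(realPath, "rb")
    except OSError as e:
        raise translateOSError(e, virtualPath) from None
```

Using `raise ... from None` keeps the original OSError, and with it the real path, out of the printed traceback.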
>- Deferreds. You don't mention them at all, but the lack of
>asynchronous interfaces was one of the biggest problems we had with
I believe that the consensus on asynchronicity is that all of the
synchronous stuff should be FilePath's job. In the glorious future of
twisted.tree, everything will be async. As discussed above, this
doesn't always mean Deferreds, it also means producers and consumers.
One thing we didn't talk about in person: handling extremely large
directories. We had spoken about children() returning a Deferred of a
list; I think it would be nice if it actually had a producer/consumer
API of its own. Maybe this is too much of a corner case to worry about
in average applications (i.e. we could provide a give-me-a-deferred
convenience API) but it would be nice if it were *possible* to implement
things that were efficient against really big networked directories.
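A sketch of a pull-style producer for directory listings, with the convenience API layered on top. DirectoryProducer, collectChildren and the consumer's write()/finish() methods are hypothetical; a Twisted version would return a Deferred from the convenience call and use the real producer interfaces. The entries argument can be any iterator, e.g. one that lazily pages through a remote listing.

```python
from itertools import islice


class DirectoryProducer:
    """Hypothetical pull-style producer delivering directory entries in batches."""

    def __init__(self, entries, consumer, batchSize=100):
        self._entries = iter(entries)
        self._consumer = consumer
        self._batchSize = batchSize
        self.done = False

    def resumeProducing(self):
        """Deliver at most one batch; the consumer decides when to ask again."""
        if self.done:
            return
        batch = list(islice(self._entries, self._batchSize))
        if batch:
            self._consumer.write(batch)
        if len(batch) < self._batchSize:
            # A short (or empty) batch means the listing is exhausted.
            self.done = True
            self._consumer.finish()


class _Collector:
    """Consumer used by the convenience API: just accumulates everything."""

    def __init__(self):
        self.entries = []
        self.finished = False

    def write(self, batch):
        self.entries.extend(batch)

    def finish(self):
        self.finished = True


def collectChildren(entries, batchSize=100):
    """Convenience wrapper for average applications: drain into one list."""
    collector = _Collector()
    producer = DirectoryProducer(entries, collector, batchSize)
    while not producer.done:
        producer.resumeProducing()
    return collector.entries
```

Applications that don't care get a plain list from collectChildren; ones facing a huge directory can drive resumeProducing themselves and never hold the whole listing in memory.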
>- URL Escaping. I got bitten by this recently. It's obviously not a
>general VFS problem, but it's an issue with enough of them that it
>should be considered when defining interfaces.
I *think* that this can be dealt with fairly easily, in a generic way,
by having a clearly-defined set of string escaping rules for each
protocol. It's a general VFS issue in
the sense that there are escaping issues with "/" on regular
filesystems, after all. Or at least, there are error-reporting issues
with characters like "/", ";", and ":" on certain FSes.
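A sketch of per-protocol escaping rules. The protocol keys and the choice to report (rather than silently rewrite) characters a POSIX filename cannot contain are assumptions made for illustration; the URL case uses the standard library's urllib.parse.quote with no characters marked safe, so "/" and ";" inside a single name get percent-encoded.

```python
from urllib.parse import quote, unquote


class InvalidSegmentError(ValueError):
    """Hypothetical error for a name that a backend cannot represent."""


def _escapeURL(name):
    # safe="" means even "/" is percent-encoded, so one name stays one segment.
    return quote(name, safe="")


def _escapePOSIX(name):
    # POSIX filenames may not contain "/" or NUL; report it clearly instead of guessing.
    for bad in ("/", "\0"):
        if bad in name:
            raise InvalidSegmentError(
                "%r cannot appear in a POSIX filename: %r" % (bad, name))
    return name


SEGMENT_ESCAPERS = {"http": _escapeURL, "posix": _escapePOSIX}


def encodeSegment(protocol, name):
    """Escape a single path segment according to the protocol's rules."""
    return SEGMENT_ESCAPERS[protocol](name)


def decodeSegment(protocol, segment):
    """Inverse of encodeSegment for the protocols that actually escape."""
    return unquote(segment) if protocol == "http" else segment
```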
>- "Decorators" like "read-only" and "chroot" could prove useful. Is
>there room in the design for such things?
We did discuss having things like this. Specifically, we talked a lot
during the metadata discussion about the possibility of 'decorators'
like "provide-xattrs-with-dotfiles" and "provide-atime-by-pretending-it's-zero".
However, we didn't spend too long on it because every
alternative that got brought up sounded like it was pretty amenable to
a simple delegation approach; there just wasn't a lot of meat there.
We'll have to check to make sure that is true in the review process, of
course, but this is probably the thing I'm least worried about :).
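For what it's worth, here is a sketch of why delegation seems sufficient for "read-only" and "chroot". The backend API (read/write by path) and all class names are hypothetical; each decorator overrides only what it changes and forwards everything else.

```python
import posixpath


class DictBackend:
    """Toy backend storing file contents in a dict keyed by absolute path."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class Delegate:
    """Forwards every attribute it doesn't override to the wrapped backend."""

    def __init__(self, original):
        self.original = original

    def __getattr__(self, name):
        return getattr(self.original, name)


class ReadOnly(Delegate):
    def write(self, path, data):
        raise PermissionError("read-only filesystem: %s" % (path,))


class Chroot(Delegate):
    def __init__(self, original, root):
        super().__init__(original)
        self.root = root

    def _translate(self, path):
        # normpath collapses ".." against "/", so a client cannot climb above root.
        virtual = posixpath.normpath(posixpath.join("/", path))
        return posixpath.join(self.root, virtual.lstrip("/"))

    def read(self, path):
        return self.original.read(self._translate(path))

    def write(self, path, data):
        self.original.write(self._translate(path), data)
```

Decorators compose by nesting, e.g. ReadOnly(Chroot(backend, "/home/alice")).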