[Twisted-Python] Bloody Twisted Tree (VFS)

glyph at divmod.com
Sun Jul 6 09:12:09 MDT 2008


On 07:32 am, jml at mumak.net wrote:
>On Sat, Jul 5, 2008 at 3:27 AM, Andy Gayton <andy at thecablelounge.com> 
>wrote:
>>On Mon, Jun 30, 2008 at 9:06 PM, Jonathan Lange <jml at mumak.net> wrote:

>>  * The primitive interface for IO should be producer/consumers,
>>replacing readChunk, writeChunk.  This interface is primitive enough
>>to express all other interfaces, while still providing the opportunity
>>to optimize streaming performance.  The producer/consumer interface
>>will need to take an offset to allow readChunk and writeChunk to be
>>implemented.
>
>It would be nice to have things so that readChunks and writeChunks
>(plural) could be implemented, to avoid potato programming.

I don't think this is actually going to be a practical consideration, if 
I correctly understand what you mean.  For one thing, the 
producer/consumer interface is going to be something (very vaguely) like 
this:

    remoteFile.writeFrom(producer[, offset, length])
    remoteFile.readInto(consumer[, offset, length])

This means that if you've got a really giant file, the implementation 
could pretty trivially optimize delivering it to you in the most 
efficient possible way, keeping all the relevant buffers full at every 
opportunity.  Given that stream-based I/O is somewhat inherently serial, 
it's difficult to get less potato-y than that.
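
A minimal in-memory sketch of how readChunk could fall out of readInto 
once it takes an offset and a length.  Only the name readInto comes from 
the proposal above; InMemoryFile, ListConsumer and bufferSize are 
hypothetical, and the delivery here is synchronous purely to keep the 
example short (a real backend would be asynchronous).

```python
class ListConsumer:
    """Hypothetical consumer that just collects whatever is written to it."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


class InMemoryFile:
    """Hypothetical stand-in for remoteFile, backed by a bytes object."""

    def __init__(self, data, bufferSize=4):
        self.data = data
        self.bufferSize = bufferSize

    def readInto(self, consumer, offset=0, length=None):
        # Deliver the requested byte range to the consumer, one buffer
        # at a time, stopping at the end of the range or of the file.
        if length is None:
            end = len(self.data)
        else:
            end = min(len(self.data), offset + length)
        for start in range(offset, end, self.bufferSize):
            consumer.write(self.data[start:min(start + self.bufferSize, end)])

    def readChunk(self, offset, length):
        # The old primitive, expressed in terms of the new one.
        consumer = ListConsumer()
        self.readInto(consumer, offset, length)
        return b"".join(consumer.chunks)
```

The point of the sketch is the direction of the dependency: readChunk is 
a thin convenience on top of readInto, not the other way around.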

writeChunks, if I understand it, would be pretty trivially implementable 
by saying

    remoteFile.writeFrom(MultiChunkProducer(chunks))
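
A rough sketch of what such a MultiChunkProducer might look like.  The 
method names registerProducer, unregisterProducer, write, 
resumeProducing and stopProducing follow the shape of Twisted's 
IConsumer and IPullProducer interfaces, but nothing here imports 
Twisted; beginProducing and InMemoryFile are hypothetical, and the 
synchronous drain loop stands in for a reactor-driven one.

```python
class MultiChunkProducer:
    """Hypothetical pull producer that hands out a fixed sequence of chunks."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.consumer = None

    def beginProducing(self, consumer):
        # Hypothetical hook: attach to the consumer as a pull producer.
        self.consumer = consumer
        consumer.registerProducer(self, False)  # False means "pull"

    def resumeProducing(self):
        # Called by the consumer each time it has room for one more chunk.
        try:
            chunk = next(self._chunks)
        except StopIteration:
            self.consumer.unregisterProducer()
        else:
            self.consumer.write(chunk)

    def stopProducing(self):
        # Drop any chunks that have not been delivered yet.
        self._chunks = iter(())


class InMemoryFile:
    """Hypothetical remoteFile that drains a producer into a bytearray."""

    def __init__(self):
        self.data = bytearray()
        self._producer = None

    def writeFrom(self, producer):
        producer.beginProducing(self)
        # Keep asking for data until the producer unregisters itself.
        while self._producer is not None:
            self._producer.resumeProducing()

    def registerProducer(self, producer, streaming):
        self._producer = producer

    def unregisterProducer(self):
        self._producer = None

    def write(self, chunk):
        self.data += chunk
```

With this shape, writeChunks(chunks) is just 
writeFrom(MultiChunkProducer(chunks)) and needs no interface of its own.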

Mapping 'readChunks' and 'writeChunks' to readv and writev in my head, 
I'm not really sure what a 'readChunks' would actually do, since we copy 
memory every time we sneeze in Python anyway.  We're not going to have 
preallocated buffers to read into.

>>  * we're still using getMetadata and setMetadata - it's likely we want
>>a layer on top of using arbitrary key/value dicts for metadata, but
>>this can be introduced in a backwards compatible way.

Hmm.  I don't remember agreeing to layering anything on top of 
"arbitrary key/value dicts"; I'd really like to see a completely 
different layer that specifically separates out optional features 
(xattrs, symlinks, posix ACLs(?)) into separate interfaces with specific 
methods that don't necessarily need to retrieve all the metadata at 
once, which is sort of an inherent property of having a key/value dict.
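
A small sketch of the "separate interfaces with specific methods" idea, 
using plain abstract base classes rather than zope.interface to stay 
self-contained.  Every name here (IExtendedAttributes, ISymlinks, 
getXattr, setXattr, readLink, XattrNode) is hypothetical.

```python
from abc import ABC, abstractmethod


class IExtendedAttributes(ABC):
    """Hypothetical optional interface covering extended attributes only."""

    @abstractmethod
    def getXattr(self, name):
        """Return the value of a single extended attribute."""

    @abstractmethod
    def setXattr(self, name, value):
        """Set a single extended attribute."""


class ISymlinks(ABC):
    """Hypothetical optional interface covering symbolic links only."""

    @abstractmethod
    def readLink(self):
        """Return the target of this symbolic link."""


class XattrNode(IExtendedAttributes):
    """Toy node that supports xattrs but not symlinks."""

    def __init__(self):
        self._xattrs = {}
        self.lookups = []

    def getXattr(self, name):
        # Only the requested attribute is touched; nothing forces the
        # backend to fetch every piece of metadata at once.
        self.lookups.append(name)
        return self._xattrs[name]

    def setXattr(self, name, value):
        self._xattrs[name] = value
```

A caller can then ask whether a node provides a given feature 
(isinstance(node, ISymlinks) in this toy version) instead of probing a 
dict for keys that may or may not be there.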

I'm OK with "still using getMetadata and setMetadata", though, since as 
you say, it can be introduced in a backwards-compatible way.  I do think 
that we should keep that discussion open (for later, after the rest of 
this work has been completed).

>This reminds me, it would be good for VFS to have an exception for
>"this operation isn't supported" (say with symlinks on fat32) and
>another exception for "supportable, but not actually implemented yet".

I don't think it's useful to distinguish between these two types of 
exception at *runtime*.  The use-case I can see for distinguishing is 
letting a programmer know that they should figure out something that 
might be tricky to implement and write some wrappers or submit some 
patches.  Perhaps a separate error message, rather than a separate 
exception type?  Do you have a different use-case?
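
One way the "separate error message, same exception type" option could 
look.  UnsupportedOperation and its implementable flag are hypothetical 
names chosen for illustration.

```python
class UnsupportedOperation(Exception):
    """Hypothetical single exception type covering both cases."""

    def __init__(self, operation, backend, implementable=False):
        self.operation = operation
        self.backend = backend
        self.implementable = implementable
        if implementable:
            # Aimed at the programmer, not at runtime handling code.
            detail = "not implemented yet; wrappers or patches welcome"
        else:
            detail = "not supported by this backend"
        super().__init__("%s on %s: %s" % (operation, backend, detail))
```

Code that catches the exception handles both cases identically; only the 
human reading the traceback sees the difference.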

One related thing that we spoke about in person was pushing this 
negotiation of file-system features backwards to the initialization 
step, so that applications which needed unusual filesystem attributes 
could fail quickly with a clear error message if they weren't supported 
by the underlying platform. ("WebDAV requires extended filesystem 
attributes, and your backend, SFTP, does not provide that feature.", 
"txGnuStow requires symbolic links, and your backend, the Microsoft 
Windows filesystem, does not provide that feature.")
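
A minimal sketch of that up-front negotiation, assuming features are 
identified by plain strings.  Backend, openTree and MissingFeature are 
hypothetical names; the error text follows the examples above.

```python
class MissingFeature(Exception):
    """Hypothetical error raised when a backend lacks a required feature."""


class Backend:
    """Toy backend that simply advertises a set of feature names."""

    def __init__(self, name, features):
        self.name = name
        self.features = frozenset(features)


def openTree(backend, application, required=()):
    # Check every requested feature before handing the backend out, so
    # that the application fails at startup rather than mid-operation.
    for feature in required:
        if feature not in backend.features:
            raise MissingFeature(
                "%s requires %s, and your backend, %s, does not provide "
                "that feature." % (application, feature, backend.name))
    return backend
```

An application that passes no required features gets the backend without 
any check, which corresponds to asking only for the common subset.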

The nice thing about this is that the default interface to the backend 
would be the one that masked everything but the most common subset of 
filesystem features, so that you couldn't *accidentally* depend on a 
feature that wasn't present everywhere, without specifically requesting 
it.  In order to get more obscure features you'd have to specify a 
longer list of interfaces.

>>  * we still need to decide whether path resolution should be moved to
>>a separate interface, instead of being part of the node's interface.

>I'm not 100% sure what this means? Does this relate to possibly
>combining with FilePath?

The tongue-in-cheek name that radix gave to this interface was 
'filepath.pathdelta'.  It's related to filepath in the sense that 
FilePath, ZipPath, et al. could benefit from using the same interface 
to talk about relative pathnames rather than manipulating lists of 
strings.  One can, after all, abstractly do operations like "child()" 
and "parent()" without knowing a lot about the base implementation of 
the filesystem in question.
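
A toy version of such a relative-path object, to show that child() and 
parent() need no filesystem access at all.  PathDelta, segments and 
applyTo are hypothetical names, and the segment validation is only one 
plausible policy.

```python
class PathDelta:
    """Hypothetical relative path: a tuple of segments, nothing more."""

    def __init__(self, segments=()):
        self.segments = tuple(segments)

    def child(self, name):
        # Reject anything that could escape or alias the current level.
        if name in ("", ".", "..") or "/" in name:
            raise ValueError("invalid path segment: %r" % (name,))
        return PathDelta(self.segments + (name,))

    def parent(self):
        if not self.segments:
            raise ValueError("an empty delta has no parent")
        return PathDelta(self.segments[:-1])

    def applyTo(self, base):
        # Works against anything with a child() method, which is the
        # only thing the delta needs to know about the base object.
        for segment in self.segments:
            base = base.child(segment)
        return base
```

Because applyTo only relies on child(), the same delta could be replayed 
against a FilePath, a ZipPath, or a remote node.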
>>  * there's concern over the package name.  twisted.tree has
>>considerable support :)

>I kind of like that. I'm not sure what the concern is with 'vfs' 
>though.

"twisted.vfs" sounds incredibly boring and unpronounceable.  It would be 
the first twisted.<acronym> package, and it's not really related to any 
other technology ambiguously named "vfs".

However, this reminds me about another concern which I did not remember 
to raise while Andy was here.  Should this really be twisted.<anything> 
at all?  I'd like twisted.<x> "dot products" to generally be an 
application which does something <x>-ish.  I'm aware that not every 
package follows this rule, but the ones that don't are either (A) 
unmaintained and slated for removal, or (B) part of the core, not 
independent subprojects, as "vfs" seems slated to be.

Put a different way: what should 'twistd tree' do?  My suggestion would 
be a simple multi-protocol file server: HTTP, FTP (although probably 
disabled by default), SFTP, and maybe a "native" protocol for providing 
a generalized backend for any Twisted application that uses the 'tree' 
API, so that we can write a proxy that exposes every arbitrary 
combination of features from the protocols it's talking to.

If everyone agrees with this, then great.  However, if we never intend 
for this to go beyond providing an API that other systems hook into, 
maybe it should go somewhere subordinate to another project; 
twisted.internet.files perhaps?

To be clear: I don't mind doing a release that does not include this 
tool; I don't think anything should block on it.  I just want it to be 
in the cards eventually if this is the way we're going to release it.

>>I'll try and make these changes in the next week or so.  If you are
>>interested in shaping how this goes, you can track what's going on in
>>http://twistedmatrix.com/trac/ticket/2815 - just weigh in once the
>>ticket goes back to review.

>Here's some random stuff that I wanted to at least mention:
>
>- Error translation. This should translate the exception types, but it
>should also translate values, so the error contains the virtual path.

This sounds like a specific enough thing that you could file a ticket 
that described the exact behavior that you wanted.  It doesn't sound 
contentious at all to me, so unless you think there's some hidden 
confusion there... go ahead?
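
For concreteness, one possible reading of that request, assuming a 
backend that exports a real directory as the virtual root "/".  VFSError, 
NodeNotFound, AccessDenied and translateError are hypothetical names; 
the errno constants and OSError attributes are standard Python.

```python
import errno


class VFSError(Exception):
    """Hypothetical base error that only ever mentions the virtual path."""

    def __init__(self, virtualPath, reason):
        super().__init__("%s: %s" % (virtualPath, reason))
        self.virtualPath = virtualPath


class NodeNotFound(VFSError):
    pass


class AccessDenied(VFSError):
    pass


_BY_ERRNO = {errno.ENOENT: NodeNotFound, errno.EACCES: AccessDenied}


def translateError(error, realRoot):
    # Translate the type: pick a VFS exception class from the errno.
    errorType = _BY_ERRNO.get(error.errno, VFSError)
    # Translate the value: strip the real root so only the virtual
    # path is exposed to the caller.
    root = realRoot.rstrip("/")
    realPath = error.filename or ""
    if realPath == root or realPath.startswith(root + "/"):
        virtualPath = realPath[len(root):] or "/"
    else:
        virtualPath = "<unknown>"
    return errorType(virtualPath, error.strerror)
```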
>- Deferreds. You don't mention them at all, but the lack of
>asynchronous interfaces was one of the biggest problems we had with
>twisted.vfs.

I believe that the consensus on asynchronicity is that all of the 
synchronous stuff should be FilePath's job.  In the glorious future of 
twisted.tree, everything will be async.  As discussed above, this 
doesn't always mean Deferreds; it also means producers and consumers.

One thing we didn't talk about in person: handling extremely large 
directories.  We had spoken about children() returning a Deferred of a 
list; I think it would be nice if it actually had a producer/consumer 
API of its own.  Maybe this is too much of a corner case to worry about 
in average applications (i.e. we could provide a give-me-a-deferred 
convenience API) but it would be nice if it were *possible* to implement 
things that were efficient against really big networked directories.
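
A sketch of how the two shapes could coexist, with the list-returning 
convenience built on the consumer-based primitive.  Directory, 
childrenInto, finish and both consumer classes are hypothetical; the 
example is synchronous and has no back-pressure, so where this version 
returns a plain list a real children() would return a Deferred.

```python
class CountingConsumer:
    """Hypothetical consumer that counts entries without storing them."""

    def __init__(self):
        self.count = 0
        self.finished = False

    def write(self, name):
        self.count += 1

    def finish(self):
        self.finished = True


class ListCollector:
    """Hypothetical consumer backing the convenience API."""

    def __init__(self):
        self.names = []

    def write(self, name):
        self.names.append(name)

    def finish(self):
        pass


class Directory:
    """Toy directory whose listing is produced lazily."""

    def __init__(self, listing):
        # listing is a callable returning a fresh iterator of names,
        # e.g. a generator paging through a networked directory.
        self._listing = listing

    def childrenInto(self, consumer):
        # Primitive: names are delivered one by one and never
        # accumulated by the directory itself.
        for name in self._listing():
            consumer.write(name)
        consumer.finish()

    def children(self):
        # Convenience: collect everything for the average application.
        collector = ListCollector()
        self.childrenInto(collector)
        return collector.names
```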
>- URL Escaping. I got bitten by this recently. It's obviously not a
>general VFS problem, but it's an issue with enough of them that it
>should be considered when defining interfaces.

I *think* that this can be dealt with fairly easily and generically by 
having a clearly-defined set of string escaping rules 
depending on which protocol you're using.  It's a general VFS issue in 
the sense that there are escaping issues with "/" on regular 
filesystems, after all.  Or at least, there are error-reporting issues 
with characters like "/", ";", and ":" on certain FSes.
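
A minimal sketch of per-protocol escaping rules, assuming one rule 
function per protocol name.  The registry and function names are 
hypothetical; urllib.parse.quote is the standard library call, used 
with safe="" so that "/" is escaped too.

```python
from urllib.parse import quote, unquote


def escapeForURL(segment):
    # Percent-encode everything outside the unreserved set, including
    # "/", so one segment can never be mistaken for two.
    return quote(segment, safe="")


def checkPOSIXSegment(segment):
    # POSIX names cannot be escaped, so the only option is to report
    # the characters that cannot appear in a single path segment.
    if "/" in segment or "\0" in segment:
        raise ValueError("cannot represent %r as a POSIX name" % (segment,))
    return segment


# Hypothetical registry: protocol name -> escaping rule.
ESCAPING_RULES = {"http": escapeForURL, "posix": checkPOSIXSegment}


def escapeSegment(protocol, segment):
    return ESCAPING_RULES[protocol](segment)
```

For example, escapeSegment("http", "a/b;c:d") gives "a%2Fb%3Bc%3Ad", 
while the same segment is rejected for "posix".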
>- "Decorators" like "read-only" and "chroot" could prove useful. Is
>there room in the design for such things?

We did discuss having things like this.  Specifically we talked a lot 
during the metadata discussion about the possibility for 'decorators' 
like "provide-xattrs-with-dotfiles" and 
"provide-atime-by-pretending-its-zero".  However, we didn't spend too 
long on it because every alternative that got brought up sounded like it 
was pretty amenable to a simple delegation approach; there just wasn't a 
lot of meat there. 
We'll have to check to make sure that is true in the review process, of 
course, but this is probably the thing I'm least worried about :).
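
A sketch of what the simple delegation approach could look like for a 
"read-only" decorator.  ReadOnly, ReadOnlyError and the toy Node are 
hypothetical, as is the method name remove; writeFrom, setMetadata and 
getMetadata are the names used earlier in this thread.

```python
class ReadOnlyError(Exception):
    """Hypothetical error for mutation attempts on a read-only node."""


class ReadOnly:
    """Wrap a node, delegating everything except mutating methods."""

    _MUTATORS = frozenset(["writeFrom", "setMetadata", "remove"])

    def __init__(self, node):
        self._node = node

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        if name in self._MUTATORS:
            def refuse(*args, **kwargs):
                raise ReadOnlyError(
                    "%s is not allowed on a read-only node" % (name,))
            return refuse
        return getattr(self._node, name)


class Node:
    """Toy node used only to exercise the wrapper."""

    def __init__(self):
        self.metadata = {"size": 1}

    def getMetadata(self):
        return dict(self.metadata)

    def setMetadata(self, metadata):
        self.metadata.update(metadata)
```

Reads such as ReadOnly(Node()).getMetadata() pass straight through, 
while setMetadata raises ReadOnlyError and leaves the wrapped node 
untouched.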



