[Twisted-Python] Bloody Twisted Tree (VFS)

Jonathan Lange jml at mumak.net
Sun Jul 6 19:26:00 EDT 2008


On Mon, Jul 7, 2008 at 1:12 AM,  <glyph at divmod.com> wrote:
>
> On 07:32 am, jml at mumak.net wrote:
>>
>> On Sat, Jul 5, 2008 at 3:27 AM, Andy Gayton <andy at thecablelounge.com>
>> wrote:
>>>
>>> On Mon, Jun 30, 2008 at 9:06 PM, Jonathan Lange <jml at mumak.net> wrote:
>
>>>  * The primitive interface for IO should be producer/consumers,
>>> replacing readChunk, writeChunk.  This interface is primitive enough
>>> to express all other interfaces, while still providing the opportunity
>>> to optimize streaming performance.  The producer/consumer interface
>>> will need to take an offset to allow readChunk and writeChunk to be
>>> implemented.
>>
>> It would be nice to have things so that readChunks and writeChunks
>> (plural) could be implemented, to avoid potato programming.
>
> I don't think this is actually going to be a practical consideration, if I
> correctly understand what you mean.  For one thing, the producer/consumer
> interface is going to be something (very vaguely) like this:
>
>   remoteFile.writeFrom(producer[, offset, length])
>   remoteFile.readInto(consumer[, offset, length])
>
> This means that if you've got a really giant file, the implementation could
> pretty trivially optimize delivering it to you in the most efficient
> possible way, keeping all the relevant buffers full at every opportunity.
>  Given that stream-based I/O is somewhat inherently serial, it's difficult
> to get less potato-y than that.
>
> writeChunks, if I understand it, would be pretty trivially implementable by
> saying
>
>   remoteFile.writeFrom(MultiChunkProducer(chunks))
>
> Mapping 'readChunks' and 'writeChunks' to readv and writev in my head, I'm
> not really sure what a 'readChunks' would actually do, since we copy memory
> every time we sneeze in Python anyway.  We're not going to have preallocated
> buffers to read into.

Well, one theoretical advantage is that it can avoid roundtrips in
cases where the remote file server supports a readv-style operation. I
can't think of any servers that do this at the moment (maybe the bzr
smart server? does http 1.1 allow this?), so maybe it's not an issue.
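
For what it's worth, here is a rough sketch of the MultiChunkProducer idea from the quoted text, as a pull producer. Everything in it is an assumption for illustration: beginProducing and the ListConsumer stand-in are made-up names, and the classes only mimic the shape of Twisted's producer/consumer methods (registerProducer, unregisterProducer, write, resumeProducing, stopProducing) without importing Twisted.

```python
class MultiChunkProducer:
    """Hypothetical pull producer that feeds a fixed list of chunks to a consumer."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self._consumer = None

    def beginProducing(self, consumer):
        # Register as a non-streaming (pull) producer: the consumer calls
        # resumeProducing() each time it wants one more chunk.
        self._consumer = consumer
        consumer.registerProducer(self, False)

    def resumeProducing(self):
        if self._chunks:
            self._consumer.write(self._chunks.pop(0))
        if not self._chunks:
            # Nothing left to deliver, so detach from the consumer.
            self._consumer.unregisterProducer()

    def stopProducing(self):
        # Drop whatever has not been delivered yet.
        self._chunks = []


class ListConsumer:
    """Stand-in consumer that records every chunk written to it."""

    def __init__(self):
        self.received = []
        self._producer = None

    def registerProducer(self, producer, streaming):
        self._producer = producer

    def unregisterProducer(self):
        self._producer = None

    def write(self, data):
        self.received.append(data)

    def drain(self):
        # Keep pulling until the producer unregisters itself.
        while self._producer is not None:
            self._producer.resumeProducing()
```

A writeChunks built this way is still one chunk per pull. Whether a backend could batch the chunks into a single round trip would depend on the server, which is the open question above.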

>> This reminds me, it would be good for VFS to have an exception for
>> "this operation isn't supported" (say with symlinks on fat32) and
>> another exception for "supportable, but not actually implemented yet".
>
> I don't think it's useful to distinguish between these two types of
> exception at *runtime*.  The use-case I can see for distinguishing is
> letting a programmer know that they should figure out something that might
> be tricky to implement and write some wrappers or submit some patches.
>  Perhaps a separate error message, rather than a separate exception type?
>  Do you have a different use-case?
>

The first kind should skip tests, the second kind should fail tests.

> One related thing that we spoke about in person was pushing this negotiation
> of file-system features backwards to the initialization step, so that
> applications which needed unusual filesystem attributes could fail quickly
> with a clear error message if they weren't supported by the underlying
> platform.

This sounds like a good idea, provided that there are still clear
runtime errors and that you can skip the negotiation.

Use cases for this would be a virtual filesystem that's glued together
from other virtual filesystems, each of which has different
capabilities.
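
Something like the following is what I have in mind, as a sketch only. The Backend and CompositeBackend classes, the feature names and requireFeatures are all invented for illustration; the one real assumption is that a glued-together filesystem can only promise the features that every one of its parts provides.

```python
class MissingFeatures(Exception):
    """Raised at start-up when a backend lacks features the application needs."""


class Backend:
    def __init__(self, name, features):
        self.name = name
        self.features = frozenset(features)


class CompositeBackend:
    """Glues several backends together (assumes at least one is given)."""

    def __init__(self, name, backends):
        self.name = name
        # Only features shared by every part are safe to advertise.
        self.features = frozenset.intersection(*(b.features for b in backends))


def requireFeatures(appName, backend, needed):
    # Fail early, with a readable message, instead of at first use.
    missing = sorted(set(needed) - backend.features)
    if missing:
        raise MissingFeatures(
            "%s requires %s, and your backend, %s, does not provide it."
            % (appName, ", ".join(missing), backend.name))
```

An application that wants to skip the negotiation simply never calls requireFeatures, and still gets the ordinary runtime error when it hits an unsupported operation.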

> ("WebDAV requires extended filesystem attributes, and your
> backend, SFTP, does not provide that feature.", "txGnuStow requires symbolic

Actually, some versions of SFTP do provide it. I'm not sure that there
are any implementations though :)

>>>
>>>  * we still need to decide whether path resolution should be moved to
>>> a separate interface, instead of being part of the node's interface.
>
>> I'm not 100% sure what this means? Does this relate to possibly
>> combining with FilePath?
>
> The tongue-in-cheek name that radix gave to this interface was
> 'filepath.pathdelta'.  It's related to filepath in the sense that FilePath,
> ZipPath, et al. could benefit from using the same interface to talk about
> relative pathnames rather than manipulating lists of strings.  One can,
> after all, abstractly do operations like "child()" and "parent()" without
> knowing a lot about the base implementation of the filesystem in question.

Well, the world needs a decent one of these.
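
To make the idea concrete, here is a toy version. PathDelta and its up/segments representation are my own guess at what such an object could look like, not the interface radix or glyph had in mind.

```python
class PathDelta:
    """Hypothetical immutable relative path: levels up, then names down."""

    def __init__(self, up=0, segments=()):
        self.up = up                      # how many parent() steps to take first
        self.segments = tuple(segments)   # then which children to descend into

    def child(self, name):
        if not name:
            raise ValueError("empty path segment")
        return PathDelta(self.up, self.segments + (name,))

    def parent(self):
        if self.segments:
            # Undo the most recent child() step.
            return PathDelta(self.up, self.segments[:-1])
        # Nothing left to undo, so climb one more level.
        return PathDelta(self.up + 1, ())

    def __eq__(self, other):
        return (isinstance(other, PathDelta)
                and (self.up, self.segments) == (other.up, other.segments))

    def __hash__(self):
        return hash((self.up, self.segments))

    def __repr__(self):
        return "PathDelta(up=%r, segments=%r)" % (self.up, self.segments)
```

Nothing here knows about separators or escaping, so each backend could apply the delta to its own kind of node without anyone manipulating lists of strings by hand.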

>>>
>>>  * there's concern over the package name.  twisted.tree has
>>> considerable support :)
>
>> I kind of like that. I'm not sure what the concern is with 'vfs' though.
>
> "twisted.vfs" sounds incredibly boring and unpronounceable.  It would be the
> first twisted.<acronym> package, and it's not really related to any other
> technology ambiguously named "vfs".
>
> However, this reminds me about another concern which I did not remember to
> raise while Andy was here.  Should this really be twisted.<anything> at all?
>  I'd like twisted <x> "dot products" to generally be an application which
> does something <x>-ish.  I'm aware that not every package follows this rule,
> but the ones that don't are either (A) unmaintained and slated for removal,
> or (B) part of the core, not independent subprojects, as "vfs" seems slated
> to be.
>
> Put a different way: what should 'twistd tree' do?  My suggestion would be a
> simple multi-protocol file server: HTTP, FTP (although probably disable that
> by default), SFTP, maybe a "native" protocol for providing a generalized
> backend for any Twisted application that uses the 'tree' API, so that we can
> write a proxy that exposes every arbitrary combination of features from the
> protocols it's talking to.
>
> If everyone agrees with this, then great.  However, if we never intend for
> this to go beyond providing an API that other systems hook into, maybe it
> should go somewhere subordinate to another project; twisted.internet.files
> perhaps?
>

So, this is the thing that *I'm* least worried about.

I think it should just be an API, and that it should be done so that
other Twisted components can depend on it. Beyond that, its package
location is unimportant.

>> Here's some random stuff that I wanted to at least mention:
>>
>> - Error translation. This should translate the exception types, but it
>> should also translate values, so the error contains the virtual path.
>
> This sounds like a specific enough thing that you could file a ticket that
> described the exact behavior that you wanted.  It doesn't sound contentious
> at all to me, so unless you think there's some hidden confusion there... go
> ahead?
>>
>> - Deferreds. You don't mention them at all, but the lack of
>> asynchronous interfaces was one of the biggest problems we had with
>> twisted.vfs.
>
> I believe that the consensus on asynchronicity is that all of the
> synchronous stuff should be FilePath's job.  In the glorious future of
> twisted.tree, everything will be async.  As discussed above, this doesn't
> always mean Deferreds, it also means producers and consumers.
>

Good good.

> One thing we didn't talk about in person: handling extremely large
> directories.  We had spoken about children() returning a Deferred of a list;
> I think it would be nice if it actually had a producer/consumer API of its
> own.  Maybe this is too much of a corner case to worry about in average
> applications (i.e. we could provide a give-me-a-deferred convenience API)
> but it would be nice if it were *possible* to implement things that were
> efficient against really big networked directories.
>>

Yes. This would be very nice.
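
A sketch of how the two layers might fit together, under heavy assumptions: ChildrenProducer, CollectingConsumer and listChildren are invented names, the batch size is arbitrary, and listChildren is synchronous here only so that the example runs without a reactor. A real convenience API would presumably return a Deferred instead.

```python
import itertools


class ChildrenProducer:
    """Hypothetical pull producer that hands out directory entries in batches."""

    def __init__(self, names, batchSize=2):
        self._names = iter(names)   # could be a lazy iterator over a remote listing
        self._batchSize = batchSize
        self._consumer = None

    def beginProducing(self, consumer):
        self._consumer = consumer
        consumer.registerProducer(self, False)

    def resumeProducing(self):
        batch = list(itertools.islice(self._names, self._batchSize))
        if batch:
            self._consumer.write(batch)
        if len(batch) < self._batchSize:
            # A short (or empty) batch means the listing is exhausted.
            self._consumer.unregisterProducer()

    def stopProducing(self):
        self._names = iter(())


class CollectingConsumer:
    """Gathers every batch into one list, for callers who want it all at once."""

    def __init__(self):
        self.names = []
        self.producer = None

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, batch):
        self.names.extend(batch)


def listChildren(producer):
    # Convenience wrapper: pull until the producer is done, return the list.
    consumer = CollectingConsumer()
    producer.beginProducing(consumer)
    while consumer.producer is not None:
        consumer.producer.resumeProducing()
    return consumer.names
```

Applications that only ever see small directories would use the wrapper; something walking a huge networked directory could supply its own consumer and never hold the full list in memory.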

>> - URL Escaping. I got bitten by this recently. It's obviously not a
>> general VFS problem, but it's an issue with enough of them that it
>> should be considered when defining interfaces.
>
> I *think* that this should be pretty easily dealt with in a pretty generic
> way by having a clearly-defined set of string escaping rules depending on
> which protocol you're using.  It's a general VFS issue in the sense that
> there are escaping issues with "/" on regular filesystems, after all.  Or at
> least, there are error-reporting issues with characters like "/", ";", and
> ":" on certain FSes.
>

Good. I just wanted to flag it.
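
For the URL case specifically, the rule that bit me looks roughly like this. The two helper names are made up; urllib.parse.quote is the standard library function, and passing safe="" is what stops a "/" inside a file name from being read as a separator.

```python
from urllib.parse import quote, unquote


def escapeSegmentForURL(segment):
    # safe="" so that "/" inside a name is escaped rather than kept as-is
    return quote(segment, safe="")


def urlPathFromSegments(segments):
    # Escape each segment on its own, then join with real separators.
    return "/" + "/".join(escapeSegmentForURL(s) for s in segments)


def segmentFromURL(escaped):
    # Inverse of escapeSegmentForURL for a single segment.
    return unquote(escaped)
```

Other protocols would need their own pair of functions, which is the "clearly-defined set of string escaping rules" per protocol that glyph describes.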
