[Twisted-Python] Re: plus mode was Re: how winnt fileops work and what to do about it

Sat Dec 31 17:48:29 EST 2005

----- Original Message ----- 
From: "Brian Warner" <warner at lothar.com>
To: "Twisted general discussion" <twisted-python at twistedmatrix.com>
Sent: Saturday, December 31, 2005 5:16 PM
Subject: [Twisted-Python] Re: plus mode was Re: how winnt fileops work and what to do about it

> glyph at divmod.com writes:
>
>> It seems like we can work around this more easily than that, considering
>> that flush and seek are available from Twisted; the file object causing
>> problems in the tests is being returned from the open() method of a
>> FilePath object, if I understand it correctly. FilePath could include the
>> workaround far in advance of Python deciding to.
>
> I'm pretty sure that the real problem we're trying to solve here is caused by
> a stuck process keeping a .pyd file open. Indeed, if you look at the
> buildslave's logs, you'll see the exception is as follows:
>
> exceptions.OSError: [Errno 13] Permission denied: 'c:\\buildslave\\win32-win32er\\W32-full2.4-win32er\\Twisted\\twisted\\protocols\\_c_urlarg.pyd'
>
> So changing the way Twisted or its unit tests open a file is just not going
> to help. What matters is the way python (or.. pyrex?) opens a file.
>
> (for context: the buildbot is currently configured to do SVN checkout/updates
> into one directory, then copy the tree into a second directory, then run
> tests on that second directory. This mode='copy' approach uses 'svn update'
> to minimize network bandwidth, but at the expense of doubling the disk usage
> with the extra copy. At the beginning of each build, the buildslave deletes
> the second directory with a function named rmdirRecursive() that bear
> provided, which does a chmod() of any mis-permissioned files before deleting
> them. It was an os.remove() inside this rmdirRecursive which raised the
> exception).

sysinternals.com has utilities equivalent to lsof (handle.exe, or process 
explorer's handle search). this is probably the best way to figure out which 
process is holding the file open.
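
for reference, here is a minimal sketch of a recursive delete in the spirit of 
the rmdirRecursive() described in the quote: clear the read-only bit on each 
file, then remove it. this is not bear's actual code; rmdir_recursive is a 
hypothetical name. note that no amount of chmod() helps when another process 
still has the file open on windows; the os.remove() will fail just as it did 
on the buildslave.

```python
import os
import stat


def rmdir_recursive(path):
    """Hypothetical stand-in for the buildslave's rmdirRecursive()."""
    # Walk bottom-up so each directory is already empty when we rmdir it.
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                # On Windows os.remove() refuses read-only files, so clear
                # the read-only bit first (don't chmod through symlinks).
                os.chmod(full, stat.S_IREAD | stat.S_IWRITE)
            os.remove(full)
        for name in dirs:
            full = os.path.join(root, name)
            if os.path.islink(full):
                os.remove(full)  # unlink the link itself (POSIX behaviour)
            else:
                os.rmdir(full)
    os.rmdir(path)
```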

> I've run into a similar problem in the past, under Solaris, using NFS, where
> a test case spawned off a daemon process which then didn't die when it was
> supposed to, somehow held on to a file (I think solaris won't let you delete
> a file that is being used as the backing store for an executable), and that
> prevented the unlink() from succeeding.

this has to do with how execution works in unices generally. it is *not* a 
lock - there are no mandatory locks - so while the situation is somewhat 
(not very, though) similar wrt effects, it's actually completely different. 
posix semantics dictate that you cannot open a file that is being executed 
for writing, and cannot execute a file that is open for writing; you can, 
however, unlink it, because the inode isn't reaped until its refcount drops 
to 0. this is the case on linux systems. svr4 prohibits the unlink as well; 
this is an svr4 extension to posix. as an interesting piece of trivia to 
chuckle about, the errno for these conditions is ETXTBSY, aka "Text file 
busy" (this is funny because executables are always binary in practice).
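
the unlink-while-open behaviour is easy to see directly. a minimal sketch, 
assuming a posix system such as linux (on windows the os.unlink() call is 
expected to fail, since the file is still open); read_after_unlink is a 
hypothetical name:

```python
import os
import tempfile


def read_after_unlink():
    # Create a file and keep a descriptor to it open.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)  # succeeds on POSIX even though fd is still open
        # The name is gone, but the inode lives on while fd refers to it.
        name_gone = not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)
    finally:
        os.close(fd)  # last reference dropped: the inode can now be reaped
    return name_gone, data
```

on linux, read_after_unlink() returns (True, b"still here").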

> In that environment, I just renamed the top-level directory to something
> unique, spawned off an 'rm -rf' into the background to delete the old
> directory if it was possible, then continued on with the next build. If the
> code had to try too hard to come up with a unique name, it would flag a
> warning that there might be a stuck process somewhere.

this is a valid technique, except when you're dealing with windows ;) as i 
mentioned in another post, renames (regardless of how high up in the tree 
you go) are recursive copy + recursive delete. the delete will fail. 
furthermore, SHFileOperation recursive deletes bail on first error, afair.

> Perhaps we could use something similar here?

no, see above.
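
for the unix side, the rename-aside technique from the quote can be sketched 
roughly as follows. this assumes a posix system with `rm` on PATH and does 
not help on windows, for the reasons given above; retire_directory and its 
warn_after/max_tries parameters are hypothetical.

```python
import os
import subprocess
import warnings


def retire_directory(path, warn_after=10, max_tries=1000):
    """Move `path` aside to a unique name and delete it in the background.

    Hypothetical helper; assumes POSIX rename semantics and `rm` on PATH.
    Returns (new_name, Popen handle of the background `rm -rf`).
    """
    for attempt in range(max_tries):
        candidate = "%s.old.%d" % (path, attempt)
        if os.path.exists(candidate):
            # An earlier background delete hasn't finished (or failed),
            # which may mean a stuck process is holding files in there.
            continue
        os.rename(path, candidate)
        if attempt >= warn_after:
            warnings.warn("%d old build dirs still present; is a stuck "
                          "process holding files open?" % attempt)
        # Fire and forget, so the next build can start immediately.
        return candidate, subprocess.Popen(["rm", "-rf", candidate])
    raise RuntimeError("no free name for %r after %d tries" % (path, max_tries))
```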

> Of course, the real fix would be to find a way to let the testing code kill
> off any stuck processes, but that'll probably be very windows-specific.

on windows, we probably want TerminateProcess() (os.abort() won't do it, 
since it only aborts the calling process) and on *nix os.kill(). however, 
it is probably more interesting to figure out why processes are getting 
stuck ;)
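
if the harness keeps a handle on the children it spawns, python's 
subprocess.Popen already hides the platform split: terminate() sends SIGTERM 
on posix and calls TerminateProcess() on windows, and kill() sends SIGKILL on 
posix (on windows it is an alias for terminate()). a rough sketch, assuming 
python 3.3+ for the timeout argument and that children are started via 
subprocess rather than twisted's own process api; reap and grace are 
hypothetical names.

```python
import subprocess


def reap(proc, grace=5.0):
    """Stop a possibly-stuck child started with subprocess.Popen."""
    try:
        return proc.wait(timeout=grace)  # maybe it exits on its own
    except subprocess.TimeoutExpired:
        pass
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess() on Windows
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; same as terminate() on Windows
        return proc.wait()
```

this only cleans up after the fact; it says nothing about why the child got 
stuck in the first place.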

-p