Opened 11 years ago
Last modified 8 years ago
#3004 defect new
FilePath.setContent should do an atomic rename on Windows where possible
Reported by: | Glyph | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | core | Keywords: | |
Cc: | Jean-Paul Calderone, Trent.Nelson, zooko@…, davidsarah | Branch: | |
Author: |
Description
It would really be helpful to have someone who knows a lot of Windows stuff look at this ticket and figure out what the right thing to do is.
Using the python stdlib's default os.rename, the right thing to do would be to prefer ending up in a more-consistent state. Currently setContent
will delete the old file before moving the new file into place. It might also be advisable to have a "repair" method to complete setContent operations which died after the file was written but before it was moved into place. This should be a separate ticket since UNIX can leave extraneous temporary files around too, but whether it needs to differ per platform depends on whether we can use this:
It looks like the MoveFileWithProgress (or MoveFileEx
, or even MoveFileTransacted
on Vista) may provide an atomic-rename facility. If we can invoke these without external dependencies it might be better to forego the use of the os
module in this case.
Change History (14)
comment:1 follow-up: 2 Changed 11 years ago by
Cc: | Jean-Paul Calderone added |
---|
comment:2 follow-up: 3 Changed 11 years ago by
Replying to exarkun:
I don't think there could be a
repair
method which completes the operation. There's no way to know if the temporary file has been fully written or not.
I was referring to the case that you refer to here:
For the particular case where
setContent
dies on Windows between renaming the original file out of the way and deleting it, it's possible to complete the operation, since it is known that all the data is present in that case. This would be a no-op on other platforms though, and if we use some Windows API that supports the rename atomically, then there's no need for it on Windows either.
On other platforms, or in the case where we have an atomic rename on Windows (the APIs I referred to don't clearly describe their error behavior) it would simply delete any temporary files that were created by not finished. The important thing is that you could call this method on a directory without necessarily understanding the steps involved in setContent
, and get something consistent back.
comment:3 Changed 11 years ago by
comment:4 Changed 11 years ago by
Cc: | Trent.Nelson added |
---|
comment:6 Changed 9 years ago by
Owner: | changed from Glyph to PenguinOfDoom |
---|
comment:7 Changed 8 years ago by
Owner: | PenguinOfDoom deleted |
---|
comment:8 Changed 8 years ago by
Cc: | zooko@… added |
---|
comment:9 follow-up: 10 Changed 8 years ago by
Cc: | davidsarah added |
---|---|
Summary: | FilePath.setContent should do something correct on Windows → FilePath.setContent should do an atomic rename on Windows where possible |
As far as I understand, MoveFileEx or MoveFileWithProgress with the MOVEFILE_REPLACE_EXISTING flag does not do an atomic rename, and I don't see any reason to believe that this would be an improvement over what setContent currently does.
MoveFileTransacted with MOVEFILE_REPLACE_EXISTING (and not MOVEFILE_COPY_ALLOWED or MOVEFILE_DELAY_UNTIL_REBOOT) does do an atomic rename on Vista or later, on an NTFS filesystem.
It would also be useful to expose the atomic rename directly as well as using it in setContent.
I don't think it would be a good idea to have FilePath.setContent depend on pywin32; use ctypes for this.
comment:10 Changed 8 years ago by
Replying to davidsarah:
I don't think it would be a good idea to have FilePath.setContent depend on pywin32; use ctypes for this.
If it did use pywin32, it would be an optional dependency, falling back to the current behavior. The advantage of pywin32 is that you get 64-bit support for free, and it's somebody else's job to make sure it doesn't segfault.
Thanks for the info though, and I look forward to your ctypes-using patch :).
comment:11 Changed 8 years ago by
pywin32 is already a fairly significant dependency on Windows. I think we should depend on it if we are going to provide this functionality.
"Atomic update, unless you (or a naive end user) forgot to install a dependency" seems like very unuseful behavior to offer in this API.
comment:12 follow-up: 13 Changed 8 years ago by
For context, we recently succeeded at eliminating pywin32 from the dependencies of Tahoe-LAFS, allowing us to remove instructions to our users to manually install pywin32 if they are on Windows. This is very important to me—it has taken about five years, and we've gradually eliminated every instruction to manually install a dependency other than Python. pywin32 was the last one to go.
This was possible because David-Sarah wrote code using ctypes to do what we previously used pywin32 for. (And, unfortunately, because we changed to use subprocess
instead of getProcessOutput
.)
For Tahoe-LAFS's purposes, I would rather suffer an unreliable setContent
than add a dependency on pywin32. Of course, knowing David-Sarah they would probably write a reliable setContent
for us using ctypes and then we would end up maintaining that in the Tahoe-LAFS source code.
comment:13 follow-up: 14 Changed 8 years ago by
Replying to zooko:
For context, we recently succeeded at eliminating pywin32 from the dependencies of Tahoe-LAFS, allowing us to remove instructions to our users to manually install pywin32 if they are on Windows. This is very important to me—it has taken about five years, and we've gradually eliminated every instruction to manually install a dependency other than Python. pywin32 was the last one to go.
So if pywin32 fixed this bug, your objection would go away?
This was possible because David-Sarah wrote code using ctypes to do what we previously used pywin32 for. (And, unfortunately, because we changed to use
subprocess
instead ofgetProcessOutput
.)
This strikes me as a bad tradeoff. Am I wrong that the choice is writing a bunch of new code vs. setting up an alternative distribution name on PyPI for pywin32 with a metadata pointer at the correct files, and then declaring it as a dependency? Presumably such a fork would motivate the actual pywin32 maintainers to add a step to their release process, or give you the relevant credentials.
But, I wouldn't reject a ctypes-based implementation out of hand; ctypes is always available and the relevant APIs are all system-level things that shouldn't change much, so by all means submit a patch :).
For Tahoe-LAFS's purposes, I would rather suffer an unreliable
setContent
than add a dependency on pywin32. Of course, knowing David-Sarah they would probably write a reliablesetContent
for us using ctypes and then we would end up maintaining that in the Tahoe-LAFS source code.
If you really want to avoid the possibility of a future required dependency on pywin32, we should probably have a Twisted buildbot without it installed. Would you like to supply or maintain one? This may be a little tricky, because Buildbot itself requires parts of Twisted that need pywin32; the Python environments would need to be isolated from each other.
Ideally you could just fix the issue with pywin32's installation though, and skip this whole mess :).
comment:14 Changed 8 years ago by
Replying to glyph:
So if pywin32 fixed this bug, your objection would go away?
I had forgotten that I almost got that working in 2009!
Yes, my objection is all about installation.
I should definitely try to replicate my success from that comment from 2009. I'm not sure what exactly would be required to make it work for us. Ideally doing python setup.py bdist_egg
from a pywin32 source tree on Windows would suffice, but of course we would want that to be automated on a buildbot.
If you really want to avoid the possibility of a future required dependency on pywin32, we should probably have a Twisted buildbot without it installed. Would you like to supply or maintain one? This may be a little tricky, because Buildbot itself requires parts of Twisted that need pywin32; the Python environments would need to be isolated from each other.
Here is a ticket about testing, on the Tahoe-LAFS buildbot, that Tahoe-LAFS doesn't depend on pywin32
Tahoe-LAFS #1334. I wonder if virtualenv
or exocet
would help.
Here is a ticket about testing, on the Twisted buildbot, that Twisted doesn't depend on any packages which aren't declared in a machine-parseable format in the Twisted packaging metadata: #4438 . Currently that ticket is closed as invalid
but I hope it can be re-opened as a valid thing to test for.
I don't think there could be a
repair
method which completes the operation. There's no way to know if the temporary file has been fully written or not. If some additional functionality should be provided, it should be to clean up the temporary file (ie, delete it). However, since it's easy to determine the name of the temporary file, I'm not sure this really needs specific support by a new API.For the particular case where
setContent
dies on Windows between renaming the original file out of the way and deleting it, it's possible to complete the operation, since it is known that all the data is present in that case. This would be a no-op on other platforms though, and if we use some Windows API that supports the rename atomically, then there's no need for it on Windows either.