Opened 14 years ago
Closed 5 years ago
#2366 defect closed duplicate (duplicate)
FilePath should allow access to all possible paths on Windows
Reported by: | jknight | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | core | Keywords: | unicode windows |
Cc: | teratorn, Josh Wilcox, davidsarah, zooko@…, Harry Bock | Branch: | |
Author: |
Description (last modified by )
FilePath
incorrectly prohibits access to certain files, due to the crummy APIs it's using.
For example, you can actually have a file called "CON" on windows, and it is accessible by quoting the path in various ways (or by accessing it via native Windows APIs), but FilePath
would currently consider that path a security violation, regardless of whether it actually exists.
Additionally, Windows paths may natively be unicode. FilePath
doesn't really pay attention to unicode vs str, so it may be broken with respect to non-ASCII(?) paths. Even if it does work, it needs unit tests to demonstrate this and prevent regressions.
Attachments (1)
Change History (23)
comment:1 Changed 14 years ago by
Description: | modified (diff) |
---|
comment:2 Changed 14 years ago by
comment:3 Changed 14 years ago by
Cc: | teratorn added |
---|
comment:4 Changed 12 years ago by
Description: | modified (diff) |
---|---|
Summary: | Make filepath work on windows → FilePath should allow access to all possible paths on Windows |
comment:5 Changed 12 years ago by
I have created #3454 for the separate issue of setContent
and moveTo
being non-atomic.
comment:6 follow-ups: 7 8 Changed 10 years ago by
Owner: | jknight deleted |
---|
comment:7 Changed 9 years ago by
Replying to <automation>:
Hi all I see that #4736 was closed as a duplicate of this ticket, but it's not clear to me that FilePath handles Unicode in all cases. Does it?
comment:8 Changed 9 years ago by
Cc: | Josh Wilcox added |
---|
Replying to <automation>:
Hi all I see that #4736 was closed as a duplicate of this ticket, but it's not clear to me that FilePath handles Unicode in all cases. Does it?
comment:9 Changed 9 years ago by
Cc: | davidsarah added |
---|---|
Keywords: | unicode windows added |
The Tahoe-LAFS project would like to use FilePath, but we're concerned that lack of Unicode support would cause a regression on Windows relative to our existing code that uses unicode
strings for paths.
comment:10 Changed 9 years ago by
By the way, I don't really agree that it is a good idea to create Windows files with names that are usually prohibited. The Microsoft article on this is pretty clear:
Do not use the following reserved device names for the name of a file: CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9. Also avoid these names followed immediately by an extension; for example, NUL.txt is not recommended. For more information, see Namespaces.
The fact that you can access these files via \\?\
is beside the point; you'll end up with files that can't be accessed by most other applications.
But #4736 was closed as a duplicate of this ticket. Perhaps #4736 should be reopened in order to focus on the Unicode issue.
comment:11 Changed 9 years ago by
Description: | modified (diff) |
---|
comment:12 follow-up: 14 Changed 9 years ago by
#4736 was just a bad ticket. I would have closed it as invalid instead of as a duplicate.
If someone wants to work on the issue of unicode paths separately from the other issues described on this ticket, file a new ticket for it. As James pointed out though, there is some work in the branch associated with this ticket which addresses unicode issues.
comment:13 Changed 9 years ago by
Cc: | zooko@… added |
---|
I created #5203 (FilePath.children() should return FilePath objects with unicodes in them instead of strs) to track a specific feature that we could use. Also to publish a small novel about Tahoe-LAFS and filename handling.
comment:14 follow-up: 15 Changed 9 years ago by
Replying to exarkun:
#4736 was just a bad ticket. I would have closed it as invalid instead of as a duplicate.
If someone wants to work on the issue of unicode paths separately from the other issues described on this ticket, file a new ticket for it. As James pointed out though, there is some work in the branch associated with this ticket which addresses unicode issues.
Er, what branch? I don't see anything linked from the Branch field. Am I looking in the wrong place?
comment:15 Changed 9 years ago by
Branch: | → /branches/win-filepath-2366-2 |
---|
Replying to davidsarah:
Replying to exarkun:
#4736 was just a bad ticket. I would have closed it as invalid instead of as a duplicate.
If someone wants to work on the issue of unicode paths separately from the other issues described on this ticket, file a new ticket for it. As James pointed out though, there is some work in the branch associated with this ticket which addresses unicode issues.
Er, what branch? I don't see anything linked from the Branch field. Am I looking in the wrong place?
Some branches predate the auto-branch-field-updater gizmo so you have to go to http://twistedmatrix.com/trac/browser/branches/ and grovel around for the ticket number. I've updated the branch field on this ticket.
comment:16 Changed 8 years ago by
Branch: | /branches/win-filepath-2366-2 |
---|---|
Cc: | Harry Bock added |
Keywords: | review added |
I've attached a first stab at a patch that resolves this issue, per the advice of Itamar on the mailing list.
Sorry the patch is so large. Most of the changes involve getting rid of the b"" qualifier for byte strings in the unit tests.
- Use os.O_BINARY instead of importing it from the win32 module - supported in 2.6+
- Rename _secureEnoughString to _secureEnoughBytes to reflect usage
- Remove old pre-2.6 exception handling for listdir() on Windows - this no longer applies and can be handled with one except clause
- listdir() is expected to return a list in other modules (trial); map() returns a view on Python 3.3, use list comprehensions instead
- This also applies to globChildren()
- Add eq support
- Update documentation to explain str/bytes behavior and restrictions
- Depending on whether the input pathname string is unicode or bytes, internally cache certain constants as the right type so Python won't raise exceptions for various operators and system calls.
- setContent and temporarySibling now use None as a placeholder for their default extensions, so we can properly infer the correct string type to use without needing to decode/encode
- Change old unittests back to using the native string type, which should continue to work as expected for both Python 2.6+ and 3.3+.
- Add unittests that explicitly deal with unicode vs. bytes handling.
One question I have is regarding the suppression of warnings; I force Python to filter the ntpath DeprecationWarnings on Windows during unit tests. Is this acceptable?
Changed 8 years ago by
Attachment: | twisted.python.filepath-2366-1.patch added |
---|
first stab at a fix for both Python 2.6+/3.3+
comment:17 Changed 8 years ago by
Thanks for your work on this. Just one comment for starters, sorry for not doing a complete review instead.
Most of the changes involve getting rid of the b"" qualifier for byte strings in the unit tests.
I don't think this is the right approach. All strings should either be b"" or u"", both of which are legal in both Python 2.{6,7} and Python 3.3, the three versions of CPython Twisted targets at this time.
Using "" means the test has different meaning depending on which version of Python is being used. Using b"" or u"" means the test has the same meaning no matter what. We should aim to retain the current behavior for byte strings, tested on Python 2.x and Python 3.x. We'll presumably also need new tests that verify the desired behavior for unicode strings has been implemented.
Plan/Python3 tries to explain this idea along with some others.
comment:18 Changed 8 years ago by
Keywords: | review removed |
---|
comment:19 Changed 8 years ago by
Removed from review per the suggestion of Tom Prince to break it up.
comment:20 Changed 6 years ago by
This ticket talks about "native APIs", implying that CreateFile
doesn't have the same restrictions on CON and NUL that C stdio open
does.
I've experimented with it though via pywin32, and it looks like it does.
This documentation suggests this feature can't be disabled, and doesn't give any guidance on quoting the path name - https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx#consoles
comment:22 Changed 5 years ago by
Resolution: | → duplicate |
---|---|
Status: | new → closed |
That's... probably as close as we're going to get. I still have no idea if it's possible to create a file called "CON" :).
This should also include using
MOVE_FILE_REPLACE_EXISTING
to actually makesetContent
(andmoveTo
, of course) atomic.