Opened 8 years ago

Last modified 13 months ago

#2366 defect new

FilePath should allow access to all possible paths on Windows

Reported by: jknight Owned by:
Priority: normal Milestone:
Component: core Keywords: unicode windows
Cc: teratorn, wilcoxjg@…, davidsarah, zooko@…, bock.harryw@… Branch:
Author: Launchpad Bug:

Description (last modified by exarkun)

FilePath incorrectly prohibits access to certain files, due to the crummy APIs it's using.

For example, you can actually have a file called "CON" on windows, and it is accessible by quoting the path in various ways (or by accessing it via native Windows APIs), but FilePath would currently consider that path a security violation, regardless of whether it actually exists.

Additionally, Windows paths may natively be unicode. FilePath doesn't really pay attention to unicode vs str, so it may be broken with respect to non-ASCII(?) paths. Even if it does work, it needs unit tests to demonstrate this and prevent regressions.

Attachments (1)

twisted.python.filepath-2366-1.patch (52.6 KB) - added by hbock 13 months ago.
first stab at a fix for both Python 2.6+/3.3+

Download all attachments as: .zip

Change History (20)

comment:1 Changed 8 years ago by glyph

  • Description modified (diff)

comment:2 Changed 8 years ago by glyph

This should also include using MOVE_FILE_REPLACE_EXISTING to actually make setContent (and moveTo, of course) atomic.

comment:3 Changed 8 years ago by teratorn

  • Cc teratorn added

comment:4 Changed 6 years ago by glyph

  • Description modified (diff)
  • Summary changed from Make filepath work on windows to FilePath should allow access to all possible paths on Windows

comment:5 Changed 6 years ago by glyph

I have created #3454 for the separate issue of setContent and moveTo being non-atomic.

comment:6 follow-ups: Changed 3 years ago by <automation>

  • Owner jknight deleted

comment:7 in reply to: ↑ 6 Changed 3 years ago by Zancas

Replying to <automation>:

Hi all I see that #4736 was closed as a duplicate of this ticket, but it's not clear to me that FilePath handles Unicode in all cases. Does it?

comment:8 in reply to: ↑ 6 Changed 3 years ago by Zancas

  • Cc wilcoxjg@… added

Replying to <automation>:

Hi all I see that #4736 was closed as a duplicate of this ticket, but it's not clear to me that FilePath handles Unicode in all cases. Does it?

comment:9 Changed 3 years ago by davidsarah

  • Cc davidsarah added
  • Keywords unicode windows added

The Tahoe-LAFS project would like to use FilePath, but we're concerned that lack of Unicode support would cause a regression on Windows relative to our existing code that uses unicode strings for paths.

comment:10 Changed 3 years ago by davidsarah

By the way, I don't really agree that it is a good idea to create Windows files with names that are usually prohibited. The Microsoft article on this is pretty clear:

Do not use the following reserved device names for the name of a file:

CON, PRN, AUX, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9. Also avoid these names followed immediately by an extension; for example, NUL.txt is not recommended. For more information, see Namespaces.

The fact that you can access these files via \\?\ is beside the point; you'll end up with files that can't be accessed by most other applications.

But #4736 was closed as a duplicate of this ticket. Perhaps #4736 should be reopened in order to focus on the Unicode issue.

comment:11 Changed 3 years ago by exarkun

  • Description modified (diff)

comment:12 follow-up: Changed 3 years ago by exarkun

#4736 was just a bad ticket. I would have closed it as invalid instead of as a duplicate.

If someone wants to work on the issue of unicode paths separately from the other issues described on this ticket, file a new ticket for it. As James pointed out though, there is some work in the branch associated with this ticket which addresses unicode issues.

comment:13 Changed 3 years ago by zooko

  • Cc zooko@… added

I created #5203 (FilePath.children() should return FilePath objects with unicodes in them instead of strs) to track a specific feature that we could use. Also to publish a small novel about Tahoe-LAFS and filename handling.

comment:14 in reply to: ↑ 12 ; follow-up: Changed 3 years ago by davidsarah

Replying to exarkun:

#4736 was just a bad ticket. I would have closed it as invalid instead of as a duplicate.

If someone wants to work on the issue of unicode paths separately from the other issues described on this ticket, file a new ticket for it. As James pointed out though, there is some work in the branch associated with this ticket which addresses unicode issues.

Er, what branch? I don't see anything linked from the Branch field. Am I looking in the wrong place?

comment:15 in reply to: ↑ 14 Changed 3 years ago by glyph

  • Branch set to /branches/win-filepath-2366-2

Replying to davidsarah:

Replying to exarkun:

#4736 was just a bad ticket. I would have closed it as invalid instead of as a duplicate.

If someone wants to work on the issue of unicode paths separately from the other issues described on this ticket, file a new ticket for it. As James pointed out though, there is some work in the branch associated with this ticket which addresses unicode issues.

Er, what branch? I don't see anything linked from the Branch field. Am I looking in the wrong place?

Some branches predate the auto-branch-field-updater gizmo so you have to go to http://twistedmatrix.com/trac/browser/branches/ and grovel around for the ticket number. I've updated the branch field on this ticket.

comment:16 Changed 13 months ago by hbock

  • Branch /branches/win-filepath-2366-2 deleted
  • Cc bock.harryw@… added
  • Keywords review added

I've attached a first stab at a patch that resolves this issue, per the advice of Itamar on the mailing list.

Sorry the patch is so large. Most of the changes involve getting rid of the b"" qualifier for byte strings in the unit tests.

  • Use os.O_BINARY instead of importing it from the win32 module - supported in 2.6+
  • Rename _secureEnoughString to _secureEnoughBytes to reflect usage
  • Remove old pre-2.6 exception handling for listdir() on Windows - this no longer applies and can be handled with one except clause
  • listdir() is expected to return a list in other modules (trial); map() returns a view on Python 3.3, use list comprehensions instead
    • This also applies to globChildren()
  • Add eq support
  • Update documentation to explain str/bytes behavior and restrictions
  • Depending on whether the input pathname string is unicode or bytes, internally cache certain constants as the right type so Python won't raise exceptions for various operators and system calls.
  • setContent and temporarySibling now use None as a placeholder for their default extensions, so we can properly infer the correct string type to use without needing to decode/encode
  • Change old unittests back to using the native string type, which should continue to work as expected for both Python 2.6+ and 3.3+.
  • Add unittests that explicitly deal with unicode vs. bytes handling.

One question I have is regarding the suppression of warnings; I force Python to filter the ntpath DeprecationWarnings on Windows during unit tests. Is this acceptable?

Changed 13 months ago by hbock

first stab at a fix for both Python 2.6+/3.3+

comment:17 Changed 13 months ago by exarkun

Thanks for your work on this. Just one comment for starters, sorry for not doing a complete review instead.

Most of the changes involve getting rid of the b"" qualifier for byte strings in the unit tests.

I don't think this is the right approach. All strings should either be b"" or u"", both of which are legal in both Python 2.{6,7} and Python 3.3, the three versions of CPython Twisted targets at this time.

Using "" means the test has different meaning depending on which version of Python is being used. Using b"" or u"" means the test has the same meaning no matter what. We should aim to retain the current behavior for byte strings, tested on Python 2.x and Python 3.x. We'll presumably also need new tests that verify the desired behavior for unicode strings has been implemented.

Plan/Python3 tries to explain this idea along with some others.

comment:18 Changed 13 months ago by hbock

  • Keywords review removed

comment:19 Changed 13 months ago by hbock

Removed from review per the suggestion of Tom Prince to break it up.

Note: See TracTickets for help on using tickets.