Opened 3 years ago

Closed 11 months ago

Last modified 11 months ago

#6917 enhancement closed fixed (fixed)

Porting twisted.python.zippath to python3

Reported by: realcr Owned by: hawkowl
Priority: normal Milestone: Python-3.x
Component: core Keywords:
Cc: Branch: branches/zippath-py3-6917-2
branch-diff, diff-cov, branch-cov, buildbot
Author: hawkowl, glyph

Description

zippath.py will support both python2.7 and python3.3 syntax.

Uploaded path contains:

  • Addition of relevant from future statement.
  • Changing implements to implementer, regarding zope.interface.

Attachments (3)

python_zippath_py3.patch (1.1 KB) - added by realcr 3 years ago.
Minor syntax fixes.
port_python_zippath_py3.patch (35.4 KB) - added by cpdean 2 years ago.
converting to python3 and fixing tests
port_python_zippath_py3_comparable_decorator.patch (32.8 KB) - added by cpdean 2 years ago.
removed my own implementation of eq in favor of the old cmp with the comparable decorator for python3 compatibility

Download all attachments as: .zip

Change History (35)

Changed 3 years ago by realcr

Minor syntax fixes.

comment:1 Changed 3 years ago by exarkun

  • Keywords python3 review removed
  • Owner set to realcr

Comments for this are basically the same as for #6910. Thanks.

comment:2 follow-up: Changed 2 years ago by cpdean

I'm looking at this with respect to all the steps porting tests, etc

comment:3 in reply to: ↑ 2 Changed 2 years ago by cpdean

Replying to cpdean:

I'm looking at this with respect to all the steps porting tests, etc

in case i lose my bash history:

python admin/run-python3-tests twisted.python.test.test_zippath

comment:4 Changed 2 years ago by cpdean

  • Owner changed from realcr to cpdean

Changed 2 years ago by cpdean

converting to python3 and fixing tests

comment:5 Changed 2 years ago by cpdean

  • Keywords review added

I think this is ready for review. Also opened a PR, because I'm not exactly sure if patches are still preferred for reviewers.

https://github.com/twisted/twisted/pull/26

comment:6 Changed 2 years ago by itamar

FYI twisted.python.compat.comparable is a class decorator to use for __cmp__ conversion (adding just __eq__ is insufficient, you also need __ne__).

comment:7 Changed 2 years ago by cpdean

  • Keywords review removed

thanks! I'll revert my __eq__ change and use the old __cmp__ implementation with twisted.python.compat.comparable and add some kind of test for notequals.

taking review keyword off for now

Changed 2 years ago by cpdean

removed my own implementation of eq in favor of the old cmp with the comparable decorator for python3 compatibility

comment:8 Changed 2 years ago by cpdean

  • Keywords review added

added @itamar 's suggestion, using the comparable class decorator now. decided against adding the explict != test. ready for review

comment:9 Changed 2 years ago by adiroiban

Please note that I am not a core developer and just learning how to do good reviews.

This patch needs a news file https://twistedmatrix.com/trac/wiki/ReviewProcess#Newsfiles

It is required to attach a .diff file without git prefix ... see https://twistedmatrix.com/trac/wiki/GitMirror#CreatingaGitclone

I know that doing review over Trac ticket is not as easy as PR, but this is the current process.

I tried to review the code but I got lost in the puzzle of encoding / decoding path inside a method.

I think that all input path should be converted to Unicode, do all handling in Unicode and when output is required convert it to request encoding. I see that this patch is doing the reverse... converts everything into bytes as soon as possible and work only with bytes.

Also, I don't think that the test are done with filenames containing non-ASCII characters. I see that self.cmn is based on self.mktemp() which AFAIK only produces names with ASCII characters.

Thanks for the work on this!

comment:10 follow-up: Changed 2 years ago by itamar

I'm not sure I agree that all inputs should be unicode... but there's no good answers, really.

The current patch's use of the filesystem encoding is incorrect: there's no particular reason to think that will be the same encoding used for the filenames *within* a zip file. There's a ZIP extension that indicates UTF-8, which one hopes modern ZIP files use, but old ZIP files can have random encodings and there's no way to know which. From Python docs: "There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write(). WinZip interprets all file names as encoded in CP437, also known as DOS Latin."

The reason this code mostly still works on Python 2 is that Python 2 will automatically convert Unicode to bytes in a lot of places, using the default encoding. This is *usually* ASCII (or a superset) but that is not guaranteed... which means on some platforms the produced zipfiles will be confusingly unreadable.

As a comparison, FilePath currently accepts only bytes on both 2 and 3. The hope is to expand it so it can accept both bytes and unicode.

I suggest using native strings - bytes on python 2 and unicode on python 3. This is wrong, but we can't do any better anyway since Python 3's zipfile module is broken.

comment:11 in reply to: ↑ 10 Changed 2 years ago by cpdean

Replying to itamar:

I'm not sure I agree that all inputs should be unicode... but there's no good answers, really.

Same. When I started on this someone mentioned how paths in Twisted should be bytes, and a superclass test suite, twisted/test/test_path.py, seemed to agree with that. Since self.cmn is <bytes> on py3, I thought it made sense to convert the rest of the input strings to bytes as well.

The current patch's use of the filesystem encoding is incorrect: there's no particular reason to think that will be the same encoding used for the filenames *within* a zip file.

The reason this code mostly still works on Python 2 is that Python 2 will automatically convert Unicode to bytes in a lot of places, using the default encoding. This is *usually* ASCII (or a superset) but that is not guaranteed... which means on some platforms the produced zipfiles will be confusingly unreadable.

Yeah I had to get back to some nonbytestring so that the stdlib zipfile would accept it as a path. I see now that my solution doesn't really fix anything if the encoding isn't what's expected.

As a comparison, FilePath currently accepts only bytes on both 2 and 3. The hope is to expand it so it can accept both bytes and unicode.

I suggest using native strings - bytes on python 2 and unicode on python 3. This is wrong, but we can't do any better anyway since Python 3's zipfile module is broken.

So change the test inputs per version?

I'm not sure how I should move forward to make this good enough.

Thank you for taking the time to take a look at this!

comment:12 Changed 2 years ago by itamar

You can get native strings by simply omitting the b/u prefix from strings, i.e. "hello" instead of b"hello" or u"hello". So tests would test bytes on Python 2 and Unicode on Python 3 automatically.

(Taking a step back, as I've previously mentioned I really don't think this module is on the blocking path for anything more generally useful being ported Python 3; you could just make the support for ZIP in twisted.python.modules disabled by default on Python 3.)

comment:13 Changed 21 months ago by glyph

  • Author set to glyph
  • Branch set to branches/port-python-zippath-py3-6917

(In [43476]) Branching to port-python-zippath-py3-6917.

comment:14 follow-up: Changed 20 months ago by exarkun

  • Owner cpdean deleted

Last submitted by cpdean so probably not being reviewed by cpdean.

comment:15 in reply to: ↑ 14 Changed 20 months ago by cpdean

Replying to exarkun:

Last submitted by cpdean so probably not being reviewed by cpdean.

Hey, sorry -- I lost all steam on this when I started reading up on all the zip problems in py3k and zip path problems in general :(

comment:16 Changed 20 months ago by glyph

No problem cpdean. We had some trouble with the "assign to" field for a while so it was impossible to set correctly, it's not surprising that some tickets still have it wrong. Will you be interested in taking it up again once it gets a review?

comment:17 Changed 20 months ago by cpdean

Well, if not zippath, then at least something. Last time this got reviewed, I think itamar suggested the module just be disabled for python3, but I didn't know where to start on that.

I think I was in the middle of trying to port zippath because I was trying to port twisted.trial.reporter, but then I had to track down the dependencies in its tests to twisted.trial.runner, and then twisted.python.zippath.

I admit I'm kind of new to this, any advice would be appreciated.

comment:18 Changed 18 months ago by glyph

  • Keywords review removed
  • Owner set to cpdean

Hi cpdean, thanks for working on this. Sorry for the long review latency, we're working on it :).

Some points that need to be addressed:

  1. Throughout this patch, the documentation needs to be updated to explain what a "string" is. For example, the first change I see in the diff is a change to test_zippath.zipit, which doesn't have @param documentation explaining what dirname and zfname are. So it's hard for me to comment on whether the .decode is the right call or not. Although this is required everywhere anyway, this is especially important in ZipArchive.__init__: this is where the "path" gets started.
  2. test_zipPathReprEscaping has a conditional in it that appears to be unnecessary. When I deleted it it seemed to pass. How did this get in there, and what was the intent of differing between str and repr on py3?
  3. zippath.ENCODING is a new public API which is undocumented. I think this should probably not be a public API at all, but it needs to be documented anyway. What is it encoding?
  4. Why is it encoding in the tests and ENCODING in the tests? It seems like the tests ought to be importing whatever is defined in the module rather than getting the filesystem encoding again.

Other observations:

  1. sys.getfilesystemencoding is a dubious API. It doesn't really mean anything. In every modern operating system, the "filesystem" is not in a single encoding; every mount point in POSIX can be in its own encoding, and the user's environment separately defines a path-name encoding for programs they've run to encode stuff. On OS X, the encoding is always UTF-8[NFD] and the value of this environment is basically ignored. On Windows, pathnames are unicode to begin with and therefore are not really "encoded" at all.
  2. Speaking of Windows... forcing path names to be one type or another makes it hard to properly support both platforms; on UNIX you need to be able to express bytes to access all paths and on Windows you need to be able to express Unicode.

Design notes:

Paths "inside" a zip file - those which are actually stored as bytes within the zip archive itself - are encoded according to different rules than those "outside" - those which come from the operating system. There's a specification - https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.0.TXT - but apparently Python's zipfile module does not implement it - Python 3's documentation specifically says:

There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write(). WinZip interprets all file names as encoded in CP437, also known as DOS Latin.

In my opinion, we should go with UTF-8, not CP437 (in fact WinZip has been able to store Unicode filenames for many versions but you need to tag the filename within the archive appropriately) as at least it preserves enough information to be corrected in the future.

We don't need to get this 100% correct to port ZipPath to python3, but in the process of sorting out which type is which, we should at least understand and clearly document what the limitations are and how we deal with them.

Thanks for taking on this port; sorry it is a little trickier than it looked at first :). The good news is that the buildbots look very happy and with a bit more documentation and clarity around what types things are supposed to be, we could land this.

comment:19 follow-up: Changed 18 months ago by cpdean

Hey glyph -- thanks for getting back to me ;)

I haven't looked at this in months, but I think I was approaching this the wrong way. I don't like what I did for this, mostly because I was motivated by getting tests to pass instead of understanding how ZipPath is used.

Specifically your observation #2 -- yeah now that I look at the grossness of the tests, I would hate to use anything with an api that fragile, it makes more sense to try to make ZipPath do what needs to be done without putting any extra burden on the user. Still trying to figure out the user, though.

Since Python's zipfile doesn't implement the spec, would it be ZipPath's responsibility to take that up? itamar's suggestion for disabling ZIP support in twisted.python.modules is sounding like a pretty attractive alternative to me, but I'm biased in that I've never used zipimport.

Would that be an acceptable alternative, or do we still want to try to make this work for python3?

comment:20 in reply to: ↑ 19 Changed 18 months ago by glyph

Replying to cpdean:

Hey glyph -- thanks for getting back to me ;)

I haven't looked at this in months, but I think I was approaching this the wrong way. I don't like what I did for this, mostly because I was motivated by getting tests to pass instead of understanding how ZipPath is used.

Passing tests are good, but having a sane implementation is better :-).

Specifically your observation #2 -- yeah now that I look at the grossness of the tests, I would hate to use anything with an api that fragile, it makes more sense to try to make ZipPath do what needs to be done without putting any extra burden on the user. Still trying to figure out the user, though.

Cool. if you'd like to set up a time to chat on IRC or discuss over email I am often available.

Since Python's zipfile doesn't implement the spec, would it be ZipPath's responsibility to take that up? itamar's suggestion for disabling ZIP support in twisted.python.modules is sounding like a pretty attractive alternative to me, but I'm biased in that I've never used zipimport.

I know that itamar loves removing useful functionality from Twisted but I am usually of the opposite opinion :-).

Would that be an acceptable alternative, or do we still want to try to make this work for python3?

I've used zipimport in half a dozen projects or so over the years. It can be very, very helpful in resource-constrained environments. Just recently I've been helping package something that uses zipimport using Twisted: https://github.com/rackerlabs/mimic/pull/103

If anything, I want to add functionality to zippath; particularly (as that pull request shows) a setContent method.

comment:21 follow-up: Changed 18 months ago by exarkun

I know that itamar loves removing useful functionality from Twisted but I am usually of the opposite opinion :-).

Of course, Itamar only suggested that while porting one of the things that depended on twisted.python.zippath delaying the port of the zippath-depending functionlity until zippath had been ported.

I think that was probably twisted.python.modules, required by trial. It would be incredibly useful to the porting effort to have trial working on Python 3 (for example, to make it easier to run the twisted.python.zippath test suite while trying to port that module ;). It might also be useful to have zippath working on Python 3 - but not nearly as useful.

comment:22 Changed 18 months ago by adiroiban

+1 for exarkun

Looking forward for running Python 3 tests using trial

comment:23 in reply to: ↑ 21 Changed 18 months ago by glyph

Replying to exarkun:

I know that itamar loves removing useful functionality from Twisted but I am usually of the opposite opinion :-).

Of course, Itamar only suggested that while porting one of the things that depended on twisted.python.zippath delaying the port of the zippath-depending functionlity until zippath had been ported.

Aah, I should have gone back and read the history.

I think that was probably twisted.python.modules, required by trial. It would be incredibly useful to the porting effort to have trial working on Python 3 (for example, to make it easier to run the twisted.python.zippath test suite while trying to port that module ;). It might also be useful to have zippath working on Python 3 - but not nearly as useful.

If we could speed along the development of Trial, that would probably let us develop the replacement for this and many other issues faster. By all means, porting trial first would be great.

comment:24 Changed 15 months ago by cpdean

Picking this up again, starting from scratch, progress viewable on this PR https://github.com/cpdean/twisted/pull/2/files

I'm not sure how to move forward from here -- zipfile is doing something weird when you give it a path as bytes.

Example test:

python admin/run-python3-tests twisted.python.test.test_zippath.ZipFilePathTests.test_childrenNonexistentError

output:

test_childrenNonexistentError (twisted.python.test.test_zippath.ZipFilePathTests)
test_childrenNonexistentError ... ERROR

======================================================================
ERROR: test_childrenNonexistentError (twisted.python.test.test_zippath.ZipFilePathTests)
test_childrenNonexistentError
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/conrad/dev/foss/twisted/twisted/trial/_synctest.py", line 1298, in _run
    runWithWarningsSuppressed(suppress, method)
  File "/Users/conrad/dev/foss/twisted/twisted/python/util.py", line 1021, in runWithWarningsSuppressed
    return f(*args, **kwargs)
  File "/Users/conrad/dev/foss/twisted/twisted/python/test/test_zippath.py", line 38, in setUp
    zipit(self.cmn, self.cmn + b'.zip')
  File "/Users/conrad/dev/foss/twisted/twisted/python/test/test_zippath.py", line 26, in zipit
    zf.write(fspath, arcpath)
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 1334, in write
    zinfo = ZipInfo(arcname, date_time)
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 321, in __init__
    null_byte = filename.find(chr(0))
TypeError: Type str doesn't support the buffer API

----------------------------------------------------------------------
Ran 1 test in 0.003s

FAILED (errors=1)
Exception ignored in: <bound method ZipFile.__del__ of <zipfile.ZipFile object at 0x101154cf8>>
Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 1457, in __del__
    self.close()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 1468, in close
    pos1 = self.fp.tell()
AttributeError: 'bytes' object has no attribute 'tell'

Poking around with this, filename is b'file1', so there seems to be a type mismatch between the filename and the output of chr(0). See interpreter output:

>>> b'test'.find(chr(0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Type str doesn't support the buffer API
>>> chr(0)
'\x00'
>>> b'test'.find('\x00')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Type str doesn't support the buffer API

Also, bit of a problem, the traceback after the test fails seems to be from a cleanup step, trying to close any open files.

Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 1457, in __del__
    self.close()
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/zipfile.py", line 1468, in close
    pos1 = self.fp.tell()
AttributeError: 'bytes' object has no attribute 'tell'

It's a little concerning that self.fp is a 'bytes' object instead of a file object, it looks like the API for zipfile has changed from python2 to 3.

It's been suggested that this module just be disabled for now so that other parts of twisted can be ported over sooner, like Trial.

I'm not sure how I should be getting around the issues I'm seeing with zipfile, any advice? Otherwise, how can I help disable this module in python3 so that Trial can be ported sooner?

comment:25 Changed 14 months ago by adiroiban

It looks like Python3 implementatin of zippath only support filenames as text/Unicode

This mean that both the ZipArchive and the test code should decode the filename before instantiating the zipfile.ZipFile

ZipPath can continue to receive bytes.

FilePath was updated to allow both bytes or Unicode text.


Regarding self.fp I see that it is not touched by the twisted code and tests... so I don't know why we got that. It might be also due the fact that it only support Unicode paths.


A side note.

Twisted only support Python 2.6 or newer so you can get rid of

    _USE_ZIPFILE = False
    from twisted.python.zipstream import ChunkingZipFile

this can be done in a separate branch and then port a clener version of ZipPath

Cheers!

comment:26 Changed 11 months ago by hawkowl

  • Author changed from glyph to hawkowl, glyph
  • Branch changed from branches/port-python-zippath-py3-6917 to branches/zippath-py3-6917-2

(In [45609]) Branching to zippath-py3-6917-2.

comment:27 Changed 11 months ago by hawkowl

  • Keywords review added
  • Owner cpdean deleted

I took a stab, trying to implement a similar interface to FilePath.

Tests pass, I think it's in a pretty good place, re-enabled support in other places where it uses it.

Please review.

comment:28 Changed 11 months ago by hawkowl

Note -- I removed a repr test, as it did something entirely different than as FilePath does, and there's no reason for it to be more complex.

comment:29 Changed 11 months ago by adiroiban

  • Keywords review removed
  • Owner set to hawkowl

Changes look good.

I see no coverage for sibling, dirname and getsize ... so maybe we need to extend AbstractFilePathTest to validate the code from this branch... even if it is old code... but with the py3 port we can take the opportunity and increase the coverage.

Otherwise feel free to create a follow up ticket to extend the coverage.

Please see my comment and merge... but It would be nice to extend the coverage before merge :)

Thanks!

comment:30 Changed 11 months ago by hawkowl

There's a ticket which will fix this already (#5549). Will merge.

comment:31 Changed 11 months ago by hawkowl

  • Resolution set to fixed
  • Status changed from new to closed

(In [45661]) Merge zippath-py3-6917-2: Port twisted.python.zipfile to Python 3

Author: hawkowl Reviewer: adiroiban Fixes: #6917

comment:32 Changed 11 months ago by cpdean

Nice! It's doubtful i would have been able to complete this.

Thanks for closing this out.

Note: See TracTickets for help on using tickets.