Opened 3 years ago

Closed 6 days ago

#5987 enhancement closed fixed (fixed)

Port twisted.internet.process to Python 3

Reported by: itamar Owned by: hawkowl
Priority: normal Milestone: Python-3.x
Component: core Keywords:
Cc: Branch: branches/tiprocess-py3-5987-5
(diff, github, buildbot, log)
Author: hawkowl Launchpad Bug:

Description

twisted.internet.process should run on Python 3. This includes re-enabling process support in twisted.internet.posixbase.

Attachments (1)

modi_process.patch (2.8 KB) - added by introom 2 years ago.
modified some print statements and exception clauses

Download all attachments as: .zip

Change History (28)

comment:1 Changed 3 years ago by itamar

This may actually not be required for the minimal 3.3 milestone.

comment:2 Changed 2 years ago by exarkun

is this a duplicate of #5968?

comment:3 Changed 2 years ago by itamar

  • Milestone changed from Python 3.3 Minimal to Python-3.x

Changed 2 years ago by introom

modified some print statements and exception clauses

comment:4 Changed 6 weeks ago by hawkowl

  • Author set to hawkowl
  • Branch set to branches/tiprocess-py3-5987

(In [44120]) Branching to tiprocess-py3-5987.

comment:5 Changed 6 weeks ago by hawkowl

This will also be ready for review when #7805 lands.

comment:6 Changed 6 weeks ago by hawkowl

  • Branch changed from branches/tiprocess-py3-5987 to branches/tiprocess-py3-5987-2

(In [44252]) Branching to tiprocess-py3-5987-2.

comment:7 Changed 6 weeks ago by hawkowl

  • Keywords review added

I'm reasonably happy with this.

The test_process files are free of warnings, and make sure to pass EVERYTHING as bytes, which is why there might be some weird juggling around.

Please review. Builders spun. Look green so far.

comment:8 Changed 6 weeks ago by adiroiban

Thanks for working on this

twistedchecker is red.

  1. in twisted/internet/process.py:429 stderr.write(msg.encode("ascii"))

On Python3 msg is Unciode right... and then converting unicode to ascii will break in case there are non-ascii characters. Why not encode to utf-8?

  1. For twisted/internet/test/process_cli.py and twisted/internet/test/process_connectionlost.py I was expecting a comment describing how they are used by the test suite and what should the test. I found it easier to have them near the test code rather than as a separate file.
  1. In twisted/internet/test/test_gireactor.py regarding # Requires twisted.python.modules, why not let the code break and wait for #7804 to be merged... oherwise we might forget to uncomment it.
  1. Is it ok to just use str.decode() without an explicit encoding?
  1. in twisted/test/process_tester.py maybe we need a comment describing why we can not reuse the _PY3 from compat and need to duplicate this code.

6.Is it ok to convert iterators to list in py27 just to please py3? I would prefer to keep using iterators in Py27, where possible. Ex iteritems(env) instead of env.items() ... and xrange()

Other than the comment above it looks good.

I am leaving this in the review queue so that others can look at it.

I think that point 3 is a blocker for merging these changes. I will focus on reviewing it.

Thanks!

comment:9 Changed 5 weeks ago by adiroiban

  • Keywords review removed
  • Owner set to hawkowl

I think that #7804 is ready to merge and we can remove the reference from this branch.

Please see previous comments and re-submit for review.

Thanks!

comment:10 Changed 4 weeks ago by hawkowl

  • Branch changed from branches/tiprocess-py3-5987-2 to branches/tiprocess-py3-5987-3

(In [44350]) Branching to tiprocess-py3-5987-3.

comment:11 Changed 4 weeks ago by hawkowl

  • Keywords review added
  • Owner hawkowl deleted

Thanks for the review, Adi.

  • I think that's a good idea. Done.
  • I'm not sure how to do this in a nice way. Probably out of the scope of the ticket, as it'd have to be done everywhere that uses an external process_* file, not just the new ones.
  • Merged forward, made those tests pass.
  • Did an explicit decoding.
  • Comment added
  • Good idea. Changed in a few spots.

Builders spun, all look good, please review.

comment:12 Changed 4 weeks ago by adiroiban

  • Keywords review removed
  • Owner set to hawkowl

Thanks for the changes!

I find it hard to associated your comments with my initial review commends.

Based on the comments from #6535 I am -1 on the changes which moved the python data into files stored on disk.

What does 'us' 'exe' variable names should mean?

For

-    for process in reapProcessHandlers.values():
+    for process in list(reapProcessHandlers.values()):

-        for signalnum in range(1, signal.NSIG):
+        for signalnum in xrange(1, signal.NSIG):

-                        for fd in range(3):
+
+                        for fd in xrange(3):

-        for childFD, parentFD in helpers.items():
+        for childFD, parentFD in items(helpers):

-        destList = fdmap.values()
-        for fd in _listOpenFDs():
+        destList = list(fdmap.values())
+        for fd in list(_listOpenFDs()):

-                    for c, p in fdmap.items():
+                    for c, p in items(fdmap):

and many more ... but I stopped doing stupid copy/paste from diff

Why do we need these conversions?

def _bytesEnviron():

and

def _concatenate(bObj, offset, bArray):

need docstring ... and the method is defined in a test, why we need to mark it private ?

it looks like we are slowly reinventing six, why not depend in six?

I think that is ok with bytesPath in spawnProcess but I hope that soon we can update the spanProcess to access path as Unicode, at least for Windows/OSX

comment:13 Changed 4 weeks ago by hawkowl

  • Keywords review added
  • Owner hawkowl deleted

Hi Adi, thanks for the review.

  • I'm not sure if that applies to this -- that seems to be data (like, data sets) rather than scripts. I prefer them separate, though.
  • I added comments about the exe, and removed the use of "us" (it wasn't actually required).
  • These conversions mean that both py2/py3 will have either solid lists, or an iterator. .items() on Py3 is an iterator, not a list. items() wraps this in a list for py3, and passes it through in py2.
  • I removed both of those instances. _concatenate was a copy from somewhere else, so rather than have a second, I just imported it.
  • I think that introducing six might be a good idea. All of these functions could just be reexported from twisted.python.compat with the six versions, later.

Builders spun, please review.

comment:14 Changed 3 weeks ago by adiroiban

Thanks for working on this:

There are still some twistedchecker errors

************* Module twisted.python.compat
W9208:432,4:iteritems: Missing docstring
W9208:435,4:items: Missing docstring
************* Module twisted.test.test_process
C0301:2119,0: Line too long (98/79)
C0301:620,0: Line too long (109/79)

I am leaving this for others to review as I am -1 for forcing iterators into lists.

Thanks!

comment:15 Changed 13 days ago by hynek

  • Keywords review removed
  • Owner set to hawkowl
  1. I don’t think we need items? Usually we want to force an iterator but I can’t think of a reason to force a list conversion. In some cases (e.g. twisted.internet.base. ReactorBase_checkProcessArgs) iteritems got changed into items which doesn’t make sense to me? I generally consider the usage of non-iterators a bug. I will not point them out in the review any further.
  2. Lines 620 and 2119 are indeed too long, please split them.
  3. why is for i in xrange(CONCURRENT_PROCESS_TEST_COUNT): switched to range?
  4. reapAllProcesses: why is reapProcessHandlers.values() changed into a list?
  5. _fork: the encode to utf-8 seems a bit adventurous; is there any precedent on this? By UNIX standards, we should respect LC_CTYPE here but I’m not sure how we handle that elsewhere.
  6. Process: the empty line between the docstring and debug is not necessary.
  7. Writing bytes to stderr on Python 3 is a bug since they are “text files”. So e.g. in _setupChild, the changes should be IMHO reverted. Native strings are the way to go.
  8. same method, the list()ing seems superfluous and wrong. If we just iterate over it, an iterator is the much better choice.
  9. same method, childlist.sort() is icky and requires list()ing. for child in sorted(fdmap.keys()): instead?
  10. the loop in process_cli.py needs an explanation on what it’s doing and why.
  11. test_process.py: import twisted shouldn’t be glued to from … imports.
  12. the free lines after the imports do not conform with our standards
  13. since exe is a global variable, it should have a more descriptive name.
  14. test_systemCallUninterruptedByChildExit why are we decoding here a string that gets formatted into a native string? Shouldn’t the format string be unicode too then?
  15. ProcessTestsBuilder: do we need the b in sibling(b'process_helper.py') now that FilePath is unicode aware?
  16. makeSourceFile since all lines touched are anyway…with open() maybe? :)
  17. twisted/test/process_cmdline.py shouldn’t it import print_statement from future?
  18. dist3: shouldn’t twisted/internet/_baseprocess.py, be added too? internet.base and posixbase are still missing parts I guess?

That’s all I can find right now. Please address all points either in code or prose and re-submit for review. Thank you!

comment:16 Changed 13 days ago by hynek

oh and # write the error to.] in test_stderr in test_process.py looks like a typo.

comment:17 Changed 13 days ago by hawkowl

  • Branch changed from branches/tiprocess-py3-5987-3 to branches/tiprocess-py3-5987-4

(In [44504]) Branching to tiprocess-py3-5987-4.

comment:18 Changed 11 days ago by hawkowl

  • Keywords review added
  • Owner hawkowl deleted

Hi Hynek, thanks for the review.

  1. Some things mutate underneath us. :( I've gone through and tested where I changed them, and left the ones as list() that actually need to be cast to a list.
  2. Fixed.
  3. Fixed.
  4. See #1.
  5. As per 7, changing this to write it in text mode.
  6. Fixed.
  7. Fixed.
  8. Fixed.
  9. Fixed.
  10. Fixed?
  11. Fixed
  12. Fixed
  13. Made it better
  14. Fixed
  15. Yes, as that produces a bytes path :)
  16. Done
  17. Done
  18. Done.

Builders spun, please review.

comment:19 Changed 9 days ago by adiroiban

Just a small comment. I think that in the case where a list is required, the change should come with a comment explaining/warning that the list is changed during the iteration.

I don't know what is the Twisted policy regarding this, but I think that iterating over a list which is mutated should be avoided... and when required it should come with a big warning.

Thanks for the changes!

comment:20 Changed 9 days ago by hynek

  • Keywords review removed
  • Owner set to hawkowl
  1. I’m having second thoughts about 7: I just realized we don’t use sys.stderr but open it ourselves. TBH, I don’t know what the correct approach here would be…I fear bytes was the right approach? *sorry*
  2. there’s really no point on introducing the childlist variable, is there?
  3. I don’t think the list in for process in list(reapProcessHandlers.values()): is necessary. If it is, please add an comment why that is so. (basically what Adi said)
  4. umm, what does the t in with open(script, 'wt') as scriptFile: ’s mode actually mean? It that a typo? Or does it explicitly enforce the default text mode? If so, it doesn’t appear to be documented?
  5. You didn’t fix 2 according to the buildbot.

We’re getting close but at least 1 and 4 need to be addressed; 2, 3, and 5 would be nice.

As for 1, I guess we should ask glyph/jp? If they OK the current state, feel free to merge. If 1 needs substantial changes, please resubmit for review.

Thanks!

comment:21 Changed 8 days ago by glyph

Regarding sys.stderr and encoding... standard I/O is another tricky mess, like path names.

When you're writing to a console, you're displaying text to a user, so that "I/O stream" is text. But of course the kernel interface is bytes, which have to be encoded in a certain way. A way which is a secret to you, but may be hinted at by some environment variables which you may or may not have. Except on Windows, where the kernel interface is actually text. Except that interface may or may not actually be exposed to you in Python.

But when you're writing to an I/O stream that may be redirected to a file, you really are writing bytes.

SO!

The correct thing to do is to use the brand-new twisted.python.compat.ioType, to ask what type of stream it is, before you start doing stuff with it. There's no unambiguous way to represent output; it's not as simple as "native string", since the distinction is not "py2 vs. py3" but "whatever random mangling that various libraries have done to the sys.std* objects before you get them".

I seem to recall that sometimes StandardIO would be a TextWrapper and sometimes it would be a direct BufferedWriter... but I can't reproduce this at the moment. Maybe it was just a result of some code actually setting the attribute, but that can and does happen, and _setupChild should be prepared to handle it.

Does that adequately resolve the question?

comment:22 Changed 8 days ago by hynek

Interesting, the problem is that in this code we open stderr ourselves (os.fdopen(2, 'w') in twisted/internet/process.py). Maybe we should use one of those fancy mentioned APIs instead?

comment:23 Changed 8 days ago by hawkowl

  • Branch changed from branches/tiprocess-py3-5987-4 to branches/tiprocess-py3-5987-5

(In [44554]) Branching to tiprocess-py3-5987-5.

comment:24 Changed 8 days ago by hawkowl

  • Keywords review added
  • Owner hawkowl deleted
  1. The docs say that stderr/etc is always a text file -- so for the ones we don't open, I think that's safe to assume (because otherwise nothing would work?) and ones we do open, we open, encode and write bytes. I reverted part of the changes to line up with this.
  2. Fixed.
  3. Added a comment.
  4. It is documented :) https://docs.python.org/3.4/library/functions.html#open
  5. I think the buildbot broke from being blocked up. I manually checked and they are fine, hopefully the buildbot picks it up.

Builders spun, please review.

comment:25 Changed 6 days ago by glyph

Yes, you're absolutely right. the fdopen'd stderr is clearly isolated from sys.stderr and should remain so; the idea in the child environment is to make as minimal changes as necessary to report errors, and to work mainly with objects that we made ourselves. We open it as bytes and we write bytes, and then we consume those bytes from the parent process as bytes. Seems like it should work fine to me.

comment:26 Changed 6 days ago by hynek

  • Keywords review removed
  • Owner set to hawkowl

4: "t" is not documented in https://docs.python.org/2.7/library/functions.html#open but in io.open it is so I guess it’s okay…

5: is not fine. Is it possible you’re mixing up twisted/test/test_process.py and twisted/internet/test/test_process.py?

E.g. https://github.com/twisted/twisted/compare/trunk...tiprocess-py3-5987-5#diff-2602810bbaa6ba2fee88219fa4042993R1046 is 82 chars.


Please really fix 5, then merge after the twistedchecker buildbot is green.

comment:27 Changed 6 days ago by hawkowl

  • Resolution set to fixed
  • Status changed from new to closed

(In [44579]) Merge tiprocess-py3-5987-5: Port twisted.internet.process to Python 3

Author: hawkowl
Reviewers: adiroiban, hynek
Fixes: #5987

Note: See TracTickets for help on using tickets.