Opened 14 years ago

Last modified 14 years ago

#1008 enhancement closed fixed (fixed)

Implement spawnProcess for IOCP reactor

Reported by: justinj Owned by: justinj
Priority: low Milestone:
Component: Keywords: win32
Cc: Pahan, justinj, PenguinOfDoom, zooko Branch:
Author:

Description


Attachments (3)

win32gui.pyd (92.0 KB) - added by justinj 14 years ago.
final_patch2.txt (33.5 KB) - added by justinj 14 years ago.
the_patch3.txt (33.5 KB) - added by justinj 14 years ago.

Download all attachments as: .zip

Change History (33)

comment:1 Changed 14 years ago by justinj

Adding Pahan to Nosy List.  Is there any plan to implement this?  If not, are
there any recommendations for me if I were to start digging through the code to
see if I could tackle it?

comment:2 Changed 14 years ago by justinj

Attached (iocp_spawnProcess.patch) is what I have so far, so I don't lose it.  
It had to be implemented with named pipes since anonymous pipes aren't support 
for async.  I'm not positive if my named pipe code is correct at this point.  
I'm also working out some details on how to call the appropriate methods on 
the Process transport in certain situations.

comment:3 Changed 14 years ago by justinj

Updated patch attached (old one removed) as well as debugging info I'm 
currently using to track down a problem.

comment:4 Changed 14 years ago by justinj

The bug I'm currently getting (see FileMon.log) only happens when one or more 
of the initial three read operations returns CANCELLED status.  The closest 
thing I can find to a description of what that means is at 
http://www.cs.princeton.edu/~appel/smlnj/basis/posix-error.html which says 
cancelled indicates that the associated asynchronous operation was cancelled 
before completion.

comment:5 Changed 14 years ago by justinj

Fixed!  New patch attached.  The only thing remaining is to implement getHost 
and getPeer.

comment:6 Changed 14 years ago by justinj

It was decided that getHost and getPeer aren't needed and don't make sense on 
a process transport.

After discussing on IRC, the task remaining is to wait for the process to end 
before getting the exit status instead of relying on the process having died 
after the 3 handles are closed.  Supposedly calls to GetExitCodeProcess will 
block at certain times if the process.  According to what exarkun thinks he 
remembers, there are 3 states: running, dying, dead.  If GetExitCodeProcess is 
called when the process is dying it can block.  It seems that our only option 
is to have a thread running WaitForMultipleObjects for each set of 64 
processes, which will notify the process transport when a process dies so we 
can safely get the exit status.  Ugly, but it seems there are no alternatives.

comment:7 Changed 14 years ago by justinj

Attached (process.py) is my initial attempt at calling WaitForMultipleObjects 
on 64 process handles per thread.  It hasn't been tested and is dependent on 
win32gui.PeekMessage being added by Mark Hammond as he commented at 
http://mail.python.org/pipermail/python-win32/2005-July/003604.html.

comment:8 Changed 14 years ago by justinj

Updated attached patch.  It passes all tests now.  Dependent on updated 
win32gui.pyd from Mark Hammond, until the next win32all release.

Changed 14 years ago by justinj

Attachment: win32gui.pyd added

comment:9 Changed 14 years ago by justinj

Updated patch.  Now unit test returns deferred, and wfmo threads properly 
close.

comment:10 Changed 14 years ago by justinj

See output.txt for the output I'm receiving when running...

python c:\python24\scripts\trial.py -r iocp 
twisted.test.test_process.ProcessTestCase.testManyProcesses

comment:11 Changed 14 years ago by justinj

I am currently aware of two problems that occur when running the tests.

1) It seems that all64 processes in testManyProcesses get started and run and 
complete successfully, but _check never gets called and the test hangs and 
eventually returns with the following error:

[ERROR]: twisted.test.test_process.ProcessTestCase.testManyProcesses

  File "C:\src\Twisted\twisted\trial\util.py", 
line335, in wait
    r = _wait(d, timeout)
  File "C:\src\Twisted\twisted\trial\util.py", 
line282, in _wait
    raise defer.TimeoutError()
twisted.internet.defer.CancelledError:

2) Sometimes an error occurs because events happen in the wrong order and thus 
the stages list has the wrong contents.  I haven't seen this error in a 
while.  I'd like to resolve #1 first and see if this happens again.

comment:12 Changed 14 years ago by justinj

I think I may have fixed problem #1, and am running into #2 again.  I need to 
attach an updated patch, hopefully later tonight.  My code that closes the 
wfmo thread was in the wrong place.

comment:13 Changed 14 years ago by justinj

Fixed out of order stages problem.  See updated patch.  I still occasionally 
get a lock up / timeout error.

comment:14 Changed 14 years ago by justinj

Updated patch: cleaned up code and removed print statements.  Currently can't 
reproduce lock up problem, but I think it still exists.

comment:15 Changed 14 years ago by justinj

I reproduced the lockup problem and found out what is happening.  To notify 
the wfmo threads that to wait on new handles I call PostThreadMessage.  I 
added some code to track when a message was posted and when it was received 
and found that sometimes messages are sent but not received, resulting in wfmo 
eventually being called on an empty list which hangs.

Per PenguinOfDoom's recommendation I rewrote the code to use a Window to send 
messages using PostMessage instead of PostThreadMessage.  That code is now 
attached, but suffers from the same problem.  I'm not sure why this is 
happening.

comment:16 Changed 14 years ago by justinj

Fixed.  Working patch attached as working_patch_using_win_msgs.txt

comment:17 Changed 14 years ago by justinj

Got code to hang again.  I fixed this though and after extensive testing have 
not been able to get it to hang.  Please see final_patch.txt and commit if 
you're okay with it.

comment:18 Changed 14 years ago by justinj

To be clear, I would like this code to be committed before the next release if 
at all possible.

comment:19 Changed 14 years ago by justinj

Attached final_patch2.txt with some fixes to tests to remove log files.

comment:20 Changed 14 years ago by justinj

That last patch didn't include process.py.  Since process.py is new and I 
forgot to do an svn add of it before doing my diff.

Changed 14 years ago by justinj

Attachment: final_patch2.txt added

comment:21 Changed 14 years ago by justinj

Instead of being lazy I updated the patch to include process.py.

comment:22 Changed 14 years ago by justinj

Yo!  Can you commit for me?  I keep trying to get a hold of you but we don't 
seem to sync up.

Changed 14 years ago by justinj

Attachment: the_patch3.txt added

comment:23 Changed 14 years ago by justinj

Updated patch with some additional test fixes.

comment:24 Changed 14 years ago by justinj

Zooko, have you had time to look at the patch I committed to /branches/justinj?

comment:25 Changed 14 years ago by zooko

No, I haven't, but I still think that I will sometime soon after my current task
at work is complete.

comment:26 Changed 14 years ago by justinj

Okay, please keep me updated.  Otherwise I'll just keep pestering you.  :)  
Thanks.

comment:27 Changed 14 years ago by justinj

Any chance to look at this yet?

comment:28 Changed 14 years ago by justinj

Resolved in r14952.

comment:29 Changed 14 years ago by justinj

Not all tests were run previously due to other bug which is now resolved.  
twisted.tests.test_iutils.testValue made a call to spawnProcess indirectly 
which resulted in an error dialog appearing on win32 build slave.  Reverted 
and will test on own branch.

comment:30 Changed 14 years ago by justinj

Fixed in r15084.
Note: See TracTickets for help on using tickets.