wiki:DocumentationAnalysis/UsingProcesses/AndrewBennetts

Version 1 (modified by trac, 8 years ago) (diff)

--

Review details

  • Link to document: Using Processes
  • Reviewer's name: Andrew Bennetts
  • Review date: 17 January 2006

Document expectations

Intended user

Before someone begins using this document, I'd really really like it if they knew:

  • What processes, argv, process exit codes, environment variables, stdin, stderr and stdout are (i.e. the basics of what processes in a modern OS are).
  • It would be nice if they knew how to spawn processes in Python, and read their output, e.g. with the subprocess module, although strictly speaking this isn't really relevant to doing this in Twisted.
  • Of the security risks in using untrusted data from the network to spawn processes, or as input for a process.
  • The typical pitfalls in communicating with subprocesses over pipes: e.g. deadlocks due to buffering. (although Twisted's async API ameliorates this a little), Python's -u switch, and so on.
  • reactor basics.
  • deferred basics (how to be a client of APIs that return them, perhaps not necessarily how to create them).

Outcomes

Once someone has read this document they should understand:

  • how to spawn processes in Twisted:
    • that the fundamental API to use is reactor.spawnProcess (which is defined in IReactorProcess), and how to use it.
      • that this involves writing a ProcessProtocol, quite similar to writing a Protocol for a TCP socket, and how to do so.
    • that the standard library functions for process spawning are not (unfortunately) safe to use in Twisted, due to signal handling woes on POSIX platforms -- os.popen is a strict no-no.
    • that there are helper functions in twisted.internet.util such as getProcessOutput that take care of the common cases for you.
    • how to use IProcess objects, e.g. foo.signalProcess('KILL').
  • that process support on Windows is more-or-less totally broken (except under cygwin) -- help would be appreciated! Hmm, but see http://twistedmatrix.com/bugs/issue591 -- I guess the key message is "make sure the reactor you're using supports IReactorProcess".
  • advanced uses:
    • when and how to use the usePTY=True argument of spawnProcess (POSIX only?)
    • the childFDs argument of spawnProcess (POSIX only)
    • that if the developer doesn't understand usePTY and childFDs, they probably don't need them so can probably just ignore them (this is why I didn't include PTYs and file descriptors in the expected user prerequisite knowledge).
  • that the reactor will process events (data written over pipes, process ending) from spawned processes just as it handles events from sockets (and so just like with network stuff in Twisted, you need to start the event loop with reactor.run() or twistd or similar).

Document review

Coverage of subject matter

This document seems to cover the following subject matter at least acceptably well:

  • the spawnProcess API and its arguments and return value.
  • that env argument of spawnProcess is empty by default because the default is to be secure rather than convenient -- it's good that both the API and the text remind developers to think about security issues.
  • writing a simple ProcessProtocol that uses both stdin and stdout.
  • events that a ProcessProtocol may need to handle.
  • methods that you can call on a ProcessProtocol.
  • using getProcessOutput. Very good example, too.
  • using childFDs

This document seems to be attempting to cover the following subject matter, but its coverage is flawed:

  • "Doing it the Easy Way" (where getProcessOutput is described) is probably what most people need, and is simpler, so it really ought to be up-front instead of the low-level spawnProcess API.
  • the pitfalls of needing to manually ensure argv[0] is the executable and that the enviroment will be empty by default probably show be in a little "warning" note, rather than buried in the text.
  • The WCProcessProtocol example should be downloadable as a single file ready for execution, rather than only being available in several snippets -- the interleaved explanatory text (almost source comments, really) are very good though.
  • The methods of ProcessProtocols are listed with inconsistent use of parens (some .method(args), some .method -- I'd expect the latter to be written .method())
  • the SIGCHLD issue that prevents stdlib functions like commands.getoutput from working properly is only mentioned as an aside buried in the text. It deserves more prominence, it's a serious and unobvious pitfall.
  • the FortuneQuoter example for getProcessOutput has unused imports.
  • the GPGProtocol example is not a complete, executable program. It also violates PEP-8 by putting if childFD == 1: self.plaintext += data on one line instead of two, and not seperating methods with blank lines. The decrypt function it defines is actually the main API of the example, so it deserves a docstring. It is an unusually practical example though, which is excellent.

This document ought to be covering the following subject matter but is not:

  • the usePTY option
  • it should be stated that childFDs is not part of IReactorProcess because it does not necessarily exist on all platforms (it's probably a bit of a wart that usePTY is part of that interface... ah well).

This document is recommending the following things that it shouldn't be recommending:

  • os.kill(self.transport.pid, signal.SIGKILL) is not the right way to kill a process -- self.signalProcess('KILL') is, which is more portable. And quite likely it's more appropriate to use TERM instead of KILL.
  • in the "verbose" example, hard-coding a call to reactor.stop() inside a Protocol (or rather ProcessProtocol) implementation is poor taste. Protocols generally should be mechanism, and policy should be implemented on top of that, not intertwined.
  • also in the verbose example, the status_object variable name should be statusObject to properly conform to the coding standard.

This document could be supplemented by links to these existing pre-requisite ("you should read this first") documents:

  • reactor basics
  • deferred intro

This document could be supplemented by links to these existing follow-up ("now that you know X you can try Y") documents:

This document could be supplemented by as-yet non-existant pre-requisite ("you should read this first") documents on:

This document could be supplemented by links to these as-yet non-existant follow-up ("now that you know X you can try Y") documents:

  • using twisted.internet.stdio

This document could be supplemented by links to the following relevant source in Twisted, for concrete examples of this knowledge as well as deeper understanding:

  • twisted.internet.utils.getProcessOutput
  • twisted.web.cgi
  • doc/core/examples/ptyserv.py

This document could be supplemented by links to other references:

  • W. Richard Steven's Advanced Programming in the UNIX(R) Environment, ISBN: 0201563177.

Style

The following changes to the style of the document would make it easier to read:

Perhaps splitting out the childFDs stuff (the "Mapping File Descriptors" section) into a seperate document. It's less vital knowledge, and necessarily denser and requiring more prerequiste knowledge (e.g. what a "fd" is, some idea of how piping between processes works). It clearly feels like a seperate document tacked on to the rest, which is more-or-less true.

Following the current Twisted Documentation style guide by e.g. declaring prerequisite knowledge.

The "Easy Way" should definitely be mentioned first.

Overall summary

This document is: good, with some minor improvements to recommend