[Twisted-Python] Documentation

Mon Aug 10 00:56:20 MDT 2009

Prompted by exarkun, I have put together some simple documentation for
beginners starting with Twisted.
The points are made up of things I wish I had known from the start as a
complete beginner.

It still needs lots of work. The layout needs to change, there are
duplications and the points need to be made more succinctly.

But before I spend more time on honing the document, I thought it would be a
good idea to get some feedback.

The information in the document may already exist and I have just overlooked
it.

People may feel it is not appropriate for the documentation in Twisted and
should go elsewhere.

Some of the information needs to be checked for accuracy and to avoid
misleading readers.

Anyway, I would appreciate any feedback, positive or negative.

John Aherne

Here is the text:

Basic Information for anyone starting with Twisted

If you don't know much about TCP, then bear this in mind.

BASIC TCP

TCP is a stream of data. Once the connection is open, it stays open until
closed. The stream has no built-in message boundaries: TCP does not know
about your messages. You cannot wait until it sends your message, since it
will not tell you that the message has been sent. If you want to know
whether your message reached the other end, you need a protocol in which
each end responds to say it received the data. You will then need to
implement a timeout for when there are problems, otherwise you may wait a
long time for a response. TCP itself will wait forever, but intermediate
routers may time you out after 2 to 30 minutes if there is no traffic on
the connection.

TCP is just a stream of data. Your application has to process the stream,
looking for a marker that the sending side has placed there to signify the
end of a message.
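
As a rough illustration of looking for a marker, here is a minimal sketch
in plain Python (no Twisted involved). The class name MessageBuffer and the
choice of CR/LF as the marker are my own assumptions, made only for the
example.

```python
class MessageBuffer:
    """Accumulate raw bytes and hand back complete messages.

    Hypothetical helper for illustration; it is not part of Twisted.
    """

    def __init__(self, marker=b"\r\n"):
        self.marker = marker
        self._pending = b""

    def feed(self, data):
        """Add one chunk from the stream; return the messages it completed."""
        self._pending += data
        # Everything before the last marker is complete. The tail is kept
        # until more data arrives.
        *messages, self._pending = self._pending.split(self.marker)
        return messages
```

Feeding it b"hel" returns nothing; feeding it b"lo\r\nwor" afterwards
returns the single message b"hello" and keeps b"wor" for later. The chunks
arriving from the network need not line up with your messages at all.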

Twisted provides the LineReceiver class and its sendLine method to help in
the common case of using CR/LF as the terminator of messages, especially
for chat-type protocols and HTTP.

The reactor uses select() to process the incoming and outgoing buffers
without blocking.

The classic reactor is built on the select() system call (other reactors
use equivalents such as poll() or epoll()). Each time the reactor cycles
around, it uses select() to check whether any connection is ready to read
or write. It processes those that are ready and ignores any that are not
yet ready.
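
To get a feel for select() outside Twisted, here is a small sketch using
only the standard library. The function name ready_sockets is my own, and a
socket pair stands in for real network connections. It is not how the
reactor is written, only the same idea in miniature.

```python
import select
import socket


def ready_sockets(sockets, timeout=0.0):
    """Return the (readable, writable) subsets of `sockets`.

    select() returns as soon as something is ready, or after `timeout`
    seconds, so the caller never blocks on one idle connection.
    """
    readable, writable, _ = select.select(sockets, sockets, [], timeout)
    return readable, writable


# A connected pair of sockets stands in for two network connections.
left, right = socket.socketpair()
left.sendall(b"ping\r\n")           # now `right` has data waiting

readable, writable = ready_sockets([left, right])
for sock in readable:               # only sockets with data to read
    print(sock.recv(4096))

left.close()
right.close()
```

Only the socket with data waiting shows up as readable, so recv() is never
called on a connection that would make the program wait.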

Anyone familiar with networking and select() will probably already
understand this. Anyone who is not needs to become familiar with how
select() works.

If you want to know how Twisted processes network traffic, you should read
up on select().

TWISTED - DIRECT SEND DATA CALLS

For simple network activity you do not need to use deferreds. You can get
a lot done just by calling transport.write or sendLine directly. The
simple chat server example referred to below shows this.

Provided you are dealing with small amounts of data, you will not block the
reactor. If you are sending megabytes of data from a file, that is a
different matter.

Using sendLine directly is faster than using a deferred.

John Goerzen's Apress book, Foundations of Python Network Programming, has
a very simple chat server example.

WHAT IS BLOCKING CODE

Blocking code is code that blocks, or may potentially block, the continued
execution of the main reactor thread. For the most part, think of
long-running operations, or operations that may turn out to be long
running: file or network I/O, CPU-intensive calculations, and operations
that may time out, such as a remote call to another process or host
machine. Database operations are a usual culprit, since the database may
be flooded with work or may have crashed. The examples go on, but they are
mainly about I/O and CPU-intensive operations.

When these things happen on the reactor (main) thread, they block the
server from doing anything else. It cannot accept new connections or do
any other work until the blocking activity has completed and returned
control to the reactor thread.

WHAT ARE DEFERREDS

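By and large they seem very similar to callbacks. They aren't quite: a
deferred is an object that stands for a result that is not available yet,
and you attach callbacks (and errbacks) to it that run when the result or a
failure arrives. Please refer to other documentation on deferreds for a
more detailed explanation.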

As everyone hears interminably on the Twisted list, deferreds do not make
blocking code non-blocking. We all try it - but you shouldn't.

If you have blocking code, then first think about passing it to
deferToThread, which will run the code in a separate thread. It's not the
only thing you can do, but it is a good start.

deferToThread returns a deferred. Add the appropriate callbacks and
errbacks to it. The blocking code then runs in a thread from the reactor's
thread pool. You should not call transport.write or sendLine directly from
that thread, since they are not thread-safe.

When the function running in the thread returns (or raises an exception),
Twisted fires the deferred's callback (or errback) for you in the reactor
thread. Send any data from there.

You can handle this without deferToThread by breaking the blocking code up
into smaller pieces. For example, if you need to transfer a large file to
a socket, then instead of trying to send it all at once, send 10KB at a
time, yield back to the reactor, and reschedule the next 10KB until
finished. This will work, but it might not be the fastest way, and it may
still block for an unacceptable amount of time on just 10KB, depending on
how heavily taxed the I/O system is at the moment.
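
The chunking idea can be sketched as follows. The function name
send_in_chunks is my own, and `write` and `call_later` are passed in as
parameters so that the example does not depend on a running reactor. With
Twisted they would typically be transport.write and reactor.callLater.

```python
def send_in_chunks(write, data, call_later, chunk_size=10 * 1024):
    """Write `data` one chunk at a time, yielding to the reactor in between.

    `write` and `call_later` are supplied by the caller; this is a sketch,
    not a Twisted API.
    """
    def send_from(offset):
        chunk = data[offset:offset + chunk_size]
        if not chunk:
            return                      # everything has been handed over
        write(chunk)
        # Delay 0 asks for the next chunk on a later pass of the reactor,
        # so other events can be handled in between.
        call_later(0, send_from, offset + chunk_size)

    send_from(0)
```

For simplicity the sketch takes the data as bytes already in memory. With
a real file you would read each chunk from disk inside send_from, which is
where the blocking mentioned above can still creep in.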

Usually deferToThread is just easier to implement.

DATABASE PROCESSING TENDS TO BE BLOCKING

The adbapi module seems to be a good example of using deferreds and
threads. The adbapi module returns a deferred it has created, and you add
your callbacks to it. The query runs in a thread, and your callback is
called with the result when it is ready. It does seem like the exemplar
for doing deferreds.

Database calls will normally block, so put them in a thread and use
deferreds to wait for the result or failure.

THREADS

Twisted is meant to avoid the problems of using threads for network
processing. So why are we using threads? It's a way of moving potentially
blocking code out of the way so that it does not hang the reactor.

THREADS WON'T NECESSARILY PREVENT BLOCKING

A point about the db calls is that they can be very intensive. If you need
to run some db function every 30 or 60 seconds and the db takes 50% or
more of that time to generate the results, you won't have much time to
service any incoming requests that want to get results. The remote
connections will be failing in large numbers.

So then I suppose you should break the code into two programs: one that
does the db stuff, the other to handle the remote connections. When the db
code has a result, it connects to the other program and passes across its
results. There may be better ways of doing this, of course, depending on
circumstances.

WHEN TO USE DEFERREDS

If you have a CPU-intensive process, then in all probability it will block
the reactor, since it will take 100% CPU time while running, whether in
the main thread or in a separate thread. (In CPython only one thread runs
Python code at a time, so a busy thread can still starve the reactor
thread.) These are not good for running in Twisted.

If you have I/O activity, such as reading lines of text from a disk file,
this seems a good candidate for deferreds.

This is what the adbapi module does, so it seems like a good example to
follow.

As a general rule, it is simplest to use deferreds with threads. This is not
always true so circumstances may indicate a better way of running a
deferred.

You still need to make sure that the bulk of the time is available for
handling connections. Otherwise you will start to have failing connections.

Using sendLine directly is faster than putting a deferred in between.

BEWARE WHEN USING DEFERREDS IN THREADS

Since deferToThread runs the function you pass to it in a non-reactor
thread, you must not use any non-thread-safe Twisted APIs in that
function. If the thread does need to trigger one, hand the call over to
the reactor thread with reactor.callFromThread.

Beware of sharing mutable data, such as lists and dictionaries, between
the thread and the rest of the program.
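
If data really must be shared, one common approach is to guard it with a
lock. The class below is a hypothetical sketch using only the standard
library; the name SharedCounts is my own.

```python
import threading


class SharedCounts:
    """A dictionary of counters that several threads may update."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, key):
        # Read-modify-write is not atomic, so hold the lock for all of it.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def snapshot(self):
        # Hand out a copy so callers never iterate the live dictionary.
        with self._lock:
            return dict(self._counts)
```

Often the simpler choice is to share nothing: let the threaded function
return its result and do all the bookkeeping in callbacks, which run in
the reactor thread.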