johnaherne at rocs.co.uk
Mon Aug 10 02:56:20 EDT 2009
Prompted by exarkun, I have put together some simple documentation for
beginners starting with Twisted.
The points are made up of things I wish I had known from the start as a
beginner.
It still needs lots of work. The layout needs to change, there are
duplications and the points need to be made more succinctly.
But before I spend more time on honing the document, I thought it would be a
good idea to get some feedback.
The information in the document may already exist and I have just overlooked
it.
People may feel it is not appropriate for the documentation in Twisted and
should go elsewhere.
Some of the information needs to be checked for accuracy and to avoid
Anyway, I would appreciate any feedback, positive or negative.
Here is the text:
Basic Information for anyone starting with Twisted
If you don't know much about TCP, then bear this in mind.
TCP is a stream of data. Once the connection is open, it stays open until
it is closed.
It does not have a beginning or an end. It does not know about your
messages.
You cannot wait until it sends your message, since it will not tell you that
the message has been sent. If you want to know if your message reached the
other end, you need to have in place a protocol for each end to respond that
it received the message.
You will then need to implement a timeout for when there are problems,
otherwise you may wait a long time for a response. TCP will wait forever. But
routers may time you out after 2 - 30 minutes if there is no traffic on the
connection.
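As a rough sketch of the acknowledge-and-timeout idea, here is a version using plain blocking sockets from the standard library rather than Twisted. The name wait_for_ack and the "ACK" reply are made up for illustration; they are not part of any library. Under Twisted the same idea would be done with a timed call (for example reactor.callLater) instead of a blocking wait.

```python
import socket
import time

def wait_for_ack(sock, timeout):
    """Return True if the peer sends an ACK line within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    buffer = b""
    # Keep reading until a full CR/LF terminated line has arrived.
    while b"\r\n" not in buffer:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(1024)
        except socket.timeout:
            return False
        if not chunk:  # the peer closed the connection
            return False
        buffer += chunk
    return buffer.startswith(b"ACK")

# A connected pair of sockets stands in for the two ends of the application.
near, far = socket.socketpair()
near.sendall(b"hello\r\n")
acked_without_reply = wait_for_ack(near, 0.1)  # nobody has answered yet
far.sendall(b"ACK\r\n")
acked_with_reply = wait_for_ack(near, 0.1)
```

The sending side only treats the message as delivered when the other end says so, and gives up after a time limit you choose instead of waiting on TCP.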
TCP is just a stream of data. You have to process the stream of data looking
for a marker your application has placed that signifies the end of a message
to the other end of the application program.
Twisted provides the LineReceiver class and its sendLine method to help in the
case of using CR/LF as a terminator of messages, especially for chat-type
protocols and HTTP.
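To make the "look for a marker" point concrete, here is a minimal standard-library sketch of the buffering that a line-based protocol has to do. It is not Twisted's implementation; split_lines is a hypothetical helper written for this example.

```python
def split_lines(buffer, delimiter=b"\r\n"):
    """Split complete messages off the front of `buffer`.

    Returns (messages, remainder), where remainder is a partial
    message that must wait for more data.
    """
    *messages, remainder = buffer.split(delimiter)
    return messages, remainder

buffer = b""
received = []
# The chunks arrive with arbitrary boundaries, as they would from TCP.
for chunk in (b"HEL", b"LO\r\nWOR", b"LD\r\n", b"partial"):
    buffer += chunk
    messages, buffer = split_lines(buffer)
    received.extend(messages)
```

Two complete messages come out even though no chunk lined up with a message boundary, and the unfinished text stays in the buffer until its terminator arrives.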
The reactor and the select call will process the outgoing and incoming
buffers without blocking.
The reactor uses the select call (or, depending on the reactor you choose, a
similar call such as poll or epoll). Each time the reactor cycles around, it
will use select to check the read and write buffers to see if any buffer is
ready to read or write. It will process those that are ready, ignoring any
not yet ready.
Anyone familiar with networking and select will probably already understand
this. Anyone not familiar will not realise it and needs to become familiar
with it.
If you want to know how Twisted processes network traffic, you should read up
on the select call.
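For anyone who has not met select before, this is roughly what one cycle looks like using only the standard library. It is a sketch of the idea, not of the reactor's actual code; poll_once is a name invented for this example.

```python
import select
import socket

def poll_once(readers, timeout=0.1):
    """One pass of a select loop: read only from sockets that are ready."""
    ready, _, _ = select.select(readers, [], [], timeout)
    return {sock: sock.recv(4096) for sock in ready}

# Two connections; only the first one has data waiting.
a_send, a_recv = socket.socketpair()
b_send, b_recv = socket.socketpair()
a_send.sendall(b"ping")

results = poll_once([a_recv, b_recv])
```

The socket with nothing to read is simply skipped on this pass, so the loop never sits waiting on one idle connection.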
TWISTED - DIRECT SEND DATA CALLS
For simple network activity you do not need to use deferreds. They are not
necessary. And you can get a lot done without deferreds just by using the
transport.write or sendLine methods. This is shown in the simple Chat
Provided you are dealing in small amounts of data you will not block the
reactor.
If you are sending megabytes of data in a file, that is a different
matter.
Using sendLine directly is faster than using a deferred.
John Goerzen in his Apress book Foundations of Python Network Programming has
a chat server example.
WHAT IS BLOCKING CODE
Blocking code is code that will block, or may potentially block, the continued
execution of the main reactor thread. Think for the most part of long-running
operations: file or network I/O, CPU-intensive calculations, and operations
that may time out, such as a call to another process or host machine.
Database operations are a usual culprit, since the database may be flooded
with work or have crashed. The examples go on, but they are mainly about I/O
and CPU-intensive operations.
When these things happen on the reactor / main thread they block the server
from doing anything else: it can't accept new connections, it can't do
anything until this blocking activity has completed and returned control to
the reactor.
WHAT ARE DEFERREDS
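By and large they seem very similar to callbacks. They aren't, but seem to
perform the same sort of function: a Deferred is an object standing for a
result that is not available yet, and you attach callbacks and errbacks to it.
Please refer to other documentation on deferreds for a more detailed
explanation.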
As everyone hears interminably on the Twisted list, deferreds do not make
blocking code non-blocking. We all try it - but you shouldn't.
If you have blocking code, then first think about putting it into
deferToThread, which will run the code in a separate thread. It's not the
only thing you can do, but it is a good start.
deferToThread returns a deferred for this threaded function; add appropriate
callbacks and errbacks to it. The blocking code then runs in its own thread.
You should not call transport.write or sendLine directly from the
thread since this is not thread-safe.
With deferToThread, the return value of your function (or the exception it
raises) is passed to your callback (or errback), which runs in the reactor
thread; send any data from there. If you start the thread yourself, use
reactor.callFromThread to return processing to the reactor thread.
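The hand-off between the two threads can be pictured with a standard-library sketch like the one below. This is not how deferToThread is implemented; it only illustrates the rule that the worker thread produces a result and the main thread is the only one that sends. blocking_lookup and the sent list (standing in for transport.write) are hypothetical.

```python
import queue
import threading

results = queue.Queue()  # thread-safe hand-off back to the main thread
sent = []                # stands in for transport.write; main thread only

def blocking_lookup(key):
    """Hypothetical slow operation; runs in the worker thread."""
    return key.upper()

def worker(key):
    # Never touch `sent` here: report success or failure and stop.
    try:
        results.put(("ok", blocking_lookup(key)))
    except Exception as exc:
        results.put(("error", exc))

thread = threading.Thread(target=worker, args=("hello",))
thread.start()
thread.join()  # a real event loop would keep serving connections instead

# Back in the main thread: this is the only place data is sent.
status, value = results.get()
if status == "ok":
    sent.append(value)
```

The "ok" and "error" branches play the part of the callback and the errback.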
You can handle this without deferToThread by breaking the blocking code up
into smaller pieces. Sometimes you need to transfer a large file to a socket:
instead of trying to send it all at once, send 10KB at a time, yield back to
the reactor and reschedule the next 10KB until finished. This will work, but
it may not be the fastest way and may still block for an unacceptable amount
of time on just 10KB, depending on how heavily taxed the I/O system is at the
time.
Usually deferToThread is just easier to implement.
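Here is a small standard-library sketch of the "send a piece, then give way" approach. The round-robin loop only stands in for the reactor; with Twisted you would reschedule each step with something like reactor.callLater(0, ...) instead. send_in_chunks and run_cooperatively are names invented for this example.

```python
from collections import deque

def send_in_chunks(data, write, chunk_size=10 * 1024):
    """Generator: write one chunk per step, then yield control."""
    for offset in range(0, len(data), chunk_size):
        write(data[offset:offset + chunk_size])
        yield

def run_cooperatively(tasks):
    """Round-robin over generators, standing in for the reactor loop."""
    pending = deque(tasks)
    while pending:
        task = pending.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # this task has finished; drop it
        pending.append(task)

log = []
big = b"x" * (25 * 1024)
small = b"y" * 100
run_cooperatively([
    send_in_chunks(big, lambda chunk: log.append(("big", len(chunk)))),
    send_in_chunks(small, lambda chunk: log.append(("small", len(chunk)))),
])
```

The small transfer gets its turn after the first 10KB of the big one, rather than waiting for the whole file to go out.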
DATABASE PROCESSING TENDS TO BE BLOCKING
The adbapi module seems to be a good example of using deferreds and threads.
The adbapi module returns a deferred it has created, and you add your
callbacks to it. Your callback is then called, in the reactor thread, when
the thread has finished. It does seem like the
for doing deferreds.
The db stuff will normally block, so put it in a thread and use deferreds to
handle the result or failure.
Twisted is meant to avoid the problems of using threads for network
So why are we using threads? It's a way of moving potentially blocking code out
of the way so it avoids hanging the reactor.
THREADS WON'T NECESSARILY PREVENT BLOCKING
A point about the db calls is that they can be very intensive. If you need to
run some db function every 30 secs or 60 secs and the db takes 50% or more of
the time to generate the results, you won't have much time to service any
requests that want to get results. The remote connections will be failing
So then I suppose you should break the code into 2 programs: one that does the
db stuff, the other to handle the remote connections. The db code, when it has
a result, will then connect to the other program and pass across its results.
There may be better ways of doing this of course depending on circumstances.
WHEN TO USE DEFERREDS
If you have a cpu intensive process, then in all probability it will block the
reactor since it will take 100% cpu time while running - whether in the main
thread or in a separate thread. These are not good for running in Twisted.
If you have I/O activity, such as reading lines of text from a disk file,
this seems a good candidate for deferreds.
This is what the adbapi module does. So it seems like a good example to
follow.
As a general rule, it is simplest to use deferreds with threads. This is not
always true so circumstances may indicate a better way of running a
You still need to make sure that the bulk of the time is available for
connections. Otherwise you will start to have failing connections
Using sendLine directly is faster than putting a deferred in between.
BEWARE WHEN USING DEFERREDS IN THREADS
Since deferToThread runs the function you pass to it in a non-reactor
thread, you may not use any non-thread-safe Twisted APIs in the function you
pass to it.
Beware of using shared data when running in the thread such as lists and