[Twisted-web] lightweight CMS

Andrea Arcangeli andrea at cpushare.com
Sun Mar 13 18:51:01 MST 2005


On Mon, Mar 14, 2005 at 09:37:14AM +1100, Christopher Armstrong wrote:
> I think this is a bit of an overstatement. It's not really that big of
> a deal. static.File uses blocking I/O to read files, and I've written
> other code that writes to them as well; if you do it in chunks of
> (say) 1024kb, and don't do something stupid like run it on NFS, then
> it doesn't stall the server and doesn't break the "async model of
> Twisted". I always suggest evaluating files as a storage mechanism
> before moving on to something more complex.

Writing in small chunks does nothing to avoid blocking I/O in the
kernel. 1k chunks are too small anyway; I suggest writing at least 8k
each time to be somewhat more efficient.
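
A minimal sketch of the chunk size point, assuming plain file-like
objects. The copy_in_chunks helper and its name are made up for
illustration and are not part of Twisted. Note that every read() and
write() in it is still a blocking call, so a bigger chunk only cuts the
per-call overhead:

```python
import io

CHUNK_SIZE = 8 * 1024  # 8k per write(), the minimum suggested above


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst in chunk_size pieces and return the bytes copied.

    Hypothetical helper for illustration. Each read() and write() can
    still block, so this does not make the copy asynchronous.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of file
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 20000)
    target = io.BytesIO()
    copied = copy_in_chunks(source, target)
    print("copied", copied, "bytes")
```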

The problem isn't a CPU-bound stall. The problem is that if you have a
fast network and your disk is busy (like during a backup), you might
fill the writeback cache faster than the disk can flush it. At some
point every one of those write(8k) calls (or write(1k) calls, it makes
no difference since the kernel coalesces them together) can take half a
second or so, and during that half second all clients will hang, even
if all but one could still be served from cache. That would break the
async model of Twisted, that's a fact. All the addCallbacks pain you go
through everywhere to keep things async, and to scale to a huge number
of connections without wasting tons of memory on the kernel stacks,
ptes and fds that a thread-per-connection model would require, gets
completely undone by the single sync-writer (or sync-reader) in the
lazy code.
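
To make the stall concrete, here is a rough simulation. It uses the
stdlib asyncio loop as a stand-in for the Twisted reactor, only so that
it runs without Twisted installed. The names ticker, sync_writer and
worst_gap are made up, and time.sleep() plays the role of a write() that
blocks on a full writeback cache:

```python
import asyncio
import time


async def ticker(gaps, ticks=20, interval=0.01):
    """Stand-in for the other clients: record the time between wakeups."""
    last = time.monotonic()
    for _ in range(ticks):
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def sync_writer(stall):
    """Stand-in for the single sync-writer in the lazy code."""
    await asyncio.sleep(0.05)  # let the ticker run a few times first
    time.sleep(stall)          # simulated blocking write(): stalls the loop


def worst_gap(stall=0.3):
    """Return the longest pause the ticker saw, in seconds."""
    async def main():
        gaps = []
        await asyncio.gather(ticker(gaps), sync_writer(stall))
        return max(gaps)
    return asyncio.run(main())


if __name__ == "__main__":
    print("worst pause seen by the other clients: %.2fs" % worst_gap())
```

The ticker asks to wake up every 10ms, but its worst pause comes out at
least as long as the simulated blocking write, because nothing else can
run on the loop while that call is in progress.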

For reads it's the very same issue, except that it's much more likely
to happen in real life if your files don't all fit in cache, for
example if you do what was suggested here and keep the articles in
files instead of in a db.

For writes it will only happen on a big upload over a fast intranet
while the disk is busy with a backup or some other I/O workload, so
it's less likely but still a plausible scenario.

If you just have to implement a tiny app, going lazy is ok, but in
general it should be discouraged.

Note that I use os.listdir myself in a few places, but those places
are hidden behind the rend.Page cache and, most importantly, what they
read is tiny and guaranteed to be cached. The bulk of the data should
be served in an async way if it doesn't fit in cache.

In the long run Twisted should support kernel async I/O (and, even more
importantly, epoll, for different reasons); then you won't have to use a
helper thread to do the bulk I/O. Using an SQL server is currently the
best way to keep everything async, and it normally provides several
other advantages too ;).
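
For the helper thread approach, a minimal sketch, again using the stdlib
asyncio loop as a stand-in so it runs without Twisted. In Twisted itself
the analogous call would be twisted.internet.threads.deferToThread. The
names read_whole_file, read_without_stalling and demo are made up for
illustration:

```python
import asyncio
import os
import tempfile


def read_whole_file(path):
    """Blocking read; meant to run only in the helper thread."""
    with open(path, "rb") as f:
        return f.read()


async def read_without_stalling(path):
    """Hand the blocking read to the loop's default thread pool."""
    loop = asyncio.get_running_loop()
    # The event loop keeps serving other clients while the thread
    # waits on the disk.
    return await loop.run_in_executor(None, read_whole_file, path)


def demo():
    """Write a small temp file, read it back through the helper thread."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"article body")
        path = tmp.name
    try:
        return asyncio.run(read_without_stalling(path))
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The blocking call still happens, but only the helper thread waits on the
disk, so the other connections keep being served in the meantime.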


