[Twisted-Python] twisted.web and MySQLdb
illumen at yahoo.com
Wed Oct 29 11:35:20 EST 2003
Adding more random thoughts below on LAMP sucking, and spitting.
Glyph Lefkowitz wrote:
> Nathan Seven wrote:
>> Hmmmm, I would qualify that - I don't think the
>> filesystem is the place to be handling dynamic data.
> The filesystem is a fine place.
> For example, in the Prevayler persistence model, you just write
> logfiles to disk, and synchronize your state at a checkpoint. For
> highly dynamic applications, especially ones which require failover,
> (You can re-play the transaction log live, after all) this works quite
>> Databases were created *specifically* for this
> I think that databases were specifically designed to store accounting
> information, actually.
>> Sure, storing all your static blobs in your
>> database is a really quick way to grind shit to a
>> halt, but locking and concurrency?
>> If you're doing things properly, and your http server
>> is just serving static objects, then these are
> Databases can be amazingly slow, especially if you have a lot of
> updates to do. (Even a very fast database can be made slow by I/O
> bottlenecks if you are trying to make it remote for scalability
> reasons.) This has an easy solution: you can cache everything! Of
> course, then you need to be able to easily access the cache from all
> of the machines, because it may have been updated. Now you have
> problems with coherency. Then you need to lock the cache, because it
> could have been updated, and then you need to read from it.
> Pretty soon you're talking to your caching server as if it were a
> database. This is _great_ if you are Livejournal:
>> Yeah- through my line of work I deal with a *lot* of
>> different infrastructures. Everything from "Joe's BBQ
>> Sauce Garage" to Amazon. Literally the only
>> organization I can think of that can keep anything
>> coherent with MySQL is Livejournal- and then I believe
>> only because Brad seems to be a cache-god with
>> memcached and such.
> because then you don't have to worry about computation, mutable data,
> etc - you're basically just storing data and then spitting it back
> out, and you don't care if the timestamps are a little off.
> This is the important point about LAMP and Twisted:
> There are applications which can connect to HTTP which are not blogs.
> If you are writing a multiplayer game which wants to support lots of
> concurrent users, you can't afford to spawn a thread and do a database
> request every time a player picks something up.
New versions of Linux have very quick threads. Thread speed also varies
between apps. Small programs (say less than 100 kilobytes) which are
entirely static are a lot faster than multi-megabyte processes which
dynamically load things. Threads and processes can also easily use
multiple CPUs.
Of course there are other reasons people don't like threads.
HTTP for games? If your game is sensitive to latency and you can help
it, avoid HTTP. HTTP gives you much more latency than a db.
So does a centralised server, for that matter. In a two-player game you
can halve your latency by talking directly to each other, assuming you
aren't both behind a non-configurable firewall or a proxy server. HTTP
is good as a backup protocol though, because some fascists don't allow
any access to the internet except through a proxy. Or maybe
you are playing a game in a web browser :)
Note that some dbs have async interfaces (e.g. postgres), so you wouldn't
need a thread.
btw, anyone know if sendfile is in Python (or going to be soon)?
> (Python is quite slow enough already, thanks.) You can't just use a
> cache because the data changes _all the time_, and you have to care
> about it from everywhere that you care about your data. Working with
> your objects directly in memory is close to the only option.
What about Berkeley dbs? I think bsddb3 was 3-4 times slower than a
Python dict in general. That is still very quick, and you don't need to
lock them. You can use transactions. Locking can kill performance.
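To show what "dictionary-like, but on disk" looks like, here is a minimal sketch. It assumes the standard library dbm module as a stand-in for bsddb3 (which is a separate install), and the key names and the roundtrip() helper are made up for illustration. It shows only the dict-style interface, not Berkeley DB transactions.

```python
import dbm
import os
import tempfile


def roundtrip(path):
    """Write two keys to an on-disk db, reopen it, and read them back."""
    # "c" creates the database if it does not exist yet.
    with dbm.open(path, "c") as db:
        # Keys and values are stored as bytes.
        db[b"player:1:gold"] = b"250"
        db[b"player:2:gold"] = b"90"

    # Reopen read-only, as a separate process might after a restart.
    with dbm.open(path, "r") as db:
        return int(db[b"player:1:gold"]), len(db.keys())


with tempfile.TemporaryDirectory() as tmp:
    result = roundtrip(os.path.join(tmp, "game_state"))

print(result)
```

Which backend dbm picks depends on how Python was built, so speed will vary; the point is only that the calling code looks like dict access.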
Quickness really depends on memory and data set size. These are the
approximate speed rankings (fastest first) of Python dictionary-like
things for different data sizes, with say 700MB of memory:
200 MB of key value data - Python dicts, kjDicts, Berkeley db3 on-disk db.
400 MB of key value data - kjDicts (as Python dicts use more memory and
begin swapping), Berkeley db3-4 on disk, Python dicts.
More than 2 gigs of key value data - Berkeley db3-4 on disk, kjDicts
(the kjDict starts swapping before here), Python dicts.
Of course deleting large Python dicts is *really* slow. I used to kill
my python processes with kill -9 so that I didn't have to wait for the
reference counting garbage collecting beast to do lots of free()s on the
dicts' memory. kjDicts and bsddb dbs were faster for deleting.
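If you want to see the teardown cost on your own box, something like the rough sketch below will do. The function name and the item count are arbitrary choices of mine, and the numbers will depend heavily on the Python version, the allocator, and whether you are swapping.

```python
import time


def time_dict_teardown(n_items):
    """Build a dict with n_items entries and time how long freeing it takes."""
    data = {i: str(i) for i in range(n_items)}

    start = time.perf_counter()
    # Dropping the last reference makes CPython free every key and value now.
    del data
    return time.perf_counter() - start


elapsed = time_dict_teardown(200_000)
print("freeing the dict took %.4f seconds" % elapsed)
```

Push n_items up until the dict no longer fits in RAM and the difference gets a lot more dramatic.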
Some people do use RDBMS for large online games. Check out
gamasutra.com for some articles. Seems they have lots of fun
Compressing the hash would be nice. Maybe memory mapped files on a
compressed file system would be quick for this ;)
Distributed hashes would also be nice! Any good distributed hashes for
I think memory is one of the first things that kills Apache
performance (when using preforking), especially when it has massive PHP
compiled in! You can get big speed boosts by using different Apache
configs for different request types (even on the same machine). E.g. set
one up for static files (e.g. images), and one for your bloated PHP.
> If you're writing a real-time financial data system, you do want to
> use a database, but you want to very carefully control your access to
> it. Certainly, you don't want to equate 'web hit' with 'database
> query', as the LAMP model is wont to do.
> Or maybe you're writing an application that has to operate as a
> client-side proxy, and you don't have the leisure of a DBA at every
> desk, so you can't require that an RDBMS gets set up with each
> installation. This might require some hackish workarounds with the
> filesystem that you'd rather not do, but nevertheless, it's better
> than having the user editing pg_hba.conf themselves.
SQLite and Berkeley DB are good easy-to-bundle dbs.
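As a rough sketch of the "no DBA at every desk" case, this uses the sqlite3 module from the Python standard library. The settings table, the helper names and the proxy addresses are all invented for the example, and ":memory:" is used so it runs anywhere; a real install would pass a file path instead.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the embedded database; there is no server to set up."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def put(conn, key, value):
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO settings (key, value) VALUES (?, ?)",
            (key, value),
        )


def get(conn, key, default=None):
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default


conn = open_store()
put(conn, "upstream_proxy", "10.0.0.1:3128")
put(conn, "upstream_proxy", "10.0.0.2:3128")  # overwrites the earlier value
print(get(conn, "upstream_proxy"))
```

The whole database is one file the app creates itself, so nobody has to touch pg_hba.conf.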
One good thing about LAMP though is that lots of servers have it
installed, and it can be quite cheap to use as a platform.
As you say, not everything is a blog. There are too many different
factors for a one size fits all 'this way is best' solution. Besides
everyone knows Twisted rules ;)