[Twisted-Python] twisted.web and MySQLdb

Rene Dudfield illumen at yahoo.com
Wed Oct 29 11:35:20 EST 2003


Hello,

Adding more random thoughts below on LAMP sucking, and spitting.


Glyph Lefkowitz wrote:

> Nathan Seven wrote:
>
>> Hmmmm I would qualify that- I dont think the
>> filesystem is the place to be handling dynamic data. 
>
>
> The filesystem is a fine place.
>
> For example, in the Prevayler persistence model, you just write 
> logfiles to disk, and synchronize your state at a checkpoint.  For 
> highly dynamic applications, especially ones which require failover, 
> (You can re-play the transaction log live, after all) this works quite 
> well.
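
A minimal sketch of the log-and-checkpoint idea described above, in Python. This is not Prevayler's actual API (Prevayler is a Java library); the class name JournaledStore, the file names and the JSON line format are all assumptions made up for illustration. Every change is appended to a journal on disk before it is applied in memory, a checkpoint writes the full state and empties the journal, and startup reloads the checkpoint and replays the journal.

```python
import json
import os
import tempfile


class JournaledStore:
    """Hypothetical in-memory dict whose changes are journaled to disk first."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "journal.log")
        self.state = {}
        self._recover()
        self._log = open(self.log_path, "a", encoding="utf-8")

    def _recover(self):
        # Load the last checkpoint, then replay every journaled change.
        # Replaying a "set" twice is harmless, so a stale journal is safe.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.state = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.state[key] = value

    def set(self, key, value):
        # Write the change to disk first, then apply it in memory.
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self.state[key] = value

    def checkpoint(self):
        # Write the full state to a temp file, swap it in, empty the journal.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.snapshot_path)
        self._log.close()
        self._log = open(self.log_path, "w", encoding="utf-8")

    def close(self):
        self._log.close()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    store = JournaledStore(workdir)
    store.set("gold", 10)
    store.checkpoint()
    store.set("sword", 1)   # only in the journal, not in the snapshot
    store.close()
    print(JournaledStore(workdir).state)
```

Reads never touch the disk in this sketch; only writes pay for the fsync, and a second process could follow the same journal to act as a failover copy.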
>
>> Databases were created *specifically* for this
>> purpose.
>
>
> I think that databases were specifically designed to store accounting 
> information, actually.
>
>> Sure, storing all your static blobs in your
>> database is a really quick way to grind shit to a
>> halt, but locking and concurrency?
>> If you're doing things properly, and your http server
>> is just serving static objects, then these are
>> non-issues.
>
>
> Databases can be amazingly slow, especially if you have a lot of 
> updates to do.  (Even a very fast database can be made slow by I/O 
> bottlenecks if you are trying to make it remote for scalability 
> reasons.)  This has an easy solution: you can cache everything!  Of 
> course, then you need to be able to easily access the cache from all 
> of the machines, because it may have been updated.  Now you have 
> problems with coherency.  Then you need to lock the cache, because it 
> could have been updated, and then you need to read from it.
>
> Pretty soon you're talking to your caching server as if it were a 
> database.  This is _great_ if you are Livejournal:
>
>> Yeah- through my line of work I deal with a *lot* of
>> different infrastructures.  Everything from "Joe's BBQ
>> Sauce Garage" to Amazon.  Literally the only
>> organization I can think of that can keep anything
>> coherent with MySQL is Livejournal- and then I believe
>> only because Brad seems to be a cache-god with
>> memcached and such.
>
>
> because then you don't have to worry about computation, mutable data, 
> etc - you're basically just storing data and then spitting it back 
> out, and you don't care if the timestamps are a little off.
>
> This is the important point about LAMP and Twisted:
>
> There are applications which can connect to HTTP which are not blogs.
>
> If you are writing a multiplayer game which wants to support lots of 
> concurrent users, you can't afford to spawn a thread and do a database 
> request every time a player picks something up.  

New versions of Linux have very quick threads.  Different apps also see 
different thread speeds: small programs (say, under 100 kilobytes) 
which are entirely static are a lot faster than multi-megabyte 
processes which load things dynamically.  Threads and processes can 
also easily use multiple CPUs. 

Of course there are other reasons people don't like threads.

HTTP for games?  If your game is sensitive to latency and you can help 
it, avoid HTTP.  HTTP gives you much more latency than a db.  So does a 
centralised server, for that matter.  In a two-player game you can 
halve your latency by talking directly to each other, assuming you both 
aren't behind a non-configurable firewall or a proxy server.  HTTP is 
good as a backup protocol though, because some fascists don't allow any 
access to the internet except through a proxy.  Or maybe you are 
playing a game in a web browser :)

Note that some dbs have async interfaces (e.g. postgres), so you 
wouldn't need a thread.


btw, anyone know if sendfile is in Python (or going to be in soon)?
http://mail.python.org/pipermail/python-dev/2002-March/021498.html

> (Python is quite slow enough already, thanks.)  You can't just use a 
> cache because the data changes _all the time_, and you have to care 
> about it from everywhere that you care about your data.  Working with 
> your objects directly in memory is close to the only option.

What about Berkeley DBs?  I think bsddb3 was 3-4 times slower than a 
Python dict in general.  They are very quick, and you don't need to lock 
them.  You can use transactions.  Locking can kill performance.
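
A rough sketch of what a dict-like on-disk store looks like from Python. It uses the standard library's dbm module as a stand-in, not bsddb3 itself, so it says nothing about Berkeley DB's transactions or speed; the function name roundtrip and the key names are made up for the example.

```python
import dbm
import os
import tempfile


def roundtrip(path):
    """Write a few keys to an on-disk store, reopen it, and read them back."""
    # "c" opens the database for reading and writing, creating it if needed.
    with dbm.open(path, "c") as db:
        db[b"player:1:gold"] = b"10"
        db[b"player:1:gold"] = b"25"   # overwrite in place, like a dict
        db[b"player:2:gold"] = b"7"
        del db[b"player:2:gold"]       # delete, like a dict
    # Reopen read-only to show that the data lives on disk, not in memory.
    with dbm.open(path, "r") as db:
        return db[b"player:1:gold"], b"player:2:gold" in db


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "players")
    print(roundtrip(db_path))
```

Keys and values are bytes rather than arbitrary Python objects, so anything richer has to be serialised first.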

Quickness really depends on memory and data set size.  Here are 
approximate speed rankings (fastest first) of Python dictionary-like 
things for different data sizes, with, say, 700 MB of memory:

200 MB of key-value data: Python dicts, kjDicts, Berkeley DB 3 on-disk db.
400 MB of key-value data: kjDicts (as Python dicts use more memory and 
begin swapping), Berkeley DB 3-4 on disk, Python dicts.
More than 2 GB of key-value data: Berkeley DB 3-4 on disk, kjDicts (the 
kjDict starts swapping before here), Python dicts.

Of course deleting large Python dicts is *really* slow.  I used to kill 
my Python processes with kill -9 so that I didn't have to wait for the 
reference-counting, garbage-collecting beast to do lots of free()s on 
the dicts' memory.  kjDicts and bsddb dbs were faster for deleting.
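
One way to check the teardown cost on a given machine is to time it directly. This is only a sketch: the function name time_dict_teardown and the choice of short string keys and values are assumptions, and the numbers will vary a lot with Python version, allocator and data shape.

```python
import time


def time_dict_teardown(n_items):
    """Build a dict of n_items string pairs, then time how long freeing it takes."""
    start = time.perf_counter()
    data = {str(i): str(i) * 4 for i in range(n_items)}
    build_seconds = time.perf_counter() - start

    start = time.perf_counter()
    # Dropping the only reference makes CPython free every key and value now.
    del data
    free_seconds = time.perf_counter() - start
    return build_seconds, free_seconds


if __name__ == "__main__":
    build, free = time_dict_teardown(1_000_000)
    print(f"build: {build:.3f}s  free: {free:.3f}s")
```

If the process is about to exit anyway, all of that freeing is wasted work, which is the reason for the kill -9 trick.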

Some people do use RDBMS for large online games.  Check out 
gamasutra.com for some articles.  Seems they have lots of fun 
performance problems.

Compressing the hash would be nice.  Maybe memory mapped files on a 
compressed file system would be quick for this ;) 

Distributed hashes would also be nice!  Any good distributed hashes for 
Python?

I think memory is one of the first things that kills Apache 
performance (when using preforking), especially when it has massive PHP 
compiled in!  You can get big speed boosts by using different Apache 
configs for different request types (even on the same machine).  E.g. 
set one up for static files (e.g. images), and one for your bloated PHP.

>
> If you're writing a real-time financial data system, you do want to 
> use a database, but you want to very carefully control your access to 
> it. Certainly, you don't want to equate 'web hit' with 'database 
> query', as the LAMP model is wont to do.
>
> Or maybe you're writing an application that has to operate as a 
> client-side proxy, and you don't have the leisure of a DBA at every 
> desk, so you can't require that an RDBMS gets set up with each 
> installation.  This might require some hackish workarounds with the 
> filesystem that you'd rather not do, but nevertheless, it's better 
> than having the user editing pg_hba.conf themselves.
>
SQLite and Berkeley DB are good for easy-to-bundle dbs.

One good thing about LAMP though is that lots of servers have it 
installed, and it can be quite cheap to use as a platform.

As you say, not everything is a blog.  There are too many different 
factors for a one-size-fits-all 'this way is best' solution.  Besides, 
everyone knows Twisted rules ;)


Have fun!




