[Twisted-web] Some help with Nevow and databse I/O

Fri Feb 24 15:38:10 MST 2006

On Fri, 24 Feb 2006 13:49:01 -0800, Eric Gaumer <egaumer at pagecache.org> wrote:
>So I have to use the adbapi (threads) regardless? Okay I can handle that but
>my question now is what's the most efficient way to go about it?

Have one connection pool for the whole application and run all queries through it. What I suggest is always using runInteraction because it allows you to write different objects whose methods can be passed to runInteraction to get the expected result (of course methods can be substituted with functions).

How to share the connection pool is something that I'll deal with later.

>Referring to Glyph's blog post, he claims most twisted developers are misusing
>the adbapi interface.

I don't know the real meaning, but I suggest you should always stick to runInteraction.

>My concern is this, my code thus far looks pretty good. I've reviewed many
>examples of bits and pieces and generally have a decent feel for what I'm
>doing. Now that my application "framework" is basically setup (XHTML,
>templates, authentication, server, etc...) I'm to the point where I need to
>interface with an existing SQL database to retrieve data to fill in some
>templates.
>
>This sort of method works fine:
>
>    def data_query(self, context, data):
>        return self.dbpool.runQuery('SELECT name, email, id, TransferType,
>		basic_charge, over_charge, permitted_transfer FROM Customers')
>
>But I'm wondering if this the best way to use the adbapi interface.

Yes and no. It is not the best way because of course you are exposing to the application a detail of the way to access data. The best way would be to move that query in a single place with all the other ones so that you will be able to change the source of information easily another time (and when needed).

>I guess I have questions concerning when should I connect. As it stands I
>connect in the class constructor like this:
>
>class ListCustomers(rend.Page):
>    def __init__(self, user):
>        self.dbpool = adbapi.ConnectionPool(...

>My gut says this is wrong because now each page class contains a connection
>object. That really seems to be more of a general Python design issue and I
>could probably come up with a better solution.

Indeed. The better solution is to put the dbpool in the user object that you pass to ListCustomers :).

>But what about queries themselves? Is it bad design to run a query from each
>data_* method for all data methods in all page objects? I have three methods
>to work with:

No it is not bad design, it is the only thing you can do actually, data methods where designed with that particular purpose in mind. Of course the other solution is to put everything in beforeRender but I don't like it that much.

>DBPool.runQuery
>DBPool.runOperation
>DBPool.runInteraction
>
>You mentioned runInteraction above. Would it be more wise to provide a general
>data_* method that did several queries and stored (cached) that data in the
>Page object itself for later retieval?

This is another solution, but it's poor on code reuse. In fact you might have just a few data methods that retrieve particular objects and you can inherit those methods in other pages to always have a way to retrieve those objects from the database.

Nevow development is mostly creating an environment where templates can get the data they need. Very few python lines are needed when you are using Nevow in the right way because you fill find yourself reusing pre-defined render and data methods many many many times.

In fact the biggest application I've written so far has more than twice as much xHTML than Python. And it is made of about 70 kinds of pages for a staggering 10 lines per page on average, not too bad huh? :)

>Each page has to do some database I/O but it would be helpful to not have to
>do it everytime a page is refreshed or as the user goes back and forth
>through pages since the data is unlikely to change that quickly.

This is a different aspect and might be part of the database layer (query caching). Putting this duty on the application itself is asking for troubles.

>I'm looking for some insight as to how handle this type of scenario from a
>Twisted/Nevow perspective. It's obvious that database I/O is going to slow
>things down a bit but there must be a "best case" situation that avoids
>unnecessary queries.

As I said this is really database layer dependent. You don't want your application to do this job because it's simply the wrong layer, it will cost you more lines and more debugging and it will be harder to reuse when you add new pages. A better approach is to use caching at the HTML level, there are different approaches on this and you can find both searching the mailing list or the Nevow bug tracker for the word cache.

>> button.handle_event('clicked', list_of_callbacks)
>>
>> you do
>>
>> button.clicked().addCallback(callback1).addCallback(callback2)
>
>This chaining is very cool. So what actually makes the code non-blocking? The
>use of select() sys calls? I guess that's where my hangup is. How can we take
>a connection to a mail server and defer that result but yet we can't do the
>same for a database connection? At the OS level, aren't they both sockets
>that can be watched via select()?
>
>My head hurts :-)

Ok, if you ever programmed a custom protocol in Twisted you would get this quite straightly. Basically:

Protocols are designed with simple callbacks (no deferreds there) like lineReceived or dataReceived or datagramReceived. This can be achieved thanks to the select() loop and some logic in the base protocol class and some coupling with the transport of course (although pretty light coupling).
Of course the next step is to write a state machine on top of these callbacks to deal with complex protocols (like HTTP).
The Factory associated with a Protocol might be designed to hold an unfired Deferred instance. Once the Protocol reaches a final state of the state machine it triggers the Deferred held by the Factory.
Again of course the Factory exposed some methods like 'getPage' or 'runQuery', those methods returned a reference to the enclosed deferred to the caller in order to allow it to attach the wanted callbacks.
You can imagine that when the Protocol fires the deferred all the callbacks attached by the caller will be called and the 'magic' begins.

Of course not everything uses Protocols and Factories, for example the DBPool doesn't. Instead it uses a similar way of doing things but conceptually it's the same thing.

>When you say adapter your are referring to (in this case) MySQLdb or it's
>underlying _mysql module or yet even lower down to the C api?

I'm referring to MySQLdb or any other (like psycopg).

>If I wanted to implement a limited set of queries, could I write some
>non-blocking way of doing that and wrap those functions in Deferred objects.
>A silly question I guess because it's obviously possible. I guess I should
>ask if it's feasible. Maybe just attempting this as an exercise would give me
>greater understanding of how Twisted provides non-blocking calls.

There are 3 reactor calls to deal with threads and a 'simplified' way of using them with a thread pool.

reactor.callFromThread()
reactor.callInThread()
reactor.suggestThreadPoolSize()

With these 3 APIs you can run a function in a separate thread taken from a thread pool and since the reactor is not thread safe you must use another reactor.API to call other reactor APIs from a separate thread (of course I can't remember which ones).

The simplified way is to use threads.deferToThread() which returns a deferred and runs the function that you pass it in a separate thread.

This is essentially the same thing that dbpool.runInteraction does.

>No it isn't the case but it is wise for me to do runQuery() calls in each
>data_ method of each page or should I be looking for some way to unify
>queries and cache that data?

Move runQuery away from it and abstract it out. Then call the abstraction each time in the API and have the abstraction cache stuff when needed.

Ok, then I told you that I would have explained how to share the DBPool among the whole application.

The Realm has a requestAvatar() method that you have to implement. That method should return an avatar based on an avatarId that is passed to it (the avatarId object is returned by the checker). Given this information you should now understand what an avatar is. The avatar is the interface over the data that a user has access to. How do you use it? Simply by making it the mean through which you access data from the application and passing the avatar throughout the application.

For example you can do something like:

class ListCustomers(rend.Page):
    def __init__(self, avatar):
        self.avatar = avatar
        rend.Page.__init__(self)

    def data_customers(self, ctx, data):
        return self.avatar.get_customers()

As you see now there is no implementation leak about the real source of information except that it comes from the avatar, which has many ways to get to the wanted information.

There are many ways to deal with the avatar 'construction' in case of permissions over the data to access and roles but this is a whole different topic, neat but pretty hard to explain in this already long email. As I said I have code that deals with this using SQLAlchemy in my new project and as soon as I'll have more time (I'll graduate on 6th March) I'll publish the data layer somewhere, hopefully doing something useful for the community :).

-- 
Valentino Volonghi aka Dialtone
Now Running MacOSX 10.4
Blog: http://vvolonghi.blogspot.com
New Pet: http://www.stiq.it