[Twisted-web] Some help with Nevow and database I/O

Fri Feb 24 14:49:01 MST 2006

On Friday 24 February 2006 12:29 pm, Valentino Volonghi aka Dialtone wrote:

> >I have two questions that are somewhat interrelated. In my test cases I
> > have authentication (using nevow session guard) setup using a SHA stored
> > via the shelve module.
>
> This is technically wrong.

Sorry. I should have clarified a bit. I am authenticating with twisted.cred 
using a class/method like this:

class AuthMech(object):
    implements(checkers.ICredentialsChecker)
    credentialInterfaces = (credentials.IUsernamePassword,)

    def _checkPasswd(self, cipher, password):
        cipher = base64.decodestring(str(cipher))
        salt, hash = cipher[:4], cipher[4:]
        hash2 = sha.new(salt + password).digest()
        return hash2 == hash

p.registerChecker(pages.AuthMech())
r = guard.SessionWrapper(p)

> Let me explain why:
> Yes, you are using nevow session guard to have authentication but no, it is
> not guard that is setup to use a SHA stored via the shelve module. The
> thing that is setup in that way is twisted.cred.
>
> In fact guard is only glue that is used to get credentials from a request
> and pass them to twisted.cred.
>
> Why do I tell you this? Because it means that you don't have to change your
> application in order to change the source of authentication data.
>
> And yet people say that cred/guard is crappy... Oh man. :)

Not crappy at all, in fact it's been very easy to get running and seems to 
work perfectly. I haven't really encountered any problems and my questions 
are more about doing things the "right" way rather than just getting it 
working.

>
> >The true customer info is stored in a MySQL database. I can pull that data
> >doing the following:
> >
> >    def data_query(self, context, data):
> >        return self.dbpool.runQuery('SELECT name, email, id, TransferType,
> >		basic_charge, over_charge, permitted_transfer FROM Customers')
> >
> >I've read Abe's book on twisted and on page 54 he states "Nevow is
> > designed from the ground up for Twisted, which means you can use
> > Deferreds everywhere"
> >
> >My question is do I have to use the adbapi module or can I use deferreds
> > to handle database queries?
>
> adbapi module returns deferreds for each of the 3 methods:
>
> DBPool.runQuery
> DBPool.runOperation
> DBPool.runInteraction

So I have to use the adbapi (threads) regardless? Okay I can handle that but 
my question now is what's the most efficient way to go about it?

Referring to Glyph's blog post, he claims most twisted developers are misusing 
the adbapi interface.

My concern is this: my code thus far looks pretty good. I've reviewed many
examples of bits and pieces and generally have a decent feel for what I'm
doing. Now that my application "framework" is basically set up (XHTML,
templates, authentication, server, etc.), I'm at the point where I need to
interface with an existing SQL database to retrieve data to fill in some
templates.

This sort of method works fine:

    def data_query(self, context, data):
        # Adjacent string literals are joined, so the SQL can span lines.
        return self.dbpool.runQuery(
            'SELECT name, email, id, TransferType, '
            'basic_charge, over_charge, permitted_transfer FROM Customers')

But I'm wondering if this is the best way to use the adbapi interface.

I guess I have questions concerning when I should connect. As it stands I
connect in the class constructor like this:

class ListCustomers(rend.Page):
    def __init__(self, user):
        self.dbpool = adbapi.ConnectionPool(...

My gut says this is wrong because now each page class contains a connection 
object. That really seems to be more of a general Python design issue and I 
could probably come up with a better solution.
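
One way to avoid a pool per page, sketched here with stand-in classes so it
runs without Twisted or Nevow: create the pool once at startup and hand the
same object to every page. FakePool, Page, EditCustomer and the extra dbpool
constructor argument are hypothetical names used only for illustration; in the
real application the pool would be the adbapi.ConnectionPool and the base
class would be rend.Page.

```python
class FakePool(object):
    """Hypothetical stand-in for adbapi.ConnectionPool; counts instances."""
    created = 0

    def __init__(self):
        FakePool.created += 1

    def runQuery(self, sql):
        # A real pool would return a Deferred; this stand-in returns rows.
        return [("alice", "alice@example.com")]


class Page(object):
    """Hypothetical stand-in for nevow's rend.Page."""


class ListCustomers(Page):
    def __init__(self, user, dbpool):
        self.user = user
        self.dbpool = dbpool  # shared pool, not created here

    def data_query(self, context, data):
        return self.dbpool.runQuery("SELECT name, email FROM Customers")


class EditCustomer(Page):
    def __init__(self, user, dbpool):
        self.user = user
        self.dbpool = dbpool


# Created once at startup, then passed to every page that needs it.
pool = FakePool()
pages = [ListCustomers("eric", pool), EditCustomer("eric", pool)]
```

Passing the pool in, rather than importing a global, also makes it easy to
substitute a fake pool in tests.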

But what about queries themselves? Is it bad design to run a query from each 
data_* method for all data methods in all page objects? I have three methods 
to work with:

DBPool.runQuery
DBPool.runOperation
DBPool.runInteraction

You mentioned runInteraction above. Would it be wiser to provide a general
data_* method that did several queries and stored (cached) that data in the
Page object itself for later retrieval?

Each page has to do some database I/O but it would be helpful to not have to
do it every time a page is refreshed or as the user goes back and forth
through pages since the data is unlikely to change that quickly.

I'm looking for some insight as to how to handle this type of scenario from a
Twisted/Nevow perspective. It's obvious that database I/O is going to slow
things down a bit but there must be a "best case" situation that avoids
unnecessary queries.
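
A minimal sketch of the caching idea, assuming a time-to-live is an acceptable
staleness policy for this data. QueryCache, fake_fetch and the 30 second TTL
are hypothetical; the fetch function here is synchronous so the example runs
on its own, whereas with adbapi the rows would arrive through a Deferred and
be stored from a callback.

```python
import time


class QueryCache(object):
    """Hypothetical helper: remembers query results for ttl seconds."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self._fetch = fetch    # callable that runs the real query
        self._ttl = ttl
        self._clock = clock    # injectable so tests can control time
        self._entries = {}     # sql -> (expires_at, rows)

    def get(self, sql):
        now = self._clock()
        entry = self._entries.get(sql)
        if entry is not None and entry[0] > now:
            return entry[1]    # still fresh, skip the database
        rows = self._fetch(sql)
        self._entries[sql] = (now + self._ttl, rows)
        return rows


calls = []


def fake_fetch(sql):
    # Stand-in for the database; records every time it is actually hit.
    calls.append(sql)
    return [("alice",), ("bob",)]


fake_now = [0.0]
cache = QueryCache(fake_fetch, ttl=30.0, clock=lambda: fake_now[0])

first = cache.get("SELECT name FROM Customers")
second = cache.get("SELECT name FROM Customers")  # served from the cache
fake_now[0] = 31.0
third = cache.get("SELECT name FROM Customers")   # expired, fetched again
```

Keeping the cache outside the Page object means it survives across requests
even if pages are created per request.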

> >I've read the docs on deferreds but I'd be lying if I said I can fully
> > wrap my brain around all aspects of the concept or how it's implemented.
>
> class Deferred(list):
>     def addCallback(self, fun, *args, **kwargs):
>         self.append((fun, args, kwargs))
>
>     def callback(self, initial_value):
>         for fun, args, kwargs in self:
>             initial_value = fun(initial_value, *args, **kwargs)
>         return initial_value
>
> Easy isn't it?

Yes, it's the "deferreds don't make blocking code non-blocking" part that can 
get a bit confusing. I've never really had to think about blocking vs. 
non-blocking calls and truthfully I like it. It forces you to be a little 
more conscious about what you're doing. I'm really stoked about not using 
threads because they are such a nightmare to debug.
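
To make the "deferreds don't make blocking code non-blocking" point concrete,
here is a sketch using only the standard library rather than Twisted. The
blocking call is moved to a worker thread and a callback is registered for the
result, which is roughly the arrangement adbapi sets up behind runQuery.
blocking_query and on_rows are hypothetical names, and concurrent.futures is a
swapped-in stand-in for Twisted's own thread pool and Deferred.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def blocking_query():
    # Stand-in for a blocking database call; reports which thread ran it.
    return threading.current_thread().name, [("alice",)]


results = []


def on_rows(future):
    # Called once the query has finished, much like a Deferred callback.
    results.append(future.result())


main_name = threading.current_thread().name

with ThreadPoolExecutor(max_workers=1, thread_name_prefix="dbworker") as pool:
    future = pool.submit(blocking_query)
    future.add_done_callback(on_rows)
# Leaving the with block waits for the worker thread to finish.

worker_name, rows = results[0]
```

The callback only registers interest in the result; it is the worker thread
that keeps the blocking call out of the main thread. One difference worth
noting: Twisted delivers the result back in the reactor thread, while
concurrent.futures may run the callback in the worker thread.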

> deferreds don't make blocking code non-blocking, they are just a different
> mechanism to register callbacks. Instead of doing:
>
> button.handle_event('clicked', list_of_callbacks)
>
> you do
>
> button.clicked().addCallback(callback1).addCallback(callback2)

This chaining is very cool. So what actually makes the code non-blocking? The 
use of select() sys calls? I guess that's where my hangup is. How can we take 
a connection to a mail server and defer that result but yet we can't do the 
same for a database connection? At the OS level, aren't they both sockets 
that can be watched via select()?

My head hurts :-)
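
A small sketch of the select() side of this, using a local socket pair in
place of a real mail server (the greeting text is made up). When the program
owns the socket, it can ask the kernel whether data is ready and only then
read it, so the read never has to wait. A DB-API driver such as MySQLdb also
talks over a socket, but it performs the read inside a blocking call and hands
back only the finished result, so there is nothing for the caller to pass to
select().

```python
import select
import socket

# A connected pair of sockets standing in for client and server.
left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

# Nothing has been sent yet, so a zero-timeout select() reports nothing.
readable_before, _, _ = select.select([right], [], [], 0)

left.send(b"220 mail.example.com ready\r\n")

# Now the kernel reports the socket as readable, so recv() will not wait.
readable_after, _, _ = select.select([right], [], [], 1.0)
greeting = right.recv(1024)

left.close()
right.close()
```

An event loop is essentially this check repeated over all open sockets, with a
callback fired for each one that becomes ready.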

>
> You need to spawn a thread because there is hardly any database adapter
> that is written to be async, and those that are written to be async are
> either buggy or not complete or not really usable, plus having deferreds
> after every query in python (which doesn't support deferreds natively)
> makes coding a lot harder (you can deal with some deferreds here and there,
> but not with 10 deferreds each function).

When you say adapter, are you referring to (in this case) MySQLdb, its
underlying _mysql module, or even lower down to the C API?

If I wanted to implement a limited set of queries, could I write some
non-blocking way of doing that and wrap those functions in Deferred objects?
A silly question I guess, because it's obviously possible. I guess I should
ask if it's feasible. Maybe just attempting this as an exercise would give me
a greater understanding of how Twisted provides non-blocking calls.

> If you want to do N operations at the same time just use runInteraction
> instead of runQuery (but this doesn't seem to be the case).

No, it isn't the case, but is it wise for me to do runQuery() calls in each
data_ method of each page, or should I be looking for some way to unify
queries and cache that data?
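
For reference, a sketch of what "unifying" queries could look like in the
shape runInteraction expects: one function that receives a cursor-like object
and runs several statements on it. It is shown against an in-memory sqlite3
database so it runs standalone; customer_summary, the two-column table and the
sample rows are hypothetical.

```python
import sqlite3


def customer_summary(txn):
    """Run several statements on one cursor and return a combined result."""
    txn.execute("SELECT COUNT(*) FROM Customers")
    count = txn.fetchone()[0]
    txn.execute("SELECT name FROM Customers ORDER BY name")
    names = [row[0] for row in txn.fetchall()]
    return count, names


# Throwaway in-memory database standing in for the real MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers (name TEXT, email TEXT)")
cur.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("bob", "bob@example.com"), ("alice", "alice@example.com")],
)
conn.commit()

summary = customer_summary(conn.cursor())
conn.close()
```

With adbapi the same function would be passed as
dbpool.runInteraction(customer_summary), which runs it in a pool thread and
returns a Deferred that fires with the function's return value.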

Thanks,

-- 
Eric Gaumer
Debian GNU/Linux PPC
egaumer at pagecache.org
http://egaumer.pagecache.org
PGP/GPG Key 0xF15D41E9