[Twisted-Python] A Python metaclass for Twisted allowing __init__ to return a Deferred

glyph at divmod.com glyph at divmod.com
Mon Nov 3 13:17:09 MST 2008


On 03:56 pm, terry at jon.es wrote:
>Glyph - I appreciate your comments on testing, and agree that's
>problematic. I also like your pattern, have run into it myself, and 
>will
>use it - thanks.

OK, cool :).
>Glyph also said I didn't really provide a use-case, so I'll do that a 
>bit
>more clearly now. That leads me back in the direction of preferring my
>metaclass solution.

(snip example)
>The problem with approaches that don't actually create the class 
>instance,
>is that __init__ is calling self.createDBTable, but self doesn't exist
>yet. So putting code to deal with Deferreds into __new__ wont help 
>unless
>that code has nothing to do with the instance of the class.

Given that this provides a good object lesson for other folks writing 
Twisted-based code and potential contributors to Twisted to write code 
for Twisted itself, I will continue on with my critique.  I hope you 
find it helpful.

In this specific example, it doesn't seem like there's really any good 
reason for _createDBTable and createDBTable to be instance methods.  If 
I understand the thing you're implementing properly, you're not going to 
be calling those methods again once the instance is fully initialized 
(to create the table twice would be an error), so they arguably 
shouldn't even be public.  Assuming they should be public, though, they 
could easily be class methods - or even static methods or free 
functions.  The only attribute of 'self' accessed by either method is 
'db'; so why not just have a function that gets passed 'db' rather than 
'self'?

But, I'll take a step back and make the problem harder - let's assume 
you have lots of state on 'self' that these methods want to access, and 
there really is a complex multi-stage initialization process.  There are 
a number of simple solutions that don't involve metaclasses or __init__ 
returning a Deferred.

The simplest is to simply make your class's constructor just take a 'db' 
object.  Then you can do this:

    class CoordinatorHandler(object):
        @inlineCallbacks
        @classmethod
        # Untested, not totally sure that's the right stacking order...
        def fromSpec(cls, tableName, tableSpec, dbURI):
            self = cls(yield database.getDB(dbURI))
            yield self.createDBTable(tableName, tableSpec)
            returnValue(self)

        def __init__(self, db):
            self.db = db
        # ...

Now, that's a bit of a cop-out: __init__ hands back a partially- 
initialized object to application code.  The table might not yet be 
created.  Although your __metaclass__ pattern idea does that as well, 
what *I'd* want in this situation is a fully-initialized object from 
__init__, allowing only the internal multi-phase initialization code to 
see the partially-initialized object, since only that code really knows 
what methods you can and can't call before the object is fully ready.

The reality of RDBMSes is pretty crummy; it's (by definition) a big pile 
of global mutable state that you have very little control over and no 
way[1] of getting notified of changes to.  For example, you can't really 
know if a table exists or not, hypothetically somebody could come along 
at any moment and DROP TABLE on you and your whole application will 
break.  But, let's engage in a bit of fantasy for a moment (as all 
modern systems which interact with RDBMSes must do) and pretend that 
rather than spitting a string into a CREATE TABLE statement with no 
knowledge of success, the database (or some abstraction layer thereof) 
returns some kind of object to represent the table.

I say that because in this example, there's nothing you can pass to 
__init__ that will satisfy the object's idea of "fully initialized".  It 
just has to perform a bunch of potentially destructive operations on the 
"universe" object ('db'), then, once the results of those operations has 
taken effect, return an object.  So we need some kind of marker to say 
"we have performed those potentially destructive operations and they 
worked".  Code will probably be clearer than more prose at this point:

    class CoordinatorHandler(object):
        def __init__(self, db, tableHandle, otherStuff):
            "Do you know where to get a tableHandle from?  I do!  Call 
fromSpec."
            self.db = db
            self.tableHandle = tableHandle
            self.otherStuff = otherStuff

        @inlineCallbacks
        @classmethod
        def fromSpec(cls, tableName, tableSpec, dbURI, stuffFactory):
            # not fully initialized, but we're not handing this back to 
application
            # code yet...
            self = cls.__new__()
            # initialize juuuust enough to call that one method we want 
to call...
            self.db = yield database.getDB(dbURI)
            self.__init__(self.db, yield self.createDBTable(tableName, 
tableSpec),
                          yield stuffFactory.moreDeferredStuff())
            returnValue(self)

        def createDBTable(self, tableName, tableSpec):
            "We know this method only uses 'db' to do its work, so we're 
fine."
            return self.db.execute("CREATE TABLE ...").addCallback(
                lambda nothingUseful: TableHandle(tableName))


Here, you can't synchronously create a CoordinatorHandler unless you've 
got an object to stuff into its tableHandle slot from somewhere.  This 
provides a useful point at which to document the required type of the 
tableHandle, how one might create one (a pointer to some test utility 
classes, perhaps?).

This class also provides a nice factory function for you to generate one 
from a database, so you still get the same practical effect, but you 
still get all the benefits of testability that separated initialization 
can give you.  And your subclasses can still do interesting stuff in 
__init__ if they want to, since it will get invoked; it's just that 
there may be some pre-initialization variables present at that point.

This comes at the expense of one redundant line of code - the two times 
that 'db' is set - but I think the benefits are well worth that almost 
unmeasurably small cost :).  Plus, if that concerns you, you can factor 
the table-creation logic somewhere else so you don't need partial 
initialization.  In most cases, that's a better idea anyway (although 
I've very rarely seen code where I couldn't figure out how to cleanly do 
it).


[1]: I do know about http://www.postgresql.org/docs/8.1/static/sql- 
notify.html, but it's a pretty obscure feature that most databases don't 
have and that's apparently pretty difficult to use.  I hope it becomes 
more popular in the future though, rarara event-driven etc...




More information about the Twisted-Python mailing list