[Twisted-web] State and web2 (or, how to not follow REST)

Sun Nov 20 13:45:30 MST 2005

This is an attempt to summarize a conversation I had with glyph on
#twisted.web earlier today.  I've attached the IRC log.  The basic
problem discussed was how to manage server-side state, which in
particular includes sessions and authentication.  Stateful servers
cause serious problems with scalability and with bug hunting.

Glyph is adamant that getSession() as it currently works (by raising a
redirect exception if the session does not already exist) must be fixed.
I absolutely agree with this; as it currently stands the top-level
resource must always ask for a Session object to avoid unexpected
redirects further down the request stream.  This shouldn't be a lesson
of experience; it should be a built-in convention.

A related issue, one that Glyph is very concerned about, is the
implicit coupling of resources during the handling of a given request.
In particular, a resource X (located at /foo) sets a variable V in
the request R, and then a resource Y (at /foo/bar) comes to depend upon
this variable V.  If this coupling is not made explicit and checked for,
then an opportunity for rather obscure bugs emerges: the resource Y is
re-used in another context (say at /bing/bar) and still assumes that V
is set.  While the Twisted framework cannot prevent such nonsense, it
should propose an alternative mechanism, or at the very least not
promote such dynamic resource dependencies.

One way to make resource dependencies explicit is to require that the
constructor for a child resource take an optional ancestor resource as
an argument.  In this model, each user/session would in essence have
its own top-level resource, and every resource which depended upon
session state would take the parent resource in its constructor.  This
approach has a few deficiencies: (a) there might be more than one
instance of a resource Y at /foo/bar, one for each user; this is not
only inefficient but makes debugging hard because the mapping from a
URI to a resource object is no longer a function (one URI maps to many
objects); (b) while leaf resources, such as a static.File object, need
not take a parent resource in their constructors, generic Resources are
forced to accept a "pass-through" parent Resource even if they do not
need state information.  In the IRC conversation, I believe (and hope)
this was proposed and then eventually rejected; but I'm not sure.  I
don't like the idea of any Resource objects in the system being user or
session specific.
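
A rough sketch of this constructor-passing model, again in plain Python
rather than Twisted.  AvatarModel, UserRoot, GreetingResource and
locate_child are hypothetical names loosely modelled on the example in
the attached log; they are not web2 APIs.

```python
class AvatarModel:
    """Per-user model object (hypothetical)."""
    def __init__(self, username):
        self.username = username


class GreetingResource:
    """Leaf view; receives the model it needs through its constructor."""
    def __init__(self, model):
        self.model = model

    def render(self, request):
        # The request is not consulted for state at all.
        return "Hello %s" % self.model.username


class UserRoot:
    """Per-user top-level resource that hands its model to children."""
    def __init__(self, model):
        self.model = model

    def locate_child(self, segment):
        if segment == "greeting":
            # A fresh child is built per user on each traversal.
            return GreetingResource(self.model)
        raise KeyError(segment)


john = UserRoot(AvatarModel("John")).locate_child("greeting")
mary = UserRoot(AvatarModel("Mary")).locate_child("greeting")
```

Both john and mary answer for the same URI (/greeting) but are distinct
objects, which is deficiency (a) above.  The upside is that render()
works with any request, since all state arrives through the constructor.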

Another alternative is to add a getSiteResource() and setSiteResource()
to the Request interface.  The SiteResource would then contain the
top-level resource "/" which reflects the user's Session and any other
application-specific server-side state (i.e., nasty persistent
global-like variables which break REST).  The SiteResource would
therefore be an application-specific object; it could, for example,
contain a session-id and a username property for down-stream Resource
authorization.  Later in the IRC chat, Glyph said he is "coming around
to the fact that it's not really a resource".  This is good, because I
don't think that this server-side state is a resource by the definition
of web architecture.

I didn't mention it in the IRC chat, but I'm now thinking that these
methods on the request object could be setState() and getState(), and
that they return an arbitrary application-defined object which has all
of the nasty (but unfortunately mandatory and practical) session and
request "global variables" that break REST and can cause all sorts of
problems.  Glyph mentioned earlier in an email that perhaps a
declarative syntax could be introduced so that arbitrary Resources could
advertise exactly what "state" they will access; and hence, these sorts
of errors could be detected and reported more intelligently.  I like
this idea; it is framework support to prompt developers to put in
the assertion checks that they should already be doing.  It codifies
a solid practice, and this is a good thing.  ;)
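
One way the setState()/getState() idea and the declarative check might
fit together is sketched below.  Everything here is an assumption made
for illustration: the Request stand-in, the AppState object, the
`requires` class attribute, and render_checked() do not exist in
Twisted.

```python
class MissingState(Exception):
    """Raised when a resource's declared state is not available."""


class Request:
    """Stand-in request carrying one application-defined state object."""
    def __init__(self):
        self._state = None

    def setState(self, state):
        self._state = state

    def getState(self):
        return self._state


class AppState:
    """Application-defined state: the two strings most apps need."""
    def __init__(self, username=None, session_id=None):
        self.username = username
        self.session_id = session_id


class Greeting:
    # Declarative advertisement of the state this resource reads.
    requires = ("username",)

    def render(self, request):
        return "Hello %s" % request.getState().username


def render_checked(resource, request):
    """Check a resource's declared state before letting it render."""
    state = request.getState()
    missing = [name for name in getattr(resource, "requires", ())
               if getattr(state, name, None) is None]
    if missing:
        raise MissingState("%s needs state: %s"
                           % (type(resource).__name__, ", ".join(missing)))
    return resource.render(request)
```

With a declaration like this, a resource dropped into a context that
never set up its state fails with a message naming the resource and the
missing field, rather than with an unrelated error deep inside render().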

I think that Glyph and I did have a clear agreement: all of the
information on the State (in Glyph's terms SiteResource) object should
be set (and perhaps made read-only?) _before_ any Resource delegation is
made.  We do have a slight (but very slight) semantic difference.  I
see the process of setting up a session and creating any server-side
state as done in an IRequestHandler _before_ any IResources are called.
Glyph sees this as being done "by a resource which sits at the top
level".  In both models, this sort of stuff is done before your average
every-day resources are processed; my model is just more explicit and
allows for chaining IRequestHandlers (such as one for sessions, and
another one for authentication) before IResources are processed.
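
As a sketch of what I mean by chaining, assuming nothing about the
eventual interface: the handlers below are plain functions, the request
is a plain dict, and session_handler, auth_handler, dispatch and the
lookup table are all made up for illustration.  The state is frozen
with types.MappingProxyType before any resource sees it.

```python
from types import MappingProxyType

# Hypothetical session table standing in for a real credential check.
KNOWN_SESSIONS = {"abc123": "John"}


def session_handler(request, state):
    """First handler in the chain: record the session identifier."""
    state["session_id"] = request["cookies"].get("sid")


def auth_handler(request, state):
    """Second handler: map the session to an authenticated user."""
    state["username"] = KNOWN_SESSIONS.get(state["session_id"])


def dispatch(request, handlers, resource):
    """Run every handler, freeze the state, then call the resource."""
    state = {}
    for handler in handlers:
        handler(request, state)
    # Resources get a read-only view; writes raise TypeError.
    frozen = dict(request, state=MappingProxyType(state))
    return resource(frozen)


def greeting(request):
    """An every-day resource: reads state, never writes it."""
    return "Hello %s" % request["state"]["username"]


def misbehaving(request):
    """A resource that tries to modify state after delegation began."""
    request["state"]["username"] = "someone else"
```

Calling dispatch({"cookies": {"sid": "abc123"}}, [session_handler,
auth_handler], greeting) runs both handlers before greeting is invoked;
passing misbehaving instead fails with a TypeError because the state
view is read-only.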

I think that's about it.  I just want a simple solution to this, 
and soon.

Best,

Clark

-------------- next part --------------
12:18 -!- glyph [n=glyph at c-24-61-138-211.hsd1.ma.comcast.net] has joined #twisted.web
12:22 < cce> glyph: here
12:23 < cce> sorry; I have horrible connection /w latency and frequent drops
12:23 < cce> but my irc client is server side /w screen; so I can reconnect
12:25 < cce> Anyway, my primary question is how you pictured a down-stream Resource to obtain the Session-ID (which is needed for my application logging)
12:26 < glyph> cce: There are a number of ways
12:26 < glyph> cce: The getSiteResource thing from the mailing list is one way
12:26 < cce> A secondary question is... how is this method actually different from a getSession() both in operation and in the ability to prevent program logic errors.
12:26 < cce> ok
12:26 < glyph> cce: but that's really the non-preferred way
12:26  * cce listens intently.
12:26 < glyph> cce: the way that apps generally *should* work is that, under all these resources, there is a model.  your topmost resource is pointing at that model
12:27 < glyph> cce: when that resource renders itself or hands off the request to its children, it first gathers appropriate data or sub-objects from the model to create the Response or the Resource child
12:27 < cce> ok
12:27 < glyph> cce: so let's say an attribute of your underlying model is LoggingContext(sessionID=xxx)
12:28 < cce> so, all of the information needed to process the request (besides what is in the request itself) is collected by this GenerateModel mechanism.
12:28 < glyph> Right
12:28 < cce> That's how my app works; wonderful. ;)
12:29 < cce> by the time any resouce processing starts; all "state dependent" information has been previously constructed
12:29 < glyph> the top of the session for a particular user is all determined by what object is returned by the IRealm implementor returns from requestAvatar
12:29 < cce> in this way _all_ resources are stateless (and no additional information is put on the request)
12:30 < glyph> cce: Put another way, all Resources are *views*, and the underlying model is something distinct that they wrap
12:30 < cce> ok; we're on the same page here 
12:30 < cce> so the session logger is part of the model
12:30 < cce> (which is application specifi)
12:30 < glyph> cce: The way that Mantissa et. al. do this is just by adapting the user's login database object to IResource; that adaptation already has a 'original' (model) object, which is the user's private database
12:31 < glyph> cce: that's about where I stop suggesting that everyone do it the same way though; if you want to use mantissa's authentication model you can just use mantissa, IRealm implementations don't belong in web2 :)
12:31 < cce> and this is better than a request.get/setPeer(...) since the resources arn't modifying the request (they are only interacting with the model)
12:31 < glyph> cce: yes absolutely
12:31 < cce> well, this is common sense (IMHO)
12:32 < cce> glyph: ok, question, is there a reason why the Model should be considered a Resrouce itself?
12:32 < glyph> cce: no, I'm not advocating that
12:32 < cce> ok, so from my low-level resource / request pair, how do I get the model?
12:32 < glyph> cce: attributes of self :)
12:32 < cce> request.getModel() ?
12:33 < glyph> noooo
12:33 < cce> its a property of Request, right?
12:33 < cce>   def render(self, request):
12:33 < glyph> cce: each Resource is initialized with whatever model objects it needs as arguments to __init__
12:33 < cce>     # from within a nested IResource
12:33 < cce> oh; but then my resources are _user_ specific
12:33 < glyph> cce: that's why the "top of the resource tree" is your "session"
12:33 < cce> oh, I don't like that
12:34 < cce> my resources should not be specific to a given user
12:34 < glyph> cce: why not?
12:34 < cce> a File Resource, in particular, should not care what user it was created from
12:34 < glyph> cce: ah yes
12:35 < glyph> cce: that's the particular use case where you need getSiteResource
12:35 < cce> I _do_ like the idea of officially recongizing and talking abuot a model construction stage
12:35 < glyph> cce: your-app-with-model-data -> File resource -> some-other-part-of-your-app-dynamically-created-from-a-file
12:35 < cce> are you assuming that Resrouces form a tree?
12:35 < glyph> getSiteResource is for passing your model data between two resources which are on opposite sites of a generic resource doing some kind of dispatching
12:36 < glyph> cce: calling it a "tree" or a "graph" is really getting caught up in semantics
12:36 < cce> well, there is a difference
12:36 < glyph> cce: there's a traversal path of calls to locateChild
12:36 < glyph> cce: it could be traversing a tree, or a graph, whatever you like; the framework won't ever enforce that
12:36 < cce> in a tre you can get resource.getParent() and return a parent resource regardless of the Request
12:36 < cce> I don't want this assumption; so I'm just checking
12:37 < cce> so getSiteResource() is a property of Request then?
12:37 < glyph> yes.
12:37 < cce> how is  getSiteResource() different from getModel() in this case?
12:37 < glyph> cce: I don't like explicit functions for dealing with models
12:37 < glyph> cce: Maybe your view requires multiple attributes to its initializer, making its "model" complex
12:37 < cce> (where a model is the program's representation of the user's information that is constructed _before_ any resources are active_)
12:38 < glyph> cce: a good example of this is that you've got your user-specific model and your generic site-wide model
12:38 < glyph> cce: the reason I say 'SiteResource' and not 'SiteObject' is because the point of getSiteResource is view code communicating with other view code; if one view needs to extract a model component from another, that's fine, the view code is in the same layer and it can know what to do there
12:38 < cce> ok, so I'd do getSiteResource().getModel() then?
12:39 < glyph> cce: but the web framework code shouldn't be dealing in arbitrary objects, it should stick to interfaces it knows about
12:39 < glyph> cce: yes
12:39 < cce> glyph: I fail to see how this is any different than getSession()
12:40 < cce> other than the *pratice* that a "Session" or "Model" is constructed _before_ any resource processing takes place.
12:40 < glyph> cce: except getSiteResource() is a Resource (or, allowably, another object) of your own design, so getModel() can be whatever you want; getSessionIDForLog or whatever; the bottom resource in the tree
12:40 < glyph> cce: it's more restrictive
12:40 -!- dreid [i=dreid at c-67-166-157-80.hsd1.ca.comcast.net] has joined #twisted.web
12:40 < glyph> cce: getSession allows you to have multiple parallel sessions set by different systems and passed through the web framework
12:40 < cce> well, it's an extra call; but that's about it.
12:40 < cce> ok
12:40 < cce> (listening)
12:41 < cce> so there is exactly _one_ SiteResource independent of user?
12:41 < glyph> cce: yes
12:41 < glyph> cce: the idea is, don't provide *any* unnecessary mechanisms for users to jam their model objects into the web framework
12:41 < cce> ok, so request.getSiteResource().getModel(request) is needed
12:41 < glyph> cce: models should be passed from model to model
12:41 < cce> glyph: you're confusing me
12:42 < cce> I want to draw  "Hello John" in my resouce 5 layers down
12:42 < glyph> cce: OK wait, let me start with the simpler difference
12:42 < cce> how do I get the user's name...
12:42 < cce> from that resource/request pair
12:42 < glyph> cce: the main difference is that getSession raises a redirect and sets a cookie - this will not do that :)
12:42 < cce> ah; super
12:43 < cce> but let's assume I do a request.getModel().username instead
12:43 < cce> where my application model always sets up a session _before_ any Resources are accessed.
12:43 < cce> how would you do it?
12:43 < glyph> cce: the other difference is superficial, and not too important to understand.  It's a bit more limited than getSession (one state object per request vs. many) but you won't hit the limitations unless you are trying to do things which are bad, and if you already understand that you should be communicating between model objects in terms of your particular model interfaces and methods, it is unlikely that you will be trying to do that
12:44 < cce> yea, I was glossing over the session vs request-specific data
12:44 < cce> that can be handled in the application's model ;)
12:44 < glyph> right :)
12:44 < glyph> Session management will be implemented by a resource which sits at the top level and determines what the top-level model-wrapping site resource is
12:45 < cce> ok
12:45 < cce> what invocation would I need to get the username from my model?
12:45 < glyph> cce: getting there :)
12:45  * cce grins wildly.  (and thanks for your time, BTW)
12:46 < cce> I will make good by writing up meeting minutes and posting them
12:47 < glyph> the session manager won't put any data into the request at all; if your application needs to pass site-specific data down to views that are not created through explicit locateChild that passes along appropriate model data (say, through a static.File, or a vhost) then you can use setSiteResource when you create the File or the Vhost, and then in the locateChild of the resource created through the generic dispatcher resource (file or v
12:47 < glyph> you can call getSiteResource and retrieve it
12:48 < glyph> generic non-application-specific resources like files and static.Data and soforth will therefore never call either of those methods
12:48 < Karnaugh> tell me something
12:48 < glyph> When you need to get the username (here we are!) in a resource, it should be somewhere down in renderHTTP or after locateChild has been called
12:48 < Karnaugh> How does Guarded persist an Avatar internaly?
12:49 < cce> glyph: sorry, I'm confused
12:49 < glyph> Karnaugh: dictionary
12:49 < glyph> cce: let me go with a more concrete example
12:49 < glyph> example 1: your app can do all its communication properly without using any dispatching resources
12:50 < glyph> your user logs in and gets a CCEAvatarResource(cceAvatarModel) as their top resource
12:50 < Karnaugh> glyph: ok, but I'm wondering how it tracks the users session, or is that mangled into Twisted's resource interface or something
12:50 < glyph> cceAvatarModel.username is u"John"
12:51 < Karnaugh> oh wait, it comes from the realm
12:51 < glyph> Karnaugh: yep
12:51 < glyph> cce: The URL being processed is /app/appobject1
12:52 < cce> ok
12:52 < glyph> cce: CCEAvatarResource.child_app looks like this: 'return CCEApplicationObject(self.avatarModel)'
12:54 < glyph> cce: CCEApplicationObject.locateChild looks like this: objname = segments[0]; return CCESingleObjectView(self.avatarModel, self.avatarModel.dataModel.getObjectByName(objname)), segments[1:]
12:55 < glyph> cce: in CCESingleObjectView.renderHTTP (or in the appropriate place in the template) you can simply do 'return self.avatarModel.username'
12:57 < glyph> cce: does this example make sense?  (keep in mind that my point is not that you have to have all these intermediary resources - in an application like a blog where URLs are like /2005/11/15, you would probably create a 'Post' object straight from multiple segments rather than creating an intermediary Year/Month/Day object - my point is that if you do, this is the way to pass data between them.)
12:58 < cce> glyph: so, what your're saying is that the /app resource would be _specific_ to each user
12:58 < cce> and thus the /app/appobject1 would also be specific to each user
12:58 < Karnaugh> glyph: if you have static content or children under a resource, what would be a sensible way to ensure those ar eproccessesd first by locate child before getting arguments like /2005/11/15?
12:59 < cce> ie, you're talking about _alot_ of unnecessary objects, IMHO ;)
12:59 < Karnaugh> I think that sort of thing should be decoupled from the locateChild implementation...
12:59 < cce> Karnaugh: well, if your static resources have dynamic resources, you'd need one copy of each static object (one per user)
12:59 < glyph> cce: not true :)
13:00 < cce> so that it can create the dynamic objects /w the right model
13:00 < glyph> cce: You can easily store a shared static object in a fixed location
13:00 < cce> glyph: I was referring to static objects that have dynamic children (not static leaf objects, which are fine)
13:00 < glyph> cce: I'll get to the static.File example next :)
13:01 < cce> yea, static.File is fine with your presentation
13:01 < cce> it is a leaf (and I'm not sure if you can have static objects that arn't leaves)
13:01 < Karnaugh> What i was saying is if i have /blog/add and /blog/delete and /blog/1 or something where the latter is a refference to some id
13:01 < cce> glyph: well, I must say, I'd rather have request.getModel()
13:01 < Karnaugh> you have to reimplement locateChild to handle the children
13:01 < Karnaugh> afaik
13:01 < glyph> cce: also, you don't need the unnecessary objects; I imagine in a system like yours based on stages, you can have locateChild immediately locate the appropriate leaf resource
13:02 < glyph> cce: and just pass the model data to that leaf in the same way that I passed it down to each intermediary resource
13:02 < cce> yea; it works for me, no doubt
13:02 < Karnaugh> thing is if you're making a big system, constantly customising locateChild for each resource becomes a chore
13:02 < cce> (but anything can be fudged to wkr)
13:02 < cce> glyph: I agree with Karnaugh
13:03 < glyph> Karnaugh: Yeah, we are probably going to add some features to Nevow soon which can be cribbed by web2 for simplifying that chore
13:03 < cce> it is alot of creating/dicarding resource objects
13:03 < cce> unnecesarly
13:03 < Karnaugh> well you're not creating or discarding anything, you just have to reimplement locateChild every gosh darn time
13:03 < cce> how is this any different from putting a getModel() on the request?
13:04 < cce> Karnaugh: well, if each request creates a different model; then _all_ of the child Resoruces need to be re-created
13:04 < glyph> cce: getModel on the request requires that the resource be passed an "appropriate" request in order to be able to render itself
13:04 < glyph> cce: it's unnecessary coupling
13:04 < glyph> cce: in most cases, the resource ought to be able to render itself for *any* request
13:04 < cce> glyph: I think you're just moving the complexity (to a more innefficient form) not actually solving the problem ;)
13:05 < glyph> cce: in some cases it will depend on a cookie or a session-id or whatever, but the idea is to keep all that dependency locked away at the top of the tree, and have all the resources lower down be able to be unit tested with an extremely simple subset of Request
13:05 < glyph> cce: I don't understand why you think this is inefficient
13:05 < cce> well; with this mechanism you're essentially forcing Resoruces (that are Branches) into a Tree
13:06 < cce> ie, you mine as well add a getParent() method while you're at it
13:06 < glyph> cce: no
13:06 < cce> and this tree is essentially created at the time the request arrives
13:06 < glyph> cce: consider the case of the shared static File resource that multiple different users' Resource objects return
13:06 < glyph> cce: they are returned as children from different parents
13:06 < cce> I said Resources that are _branches_
13:06 < cce> a file isn't a branch
13:07 < cce> glyph: well, it's a neat idea, I suppose
13:07 < cce> glyph: ok
13:08 < cce> I've got a counter example problem
13:08 < cce> Suppose that you want to make a 3 deep resource tree
13:08 < cce> but the 2nd and Resource is "generic" module shared across several projects
13:09 < cce> in this case, you'll have to allow for the "model" to be anything, and it just passes along the model
13:09 < cce> so; your 3rd resource deep is passed a model; but it doesn't really know what kind of model
13:09 < glyph> Right.
13:09 < glyph> That's what getSiteResource is for.
13:09 < cce> hence, if your goal is to be able to mix resources form different sources you will always run into the problem of a bad context
13:09 < glyph> In this case, you do need context
13:09 < cce> ie, a resource is used in a way where the data it needs isn't there
13:10 < glyph> and you can't route around it, so the framework *needs* to provide support
13:10 < glyph> that's exactly the case I'm proposing getSiteResource for
13:10 < cce> glyph: the simplest solution, IMHO, is a getModel() on the request, and ask your programmres to be remotely intelligent by (a) asserting that the things they need are there, and (b) writing regression tests.
13:10 < glyph> however, I don't think we should introduce that context dependency *unless* it is needed
13:10 < glyph> the context dependency has a cost
13:11 < cce> ok, so getSiteResource() is a property of the Request?
13:11 < glyph> cce: Yes
13:11 < cce> ok, so that is essentialy my model...
13:11 < glyph> cce: You call setSiteResource in resource #1, and then call getSiteResource in resource #3 in your example, yes
13:11 < cce> any reason why you don't want to call it getModel ?
13:12 < cce> ie, it really isn't a Resource
13:12 < cce> (accoridng to the web definition anyway)
13:12 < glyph> cce: Yeah, I think I'm coming around to the fact that it's not really a resource
13:13 < cce> ok; as much as I like your 'pass-along-the-model-via-constructors' I think it doesn't allow for genreic resources and it has the tendency to create alot of "duplicate" Resoruce objects for the same URI, and this could make debugging harder, not easier.
13:13 < glyph> cce: I don't mind calling it something else, I don't like "getModel" because it sounds too generic, it implies to me that it's suggested that model data *always* be applied to the request, whereas I want to make it clear that such data shouldn't be applied unless it's necessary to traverse a generic resource
13:14 < cce> glyph: well, there are only two peices of information that I really need down-low
13:14 < glyph> cce: setRequestSpecific and getRequestSpecific, maybe
13:15 < cce> AuthenticatedUserIdentifier (a string)
13:15 < cce> and SessionIdentifier (a string)
13:15 < cce> I don't even want them to be objects
13:15 < cce> (my app can look the objects up if they need them)
13:15 < glyph> cce: you mean you don't want them to be user-defined classes? :)
13:15 < cce> could the Request object just have those as properties? None if the sessoin identifier or the authenticated user identifier isn't therE?
13:15 < glyph> cce: (strings are objects, rara)
13:16 < cce> yes; I suppose so; ok
13:16 < glyph> cce: Yes, you can set them in your app, and that will work fine, I just don't thin it should be the suggested mechanism
13:16 < glyph> cce: I'd prefer a documentation convention so that request-specific state is really explicit in every class that uses it
13:16 < cce> but the two common cases is I want to get the username and the session-id, and I'd guess these are the common needs of most commercial applications
13:16 < cce> glyph: fantastic idea
13:17 < glyph> cce: such a documentation convention strongly implies framework support and conventions too - that's all I'm suggesting this getRequestSpecific is; it's really a workaround in my mind :)
13:17 < glyph> anyway, I must depart
13:17 < cce> thank you so much for the chat