[Twisted-web] HTTP-AUTH for web2 / Kudos on web2's operation

Thu Nov 17 06:17:25 MST 2005

On Wed, 16 Nov 2005 11:11:49 -0500, James Y Knight <foom at fuhm.net> wrote:

>I think of something like parsing user authentication information at a high
>level up in the resource tree and making it available to a Resource low in
>the tree. It is fairly clearly a per-request bit of state which may be
>useful to some subset of the pages, and as such seems like it should be
>attached to the Request. Just going and adding random attributes onto
>Request is not a nice thing to do, so I'll pretend I didn't just consider it.
>But it really seems like some designated storage spot for extra data is
>necessary.

Abstractly: if you have a page which requires some "per-request state" to be set by a resource higher up in the tree, and can't operate without it, how do you guarantee it's been set?  Can we come up with any declarative interface for specifying this relationship?  I've never seen one and it seems like it would be insanely hard to do.  The convention in Nevow is to look for something in the context, find 'None', and midway through rendering a page, vomit a "'NoneType' object has no attribute 'x'" traceback onto the user's page with no explanation of what data was expected or where it was supposed to come from.  Regardless of whether we use the context or not, this is not a convention I think we should continue with.
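For concreteness, here is a tiny Python sketch of that failure mode next to an explicit check. The dict-based `context`, the `'user'` key and both render functions are hypothetical stand-ins, not Nevow's actual context API.

```python
# Hypothetical stand-in for a rendering context: a plain dict, NOT Nevow's API.


class MissingState(LookupError):
    """Raised when a page needs per-request state that nobody provided."""


def render_implicit(context):
    # The convention being criticized: look something up, silently get
    # None, and fail later with an AttributeError that explains nothing.
    user = context.get("user")
    return "Hello, " + user.name


def render_explicit(context):
    # Alternative: fail immediately, naming what was expected and where
    # it should have come from.
    user = context.get("user")
    if user is None:
        raise MissingState(
            "render_explicit needs a 'user' set by a resource higher in "
            "the tree; none was found")
    return "Hello, " + user.name
```

Calling `render_implicit({})` fails with `AttributeError: 'NoneType' object has no attribute 'name'`, which says nothing about where a user should have come from. `render_explicit({})` at least names the missing piece.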

Concretely: come to think of it, I've never heard of a legitimate requirement other than "currently logged-in user", which ought to be handled by another system anyway.  As per CCE's previous emails, the system for initializing and setting up logged-in users needs to be pretty flexible in terms of how and when login forms get displayed, but the use of cred is non-negotiable.  We need a unifying cross-protocol abstraction.

Though I missed the conversation, I imagine the handwavy stuff that JP was suggesting was "the top resource, which is a proxy for the user, should pass information down to its children as it creates them".  Some kind of explicit relationship is required.

Let me try to anticipate the argument against this here: you have a top-level site which is an application.  One of its children is some generic type of resource with no mechanism to pass additional information down through it, such as a static.File.  That static.File has a dynamic child which is a super-simple epy-style script that is nevertheless part of the application - it needs to know what user is currently logged in, or it needs to display a little shopping cart emblem that indicates that the currently logged-in user can buy stuff on this page, and a link to the user-specific page where they can buy it.

I'll backpedal here just a tiny bit on my earlier adamance about getSession and say that perhaps there is a valid use for access to the currently logged-in user as request data.  By "currently logged-in user" I mean the "topmost" resource object.

This still establishes a slightly more implicit interface than I'd like, but without it, it would be difficult to write any kind of generic 'dispatcher' resource (virtual hosting, etc) which exists below a session-capable application.  At least the rules are simple: certain kinds of resources are "applications" or "sites".  Any resources they return from locateChild should expect them, and ONLY them, or nothing, as the current site resource.  Rather than being of the "something was missing" variety, errors will instead be of the "I expected a site of type X, instead I got one of type Y" variety, which tend to be easier to diagnose. Sketch of an interface:

Site resources call Request.setSiteResource() in locateChild - this method takes no arguments, and preserves the current URL being processed as well as the current resource.

Regular resources can, at any time, call Request.getSiteResource() or Request.getSiteURL() which return, respectively, the last resource to call setSiteResource() and a nevow.url-style object that indicates its location.

This was the intent of the original twisted.web Site object, although it has become clear over time that it isn't really feasible to use Site for this, since putting a Site anywhere but at the absolute top of the resource hierarchy causes other problems.
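A minimal Python sketch of how the two halves might fit together, walking the scenario from earlier in this message: an application, a generic directory-like resource below it that passes nothing down, and a small script below that. Everything here is hypothetical: `Request`, `Resource`, `traverse()` and the example classes are illustrations rather than twisted.web2's real API, and `getSiteURL()` returns a plain path string instead of a nevow.url-style object.

```python
# Hypothetical sketch of setSiteResource/getSiteResource/getSiteURL.
# Not twisted.web2's real API; URLs are plain path strings.


class Request:
    def __init__(self, path):
        self.segments = [s for s in path.split("/") if s]
        self.consumed = []            # path segments traversed so far
        self.currentResource = None   # resource whose locateChild is running
        self._site = None             # by default, no site resource is set
        self._siteURL = None

    def setSiteResource(self):
        # Takes no arguments: records the resource currently being
        # traversed and the URL at which it was reached.
        self._site = self.currentResource
        self._siteURL = "/" + "/".join(self.consumed)

    def getSiteResource(self):
        return self._site

    def getSiteURL(self):
        return self._siteURL


class Resource:
    def locateChild(self, request, segments):
        return None, ()               # default: no children

    def render(self, request):
        raise NotImplementedError


def traverse(root, request):
    """Call locateChild() repeatedly until all segments are consumed."""
    resource, segments = root, tuple(request.segments)
    while segments:
        request.currentResource = resource
        child, rest = resource.locateChild(request, segments)
        if child is None:
            raise LookupError("no child for %r" % (segments[0],))
        request.consumed.extend(segments[:len(segments) - len(rest)])
        resource, segments = child, rest
    request.currentResource = resource
    return resource


class ShopSite(Resource):
    """An application ('site') resource: marks itself, then delegates."""
    def __init__(self, user):
        self.user = user
        self.children = {"static": Directory({"cart": CartScript()})}

    def locateChild(self, request, segments):
        request.setSiteResource()
        return self.children.get(segments[0]), segments[1:]


class Directory(Resource):
    """A generic resource (think static.File) that passes nothing down."""
    def __init__(self, children):
        self.children = children

    def locateChild(self, request, segments):
        return self.children.get(segments[0]), segments[1:]


class CartScript(Resource):
    """A leaf script that needs the application's logged-in user."""
    def render(self, request):
        site = request.getSiteResource()
        return "%s's cart (shop at %s)" % (site.user, request.getSiteURL())


root = Directory({"shop": ShopSite("alice")})
request = Request("/shop/static/cart")
print(traverse(root, request).render(request))  # alice's cart (shop at /shop)
```

The generic Directory never hands anything to its children; the script finds the user only because ShopSite marked itself during traversal, and the recorded URL tells it where the application is mounted.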

I still think that this sort of interface should be used sparingly, but it has several advantages over the context as manifested in Nevow.

 - The relationship between objects which require implicit state and objects which provide it is more explicit.  They must both be resources, one calls setSiteResource, one calls getSiteResource.

 - By default, no site resource is set, so you must always implement both halves of the interface.  In other words, this is entirely a facility for an application to communicate with itself: no framework code should ever provide a site resource, and no framework code should ever expect to find anything useful in the site resource.

 - All information required by a given application must be encapsulated by the single top-level resource.  No piecemeal assembling of required information from 5 different interfaces set by 5 different systems during resource traversal.  If you need information from a different system, the top-level resource can aggregate it using traditional means (methods returning different objects, attributes, adaptation, etc.).

 - All information is specifically "per HTTP request" rather than "per abstract transaction execution".  Nevow (or other templating / page-generating mechanisms used with web2) can be expected to do any necessary translations at render time rather than abusing the mechanism used for dispatch.

 - By providing a URL as well as a site object, sub-applications can properly link to resources defined by the application.  I'm sure that most people will ignore that and link directly to / most of the time, but at least it is the kind of issue which *can* be resolved by using this interface.

 - As I mentioned previously, error messages can be more detailed, since they will indicate what type the current site resource is, and thereby provide the developer some indication of how the offending resource got where it is.  Furthermore the URL can also be used to aid in error reporting, so the developer can tell "where" the path traversal went wrong.  If we want to push the error reporting even harder, we can have an explicit 'interface' argument which has to be passed to both setSiteResource and getSiteResource, used for nothing but matching up to make sure that the implicit state matches the application's expectations.
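A sketch of the stricter variant from the last point, where setSiteResource and getSiteResource both take an 'interface' argument used only for matching. Assumptions: plain Python classes stand in for interfaces (a real version might use zope.interface), the request is assumed to track `currentResource` and `currentURL` during traversal, and `SiteMismatch` is a hypothetical exception name.

```python
# Hypothetical sketch: both halves name the same 'interface' key, used
# only to detect and clearly report mismatches. Plain classes stand in
# for interfaces (an assumption; zope.interface would be more typical).


class SiteMismatch(TypeError):
    """Raised when the site resource is missing or of the wrong kind."""


class Request:
    def __init__(self):
        self.currentResource = None   # assumed to be maintained by traversal
        self.currentURL = "/"
        self._site = None             # (interface, resource, url) or None

    def setSiteResource(self, interface):
        resource = self.currentResource
        if not isinstance(resource, interface):
            raise SiteMismatch("%r does not provide %s"
                               % (resource, interface.__name__))
        self._site = (interface, resource, self.currentURL)

    def getSiteResource(self, interface):
        if self._site is None:
            raise SiteMismatch(
                "expected a site resource providing %s, but no site "
                "resource was set during traversal" % interface.__name__)
        declared, resource, url = self._site
        if declared is not interface:
            # Report both types and where the site was set, so the
            # developer can see how traversal got here.
            raise SiteMismatch(
                "expected a site providing %s, instead got %s "
                "(set by %s at %s)" % (interface.__name__, declared.__name__,
                                       type(resource).__name__, url))
        return resource
```

With a `ShopSite` registered under `IShop` at `/shop`, a resource asking for `IWiki` gets `SiteMismatch: expected a site providing IWiki, instead got IShop (set by ShopSite at /shop)` rather than a bare AttributeError.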

Thoughts?