[Twisted-web] HTTP-AUTH for web2 / Kudos on web2's operation

glyph at divmod.com
Fri Nov 18 12:48:17 MST 2005

On Thu, 17 Nov 2005 16:20:05 -0500, "Clark C. Evans" <cce at clarkevans.com> wrote:
>On Thu, Nov 17, 2005 at 08:17:25AM -0500, glyph at divmod.com wrote:
>| On Wed, 16 Nov 2005 11:11:49 -0500, James Y Knight <foom at fuhm.net> wrote:
>| > I think of something like parsing user authentication information at a
>| > high level up in the resource tree and making it available to a
>| > Resource low in the tree.

>Must we assume that resources form a Tree?  Mine form a Graph; I have
>many resources which can be entered in any number of contexts with
>different "parent" resources.   Viewed this way, the resources which
>a _request_ has passed through forms a stack; but it isn't a tree.

From the perspective of the request, yes, resources form a tree.  Although a tree-style API can certainly be used to access a graph, there is a definite "up" and "down" *with respect to a particular request*.  If your resources have relationships outside of that, they can hold regular object references to each other and call methods on each other, but the web server should not grow abstractions specifically to support that.

Graphs can be problematic as a web data structure.  For example, graphs can have cycles, which Nevow specifically disallows (and I think that is generally a good decision).

>Currently, it is my application; and I modify the request object to
>my heart's content.  It may seem dangerous, but I've yet to hit a
>single bug from this usage.

You'll continue to be able to do that indefinitely.  It certainly breaks encapsulation, however, and encouraging it as a general technique will almost certainly create problems related to namespace clashes.  We shouldn't break it, but we also shouldn't suggest it.

>| Concretely: come to think of it I've never heard of a legitimate
>| requirement other than "currently logged in user", which ought to be
>| handled by another system anyway.
>See my previous email.  My resources go through many stages of
>processing and are heavily modified by the time I'm ready to
>generate a Response.

I've gone through that message now and understand more thoroughly what is going on.  Those stages are interesting, but I don't think that any of them belong in twisted.web2.  Twisted's model of web interoperability is, and has always been, object publishing.  We aren't going to change that to a stage-based or filter-based scheme.

>| Though I missed the conversation, I imagine the handwavy stuff that JP
>| was suggesting was "the top resource, which is a proxy for the user,
>| should pass information down to its children as it creates them".  Some
>| kind of explicit relationship is required.
>This statement, I don't get at all.  As I understand it, a resource
>processes more than one request; and hence, will be associated with
>more than one user.

A resource is an object.  It may process requests for one user, or for many.  In the twisted.cred model of looking at resources, each user's top resource is unique to that user.  Depending on session management policy the anonymous resource may or may not be shared between anonymous sessions.  It may *wrap* a resource which is common to all users, but the cred way of looking at an object is that each user has a distinct object they communicate with, which determines their view of the world.

Think of it this way: a resource should know what it looks like.  If you are looking at a page that says "Welcome, Clark!", then "Welcome, Clark!" should be an attribute of that resource.  Perhaps that data came from a cookie, perhaps it was somehow identified by a session identifier in the URL, but by whatever technique, by the time you are rendering a resource, it should not be looking at the Request object to determine every little thing about itself.  Things like accept-encodings and accept-languages can modify or filter the result, but the basic data that's there should be accessed by the application by looking at self, not by looking at request.getSession().getComponent(IMyApplication).dataFor(self).(...) or some similar monstrosity.
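
A minimal sketch of the "a resource should know what it looks like" idea, in plain Python rather than real twisted.web2 or cred classes.  The names WelcomePage and top_resource_for are hypothetical and exist only for illustration; the assumption is that some login or session step creates one top resource per user.

```python
class WelcomePage:
    """Hypothetical per-user resource that owns the data it displays."""

    def __init__(self, display_name):
        # Set once, when the resource is created for this user
        # (for example by a session-manager or cred-style login step).
        self.greeting = "Welcome, %s!" % (display_name,)

    def render(self, request):
        # The request could still filter the result (language, encoding),
        # but the basic content is read from self, not from the request.
        return self.greeting


def top_resource_for(username):
    # Hypothetical factory: each logged-in user gets a distinct object.
    return WelcomePage(username)


page = top_resource_for("Clark")
print(page.render(request=None))
```

The point of the sketch is only that render never has to dig through the request to find out who it is greeting.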

>| I'll here backpedal just a tiny bit on my earlier adamance on getSession
>| and say that perhaps there is a valid use for access to the currently
>| logged-in user as request data.  By "currently logged-in user" I mean
>| "'topmost' resource object".
>Sorry, I'm lost.   Are you saying that:
>    (a) An Avatar is a "auto-generated" resource perhaps constructed
>        from the SessionManager resource?

That's the way guard works and should continue to work, yes.

>    (b) Each Request object would have a 'stack' of 'previous-resources'
>        that it has visited?  And that I could ask for the 'Avatar'
>        resource in that 'stack' via a method on the request object?

It's not a stack; certain resources can just put themselves into a slot.  If an API is provided to build up large amounts of implicit state through accretion during resource traversal, then the request will snowball in complexity as more and more junk gets stuck to it by different bits of different applications.

>If so, it sounds more complicated than plain-old getSession().  Perhaps
>I just don't understand what problems you've had with getSession?

You're right that getSession and getSiteResource (or whatever this is called) are very similar, but there is a fundamentally different goal in mind.

getSession is designed to bridge requests automatically from within the HTTP server's framework code, by setting cookies and such.  Session management is a task that should be accomplished by a resource object which can be independently tested, not by the server code.

The proposed interface is something that would probably be *used* by a session-manager resource, and might even represent the session, but its purpose is simply to provide some per-request data that can be shared between resources processing the same request, without resorting to random attributes on the request, and with some way to link to the resource that provided that data.
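
A rough sketch of what such a per-request slot might look like, under stated assumptions: the Request class below is a stand-in written for this example, not the twisted.web2 request, and the setSiteResource/getSiteResource signatures are guesses based on the names used in this thread.  SessionManager and ProfilePage are hypothetical resources.

```python
class Request:
    """Stand-in request with a single 'site resource' slot (assumed API)."""

    def __init__(self):
        self._site = None

    def setSiteResource(self, resource, url):
        # One slot, not a stack: a later call replaces the earlier value.
        self._site = (resource, url)

    def getSiteResource(self):
        # Returns the (resource, url) pair that was set during traversal.
        return self._site


class SessionManager:
    """Hypothetical session-manager resource; it is ordinary, testable code."""

    def __init__(self, username):
        self.username = username

    def locateChild(self, request, segments):
        # Publish this resource, and the URL that points at it, on the request.
        request.setSiteResource(self, "http://example.com/my-app/")
        return ProfilePage(), segments[1:]


class ProfilePage:
    def render(self, request):
        # Shared per-request data comes from the slot, not from ad hoc
        # attributes stuck onto the request object.
        site, url = request.getSiteResource()
        return "profile for %s (site resource set at %s)" % (site.username, url)


request = Request()
child, remaining = SessionManager("clark").locateChild(request, ["profile"])
print(child.render(request))
```

Because the session manager is just a resource, it can be exercised in a test with a fake request, as the last three lines do.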

>| Sketch of an interface:


>Ok.  That's very nice.   Just remove the word 'Resource' and you're all
>set; just let it be a regular object.

I suppose this doesn't make much difference.  I want it to be the resource because the accompanying URL should point to it, but I suppose that might be unnecessarily restrictive; at least the URL will point at the thing that set it.

>This similar system would work with Session then?
>   request.setSession( my ISession object )

We could call this object a session, although in that case there is no "ISession" - as I mentioned before, the object passed is application-specific, and the framework should expect absolutely nothing from it.

>Ok.  This is where I get confused.  The top level resource can handle
>multiple requests.  I think you're just referring to one's application
>data?  Perhaps...
>   request.setAppData(an IAppData object)
>where IAppData is any old object that the application wants.

The topmost resource for a particular user is unique to that user, assuming they have logged in with a system like cred.  It's shared among all users if there is no session management going on - in which case, why would you need to know the currently logged in user :).

>| - As I mentioned previously, error messages can be more detailed, since
>| they will indicate what type the current site resource is, and thereby
>| provide the developer some indication of how the offending resource got
>| where it is.  Furthermore the URL can also be used to aid in error
>| reporting, so the developer can tell "where" the path traversal went
>| wrong.  If we want to push the error reporting even harder, we can have
>| an explicit 'interface' argument which has to be passed to both
>| setSiteResource and getSiteResource, used for nothing but matching up to
>| make sure that the implicit state matches the application's expectations.
>Sorry; I don't get this one -- do you mind explaining a bit more?

error-reporting behavior with Nevow:

I have a resource which must be rendered after a Foo has been put into the context, remembered with IFoo.  This happens when FooRoot's locateChild is called, on the context passed to locateChild.  I forget to put a FooRoot into the path, which I expect to be at /my-app/foo/<id>.

Keep in mind that locateChild is just one of many places the context is passed: it could be set in renderHTTP and used in the template, it could be set in locateChild and used in renderHTTP, it could be set somewhere early in the template and used somewhere later.

Someone else later places my Foo resource at /my-app/stuff/extra/current-foo.  They receive this error:

  KeyError: interface 'IFoo' not remembered

"Where does IFoo get remembered" can sometimes be as easy to find as a grep, but sometimes it's hard to figure out which of many places is being invoked in the actually-working example you're working from.

error-reporting behavior with interfaces and such using this system:

In a similar situation, I need a Foo on the request.  It's set by /my-app/foo.  I put my Foo resources at /my-app/foo/<blah>.  Someone else puts one at /my-app/stuff/extra/current-foo.  The error-reporting now becomes:

  SiteMismatchError: expected current site resource to provide 'IFoo', but instead found 'IMyApp' <MyApp at 13715> from http://example.com/my-app/

It is then possible for the developer to insert a 'print' statement into a working Foo and watch the logs, which would allow them to see that the URL which sets the IFoo it's using is http://example.com/my-app/foo/ - this might assist in figuring out how to set up a similar structure for /stuff/.
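
The interface-checked variant described above could be sketched roughly as follows.  Everything here is an assumption made for illustration: plain marker classes stand in for real interfaces, the Request class is not the twisted.web2 one, and the exact signatures and message wording are modelled on the example error above rather than on any existing code.

```python
class SiteMismatchError(Exception):
    """Raised when the site resource was set under a different interface."""


class IFoo:
    """Marker standing in for an interface (assumption for this sketch)."""


class IMyApp:
    """Marker standing in for an interface (assumption for this sketch)."""


class MyApp:
    def __repr__(self):
        return "<MyApp at %d>" % (id(self),)


class Request:
    """Stand-in request; the interface argument is used only for matching."""

    def __init__(self):
        self._slot = None

    def setSiteResource(self, interface, resource, url):
        self._slot = (interface, resource, url)

    def getSiteResource(self, interface):
        if self._slot is None:
            raise SiteMismatchError(
                "expected current site resource to provide %r, "
                "but none was set" % (interface.__name__,))
        found, resource, url = self._slot
        if found is not interface:
            # Report what was expected, what was found, and where it was set.
            raise SiteMismatchError(
                "expected current site resource to provide %r, "
                "but instead found %r %r from %s"
                % (interface.__name__, found.__name__, resource, url))
        return resource


request = Request()
request.setSiteResource(IMyApp, MyApp(), "http://example.com/my-app/")
try:
    request.getSiteResource(IFoo)
except SiteMismatchError as e:
    message = str(e)
    print("SiteMismatchError:", message)
```

The message names the URL that set the slot, which is the extra hint a bare KeyError does not give.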
