[Twisted-web] HTTP-AUTH for web2 / Kudos on web2's operation

Fri Nov 18 17:54:58 MST 2005

On Fri, 18 Nov 2005 16:47:02 -0500, "Clark C. Evans" <cce at clarkevans.com> wrote:

>Thank you for taking time to discuss this more.

Thanks.  I am trying to be more involved in Twisted's direction.  I've been auditing some code lately (not necessarily in web2 ;-)), and finding some good stuff and some unpleasant surprises.  The surprises stem from people trying to preserve my original design constraints without really understanding what they were for or what's going on, so I'm trying to be a bit more forceful and direct, rather than hanging back and not saying anything when I don't have time to code it myself.

>I think I disagree that
>twisted core currently is, or should be an object publishing system.

I'm sorry that you disagree, Clark, but that's the design :).  We thought for a while about taking the resource model out of twisted.web2 (we even had some real-life, face-to-face meetings about it!  with someone taking minutes and everything!), but if you take it out, there isn't any way to integrate with cred, and thereby the rest of Twisted.

Object publishing will remain the design.  You seem to have some misconceptions about what that means, though...

>By an "object publishing system", I mean a system where every object
>in the system is a Resource, and hence has a *unique* URL.  

Lucky for you, that's not what *I* mean when I say "object publishing system".  URLs are insufficiently descriptive anyway; things like the current time, cookies, and even random numbers can influence what object is present at a particular URL.  As you observe, sometimes it's dynamically calculated.

>For starters, some objects in the system (such as a Session) by
>default do not have a URL, and thus by definition, is not a
>Published Object (aka a Resource).  But the current implementation
>of web2 goes even further; it is possible for two distinct "Resource"
>objects to have been accessed by the same URL (see web2.static.File,
>which dynamically creates a Resource object for children).

As I've mentioned in a few previous posts, under guard, the SessionManager returns a Resource which corresponds to the current session.  Generally the path portion of that URL will simply be "/" for whatever server you're on.

>Then, an IResource is defined as a _kind_ of request handler that eats
>exactly one path segment

Only being able to eat exactly one path segment is a design error; there is no reason for any interface to support only one segment at a time, cf. http://divmod.org/trac/wiki/PotatoProgramming
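
To make the multi-segment point concrete, here is a minimal sketch in plain Python.  It is a toy model loosely modelled on the locateChild convention of returning a child plus the segments left over; it is not the real twisted.web2 or Nevow API, and the names Leaf, DatedArchive and traverse are made up for illustration.

```python
# Toy model only: illustrative classes, not twisted.web2's or Nevow's real API.

class Leaf:
    """A resource with no children."""

    def __init__(self, body):
        self.body = body

    def locateChild(self, segments):
        # No children here; None signals "not found" to traverse().
        return None, ()

    def render(self):
        return self.body


class DatedArchive:
    """Consumes three segments (year/month/day) in a single locateChild call."""

    def locateChild(self, segments):
        if len(segments) < 3:
            return None, ()
        year, month, day = segments[:3]
        body = "posts for %s-%s-%s" % (year, month, day)
        # Hand back whatever was not consumed.
        return Leaf(body), segments[3:]

    def render(self):
        return "archive index"


def traverse(root, path):
    """Walk the path, letting each resource eat as many segments as it likes."""
    segments = tuple(s for s in path.split("/") if s)
    resource = root
    while segments:
        resource, segments = resource.locateChild(segments)
        if resource is None:
            return None
    return resource
```

With this shape, traverse(DatedArchive(), "/2005/11/18") reaches the day's page in one step instead of needing three intermediate one-segment objects.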

>from the request; and it breaks handleRequest
>into two cases:  (a) one that returns another IResource aka locateChild(),
>or (b) one that returns a Response, aka render(). However, an IResource
>is a very special kind of IRequestHandler -- one that respects the
>uniqueness constraints of an object publishing system.

Your "IRequestHandler" abstraction breaks all kinds of useful patterns for cooperation between different chunks of web code, as far as I can tell.  In any case, concretely speaking, IRequestHandler sounds exactly like IResource minus the ability to distinguish between different handlers for different paths.  (And yes, the HTTP specs deal explicitly with path segments, the URI specs deal with them, and browsers implement HTML specifically to deal with them.  They're not some imagined thing on the part of twisted web.)

>In this logic, an IAuthenticator is _not_ a resource, but rather an
>IRequestHandler that does a bunch of checks; but otherwise largely
>passes-through the request onto the next processing stage.

The equivalent IAuthenticator in the IResource model simply consumes no segments and defers to another resource.  All you're talking about is removing the ability to consume segments from the base API, making top level resources radically different from and incompatible with IResource objects which implement the bulk of existing, useful functionality in t.web and nevow.
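
As a rough illustration of "consumes no segments and defers to another resource", here is a sketch in plain Python.  It is a toy model, not the twisted.web2 API: the request is just a dict, and Page, Forbidden, RequireUser and traverse are hypothetical names invented for this example.

```python
# Toy model only: illustrative classes, not the real twisted.web2 API.

class Page:
    """An ordinary resource that eats exactly one segment per step."""

    def __init__(self, body, children=None):
        self.body = body
        self.children = children or {}

    def locateChild(self, request, segments):
        return self.children.get(segments[0]), segments[1:]

    def render(self, request):
        return self.body


class Forbidden:
    """Swallows the rest of the path and renders an error."""

    def locateChild(self, request, segments):
        return self, ()

    def render(self, request):
        return "403 forbidden"


class RequireUser:
    """An 'authenticator' expressed as a resource that consumes zero segments."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def _check(self, request):
        # Assumption for the sketch: a "user" key means the request is authenticated.
        return self.wrapped if request.get("user") else Forbidden()

    def locateChild(self, request, segments):
        # Zero segments consumed: the same tuple is passed along untouched.
        return self._check(request), segments

    def render(self, request):
        return self._check(request).render(request)


def traverse(root, request, path):
    segments = tuple(s for s in path.split("/") if s)
    resource = root
    while segments:
        resource, segments = resource.locateChild(request, segments)
        if resource is None:
            return None
    return resource
```

Because RequireUser speaks the same interface as Page, it can wrap the root or any subtree, and the wrapped resources need no knowledge of it.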

>In logical terms, the ISessionManager should associate each IRequest
>with an ISession; you can then adapt(request,ISession) to obtain the
>given session.  Whether the IRequest interface provides a short-cut for
>this is really an implementation detail; but one with clear value.

Out of curiosity, what methods do you think ISession provides?

As far as the functionality is concerned: SessionManager as a Resource would simply consume zero segments from locateChild, as I said above.

>In summary, I think you're confusing arbitrary objects in the system
>with Resources; and I think the web2 module is already overly-complicated
>since it is addressing a higher level of abstraction than what is
>absolutely required.   In my application, I do not have Resources
>via the definition of an object publishing system -- nor do I want
>to be burdened with this distinction.  I have my own URL processing
>and I don't find the web2 concept of "segments" helpful.

At this point we may have to agree to disagree.  I don't find your URL processing helpful, either, and I feel that the Resource API has proven itself over half a decade of web work, both my own and many others'.  Being able to consume multiple segments at once is an important feature, but it has been available in Nevow for quite some time now.

You can definitely implement your traversal scheme with the mechanisms provided in twisted.web2 at multiple levels, either on top of the base HTTP implementation or as a Resource, and if it works for you, please, be my guest.  However, the point of twisted.* is not to address the absolute minimum required basis for your application.  Its purpose is to provide an integration framework, in the spirit of the various specifications that it implements.  Your innovations might be neat, but, for example, /x=y/ setting variables is definitely outside the spirit of what the HTTP spec says.

In the context of nevow/web/web2, Resources seem to serve that integration purpose well for quite a few people.

>Following are specific comments related to the above...
(snip)
>Each IRequest object has a member variable, 'peer', which is a mapping
>from interfaces, such as IFoo onto the object that implements that
>interface.  So, request.peer[ISession] will give me the session
>associated with that request.  The appropriate __conform__ logic can
>also be implemented so that adapt(peer,ISession) works.

That's the way that Nevow's session handling works and I think it's worked out pretty poorly.  It leads to the same kind of confusion as the context.  I would prefer to avoid repeating that mistake.

>| I've gone through that message now and more thoroughly understood what is
>| going on.  Those stages are interesting, but I don't think that any of
>| them belong in twisted.web2.  Twisted's model of web interoperability is,
>| and has always been, object publishing.  We aren't going to change that
>| to a stage-based or filter-based scheme.
>
>Assume for a moment that IRequestHandler is the basis for web2,
>and that IResource layers on "object publishing" semantics.  Further
>assume that the 'peer' attribute on each request maps interfaces
>onto objects associated with that interface.

Now I'm assuming that I've somehow allowed two massive changes into Twisted for a benefit that I can't understand at all...

>there is no reason why
>I should be forced to layer my IStage on top of an IResource; my
>stages are not resources.

In fact there are lots of good reasons.  The main one is that by layering IStage on top of IResource, you can defer back to other IResources easily, and it is clear to the resources what portion of the path they should be handling.  Another is that someone else might want to have your Stage only apply to resources below a particular tree, let's say /cceapp/.
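
Both reasons can be sketched in a few lines of plain Python.  This is a toy model, not twisted.web2 itself; Page, Stage and traverse are hypothetical names, and the "stage" here does nothing but record the segments it was handed.

```python
# Toy model only: illustrative classes, not the real twisted.web2 API.

class Page:
    def __init__(self, body, children=None):
        self.body = body
        self.children = children or {}

    def locateChild(self, segments):
        return self.children.get(segments[0]), segments[1:]

    def render(self):
        return self.body


class Stage:
    """A processing stage written as a resource that consumes zero segments."""

    def __init__(self, wrapped, log):
        self.wrapped = wrapped
        self.log = log

    def locateChild(self, segments):
        # The stage sees only the part of the path below its mount point.
        self.log.append(segments)
        return self.wrapped, segments

    def render(self):
        self.log.append(())
        return self.wrapped.render()


def traverse(root, path):
    segments = tuple(s for s in path.split("/") if s)
    resource = root
    while segments:
        resource, segments = resource.locateChild(segments)
        if resource is None:
            return None
    return resource


log = []
app = Page("app", {"report": Page("report")})
# The stage is mounted only under /cceapp/; /other never passes through it.
root = Page("root", {"cceapp": Stage(app, log), "other": Page("other")})
```

Traversing /cceapp/report runs the stage with the remaining segments ("report",), while /other bypasses it entirely; scoping the stage is just a matter of where it is placed in the tree.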

>Are all objects resources?  If not, what must an object have to be
>a resource?  If the answer is "implements IResource", then I ask
>you, is a Session a resource?  If so, what does its locateChild
>look like?

nevow/guard.py lines 289 to 326. ;-)

Actually, that's slightly wrong.  The 'session' is a user-provided resource, whose locateChild does whatever they want.  The locateChild I'm referring to there does session management.

>| Depending on session management policy
>| the anonymous resource may or may not be shared between anonymous
>| sessions.  It may *wrap* a resource which is common to all users, but the
>| cred way of looking at an object is that each user has a distinct object
>| they communicate with, which determines their view of the world.
>
>Ok.  That's good, an Avatar; but is an Avatar an IResource?

"avatar" is a general term which means "implementation of protocol-specific interface which represents a session with a user, or the special anonymous 'user'".  In web-land it is generally an IResource, but as I said I am open to other suggestions, provided they come along with some *significant* benefit.

>| (snip Resources should be self-contained)

>Here is where we part ways.  This view of the processing model
>is an unnecessary restriction and should not be placed upon web2.

Please name one way that using the convention of 'stages' being simply Resources which consume zero segments is 'restrictive'.

>| >   (a) An Avatar is a "auto-generated" resource perhaps constructed
>| >       from the SessionManager resource?
>|
>| That's the way guard works and should continue to work, yes.
>
>An avatar is not a resource; if it is, what is its URL?  What does it
>look like (to phrase it with your definition)?

Its URL is "/"+(implicit modifications by cookies and server's interpretation of cookies)
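
A very reduced sketch of that idea, in plain Python: the same path "/" yields a different object depending on the cookie.  This is not how nevow.guard is actually implemented; the request is a plain dict, and Avatar, SessionManager and login are names assumed purely for illustration.

```python
# Toy model only: a drastic simplification, not nevow.guard's implementation.

class Avatar:
    """The per-user resource; what a given user sees at "/"."""

    def __init__(self, name):
        self.name = name

    def locateChild(self, request, segments):
        return None, ()

    def render(self, request):
        return "hello, %s" % self.name


class SessionManager:
    """Root resource: consumes zero segments and picks an avatar by cookie."""

    def __init__(self, anonymous):
        self.anonymous = anonymous
        self.sessions = {}  # cookie value -> avatar resource

    def login(self, cookie, avatar):
        self.sessions[cookie] = avatar

    def _avatar(self, request):
        # Unknown or missing cookies fall back to the anonymous avatar.
        return self.sessions.get(request.get("cookie"), self.anonymous)

    def locateChild(self, request, segments):
        return self._avatar(request), segments

    def render(self, request):
        return self._avatar(request).render(request)


manager = SessionManager(Avatar("anonymous"))
manager.login("abc123", Avatar("alice"))
```

Rendering "/" with the cookie "abc123" gives alice's avatar, and rendering it with no cookie gives the anonymous one, even though the path portion of the URL is identical.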

>Assuming a 'peers' collection; you only need to access the peers
>that your RequestHandler (or IResource) knows or cares about.

I've worked with a couple of systems that worked that way, and that's generally not what happens.  People notice that 'peers' (as you're calling them) are handily available in some context they're working in, and start using them.  Then they can't figure out how to write test cases for their own code because they don't know how that contextual information got set up.  Also, their model objects are totally broken without lots of implicit context from the web-rendering code path.  See also Zope's now-abandoned implicit acquisition for why this is bad.

>| getSession is designed to bridge requests automatically from within the
>| HTTP server's framework code, by setting cookies and such.  Session
>| management is a task that should be accomplished by a resource object
>| which can be independently tested, not by the server code.
>
>No disagreement here.

Great!  I was waiting for one of those :).

>| The proposed interface is something that would probably be *used* by a
>| session-manager resource, and might even represent the session, but its
>| purpose is simply to provide some per-request data that can be shared
>| between resources processing the same request, without resorting to
>| random attributes on the request, and with some way to link to the
>| resource that provided that data.
>
>It is not necessary to link data associated with a request with
>the 'Resource' that provided the data.

I disagree.  The only reason to avoid providing this kind of information is if performance requirements dictate that it is too expensive.

>| I suppose this doesn't make much difference.  I want it to be the
>| resource because the accompanying URL should point to it, but I suppose
>| that might be unnecessarily restrictive; at least the URL will point at
>| the thing that set it.
>
>Well, if you want to _expose_ a URL to the user for them to view
>their session; then, it is indeed a Resource.  However, not all
>sessions need to be Resources, no?

I don't understand what you mean by "session".  Broadly, your session is simply everything you could possibly access with the credentials and client-side state you have currently provided.  Perhaps you are talking about some smaller in-memory object which is an implementation detail of the session manager, hooking your "session" in the broad sense to your web browser.  In the guard sense these implementation details are hidden entirely from the user or the application programmer, and the visible session-object abstraction is the resource that the user is viewing.  Session-specific data can be attributes of 'self' on that resource, because presumably it affects the view; that same resource can then access those attributes of 'self' and pass them to its children or render them in renderHTTP.
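
A small sketch of session data living on 'self' and being handed explicitly to children, in plain Python.  It is a toy model rather than Nevow or web2 code: UserHome and Inbox are hypothetical, and render stands in for renderHTTP.

```python
# Toy model only: illustrative classes; render() stands in for renderHTTP.

class Inbox:
    """A child resource that receives its state explicitly from its parent."""

    def __init__(self, owner, unread):
        self.owner = owner
        self.unread = unread

    def locateChild(self, segments):
        return None, ()

    def render(self):
        return "%s has %d unread messages" % (self.owner, self.unread)


class UserHome:
    """The visible 'session' object: per-user state is plain attributes."""

    def __init__(self, username, unread):
        self.username = username
        self.unread = unread

    def locateChild(self, segments):
        if segments[0] == "inbox":
            # Session-specific data is passed down, not looked up
            # from some implicit per-request context.
            return Inbox(self.username, self.unread), segments[1:]
        return None, ()

    def render(self):
        return "home of %s" % self.username
```

One consequence of this design is that Inbox("bob", 0).render() can be exercised directly in a test, with no request or session machinery set up first.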

[snip more stuff about 'peer', I think I already addressed that enough times]

>Does this top-most "resource" have a URL?  If not, then it
>isn't a resource.  *poke*

Yes, the top-most resource is /.

>| error-reporting behavior with Nevow
>
>Ouch.  Is this good?

Noooo... it is exactly the thing I am trying to get away from.

(other case)

>Wow.  Is this good?

Better than the Nevow case, at least.  I'm not a big fan of per-request state; I think it should be used sparingly.

>No way am I adding a /foo/ to my path to reflect that 'foo' logged-in;
>or perhaps I didn't understand.

Nope, /foo/ is just some random application component that lives at that URL, which has child resources that depend on some implicit state it provides.