[Twisted-web] HTTP-AUTH for web2 / Kudos on web2's operation

Fri Nov 18 14:47:02 MST 2005

Glyph,

Thank you for taking the time to discuss this further.  I think I disagree
that twisted core currently is, or should be, an object publishing system.
It's ok if Nevow is an object publishing system, but you should not
restrict the applications of twisted.web2 to object publishing.

By an "object publishing system", I mean a system where every object
in the system is a Resource, and hence has a *unique* URL.  That is,
if I have two distinct objects in the system, they have different
URLs; and if I have two URLs that refer to actual resources in the
system, they refer to different objects.

For starters, some objects in the system (such as a Session) do not
have a URL by default, and thus, by definition, are not Published
Objects (aka Resources).  But the current implementation of web2 goes
even further: two distinct "Resource" objects can be reached through
the same URL (see web2.static.File, which dynamically creates a
Resource object for its children).

But overall, I think the decision to give "special meaning" to path
segments is a mistake at such a low level of web2; it implies a 1-1
correspondence that doesn't actually exist.  A better low-level
interface would be something like:

from zope.interface import Interface

class IRequestHandler(Interface):
    def handleRequest(request):
        """Return one of:
          - an IResponse to be returned to the client
          - an IRequestHandler to be used for further processing
        or a Deferred which yields one of the above.
        """
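
A minimal sketch of how a server might drive such handlers, in plain Python with no Twisted imports: it keeps calling handleRequest() until something other than another handler comes back. Request, Response, RequestHandler and process() are hypothetical stand-ins rather than web2 classes, plain isinstance() checks replace real interface checks, and Deferred results are left out.

```python
class Request:
    """Hypothetical minimal request: just a path for this sketch."""
    def __init__(self, path):
        self.path = path

class Response:
    """Hypothetical stand-in for an IResponse."""
    def __init__(self, code, body):
        self.code = code
        self.body = body

class RequestHandler:
    """Stand-in for IRequestHandler: subclasses override handleRequest()."""
    def handleRequest(self, request):
        raise NotImplementedError

def process(handler, request, max_steps=100):
    """Call handleRequest() repeatedly until a Response is produced.

    Each step returns either a Response (done) or another RequestHandler
    (keep going). Deferred results are not handled in this sketch.
    """
    for _ in range(max_steps):
        result = handler.handleRequest(request)
        if isinstance(result, Response):
            return result
        if not isinstance(result, RequestHandler):
            raise TypeError("handler returned %r" % (result,))
        handler = result
    raise RuntimeError("too many handler steps")

class Hello(RequestHandler):
    def handleRequest(self, request):
        return Response(200, "hello from " + request.path)

class Dispatch(RequestHandler):
    def handleRequest(self, request):
        # Routes on the whole path; no notion of path segments is required.
        if request.path.startswith("/hello"):
            return Hello()
        return Response(404, "not found")
```

For example, process(Dispatch(), Request("/hello/world")) ends in Hello and yields a 200 response. The max_steps guard is there because nothing stops a handler from returning itself.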

Then, an IResource is defined as a _kind_ of request handler that eats
exactly one path segment from the request, and it splits handleRequest
into two cases: (a) one that returns another IResource, aka locateChild(),
or (b) one that returns a Response, aka render().  Still, an IResource
is a very special kind of IRequestHandler: one that respects the
uniqueness constraints of an object publishing system.
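
To illustrate that layering, here is a hedged sketch of a resource viewed as a request handler: its handleRequest() eats one path segment and dispatches to locateChild() while segments remain, or to render() once they run out. For brevity this locateChild() takes a single segment and returns the child (or None); that is a simplification, not web2's actual locateChild() signature. All class names are hypothetical.

```python
class Response:
    def __init__(self, code, body):
        self.code, self.body = code, body

class Request:
    def __init__(self, path):
        # Split "/a/b" into ["a", "b"]; these are the segments a resource eats.
        self.segments = [s for s in path.split("/") if s]

class Resource:
    """Hypothetical IResource-like base: locateChild() plus render()."""
    def locateChild(self, request, segment):
        return None  # no children by default

    def render(self, request):
        return Response(200, self.__class__.__name__)

    def handleRequest(self, request):
        # The IRequestHandler view of a resource: eat exactly one segment.
        if not request.segments:
            return self.render(request)
        child = self.locateChild(request, request.segments.pop(0))
        if child is None:
            return Response(404, "not found")
        return child

class Leaf(Resource):
    pass

class Root(Resource):
    def __init__(self):
        self.children = {"leaf": Leaf()}

    def locateChild(self, request, segment):
        return self.children.get(segment)

def process(handler, request):
    # Keep handing the request on until a Response comes back.
    result = handler.handleRequest(request)
    while not isinstance(result, Response):
        result = result.handleRequest(request)
    return result
```

Here "/" renders Root, "/leaf" renders Leaf, and an unknown segment produces a 404.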

In this logic, an IAuthenticator is _not_ a resource, but rather an
IRequestHandler that does a bunch of checks but otherwise largely
passes the request through to the next processing stage.  I'm not sure
how an ISession fits into this, but it is not a request handler (and it
certainly isn't a resource).  A Session object is the product of an
ISessionManager request handler when applied to a particular request.
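
A rough sketch of the two handlers just described: the authenticator only checks a header and then passes the request through without consuming any path, and the session manager produces a Session for the request and stores it in a per-request mapping keyed by interface (the 'peer' mapping proposed further on). The header names, token check, cookie handling and every class here are hypothetical simplifications.

```python
class Response:
    def __init__(self, code, body):
        self.code, self.body = code, body

class Request:
    def __init__(self, headers):
        self.headers = headers
        self.peer = {}  # hypothetical: interface -> object, per request

class ISession:
    """Marker class used only as a mapping key (not a Twisted interface)."""

class Session:
    def __init__(self, sid):
        self.sid = sid

class Authenticator:
    """Not a resource: checks credentials, then passes the request on."""
    def __init__(self, next_handler, valid_tokens):
        self.next_handler = next_handler
        self.valid_tokens = valid_tokens

    def handleRequest(self, request):
        if request.headers.get("authorization") not in self.valid_tokens:
            return Response(401, "unauthorized")
        return self.next_handler  # no path segment consumed

class SessionManager:
    """A request handler whose product is a Session for this request."""
    def __init__(self, next_handler):
        self.next_handler = next_handler
        self.sessions = {}

    def handleRequest(self, request):
        sid = request.headers.get("cookie", "anonymous")
        # Reuse the existing session for a known cookie, else create one.
        request.peer[ISession] = self.sessions.setdefault(sid, Session(sid))
        return self.next_handler

class App:
    def handleRequest(self, request):
        return Response(200, "session " + request.peer[ISession].sid)

def process(handler, request):
    result = handler.handleRequest(request)
    while not isinstance(result, Response):
        result = result.handleRequest(request)
    return result
```

Chaining them as Authenticator(SessionManager(App()), {"secret"}) keeps both checks out of the URL space entirely.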

In logical terms, the ISessionManager should associate each IRequest
with an ISession; you can then call adapt(request, ISession) to obtain
the session.  Whether the IRequest interface provides a shortcut for
this is really an implementation detail, but one with clear value.

In summary, I think you're confusing arbitrary objects in the system
with Resources, and I think the web2 module is already overly complicated
because it addresses a higher level of abstraction than is strictly
required.  In my application, I do not have Resources in the sense of
an object publishing system, nor do I want to be burdened with this
distinction.  I have my own URL processing, and I don't find the web2
concept of "segments" helpful.

Following are specific comments related to the above...

On Fri, Nov 18, 2005 at 02:48:17PM -0500, glyph at divmod.com wrote:
| Graphs can be problematic as a web data structure, for example, graphs 
| can have cycles, which Nevow specifically disallows (and I think that is 
| generally a good decision).

If you're talking about Resources, yes, I absolutely agree.  However,
this is not a necessary restriction on a RequestHandler; indeed, a
RequestHandler might return itself from handleRequest.  That forms
a cyclic graph, no?

| You'll continue to be able to do that indefinitely.  It certainly breaks 
| encapsulation however, and encouraging it as a general technique will 
| almost certainly create problems related to namespace clashes.  We 
| shouldn't break it, but we also shouldn't suggest it.

Ok.  The _primary_ problem with attaching data to Request objects is
that you don't want to get into naming clashes.  This is legitimate.  I
think some sort of adapt() mechanism is needed.  How about this...

Each IRequest object would have a member variable, 'peer', which is a
mapping from interfaces, such as IFoo, onto the objects that implement
them.  So, request.peer[ISession] would give me the session associated
with that request.  The appropriate __conform__ logic can also be
implemented so that adapt(peer, ISession) works.
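
A minimal sketch of the 'peer' idea in plain Python. The adapt() below implements only the __conform__ step of PEP 246-style adaptation (a full adapt() would also try isinstance() checks and the protocol's __adapt__ hook), and Peer, Request and the marker classes are hypothetical. Request also delegates __conform__ to its peer mapping, so both adapt(request.peer, ISession) and adapt(request, ISession) work.

```python
class AdaptationError(TypeError):
    pass

def adapt(obj, protocol):
    """Tiny subset of PEP 246 adapt(): only consults obj.__conform__."""
    conform = getattr(type(obj), "__conform__", None)
    if conform is not None:
        result = conform(obj, protocol)
        if result is not None:
            return result
    raise AdaptationError("cannot adapt %r to %r" % (obj, protocol))

class Peer(dict):
    """Maps an interface (used as a key) to the object providing it."""
    def __conform__(self, protocol):
        return self.get(protocol)

class Request:
    def __init__(self):
        self.peer = Peer()

    def __conform__(self, protocol):
        # Delegate to the peer mapping so adapt(request, I) works too.
        return self.peer.get(protocol)

class ISession:
    """Marker interface (hypothetical)."""

class IFoo:
    """Another marker interface (hypothetical)."""
```

Because the keys are interface objects rather than strings, two applications can only clash if they deliberately share an interface.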

| I've gone through that message now and more thoroughly understood what is 
| going on.  Those stages are interesting, but I don't think that any of 
| them belong in twisted.web2.  Twisted's model of web interoperability is, 
| and has always been, object publishing.  We aren't going to change that 
| to a stage-based or filter-based scheme.

Assume for a moment that IRequestHandler is the basis for web2, 
and that IResource layers on "object publishing" semantics.  Further
assume that the 'peer' attribute on each request maps interfaces
onto objects associated with that interface.

In this case, my "alternative" to object publishing has an IState
object associated with each request, and an IStage interface that
inherits from IRequestHandler.  Given that, there is no reason why
I should be forced to layer my IStage on top of an IResource; my
stages are not resources.
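
A hedged sketch of that arrangement: each stage is an ordinary request handler that records its work on the request's IState object and then names the next stage. No stage consumes path segments or needs a URL of its own. IState, State, Stage and the stage names are hypothetical, and a plain dict stands in for the proposed 'peer' mapping.

```python
class Response:
    def __init__(self, code, body):
        self.code, self.body = code, body

class Request:
    def __init__(self, path):
        self.path = path
        self.peer = {}  # stand-in for the proposed interface -> object mapping

class IState:
    """Marker key for the per-request state (hypothetical)."""

class State:
    def __init__(self):
        self.visited = []

class Stage:
    """Hypothetical IStage: a request handler, not a resource."""
    def __init__(self, name, next_stage=None):
        self.name = name
        self.next_stage = next_stage

    def handleRequest(self, request):
        # The first stage creates the shared state; later stages reuse it.
        state = request.peer.setdefault(IState, State())
        state.visited.append(self.name)
        if self.next_stage is None:
            return Response(200, " -> ".join(state.visited))
        return self.next_stage

def process(handler, request):
    result = handler.handleRequest(request)
    while not isinstance(result, Response):
        result = result.handleRequest(request)
    return result

pipeline = Stage("parse", Stage("authorize", Stage("render")))
```

Running any request through the pipeline leaves request.path untouched, which is the point: the application's own URL scheme stays in charge.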

| A resource is an object.  It may process requests for one user, or for 
| many.  In the twisted.cred model of looking at resources, each user's top 
| resource is unique to that user. 

Are all objects resources?  If not, what must an object have to be
a resource?  If the answer is "implements IResource", then I ask
you: is a Session a resource?  If so, what does its locateChild
look like?

| Depending on session management policy 
| the anonymous resource may or may not be shared between anonymous 
| sessions.  It may *wrap* a resource which is common to all users, but the 
| cred way of looking at an object is that each user has a distinct object 
| they communicate with, which determines their view of the world.

Ok.  That's good, an Avatar; but is an Avatar an IResource?

| Think of it this way: a resource should know what it looks like.  If you 
| are looking at a page that says "Welcome, Clark!", then "Welcome, Clark!" 
| should be an attribute of that resource.  Perhaps that data came from a 
| cookie, perhaps it was somehow identified by a session identifier in the 
| URL, but by whatever technique, by the time you are rendering a resource, 
| it should not be looking at the Request object to determine every little 
| thing about itself. 

Here is where we part ways.  This view of the processing model
is an unnecessary restriction and should not be placed upon web2.

| Things like accept-encodings and accept-languages 
| can modify or filter the result, but the basic data that's there should 
| be accessed by the application by looking at self, not by looking at 
| request.getSession().getComponent(IMyApplication).dataFor(self).(...) or 
| some similar monstrosity.

Compare: adapt(request, IState).bing

| >   (a) An Avatar is a "auto-generated" resource perhaps constructed
| >       from the SessionManager resource?
| 
| That's the way guard works and should continue to work, yes.

An avatar is not a resource; if it is, what is its URL?  What does it
look like (to phrase it with your definition)?

| >   (b) Each Request object would have a 'stack' of 'previous-resources'
| >       that it has visited?  And that I could ask for the 'Avatar'
| >       resource in that 'stack' via a method on the request object?
| 
| It's not a stack; certain resources can just put themselves into a slot.  
| If an API is provided to build up large amounts of implicit state through 
| accretion during resource traversal, then the request will snowball in 
| complexity as more and more junk gets stuck to it by different bits of 
| different applications.

Assuming a 'peer' mapping, you only need to access the peers
that your RequestHandler (or IResource) knows or cares about.

| getSession is designed to bridge requests automatically from within the 
| HTTP server's framework code, by setting cookies and such.  Session 
| management is a task that should be accomplished by a resource object 
| which can be independently tested, not by the server code.

No disagreement here.

| The proposed interface is something that would probably be *used* by a 
| session-manager resource, and might even represent the session, but its 
| purpose is simply to provide some per-request data that can be shared 
| between resources processing the same request, without resorting to 
| random attributes on the request, and with some way to link to the 
| resource that provided that data.

It is not necessary to link data associated with a request with 
the 'Resource' that provided the data.

| >Ok.  That's very nice.   Just remove the word 'Resource' and you're all
| >set; just let it be a regular object.
| 
| I suppose this doesn't make much difference.  I want it to be the 
| resource because the accompanying URL should point to it, but I suppose 
| that might be unnecessarily restrictive; at least the URL will point at 
| the thing that set it.

Well, if you want to _expose_ a URL to the user for them to view
their session, then it is indeed a Resource.  However, not all
sessions need to be Resources, no?

| >This similar system would work with Session then?
| >
| >  request.setSession( my ISession object )
| 
| We could call this object a session, although in that case there is no 
| "ISession" - as I mentioned before, the object passed is 
| application-specific, and the framework should expect absolutely nothing 
| from it.

request.peer[IMySession] = mysession

| >Ok.  This is where I get confused.  The top level resource can handle
| >multiple requests.  I think you're just referring to one's application
| >data?  Perhaps...
| >
| >  request.setAppData(an IAppData object)
| >
| >where IAppData is any old object that the application wants.
| 
| The topmost resource for a particular user is unique to that user, 
| assuming they have logged in with a system like cred.  It's shared among 
| all users if there is no session management going on - in which case, why 
| would you need to know the currently logged in user :).

Does this top-most "resource" have a URL?  If not, then it
isn't a resource.  *poke*

| error-reporting behavior with Nevow

Ouch.  Is this good?

| In a similar situation, I need a Foo on the request.  It's set by 
| /my-app/foo.  I put my Foo resources at /my-app/foo/<blah>.  Someone else 
| puts one at /my-app/stuff/extra/current-foo.  The error-reporting now 
| becomes:
| 
|  SiteMismatchError: expected current site resource to provide 'IFoo', but 
|  instead found 'IMyApp' <MyApp at 13715> from http://example.com/my-app/
| 
| It is then possible for the developer to insert a 'print' statement into 
| a working Foo and watch the logs, which would allow them to see that the 
| URL which sets the IFoo it's using is http://example.com/my-app/foo/ - 
| this might assist in figuring out how to set up a similar structure for 
| /stuff/.

Wow.  Is this good?

I think this is overly complicated, and it stems from the attempt to
make "everything a resource".  My system is very, very incompatible with
this approach; I have lots and lots of customers who have written
custom code that depends on my current URLs, so I cannot change them.
No way am I adding a /foo/ to my path to reflect that 'foo' logged in;
or perhaps I didn't understand.

I do hope I'm being helpful; I know it sounds argumentative, but
really, I'm trying to contribute.

Best,

Clark