[Twisted-web] HTTP-AUTH for web2 / Kudos on web2's operation

Sat Nov 19 22:47:08 MST 2005

(another long one, sorry; but I spent some time making it... shorter)

On Fri, Nov 18, 2005 at 07:54:58PM -0500, glyph at divmod.com wrote:
| Object publishing will remain the design.  You seem to have some 
| misconceptions about what that means, though...

Thanks, this is very helpful. I think the heart of this thread, as
observantly pointed out by L. Daniel Burr, is the definition of a
"resource", and hence the question: what is expected of objects that
implement IResource?

1.  If a resource exists, must it have a URI?

    No; a resource can exist without having a URI.  For example,
    "today's weather in Xocen, Mexico" is a resource, yet, I did 
    not find a weather page with this information. However, does 
    it make sense for an object to implement IResource if it does 
    not have a URI?

2.  May two different resources have the same URI?

    No; this is the point of being an identifier.  Certainly in real
    life, one's identifier, such as your name, may only be unique within
    a particular context.  However, this isn't real life.

3.  May a resource have more than one URI?

    Yes; a resource may have more than one identifier.  It is common
    for concepts, people, places and things to have more than one name.
    In databases, you might have a primary and a secondary key; and if
    you are to represent a drill-down via these two identifiers you
    will make two different URLs.  For example:

        /office-by-location/us/ct/bridgeport/
        /office-by-telephone/1.203.343.3333/

    I'm not saying these are great examples, but both of these URIs
    could in fact refer to the same office IResource.

4.  May a URI refer to two different resources over the course of time?

    No; the rationale for URIs is that they are stable resource
    identifiers and that the concept of the resource should not
    change.  However, the entity which the resource returns
    for a given "content negotiation" may vary over time in a 
    manner consistent with the definition of the resource.
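
To make points 3 and 4 concrete, here is a rough sketch in plain Python.
It is not the web2 API; Office, ROUTES and resolve are names I made up
for illustration.  Two URIs resolve to the same resource object, and
the entity it returns depends on what the client asks for:

```python
import json


class Office:
    """Hypothetical resource: one office, however it was looked up."""

    def __init__(self, city, phone):
        self.city = city
        self.phone = phone

    def render(self, accept="text/plain"):
        # The entity returned depends on content negotiation,
        # but the resource it represents stays the same.
        if accept == "application/json":
            return json.dumps({"city": self.city, "phone": self.phone})
        return "%s office, tel. %s" % (self.city, self.phone)


bridgeport = Office("bridgeport", "1.203.343.3333")

# Two drill-down paths, one per key, both naming the same object.
ROUTES = {
    "/office-by-location/us/ct/bridgeport/": bridgeport,
    "/office-by-telephone/1.203.343.3333/": bridgeport,
}


def resolve(uri):
    """Return the resource named by uri, or None if nothing has that name."""
    return ROUTES.get(uri)
```

Note that the mapping goes from URI to resource, so one URI can never
name two resources (point 2), while one resource can have several names.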

I don't know if this helps you; but it helps me organize my thoughts
about what I expect and don't expect from an object implementing
IResource.  Let me continue now by responding to particular points
in your email...

| URLs are insufficiently descriptive anyway; things like the 
| current time, cookies, and even random numbers can influence 
| what object is present at a particular URL.

Given the definition above, this would seem to violate the 'immutable'
property of a resource.  If I have the same URI, ideally, the system
would send the request to the same IResource object.  What representation 
(aka entity) that IResource object returns (if it returns one at all)
after content negotiation may of course vary substantially.

| As I've mentioned in a few previous posts, under guard, the 
| SessionManager returns a Resource which corresponds to the current 
| session.  Generally the path portion of that URL will simply be "/" for 
| whatever server you're on.

 ...

| The equivalent IAuthenticator in the IResource model simply consumes no 
| segments and defers to another resource. 

(hopefully useful nit-picking, otherwise just ignore)

Then this Session Resource will know about how to delegate to the 
next stage of the request?  Does the Session have a URL, or is it
the same URL as the SessionManager Resource?   How about the
Authenticator?  Do the Session and Authenticator have the same URL?

I think neither the SessionManager nor the Authenticator is a resource;
they are lower-level RequestHandlers that are triggered before resource
resolution even occurs. Session is probably a Resource, to which the
current Request object somehow refers. But it is not at the "top
of the resource tree". Its URL should be accessible, something like:
/sessions/3DBDK3DFE/ which publishes information about who the user is,
when they logged on, and details their activity.

| All you're talking about is removing the ability to consume segments 
| from the base API, making top level resources radically different from
| and incompatible with IResource objects which implement the bulk of
| existing, useful functionality in t.web and nevow.

 ...

| Its purpose is to provide an integration framework, in the spirit of 
| the various specifications that it implements.

My primary issue is that the IResource is over-used.  You have things
that are being shoe-horned into the resource model that aren't really
resources.   I think there is a lower-level abstraction that is missing.

| Out of curiosity, what methods do you think ISession provides?

For starters, user sessions completely break the REST model; and
are not friendly to debugging.  They should be used with extreme
caution.  That said, they are mandatory components of any commercial
system and therefore should have direct support -- warts and all.

I would use an ISession in my application only so that I can "authorize"
access to particular resources via the user's identity, and to track
what a particular user did at a particular time (I have to meet HIPAA
regulations for the system I've written).  So, the only things I need
from the session are:

  - the user with which the session was associated
  - some identifier that is at least unique for each user
  - when the session was created
  - when the session timed out

I do not require anything else; my application is otherwise stateless
except for this reporting requirement.  That said, I will need some sort
of 'job-management' system for long-running tasks; but these will be
resources in and of themselves.
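
The four session attributes listed above could be sketched roughly as
follows.  This is plain Python, not an existing ISession; the class
name, field names and the is_active helper are my own assumptions:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SessionRecord:
    """Hypothetical record of the only session data I need for auditing."""

    user: str                             # who the session belongs to
    session_id: str                       # identifier, unique at least per user
    created: datetime                     # when the session was created
    timed_out: Optional[datetime] = None  # when it timed out, if it has

    def is_active(self) -> bool:
        # A session with no time-out recorded is still live.
        return self.timed_out is None
```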

I would like it if sessions were actual Resources; but I do not require
that the URI for a request using the session contain the URI of the
session as a sub-string (as you seem to imply it should be).  A session
object needs to be accessible via a URI so that an administrator can
audit the activity of a questionable employee, etc.  

| At this point we may have to agree to disagree.  I don't find your URL 
| processing helpful, either, and I feel that the Resource API has proven 
| itself over the course of half a decade of my own, and many others', web 
| work by now.  Being able to consume multiple segments at once is an 
| important feature, but it's been around in Nevow for quite some time now.

I'll say it once more; I think your Resource method is delightful.
However, I think there is a very tiny lower layer you're missing.
Therefore, things that should be in that lower layer (filters,
authenticators, and session management) are instead being fudged as
exceptions or passed off as funky no-path-segment-consuming resources
that really cannot be accessed via their own URI.   *wink*

| >Each IRequest object has a member variable, 'peer', which is a mapping
| >from interfaces, such as IFoo onto the object that implements that
| >interface.  So, request.peer[ISession] will give me the session
| >associated with that request.  The appropriate __conform__ logic can
| >also be implemented so that adapt(peer,ISession) works.
| 
| That's the way that Nevow's session handling works and I think it's 
| worked out pretty poorly.  It leads to the same kind of confusion as the 
| context.  I would prefer to avoid repeating that mistake.

Ok.  Do we have a concrete suggestion for an alternative (that doesn't
have logically equivalent problems)?  I've not seen one yet.  This
particular proposal is obvious and straightforward (and clearly,
already implemented in Nevow).   You seem to be thinking that the "top
resource" could act as a place to put this stuff; but I'm not remotely
convinced that this isn't just an equivalent solution (different only by
the name of the ISession object).

What is the alternative, and why is it better?

| In fact there are lots of good reasons.  The main one is that by layering 
| IStage on top of IResource, you can defer back to other IResources 
| easily, and it is clear to the resources what portion of the path they 
| should be handling.  Another is that someone else might want to have your 
| Stage only apply to resources below a particular tree, let's say /cceapp/.

This is a very good point; I need to think on the 'mixing' concept
a bit more.  A consistent interface for all things (even if they
aren't really resources) is probably a good thing.  

| Your "IRequestHandler" abstraction breaks all kinds of useful patterns 
| for cooperation between different chunks of web code, as far as I can 
| tell.  In any case, concretely speaking, IRequestHandler sounds exactly 
| like IResource minus the ability to distinguish between different 
| handlers for different paths. 

I'm not saying that an IResource isn't a useful concept, nor that you'd
want to get rid of IResource.  What I'm saying is that if you _don't_
consume a path segment, you most likely do not have a resource.  Instead
you have an IRequestHandler.   Trying to model RequestHandlers as
Resources will lead to logical inconsistencies: you will assume that
each has a URI that names it, when in fact it doesn't.

Furthermore, if you distinguish between an IResource and a lower-level
IRequestHandler, you can extend the IResource with more goodies
such as a method to return the "Canonical URL" for that resource.
This would be a very valuable boon -- but it isn't possible for
your general IRequestHandler.
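
A minimal sketch of the split I have in mind, in plain Python rather
than real interfaces.  The class names and the handle/canonical_url
methods are hypothetical, not anything web2 defines today:

```python
class RequestHandler:
    """Lower layer: handles a request, but has no name of its own."""

    def handle(self, request):
        raise NotImplementedError


class Resource(RequestHandler):
    """Upper layer: a handler that is also named by a URI."""

    def __init__(self, canonical_url):
        self._canonical_url = canonical_url

    def canonical_url(self):
        # Only resources can answer this; a bare handler has no URI.
        return self._canonical_url

    def handle(self, request):
        return "representation of %s" % self._canonical_url


class Authenticator(RequestHandler):
    """Consumes no path segment; simply defers to the next handler."""

    def __init__(self, next_handler):
        self.next_handler = next_handler

    def handle(self, request):
        return self.next_handler.handle(request)
```

Here the Authenticator can wrap any Resource, yet nobody is tempted to
ask it for a canonical URL, because the method simply isn't there.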

| >Assuming a 'peers' collection; you only need to access the peers
| >that your IRequestHandler (or IResource) knows or cares about.
| 
| I've worked with a couple of systems that worked that way, and that's 
| generally not what happens.  People notice that 'peers' (as you're 
| calling them) are handily available in some context they're working in, 
| and start using them.  Then they can't figure out how to write test cases 
| for their own code because they don't know how that contextual 
| information got set up.  Also, their model objects are totally broken 
| without lots of implicit context from the web-rendering code path.  See 
| also Zope's now-abandoned implicit acquisition for why this is bad.

This is a *perfect* argument for why you need a Request-Handling stage
that happens before Resource processing.  If someone is adding those
'peers', they should be added via RequestHandlers -- not Resources.

This would also allow you to 'cache' resources via URI; since the
request handlers would have already been run.   Please consider for
a moment the distinction that I'm making... picture this handler chain:

 SessionManager -> Authentication -> Resource-Processor -> Content-Compressor

where Resource-Processor is an IRequestHandler that implements your
existing Resource resolution and passes the Response on to the
Content-Compressor as it heads back to the client.  

It's just a layer one lower than your Resources (where they fit
into a stage of the request processing).
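
Here is a rough sketch of that chain in plain Python.  None of this is
web2 code: Request, run_chain and the four stage functions are invented
for illustration, 'peer' is keyed by a plain string rather than an
interface, and the session table is a toy dict:

```python
import zlib


class Request:
    """Hypothetical request: a path, an optional cookie, a 'peer' mapping."""

    def __init__(self, path, cookie=None):
        self.path = path
        self.cookie = cookie
        self.peer = {}  # filled in by handlers, never by resources


def session_manager(sessions):
    """Stage 1: attach the session (if any) for the request's cookie."""
    def stage(request, response):
        request.peer["session"] = sessions.get(request.cookie)
        return response
    return stage


def authentication(request, response):
    """Stage 2: refuse requests that carry no known session."""
    if request.peer.get("session") is None:
        return b"401 Unauthorized"
    return response


def resource_processor(resources):
    """Stage 3: ordinary resource resolution, skipped if already answered."""
    def stage(request, response):
        if response is not None:
            return response
        return resources.get(request.path, b"404 Not Found")
    return stage


def content_compressor(request, response):
    """Stage 4: compress whatever response is heading back to the client."""
    return zlib.compress(response)


def run_chain(stages, request):
    """Pass the request through each stage in order, threading the response."""
    response = None
    for stage in stages:
        response = stage(request, response)
    return response


chain = [
    session_manager({"3DBDK3DFE": "clark"}),
    authentication,
    resource_processor({"/": b"hello"}),
    content_compressor,
]
```

Calling run_chain(chain, Request("/", cookie="3DBDK3DFE")) yields the
compressed body of "/", and by the time the resource stage runs the
session is already sitting in request.peer.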

| >| getSession is designed to bridge requests automatically from within the
| >| HTTP server's framework code, by setting cookies and such.  Session
| >| management is a task that should be accomplished by a resource object
| >| which can be independently tested, not by the server code.
| >
| >No disagreement here.
| 
| Great!  I was waiting for one of those :).

I don't think that we have much of a disagreement.  Your system is,
IMHO, stellar; otherwise I wouldn't be using it or spending time trying
to make it even better.

Kind Regards,

Clark