Opened 8 years ago

Closed 8 years ago

#2200 enhancement closed fixed (fixed)

Resource lookup is slow

Reported by: Cyrus Daboo Owned by: wsanchez
Priority: highest Milestone:
Component: web2 Keywords:
Cc: wsanchez, jknight Branch:
Author: Launchpad Bug:

Description

When doing e.g., a PROPFIND on a large collection, performance is not good. This is because the children of the targeted resource are found using twisted.web2.server.locateResource. That method travserses from the site.resource down to the child for each look. Given that we know the parent resource and its URI there is no need to do the whole traversal when all we want are the child resources.

The attached patch adds a new locateChildResource method that starts with the known resource andworks down - its much faster. The patch also includes a new URL->resource map saved in the samway that the resource->URL map is. This helps avoid the doing a full lookup each time the same resource is looked up within the scope of one request.

Attachments (1)

twisted.web2.server.patch (6.8 KB) - added by Cyrus Daboo 8 years ago.
patch

Download all attachments as: .zip

Change History (31)

Changed 8 years ago by Cyrus Daboo

patch

comment:1 Changed 8 years ago by wsanchez

  • Cc wsanchez added
  • Component changed from web2.dav to web2
  • Owner changed from wsanchez to jknight

comment:2 Changed 8 years ago by wsanchez

  • Owner changed from jknight to wsanchez

comment:3 Changed 8 years ago by wsanchez

  • Status changed from new to assigned

comment:4 Changed 8 years ago by wsanchez

  • Keywords review added

comment:5 Changed 8 years ago by jknight

The change to cache lookups in locateResource probably needs to be thought through a little more. While it does make some sense that, in the context of a single request, looking up the same thing twice should produce the same result, that isn't going to be the behavior currently. Of course, usually we only look up resources once per request...and that doesn't even go through the locateResource function.

Are there other times when resource lookups may be done? If so, do they all want caching behavior? Maybe so, I'm not sure.

I can clearly see the reason for the locateChildResource addition, that is solving an obvious deficiency. I'm less clear on the reason for the caching url->resource. What calls that repeatedly?

This change is clearly wrong:

- segments = path.split("/") 
+ segments = unquote(path).split("/")

resourcesByURL and urlsByResource are named backwards. xByY means to me you get an x when looking up by a y.

Also, this needs tests, of course.

comment:6 Changed 8 years ago by jknight

  • Cc jknight added

comment:7 Changed 8 years ago by wsanchez

url->resource caching is new here; the inverse cache already exists in trunk. One case in which it is looked up frequently is in implementing WebDAV ACLs, where we need to know a resource's URL in order to evaluate ACL rules. One example it in access control entries which are inherrited from another (typically a parent) resource. As a result, in a PROPFIND request, the same resources are looked up many times in a single request.

comment:8 Changed 8 years ago by jknight

url->resource caching is new here; the inverse cache already exists in trunk

Right. The only part of my comment aimed at the inverse was that it's named wrong. And yes, it's named wrong in trunk too. :)

But, as to the rest, I still don't understand. Maybe if you point me at the code in question it'd be clearer.

I get the need for looking up child resources given a resource in hand, but I don't get the need to look up resources lots of times from the top (except that right now, without locateChildResource, that's the only way it can be done)

comment:9 Changed 8 years ago by Cyrus Daboo

  • Status changed from assigned to new

The change from:

segments = path.split("/")
assert segments[0] == "", "URL path didn't begin with '/': %s" % (path,)
segments = segments[1:]
segments = map(unquote, segments)

to:

segments = unquote(path).split("/")
assert segments[0] == "", "URL path didn't begin with '/': %s" % (path,)
segments = segments[1:]

is wrong.

It is possible for a "/" character (0x2F) to be quoted in the path. e.g. '/test/left%2Fright/child'. Doing unquote before split would result in segements: 'test', 'left', 'right', 'child', which is wrong. The unquote must be done on the segments AFTER the split.

comment:10 Changed 8 years ago by wsanchez

branch: web2-lookup-2200

comment:11 Changed 8 years ago by wsanchez

  • Status changed from new to assigned

comment:12 Changed 8 years ago by wsanchez

OK, in r18607, I've committed a subset of this change plus some tests, and fixes to existing tests.

Missing so far is the reverse URL cache.

Also missing is the change to addResponseFilter(), which isn't related to this bug, I think.

Also, I think that unquote(child_name) in locateChildResource() is unnecessary. Child names should already be unquoted by the caller, since the child name isn't a URL.

comment:13 Changed 8 years ago by wsanchez

  • Owner changed from wsanchez to dreid
  • Status changed from assigned to new

comment:14 Changed 8 years ago by dreid

  • Status changed from new to assigned

First off, if twisted/web2/server.py needs something in twisted/web2/dav/util.py then that something needs to go somewhere else. In this case it's joinURL. Which goes back to #1569. We should have a standard way to do this, preferably sooner rather than later.

I'm a little wary of the name locateChildResource and the fact that it's placed on the request. However the former is completely accurate description it just seems like a potential source of confusion with locateChild. As to the latter it is nothing if not consistent, and moving it to the resource would just cause confusion.

The test coverage is a big improvement. The branch is currently building on all the buildbots that were green when I started this review. I'll update when they finish.

comment:15 Changed 8 years ago by dreid

  • Keywords review removed
  • Owner changed from dreid to wsanchez
  • Status changed from assigned to new

Tests pass on buildbots. Only unrelated errors on suse-py2.5-select and osx-py2.4-select.

I'd like the joinURL issue to be addressed before this is merged.

comment:16 Changed 8 years ago by dreid

I this shouldn't be merged until #2224 is addressed.

comment:17 Changed 8 years ago by wsanchez

That this ticket doesn't fix another issue which already exists in the code isn't a reason to shoot it down.

comment:18 Changed 8 years ago by wsanchez

  • Keywords review added
  • Owner changed from wsanchez to dreid

joinURL dependency is gone now

comment:19 Changed 8 years ago by dreid

  • Keywords review removed
  • Owner changed from dreid to wsanchez

Why is locateChildResource limited to immediate children?

It seems like 'I am at bar and i want to find /foo/bar/baz/bax' is just as valid a usecase.

You're not dealing with urlForResource returning none.

+        parentURL = self.urlForResource(parent)
+        if not parentURL.endswith("/"):
+            parentURL += "/"
+        url = parentURL + quote(childName)

Also should urlForResource raise an exception instead of returning None? It should almost always only return None if someone has instantiated and called urlFoResource on a resource that was not "located"

As a side note, the idea that a resource must always be "located" is starting to bother me more and more.

-David

comment:20 Changed 8 years ago by wsanchez

A resource much be located in order for us to know a URL for it. I'd love to have other options there, but web2's architecture doesn't otherwise give resources any knowledge of what their URL(s) are. And, it only makes sense to know a URL in the context of a request, because locateChild() can present a whole different URL hierarchy based on the request.

comment:21 Changed 8 years ago by wsanchez

I agree that urlForResource should raise rather that return None. Will fix.

comment:22 Changed 8 years ago by wsanchez

  • Keywords review added
  • Owner changed from wsanchez to dreid

OK, we now raise NoURLForResourceError if we can't find a URL for a resource.

comment:23 Changed 8 years ago by wsanchez

locateChildResource limited to immediate children because I don't need it. As soon as someone does need it, they are welcome to change childName to *children and do the recursion. Nothing here makes that impossible to do, but I don't think adding API before you need the functionality is a requirement.

comment:24 Changed 8 years ago by dreid

  • Keywords review removed
  • Owner changed from dreid to wsanchez

You still haven't addressed the first two issues raised.

Most importantly parentURL = self.urlForResource(parent) will now raise a NoURLForResource error. Should the caller of locateChildResource have to deal with this?

and second, I still don't see why locateChildResource is limited to a single path segment.

A resource shouldn't have to lose out on the performance gain from taking advantage of knowing it's position in the tree just because it wants to find a grandchild or beyond.

comment:25 Changed 8 years ago by wsanchez

On the second issue, nothing stops them from calling locateChildResource again on the returned child.

comment:26 Changed 8 years ago by wsanchez

  • Keywords review added
  • Owner changed from wsanchez to dreid

First issue requires clearer documentation... done.

comment:27 Changed 8 years ago by dreid

  • Keywords review removed
  • Owner changed from dreid to wsanchez

the web2 tests passed on all buildbots but there were unrelated failures in conch. Thanks for clearing up the documentation, we can defer a more flexible implementation of locateChildResource.

Merge.

comment:28 Changed 8 years ago by wsanchez

  • Priority changed from normal to highest

comment:29 Changed 8 years ago by wsanchez

  • Status changed from new to assigned

Will do when I'm not about to get on a plane for Saskatoon

comment:30 Changed 8 years ago by wsanchez

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [18813]) Merge web2-lookup-2200.
Reviewed by dreid
Fixes #2200

Note: See TracTickets for help on using tickets.