Opened 12 years ago
Closed 12 years ago
#2200 enhancement closed fixed (fixed)
Resource lookup is slow
Reported by: | Cyrus Daboo | Owned by: | Wilfredo Sánchez Vega |
---|---|---|---|
Priority: | highest | Milestone: | |
Component: | web2 | Keywords: | |
Cc: | Wilfredo Sánchez Vega, jknight | Branch: | |
Author: |
Description
When doing e.g., a PROPFIND on a large collection, performance is not good. This is because the children of the targeted resource are found using twisted.web2.server.locateResource. That method travserses from the site.resource down to the child for each look. Given that we know the parent resource and its URI there is no need to do the whole traversal when all we want are the child resources.
The attached patch adds a new locateChildResource method that starts with the known resource andworks down - its much faster. The patch also includes a new URL->resource map saved in the samway that the resource->URL map is. This helps avoid the doing a full lookup each time the same resource is looked up within the scope of one request.
Attachments (1)
Change History (31)
Changed 12 years ago by
Attachment: | twisted.web2.server.patch added |
---|
comment:1 Changed 12 years ago by
Cc: | Wilfredo Sánchez Vega added |
---|---|
Component: | web2.dav → web2 |
Owner: | changed from Wilfredo Sánchez Vega to jknight |
comment:2 Changed 12 years ago by
Owner: | changed from jknight to Wilfredo Sánchez Vega |
---|
comment:3 Changed 12 years ago by
Status: | new → assigned |
---|
comment:4 Changed 12 years ago by
Keywords: | review added |
---|
comment:5 Changed 12 years ago by
The change to cache lookups in locateResource probably needs to be thought through a little more. While it does make some sense that, in the context of a single request, looking up the same thing twice should produce the same result, that isn't going to be the behavior currently. Of course, usually we only look up resources once per request...and that doesn't even go through the locateResource function.
Are there other times when resource lookups may be done? If so, do they all want caching behavior? Maybe so, I'm not sure.
I can clearly see the reason for the locateChildResource addition, that is solving an obvious deficiency. I'm less clear on the reason for the caching url->resource. What calls that repeatedly?
This change is clearly wrong:
- segments = path.split("/") + segments = unquote(path).split("/")
resourcesByURL and urlsByResource are named backwards. xByY means to me you get an x when looking up by a y.
Also, this needs tests, of course.
comment:6 Changed 12 years ago by
Cc: | jknight added |
---|
comment:7 Changed 12 years ago by
url->resource caching is new here; the inverse cache already exists in trunk. One case in which it is looked up frequently is in implementing WebDAV ACLs, where we need to know a resource's URL in order to evaluate ACL rules. One example it in access control entries which are inherrited from another (typically a parent) resource. As a result, in a PROPFIND request, the same resources are looked up many times in a single request.
comment:8 Changed 12 years ago by
url->resource caching is new here; the inverse cache already exists in trunk
Right. The only part of my comment aimed at the inverse was that it's named wrong. And yes, it's named wrong in trunk too. :)
But, as to the rest, I still don't understand. Maybe if you point me at the code in question it'd be clearer.
I get the need for looking up child resources given a resource in hand, but I don't get the need to look up resources lots of times from the top (except that right now, without locateChildResource, that's the only way it can be done)
comment:9 Changed 12 years ago by
Status: | assigned → new |
---|
The change from:
segments = path.split("/")
assert segments[0] == "", "URL path didn't begin with '/': %s" % (path,)
segments = segments[1:]
segments = map(unquote, segments)
to:
segments = unquote(path).split("/")
assert segments[0] == "", "URL path didn't begin with '/': %s" % (path,)
segments = segments[1:]
is wrong.
It is possible for a "/" character (0x2F) to be quoted in the path. e.g. '/test/left%2Fright/child'. Doing unquote before split would result in segements: 'test', 'left', 'right', 'child', which is wrong. The unquote must be done on the segments AFTER the split.
comment:11 Changed 12 years ago by
Status: | new → assigned |
---|
comment:12 Changed 12 years ago by
OK, in r18607, I've committed a subset of this change plus some tests, and fixes to existing tests.
Missing so far is the reverse URL cache.
Also missing is the change to addResponseFilter()
, which isn't related to this bug, I think.
Also, I think that unquote(child_name)
in locateChildResource()
is unnecessary. Child names should already be unquoted by the caller, since the child name isn't a URL.
comment:13 Changed 12 years ago by
Owner: | changed from Wilfredo Sánchez Vega to David Reid |
---|---|
Status: | assigned → new |
comment:14 Changed 12 years ago by
Status: | new → assigned |
---|
First off, if twisted/web2/server.py needs something in twisted/web2/dav/util.py then that something needs to go somewhere else. In this case it's joinURL. Which goes back to #1569. We should have a standard way to do this, preferably sooner rather than later.
I'm a little wary of the name locateChildResource and the fact that it's placed on the request. However the former is completely accurate description it just seems like a potential source of confusion with locateChild. As to the latter it is nothing if not consistent, and moving it to the resource would just cause confusion.
The test coverage is a big improvement. The branch is currently building on all the buildbots that were green when I started this review. I'll update when they finish.
comment:15 Changed 12 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from David Reid to Wilfredo Sánchez Vega |
Status: | assigned → new |
Tests pass on buildbots. Only unrelated errors on suse-py2.5-select and osx-py2.4-select.
I'd like the joinURL issue to be addressed before this is merged.
comment:17 Changed 12 years ago by
That this ticket doesn't fix another issue which already exists in the code isn't a reason to shoot it down.
comment:18 Changed 12 years ago by
Keywords: | review added |
---|---|
Owner: | changed from Wilfredo Sánchez Vega to David Reid |
joinURL
dependency is gone now
comment:19 Changed 12 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from David Reid to Wilfredo Sánchez Vega |
Why is locateChildResource limited to immediate children?
It seems like 'I am at bar and i want to find /foo/bar/baz/bax' is just as valid a usecase.
You're not dealing with urlForResource returning none.
+ parentURL = self.urlForResource(parent) + if not parentURL.endswith("/"): + parentURL += "/" + url = parentURL + quote(childName)
Also should urlForResource raise an exception instead of returning None? It should almost always only return None if someone has instantiated and called urlFoResource on a resource that was not "located"
As a side note, the idea that a resource must always be "located" is starting to bother me more and more.
-David
comment:20 Changed 12 years ago by
A resource much be located in order for us to know a URL for it. I'd love to have other options there, but web2's architecture doesn't otherwise give resources any knowledge of what their URL(s) are. And, it only makes sense to know a URL in the context of a request, because locateChild() can present a whole different URL hierarchy based on the request.
comment:21 Changed 12 years ago by
I agree that urlForResource
should raise rather that return None
. Will fix.
comment:22 Changed 12 years ago by
Keywords: | review added |
---|---|
Owner: | changed from Wilfredo Sánchez Vega to David Reid |
OK, we now raise NoURLForResourceError if we can't find a URL for a resource.
comment:23 Changed 12 years ago by
locateChildResource
limited to immediate children because I don't need it. As soon as someone does need it, they are welcome to change childName to *children and do the recursion. Nothing here makes that impossible to do, but I don't think adding API before you need the functionality is a requirement.
comment:24 Changed 12 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from David Reid to Wilfredo Sánchez Vega |
You still haven't addressed the first two issues raised.
Most importantly parentURL = self.urlForResource(parent) will now raise a NoURLForResource error. Should the caller of locateChildResource have to deal with this?
and second, I still don't see why locateChildResource is limited to a single path segment.
A resource shouldn't have to lose out on the performance gain from taking advantage of knowing it's position in the tree just because it wants to find a grandchild or beyond.
comment:25 Changed 12 years ago by
On the second issue, nothing stops them from calling locateChildResource
again on the returned child.
comment:26 Changed 12 years ago by
Keywords: | review added |
---|---|
Owner: | changed from Wilfredo Sánchez Vega to David Reid |
First issue requires clearer documentation... done.
comment:27 Changed 12 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from David Reid to Wilfredo Sánchez Vega |
the web2 tests passed on all buildbots but there were unrelated failures in conch. Thanks for clearing up the documentation, we can defer a more flexible implementation of locateChildResource.
Merge.
comment:28 Changed 12 years ago by
Priority: | normal → highest |
---|
comment:29 Changed 12 years ago by
Status: | new → assigned |
---|
Will do when I'm not about to get on a plane for Saskatoon
comment:30 Changed 12 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
patch