Opened 12 years ago
Last modified 11 years ago
#4519 enhancement new
Delay parsing of request body until Request.args is accessed
Reported by: | jamesh | Owned by: | rlotun |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | web | Keywords: | |
Cc: | Branch: | ||
Author: |
Description
At the moment, if the request body for a post is form data (either url encoded or multipart), it is read into memory as the Request.args dictionary. This means that large requests can chew up large amounts of memory.
Attached is a patch based on one from ticket #288 that delays parsing of the request body until Request.args is accessed. This should maintain compatibility with the existing API while allowing applications that are concerned about memory usage to parse the request body themselves.
Attachments (2)
Change History (8)
Changed 12 years ago by
Attachment: | twisted-web-request-args.patch added |
---|
comment:1 Changed 12 years ago by
Keywords: | review added |
---|
comment:2 Changed 12 years ago by
Keywords: | review removed |
---|---|
Owner: | changed from jknight to jamesh |
Thanks! A few simple things:
Request._args
should be documented. TheRequest.args
docs are fine as-is, since the interface isn't changing here.- The
args
getter function should have a docstring, though. - There are some potential ways in which application code could break arg parsing now:
- Changing the
method
oruri
attribute before accessing theargs
attribute. I'm not sure how important this is. It would be easy to save extra references to the original value and use those in theargs
getter, though. - Reading from
content
before accessingargs
. This would prevent theargs
getter from getting all of the request bytes and probably cause the parsing to fail. A similar issue is that accessing theargs
attribute will now cause aself.content.seek(0, 0)
which could be surprising.
- Changing the
- In the new test, the triples quotes of the docstring should be on lines by themselves.
comment:3 Changed 12 years ago by
Replying to jamesh:
At the moment, if the request body for a post is form data (either url encoded or multipart), it is read into memory as the Request.args dictionary. This means that large requests can chew up large amounts of memory.
Question concerning mem usage: As I read from the twisted http.py sourcecode, for large file uploads there will be a high mem usage nevertheless, because the whole request body is read into Request's content property before any parsing whatsoever is done. So delaying parsing of the body does IMHO not address this problem, you have to hook in at an earlier point, i.e. when the content variable is built. So you had to patch HTTPChannel and not Request. I need a twisted application which allows me to do large file uploads without storing the whole file to memory first, because it will serve as a firmware upgrader for an embedded device, which has only 128MB of RAM while firmware upgrades might take up to 150MB. So my approach will be to rewrite some of the functionality of cgi.FieldStorage in a class that handles async input, because FieldStorage requires an input stream it can read from which doesn't exist in twisted because of its async nature (or am I wrong in this?). This should result in a body parsing "on-the-fly" while body is received not after receival is completed, so that I could store files directly in a temporary file on disk and not into memory. Do you see any better solution to the problem than my approach?
comment:4 Changed 12 years ago by
because the whole request body is read into Request's content property before any parsing whatsoever is done
If the content-length is large, Request.content is a temporary file, not in-memory storage.
I need a twisted application which allows me to do large file uploads without storing the whole file to memory first, because it will serve as a firmware upgrader for an embedded device, which has only 128MB of RAM while firmware upgrades might take up to 150MB
You're really after the functionality from #288.
Changed 11 years ago by
Attachment: | twisted-web-request-args-2-rlotun.patch added |
---|
Patch to address review concerns.
comment:5 Changed 11 years ago by
Keywords: | review added |
---|---|
Owner: | jamesh deleted |
This is an interesting bit of work and perhaps even a minor optimization. It's been a year since there has been any work on this so I thought I'd step in and continue the work. Please let me know if I'm stepping on anyone's toes.
I believe I have addressed all the review points except one: could you please elaborate why a self.content.seek(0, 0)
could be problematic? In this patch I've saved the current position, did a seek(0, 0)
to read the data, and then did a seek to the saved position. I am now unsure if there's a bit of logic elsewhere that perhaps advances the pointer for some reason.
comment:6 Changed 11 years ago by
Keywords: | review removed |
---|---|
Owner: | set to rlotun |
Thanks for picking this up rlotun.
My earlier comment about self.content.seek(0, 0)
is addressed perfectly by your position saving/restoring additions. A few other minor comments about the patch:
- The
parse_qs
check is pretty whitebox. If we're comfortable doing that, then we should be comfortable looking at the_args
attribute instead. In fact, since it doesn't involve any monkey patching, I'd say we should be more comfortable doing that. :) So how about making the test just check that_args
is uninitialized until the appropriate time, rather than monkey patchingparse_qs
? - The content stream position won't be restored if an exception is raised during parsing - the code could use a try/finally.
- The new attributes should have camelCase names
- It'd be nice to verify that changes to
uri
andpath
don't affect the value ofargs
. Similarly, it would be nice to exercise all of the codepaths through theargs
property. Right now there's no test coverage for the'multipart/form-data'
case (which is probably whence will come exceptions that can preventseek(0, 0)
from happening in the patch's current form). The arg parsing isn't really changed by this patch, so I guess there's no coverage for these cases in trunk either: it may make sense to add the test coverage separately, since it should apply equally to the old and new versions of this code. Doing it separately keeps the patch small and narrowly focused.
Thanks again!
twisted-web-request-args.patch