Opened 7 years ago
Closed 7 years ago
#5763 defect closed fixed (fixed)
getBodyStructure wrong with multipart messages
Reported by: | Antoine Pitrou | Owned by: | Jean-Paul Calderone |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | Keywords: | ||
Cc: | Jean-Paul Calderone | Branch: |
branches/imap4-multipart-bodystructure-5763-2
branch-diff, diff-cov, branch-cov, buildbot |
Author: | exarkun, antoine |
Description
Judging by RFC 3501 and some real life examples, imap4.getBodyStructure() doesn't return a correct value for multipart messages.
Attachments (1)
Change History (14)
Changed 7 years ago by
Attachment: | getbodystructuremultipart.patch added |
---|
comment:1 Changed 7 years ago by
Keywords: | review added |
---|
comment:2 Changed 7 years ago by
Hi Antoine,
I'm far from an IMAP expert, but a few notes from trying to review this patch:
- There is a comment at the top of getBodyStructure that indicates it doesn't handle multipart messages correctly, it should be removed with this patch.
- The code in the
True
branch ofif disp:
(both times) is untested.
I've left the review keyword since I don't really feel qualified to do this review.
comment:3 Changed 7 years ago by
Keywords: | review removed |
---|---|
Owner: | set to Jean-Paul Calderone |
Status: | new → assigned |
Thanks! I think you have correctly divined the intent of the IMAP4 specification. The new behavior seems like an improvement over the old behavior. getBodyStructure
is still fairly obtuse, though (not your fault; my fault, I wrote it). I'm going to have a try at cleaning it up so that it is slightly easier to understand (maybe I can even make it easier to understand than the RFC!).
comment:4 Changed 7 years ago by
Author: | → exarkun |
---|---|
Branch: | → branches/imap4-multipart-bodystructure-5763 |
(In [34778]) Branching to 'imap4-multipart-bodystructure-5763'
comment:6 Changed 7 years ago by
comment:7 Changed 7 years ago by
Keywords: | review added |
---|---|
Owner: | Jean-Paul Calderone deleted |
Status: | assigned → new |
Found some more bugs, sadly. There's more tests now, which is nice at least. Some of them don't even run a full loopback IMAP4 client/server pair to exercise the BODYSTRUCTURE code.
I think the BODYSTRUCTURE computing code is clearly now too. Though I can imagine making it even clearer, and exposing the message structure objects as part of the public client API someday, I think maybe I've done enough in this branch for now.
comment:8 Changed 7 years ago by
Author: | exarkun → antoine, exarkun |
---|
comment:9 Changed 7 years ago by
Certain content-disposition parameters (the date parameters, and any arbitrary parameter) have their values quoted as per RFC 2183:
disposition-parm := filename-parm / creation-date-parm / modification-date-parm / read-date-parm / size-parm / parameter filename-parm := "filename" "=" value creation-date-parm := "creation-date" "=" quoted-date-time modification-date-parm := "modification-date" "=" quoted-date-time read-date-parm := "read-date" "=" quoted-date-time size-parm := "size" "=" 1*DIGIT
Where RFC 2045 defines parameter
as:
parameter := attribute "=" value attribute := token ; Matching of attributes ; is ALWAYS case-insensitive. value := token / quoted-string
That said, the original code didn't cater for that either, so I'm mainly mentioning this for reference.
I notice you sort the results in _unquotedAttrs, which should fix the issue with hash randomization. Of course, it also means the original order is not respected, which doesn't sound very much of a problem.
comment:10 Changed 7 years ago by
Keywords: | review removed |
---|---|
Owner: | set to Jean-Paul Calderone |
Okay. I think this is ready to land. I wish I could be surer, but that would require a PhD in IMAP which is beyond the scope of this review.
Some thoughts, though:
- It sounds like the issue antoine reported in his last comment should be reported as a separate bug. Or bugs, if the ordering thing is really important; it seems to me that the previous order is irrelevant there but I might be missing something.
- Basically all the new docstrings are invalid epytext. Please re-format them appropriately, and preview them by generating the documentation locally before merging. I don't understand why the buildbot isn't telling us about this; I guess that needs to be investigated. Unfortunately epytext doesn't support block quotes, you will have to use preformatted blocks, although bulleted or numbered lists should be fine for many of the uses.
getBodyStructure
is a lot more comprehensible, but, uh, still not super comprehensible. The new, expanded docstring really helps with understanding the return type though. I still don't entirely understand how this is possible without using more stuff from the 'email' module, though... I guess that, implicitly, any implementation ofIMessagePart
would either have to use it or provide its own whole MIME library?
Thanks very much to the authors.
comment:11 Changed 7 years ago by
It sounds like the issue antoine reported in his last comment should be reported as a separate bug. Or bugs, if the ordering thing is really important; it seems to me that the previous order is irrelevant there but I might be missing something.
I filed a ticket for handling quoting. The order is irrelevant as far as I know.
Basically all the new docstrings are invalid epytext. Please re-format them appropriately, and preview them by generating the documentation locally before merging. I don't understand why the buildbot isn't telling us about this; I guess that needs to be investigated. Unfortunately epytext doesn't support block quotes, you will have to use preformatted blocks, although bulleted or numbered lists should be fine for many of the uses.
Fixed those.
getBodyStructure is a lot more comprehensible, but, uh, still not super comprehensible. The new, expanded docstring really helps with understanding the return type though. I still don't entirely understand how this is possible without using more stuff from the 'email' module, though... I guess that, implicitly, any implementation of IMessagePart would either have to use it or provide its own whole MIME library?
Right. Although perhaps the email module will offer some header parsing functionality (I haven't checked whether it can handle, eg, Content-Disposition correctly, though).
comment:12 Changed 7 years ago by
Author: | antoine, exarkun → exarkun, antoine |
---|---|
Branch: | branches/imap4-multipart-bodystructure-5763 → branches/imap4-multipart-bodystructure-5763-2 |
(In [34929]) Branching to 'imap4-multipart-bodystructure-5763-2'
comment:13 Changed 7 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [34933]) Merge imap4-multipart-bodystructure-5763-2
Author: antoine, exarkun Reviewer: antoine, glyph Fixes: #5763
Refactor the IMAP4 server's BODYSTRUCTURE response generation code, adding documentation, unit tests, and fixing several bugs:
- incorrect ordering of content-language/content-location
- incorrect structure for header parameters in multiple places
- overall incorrect structure for multipart/* messages
- incorrect empty lists which should instead be NILs
- failure to ever fill in certain other fields like content-id
Attached patch claims to fix this, and adds a test.