Ticket #5763 defect closed fixed

Opened 10 months ago

Last modified 10 months ago

getBodyStructure wrong with multipart messages

Reported by: antoine Owned by: exarkun
Priority: normal Milestone:
Component: mail Keywords:
Cc: exarkun Branch: branches/imap4-multipart-bodystructure-5763-2
Author: exarkun, antoine Launchpad Bug:

Description

Judging by  RFC 3501 and some  real life examples, imap4.getBodyStructure() doesn't return a correct value for multipart messages.

Attachments

getbodystructuremultipart.patch Download (5.6 KB) - added by antoine 10 months ago.

Change History

Changed 10 months ago by antoine

1

Changed 10 months ago by antoine

  • keywords review added

Attached patch claims to fix this, and adds a test.

2

Changed 10 months ago by Alex

Hi Antoine,

I'm far from an IMAP expert, but a few notes from trying to review this patch:

* There is a comment at the top of getBodyStructure that indicates it doesn't handle multipart messages correctly, it should be removed with this patch. * The code in the True branch of if disp: (both times) is untested.

I've left the review keyword since I don't really feel qualified to do this review.

3

Changed 10 months ago by exarkun

  • keywords review removed
  • status changed from new to assigned
  • owner set to exarkun

Thanks! I think you have correctly divined the intent of the IMAP4 specification. The new behavior seems like an improvement over the old behavior. getBodyStructure is still fairly obtuse, though (not your fault; my fault, I wrote it). I'm going to have a try at cleaning it up so that it is slightly easier to understand (maybe I can even make it easier to understand than the RFC!).

4

Changed 10 months ago by exarkun

  • branch set to branches/imap4-multipart-bodystructure-5763
  • branch_author set to exarkun

(In [34778]) Branching to 'imap4-multipart-bodystructure-5763'

5

Changed 10 months ago by exarkun

(In [34779]) Apply getbodystructuremultipart.patch

refs #5763

6

Changed 10 months ago by exarkun

(In [34786]) Well, maybe this is close (except for, like, docs, and maybe some more tests). Just two more failing tests (for multipart messages, haha)

refs #5763

7

Changed 10 months ago by exarkun

  • owner exarkun deleted
  • status changed from assigned to new
  • keywords review added

Found some more bugs, sadly. There's more tests now, which is nice at least. Some of them don't even run a full loopback IMAP4 client/server pair to exercise the BODYSTRUCTURE code.

I think the BODYSTRUCTURE computing code is clearly now too. Though I can imagine making it even clearer, and exposing the message structure objects as part of the public client API someday, I think maybe I've done enough in this branch for now.

 Build results

8

Changed 10 months ago by exarkun

  • branch_author changed from exarkun to antoine, exarkun

9

Changed 10 months ago by antoine

Certain content-disposition parameters (the date parameters, and any arbitrary parameter) have their values quoted as per RFC 2183:

     disposition-parm := filename-parm
                       / creation-date-parm
                       / modification-date-parm
                       / read-date-parm
                       / size-parm
                       / parameter

     filename-parm := "filename" "=" value

     creation-date-parm := "creation-date" "=" quoted-date-time

     modification-date-parm := "modification-date" "=" quoted-date-time

     read-date-parm := "read-date" "=" quoted-date-time

     size-parm := "size" "=" 1*DIGIT

Where RFC 2045 defines parameter as:

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     value := token / quoted-string

That said, the original code didn't cater for that either, so I'm mainly mentioning this for reference.

I notice you sort the results in _unquotedAttrs, which should fix the issue with hash randomization. Of course, it also means the original order is not respected, which doesn't sound very much of a problem.

10

Changed 10 months ago by glyph

  • owner set to exarkun
  • keywords review removed

Okay. I think this is ready to land. I wish I could be surer, but that would require a PhD in IMAP which is beyond the scope of this review.

Some thoughts, though:

  1. It sounds like the issue antoine reported in his last comment should be reported as a separate bug. Or bugs, if the ordering thing is really important; it seems to me that the previous order is irrelevant there but I might be missing something.
  2. Basically all the new docstrings are invalid epytext. Please re-format them appropriately, and preview them by generating the documentation locally before merging. I don't understand why the buildbot isn't telling us about this; I guess that needs to be investigated. Unfortunately epytext doesn't support block quotes, you will have to use preformatted blocks, although bulleted or numbered lists should be fine for many of the uses.
  3. getBodyStructure is a lot more comprehensible, but, uh, still not super comprehensible. The new, expanded docstring really helps with understanding the return type though. I still don't entirely understand how this is possible without using more stuff from the 'email' module, though... I guess that, implicitly, any implementation of IMessagePart would either have to use it or provide its own whole MIME library?

Thanks very much to the authors.

11

Changed 10 months ago by exarkun

It sounds like the issue antoine reported in his last comment should be reported as a separate bug. Or bugs, if the ordering thing is really important; it seems to me that the previous order is irrelevant there but I might be missing something.

I filed a ticket for handling quoting. The order is irrelevant as far as I know.

Basically all the new docstrings are invalid epytext. Please re-format them appropriately, and preview them by generating the documentation locally before merging. I don't understand why the buildbot isn't telling us about this; I guess that needs to be investigated. Unfortunately epytext doesn't support block quotes, you will have to use preformatted blocks, although bulleted or numbered lists should be fine for many of the uses.

Fixed those.

getBodyStructure is a lot more comprehensible, but, uh, still not super comprehensible. The new, expanded docstring really helps with understanding the return type though. I still don't entirely understand how this is possible without using more stuff from the 'email' module, though... I guess that, implicitly, any implementation of IMessagePart would either have to use it or provide its own whole MIME library?

Right. Although perhaps the email module will offer some header parsing functionality (I haven't checked whether it can handle, eg, Content-Disposition correctly, though).

12

Changed 10 months ago by exarkun

  • branch changed from branches/imap4-multipart-bodystructure-5763 to branches/imap4-multipart-bodystructure-5763-2
  • branch_author changed from antoine, exarkun to exarkun, antoine

(In [34929]) Branching to 'imap4-multipart-bodystructure-5763-2'

13

Changed 10 months ago by exarkun

  • status changed from new to closed
  • resolution set to fixed

(In [34933]) Merge imap4-multipart-bodystructure-5763-2

Author: antoine, exarkun Reviewer: antoine, glyph Fixes: #5763

Refactor the IMAP4 server's BODYSTRUCTURE response generation code, adding documentation, unit tests, and fixing several bugs:

  • incorrect ordering of content-language/content-location
  • incorrect structure for header parameters in multiple places
  • overall incorrect structure for multipart/* messages
  • incorrect empty lists which should instead be NILs
  • failure to ever fill in certain other fields like content-id
Note: See TracTickets for help on using tickets.