Opened 2 years ago

Closed 2 years ago

#5763 defect closed fixed (fixed)

getBodyStructure wrong with multipart messages

Reported by: antoine Owned by: exarkun
Priority: normal Milestone:
Component: mail Keywords:
Cc: exarkun Branch: branches/imap4-multipart-bodystructure-5763-2
(diff, github, buildbot, log)
Author: exarkun, antoine Launchpad Bug:

Description

Judging by RFC 3501 and some real life examples, imap4.getBodyStructure() doesn't return a correct value for multipart messages.

Attachments (1)

getbodystructuremultipart.patch (5.6 KB) - added by antoine 2 years ago.

Download all attachments as: .zip

Change History (14)

Changed 2 years ago by antoine

comment:1 Changed 2 years ago by antoine

  • Keywords review added

Attached patch claims to fix this, and adds a test.

comment:2 Changed 2 years ago by Alex

Hi Antoine,

I'm far from an IMAP expert, but a few notes from trying to review this patch:

  • There is a comment at the top of getBodyStructure that indicates it doesn't handle multipart messages correctly, it should be removed with this patch.
  • The code in the True branch of if disp: (both times) is untested.

I've left the review keyword since I don't really feel qualified to do this review.

comment:3 Changed 2 years ago by exarkun

  • Keywords review removed
  • Owner set to exarkun
  • Status changed from new to assigned

Thanks! I think you have correctly divined the intent of the IMAP4 specification. The new behavior seems like an improvement over the old behavior. getBodyStructure is still fairly obtuse, though (not your fault; my fault, I wrote it). I'm going to have a try at cleaning it up so that it is slightly easier to understand (maybe I can even make it easier to understand than the RFC!).

comment:4 Changed 2 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/imap4-multipart-bodystructure-5763

(In [34778]) Branching to 'imap4-multipart-bodystructure-5763'

comment:5 Changed 2 years ago by exarkun

(In [34779]) Apply getbodystructuremultipart.patch

refs #5763

comment:6 Changed 2 years ago by exarkun

(In [34786]) Well, maybe this is close (except for, like, docs, and maybe some more tests). Just two more failing tests (for multipart messages, haha)

refs #5763

comment:7 Changed 2 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted
  • Status changed from assigned to new

Found some more bugs, sadly. There's more tests now, which is nice at least. Some of them don't even run a full loopback IMAP4 client/server pair to exercise the BODYSTRUCTURE code.

I think the BODYSTRUCTURE computing code is clearly now too. Though I can imagine making it even clearer, and exposing the message structure objects as part of the public client API someday, I think maybe I've done enough in this branch for now.

Build results

comment:8 Changed 2 years ago by exarkun

  • Author changed from exarkun to antoine, exarkun

comment:9 Changed 2 years ago by antoine

Certain content-disposition parameters (the date parameters, and any arbitrary parameter) have their values quoted as per RFC 2183:

     disposition-parm := filename-parm
                       / creation-date-parm
                       / modification-date-parm
                       / read-date-parm
                       / size-parm
                       / parameter

     filename-parm := "filename" "=" value

     creation-date-parm := "creation-date" "=" quoted-date-time

     modification-date-parm := "modification-date" "=" quoted-date-time

     read-date-parm := "read-date" "=" quoted-date-time

     size-parm := "size" "=" 1*DIGIT

Where RFC 2045 defines parameter as:

     parameter := attribute "=" value

     attribute := token
                  ; Matching of attributes
                  ; is ALWAYS case-insensitive.

     value := token / quoted-string

That said, the original code didn't cater for that either, so I'm mainly mentioning this for reference.

I notice you sort the results in _unquotedAttrs, which should fix the issue with hash randomization. Of course, it also means the original order is not respected, which doesn't sound very much of a problem.

comment:10 Changed 2 years ago by glyph

  • Keywords review removed
  • Owner set to exarkun

Okay. I think this is ready to land. I wish I could be surer, but that would require a PhD in IMAP which is beyond the scope of this review.

Some thoughts, though:

  1. It sounds like the issue antoine reported in his last comment should be reported as a separate bug. Or bugs, if the ordering thing is really important; it seems to me that the previous order is irrelevant there but I might be missing something.
  2. Basically all the new docstrings are invalid epytext. Please re-format them appropriately, and preview them by generating the documentation locally before merging. I don't understand why the buildbot isn't telling us about this; I guess that needs to be investigated. Unfortunately epytext doesn't support block quotes, you will have to use preformatted blocks, although bulleted or numbered lists should be fine for many of the uses.
  3. getBodyStructure is a lot more comprehensible, but, uh, still not super comprehensible. The new, expanded docstring really helps with understanding the return type though. I still don't entirely understand how this is possible without using more stuff from the 'email' module, though... I guess that, implicitly, any implementation of IMessagePart would either have to use it or provide its own whole MIME library?

Thanks very much to the authors.

comment:11 Changed 2 years ago by exarkun

It sounds like the issue antoine reported in his last comment should be reported as a separate bug. Or bugs, if the ordering thing is really important; it seems to me that the previous order is irrelevant there but I might be missing something.

I filed a ticket for handling quoting. The order is irrelevant as far as I know.

Basically all the new docstrings are invalid epytext. Please re-format them appropriately, and preview them by generating the documentation locally before merging. I don't understand why the buildbot isn't telling us about this; I guess that needs to be investigated. Unfortunately epytext doesn't support block quotes, you will have to use preformatted blocks, although bulleted or numbered lists should be fine for many of the uses.

Fixed those.

getBodyStructure is a lot more comprehensible, but, uh, still not super comprehensible. The new, expanded docstring really helps with understanding the return type though. I still don't entirely understand how this is possible without using more stuff from the 'email' module, though... I guess that, implicitly, any implementation of IMessagePart would either have to use it or provide its own whole MIME library?

Right. Although perhaps the email module will offer some header parsing functionality (I haven't checked whether it can handle, eg, Content-Disposition correctly, though).

comment:12 Changed 2 years ago by exarkun

  • Author changed from antoine, exarkun to exarkun, antoine
  • Branch changed from branches/imap4-multipart-bodystructure-5763 to branches/imap4-multipart-bodystructure-5763-2

(In [34929]) Branching to 'imap4-multipart-bodystructure-5763-2'

comment:13 Changed 2 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [34933]) Merge imap4-multipart-bodystructure-5763-2

Author: antoine, exarkun
Reviewer: antoine, glyph
Fixes: #5763

Refactor the IMAP4 server's BODYSTRUCTURE response generation code, adding documentation,
unit tests, and fixing several bugs:

  • incorrect ordering of content-language/content-location
  • incorrect structure for header parameters in multiple places
  • overall incorrect structure for multipart/* messages
  • incorrect empty lists which should instead be NILs
  • failure to ever fill in certain other fields like content-id
Note: See TracTickets for help on using tickets.