Opened 2 years ago

Closed 22 months ago

#5799 defect closed invalid (invalid)

IMAP4 BODYSTRUCTURE code for handling Content-Disposition doesn't handle quoting correctly

Reported by: exarkun Owned by:
Priority: normal Milestone:
Component: mail Keywords:
Cc: fei@… Branch:
Author: Launchpad Bug:

Attachments (2)

5799-BODYSTRUCTURE_Content-Dispositiong-20120731.patch (4.9 KB) - added by argonemyth 2 years ago.
5799-BODYSTRUCTURE_Content-Dispositiong-20120801.patch (5.0 KB) - added by argonemyth 2 years ago.
Second try!

Download all attachments as: .zip

Change History (15)

comment:1 Changed 2 years ago by exarkun

  • Owner set to argonemyth

comment:2 Changed 2 years ago by argonemyth

  • Cc fei@… added
  • Keywords review added
  • Owner argonemyth deleted

First patch for the ticket!

  1. Added a new helper function called _ifQuatedString in imap4.py. Test for this function is added to IMAP4HelperTestCase.

2, _disposition will add quotes to parameter values depends on the parameter and value type. A new test is added to GetBodyStructureTests.

Please review & Let me know how to improve the patch, thanks!

comment:3 follow-up: Changed 2 years ago by antoine

  • Keywords review removed
  • Owner set to argonemyth

Ok, I'm not an IMAP expert, but I think there is a misunderstanding concerning the 78 characters threshold. This threshold applies when message headers are represented in their usual forms (i.e., under the "Name: Value" form in an e-mail header part); that's because, when a parameter is larger than 78 characters, RFC 2183 says the continuation form as defined in RFC 2184 should be used.

But a BODYSTRUCTURE return value does not know about continuation forms and, therefore, I think this threshold shouldn't exist here.

Other things:

  • _ifQuotedString sounds like a weird name, but I'm not a native English speaker; how about _isQuotableString, for example?
  • _ifQuotedString should return a boolean (i.e. True or False), not a plain integer
  • there should be a test where _ifQuotedString returns False

comment:4 follow-up: Changed 2 years ago by exarkun

Also, there's a misunderstanding of the parameters defined by the RFC. eg, there is no "creation-date-parm" anywhere in the Content-Disposition header. "creation-date-parm" is part of the grammar, and is defined as:

     creation-date-parm := "creation-date" "=" quoted-date-time

In other words, it's "creation-date" that can appear in a Content-Disposition header.

comment:5 in reply to: ↑ 4 Changed 2 years ago by argonemyth

Replying to exarkun:

Also, there's a misunderstanding of the parameters defined by the RFC. eg, there is no "creation-date-parm" anywhere in the Content-Disposition header. "creation-date-parm" is part of the grammar, and is defined as:

     creation-date-parm := "creation-date" "=" quoted-date-time

In other words, it's "creation-date" that can appear in a Content-Disposition header.

Can't believe I put 'creation-date-parm', I must be tired and copy-pasted the wrong string. Thanks for pointing it out!!

comment:6 in reply to: ↑ 3 Changed 2 years ago by argonemyth

Replying to antoine:

Ok, I'm not an IMAP expert, but I think there is a misunderstanding concerning the 78 characters threshold. This threshold applies when message headers are represented in their usual forms (i.e., under the "Name: Value" form in an e-mail header part); that's because, when a parameter is larger than 78 characters, RFC 2183 says the continuation form as defined in RFC 2184 should be used.

But a BODYSTRUCTURE return value does not know about continuation forms and, therefore, I think this threshold shouldn't exist here.

Thanks for clearing this up for me, I was a bit confused by the 78 character threshold! I will modify the patch accordingly.

Other things:

  • _ifQuotedString sounds like a weird name, but I'm not a native English speaker; how about _isQuotableString, for example?
  • _ifQuotedString should return a boolean (i.e. True or False), not a plain integer

I am not good at naming & I am not a native english speaker either :P I used the name ifQuotedString because the RFC memo call those string quoted-string...As for the function itself, I borrowed the structure of _needsQuote helper function in imap4.py. _needsQuote return 1 & 0 instead of the boolean value - I thought there might be some 'secret agenda' I was not aware of, so I kept the return value as 1 & 0. If I was the one choosing the return value, I would put True or False as well. Actually, I will do it in the next patch.

(P.S. the _ATOM_SPECIALS needs update and _needsQuote needs test. Actually, I think we should write a helper function to handle all the atoms needed in imap4.py. Right now, we have them scattered in different places. But I think that's for another ticket.)

comment:7 Changed 2 years ago by argonemyth

  • Keywords review added
  • Owner argonemyth deleted

So, the second patch:

  1. Removed the 78 character threshold in _disposition
  1. The return value of _ifQuotedString is a boolean now.
  1. Added a test when _ifQuotedString returns False: test_IfTokens in IMAP4HelperTestCase
  1. Corrected my silly mistaken pointed out by exarkun in comment 4.

Please review, thanks!

Changed 2 years ago by argonemyth

Second try!

comment:8 Changed 23 months ago by exarkun

  • Keywords review removed

Thanks. After some careful reading of the IMAP4 RFC (3501), I can't find any support for using the RFC 2183 rules for quoting content-disposition parameter values. It seems these should be quoted using the normal IMAP4 rules (for a string in a nested list) instead. Those rules should already be applied by the code, since the content-disposition parameter values end up in the list returned by getBodyStructure, which is in turn passed to collapseNestedLists, which deals with quoting on its own.

It looks like the added unit tests don't actually exercise this - instead they assert that quotes end up in the string in the list, which I suspect is wrong, since this will lead to double quoting when collapseNestedLists gets to that data.

The IMAP4 RFC is obtuse and I'm certainly not 100% confident of my reading. And there isn't even a section of the RFC I can point to and say "Look, there, it says how to quote these values". Instead, I came to this conclusion by observing:

  • Section 6.4.5 discusses the msg-att-static grammar rule, implying this is important for the response to FETCH commands
  • msg-att-static does indeed cover the BODYSTRUCTURE form of a FETCH response (it includes "BODY" ["STRUCTURE"] - obscurely optimizing the representation and making it hard to search for).
  • msg-att-static shows that the rule for the response following BODYSTRUCTURE is the body rule
  • The body rule (via body-type-1part and body-ext-1part) defines how the content-disposition is represented - in the body-fld-dsp grammar rule
  • body-fld-dsp is merely defined as "(" string SP body-fld-param ")" / nil
  • and body-fld-param is merely defined as "(" string SP string *(SP string SP string) ")" / nil. So there's nothing more complicated here than string, which implies no special quoting rules beyond what collapseNestedLists applies.

It would be good to have a test that exercises more of the encoding logic for this case though, so I think the ticket is still handy - but the implementation in trunk strikes me as correct.

comment:9 Changed 23 months ago by itamar

Would it be possible to see what some other client (say, Thunderbird) does?

comment:10 Changed 23 months ago by exarkun

It's a server-side behavior. :) Someone could certainly test this against other IMAP4 servers, yes. To do so, send an email to a mailbox attached to an IMAP4 account and then manually interact with the IMAP4 server:

S:   * OK IMAP4rev1 Service Ready
C:   a001 login mrc secret
S:   a001 OK LOGIN completed
C:   a002 select inbox
S:   * ...
C:   a003 FETCH <Message ID> BODYSTRUCTURE

Replace <Message ID> with the sequence number of the message you want to inspect. The result of select inbox may give some hints about good values for these (like if it says "50 EXISTS", 50 (or 49?) is probably a good neighborhood to poke around in).

comment:11 Changed 23 months ago by exarkun

Also, on further consideration, I'm not really sure the test coverage is presently all that deficient. There's test coverage for the result of getBodyStructure in the case of a message with an attachment. And there's full test coverage of collapseNestedLists. There could be a test that invokes the two units together, but since each unit actually works just fine on its own, and there's nothing terribly special about this specific interaction, I'm not sure it's worth it.

comment:12 Changed 23 months ago by exarkun

(Oops, I meant nearly full test coverage of collapseNestedLists, but the missing coverage is for functionality unrelated to this ticket)

comment:13 Changed 22 months ago by exarkun

  • Resolution set to invalid
  • Status changed from new to closed

Consulted gmail, which I'm prepared to take as a reasonable independent confirmation. For a file named < & ( " ? =, gmail produced a response including ("FILENAME" " < & ( \" ? ="). Thus, current Twisted behavior is correct and no fix is necessary.

Note: See TracTickets for help on using tickets.