Opened 23 months ago

Last modified 22 months ago

#9739 defect new

Multipart returns 400 Bad Request on Python 3.7+

Reported by: Areeb Jamal Owned by:
Priority: high Milestone:
Component: web Keywords: multipart, python3.7, bad request
Cc: Branch:
Author:

Description

Related: https://twistedmatrix.com/trac/ticket/9448 Pull Request: https://github.com/twisted/twisted/pull/1012

Downstream Issue: https://github.com/django/channels/issues/1386

Body:

--d66f495a-c4d1-487c-9277-9ab1a20001cc
Content-Disposition: form-data; name="content"
Content-Type: multipart/form-data; charset=utf-8

Anything
--d66f495a-c4d1-487c-9277-9ab1a20001cc--

Header:

Content-Type: multipart/form-data; boundary=d66f495a-c4d1-487c-9277-9ab1a20001cc

Twisted returns 400 Bad Request on Python 3.7+, works fine on Python 3.6-

In my brief debugging, Twisted tried to parse an empty line b'' as valid boundary and throws, thus returning 400 Bad Request. That doesn't happen on Python 3.6-

This is where the exception is raised: https://github.com/twisted/twisted/blob/009b79eef42185d3c2e6a0553311c33504ddabe7/src/twisted/web/http.py#L907-L912

Hence, the problem possibly originates from this commit - https://github.com/python/cpython/commit/cc3fa204d357be5fafc10eb8c2a80fe0bca998f1

Hence, we are capped to Python 3.6 and can't move forward with upgrading to 3.7+

Any person using a library depending on twisted (like Django Channels) is impacted

Change History (4)

comment:1 Changed 23 months ago by Tom Most

comment:2 Changed 23 months ago by Tom Most

In Python 3.7 cgi.parse_multipart() was changed to use FieldStorage internally: https://github.com/python/cpython/commit/cc3fa204d357be5fafc10eb8c2a80fe0bca998f1

This is what prompted the changes in https://github.com/python/cpython/commit/cc3fa204d357be5fafc10eb8c2a80fe0bca998f1

I can reproduce this without any Twisted code:

Python 3.7.5 (default, Nov  7 2019, 10:50:52) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cgi
>>> 
>>> payload = """--d66f495a-c4d1-487c-9277-9ab1a20001cc
... Content-Disposition: form-data; name=\"content\"
... Content-Type: multipart/form-data; charset=utf-8
... Hello World
... --d66f495a-c4d1-487c-9277-9ab1a20001cc--"""
>>> from io import BytesIO
>>> content = BytesIO(payload.encode())
>>> pdict = {'boundary': b'd66f495a-c4d1-487c-9277-9ab1a20001cc', 'CONTENT-LENGTH': len(payload)}
>>> cgi.parse_multipart(content, pdict, encoding='utf8', errors='surrogateescape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/cgi.py", line 222, in parse_multipart
    environ={'REQUEST_METHOD': 'POST'})
  File "/usr/lib/python3.7/cgi.py", line 489, in __init__
    self.read_multi(environ, keep_blank_values, strict_parsing)
  File "/usr/lib/python3.7/cgi.py", line 666, in read_multi
    self.encoding, self.errors, max_num_fields)
  File "/usr/lib/python3.7/cgi.py", line 489, in __init__
    self.read_multi(environ, keep_blank_values, strict_parsing)
  File "/usr/lib/python3.7/cgi.py", line 616, in read_multi
    raise ValueError('Invalid boundary in multipart form: %r' % (ib,))
ValueError: Invalid boundary in multipart form: b''
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/cgi.py", line 222, in parse_multipart
    environ={'REQUEST_METHOD': 'POST'})
  File "/usr/lib/python3.7/cgi.py", line 489, in __init__
    self.read_multi(environ, keep_blank_values, strict_parsing)
  File "/usr/lib/python3.7/cgi.py", line 666, in read_multi
    self.encoding, self.errors, max_num_fields)
  File "/usr/lib/python3.7/cgi.py", line 489, in __init__
    self.read_multi(environ, keep_blank_values, strict_parsing)
  File "/usr/lib/python3.7/cgi.py", line 616, in read_multi
    raise ValueError('Invalid boundary in multipart form: %r' % (ib,))
ValueError: Invalid boundary in multipart form: b''
Version 0, edited 23 months ago by Tom Most (next)

comment:3 Changed 23 months ago by Tom Most

This header from your example doesn't look valid to me:

Content-Type: multipart/form-data; charset=utf-8

It is missing the boundary parameter, which is required. This explains the b''. Also note that charset isn't defined for multipart/form-data (see that RFC --- charsets in form data are weird).

I'd guess that the behavioral change is due to bpo-29979. One aspect of the "consistency" it achieves is support for nested multipart, whereas the old implementation ignored the {{Content-Type}} of chunks as far as I can see.

What is generating this malformed input?

comment:4 Changed 22 months ago by Areeb Jamal

OkHTTP https://square.github.io/okhttp/

However, the request header does have a boundary, but parts in multipart don't. Not sure if it valid for parts headers to exclude boundary or not

Note: See TracTickets for help on using tickets.