Opened 5 years ago

Last modified 2 years ago

#3795 defect new

_ChunkedTransferDecoder may slow down quadratically and exhaust memory

Reported by: ivank Owned by:
Priority: high Milestone:
Component: web Keywords: security
Cc: exarkun, jknight, ivank Branch:
Author: Launchpad Bug:

Description

twisted.web.http._ChunkedTransferDecoder trusts the input too much, allowing malicious or corrupted input to quadratically slow down the parser and exhaust memory. This can happen when the parser reads the chunk length, or the trailer.

Attachments (9)

secure-chunked-3795-1.diff (12.8 KB) - added by ivank 5 years ago.
fix _ChunkedTransferDecoder by limiting self._buffer for chunk-length and trailer, and completely ignoring chunk extension data.
secure-chunked-3795-2.diff (19.8 KB) - added by ivank 5 years ago.
secure _ChunkedTransferDecoder while remaining bug-compatible
secure-chunked-3795-3.diff (12.7 KB) - added by ivank 5 years ago.
secure _ChunkedTransferDecoder properly without changing any behavior
secure-chunked-3795-4.diff (13.3 KB) - added by ivank 5 years ago.
the only change over patch 3 is to fix one major bug with padded negative chunk lengths
secure-chunked-3795-5.diff (13.9 KB) - added by ivank 5 years ago.
only change over patch 4 is to exercise the reattachCR logic
secure-chunked-3795-6.diff (18.9 KB) - added by ivank 5 years ago.
address exarkun's first review; fix trailer-parsing problems
secure-chunked-3795-7-hack.diff (20.3 KB) - added by ivank 5 years ago.
make strings with small chunks parse faster at the expense of normal ones (horrible hack)
secure-chunked-3795-8.diff (20.4 KB) - added by ivank 5 years ago.
a faster/cleaner version of the splitting idea in patch 7; but the implementation is still confusing
poc-3795.py (856 bytes) - added by MostAwesomeDude 2 years ago.
PoC for #3795, from ivan.

Download all attachments as: .zip

Change History (37)

comment:1 Changed 5 years ago by ivank

  • Component changed from core to web

comment:2 Changed 5 years ago by ivank

Patch is coming.

Changed 5 years ago by ivank

fix _ChunkedTransferDecoder by limiting self._buffer for chunk-length and trailer, and completely ignoring chunk extension data.

comment:3 Changed 5 years ago by ivank

  • Keywords review added

comment:4 Changed 5 years ago by jknight

  • Keywords review removed

This seems way too complicated. Why doesn't it work like a normal line reader and present the entire line for processing all at once, but limit the length of that line. All the separate states and complexity for ignoring the extensions seem totally superfluous.

Also: the code in the "trailer" case looks completely broken to me, both before and after your change, if it actually ever receives trailers...

That it's being used for both the simple "look for \r\n" processing after data chunks and for trailer processing seems wrong.

E.g. (hopefully I didn't screw up the syntax of this, I didn't go verify against the RFC):

1\r\n
a\r\n
0\r\n
Content-MD5: 0cc175b9c0f1b6a831c399e269772661\r\n
\r\n

Finally: does raising RuntimeException here actually do anything sensible?

Changed 5 years ago by ivank

secure _ChunkedTransferDecoder while remaining bug-compatible

comment:5 Changed 5 years ago by ivank

_ChunkedTransferDecoder is indeed broken (before and after), because it assumes the entire terminal chunk will be delivered at once. I have separated 'trailer' into 'trailer' and 'terminal-chunk', and added an optional callback to receive the entire terminal chunk. The quality of the code has gone down significantly, but all the tests pass.

I'll address the rest of the review soon, I hope.

comment:6 Changed 5 years ago by ivank

Er, don't look at my patches. Everything is wrong right now.

comment:7 Changed 5 years ago by ivank

Please ignore my first two patches forever.

<ivan> okay, I finally see what happened today
<ivan> I assumed finishCallback is called with the HTTP/1.1 trailer
<ivan> but finishCallback is actually called with leftover garbage
<ivan> either by design or by bug, I can't care at this point
<ivan> "This code accepts anything in the trailers section and discards it" indeed

comment:8 Changed 5 years ago by jknight

What's after the end of the chunked section is certainly not garbage: it's the beginning of the next request or response.

Changed 5 years ago by ivank

secure _ChunkedTransferDecoder properly without changing any behavior

comment:9 Changed 5 years ago by ivank

The tests still need docstrings, but how does the general idea look in secure-chunked-3795-3.diff?

If it's unnecessarily confusing, I'll think about "normal line reader and present the entire line for processing all at once" and maybe look more at web2's parser, or other parsers.

Changed 5 years ago by ivank

the only change over patch 3 is to fix one major bug with padded negative chunk lengths

Changed 5 years ago by ivank

only change over patch 4 is to exercise the reattachCR logic

comment:10 Changed 5 years ago by exarkun

Generally I think this looks good. The old implementation had the nice trait of fitting on my screen all at once (almost the only good thing I can think of to say about it now), which this new version doesn't. I can live with that, since breaking it up at all probably means adding more super slow Python method calls (otoh, maybe that's fine, and we should just have a version written in C for people who care about it being fast; plus it's not like I have any benchmarks that show anything about the performance of this code).

  1. The patch adds lines with trailing whitespace; please clean these up.
  2. You don't need lambdas with the assertRaises lines, you can use assertRaises(exc, f, *args).
  3. All the new places which raise RuntimeError use "%s" % (repr(list),). You could just use "%s" % (list,) and get the same outcome. Lists str and repr the same.
  4. It would seem slightly nicer if the check for a negative length were done on self.length rather than on the unparsed chunk length string. Likewise for the too-big-length check immediately following the negative check. (As long as that code is just about to parse it, it seems to make sense to avoid doing more checks than necessary on the unparsed version.) If you intentionally did the check against the string to handle the -0 case, I'm not sure if that's really worth it. -0 is invalid, so a client can't really complain if we interpret it the same way as 0, right? On the other hand, if you feel strongly that it's important to reject a chunk length of -0, I'm not going to argue.
  5. It might be nice if the 'crlf' state were handled after the 'body' state. This causes the code to continue to read in roughly the order in which the states are referenced in the code.
  6. In the _ChunkedTransferDecoder class docstring, don't use * for emphasis. B{} is the epytext for bold, I{} for italic. Though I'd suggest just rewriting the sentence before the one you added to the finishCallback ivar docs so that it doesn't imply that the trailer is what gets passed to that callback; then you won't need to correct it afterwards.
  7. Operators should have whitespace around them. CRLF_at + 2, not CRLF_at+2, etc.
  8. CRLF_at should be crlfAt or maybe even newlineAt.
  9. More importantly than any of that, the else clause in the 'crlf' state in _ChunkedTransferDecoder is untested.
  10. The strip call in the else clause in the chunk-length state is untested. Is extra space in these places allowed by spec, anyway?
  11. The new 'crlf' state is undocumented. The 'finish' flag is still documented.

comment:11 Changed 5 years ago by jknight

This implementation still gets trailer processing incorrect. See the example above. Trailer can have crlfs in it. Please add the chunk I wrote above as a test-case. Or, twisted.web2.test.test_http.testHTTP1_1_chunking also has a test case like that.

I still don't like the way this code is written one bit. It would be a lot easier to understand if it switched between line mode and raw mode with LineReceiver, like everything else in twisted that needs to parse lines does. (and like the code in web2 which does the same thing in a *lot* less code does.)

comment:12 Changed 5 years ago by exarkun

Using LineReceiver would probably shorten it a lot. I've been trying to avoid LineReceiver because it's got a lot of crap in, like pausing and dependence on a transport and on undocumented attributes of TCP transports.

Maybe taking a LineReceiver-like approach to this code without actually using LineReceiver would still help, though. Accumulate bytes, wait for an entire line to appear (when appropriate) before trying to process it, and enforcing a line length limit. I can't quite imagine what the code would end up looking like, but maybe it would be good.

comment:13 Changed 5 years ago by jknight

Well, if the chunk handler wasn't split out into a separate class you wouldn't have to worry about that, because the HTTP channel already *is* a line receiver.

comment:14 Changed 5 years ago by exarkun

Splitting it out makes it easier to test and easier to reuse in multiple places (eg, in the server and the client). Putting the code back into HTTPChannel doesn't really solve the LineReceiver problems either, it just sweeps them under a rug.

Of course, it would be nice if there were something like LineReceiver that I didn't feel bad about using, and there is an obvious solution to that problem. ;)

Changed 5 years ago by ivank

address exarkun's first review; fix trailer-parsing problems

comment:15 Changed 5 years ago by ivank

Patch 6 addresses just exarkun's first review, and the trailer-parsing problem.

Neither the original or patch5 version of _ChunkedTransferDecoder handled non-empty trailers correctly. The original would stop processing data and buffer forever when encountering a non-empty trailer. patch5 _ChunkedTransferDecoder assumed a non-empty trailer would end on \r\n, which is incorrect.

I think my _ChunkedTransferDecoder still doesn't solve all the memory/CPU-wasting problems, because in the 'body' state, it still does this:

chunk, data = data[:self.length], data[self.length:]

I think I might try to do this in C or Cython properly, with maybe real benchmarks.

comment:16 Changed 5 years ago by ivank

I still have trailing whitespace problems, and docstring problems.

comment:17 Changed 5 years ago by exarkun

An extra couple string slices aren't going to make a huge difference to performance. No doubt it could be better, but keep in mind all the other layers in the stack this code sits in. Lots of unnecessary copying is going on already; avoiding it in this one place will probably be lost in the noise.

comment:18 Changed 5 years ago by ivank

The issue is that malicious input could contain a large amount of 1-byte chunks, leading to far more copying than anyone would expect. This isn't a large problem with 1500 byte strings, but it may be a huge problem with much larger ones. I don't know if anyone will send very large strings into it, though.

comment:19 Changed 5 years ago by exarkun

Ah, good point. I was only thinking about the case of large chunk lengths and small sends, but the case of small chunk lengths and big sends is indeed vulnerable now.

Changed 5 years ago by ivank

make strings with small chunks parse faster at the expense of normal ones (horrible hack)

comment:20 Changed 5 years ago by ivank

  • Owner ivank deleted

This "solves" my problems by doing dumb tokenization, but I can't really prove it, since this has no benchmarks. I don't think this hack is for review.

comment:21 Changed 5 years ago by jknight

This code keeps getting less and less understandable!

There IS NO actual problem with 1-byte chunks and large strings, because dataReceived cannot be called with that much data at once! (It's limited to 65536 bytes by default).

And, if you're still concerned about such a problem, make this code use LineReceiver, and *fix it there*.

This type of change is absolutely useless without having changed the more fundamental parts of twisted that have the same issue, and if you fix it there, use it from there.

comment:22 Changed 5 years ago by ivank

I agree with you, the code is bad. There was a problem with small chunks, though: too many states in _ChunkedTransferDecoder would copy (most of) this 65536 byte string, for every single chunk. So yes, this really should be using LineReceiver instead.

comment:23 Changed 5 years ago by exarkun

LineReceiver has the same problem. James said make this code use LineReceiver?, and *fix it there*.. Just using LineReceiver isn't good enough.

What I'm really curious about is why this ticket is now unowned? ivank, are you saying you're not interested in finishing this work?

Changed 5 years ago by ivank

a faster/cleaner version of the splitting idea in patch 7; but the implementation is still confusing

comment:24 Changed 5 years ago by ivank

Sorry, I can't commit to porting _ChunkedTransferDecoder to use LineReceiver and fixing LineReceiver. My last patch is attached in case anyone wants to compare its performance to something else in the future.

comment:25 Changed 5 years ago by jknight

Just using LineReceiver isn't good enough.

Sure, but it'd be a huge improvement for this code to use line receiver even if the performance degradation with small lines and large blocks of data wasn't fixed.

comment:26 Changed 4 years ago by <automation>

Changed 2 years ago by MostAwesomeDude

PoC for #3795, from ivan.

comment:27 Changed 2 years ago by MostAwesomeDude

Attached a PoC. I was able to reproduce this by running:

$ twistd -n web --port 8080 --path .

The PoC chewed up 20% or so of one CPU, but the server chewed up 100% of another CPU. This is a problem.

It probably doesn't matter whether the server is running a path or a WSGI; I would imagine this works either way. Similarly, the path requested shouldn't matter, just the chunked encoding.

comment:28 Changed 2 years ago by exarkun

#6149 was a duplicate of this.

Note: See TracTickets for help on using tickets.