Opened 15 years ago

Last modified 10 years ago

#2122 enhancement new

Expose various accept failures to application-level code

Reported by: Jean-Paul Calderone Owned by:
Priority: normal Milestone:
Component: core Keywords:
Cc: Branch:
Author:

Description

Apps should be able to decide on policy when encountering resource exhaustion situations.

Accepting a new connection may fail because the process or system is out of file handles or because there is insufficient memory. An application may wish to close some existing connections in order to deal with this, rather than continue to allow all new connections to fail.

Change History (11)

comment:1 Changed 12 years ago by Glyph

"various" in the summary should be explained. It sounds like this ticket wants to deal mainly with what happens when accept() fails.

comment:2 Changed 12 years ago by Glyph

Owner: changed from Glyph to Jean-Paul Calderone

comment:3 Changed 12 years ago by Jean-Paul Calderone

This includes the EMFILE failure mode of accept(), so #3958 was a duplicate of this.

comment:4 Changed 11 years ago by <automation>

Owner: Jean-Paul Calderone deleted

comment:5 Changed 10 years ago by Itamar Turner-Trauring

File descriptor exhaustion can happen in a number of different places, since it can happen every time you open a file: callLater, callFromThread, datagram received, connection accepted on TCP port, new connection made, new listening port... plus anywhere in user code, sometimes in places where they will be catching and handling the exception.

Instead of trying to instrument all these places, it seems like a better implementation strategy for discovering resource exhaustion would be to have a logging observer that looks for EMFILE errors being logged. Since EMFILE is unexpected, the error will be logged in almost all cases.
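(Illustration, not part of the original comment.) A minimal sketch of such an observer, assuming events in the dict shape used by Twisted's legacy `twisted.python.log` observers (`isError` set on errors, the `Failure` stored under `failure`). The class name `ExhaustionObserver` and the `on_exhaustion` callback are hypothetical; including ENFILE alongside EMFILE is a choice made for this sketch.

```python
import errno

# Errno values treated as descriptor exhaustion in this sketch.
EXHAUSTION_ERRNOS = {errno.EMFILE, errno.ENFILE}


class ExhaustionObserver:
    """Log observer that calls back when a logged failure wraps EMFILE/ENFILE."""

    def __init__(self, on_exhaustion):
        # Hypothetical application callback; receives the exception instance.
        self.on_exhaustion = on_exhaustion

    def __call__(self, event):
        # Only logged errors are interesting; ordinary messages are skipped.
        if not event.get("isError"):
            return
        failure = event.get("failure")
        exc = getattr(failure, "value", None)
        # IOError and socket.error are aliases of OSError on Python 3.
        if isinstance(exc, OSError) and exc.errno in EXHAUSTION_ERRNOS:
            self.on_exhaustion(exc)
```

An instance could be registered with `twisted.python.log.addObserver(ExhaustionObserver(handler))`. The callback runs inside the logging call, so it should do little more than schedule the real cleanup.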

comment:6 Changed 10 years ago by Jean-Paul Calderone

Can you clarify what you mean by including callLater, callFromThread, datagram received in that list? None of those allocate a new file descriptor in any existing reactor implementation that I know of.

comment:7 Changed 10 years ago by Itamar Turner-Trauring

callLater might be used to schedule a function that opens a file, resulting in IOError with EMFILE. In such a situation, we should ideally be notifying the user that file descriptor exhaustion has occurred, even though it's not reactor code. The fact that it happened when opening a file rather than accepting a new connection is immaterial - we should notify as soon as possible so user code can handle the case.

comment:8 Changed 10 years ago by Jean-Paul Calderone

Okay. So you're thinking of providing handling of EMFILE from any and all application code, regardless of how it gets invoked.

comment:9 Changed 10 years ago by Itamar Turner-Trauring

Yes, exactly.

comment:10 in reply to:  8 Changed 10 years ago by Glyph

Replying to exarkun:

Okay. So you're thinking of providing handling of EMFILE from any and all application code, regardless of how it gets invoked.

This is an interesting idea! I don't quite like the shape of this proposal, because the main use-case I have is this:

If some function (accept(), most likely) wants to create a new file descriptor within the reactor, we have an advantage over the general oops-somebody-tried-to-open-a-file behavior: we can re-try after asking other resources to clean themselves up.

My original thought here was just to notify the factory that the accept was being called on behalf of, to see if it still had any open connections that it wanted to try and nuke before deciding that the listening socket needed to be removed from the reactor to prevent a busy-loop. (Now that #78 is closed, this is even more potentially useful!)

But itamar's proposal for all logged EMFILEs makes an interesting point: every subsystem that might be consuming some FDs should get a notification that the system as a whole is under resource pressure, and possibly take the opportunity to release some of its own.

I still feel like there should be some prioritization here: if the FD that I can't create is for an incoming connection to an administrative console, I'd really like more things to take notice than if it's some random inbound request for a favicon.

Even cooler would be to propagate some similar notification before the actual EMFILE-return takes place, by examining resource limits every so often and using similar logic to what spawnProcess now uses to determine how many files are currently open.
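(Illustration, not part of the original comment.) A rough sketch of such a periodic check, assuming a Unix platform where the `resource` module is available and either `/proc/self/fd` or `/dev/fd` lists the process's descriptors. The function names and the 0.9 threshold are hypothetical, and this is not the code spawnProcess uses.

```python
import os
import resource


def fd_usage():
    """Return (open_count, soft_limit) for this process, or None if unknown."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        try:
            # Listing the directory briefly opens a descriptor of its own,
            # so the count is approximate.
            count = len(os.listdir(fd_dir))
        except OSError:
            continue
        return count, soft
    return None


def under_pressure(threshold=0.9):
    """True when open descriptors reach `threshold` of the soft limit."""
    usage = fd_usage()
    if usage is None:
        return False
    count, soft = usage
    # An unlimited soft limit means there is nothing to compare against.
    if soft == resource.RLIM_INFINITY or soft <= 0:
        return False
    return count / soft >= threshold
```

A reactor-based application could call `under_pressure()` from a periodic timer and notify interested subsystems before any accept() actually fails.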

comment:11 Changed 10 years ago by Itamar Turner-Trauring

Prioritization in notification doesn't really make sense: if a favicon hits EMFILE, sure you can ignore it, but real soon now something important will break due to lack of resources (the web page lookup that immediately follows the favicon, log file rotation, admin console, who knows). Pre-notification would be nice, but is more difficult and can be done later as an enhancement.

As far as retrying goes: a failed accept() doesn't disconnect the end user, since they already have a TCP connection at the OS level. Cleanup operations involving abortConnection() will take at least one reactor iteration, so in most cases an immediate retry isn't feasible anyway. #5368 in its current form will have the retry happen in 1 second; that could be reduced (either with a lower default or when an exhaustion handler is registered) if you or some other reviewer feels that's too long to wait before retrying.
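(Illustration, not part of the original comment.) The notify-then-retry idea could look roughly like the sketch below. It is not the implementation in #5368: `try_accept`, the `schedule` callable (standing in for something like `reactor.callLater`), the handler list, and the 0.1 second delay are all hypothetical. Only the 1 second default comes from the discussion above.

```python
import errno

DEFAULT_RETRY_DELAY = 1.0  # the 1 second retry mentioned above
HANDLED_RETRY_DELAY = 0.1  # hypothetical shorter delay when handlers exist


def try_accept(listening_socket, schedule, exhaustion_handlers=()):
    """Accept once; on EMFILE/ENFILE notify handlers and schedule a retry.

    Returns the (connection, address) pair from accept(), or None if the
    accept failed from descriptor exhaustion and a retry was scheduled.
    """
    try:
        return listening_socket.accept()
    except OSError as e:
        # Anything other than descriptor exhaustion is not handled here.
        if e.errno not in (errno.EMFILE, errno.ENFILE):
            raise
    for handler in exhaustion_handlers:
        # Handlers might abort idle connections; the descriptors are only
        # released on a later reactor iteration, hence the delayed retry.
        handler()
    delay = HANDLED_RETRY_DELAY if exhaustion_handlers else DEFAULT_RETRY_DELAY
    schedule(delay, try_accept, listening_socket, schedule, exhaustion_handlers)
    return None
```

Because the pending connection stays queued in the kernel, the delayed retry can still pick it up once descriptors have been freed.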
