Opened 16 years ago

Last modified 13 years ago

#1228 defect new

selectreactor throws exceptions under pressure on windows (WSAENOBUFS)

Reported by: ghazel Owned by:
Priority: highest Milestone:
Component: core Keywords: win32
Cc: Jean-Paul Calderone, spiv, ghazel, teratorn, k3mper, Trent.Nelson, ivank Branch:
Author:

Description (last modified by jknight)

This is a dup of #816

Attachments (2)

reactor.critical.zip (736 bytes) - added by ghazel 16 years ago.
win32patch.diff (4.6 KB) - added by ghazel 16 years ago.

Download all attachments as: .zip

Change History (16)

Changed 16 years ago by ghazel

Attachment: reactor.critical.zip added

comment:1 Changed 16 years ago by ghazel

The selectreactor behaves very poorly when it hits either:

  File "twisted\internet\selectreactor.pyc", line 49, in win32select
exceptions.ValueError: too many file descriptors in select()

or:

  File "twisted\internet\selectreactor.pyc", line 49, in win32select
select.error: (10055, 'An operation on a socket could not be performed because 
the system lacked sufficient buffer space or because a queue was full')

Specifically, it throws exceptions and does not recover. Both errors should be 
treated in much the same way - it means too many file descriptors were passed 
to select. The number of connections should then be dynamically limited. In the 
case of the 10055 error the limit can change at run time, so the reactor should 
periodically try to increase the limit. In the case of the ValueError the limit 
is fixed, and connections can never exceed this amount.

It should be noted that the prunning function has no effect on windows, because 
the ValueError has nothing to do with the file descriptor id's, only the 
quantity of them.

Attached is a server and client which simulate heavy use of selectreactor. They 
quickly (<= 512 connections) demonstrate whichever limit is hit first.

comment:2 Changed 16 years ago by Jean-Paul Calderone

Downgrading to bug.

Pretty sure this is a dup of another issue?

Changed 16 years ago by ghazel

Attachment: win32patch.diff added

comment:3 Changed 16 years ago by ghazel

Here's a patch which band-aids the problem (postpone extra connections when it 
happens, set a limit and postpone connections attempted over that limit).
That means the application will recover from the situation, instead of printing 
stack traces forever. It's possible to continue trying to make connections for 
as long as you have system memory - a much higher barrier than the previous 
limit.

I'm obligated to mention the best answer is a new "too many connections" error 
added to all reactors, and a threaded select (that's select in multiple 
threads, not a single other thread) reactor.

comment:4 Changed 15 years ago by jknight

Component: conch
Description: modified (diff)

comment:5 Changed 15 years ago by ghazel

Priority: highhighest

in branch selectreactor-1228, needs a reviewer and probably some discussion

comment:6 Changed 15 years ago by Jean-Paul Calderone

Keywords: review added

Add the review keyword when a branch is ready for review, in addition to setting it to highest priority.

comment:7 Changed 15 years ago by Jean-Paul Calderone

Component: conchcore

comment:8 Changed 15 years ago by teratorn

Cc: teratorn added
Keywords: win32 added

comment:9 Changed 15 years ago by Jean-Paul Calderone

Keywords: review removed
Owner: changed from spiv to ghazel

Follow the naming standard - camelCase names, not underscore_separated. Also, are limit and overload_delay intended to be public? Give them an underscore prefix if not.

In general, I'm not sure this is the right approach to this problem. There should be some way to get application code involved in the decision regarding what to do in this case. It may have some idea about whether it is more appropriate to disconnect some existing connections, or stop accepting new ones, or stop paying attention to events from some of them, or whatever.

This is also a problem on other platforms, since select has similar limitations everywhere. This patch doesn't address the problem on unix at all, and the approach isn't applicable there either. This may be okay, since this ticket seems specifically targetted at the problem on win32, I'm just pointing out it doesn't completely fix selectreactor with too many connections.

client.py and server.py should be turned into a unit test. put this up for review again when that has been done.

comment:10 Changed 15 years ago by k3mper

Cc: k3mper added

comment:11 Changed 14 years ago by ghazel

Summary: selectreactor throws exceptions under pressure on windowsselectreactor throws exceptions under pressure on windows (WSAENOBUFS)

comment:12 Changed 14 years ago by Trent.Nelson

Cc: Trent.Nelson added

Perhaps providing a way for an application to register at startup with the reactor how it wants to behave/degrade (keep trying, drop connections, blow up, etc) in situations such as this is a more appropriate path, with the default being some sort of sensible/graceful degredation in service quality.

comment:13 Changed 13 years ago by ivank

Cc: ivank added

comment:14 Changed 11 years ago by <automation>

Owner: ghazel deleted
Note: See TracTickets for help on using tickets.