Opened 4 years ago

Last modified 3 months ago

#4747 defect reopened

tw.inet.test.ProcessTestsBuilder.test_openFileDescriptors is broken in many ways

Reported by: soyt Owned by: soyt
Priority: normal Milestone:
Component: core Keywords:
Cc: jessica.mckellar@… Branch:
Author: Launchpad Bug:

Description

The implementation of twisted.internet.process._listOpenFDs() is only a guess on some systems, with possible false positives. For example, on OpenBSD it returns something like range(64).

Furthermore, os.listdir() is not always the mechanism used to enumerate the fds, so the extra fd it opens might not be there.

Finally, it seems that on some systems (guess what, OpenBSD!), for some reason, Python internally uses a pair of interconnected pipes (I'll look further into this later), so there are actually more open fds than just stdin, stdout and stderr.

So, as it is written, the test makes little sense, and I am actually not sure what functionality _listOpenFDs() really provides that is worth being tested.

Anyway, here is a patch that attempts to check that at least the three standard fds are there.
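The patch's idea can be sketched roughly like this (open_standard_fds is a hypothetical name, not part of the attached patch or of Twisted):

```python
import os

def open_standard_fds():
    # Hypothetical helper illustrating the patch's idea: verify with
    # fstat() which of the three standard fds are actually open,
    # instead of asserting an exact platform-dependent fd list.
    present = []
    for fd in (0, 1, 2):
        try:
            os.fstat(fd)
        except OSError:
            continue
        present.append(fd)
    return present
```

A test built on this only asserts that the standard descriptors are present, which is robust to extra fds such as OpenBSD's internal pipes.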

Attachments (5)

patch-twisted_internet_test_test_process_py (958 bytes) - added by soyt 4 years ago.
listopenfd.diff (3.4 KB) - added by soyt 4 years ago.
tw.i.process._listOpenFDs.diff (4.4 KB) - added by soyt 4 years ago.
patch-twisted_internet_process_py (2.5 KB) - added by soyt 4 years ago.
patch-twisted_internet_test_test_process_py.2 (706 bytes) - added by soyt 4 years ago.


Change History (30)

comment:1 Changed 4 years ago by jesstess

  • Cc jessica.mckellar@… added
  • Keywords review added

comment:2 Changed 4 years ago by exarkun

  • Keywords review removed
  • Owner changed from glyph to soyt

Thanks very much for the patch. We don't have any OpenBSD machines for testing, or I suppose we would have noticed this sooner. :)

To answer the question of what functionality it provides that's worth testing... _listOpenFDs is used by the reactor.spawnProcess support to find out which file descriptors need to be closed so that they aren't inherited by the child process. So, it must work right; otherwise the child may inherit file descriptors you did not want it to inherit. This is an optimization over trying to close (roughly) every integer in the range (3, RLIMIT_NOFILE), where RLIMIT_NOFILE is often around 1024.

I don't think the patch is quite correct, since at most the one extra os.listdir file descriptor should be in the result set. Anything more than that indicates the desired functionality isn't being provided. The more correct test assertion would be that the output is either range(3) or range(4). However, I'm not sure what you mean by the "uses a pair of interconnected pipes" comment. Uses it for what? To implement os.listdir?

It would be nice if we didn't have to scan and close all of range(RLIMIT_NFILES) on OpenBSD though. Do you know if there is any way to get this information on that platform?

comment:3 Changed 4 years ago by glyph

  • Milestone Twisted 10.2 deleted

Sorry this didn't make it for 10.2.

comment:4 Changed 4 years ago by soyt

So, here is a suggestion for a new implementation of _listOpenFDs() returning a list of open file descriptors. It assumes that poll(2) is available on the platform. I think it is more portable than the current approach, which is very Linux-specific and suboptimal for the rest. If poll(2) is not available on a supported UNIX system, then there should be a workaround that explicitly tests every possible fd (within a reasonable range) in a loop.

I also updated the test case to check that at least stdin, stdout and stderr are open in the child. As exarkun suggested, a more complete test could involve passing an additional fd and checking that the child has it too.

Changed 4 years ago by soyt

comment:5 Changed 4 years ago by glyph

  • Type changed from regression to defect

Not a regression, as it's now been present in a release (and also not on any supported platform).

comment:6 Changed 4 years ago by glyph

I would like to continue using /dev/fd where it's available and reliable; listing the directory is much faster than filling up a gigantic buffer full of integers and then polling it, and the point of that change in the first place was to accelerate this process. Also, it's eminently possible to configure a Mac to have more than 1024 simultaneously open FDs, so I'd very much like not to reintroduce that bug to fix this one.
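For reference, the /dev/fd approach being defended here amounts to something like the following sketch (devfd_list_open_fds is an illustrative name; it assumes each directory entry names an open descriptor, which is exactly what OpenBSD's static /dev/fd does not guarantee):

```python
import os

def devfd_list_open_fds():
    # Sketch of the /dev/fd approach: each directory entry names an
    # open descriptor (true on Linux, OS X, Solaris and FreeBSD; on
    # OpenBSD the directory is static, which is this ticket's problem).
    return sorted(int(name) for name in os.listdir("/dev/fd"))
```

The cost here is proportional to the number of actually open fds rather than to RLIMIT_NOFILE, which is the speed argument made above.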

Also, as you submit more patches, you may want to attach the 'review' keyword and reassign to nobody as per the ReviewProcess.

comment:7 follow-up: Changed 4 years ago by jknight

The current behavior isn't linux-specific. I believe it's linux-and-osx-and-solaris-and-freebsd specific.

Has anyone submitted a bug against OpenBSD? That is pretty ridiculous behavior they have...

I'd suggest the most-right solution here is to check sys.platform against whatever value openbsd has ("openbsd" maybe?), and don't look for the directory in that case. It can just use the existing fallback case of a close loop...

comment:8 in reply to: ↑ 7 ; follow-up: Changed 4 years ago by glyph

Replying to jknight:

The current behavior isn't linux-specific. I believe it's linux-and-osx-and-solaris-and-freebsd specific.

Yeah. After testing on a bunch of platforms, I thought I must have just missed the standards reference somehow and this was in fact standard.

Has anyone submitted a bug against OpenBSD? That is pretty ridiculous behavior they have...

I sure haven't. Soyt, would you mind doing that?

I'd suggest the most-right solution here is to check sys.platform against whatever value openbsd has ("openbsd" maybe?), and don't look for the directory in that case. It can just use the existing fallback case of a close loop...

That sounds fine by me.

Although, if someone cares about other obscure platforms, another way to go about this would be to have a whitelist, and then a run-time check that gets performed if the current platform isn't in the whitelist: the check would be something like,

  • pipe()
  • dup2() one of the ends to some high number, near, but not at, RLIMIT_NOFILE
  • _listOpenFDs() and see if both ends and the dup2'd end show up in the result, but NOT the should-be-invalid numbers around the dup2'd end.

This would avoid a bunch of unnecessary syscalls on platforms that we know work, but double-check in the cases where we don't. It's a bunch more work though; if I were going to implement this I'd probably skip it and just do the OpenBSD check that James suggests.
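The run-time check outlined in the bullets above might be sketched like this (fd_listing_is_reliable is hypothetical, not Twisted code; it assumes the soft RLIMIT_NOFILE is comfortably above the process's open fds):

```python
import os
import resource

def fd_listing_is_reliable(list_open_fds):
    # Sketch of the proposed run-time check: park a pipe end at a high
    # fd with dup2() and verify the lister reports the fds we created
    # but NOT the should-be-invalid number just below the duplicate.
    soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    high = min(soft, 4096) - 2  # near, but not at, RLIMIT_NOFILE
    r, w = os.pipe()
    os.dup2(w, high)
    try:
        fds = set(list_open_fds())
        return {r, w, high} <= fds and (high - 1) not in fds
    finally:
        for fd in (r, w, high):
            os.close(fd)
```

A static range(64)-style result, as seen on OpenBSD, fails this check, so the caller can fall back to the close loop.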

comment:9 in reply to: ↑ 8 ; follow-up: Changed 4 years ago by soyt

Replying to glyph:

Replying to jknight:

The current behavior isn't linux-specific. I believe it's linux-and-osx-and-solaris-and-freebsd specific.

My bad. But it is still specific.

Yeah. After testing on a bunch of platforms, I thought I must have just missed the standards reference somehow and this was in fact standard.

I haven't found a standard that specifies /dev/fd/* is expected to behave like that.

Has anyone submitted a bug against OpenBSD? That is pretty ridiculous behavior they have...

I sure haven't. Soyt, would you mind doing that?

Well, I don't think it's a bug. And I don't think it is ridiculous just because it doesn't work as you would like. Everything in /dev is static, that's it.

I'd suggest the most-right solution here is to check sys.platform against whatever value openbsd has ("openbsd" maybe?), and don't look for the directory in that case. It can just use the existing fallback case of a close loop...

That sounds fine by me.

Yes, that would be fine. But still, I don't see the point of having a function that just returns a guess when the cost of doing the actual check will be paid anyway.

Although, if someone cares about other obscure platforms, another way to go about this would be to have a whitelist, and then a run-time check that gets performed if the current platform isn't in the whitelist: the check would be something like,

  • pipe()
  • dup2() one of the ends to some high number, near, but not at, RLIMIT_NOFILE
  • _listOpenFDs() and see if both ends and the dup2'd end show up in the result, but NOT the should-be-invalid numbers around the dup2'd end.

I don't understand that part. What are you trying to test?

This would avoid a bunch of unnecessary syscalls on platforms that we know work, but double-check in the cases where we don't. It's a bunch more work though; if I were going to implement this I'd probably skip it and just do the OpenBSD check that James suggests.

If I were pedantic, I would say that if the number of syscalls is a concern, my solution has only 2, whereas the open/read/close one is 3, plus a bogus result. If I were bitter, I would add that if the "gigantic amount" of mem (something like, what, a page?) used is a concern, then one should probably consider not using python from the beginning. Hopefully I am not :)

comment:10 follow-up: Changed 4 years ago by jknight

Well, I don't think it's a bug. And I don't think it is ridiculous just because it doesn't work as you would like. Everything in /dev is static, that's it.

Right, it's not a bug, it's just a shitty, but perfectly valid, implementation which makes it a pain in the ass to support OpenBSD, since it's different from everyone else. So, a feature request.

The point of using /dev/fd is that it's *NOT* a guess, it returns all the open fds, even those above 1024. Your poll solution does not do that, so it is not a replacement of /dev/fd on OSes where that exists (and where it actually works properly). The poll call only lets you avoid some of the close syscalls in a for fd in range(1024): close(fd) loop...which is only used in the fallback case, and which *is* incorrect (but we hope nobody actually notices), since it won't close fds > 1024.
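The fallback close loop being discussed is roughly the following (a sketch; the injectable close parameter and the returned list are additions so the loop can be exercised without closing real descriptors):

```python
import os

def fallback_close_fds(keep=(0, 1, 2), limit=1024, close=os.close):
    # Sketch of the fallback loop: blindly try to close every fd below
    # an assumed limit, swallowing the error for fds that were never
    # open.  fds opened above `limit` leak, which is the incorrectness
    # noted above.
    closed = []
    for fd in range(limit):
        if fd in keep:
            continue
        try:
            close(fd)
        except OSError:
            continue  # EBADF: fd was not open
        closed.append(fd)
    return closed
```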

comment:11 in reply to: ↑ 9 ; follow-up: Changed 4 years ago by glyph

Replying to soyt:

Replying to glyph:

Replying to jknight:

The current behavior isn't linux-specific. I believe it's linux-and-osx-and-solaris-and-freebsd specific.

My bad. But it is still specific.

I think you mean "non-standardized". It's hardly "specific" if four major platforms implement it. It's general, just not universal. (And, for that matter, POSIX is not universal: we have plenty of Windows support code, too.)

Yeah. After testing on a bunch of platforms, I thought I must have just missed the standards reference somehow and this was in fact standard.

I haven't found a standard that specifies /dev/fd/* is expected to behave like that.

Sorry, I wasn't clear. I know it's not standard. What I was saying was that it is so widely supported, I was starting to think that it might be standard and I'd missed it.

Has anyone submitted a bug against OpenBSD? That is pretty ridiculous behavior they have...

I sure haven't. Soyt, would you mind doing that?

Well, I don't think it's a bug. And I don't think it is ridiculous just because it doesn't work as you would like. Everything in /dev is static, that's it.

OK, maybe it's not ridiculous, but it's still not as useful as the behavior that all these other platforms have.

I'd suggest the most-right solution here is to check sys.platform against whatever value openbsd has ("openbsd" maybe?), and don't look for the directory in that case. It can just use the existing fallback case of a close loop...

That sounds fine by me.

Yes, that would be fine. But still, I don't see the point of having a function that just returns a guess when the cost of doing the actual check will be paid anyway.

I don't know what you mean by "actual check". What I'm saying is that we have 2 algorithms: the crappy one, which has to loop over a list of every possible file descriptor, that gets used on OpenBSD and maybe some similar platforms, and the good one, which can be used on all the other UNIX-y systems that we currently support. We have a cheap, one-time, start-up check that determines which one we should use.

Although, if someone cares about other obscure platforms, another way to go about this would be to have a whitelist, and then a run-time check that gets performed if the current platform isn't in the whitelist: the check would be something like,

  • pipe()
  • dup2() one of the ends to some high number, near, but not at, RLIMIT_NOFILE
  • _listOpenFDs() and see if both ends and the dup2'd end show up in the result, but NOT the should-be-invalid numbers around the dup2'd end.

I don't understand that part. What are you trying to test?

Whether or not /dev/fd does what we think it does. When I say "_listOpenFDs() and see ..." I mean, call the implementation that lists /dev/fd and check its results. If we get a static 0->64, then it's a platform like OpenBSD, and we switch to using the other algorithm for listing open FDs.

This would avoid a bunch of unnecessary syscalls on platforms that we know work, but double-check in the cases where we don't. It's a bunch more work though; if I were going to implement this I'd probably skip it and just do the OpenBSD check that James suggests.

If I were pedantic, I would say that if the number of syscalls is a concern, my solution has only 2, whereas the open/read/close one is 3, plus a bogus result.

All I was saying here is that we don't want to do any startup runtime checks for the platform's behavior if we don't have to, hence the whitelist. It was imprecise to phrase that as "syscalls"; total syscall count optimization is not a goal :).

If I were bitter, I would add that if the "gigantic amount" of mem (something like, what, a page?) used is a concern, then one should probably consider not using python from the beginning. Hopefully I am not :)

It takes a lot more than a page to store 9223372036854775808 file descriptors :). Even in the cases where RLIMIT_NOFILE isn't wrong, you can have servers potentially processing very large numbers of file descriptors. And it is exactly that sort of server (where the ulimits are cranked waaay up to handle lots of volume) where it's quite important to be able to efficiently spawn and terminate processes.

comment:12 in reply to: ↑ 10 ; follow-up: Changed 4 years ago by soyt

Replying to jknight:

Well, I don't think it's a bug. And I don't think it is ridiculous just because it doesn't work as you would like. Everything in /dev is static, that's it.

Right, it's not a bug, it's just a shitty, but perfectly valid, implementation which makes it a pain in the ass to support OpenBSD, since it's different from everyone else. So, a feature request.

Ok, so it is valid and it is different, and I suggest a way to support it.

The point of using /dev/fd is that it's *NOT* a guess, it returns all the open fds, even those above 1024. Your poll solution does not do that, so it is not a replacement of /dev/fd on OSes where that exists (and where it actually works properly). The poll call only lets you avoid some of the close syscalls in a for fd in range(1024): close(fd) loop...which is only used in the fallback case, and which *is* incorrect (but we hope nobody actually notices), since it won't close fds > 1024.

Ok, the 1024 limit is a misplaced leftover from the previous implementation to handle the MacOS problem. I guess it can go away if resource.getrlimit(resource.RLIMIT_NOFILE)[0] is properly implemented. Other than that I don't see the problem with my solution.

comment:13 in reply to: ↑ 12 ; follow-up: Changed 4 years ago by glyph

Replying to soyt:

Ok, the 1024 limit is a misplaced leftover from the previous implementation to handle the MacOS problem. I guess it can go away if resource.getrlimit(resource.RLIMIT_NOFILE)[0] is properly implemented. Other than that I don't see the problem with my solution.

RLIMIT_NOFILE is sometimes very large.

The problem with your solution is that it is O(n) where n is the total number of all possible open file descriptors. It doesn't have to do a syscall for every one, but it does have to call a python function for every one, and that's almost as bad. The /dev/fd solution, by contrast, is O(n) where n is only the number of actually open file descriptors, which is by definition a smaller number (usually much smaller). Despite, as you have noted, actually calling more syscalls total; that part isn't very interesting.

The other problem is that resource.getrlimit does what it does on existing OS X releases, and we have to support those, so anything that doesn't take it into account is dead in the water.

Your solution is fine for OpenBSD, as long as other platforms still use the (better, faster) /dev/fd-listing solution; if anyone needs better performance on high-scale OpenBSD boxes, they can work with the kernel folks on that OS to provide us with an API that does the same thing as /dev/fd with comparable efficiency.

comment:14 in reply to: ↑ 11 Changed 4 years ago by soyt

Replying to glyph:

Ok, I'll post a new patch tomorrow that checks sys.platform at runtime to choose a correct implementation, and a regress test for detecting platforms that might need it.

comment:15 in reply to: ↑ 13 Changed 4 years ago by soyt

Replying to glyph:

Replying to soyt:

Ok, the 1024 limit is a misplaced leftover from the previous implementation to handle the MacOS problem. I guess it can go away if resource.getrlimit(resource.RLIMIT_NOFILE)[0] is properly implemented. Other than that I don't see the problem with my solution.

RLIMIT_NOFILE is sometimes very large.

The problem with your solution is that it is O(n) where n is the total number of all possible open file descriptors. It doesn't have to do a syscall for every one, but it does have to call a python function for every one, and that's almost as bad. The /dev/fd solution, by contrast, is O(n) where n is only the number of actually open file descriptors, which is by definition a smaller number (usually much smaller). Despite, as you have noted, actually calling more syscalls total; that part isn't very interesting.

The close loop in the fallback case is O(n) too, and the poll thing only calls one function per fd once if the rlimit doesn't change. The /dev/fd solution is nice but it simply doesn't work so I can't use it.

The other problem is that resource.getrlimit does what it does on existing OS X releases, and we have to support those, so anything that doesn't take it into account is dead in the water.

Are you sure that getrlimit()[0] is also that high? The [1] is the max, whereas [0] is the currently set value. I am pretty sure even the MacOS X kernel knows how big the current fd table of a process is.

Your solution is fine for OpenBSD, as long as other platforms still use the (better, faster) /dev/fd-listing solution; if anyone needs better performance on high-scale OpenBSD boxes, they can work with the kernel folks on that OS to provide us with an API that does the same thing as /dev/fd with comparable efficiency.

Right. Checking sys.platform is fine. Patch tomorrow.

comment:16 Changed 4 years ago by soyt

Overdue patch: Refactor the code to possibly choose the method(s) to try for _listOpenFDs based on sys.platform (or any other runtime test). I used the current fd table size given by getrlimit which is apparently properly reported on MacOSX.

Changed 4 years ago by soyt

comment:17 Changed 4 years ago by exarkun

#4881 was somewhat a duplicate of this. It's been resolved, and I think it fixed the /dev/fd problem. The implementation now does a sanity check on the results of each mechanism until it finds one that appears to work.

A remaining issue is the assertion that exactly [0, 1, 2, 3] are the valid file descriptors in the child process. This fails on the FreeBSD slave which has been online recently (though it is offline as I write this comment) because an implementation which returns xrange(256) is being used on that slave.

We should fix that test, twisted.internet.test.test_process.ProcessTestsBuilderBase.test_openFileDescriptors. I think the actual implementation in process.py is probably more or less fine at this point.
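The sanity-check idea described here can be sketched like so (pick_fd_lister mirrors the spirit of, but is not, Twisted's FDDetector; `candidates` holds zero-argument callables returning lists of ints):

```python
import os

def pick_fd_lister(candidates):
    # Sketch of the sanity check: accept the first mechanism whose
    # answer actually tracks a change we make on purpose -- a static
    # range(64)-style lister fails, a real one passes.
    for lister in candidates:
        r, w = os.pipe()
        try:
            sees_open = r in lister() and w in lister()
        finally:
            os.close(r)
            os.close(w)
        sees_close = r not in lister() or w not in lister()
        if sees_open and sees_close:
            return lister
    return None
```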

comment:18 Changed 4 years ago by soyt

I don't think the issue is really fixed. This function cannot have any useful functional test because it does not do anything useful IMHO. It just returns an iterable over potentially opened file descriptors. What is the point then of testing that result against a fixed list (which includes a totally bogus implementation-specific listdir fd) when it is even ok to return xrange(1024)?

This whole thing cannot be fixed properly before it is really decided that the function must return the list of really opened files, because that is what is expected in the end.

Furthermore, there are two smaller problems with the fallbacks:
1) _resourceFDImplementation should be tried before falling back to _fallbackFDImplementation in all cases.
2) _resourceFDImplementation should be using the current fd limit as a reasonable upper bound.

The attached patches are an attempt to address these issues.

Now this creates other issues in the regression suite, since the FDDetector test is getting in the way...

Changed 4 years ago by soyt

comment:19 follow-up: Changed 4 years ago by exarkun

Thanks for the continued input on this.

I don't think the issue is really fixed. This function cannot have any useful functional test because it does not do anything useful IMHO.

Ah, but the function does do something useful, as described in the original ticket. It lets us skip some (often many) failing os.close calls, which are noticeably slow.

It just returns an iterable over potentially opened file descriptors. What is the point then of testing that result against a fixed list (which includes a totally bogus implementation-specific listdir fd) when it is even ok to return xrange(1024)?

The function has specific, well-defined behavior. The behavior varies from platform to platform, but it is well defined on any particular platform. The per-platform behavior is important, because it implements the useful feature mentioned above.

It is okay to return xrange(1024) if that is the best answer possible on that platform. If a better answer is possible, it is not okay to return xrange(1024).

This whole thing cannot be fixed properly before it is really decided that the function must return the list of really opened files, because that is what is expected in the end.

It would be ideal if the list of really opened files could be returned on all platforms, as long as computing it is mostly cheaper than doing a lot of failed os.close calls. If it is as expensive or more expensive, then the point is lost. The purpose is not to create an exactly correct "what files are open" API. It's to speed up launching a child process.

1) _resourceFDImplementation should be tried before falling back to _fallbackFDImplementation in all cases

Can you file a new ticket for this?

2) _resourceFDImplementation should be using the current fd limit as a reasonable upper bound.

Did you read the comment that you deleted from _resourceFDImplementation? It explained why the current fd limit is not a reasonable upper bound. Adding 9.2 sextillion fds to a poll object to determine if they're valid is not viable.

The attached patches is an attempt to address these issues.

This ticket is resolved. I think that the resolution is reasonable, so I am not going to revert it and re-open the ticket. Please file new tickets for further changes. Don't forget to add the review keyword when appropriate.

comment:20 in reply to: ↑ 19 ; follow-up: Changed 4 years ago by soyt

  • Resolution set to wontfix
  • Status changed from new to closed

Replying to exarkun:

Ah, but the function does do something useful, as described in the original ticket. It lets us skip some (often many) failing os.close calls, which are noticably slow.

Yes, "skip some", and the unit test wants it to skip "all but one due to implementation specific behavior", which is why I said from the beginning that this test is useless.

The function has specific, well-defined behavior. The behavior varies from platform to platform, but it is well defined on any particular platform. The per-platform behavior is important, because it implements the useful feature mentioned above.

So, as I said, that unit test makes no sense, because it tests a specific behavior for a function whose behavior varies from platform to platform.

It is okay to return xrange(1024) if that is the best answer possible on that platform. If a better answer is possible, it is not okay to return xrange(1024).

A better answer (and arguably the best) is possible on all platforms but you don't want to consider it :)

This whole thing cannot be fixed properly before it is really decided that the function must return the list of really opened files, because that is what is expected in the end.

It would be ideal if the list of really opened files could be returned on all platforms, as long as computing it is mostly cheaper than doing a lot of failed os.close calls. If it is as expensive or more expensive, then the point is lost. The purpose is not to create an exactly correct "what files are open" API. It's to speed up launching a child process.

So, what you want is a benchmark, not a functional test.

1) _resourceFDImplementation should be tried before falling back to _fallbackFDImplementation in all cases

Can you file a new ticket for this?

I can.

Did you read the comment that you deleted from _resourceFDImplementation? It explained why the current fd limit is not a reasonable upper bound. Adding 9.2 sextillion fds to a poll object to determine if they're valid is not viable.

I did read it, but did you read the code I posted? MacOS reports the current fd limit correctly. What is obviously wrong is the max fd limit.

This ticket is resolved. I think that the resolution is reasonable, so I am not going to revert it and re-open the ticket. Please file new tickets for further changes. Don't forget to add the review keyword when appropriate.

Well in that case, I am giving up on this. I'll just keep patching things locally.

What do you mean by "add the review keyword when appropriate."?

comment:21 Changed 4 years ago by exarkun

  • Resolution wontfix deleted
  • Status changed from closed to reopened

This ticket is resolved. I think that the resolution is reasonable, so I am not going to revert it and re-open the ticket. Please file new tickets for further changes. Don't forget to add the review keyword when appropriate.

Well in that case, I am giving up on this. I'll just keep patching things locally.

Sorry, I was confused, since clearly the ticket was not closed before.

comment:22 in reply to: ↑ 20 ; follow-up: Changed 4 years ago by glyph

Replying to soyt:

Replying to exarkun:

It is okay to return xrange(1024) if that is the best answer possible on that platform. If a better answer is possible, it is not okay to return xrange(1024).

A better answer (and arguably the best) is possible on all platforms but you don't want to consider it :)

Say again? I missed that. You said some stuff about poll() earlier, whose deficiencies were outlined in several earlier comments. What's the best answer possible on all platforms? Is there a POSIX API available for listing open file descriptors?

It would be ideal if the list of really opened files could be returned on all platforms, as long as computing it is mostly cheaper than doing a lot of failed os.close calls. If it is as expensive or more expensive, then the point is lost. The purpose is not to create an exactly correct "what files are open" API. It's to speed up launching a child process.

So, what you want is a benchmark, not a functionnal test.

It's not just about raw speed, it's about efficiency in the face of certain conditions. "Subprocesses should launch as fast as possible" is something you have a benchmark for, but "increasing RLIMIT_NOFILE should not change the behavior of subprocess spawning at all" is a functional requirement. It's OK if subprocess spawning is slow, as long as it doesn't get slower (or in this case, do more of a specific type of work) under those conditions.

1) _resourceFDImplementation should be tried before falling back to _fallbackFDImplementation in all cases

Can you file a new ticket for this?

I can.

Thanks. Link?

Did you read the comment that you deleted from _resourceFDImplementation? It explained why the current fd limit is not a reasonable upper bound. Adding 9.2 sextillion fds to a poll object to determine if they're valid is not viable.

I did read it, but did you read the code I posted? MacOS reports the current fd limit correctly.
What is obviously wrong is the max fd limit.

What you mean is that it reports the soft fd limit correctly and gives a bogus value for the hard limit. Both of these limits are "current".

It's possible (easy, even) to open an FD at a high number and then lower the current soft limit below it, so you still might end up with open file descriptors which don't get listed. As the POSIX standard on getrlimit puts it, "unexpected behavior may occur".

This is possible with hard limits as well, of course, but with hard limits it's only possible once; with soft limits, the limit may be going up and down all the time. There's a real use-case for that, even: a process may want to keep a lid on its file descriptor allocation to guard against bugs, by keeping the soft limit low except when the application knows they'll need a few extra FDs.
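The soft-limit hazard described here can be demonstrated in a few lines (a sketch assuming the initial soft limit exceeds 200; fd_is_open is an illustrative helper, not Twisted code):

```python
import os
import resource

def fd_is_open(fd):
    # Illustrative helper: probe a single fd with fstat().
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
r, w = os.pipe()
os.dup2(w, 200)  # park an open fd at 200 (assumes soft limit > 200)
high = 200
resource.setrlimit(resource.RLIMIT_NOFILE, (100, hard))  # lower soft limit
# A scan bounded by the (new) soft limit misses the still-open fd 200:
seen = [fd for fd in range(resource.getrlimit(resource.RLIMIT_NOFILE)[0])
        if fd_is_open(fd)]
leaked = fd_is_open(high) and high not in seen
resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))  # restore
for fd in (r, w, high):
    os.close(fd)
```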

This ticket is resolved. I think that the resolution is reasonable, so I am not going to revert it and re-open the ticket. Please file new tickets for further changes. Don't forget to add the review keyword when appropriate.

Well in that case, I am giving up on this. I'll just keep patching things locally.

Sorry, I think exarkun was confusing this with #4881. This ticket's not closed, please feel free to contribute a working patch. But please patch the test, as the summary describes, and not the implementation, which the core development team is obviously pretty comfortable with being correct :).

What do you mean by "add the review keyword when appropriate."?

See the step by step guide on contributing to Twisted, and the review process documentation which lists requirements for a patch. I think in that comment exarkun was referring specifically to the new ticket that you said you're going to file (i.e. "file it now, and put it into review when it's got an implementation attached"), but this comment applies to all tickets ever.

Also, as a minor nitpick, please submit entire patches for review in one file, i.e. the output of what 'svn diff' in the root of your Twisted checkout would return. It's always a little annoying when I download one patch, start reviewing it and running the tests, then realize there's another half of it that I'm missing.

comment:23 Changed 4 years ago by glyph

By the way, since this discussion keeps circling around to OpenBSD, I should note that the fix for FreeBSD on #4881 also applies there: according to this online man page I found, OpenBSD has an fdescfs as well (although the 'fs' is removed in various contexts for some reason). So OpenBSD should be able to get good behavior out of Twisted by mounting this "standard" filesystem as well.

comment:24 in reply to: ↑ 22 Changed 4 years ago by soyt

Replying to glyph:

A better answer (and arguably the best) is possible on all platforms but you don't want to consider it :)

Say again? I missed that. You said some stuff about poll() earlier, whose deficiencies were outlined in several earlier comments. What's the best answer possible on all platforms? Is there a POSIX API available for listing open file descriptors?

No. But there is a practical way of doing it which is IMHO better than the existing fallbacks anyway.

It's not just about raw speed, it's about efficiency in the face of certain conditions. "Subprocesses should launch as fast as possible" is something you have a benchmark for, but "increasing RLIMIT_NOFILE should not change the behavior of subprocess spawning at all" is a functional requirement. It's OK if subprocess spawning is slow, as long as it doesn't get slower (or in this case, do more of a specific type of work) under those conditions.

The _resourceFDImplementation will return more bogus fds if RLIMIT_NOFILE is increased, linearly slowing down subprocess spawning. What is the difference, except that polling first will be much faster (about 20 times on a quick benchmark here)?

Thanks. Link?

Sure. #5052

What you mean is that it reports the soft fd limit correctly and gives a bogus value for the hard limit. Both of these limits are "current".

Sorry, wrong wording. I meant that.

It's possible (easy, even) to open an FD to a high number and then lower than the current soft limit, so you still might end up with open file descriptors which don't get listed. As the POSIX standard on getrlimit puts it, "unexpected behavior may occur".

If you deliberately put yourself in a situation where the standard says "unexpected behavior", then there is nothing to expect anyway.

This is possible with hard limits as well, of course, but with hard limits it's only possible once; with soft limits, the limit may be going up and down all the time. There's a real use-case for that, even: a process may want to keep a lid on its file descriptor allocation to guard against bugs, by keeping the soft limit low except when the application knows they'll need a few extra FDs.

Well, that is already an issue with the current fallback.

Sorry, I think exarkun was confusing this with #4881. This ticket's not closed, please feel free to contribute a working patch. But please patch the test, as the summary describes, and not the implementation, which the core development team is obviously pretty comfortable with being correct :).

What do you mean by "add the review keyword when appropriate."?

See the step by step guide on contributing to Twisted, and the review process documentation which lists requirements for a patch. I think in that comment exarkun was referring specifically to the new ticket that you said you're going to file (i.e. "file it now, and put it into review when it's got an implementation attached"), but this comment applies to all tickets ever.

Also, as a minor nitpick, please submit entire patches for review in one file, i.e. the output of what 'svn diff' in the root of your Twisted checkout would return. It's always a little annoying when I download one patch, start reviewing it and running the tests, then realize there's another half of it that I'm missing.

Will do. Sorry.

Anyway, this discussion is getting way too far from the original topic. My main concern is this: what is the point of this test? It obviously fails on platforms that use the fallback which, I am told, gives a "reasonable" result. I really have enough problems with the regression suite already. I don't need to have this kind of non-test failing.

By the way, mount_fdesc was removed from OpenBSD a while ago.

comment:25 Changed 3 months ago by glyph

This is related to #7633 but I'm not sure if it's a duplicate.
