Opened 9 years ago

Closed 5 years ago

#970 defect closed fixed (fixed)

twisted.names.root.Resolver eats filedescriptors

Reported by: jojo Owned by:
Priority: high Milestone:
Component: names Keywords:
Cc: exarkun, spiv, itamarst, titty, jojo, teratorn, thijs Branch: branches/root-resolver-fd-leak-970-2
(diff, github, buildbot, log)
Author: exarkun Launchpad Bug:

Description


Attachments (2)

out.txt (5.1 KB) - added by jojo 9 years ago.
patchit.txt (2.4 KB) - added by cb 5 years ago.
this solves the leaking file descriptors for me ..

Download all attachments as: .zip

Change History (45)

Changed 9 years ago by jojo

comment:1 Changed 9 years ago by jojo

There seems to be a problem with Twisted Names on Windows.

In twisted/names/client.py, reactor has to be passed to
ThreadedResolver.__init__():

------------------------------------------------------------------------
Index: twisted/names/client.py
===================================================================
--- twisted/names/client.py	(Revision 13495)
+++ twisted/names/client.py	(Arbeitskopie)
@@ -359,6 +359,7 @@
 
 def createResolver(servers = None, resolvconf = None, hosts = None):
     from twisted.names import resolve, cache, root, hosts as hostsModule
+    from twisted.internet import reactor
     if platform.getType() == 'posix':
         if resolvconf is None:
             resolvconf = '/etc/resolv.conf'
@@ -369,7 +370,7 @@
     else:
         if hosts is None:
             hosts = r'c:\windows\hosts'
-        bootstrap = ThreadedResolver()
+        bootstrap = ThreadedResolver(reactor) # Pass reactor as argument!
         hostResolver = hostsModule.Resolver(hosts)
         theResolver = root.bootstrap(bootstrap)
------------------------------------------------------------------------

The following simple test stops working after a while and produces the
attached output. When using ThreadedResolver instead of
createResolver(), everything works fine.

from twisted.names.client import ThreadedResolver, createResolver
from twisted.internet import reactor

class TestResolver(object):
    def __init__(self):
	self.num_connections = 0
	self.resolver = createResolver()
        #self.resolver = ThreadedResolver(reactor)

    def resolve(self):
	self.num_connections += 1
	d = self.resolver.getHostByName("www.twistedmatrix.com")
	d.addCallback(self.gotIP)
	reactor.callLater(0.1, self.resolve)

    def gotIP(self, ip):
	self.num_connections -= 1
	print "Got IP %s. %d connections." % (ip, self.num_connections)

t = TestResolver()
t.resolve()
reactor.run()

comment:2 Changed 9 years ago by exarkun

First part of this is fixed and checked in at r13513 and r13514

Dunno what to do about the second part.  It seems to be a win32-only problem. 
It may even have nothing to do with t.names, I can't tell.

comment:3 Changed 9 years ago by titty

  • Keywords win32 removed
  • Summary changed from [PATCH] Problems with Twisted Names on Windows to twisted.names.root.Resolver eats filedescriptors [Problems with Twisted Names on Windows]

This isn't win32 only.

Change 'self.resolver = ...' to
self.resolver = twisted.names.root.Resolver(192.228.79.201?)

This results in the following error message on linux:

Unhandled error in Deferred:
Traceback (most recent call last):
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/internet/defer.py", line 229, in callback
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/internet/defer.py", line 294, in _startRunCallbacks
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/internet/defer.py", line 307, in _runCallbacks
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/internet/defer.py", line 630, in gotResult
    
--- <exception caught here> ---
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/internet/defer.py", line 606, in _deferGenerator
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/names/root.py", line 127, in discoverAuthority
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/names/root.py", line 74, in lookupNameservers
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/names/root.py", line 38, in retry
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/names/dns.py", line 1126, in query
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/names/dns.py", line 1084, in startListening
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/internet/posixbase.py", line 307, in listenUDP
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/internet/udp.py", line 84, in startListening
    
  File "/scratch1/ralf/bbot/extern/Twisted-trunk/twisted/internet/udp.py", line 92, in _bindSocket
    

comment:4 Changed 9 years ago by titty

While I'm here:
Shouldn't the twisted.names.root.bootstrap mechanism be replaced by a static list of IP addresses?

comment:5 Changed 9 years ago by exarkun

Ah. It's pretty clear what's going on here now. root.Resolver needs to clean up the resolver it creates for each lookup it does. I'll take care of this shortly.

Regarding bootstrap, what would the advantage of this be? The list would just have to be maintained and updated, and old installs of Twisted would randomly break over time.

What it should actually be replaced (or probably just augmented) with is a platform-specific mechanism for determining the system's actual configured nameservers.

comment:6 Changed 9 years ago by titty

The advantage is that one can use root.Resolver without first using dns resolving (which might not work).

The platform specific mechanism is working for unix, still root.Resolver isn't used (this is fine). In my opinion this has nothing to do with being able to determine the configured nameservers.

Maybe, it should just be rewritten with a static list of IPs and then using root.Resolver([ips]) to do the name lookups for the root servers. This would also be real bootstrapping :)

comment:7 Changed 8 years ago by exarkun

  • Owner changed from itamarst to exarkun

comment:8 Changed 8 years ago by teratorn

  • Cc teratorn added

comment:9 Changed 7 years ago by exarkun

  • Priority changed from high to low

Certainly this should still be fixed, but I'm not likely to get to it in the foreseeable future. If someone would like to supply a patch with unit tests, I'll gladly review it, though.

comment:10 Changed 5 years ago by exarkun

#3982 was a duplicate of this.

comment:11 Changed 5 years ago by exarkun

#4141 was a duplicate of this.

comment:12 Changed 5 years ago by exarkun

  • Summary changed from twisted.names.root.Resolver eats filedescriptors [Problems with Twisted Names on Windows] to twisted.names.root.Resolver eats filedescriptors

comment:13 follow-up: Changed 5 years ago by cb

  • Priority changed from low to high

I think this bug has a pretty high priority ..

It shows that might be a design-bug in the twisted.named.root.Resolver()

being able to bootstrap a recursive caching Resolver is very important

at the Moment the root.Resolver() is pretty much useless and should be marked broken

yours

Christian

comment:14 in reply to: ↑ 13 Changed 5 years ago by cb

Replying to cb:
please also have look at #4141

comment:15 Changed 5 years ago by cb

i fixed the leaking resolver ports for my use case ..
.. though it seems to have problems when the reactor is not running?

please see the attached patchit.txt

yours

Christian Bahls

Changed 5 years ago by cb

this solves the leaking file descriptors for me ..

comment:16 Changed 5 years ago by thijs

  • Cc thijs added
  • Keywords review added
  • Owner exarkun deleted

comment:17 Changed 5 years ago by exarkun

  • Keywords review removed
  • Owner set to exarkun
  • Status changed from new to assigned

Needs unit tests badly.

comment:18 Changed 5 years ago by exarkun

  • Author set to exarkun
  • Branch set to branches/root-resolver-fd-leak-970

(In [27646]) Branching to 'root-resolver-fd-leak-970'

comment:19 Changed 5 years ago by exarkun

  • Branch changed from branches/root-resolver-fd-leak-970 to branches/root-resolver-fd-leak-970-2

(In [27694]) Branching to 'root-resolver-fd-leak-970-2'

comment:20 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted
  • Status changed from assigned to new

comment:21 Changed 5 years ago by TimAllen

  • Owner set to TimAllen

comment:22 Changed 5 years ago by TimAllen

  • Keywords review removed
  • Owner changed from TimAllen to exarkun

OK, this is my first attempt at a code-review. Hope this works!

  1. t/n/client.py, line 2: File has been edited, but copyright date not bumped.
  2. t/n/client.py, lines 33-35: You've deleted the _errormap variable from Resolver, but the exceptions it contained are still being imported even though they're not used.
  3. It's not immediately obvious why t.n.dns.RRHeader has learned how to compare itself to other instances. I mean, it's a useful-looking change, but is it necessary for the rest of the code in this patch to work?
  4. t/n/root.py, line 121: The list of timeouts (1, 3, 11, 45) appears in nearly a dozen places through the Twisted codebase, and three times in root.py alone. Does this sequence come from an RFC or something? Should it be a module-level constant?
  5. t/n/root.py, in Resolver: This code is very confused about whether it wants a single server to query, or a list of them. __init__() takes a list, which _lookup() passes to _discoverAuthority(), which grabs the server at the head of the list and throws the tail away. The head gets passed to _query() which wraps it back into a list and passes it to client.Resolver(). Later on in _discoveredAuthority(), we take a list of potential servers in traps, look up only the first one, and wrap it back into a list to pass to _discoverAuthority() (which will unpack it so _query() can re-pack it). Since this class is given a list and passes a list to the underlying utility functions, they might as well stay lists all along.
  6. t/n/root.py, line 137: The docstring says timeout should be a list of ints, but everywhere that this code supplies a timeout, it's a tuple of ints.
  7. t/n/root.py, line 183: There's a comment that says "Check class too!" If this is just a matter of adding "and answer.class == query.class", you might as well do it.
  8. t/n/root.py, line 206: If we get a response with a list of authoritative name servers and addresses for *some* of them, is it OK to just forward the query to the servers we have addresses for and ignore the others? If we're ignoring information, doesn't that make things less reliable?

comment:23 Changed 5 years ago by exarkun

(In [27697]) Update copyright date; remove unused imports

refs #970

comment:24 Changed 5 years ago by exarkun

(In [27698]) Check the RR class before considering it an answer to our query

refs #970

comment:25 Changed 5 years ago by exarkun

(In [27699]) Pass around just one server address while recursing, since only server is ever actually used

refs #970

comment:26 Changed 5 years ago by exarkun

(In [27700]) timeouts are typically tuples, not lists

refs #970

comment:27 Changed 5 years ago by exarkun

(In [27701]) A test for the behavior of a lookup method when there is a CNAME for the QNAME in the response, but no record matching the query type

refs #970

comment:28 Changed 5 years ago by exarkun

(In [27702]) A test for cname cycles

refs #970

comment:29 Changed 5 years ago by exarkun

(In [27703]) Follow CNAMEs within a single response, detecting cycles.

refs #970

comment:30 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Thanks for the in-depth review.

  1. Date updated
  2. Unused imports deleted
  3. This was to ease testing. In a couple places the tests want to verify that the right RRHeader is returned.
  4. It's somewhat recommended by some standard, I don't remember which. I think it's an API mistake, but it's hard to fix at this point (I think there's room for a higher-level, nicer API, where perhaps it can be fixed). Since most of the occurrences in root.py are in deprecated code, I'm not too bothered. The net count will go down once that code is deleted.
  5. Indeed. It keeps lists around in more places now.
  6. Fixed.
  7. Added a test for that and implemented the class check.
  8. That's a pretty good question. I'm not sure what the current best practice is for this. Supposedly, every server included in that response is supposed to be able to answer the query. Of course, in practice, a server might be down, overloaded, misconfigured, etc. It might be sensible to make sure all servers are tried. I think this could easily be a separate unit of work, too, though. And it's pretty reliable as-is.

Beyond what was mentioned in the review, I also implemented some more CNAME logic (following them with an additional query when their target is not included in the response, detecting cycles). I'm not all that happy with how long _discoveredAuthority is now. The processing logic is somewhat complex, and the implementation reflects this.

comment:31 Changed 5 years ago by TimAllen

  • Owner set to TimAllen

comment:32 Changed 5 years ago by TimAllen

  • Keywords review removed
  • Owner changed from TimAllen to exarkun

I did a brief search for the sequence "1, 3, 11, 45", and the only hits are
for various copies of the Twisted source across the internet, and a Japanese
PDF about DNSSEC. Oh, and it's the first few terms of the sequence of Super
Catalan Numbers (they fight crime!). I think it'd be good to have a comment mentioning that these numbers are arbitrary rather than being
fundamentally important parts of some DNS RFC, just to ward off cargo-cult
programmers (it may be too late for Japanese DNSSEC researchers).

  1. t/n/root.py, in findAnswerOrCName(): You say "if record.name == name". Since the records variable was populated by grabbing the .name attribute from each record, surely that condition should always be true.
  2. t/n/root.py, line 207 in findAnswerOrCName(): It might be worth pointing out in a comment that a given DNS name should only ever have one CNAME, so it doesn't matter that this method always chooses the last CNAME.
  3. t/n/root.py, in _discoverAuthority(): I agree, this method is getting unwieldly and long, especially the "while True" loop is a bit mind-bending. On the other hand, I don't have any concrete suggestions on how to improve it either.
  4. t/n/root.py, line 266 in _discoverAuthority(): If the reply included no hints, this code assumes it must have included at least one entry in traps, which seems a bold assertion (there's got to be *some* sufficiently misconfigured DNS server out there somewhere). In this circumstance, raising some kind of 'DNS server is stupid' error would be more helpful than IndexError: list index out of range. Likewise for assuming that the first entry in traps resolves to at least one sensible IP address.
  5. t/n/t/test_rootresolve.py, in _getResolver docstring: The "@param" line needs an extra space of indent.

comment:33 Changed 5 years ago by exarkun

(In [27718]) Remove redundant name check

refs #970

comment:34 Changed 5 years ago by exarkun

(In [27720]) Comment about the cname handling in findAnswerOrCName

refs #970

comment:35 Changed 5 years ago by exarkun

(In [27722]) Handle the case where a response has no answers and is not a delegation

refs #970

comment:36 Changed 5 years ago by exarkun

(In [27723]) Add a test making sure nameserver lookup errors propagate

refs #970

comment:37 Changed 5 years ago by exarkun

(In [27724]) Add a test making sure another kind of nameserver lookup error propagates

refs #970

comment:38 Changed 5 years ago by exarkun

(In [27725]) Fix whitespace in _getResolver docstring

refs #970

comment:39 Changed 5 years ago by exarkun

(In [27726]) comment about why we use 1, 3, 11, 45

refs #970

comment:40 Changed 5 years ago by exarkun

  • Keywords review added
  • Owner exarkun deleted

Thanks! All points addressed.

comment:41 Changed 5 years ago by TimAllen

  • Keywords review removed
  • Owner set to exarkun

Looks good to me! Ready to merge, as far as I can see.

comment:42 Changed 5 years ago by exarkun

  • Resolution set to fixed
  • Status changed from new to closed

(In [27740]) Merge root-resolver-fd-leak-970-2

Author: exarkun
Reviewer: TimAllen
Fixes: #970

Fix twisted.names.root.Resolver so that it does not leak several file descriptors
(udp ports) for each lookup it performs. Also re-introduce tests for this code and
deprecate several of the previous APIs which were really only implementation details,
not functionality intended to be made public.

comment:43 Changed 4 years ago by <automation>

  • Owner exarkun deleted
Note: See TracTickets for help on using tickets.