[Twisted-Python] Some beginner questions about "twisted.names.client" and ".tac" environment

Jean-Paul Calderone exarkun at divmod.com
Sat Dec 17 18:10:16 EST 2005


On Sat, 17 Dec 2005 23:14:10 +0100, Jesus Cea <jcea at argo.es> wrote:
>
>Twisted 2.1, twisted.names 0.2, here.
>
>I'm taking my first steps with Twisted (documentation -inexistence-
>nightmare :-), and my first project will be a bulk mailer as the backend
>of my mailing list system.
>
>The application would take the message and the subscriber list and a)
>resolve the MX for the domains and b) connect to the MXs and send the
>message, trying to minimize traffic by sending a single envelope for
>several recipients sharing the domain or the MXs.
>
>I'm currently doing the DNS stuff. The results are promising, resolving
>about 200 domains per second on a 1.4GHz P4, so my biggest mailing list
>(about 31500 unique domains, multiple subscribers per domain) is
>"resolved" in less than three minutes.
>
>Nice so far. The demo code (2Kbytes) is the following (if I'm violating
>the rules posting this code, please tell me):
>
>=====
># File "dns.tac"
>
>from twisted.application import service
>
>application = service.Application("DNS test")
>

You probably want to move most of your program out of "dns.tac" and into an importable Python module.  Code defined inside .tac files lives in a weird world where some surprising rules apply.  It's best to keep the .tac file as short as possible.  Generally, you just want to create an Application and give it some children, importing from modules the definitions of all classes and functions needed to set this up.

>import time
>t=time.time()
>
>class resolucion(object) :
>  def __init__(self,dominio) :
>    from twisted.names import client
>    d = client.lookupMailExchange(dominio,timeout=(60,))

Passing (60,) as the timeout might not be the best idea.  This will cause the DNS client to send one request and then wait 60 seconds for a response.  If either the request or the response is dropped (as often happens with UDP traffic), you will never get a result, and you will have to wait 60 seconds to discover this fact.

If you don't want retransmission, a value of (15,) or so is probably better.  However, I suspect you really do want retransmissions.  The default timeout also adds up to 60 seconds in total, but the resolver retransmits the query several times within that period.
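
To make the difference concrete, here is a small stand-alone calculation (it does not use Twisted at all). It assumes each element of the timeout tuple is the number of seconds to wait for one attempt before retransmitting, and it uses (1, 3, 11, 45) as the default schedule, which is what I believe twisted.names uses; check your installed version. send_times is a made-up helper name.

```python
from itertools import accumulate


def send_times(timeouts):
    """Return (seconds at which a query is sent, second at which we give up)."""
    deadlines = list(accumulate(timeouts))
    # The first query goes out at t=0; each retry goes out when the
    # previous attempt's wait expires.
    sends = [0] + deadlines[:-1]
    return sends, deadlines[-1]


single = send_times((60,))            # one query, then 60 seconds of waiting
default = send_times((1, 3, 11, 45))  # assumed default schedule

print(single)   # ([0], 60)
print(default)  # ([0, 1, 4, 15], 60)
```

Both schedules give up after 60 seconds, but with the second one a single dropped packet costs about a second instead of the whole minute.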

>    d.addCallbacks(self._cbMailExchange, self._ebMailExchange)
>    self.dominio=dominio
>
>  def _cbMailExchange(self,results):
>    # Callback for MX query
>    global aun_pendientes
>    aun_pendientes-=1
>    if not aun_pendientes :
>      print "OK",time.time()-t
>      return
>      from twisted.internet import reactor
>      reactor.stop()
>      return
>    if not len(pendientes) :
>      return
>
>    resolucion(pendientes.pop())
>    from twisted.names.dns import QUERY_TYPES
>    for i in results[0] :
>      n=i.payload.name
>      tipo=QUERY_TYPES[i.payload.TYPE]
>      if tipo=="MX" :

You can just use dns.MX here, instead of looking up "MX" in QUERY_TYPES.

>        return
>        p=i.payload.preference
>        print n,p,
>        for j in results[2] :
>          if n==j.name :
>            print j.payload.dottedQuad(),"(%d)" %j.ttl
>            break
>        else :
>          print "???"
>      elif tipo=="CNAME" :
>        redirigidos.append((self.dominio,i.payload.name))
>
>  def _ebMailExchange(self,failure):
>    # Error callback for MX query
>    global aun_pendientes
>    aun_pendientes-=1
>    if not aun_pendientes :
>      print "ERROR",time.time()-t
>      return
>      from twisted.internet import reactor
>      reactor.stop()
>      return
>    if not len(pendientes) :
>      return
>
>    resolucion(pendientes.pop())
>    print "XXX",self.dominio
>    print 'Lookup failed:'
>    failure.printTraceback()
>
>pendientes=[]
>redirigidos=[]
>
>f=open("domain_list")
>for i in f :
>  pendientes.append(i)
>
>aun_pendientes=len(pendientes)
>
>concurrencia=1000
>
>for i in pendientes[:concurrencia] :
>  resolucion(i)
>
>from twisted.names import client
>client.theResolver.resolvers[-1].dynServers=[('127.0.0.1', 53)]
># client.theResolver.resolvers=[client.theResolver.resolvers[-1]]

To customize the server used by the resolver, you may want to create your own resolver instance, rather than relying on the defaults guessed by the resolver automatically created in the client module.

>
>pendientes=pendientes[concurrencia:]
>
>=====
>
>I launch the code as "twistd -ny dns.tac".
>
>The demo does 1000 resolutions in parallel. If you experiment with the
>code, reduce the value.
>
>Questions:
>
>1. I get a warning: "[Uninitialized]
>/usr/local/lib/python2.4/site-packages/twisted/names/dns.py:1227:
>exceptions.DeprecationWarning: Deferred.setTimeout is deprecated.  Look
>for timeout support specific to the API you are using instead."
>
> I'm using the native "twisted.names" timeout API, as far as I know...

This is a problem internal to twisted.names.  Your code isn't doing anything wrong to cause it.  Hopefully this will be fixed by the next release.

>
>2. By default "twisted.names.client" uses the "/etc/resolv.conf" file to
>know which nameservers to use. I, nevertheless, want to use a particular
>nameserver, so:
>
> 2.1. I couldn't find an appropriate API. I had to do a "hack",
>reading the "twisted.names" core to know implementation details:
>"client.theResolver.resolvers[-1].dynServers=[('127.0.0.1', 53)]"
>
> 2.2. The previous "hack" is only effective for future
>"twisted.names.client" instances. The previous ones use the
>"/etc/resolv.conf" entries. Putting the "hack" code before any instance
>creation doesn't work.
>
> 2.3. While reading the framework code, I saw that "client" uses a
>resolver chain: host, cache, network. But the cache is initially clear
>(of course) and NEVER ever gets populated, so we are not using it but
>checking missing entries eats CPU: 155 seconds for the unchanged code,
>125 seconds if I drop the host and cache resolvers.
>
> A caching client would be very nice, if the client is long running (my
>original idea).

All three of these can be addressed by constructing your own resolver:

  from twisted.names import client
  myResolver = client.Resolver(servers=[('127.0.0.1', 53)])

This gives you a resolver which uses only localhost, doesn't involve any nasty hacks, and doesn't have an /etc/hosts resolver or a caching resolver to slow things down.

>
> 2.4. The resolution failure code is only called if the resolution
>times out. But if the domain doesn't exist, the code called is the
>"success" one, with a "nil" answer. So we can't differentiate between
>nonexistent domains and nonexistent RRs.

Hmm.  The non-existence of the domain is hidden by the very last step in performing the lookup.  The Resolver class has a method, filterAnswers, which is used to turn a DNS response into the three-tuple of lists which all the lookup* methods return.  You may want to subclass Resolver and override filterAnswers to behave differently when the `message' argument it is given has an `rCode' attribute equal to twisted.names.dns.ENAME, which indicates the name requested does not exist.

>
>3. How can I stop this ".tac"? If I do "reactor.stop()", I get an
>infinite error, repeated forever:

reactor.stop() is the correct way to end the program.  If you still have this problem after you have split the program into multiple files, please post again.

Jean-Paul



