Opened 3 years ago

Closed 3 years ago

#9137 release blocker: regression (closed: fixed)

Twisted 17.1 ThreadedResolver DNSLookupError with internationalized domain name

Reported by: Paul Tremberth
Owned by: Amber Brown <hawkowl@…>
Priority: normal
Milestone: 17.1.1
Component: core
Keywords:
Cc:
Branch:
Author:

Description

A recent Scrapy bug report shows DNSLookupErrors for internationalized domain names (IDNs), e.g. https://шанти-шанти.рф, on Python 2.7 with Twisted 17.1.

Scrapy currently installs a customized `twisted.internet.base.ThreadedResolver` via `reactor.installResolver`.

  • name resolution works fine with Twisted 16.6 and Python 2.7
  • name resolution also works fine on Python 3, with both Twisted 16.6 and 17.1

We've tracked the difference down to how the name is passed to getHostByName: with Twisted 17.1 it appears to arrive as Unicode rather than bytes.

Note that Scrapy passes the URL to Agent as bytes.

We are aware that Twisted 17.1 started using socket.getaddrinfo in its default resolver.

We have a patch that IDNA-encodes the name before passing it to gethostbyname in our custom resolver, but we would like to know whether this change was intended.
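For illustration, here is a minimal sketch of the kind of IDNA encoding such a workaround might do. It uses only Python's built-in `idna` codec; the helper name `to_idna_bytes` is hypothetical and this is not the actual Scrapy patch or the eventual Twisted fix.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, for illustration only: normalize a hostname to
# ASCII bytes before handing it to a bytes-only lookup function.


def to_idna_bytes(hostname):
    """Return ``hostname`` as ASCII bytes, IDNA-encoding text names."""
    if isinstance(hostname, bytes):
        # Assumption: byte strings are already ASCII or punycode.
        return hostname
    # The stdlib "idna" codec turns each non-ASCII label into "xn--...".
    return hostname.encode("idna")


# The Cyrillic TLD from the report encodes to its punycode form.
assert to_idna_bytes(u"рф") == b"xn--p1ai"
# Plain ASCII names pass through unchanged.
assert to_idna_bytes(u"example.com") == b"example.com"
```

A custom resolver could call a helper like this at the top of getHostByName, so the underlying lookup receives the same ASCII name whether the caller passed bytes or Unicode.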

Sample program to reproduce the issue:

# -*- coding: utf-8 -*-

from __future__ import print_function

from twisted.internet import reactor
from twisted.internet.base import ThreadedResolver
from twisted.web.client import Agent
from twisted.web.http_headers import Headers


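# Install the threaded resolver explicitly, as Scrapy does with its customized one.
reactor.installResolver(ThreadedResolver(reactor))
agent = Agent(reactor)

# Method and URL are byte strings, matching how Scrapy calls Agent.
d = agent.request(
    b'GET',
    #'https://шанти-шанти.рф',
    b'https://xn----7sbb4ac0ad0be6cf.xn--p1ai/',
    Headers({'User-Agent': ['Twisted Web Client Example']}),
    None)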

def cbResponse(response):
    print('Response received: %r' % response)
d.addCallback(cbResponse)

def cbShutdown(ignored):
    print("cbShutdown: %r" % ignored)
    reactor.stop()
d.addBoth(cbShutdown)

reactor.run()

Result with Python 2.7.12+ (on Ubuntu 16.10) and Twisted 17.1:

$ python test.py 
cbShutdown: <twisted.python.failure.Failure twisted.internet.error.DNSLookupError: DNS lookup failed: no results for hostname lookup: xn----7sbb4ac0ad0be6cf.xn--p1ai.>

Result with Python 2.7.12+ (on Ubuntu 16.10) and Twisted 16.6:

$ python test.py 
Response received: <twisted.web._newclient.Response object at 0x7fe75c5c7ed0>
cbShutdown: None

Change History (8)

comment:1 Changed 3 years ago by Jean-Paul Calderone

Milestone: 17.1.1

comment:2 Changed 3 years ago by hawkowl

Keywords: review added

comment:4 Changed 3 years ago by hawkowl

Keywords: review added

@glyph might like this better now

comment:6 Changed 3 years ago by hawkowl

Keywords: review added

I IDNA encode now. A bit bigger scope than I wanted for this bug, but it's required.

comment:8 Changed 3 years ago by Amber Brown <hawkowl@…>

Owner: set to Amber Brown <hawkowl@…>
Resolution: fixed
Status: new → closed

In fedec1a:

Merge 9137-bytes-suck: Fix IResolverSimple not getting bytes on Py2/not doing IDNA

Author: hawkowl
Reviewers: markrwilliams, glyph
Fixes: ticket:9137
