Opened 6 years ago

Closed 5 years ago

Last modified 5 years ago

#7407 enhancement closed fixed (fixed)

Port twisted.web.client.Agent to Python 3

Reported by: Vadim Markovtsev Owned by: hawkowl
Priority: low Milestone: Python-3.x
Component: web Keywords:
Cc: jknight, Maxim Branch: branches/agent-py3-7407-3
branch-diff, diff-cov, branch-cov, buildbot
Author: hawkowl

Description

For the patches, please refer to GitHub. They enable successful run of the examples from client docs.

I need help with testing this on Python 2 I am not sure if I do not break anything.

Attachments (2)

agent.patch (15.1 KB) - added by Vadim Markovtsev 6 years ago.
agent.2.patch (80.7 KB) - added by Vadim Markovtsev 6 years ago.

Download all attachments as: .zip

Change History (25)

comment:1 Changed 6 years ago by DefaultCC Plugin

Cc: jknight added

comment:2 Changed 6 years ago by Jean-Paul Calderone

Branch: trunk
Keywords: web removed
Summary: Port twisted.web.client.AgentPort twisted.web.client.Agent to Python 3

comment:3 Changed 6 years ago by Jean-Paul Calderone

Thanks. Please split this up into multiple tickets. The patch that finally ports Agent shouldn't involve changes to twisted/internet/ and twisted/python/. Please refer to https://twistedmatrix.com/trac/wiki/Plan/Python3 (in particular the Tips section) for a guideline on how to contribute porting effort. Thanks again.

Changed 6 years ago by Vadim Markovtsev

Attachment: agent.patch added

comment:4 Changed 6 years ago by Vadim Markovtsev

Added tickets #7414, #7415, #7416, #7417 for previous 4 commits and agent.patch for the final one.

Last edited 5 years ago by Glyph (previous) (diff)

comment:5 Changed 6 years ago by Vadim Markovtsev

Please note that I do not touch python/dist3.py (we need to add _newclient.py there). #6197 now depends on this ticket.

Changed 6 years ago by Vadim Markovtsev

Attachment: agent.2.patch added

comment:6 Changed 6 years ago by Vadim Markovtsev

The second patch changes the following: 1) Python 2.7 test_agent.py tests pass 100% 2) test_agent.py was ported to Python 3 3) test_agent.py has 36 errors and 0 failures. The errors are dedicated to cookies and redirection. 4) #6197 cleanly passes test_webagent.py 5) Changed the initial port nicely

comment:7 Changed 6 years ago by Maxim

Cc: Maxim added

I work on porting twisted.web.client to Python 3 too. And I have a question. There are stubs for "context factories for https" that are used in tests, but they are now deprecated in favor of IPolicyForHTTPS providers.

Should tests be adjusted to reflect this change?

comment:8 Changed 6 years ago by Jean-Paul Calderone

Should tests be adjusted to reflect this change?

Not if doing so would remove test coverage for supported functionality - even if that functionality is deprecated.

comment:9 Changed 5 years ago by hawkowl

Author: hawkowl
Branch: branches/agent-py3-7407

(In [45485]) Branching to agent-py3-7407.

comment:10 Changed 5 years ago by posita

UPDATE: Please disregard. I was confused. My original comment is preserved below. But see my comment:12.


My guess is that you also want the following in t.w.c.CookieAgent.request:

 --- twisted/web/client.py
 +++ twisted/web/client.py
 @@ -1768,7 +1768,7 @@ class CookieAgent(object):
          # cookies.
          if not headers.hasHeader(b'cookie'):
              self.cookieJar.add_cookie_header(lastRequest)
 -            cookieHeader = lastRequest.get_header('Cookie', None)
 +            cookieHeader = lastRequest.get_header(b'Cookie', None)
              if cookieHeader is not None:
                  headers = headers.copy()
                  headers.addRawHeader(b'cookie', networkString(cookieHeader))
Last edited 5 years ago by posita (previous) (diff)

comment:11 Changed 5 years ago by posita

Also, per @adiroiban's comment in #6197 regarding the use of encode('utf-8') in twisted/web/test/test_webclient.py, I noticed that the same encode call appears in the line modified in [45485]. You may want to make the same change here to avoid a potentially confusing merge conflict later, depending on which branch is merged first.

comment:12 Changed 5 years ago by posita

Regarding my (now retracted) question regarding t.w.c.CookieAgent.request, there's something a bit unsettling about having literals out there that are interpreted as different types depending on which interpreter is used. I realize that's what twisted.python.compat is for, but is there any reason not to use from __future__ import unicode_literals in as many modules as possible as part of the Python 3 general strategy?

This risks being off-topic, but I think the following is appropriate:

 --- twisted/python/compat.py
 +++ twisted/python/compat.py
 @@ -439,6 +439,8 @@ if _PY3:


      def networkString(s):
 +        if isinstance(s, bytes):
 +            return s
          if not isinstance(s, unicode):
              raise TypeError("Can only convert text to bytes on Python 3")
          return s.encode('ascii')
 @@ -454,11 +456,9 @@ else:
      lazyByteSlice = buffer

      def networkString(s):
 -        if not isinstance(s, str):
 +        if not isinstance(s, basestring): # falls through for both str and unicode
              raise TypeError("Can only pass-through bytes on Python 2")
 -        # Ensure we're limited to ASCII subset:
 -        s.decode('ascii')
 -        return s
 +        return s.encode('ascii') # always a str, if successful

  iterbytes.__doc__ = """
  Return an iterable wrapper for a C{bytes} object that provides the behavior of

A side effect of such a change is that one could submit bytes literals to t.w.c._FakeUrllib2Request.get_header (per my original comment:10), which (if I understand correctly), one probably should be able to do anyway?

Last edited 5 years ago by posita (previous) (diff)

comment:13 Changed 5 years ago by posita

What about updating the docs, or should that be handled under a separate ticket?

More specifically, the code samples in docs/web/howto/client.rst will only run on Python 2 (they don't call print as a function, they don't explicitly use bytes for URLs, etc.). To illustrate, here's an update of the first in-line code sample which should run in both 2 and 3:

 --- docs/web/howto/client.rst
 +++ docs/web/howto/client.rst
 @@ -371,6 +371,7 @@ an *HTTPS* URL with no certificate verification.
  .. code-block:: python


 +    from __future__ import print_function, unicode_literals
      from twisted.python.log import err
      from twisted.web.client import Agent
      from twisted.internet import reactor
 @@ -381,13 +382,13 @@ an *HTTPS* URL with no certificate verification.
              return ClientContextFactory.getContext(self)

      def display(response):
 -        print "Received response"
 -        print response
 +        print("Received response")
 +        print(response)

      def main():
          contextFactory = WebClientContextFactory()
          agent = Agent(reactor, contextFactory)
 -        d = agent.request("GET", "https://example.com/")
 +        d = agent.request(b"GET", b"https://example.com/")
          d.addCallbacks(display, err)
          d.addCallback(lambda ignored: reactor.stop())
          reactor.run()

The __future__ import is probably unnecessary for this particular example, but I find it useful for narrowing behavioral gaps between 2 and 3 (e.g., generating similar errors in both environments when one makes a mistake).

Either way, it looks like many of the sub-listings may be affected as well:

% grep -l Agent docs/web/howto/listings/client/*
docs/web/howto/listings/client/cookies.py
docs/web/howto/listings/client/endpointconstructor.py
docs/web/howto/listings/client/filesendbody.py
docs/web/howto/listings/client/gzipdecoder.py
docs/web/howto/listings/client/request.py
docs/web/howto/listings/client/response.py
docs/web/howto/listings/client/responseBody.py
docs/web/howto/listings/client/sendbody.py
Last edited 5 years ago by posita (previous) (diff)

comment:14 Changed 5 years ago by hawkowl

Branch: branches/agent-py3-7407branches/agent-py3-7407-2

(In [45843]) Branching to agent-py3-7407-2.

comment:15 Changed 5 years ago by hawkowl

Keywords: review added; client Agent removed
Owner: set to Glyph

This is merged forward.

The twistedchecker error is spurious, the static analysis can't pick up that it's set on __doc__ later.

Please review (or rather, Glyph, please review, rgd previous discussions).

comment:16 Changed 5 years ago by hawkowl

Branch: branches/agent-py3-7407-2branches/agent-py3-7407-3

(In [45858]) Branching to agent-py3-7407-3.

comment:17 Changed 5 years ago by hawkowl

Merged forward again, builders spun.

comment:18 Changed 5 years ago by Glyph

Keywords: review removed
Owner: changed from Glyph to hawkowl

Thanks for the constant slog towards 3, hawkowl. All this work is very helpful.

There are a couple of things that should really be addressed before merging.

  1. shouldRetry should get a @param on its method parameter now that it's being explicitly checked against bytes. (I mean, it always should have.)
  1. _requestWithEndpoint: nativeString is really intended to be called with an ASCII-only string literal, not arbitrary user bytes. In fairness its docstring doesn't say this, but it should; the idea is that sometimes you need a literal to use for native-str processing as with i.e. identifiers. By using it with uri.host in various places you're exposing it to input from users, which means that you're exposing it to potential unicode input, which means you are inviting inscrutable exceptions when non-ASCII is passed in. Of course, the right solution to this is (drum roll please...) #5388. In the meanwhile, I can see why nativeString appeals; nothing else converts to the requisite ASCII-only value for the hostname, but please deal with the returned errors and emit something meaningful, like ValueError("The URI b'http://foo\xe2\x80\x94bar/' contains non-ASCII octets in its hostname") or something.
  1. _FakeUrllib2Request should really update its documentation to say native L{str} on all its @ivars to be more explicit. (Given that _FakeUrllib2Request is effectively always constructed by a string literal in the tests, nativeString is appropriate here, in contrast to the above criticism of its use in _requestWithEndpoint; the exception raised will be directly where the problem string was passed.)

I think you can address these and land; although there's a lot to think about with the second point, there's no really a whole lot to do - just one test to check an error condition.

It looks like the rest of the branch is just sprinkling b prefixes around all the strings in the tests, so otherwise pretty good.

Some things I noticed that weren't really related to this branch but I should nevertheless raise, though:

  1. Why are we using TCP4ClientEndpoint and not HostnameEndpoint in the http case? I realize that `TLSEndpoint` isn't quite ready yet, so we still need SSL4ClientEndpoint, but there should probably be a ticket for fixing this.
  2. Looking at the code in this branch I am really starting to want the methods on Headers to be more permissive in terms of what types they take.
  3. I don't like the idiom used for redirections in compat. Conditional definitions like this should take into account the fact that twistedchecker is going to complain about them; there are ways to do this so it doesn't complain (which actually look nicer to read), and we should do it that way. However, I don't want to hold up this branch on its copying of a bad pattern; I'll do a separate branch that fixes the pattern overall within that module.
  4. thank you for getting rid of the usage of splithost and splittype. Pretty sure those were not supposed to be public APIs.
  5. It seems slightly inconsistent that urlquote, urlunquote, and cookielib are in compat, but urlunparse, urljoin, and urldefrag are in twisted.web.client directly. Not really worth addressing though; I'll probably get rid of these from twisted.web.client entirely and do the conditional imports in twisted.python.url (to ensure it's easier to vendor out).

Thanks again!

comment:19 Changed 5 years ago by hawkowl

Keywords: review added
Owner: changed from hawkowl to Glyph

T GLYPH THANX FOR THE REVIEW

I've fixed #1 and #3, but I don't understand #2. _requestWithEndpoint doesn't seem to have a call to nativeString. Could you give me some line numbers?

Otherwise, I'll put this back up for review in the meantime, as in fixing #3 I added more docstrings.

comment:20 in reply to:  19 Changed 5 years ago by Glyph

Keywords: review removed
Owner: changed from Glyph to hawkowl

Replying to hawkowl:

T GLYPH THANX FOR THE REVIEW

I've fixed #1 and #3,

Have you? I don't see fixes on the github link.

but I don't understand #2. _requestWithEndpoint doesn't seem to have a call to nativeString. Could you give me some line numbers?

Sorry. endpointForURI. 1441. What I mean is just that the encoding error from nativeString isn't sufficiently expressive, and you should catch it to have an error that is intentionally chosen and displays a comprehensible representation of the entire URL.

Otherwise, I'll put this back up for review in the meantime, as in fixing #3 I added more docstrings.

I don't see these docstrings either. Did the mirroring hook break or is there supposed to be a new branch?

comment:21 Changed 5 years ago by hawkowl

Keywords: review added
Owner: changed from hawkowl to Glyph

Hi Glyph, all those things are actually pushed up now.

Builders spun, please review.

comment:22 Changed 5 years ago by Glyph

Keywords: review removed
Owner: changed from Glyph to hawkowl

OK, looks pretty good. Apropos IRC discussion, I guess nativeString is OK for now.

Please land.

comment:23 Changed 5 years ago by hawkowl

Resolution: fixed
Status: newclosed

(In [45917]) Merge agent-py3-7407-3: Port twisted.web.client.Agent to Python 3

Author: hawkowl Reviewer: glyph Fixes: #7407

Note: See TracTickets for help on using tickets.