Opened 4 years ago

Closed 8 months ago

#6197 enhancement closed fixed (fixed)

Port twisted.web.client.downloadPage to Python 3

Reported by: itamar Owned by: hawkowl
Priority: lowest Milestone: Python-3.x
Component: web Keywords:
Cc: jknight Branch: branches/downloadpage-py3-6197-2
branch-diff, diff-cov, branch-cov, buildbot
Author: hawkowl

Description

Port twisted.web.client.downloadPage and related code should be ported to Python 3 (unless we decide to deprecate it before the port happens).

Attachments (2)

downloadPage.patch (1.2 KB) - added by vmarkovtsev 2 years ago.
downloadPage_getPage.patch (1.6 KB) - added by vmarkovtsev 2 years ago.

Download all attachments as: .zip

Change History (19)

comment:1 Changed 4 years ago by DefaultCC Plugin

  • Cc jknight added

Changed 2 years ago by vmarkovtsev

comment:2 Changed 2 years ago by vmarkovtsev

My patch does it. It depends on porting web.client.Agent (ticket #7407).

comment:3 Changed 2 years ago by vmarkovtsev

web/test/test_webclient.py is passed 100% without any modifications

Changed 2 years ago by vmarkovtsev

comment:4 Changed 2 years ago by vmarkovtsev

The second patch ports getPage() together with downloadPage().

comment:5 Changed 11 months ago by hawkowl

  • Author set to hawkowl
  • Branch set to branches/downloadpage-py3-6197

(In [45480]) Branching to downloadpage-py3-6197.

comment:6 Changed 11 months ago by hawkowl

  • Keywords review added

Tests are green. Please review.

comment:7 Changed 10 months ago by adiroiban

  • Keywords review removed
  • Owner set to hawkowl

Sorry for the delay.

I got some feedback from the ML and since Glyph is also in favor of bytes we no longer have blocking tie :)

Many thanks for the patience and for working on this.


In HTTPDownloader is it ok to change the default value for arguments? I know that 0/1 were supposed to be used as bool, but maybe some people were already using them as int and then the code will break.

For isinstance(fileOrName, (str, unicode)) maybe we can implement in t.python.compat something like isText(fileOrName) so that it make more sense and hint about the reason why we check the type.

In twisted/web/error.py why do we need to convert the code to bytes when it is documented as requiring bytes?

In twisted/web/test/test_webclient.py do we still need to encode('utf-8') the test.file ?


Please check my comments and resubmit.

Thanks!

comment:8 Changed 10 months ago by hawkowl

  • Keywords review added
  • Owner hawkowl deleted

Hi Adi, thanks for the review.

  1. I think it's fine. It was quite clearly True/False before, so it's better to have it explicitly so now.
  2. I think that such a general check isn't that good -- this is just replacing what types.StringTypes was, and it shouldn't be replicated in other errors.
  3. This is because some things gave it as an int, but that was only tests. Fixed.
  4. No, we don't. Fixed.

Builders spun. Please review.

comment:9 Changed 10 months ago by adiroiban

  • Keywords review removed
  • Owner set to hawkowl

Hi,

Thanks for the update!

There is no mention in this ticket or in the code/news fragment why int is now an accepted type for the argument.

In a previous comments you mentioned that it should only accept bytes. Why does it still need to accept int?

Please check my comments and resubmit.

Thanks!

comment:10 Changed 9 months ago by posita

Two notes:

  1. This might be the same question as @adiroiban's most recent, but if I'm reading your prior response correctly, namely ...

This is because some things gave it as an int, but that was only tests. Fixed.

... then the if isinstance(code, int) block should no longer be necessary in t.w.e.Error.__init__. Does it need to remain for backward compatibility with third party libraries?

  1. This is a super minor nitpick (feel free to ignore).

In the docstrings for some places (see t.w.e.Error as an example), C{str} is changed to C{bytes}, but in others, C{str} (or C{bytes}, etc.) is changed to L{bytes}. Does L{...} make sense with builtin types?

Some of these look like they're adjusted to C{bytes} in [45485] (see #7404), but not all.

This is probably inconsequential. It appears L{<builtin-type>} renders the same as C{<builtin-type}, at least in all the places I can see.

comment:11 Changed 9 months ago by hawkowl

  • Branch changed from branches/downloadpage-py3-6197 to branches/downloadpage-py3-6197-2

(In [45786]) Branching to downloadpage-py3-6197-2.

comment:12 Changed 8 months ago by hawkowl

  • Owner hawkowl deleted

Okay, I've added a note about backwards compatibility.

RE: posita -- L{} does make sense for built in types -- L{} should link to the Python API docs, now or in the future. If it can't resolve it, it works the same as C{}.

Please review.

comment:13 Changed 8 months ago by hawkowl

  • Keywords review added

comment:14 follow-up: Changed 8 months ago by adiroiban

  • Keywords review removed
  • Owner set to hawkowl

Since the code is now documented as both bytes and int I think that the conversion code is not there for backward compatibility but to comply with the docstring.

If we want to keep backward compatiblity, maybe we can silently do the conversion, without also advertising the support for int in the docstring.

Also, from I understand, the problem is with the Agent code which passes 'int' instead of 'str' (as it was documented). Since downloadPage passed str and not bytes, it should be fine and it does not require any special backward compatibility.

Please update the docstring or the comment... I am still confused :(

Thanks for the changes!

comment:15 in reply to: ↑ 14 Changed 8 months ago by glyph

Replying to adiroiban:

Since the code is now documented as both bytes and int I think that the conversion code is not there for backward compatibility but to comply with the docstring.

The comment doesn't actually say "for backwards compatibility", it just gives some historical context.

If we want to keep backward compatiblity, maybe we can silently do the conversion, without also advertising the support for int in the docstring.

The problem with not advertising int in the docstring is that all of our public constants in twisted.web.http are exposed as int. So that would be a glaring incompatibility in the API. But, HTTP responses may received malformed codes, so sometimes it makes sense to get the bytes instead, since it may not fit into an int. Although IANA has never assigned something which cannot be represented as an int - http://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml - microsoft has a couple of weird extensions - (see 401.1, etc, here: https://support.microsoft.com/en-us/kb/943891 ) I do not think it is very clear-cut what to do in this case.

Also, from I understand, the problem is with the Agent code which passes 'int' instead of 'str' (as it was documented). Since downloadPage passed str and not bytes, it should be fine and it does not require any special backward compatibility.

str and int are still two quite different types; I'm not sure what you're getting at here.

Please update the docstring or the comment... I am still confused :(

I would replace the / with the word or, but other than that, I actually think both docstring and comment are fine. If we want a holistic redesign of twisted.web.error let's do that but I think combining that with this ticket would be doing too much at once. (And as the comment says - this is already the way things were, mostly; just making it correct and consistent in the face of variable input, which we already had to deal with.)

comment:16 Changed 8 months ago by glyph

I think this is basically ready to land; I can appreciate Adi's concerns about comprehensibility, but this is a very small corner of the codebase, and mostly this is just describing the situation a bit better than it is now. We can follow up with other fixes; nothing about those comments suggests to me that this change needs to be held up.

comment:17 Changed 8 months ago by hawkowl

  • Resolution set to fixed
  • Status changed from new to closed

(In [45857]) Merge downloadpage-py3-6197-2: Port twisted.web.client.downloadPage to Python 3

Author: hawkowl Reviewers: adiroiban, glyph Fixes: #6197

Note: See TracTickets for help on using tickets.