Opened 6 months ago

Last modified 6 months ago

#7243 defect new

getPage still isn't going to verify HTTPS URLs

Reported by: glyph Owned by:
Priority: normal Milestone:
Component: web Keywords:
Cc: jknight Branch:
Author: Launchpad Bug:

Description

#4888 doesn't fix older APIs. They should probably also verify stuff, because there's no high-level API on top of the new thing yet, so they're not deprecated, and lots of examples use them.

Change History (9)

comment:1 Changed 6 months ago by DefaultCC Plugin

  • Cc jknight added

comment:2 Changed 6 months ago by glyph

  • Keywords review added

Replying to rwall:

Glyph, here's a quick and dirty review while I wait for a flight. I'll try and make some more thorough notes later (unless someone gets there first)

Points:

  1. source:branches/redux-4888/twisted/web/client.py
    1. Missing epydocs: https://buildbot.twistedmatrix.com/builders/twistedchecker/builds/2047/steps/run-twistedchecker/logs/new%20twistedchecker%20errors

Aside from these, there are spurious twistedchecker errors yet again.

I filed https://github.com/twisted/twistedchecker/issues/76 for set_connect_state (which is the only unfixable one) and https://github.com/twisted/twistedchecker/issues/75 for the inner-class stuff which I've worked around.

  1. The CreatorCreator suffix seems very confusing (as noted earlier) your suggested name seems better.

Right-o. Name changed.

  1. source:branches/redux-4888/twisted/web/test/test_agent.py
    1. Build Failures due to OpenSSL import errors - please fix
      1. https://buildbot.twistedmatrix.com/builders/debian-easy-no-zope-py2.6-epoll/builds/1862/steps/epoll/logs/problems
      2. https://buildbot.twistedmatrix.com/builders/debian-easy-old-zope-py2.6-epoll/builds/1857/steps/epoll/logs/problems
      3. https://buildbot.twistedmatrix.com/builders/py-without-modules/builds/2456/steps/trial/logs/problems
      4. https://buildbot.twistedmatrix.com/builders/windows7-64-py2.7/builds/3872/steps/iocp/logs/problems

These all seem to be the same failure. Perhaps more of our build farm should have pyOpenSSL installed? There's really just supposed to be the one builder without optional dependencies, I thought.

  1. Missing epydocs - please fix the important ones: https://buildbot.twistedmatrix.com/builders/twistedchecker/builds/2047/steps/run-twistedchecker/logs/new%20twistedchecker%20errors

I think these should all be fixed now, important or not :).

  1. Add a test for the deprecation of t.w.c.WebClientContextFactory

Hmm. While I'm at it, I think this should probably be using deprecatedModuleAttribute, actually; at least until the tickets referenced by ticket:4804#comment:15 are addressed.

And then when I tried to do the right thing here I discovered #7242 as well, and filed it; you can see the comment in the test code.

  1. Missing news file

Added.

I also added a formal interface because all these silly half-finished
looks-kinda-like-this-one-weird-class types are a bad idea and I should stop
putting them into public interfaces.

Last time when I put it up for review I knew it was in kinda bad shape and I
wanted feedback before putting more effort in; this time I think it's probably
ready to go.

I also filed #7243 for other APIs. Hopefully that's a duplicate of something?

comment:3 Changed 6 months ago by glyph

  • Keywords review removed

Blah, wrong ticket :(

comment:4 follow-up: Changed 6 months ago by exarkun

I wonder what remaining issues prevent getPage from being reimplemented on top of Agent.

comment:5 in reply to: ↑ 4 Changed 6 months ago by glyph

Replying to exarkun:

I wonder what remaining issues prevent getPage from being reimplemented on top of Agent.

That would certainly be an awesome way to fix this.

I thought there were parts of the getPage/downloadPage interface - particularly the who-knows-what "**kwargs" piece - that made it impossible to replicate completely?

We could just deprecate the use of that, if it's not important...

comment:6 follow-up: Changed 6 months ago by exarkun

I think we were just waiting for the IAgent implementations to reach feature parity with HTTPClientFactory and HTTPDownloader. That might have happened when we weren't looking.

Let's see. The features of getPage are:

  1. Specify the URL (Agent.request, check)
  2. Specify the method (Agent.request, check)
  3. Specify some data to upload (Agent.request, FileBodyProducer, check)
  4. Specify the header (Agent.request, check)
  5. Specify the User-Agent (Agent.request, check)
  6. Specify a request timeout (distinct from the connect timeout) (reactor.callLater, Deferred.cancel, check)
  7. Specify some cookies (CookieAgent, check)
  8. Control redirect following behavior (RedirectAgent or maybe BrowserLikeRedirectAgent, check)
  9. Impose a limit on how many redirects to follow (RedirectAgent, etc, check)
  10. afterFoundGet - this is more redirect control behavior, I think the choice of which redirect IAgent implementation you use lets you implement the feature this flag represents (check)
  11. Additionally, downloadPage has features for writing the result to the filesystem and resuming interrupted downloads. These can be implemented in terms of existing IAgent APIs but aren't directly supported by them. Still, there should be a net reduction in code if we reimplemented downloadPage on top of IAgent.

So... yea, actually, it looks like the IAgent universe has reached parity.

We *could* simply deprecate getPage and downloadPage but it might be gentler on users to leave the old interface. I don't feel strongly either way but it does seem feasible to just swap out the implementation. It might also be a nice way to get everyone onto IAgent sooner rather than later (less work for application developers is kind of yay).

comment:7 Changed 6 months ago by glyph

Thanks a ton for enumerating this. I'm honestly not familiar with all the ins and outs of either the old or the new HTTP client APIs so this list was tremendously informative to me.

comment:8 in reply to: ↑ 6 Changed 6 months ago by glyph

Replying to exarkun:

We *could* simply deprecate getPage and downloadPage but it might be gentler on users to leave the old interface. I don't feel strongly either way but it does seem feasible to just swap out the implementation. It might also be a nice way to get everyone onto IAgent sooner rather than later (less work for application developers is kind of yay).

So, I still frequently use getPage in examples, because using Agent is considerably wordier. I think we could probably have a version that took a reactor without making it too tedious, but more than that would start to be burdensome. (Particularly I want to have a tool for examples that doesn't involve typing a string like "GET" which you might get wrong. something.get, treq-style, might be an OK way to work around that.)

I'm not sure if you were referencing what I was proposing or not, but what I was saying was that we could deprecate use of **kwargs, not getPage itself.

comment:9 Changed 6 months ago by glyph

Based on your list though, it sounds like we don't need to deprecate kwargs, we could just literally implement all of the features exposed thereby.

Note: See TracTickets for help on using tickets.