[Twisted-web] Design for HTTP client form login and scraping

Jean-Paul Calderone exarkun at divmod.com
Mon Jul 27 10:22:45 EDT 2009


On Mon, 27 Jul 2009 09:07:49 -0500, David Bern <odie5533 at gmail.com> wrote:
>On Mon, Jul 27, 2009 at 8:01 AM, Jean-Paul Calderone <exarkun at divmod.com> wrote:
>> On Mon, 27 Jul 2009 01:05:21 -0500, David Bern <odie5533 at gmail.com> wrote:
>>>I want to make multiple HTTP requests using the same set of cookies.
>>>Should I call client.getPage a lot thus creating multiple factories to
>>>do this?
>>
>> This is probably the right approach for the near term.  If you're worried
>> about the overhead of creating a lot of factories, I don't think you should.
>> Creating these objects isn't very expensive (particularly compared to parsing
>> html).
>>
>> Jean-Paul
>>
>> _______________________________________________
>> Twisted-web mailing list
>> Twisted-web at twistedmatrix.com
>> http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-web
>>
>
>Thank you for the fast reply. I am more worried about the difficulty
>of programming it this way, and was wondering if there is a better
>method. I want a class with functions which would correspond to
>different forms and pages on the web site. For instance, login(), then
>post_message(), or some such set of commands based on a configuration
>file. Would creating a class which calls client.getPage without
>inheriting anything be the best method to accomplish this?

That's probably what I'd do.  There's not much to be gained by subclassing
anything from twisted.web.client in this case.  I would reserve that for
cases where I wanted to change the behavior of something at the HTTP level,
for example providing different behavior for handling redirects.
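As a rough illustration of "a class which calls client.getPage without inheriting anything", here is a minimal sketch. Everything in it is hypothetical: the class name ForumClient, the helper _post_form, the /login and /post paths, and the form field names are made up for the example. The page fetcher is passed in as get_page rather than imported. In real code you would pass twisted.web.client.getPage, which was later deprecated in favour of twisted.web.client.Agent. The bytes arguments assume a Python 3 release of Twisted that still ships getPage. The cookie handling relies on getPage forwarding cookies= to HTTPClientFactory, which, as I understand it, sends that dict as a Cookie header and writes Set-Cookie values back into the same dict object.

```python
from urllib.parse import urlencode


class ForumClient:
    """Hypothetical site client built by composition, not inheritance.

    It calls an injected page-fetching function instead of subclassing
    anything in twisted.web.client. In real use, get_page is assumed to be
    twisted.web.client.getPage, which returns a Deferred.
    """

    def __init__(self, base_url, get_page):
        self.base_url = base_url.rstrip("/")
        self._get_page = get_page
        # One dict shared by every request made through this object, so
        # cookies set by one response (e.g. a login session) are reused
        # by later requests.
        self.cookies = {}

    def _post_form(self, path, fields):
        # Form-encode the fields the way a browser submits an HTML form.
        body = urlencode(fields).encode("ascii")
        return self._get_page(
            (self.base_url + path).encode("ascii"),
            method=b"POST",
            postdata=body,
            headers={b"Content-Type": b"application/x-www-form-urlencoded"},
            cookies=self.cookies,
        )

    # One method per form on the (hypothetical) site.
    def login(self, username, password):
        return self._post_form("/login", {"user": username, "pass": password})

    def post_message(self, subject, text):
        return self._post_form("/post", {"subject": subject, "body": text})
```

With real Twisted each method returns a Deferred, so the steps are sequenced by chaining, e.g. `d = client.login("alice", "secret")` followed by `d.addCallback(lambda _: client.post_message("hi", "hello"))`. A side benefit of injecting get_page instead of subclassing is that the class can be exercised with a stub fetcher and no network.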

>When would
>inheritance be the right method, only when extending the functionality
>of HTTP client in Twisted and not to make use of it? My main question
>here is that of style and ease of programming. I want to get into a
>good programming habit with Twisted rather than have to redesign and
>rewrite huge portions later because I put the logic in the wrong
>place.
>

I'm coming to think that a good rule of thumb is not to subclass things
unless you really need to.  There are probably some exceptions - for
example, I'll probably keep subclassing twisted.internet.protocol.Protocol
for a while to come, but my list of such exceptions is pretty short right
now.

I find that subclassing mainly causes problems, mostly to do with backwards
compatibility, and doesn't offer enough benefit to outweigh them.

Jean-Paul


