Opened 5 years ago

Last modified 23 months ago

#4180 task new

Why is the check on self.dtpInstance in FTP.ftp_NLST different from ftp_RETR/ftp_STOR?

Reported by: jesstess Owned by: adiroiban
Priority: normal Milestone:
Component: ftp Keywords:
Cc: jesstess, adi@… Branch: branches/ftp-dtp-checks-4180
(diff, github, buildbot, log)
Author: exarkun Launchpad Bug:

Description

Is there a reason, or is one of the checks wrong?

Attachments (3)

4180-list-checks-1.diff (10.4 KB) - added by adiroiban 2 years ago.
4180-list-checks-2.diff (11.9 KB) - added by adiroiban 2 years ago.
4180-list-checks-3.diff (11.7 KB) - added by adiroiban 2 years ago.

Download all attachments as: .zip

Change History (18)

comment:1 Changed 5 years ago by glyph

  • Component changed from core to ftp
  • Owner changed from glyph to jesstess

I don't know anything about FTP, sorry.

You should find someone more appropriate to assign this to. Spiv, maybe?

comment:2 Changed 4 years ago by adiroiban

  • Cc adi@… added

ftp_LIST and ftp_NLST are now really well implemented.
They use a simple implementation and assume that once we have a connected DTP transport nothing can go wrong after that. They are not sending an error if the DTP tranport is closed while data are still sent.

The implementation for ftp_RETR and ftp_STOR is better since ftp_RETR and ftp_STOR are checking for DTP status while sending/receiving files and you can see that if there are problems with the DTP tranport they issue a CNX_CLOSED_TXFR_ABORTED code.

The comment „# Uh, for now, do this retarded thing.” could be improved to explain why the check is not that bright.

To have a consistent way of checking for valid DTP tranport, ftp_LIST and ftp_NLST should be enhanced so that they will also handle the case when the DTP transport is closed in mid-connection.

I have made some changes on my local branch and I can spend some more time to finalize the changes in case someone has time to review the branch....
This should also be a good exercise for discovering the contribution process for twisted.

comment:3 Changed 4 years ago by <automation>

  • Owner jesstess deleted

comment:4 Changed 2 years ago by adiroiban

  • Keywords review added

Hi,

I start working toward fixing this ticket.

An initial diff is here:
https://github.com/adiroiban/twisted/compare/4180-list-checks

Please let me know if you want me to attached a svn patch to this ticket.


The difference between ftp_RETR and ftp.NLST is due to the fact that ftp_NLST/ftp_LIST does not provides a mechanism for waiting for the data channel connection to be initialized.

I added a callback that waits for data channel initialization with a timeout for ftp_NLST and ftp_LIST.

I have also added tests for the changes.


Thanks!

comment:5 Changed 2 years ago by exarkun

  • Keywords review removed
  • Owner set to adiroiban

Please at least link to a plaintext version of the diff. An HTMLized version can't be applied, so can't be dealt with easily.

comment:6 Changed 2 years ago by adiroiban

  • Keywords review added

Hi,

Here is the diff URL:
https://github.com/adiroiban/twisted/compare/4180-list-checks.diff

Please let me know if this is ok, otherwise I will look at using SVN.

For me using Github is much easier since is complicated for me to work with SVN branches and commits on a cloned svn server.

PS: In Github you can generally get the diff by appending ".diff" to the url.

comment:7 Changed 2 years ago by thijs

  • Owner adiroiban deleted

comment:8 Changed 2 years ago by spiv

  • Owner set to spiv
  • Status changed from new to assigned

comment:9 follow-up: Changed 2 years ago by spiv

  • Keywords review removed
  • Owner changed from spiv to adiroiban
  • Status changed from assigned to new

(As you've found on other tickets, attaching a patch is easier for us, as it provides both the raw patch for us to apply, and a nicely colourized form we are used to reading on other tickets. As nice as Github may be, consistency for Twisted reviewers is even nicer.)

The patch adds a .gitignore file. We obviously don't want that :)

+from twisted.internet import reactor, interfaces, protocol, task, error, defer

I'm not sure why you added task to the middle of the imports. Either appending or using alphabetical order would be better. This is obviously just a minor nit :)

+    DATA_CONNECTION_COMMANDS = 'PORT or PASV'

This constant is used precisely once, interpolated into a string literal. I don't see the point, just write "PORT or PASV" directly in the other literal.

+        If data transport was not requeted using PORT, PASV etc it raises

Typo: "requeted" should be "requested".

-            return defer.fail(BadCmdSequenceError('must send PORT or PASV before RETR'))

FWIW, I mildly prefer this phrasing of the error message to the new one.

+            log.err(results)

Why add this?

+        timout_call = reactor.callLater(

This is both a typo and a style violation. It should be timeoutCall.

There many names_with_underscores in this patch. Please update them all to be namesWithCamelCase.

+            if timout_call.called:

This smells. You're *polling* (on a 100ms interval!) when you could just wait on the event using Deferreds. It would make _cbWaitDTPConnectionWithTimeout much simpler too. Get rid of the polling.

+    def checkDataTransportStarted(self, command):

This should be a private method, It think. (i.e. method name starts with an underscore.)

     def ftp_STOR(self, path):
-        if self.dtpInstance is None:
-            raise BadCmdSequenceError('PORT or PASV required before STOR')
+        """
+        This command causes the server-DTP to accept the data

Thank you very much for adding docstrings to existing methods!

+        '''Prepare data transport protocol to look like it was created by

Style: always use triple double-quotes for docstrings.

+        # Set timeout to a very small value to not slow down tests.

Don't do this. It's better to make the clock used by the object a parameter, so you don't need to use the system clock at all, and can precisely control how timeouts behave. Look at how other tests in Twisted do this.

comment:10 in reply to: ↑ 9 Changed 2 years ago by adiroiban

  • Keywords review added
  • Owner changed from adiroiban to spiv

Replying to spiv:

(As you've found on other tickets, attaching a patch is easier for us, as it provides both the raw patch for us to apply, and a nicely colourized form we are used to reading on other tickets. As nice as Github may be, consistency for Twisted reviewers is even nicer.)

The patch adds a .gitignore file. We obviously don't want that :)

Sorry for the patch.
I agree and I will attach patches in Trac.

+from twisted.internet import reactor, interfaces, protocol, task, error, defer

I'm not sure why you added task to the middle of the imports. Either appending or using alphabetical order would be better. This is obviously just a minor nit :)

This is not really a minor issues. Don't know what I was thinking.
I have updated the patch to import them in alphabetical order.

+    DATA_CONNECTION_COMMANDS = 'PORT or PASV'

This constant is used precisely once, interpolated into a string literal. I don't see the point, just write "PORT or PASV" directly in the other literal.

I have updated the patch.
This was imported as a constant since this was taken from a code where EPRT and EPSV are also implemented and the implementation just updates this string.

I will remove it for now.

+        If data transport was not requeted using PORT, PASV etc it raises

Typo: "requeted" should be "requested".

-            return defer.fail(BadCmdSequenceError('must send PORT or PASV before RETR'))

FWIW, I mildly prefer this phrasing of the error message to the new one.

I used this phrasing to be consistent with the other messages from FTP.processCommand:

  • USER required before PASS
  • PASS required after USER
  • RNTO required after RNFR
+            log.err(results)

Why add this?

I have removed it.
It was added since I saw that RETR and STOR were logging error messages.

+        timout_call = reactor.callLater(

This is both a typo and a style violation. It should be timeoutCall.

There many names_with_underscores in this patch. Please update them all to be namesWithCamelCase.

I have fixed the type and updated all names to camel case.

+            if timout_call.called:

This smells. You're *polling* (on a 100ms interval!) when you could just wait on the event using Deferreds. It would make _cbWaitDTPConnectionWithTimeout much simpler too. Get rid of the polling.

I removed the polling, and used the DTPFactory deferred to get an event when the dtpInstance is connected.

+    def checkDataTransportStarted(self, command):

This should be a private method, It think. (i.e. method name starts with an underscore.)

Done.

     def ftp_STOR(self, path):
-        if self.dtpInstance is None:
-            raise BadCmdSequenceError('PORT or PASV required before STOR')
+        """
+        This command causes the server-DTP to accept the data

Thank you very much for adding docstrings to existing methods!

+        '''Prepare data transport protocol to look like it was created by

Style: always use triple double-quotes for docstrings.

Done.
Hope I have replaced all single quotes.

+        # Set timeout to a very small value to not slow down tests.

Don't do this. It's better to make the clock used by the object a parameter, so you don't need to use the system clock at all, and can precisely control how timeouts behave. Look at how other tests in Twisted do this.

There is an initial change for not using polling, but I don't think is the right way do do it.

The timeout test are not working since I don't know how to use the task.Clock()

I will attach a diff with latest changes and will continue working at the tests.

Thanks again for the review!
Adi

Changed 2 years ago by adiroiban

comment:11 Changed 2 years ago by thijs

Thanks for your patch. Also make sure docstrings are

"""
On 3 lines.
"""

and epytext references use C{} or L{}, I saw a {} somewhere.

Changed 2 years ago by adiroiban

comment:12 Changed 2 years ago by adiroiban

I have attached an updated patch.

I have updated all docstrings to follow the 3 line convention.

The timeout tests are not using task.Clock since I still don't know how to fit it in the test.

The current docstring for NLST state that RFC 959 http://www.ietf.org/rfc/rfc959.txt demands that NLST will not return any errors, but at page 54, NLST has the same state diagram as LIST with error and failure messages.
Maybe this should be handled in another ticket, so I have updated to test according to the current docstring.

Any hints about how to fit task.Clock in the current tests are much appreciated.

Thanks!

Changed 2 years ago by adiroiban

comment:13 Changed 23 months ago by exarkun

  • Owner changed from spiv to exarkun

This is not really a minor issues

Actually, I'd say that the order of modules in an import statement is a 'very' minor issue. :) It's possible for order to make a difference to behavior, but no Twisted code should ever be written so that it actually does make a difference. Adding the module to the end of the module list makes the diff easier to read.

comment:14 Changed 23 months ago by exarkun

  • Author set to exarkun
  • Branch set to branches/ftp-dtp-checks-4180

(In [36375]) Branching to 'ftp-dtp-checks-4180'

comment:15 Changed 23 months ago by exarkun

  • Keywords review removed
  • Owner changed from exarkun to adiroiban

Thanks for working on this. Sorry about the review latency. It's nice to see some interest in the FTP implementation, I wish a few more people would get involved. :)

  1. twisted/test/test_ftp.py
    1. _startDataConnection doesn't set up dtpInstance properly:
      1. None is a better dummy value than 'ignore_address', to pass to buildProtocol.
      2. The dtpTransport local doesn't -- or shouldn't -- need to have a protocol attribute set on it. A protocol attribute is not part of the transport interface. If any code in ftp.py relies on this, it's a bug that should be fixed (perhaps separately, before proceeding on this ticket).
      3. buildProtocol is called and then its result is ignored. Then DTP() is instantiated directly to create the protocol. One of these is redundant (and using buildProtocol is probably better, since it exercises real code instead of made up test-only code).
      4. The way to hook a protocol up to its transport is proto.makeConnection(transport), not to directly set the transport attribute on the protocol.
    2. The need to explicitly set a tiny timeout in test_LISTTimeout gives a good example of why the prevailing testing style in this module, to set up real FTP servers and clients that interact over TCP using a real reactor, is not ideal. Something to keep in mind for future maintenance of this module is to fix the tests not to use real reactors or real TCP connections.
    3. Since ftp_STOR and ftp_RETR are changing in this refactoring as well, it would be good to also improve their test coverage. Can you add similar timeout tests for them?
    4. Also, it wouldn't hurt to refactor the existing tests for command sequence (testRETRBeforePORT, testSTORBeforePORT) and the new tests (test_NLSTWithoutDataChannel, test_LISTWithoutDataChannel) to share code.
  2. twisted/protocols/ftp.py
    1. _checkDataTransportStarted is checking for fewer error cases than some of the existing code that was changed to start using this helper. The not self.dtpInstance.isConnected check is important, since the factory's buildProtocol is necessarily called 'before' the protocol instance is connected to a transport (since it must create the protocol instance before it can be connected to anything). dtpInstance is set in buildProtocol, but dtpInstance.isConnected is only set to true after DTP.connectionMade is called. These will likely happen at almost the same time, but not necessarily exactly the same time. The unit tests don't detect this problem because they always connect the protocol immediately, before letting the test continue. If they ever called buildProtocol and then issued a LIST/NLST/STOR/RECV without calling dtpInstance.makeConnection(transport) first, they'd get tripped up from trying to use an unconnected protocol instance.
    2. Related to the previous point, ftp_STOR and ftp_RETR confuse me a little bit. They each have code to try to send back either DATA_CNX_ALREADY_OPEN_START_XFR or FILE_STATUS_OK_OPEN_DATA_CNX. Does this mean they really try to support PORT after STOR/RETR? If so, how does that relate to what the RFC says to allow? And how does that relate to the new _checkDataTransportStarted call they each now begin with? It seems like it will prevent them from ever getting to the FILE_STATUS_OK_OPEN_DATA_CNX case.
    3. in _cbWaitDTPConnectionWithTimeout, in the nested function cbContinueCommand, I don't see how timeoutCall could ever be None
    4. According to python-coverage, one branch through cbContinueCommand is not exercised by the test suite. The code only ever goes into the if timeoutCall is not None and timeoutCall.active(): branch, it never jumps over it.
    5. The change in how listErr is added as an errback in ftp_NLST looks like a good one, but it seems to be untested. It also seems largely unrelated to the rest of the work for this ticket, perhaps it merits a ticket of its own?

Thanks again for working on this. I'll make a personal effort to re-review the next version of the patch quickly (so feel free to re-assign the ticket to me when the time comes).

Also note I've checked the patch into a subversion branch and made just a few minor changes to it. If it's convenient for you, please generate the next patch against the branch. If something else is more convenient, please just describe what that is when you attach the new patch.

Note: See TracTickets for help on using tickets.