Ticket #3886 (closed enhancement: fixed)

Opened 15 months ago

Last modified 7 months ago

Fixing the Socks4 proxy to be Socksv4a compliant

Reported by: lorph Owned by: jesstess
Priority: normal Milestone:
Component: core Keywords: socks
Cc: exarkun, jesstess, lorph, thijs, khorn Branch: branches/socksv4a-3886
Author: lorph Launchpad Bug:

Description

Socks4a is a common extension to Socks4 where domain names are appended to the end so DNS queries are resolved at the host.

It looked like the previous author attempted to implement it, but did not complete it as evidenced by the commenting out of "server=gethostbyname(server)". I uncommented it and implemented several additional fixes. (Note that you will eventually like to switch this to an asynchronous DNS query, but that would take a large overhaul, and this fix seems to be ok for now.)

The main bug I found after enabling DNS resolution, is that Firefox doesn't always send the handshake in a single packet.

Good: '\x04\x01\x00P\x00\x00\x00\x01MOZ\x00en.wikipedia.org\x00' Bad: '\x04\x01\x00P\x00\x00\x00\x01MOZ\x00'

'en.wikipedia.org\x00'

The previous version of the socks proxy will choke on the "Bad" input. This is solved by returning if the IP address is invalid and there is no domain name at the end.

--- C:\temp\socks.py-revBASE.svn003.tmp.py	Fri Jun 19 01:48:00 2009
+++ C:\Twisted\twisted\protocols\socks.py	Fri Jun 19 01:47:56 2009
@@ -69,16 +69,22 @@
             self.otherConn.write(data)
             return
         self.buf=self.buf+data
+        complete_buffer = self.buf
         if '\000' in self.buf[8:]:
             head,self.buf=self.buf[:8],self.buf[8:]
             try:
                 version,code,port=struct.unpack("!BBH",head[:4])
             except struct.error:
                 raise RuntimeError, "struct error with head='%s' and buf='%s'"%(repr(head),repr(self.buf))
-            user,self.buf=string.split(self.buf,"\000",1)
+            user,self.buf=self.buf.split("\000",1)
             if head[4:7]=="\000\000\000": # domain is after
-                server,self.buf=string.split(self.buf,'\000',1)
-                #server=gethostbyname(server)
+                # if the IP is invalid, and the Domain name is not present, we restore the buffer and wait for it
+                if self.buf == "":
+                    self.buf = complete_buffer
+                    return
+                server,self.buf=self.buf.split("\000",1)
+                server = socket.gethostbyname(server) # we need to resolve the host name since the IP is invalid
+                                                      # you probably will want to use asynchronous DNS, but its a big change.
             else:
                 server=socket.inet_ntoa(head[4:8])
             assert version==4, "Bad version code: %s"%version

Attachments

diff.txt Download (1.5 KB) - added by lorph 15 months ago.
Diff file
test_socks.txt Download (3.1 KB) - added by lorph 15 months ago.
Diff for socks4a unit test
SOCKSv4a.patch Download (9.3 KB) - added by jesstess 9 months ago.

Change History

Changed 15 months ago by lorph

Diff file

Changed 15 months ago by exarkun

  • owner changed from glyph to lorph
  • milestone Twisted-8.2+1 deleted

Thank you very much for the patch! It's nice to see someone do some work on this code.

One thing that's very important to include here is unit test coverage. Can you expand your patch to include new unit tests for this functionality?

Changed 15 months ago by lorph

Diff for socks4a unit test

Changed 15 months ago by lorph

  • keywords easy added

exarkun: I attached a diff for the unit tests in the file test_socks.txt

Changed 15 months ago by exarkun

  • owner lorph deleted
  • keywords review added

Changed 14 months ago by exarkun

  • branch set to branches/socksv4a-3886
  • branch_author changed from lorph to exarkun, lorph

(In [27119]) Branching to 'socksv4a-3886'

Changed 14 months ago by exarkun

(In [27120]) Apply diff.txt and test_socks.txt from lorph

refs #3886

Changed 14 months ago by exarkun

  • branch_author changed from exarkun, lorph to lorph

I tossed the two patches into a branch for easier viewing.

Changed 14 months ago by exarkun

  • owner set to lorph
  • cc exarkun added
  • keywords review removed

Thanks for adding the test patch. Here's some feedback (mostly, but entirely, minor and aesthetic):

  1. test_socks.py
    1. All test methods need docstrings. The docstring's job is to explain the behavior which the test verifies. The comment in test_socks4a_firefox_split is close to what the docstring should be, but it should explain what the desired behavior in this case is (it only describes the case being tested now).
    2. test_socks4a is a good test name, but test_socks4a_firefox_split doesn't follow the naming convention. An underscore should be used to separate test from the rest of the method name, but elsewhere, camelCase should be used. The socks code is really old, so it predates most of our current quality control procedures. Since there are only a handful of other socks test methods, I would suggest renaming them all to conform to the naming convention as well.
    3. test_socks4a_firefox_split is testing important behavior that's not really Firefox specific. TCP works such that it is always possible for part of the handshake to be delivered separately from the rest of it. The bytes of the handshake might even be delivered one at a time, separately from each other. So the test shouldn't talk about firefox, and there should be at least one more test that breaks up the packet in an even more obscure way (say, halfway through the domain name, or even better - right before the terminating nul).
    4. assertTrue is preferred over assert_; assertFalse is preferred over assert_(not ...); assertEquals is preferred over assertEqual. There's also assertIdentical and assertNotIdentical.
  2. socks.py
    1. complete_buffer should be completeBuffer
    2. Using socket.gethostbyname here is problematic. You're right that it'll be a bit more of a change to do the lookup asynchronously, but I think it's necessary. Doing this name lookup in a blocking manner creates the possibility of a very simple DoS attack. If you'd like help with this transformation, I'm happy to provide it.

Changed 12 months ago by jesstess

  • cc jesstess, lorph added

lorph, did you still want to finish up this ticket and #3920?

Changed 10 months ago by khorn

  • cc khorn added

Changed 9 months ago by exarkun

  • status changed from new to assigned
  • owner changed from lorph to exarkun

Changed 9 months ago by jesstess

  • owner exarkun deleted
  • status changed from assigned to new
  • keywords socks, review added; socks easy removed

SOCKSv4a.patch (patch against the branch) addresses exarkun's points above and also:

  • linewraps to 80 in both files
  • adds spacing in dataReceived
  • the existing code special-cased a split between the user and the domain. The new patch allows arbitrary splitting. The existing code looked like it was already trying to test this in test_socks4a, so now that that test case passes, test_socks4aSplit may be superfluous.

If this gets closed I'm happy to work on #3920.

Changed 9 months ago by khorn

Awesome! Great to see someone working this ticket.

A quick glance shows some test methods still need docstrings, but otherwise looks much closer.

Changed 9 months ago by jesstess

Changed 9 months ago by jesstess

New patch has docstrings for the SOCKSv4 class and dataReceived in socks.py, and for the Connect class in test_socks.py.

Changed 9 months ago by exarkun

Feel free to commit to the branch to ease the reviewer's job.

Changed 9 months ago by jesstess

Bah, forgot the ref in the commit message, but the patch has been applied in the branch.

Changed 8 months ago by thijs

  • keywords socks added; socks, review removed
  • owner set to jesstess
  • cc thijs added

Can you get rid of those one-liners;

"""A utility class for building protocols for incoming connections."""

and put docstrings on three lines in twisted.protocols.socks?

Changed 8 months ago by jesstess

(In [27851]) Make docstrings conform to current standard.

refs #3886

Changed 8 months ago by jesstess

  • keywords socks, review added; socks removed
  • owner jesstess deleted

Changed 8 months ago by exarkun

  • owner set to jesstess
  • keywords socks added; socks, review removed

Thanks for picking this up jesstess. A couple things:

  1. I agree with your comment above about test_socks4aSplit being redundant. Since test_socks4a delivers data one byte at a time, it's going to verify that the necessary buffering is done.
  2. The implementation is still introducing a new call to socket.gethostbyname. In an earlier review I mentioned that this is problematic. The code really needs to be using reactor.resolve instead. Otherwise, this code blocks on every connection setup request.

Changed 8 months ago by khorn

@exarkun: FYI, there is another ticket to fix the hostname resolution problem (#3920). I expect jesstess was intending to make that change under the auspices of that ticket.

Changed 8 months ago by jesstess

(In [27862]) Remove test_socks4aSplit as testing split handshakes is now covered by test_socks4a.

refs #3886

Changed 8 months ago by jesstess

  • owner jesstess deleted
  • keywords socks, review added; socks removed

Thanks for all the reviews!

Point 1 addressed. I had planned on handling point 2 in #3920 as khorn suggested, but I can do it here if people would prefer that.

Changed 8 months ago by exarkun

  • keywords socks added; socks, review removed
  • owner set to jesstess

Hm, I'd like to have the async stuff done here, if that doesn't introduce significant difficulties. Introducing the misbehavior here and fixing it separately later leaves trunk in a non-ideal state for the interim. As much as possible, trunk should be in the best state we can make it.

Changed 8 months ago by thijs

So I left a comment on this ticket, and did get a Trac email but its not showing up here, cause I hit a database locked or timeout issue, so I'll repeat the missed comment:

Replying to khorn: @exarkun: FYI, there is another ticket to fix the hostname resolution problem (#3920). I expect jesstess was intending to make that change under the auspices of that ticket.

First of all, as a side-comment, it's great to work on all of this stuff with you all :) Second of all, this ticket has now landed in the review queue, cause there was a typo in the review keyword that is fixed now. Thanks for the heads up khorn.

Changed 8 months ago by exarkun

Thanks for keeping track of all the moving pieces here. Given my comment above, I've also go ahead and closed #3920 as a duplicate of this one. Hopefully that means we can soon close this ticket with a merge that adds v4a support with asynchronous name lookups. If some strange circumstance occurs in which that ends up not happening, we can always re-open #3920.

Changed 8 months ago by jesstess

  • keywords socks, review added; socks removed
  • owner jesstess deleted

Added the asynchronous hostname resolution from #3920.

Changed 8 months ago by TimAllen

  • owner set to TimAllen

Changed 8 months ago by TimAllen

  • keywords review removed
  • owner changed from TimAllen to jesstess

Man, this SOCKS stuff is old and crufty. There's a lot of annoying and awkward stuff in t.p.socks that is totally unrelated to this change, including but not limited to:

  • All the awkward pattern-matching used to parse SOCKSv4 headers.
  • We check the version field with assert, instead of logging the event, and dropping the connection.
  • SOCKS seems to have its own private logging system, completely unrelated to t.p.log.
  • No symbolic constants for important magic numbers like CONNECT and BIND and the various response codes.
  • After parsing the header, we assert that there's no remaining data, rather than buffering it and forwarding it once the connection is established.

Those things aside, here's some review comments:

  1. The spec says that v4a packets are distinguished by having the IP field of the header contain three NULL octets followed by a non-NULL octet. The code and the tests check for the three NULL octets, but we never check that the fourth octet is non-NULL. Although the spec doesn't use RFC2119 language, it's pretty clear that the fourth-octet needs to be non-NULL, although I don't understand why, myself (AFAIK, 0.0.0.0 is still a non-routable IP that no DNS record should ever point to). We should at least mention in a comment that we're taking a shortcut.
  2. The SOCKS code checks for the NUL terminating the username with "'\000' in self.buf[8:]" but it checks for the NUL terminating the unresolved domain name with "len(self.buf.split("\000")) < 2". We should do it the same way both times; I think the former is easier to read and more efficient.
  3. When we ask the reactor to resolve a hostname for us, I guess we'll need an errBack that sends a "91" response code to the client.
  4. In _dataReceived2, if we got a BIND request, we call gethostbyname() on the server parameter... but thanks to our reactor resolving, server should already be a dotted-quad IP address.

Thanks for working on this!

Changed 8 months ago by jesstess

(In [27921]) Add SOCKSv4a test for BIND case.

refs #3886

Changed 8 months ago by jesstess

  • owner jesstess deleted
  • keywords review added

Thanks for the review.

re 1: Given that in a SOCKSv4a packet the last byte of the IP cannot be null, now, if the IP is 0.0.0.0, the packet is treated as a regular SOCKS packet.

re 2, 3: Done.

re 4: Whoops, that's what I get for not having test coverage for the bind case. Unit test added.

Also updated the copyrights.

Changed 8 months ago by exarkun

  • keywords socks added; socks, review removed
  • owner set to jesstess

Thanks for continuing to push this forward. :)

1. Bind.test_socks4a and Connect.test_socks4a share a problem: they rely on a real name lookup. This will make them prone to intermittent failures and makes them require more time to complete than strictly necessary. They would be more robust if the reactor used by SOCKSv4 were parameterized and the tests supplied a fake implementation with deterministic behavior for the resolve method. There doesn't appear to be a suitable fake reactor implementation already, so you may need to write one. This should be pretty straightforward, though. I don't know if it will help at all, but there is a FakeResolver class in twisted/internet/test/test_tcp.py (this would be helpful at a slightly different level, but maybe isn't directly applicable to the socks case).

This will let you control exactly when the resolution succeeds which means you shouldn't need a reactor.callLater(0.1, ...) in between the two halves of the test. It will also let you test the error case which is not, I believe, currently tested.

There are a few approaches to parameterizing the reactor for a protocol class. You can find various examples in Twisted by grepping for "reactor = reactor". Doing this with protocols is a bit annoying, since you usually can't require that a reactor be passed to __init__ (which is probably what you'd want in the ideal case). SOCKSv4 is even weirder than usual - it already has an overridden __init__ with an optional parameter, and its canonical factory creates it without any reference to the factory. Given all that, I'd probably add an optional reactor parameter to __init__, have the factory pass in the global reactor, and pass in a fake reactor in the tests. (Just a thought, though; if something else seems to make more sense to you, go for it.)

2. I think the struct.error handling code in SOCKSv4.dataReceived is dead. head is always at least 4 bytes long, and that excludes the only case I know of where struct.unpack might raise an exception. The code is untested, at least. If you can't think of a way it might be triggered in a test, feel free to delete it.

Changed 7 months ago by jesstess

(In [28107]) Parameterize the reactor to SOCKSv4 and use a dummy reactor with a deterministic resolve method for the socksv4a test.

refs #3886

Changed 7 months ago by jesstess

  • keywords socks, review added; socks removed
  • owner jesstess deleted

Thanks for the review, exarkun.

In the latest commit:

  • Parameterize the reactor used by SOCKSv4 and use a dummy reactor that has deterministic behavior for its resolve method in the SOCKSv4a tests.
  • Add a test for failed hostname lookups on a SOCKSv4a packet
  • remove the dead struct.error handling code

Changed 7 months ago by exarkun

Anyone know of any existing (non-windows) socks4a client software that might be used to test interoperability with this?

Changed 7 months ago by khorn

Changed 7 months ago by khorn

Also it appears cURL and OpenSSH both have SOCKS4a support, though it took some digging to figure that out. Looks like 4a support isn't something that people advertise much.

Changed 7 months ago by exarkun

  • owner set to jesstess
  • keywords socks added; socks, review removed
  1. The "my" prefix is a C++/Javaism. You can just name the parameter to SOCKSv4.__init__ "reactor". :)
  2. Can you add documentation for reactor to the SOCKSv4's docstring? (Couldn't hurt to do the same for buf, logging, and otherConn if you feel like it.)
  3. dataReceived - whew, what a mess. However, it looks right to me, the tests look good and pass, and I successfully used curl's socks4a feature against it. So, yay.

The rest looks good. Please merge after the above points are addressed.

Changed 7 months ago by jesstess

(In [28117]) Document the SOCKSv4 instance variables.

refs #3886

Changed 7 months ago by jesstess

  • status changed from new to closed
  • resolution set to fixed

(In [28118]) Merge socksv4a-3886

Authors: lorph, jesstess Reviewers: exarkun, thijs, TimAllen, khorn Fixes: #3886

Add SOCKSv4a support to the SOCKSv4 protocol implementation.

Changed 7 months ago by exarkun

Oops. Review fail. :/ twisted.test.test_socks.Bind.test_socks4a still uses a real DNS resolver and callLater(0.1. This is the source of some intermittent failures in this test.

Changed 7 months ago by jesstess

The SOCKSv4a BIND tests use a real DNS resolver, causing intermittent test failures. See #4275 for the regression ticket.

Note: See TracTickets for help on using tickets.