Message ID | pull.1767.v2.git.1722062682858.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] http: do not ignore proxy path | expand |
On Sat, Jul 27, 2024 at 06:44:42AM +0000, Ryan Hendrickson via GitGitGadget wrote: > Re earlier discussion about regressions, it turns out that curl has only > supported this syntax since 2022 [https://curl.se/ch/7.84.0.html]. > Earlier versions of curl, if provided an http_proxy that contained a > path, would also ignore it. curl didn't seem to consider this a breaking > change when the feature was introduced > [https://github.com/curl/curl/pull/8668], though; is that a valid > argument for Git not to lose sleep over it either? IMHO it is probably OK. The path did not do anything before then, so why would anybody pass it? Looking into the curl support, I did notice two things: - unix sockets are only supported for socks proxies. Is it meaningful to have a path on an http proxy? I don't think so (there is no place to send it in the "GET" line). But perhaps we should limit the new code only to socks. - curl says you should put "localhost" in the host field for this, though obviously it is not otherwise used. Should we likewise require that to kick in the code? > @@ -1265,7 +1266,24 @@ static CURL *get_curl_handle(void) > if (!proxy_auth.host) > die("Invalid proxy URL '%s'", curl_http_proxy); > > - curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); > + strbuf_addstr(&proxy, proxy_auth.host); > + if (proxy_auth.path) { There's one gotcha here I wondered about: we usually throw away the path element, unless credential.useHTTPPath is set. That only happens after we load the per-credential config, though, via credential_apply_config(), which in turn is triggered by a call to credential_fill(), etc. I think this is OK here, since we have just called credential_from_url() ourselves, and the earliest credential_fill() we'd hit is from init_curl_proxy_auth(), which is called right after the code you're touching. > + int path_is_supported = 0; > + /* curl_version_info was added in curl 7.10 */ > +#if LIBCURL_VERSION_NUM >= 0x070a00 > + curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW); > + path_is_supported = ver->version_num >= 0x075400; > +#endif We've been trying to keep our curl version #ifdefs in git-curl-compat.h, with human-readable names. But in this case, I think we can ditch it entirely, as we require curl 7.21.3 or later already. This will be the first time we check curl_version_info() instead of a build-time option. I think that makes sense here. Most features require extra information at build-time (e.g., CURLOPT_* constants), but in this case it is purely a check on the runtime behavior. We perhaps could get away with just letting an older curl quietly ignore the path field, but what you have here is probably a bit friendlier for users. > diff --git a/t/socks5-proxy-server/src/socks5.pl b/t/socks5-proxy-server/src/socks5.pl > new file mode 100755 > index 00000000000..3ef66a1a287 > --- /dev/null > +++ b/t/socks5-proxy-server/src/socks5.pl This is a lot more code than I was hoping for. There are definitely parts of it we don't care about (e.g, threading, handling multiple connections at once) that could be pared down a bit. I don't think we particularly care about auth either (we just want to make sure we talk to the unix-socket proxy at all). What if we switched to socks4, which drops all of the auth bits and supports only IPs, not hostnames (that came in socks4a). This is the shortest I could come up with (only lightly tested): -- >8 -- use strict; use IO::Select; use IO::Socket::UNIX; use IO::Socket::INET; my $path = shift; unlink($path); my $server = IO::Socket::UNIX->new(Listen => 1, Local => $path) or die "unable to listen on $path: $!"; $| = 1; print "ready to rumble\n"; while (my $client = $server->accept()) { sysread $client, my $buf, 8; my ($version, $cmd, $port, $ip) = unpack 'CCnN', $buf; next unless $version == 4; # socks4 next unless $cmd == 1; # TCP stream connection # skip NUL-terminated id while (sysread $client, my $char, 1) { last unless ord($char); } # version(0), reply(5a == granted), port (ignored), ip (ignored) syswrite $client, "\x00\x5a\x00\x00\x00\x00\x00\x00"; my $remote = IO::Socket::INET->new(PeerHost => $ip, PeerPort => $port) or die "unable to connect to $ip/$port: $!"; my $io = IO::Select->new($client, $remote); while ($io->count) { for my $fh ($io->can_read(0)) { for my $pair ([$client, $remote], [$remote, $client]) { my ($from, $to) = @$pair; next unless $fh == $from; my $r = sysread $from, my $buf, 1024; if (!defined $r || $r <= 0) { $io->remove($from); next; } syswrite $to, $buf; } } } } -- >8 -- That's still more than I'd like, but quite a bit smaller. I dunno. Probably the one you found is more battle-tested, but I really think we have pretty simple requirements. > diff --git a/t/t5564-http-proxy.sh b/t/t5564-http-proxy.sh > index bb35b87071d..bafaa31adf8 100755 > --- a/t/t5564-http-proxy.sh > +++ b/t/t5564-http-proxy.sh > @@ -39,4 +39,22 @@ test_expect_success 'clone can prompt for proxy password' ' > expect_askpass pass proxuser > ' > > +start_socks() { > + # The %30 tests that the correct amount of percent-encoding is applied > + # to the proxy string passed to curl. > + "$PERL_PATH" "$TEST_DIRECTORY/socks5-proxy-server/src/socks5.pl" \ > + "$TRASH_DIRECTORY/%30.sock" proxuser proxpass & > + socks_pid=$! > +} > + > +test_lazy_prereq LIBCURL_7_84 'expr x$(curl-config --vernum) \>= x075400' This is the first time we'd be relying on curl-config in the test suite. I suspect it would _usually_ work, but I'd worry about a few things: 1. The Makefile allows for a different path for $(CURL_CONFIG), or even skipping it entirely by setting $(CURLDIR). 2. We'd usually build and test in the same environment, but not necessarily. In particular, I know Windows uses separate CI jobs for this, and I'm not sure if curl-config will be around in the test jobs. I can see two paths forward: a. There's been recent discussion about adding an option for Git to report the run-time curl version. That doesn't exist yet, though "git version --build-options" will report the build-time version. That might be good enough for the purposes of testing a build. b. Write the test such that it expects the request to work _or_ we see the "version too old" failure. > +test_expect_success PERL,LIBCURL_7_84 'clone via Unix socket' ' I'm not sure if this PERL prereq is sufficient. We also need to know that we can make unix sockets. Probably we need to actually run the socks proxy and make sure it gets to the "starting..." line before accepting that it works. Which also gets us to... > + start_socks && > + test_when_finished "rm -rf clone && kill $socks_pid" && > + test_config_global http.proxy "socks5://proxuser:proxpass@localhost$PWD/%2530.sock" && > + GIT_TRACE_CURL=$PWD/trace git clone "$HTTPD_URL/smart/repo.git" clone && This is racy. We started the proxy in the background, but we don't know that it's ready to accept connections by the time we run "git clone". My first few runs all failed, but putting a "sleep 1" in between fixed it (which obviously is not a real fix, but just a debugging aid). If you usually see success, try running the test script with "--stress" to create extra load, and you should see failures. The usual trick here is to start the process in the background, and then in the foreground wait for some "I'm ready" output, which involves ugly tricks with fifos. There's prior art in lib-git-daemon.sh (though it's a little more complicated there, because we want to relay its stderr, too, but with a script under our control we can just put the ready message on stdout). So coupled with the prereq fix that I mentioned above, you might be able to do something like (totally untested): start_socks_proxy () { mkfifo socks_output && { perl proxy.pl ... >socks_output & socks_pid=$! } && read line <socks_output && test "$line" = "Starting..." } test_expect_success 'try to start socks proxy' ' if start_socks_proxy then test_seq_prereq SOCKS_PROXY fi ' test_expect_success SOCKS_PROXY 'clone via unix socket' ' ... ' -Peff
At 2024-07-29 16:09-0400, Jeff King <peff@peff.net> sent: > Looking into the curl support, I did notice two things: > > - unix sockets are only supported for socks proxies. Is it meaningful > to have a path on an http proxy? I don't think so (there is no place > to send it in the "GET" line). But perhaps we should limit the new > code only to socks. > > - curl says you should put "localhost" in the host field for this, > though obviously it is not otherwise used. Should we likewise > require that to kick in the code? Sounds good, I've added checks and tests for both of these cases. >> @@ -1265,7 +1266,24 @@ static CURL *get_curl_handle(void) >> if (!proxy_auth.host) >> die("Invalid proxy URL '%s'", curl_http_proxy); >> >> - curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); >> + strbuf_addstr(&proxy, proxy_auth.host); >> + if (proxy_auth.path) { > > There's one gotcha here I wondered about: we usually throw away the path > element, unless credential.useHTTPPath is set. That only happens after > we load the per-credential config, though, via credential_apply_config(), > which in turn is triggered by a call to credential_fill(), etc. > > I think this is OK here, since we have just called credential_from_url() > ourselves, and the earliest credential_fill() we'd hit is from > init_curl_proxy_auth(), which is called right after the code you're > touching. Yes, good lookout but I don't think this is a problem either. >> + int path_is_supported = 0; >> + /* curl_version_info was added in curl 7.10 */ >> +#if LIBCURL_VERSION_NUM >= 0x070a00 >> + curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW); >> + path_is_supported = ver->version_num >= 0x075400; >> +#endif > > We've been trying to keep our curl version #ifdefs in git-curl-compat.h, > with human-readable names. But in this case, I think we can ditch it > entirely, as we require curl 7.21.3 or later already. Ah, okay, that is good to know! I'll remove the #if. > This will be the first time we check curl_version_info() instead of a > build-time option. I think that makes sense here. Most features require > extra information at build-time (e.g., CURLOPT_* constants), but in this > case it is purely a check on the runtime behavior. > > We perhaps could get away with just letting an older curl quietly ignore > the path field, but what you have here is probably a bit friendlier for > users. Yes, curl has a nasty tendency to quietly ignore a lot of things. I didn't want users to be uncertain about whether they were using the feature incorrectly if an older curl was the actual issue. > What if we switched to socks4, which drops all of the auth bits and > supports only IPs, not hostnames (that came in socks4a). This is the > shortest I could come up with (only lightly tested): Ah, many thanks! I like this much better, and I'm not proficient enough with Perl to have written it myself. > I can see two paths forward: > > a. There's been recent discussion about adding an option for Git to > report the run-time curl version. That doesn't exist yet, though > "git version --build-options" will report the build-time version. > That might be good enough for the purposes of testing a build. > > b. Write the test such that it expects the request to work _or_ we see > the "version too old" failure. Got it, I'll go with option b. > If you usually see success, try running the test script with "--stress" > to create extra load, and you should see failures. Huh. I never saw any race issues on my machine even with --stress, but your approach is safer so I'm running with it. Thank you! R
diff --git a/Documentation/config/http.txt b/Documentation/config/http.txt index 162b33fc52f..a14371b5c96 100644 --- a/Documentation/config/http.txt +++ b/Documentation/config/http.txt @@ -5,8 +5,8 @@ http.proxy:: proxy string with a user name but no password, in which case git will attempt to acquire one in the same way it does for other credentials. See linkgit:gitcredentials[7] for more information. The syntax thus is - '[protocol://][user[:password]@]proxyhost[:port]'. This can be overridden - on a per-remote basis; see remote.<name>.proxy + '[protocol://][user[:password]@]proxyhost[:port][/path]'. This can be + overridden on a per-remote basis; see remote.<name>.proxy + Any proxy, however configured, must be completely transparent and must not modify, transform, or buffer the request or response in any way. Proxies which diff --git a/http.c b/http.c index 623ed234891..a6001296dd0 100644 --- a/http.c +++ b/http.c @@ -1227,6 +1227,7 @@ static CURL *get_curl_handle(void) */ curl_easy_setopt(result, CURLOPT_PROXY, ""); } else if (curl_http_proxy) { + struct strbuf proxy = STRBUF_INIT; if (starts_with(curl_http_proxy, "socks5h")) curl_easy_setopt(result, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME); @@ -1265,7 +1266,24 @@ static CURL *get_curl_handle(void) if (!proxy_auth.host) die("Invalid proxy URL '%s'", curl_http_proxy); - curl_easy_setopt(result, CURLOPT_PROXY, proxy_auth.host); + strbuf_addstr(&proxy, proxy_auth.host); + if (proxy_auth.path) { + int path_is_supported = 0; + /* curl_version_info was added in curl 7.10 */ +#if LIBCURL_VERSION_NUM >= 0x070a00 + curl_version_info_data *ver = curl_version_info(CURLVERSION_NOW); + path_is_supported = ver->version_num >= 0x075400; +#endif + if (path_is_supported) { + strbuf_addch(&proxy, '/'); + strbuf_add_percentencode(&proxy, proxy_auth.path, 0); + } else { + die("libcurl 7.84 or later is required to support paths in proxy URLs"); + } + } + curl_easy_setopt(result, CURLOPT_PROXY, proxy.buf); + strbuf_release(&proxy); + var_override(&curl_no_proxy, getenv("NO_PROXY")); var_override(&curl_no_proxy, getenv("no_proxy")); curl_easy_setopt(result, CURLOPT_NOPROXY, curl_no_proxy); diff --git a/t/socks5-proxy-server/LICENSE b/t/socks5-proxy-server/LICENSE new file mode 100644 index 00000000000..94981550c94 --- /dev/null +++ b/t/socks5-proxy-server/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2014 kaimi.io + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/t/socks5-proxy-server/README.md b/t/socks5-proxy-server/README.md new file mode 100644 index 00000000000..6eed0e5ce7a --- /dev/null +++ b/t/socks5-proxy-server/README.md @@ -0,0 +1,27 @@ +socks5-proxy-server +=================== + +This Perl code implements a SOCKS5 proxy server that listens for incoming connections and processes them in separate threads. The server takes three input parameters: `path`, `login`, and `password`. + +When a client attempts to connect to the server, the server checks if the client supports any of the available authentication methods (no authentication or login/password authentication). If a suitable method is found, the server establishes a connection with the target server and begins forwarding data between the client and the target server. + +The code uses the `IO::Select` module for working with sockets and the `threads` module for creating and managing threads. It includes several functions, including: + +- `main`: the main function that creates threads for processing incoming connections. +- `replenish`: a function that creates additional threads if the number of free threads is less than the specified value. +- `new_client`: a function that handles incoming client connections and checks if the available authentication methods are supported by the client. +- `socks_auth`: a function that performs login and password authentication. +- `handle_client`: a function that establishes a connection with the target server and forwards data between the client and the target server. + +This code includes the use of the following Perl modules: `IO::Select`, `Socket`, `threads`, `threads::shared`. + +To run this code, enter the following command: +`perl socks5.pl path login password` + +Note that this code is designed for educational purposes and should not be used in production environments without proper modifications and security measures.