diff mbox series

[2/2] http, imap-send: stop using CURLOPT_VERBOSE

Message ID 1df9e9deb7831b32694ea453759bf5d21952e165.1589218693.git.jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series Safer GIT_CURL_VERBOSE | expand

Commit Message

Jonathan Tan May 11, 2020, 5:43 p.m. UTC
Whenever GIT_CURL_VERBOSE is set, teach Git to behave as if
GIT_TRACE_CURL=1 and GIT_TRACE_CURL_NO_DATA=1 is set, instead of setting
CURLOPT_VERBOSE.

This is to prevent inadvertent revelation of sensitive data. In
particular, GIT_CURL_VERBOSE redacts neither the "Authorization" header
nor any cookies specified by GIT_REDACT_COOKIES.

Unifying the tracing mechanism also has the future benefit that any
improvements to the tracing mechanism will benefit both users of
GIT_CURL_VERBOSE and GIT_TRACE_CURL, and we do not need to remember to
implement any improvement twice.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/git.txt        |  2 --
 http.c                       |  8 +++++++-
 http.h                       |  7 +++++++
 imap-send.c                  |  2 +-
 t/t5551-http-fetch-smart.sh  | 24 ++++++++++++++++++++++++
 t/t5581-http-curl-verbose.sh |  2 +-
 trace.c                      | 20 ++++++++++++++++----
 trace.h                      |  6 ++++++
 8 files changed, 62 insertions(+), 9 deletions(-)

Comments

Jeff King May 12, 2020, 7:16 p.m. UTC | #1
On Mon, May 11, 2020 at 10:43:10AM -0700, Jonathan Tan wrote:

> Whenever GIT_CURL_VERBOSE is set, teach Git to behave as if
> GIT_TRACE_CURL=1 and GIT_TRACE_CURL_NO_DATA=1 is set, instead of setting
> CURLOPT_VERBOSE.
> 
> This is to prevent inadvertent revelation of sensitive data. In
> particular, GIT_CURL_VERBOSE redacts neither the "Authorization" header
> nor any cookies specified by GIT_REDACT_COOKIES.
> 
> Unifying the tracing mechanism also has the future benefit that any
> improvements to the tracing mechanism will benefit both users of
> GIT_CURL_VERBOSE and GIT_TRACE_CURL, and we do not need to remember to
> implement any improvement twice.

Yeah, I think this is worth doing. The patch looks OK to me, though:

> +void http_trace_curl_no_data(void)
> +{
> +	trace_override_envvar(&trace_curl, "1");
> +	trace_curl_data = 0;
> +}

Would:

  setenv("GIT_TRACE_CURL", "1", 0);
  setenv("GIT_TRACE_CURL_NO_DATA", "0", 0);

be simpler? Perhaps it makes the flow more convoluted as we'd go on to
parse those variables, but it puts us on the same paths that we'd use if
the user specified those things (and avoids the need for the special
"override" function entirely).

Other than that nit, it seems very cleanly done.

-Peff

PS I sometimes find the normal trace a bit verbose, but I do still want
   to see data. Do others feel the same? Particularly I find the "SSL"
   lines totally worthless (I guess maybe you could be debugging ssl
   stuff, but that would be the exception, I'd think). Ditto the split
   of data into two lines: one with the size and one with the actual
   data.

   I dunno. I haven't been debugging any git-over-http stuff lately, so
   it hasn't been bothering me. But I definitely have written perl
   scripts to extract the data to a more readable format. Maybe it would
   be easier if it had a few more knobs.
Jonathan Tan May 12, 2020, 7:23 p.m. UTC | #2
> > +void http_trace_curl_no_data(void)
> > +{
> > +	trace_override_envvar(&trace_curl, "1");
> > +	trace_curl_data = 0;
> > +}
> 
> Would:
> 
>   setenv("GIT_TRACE_CURL", "1", 0);
>   setenv("GIT_TRACE_CURL_NO_DATA", "0", 0);
> 
> be simpler? Perhaps it makes the flow more convoluted as we'd go on to
> parse those variables, but it puts us on the same paths that we'd use if
> the user specified those things (and avoids the need for the special
> "override" function entirely).
> 
> Other than that nit, it seems very cleanly done.

Thanks for the review. I thought of doing that, but thought that it
might add some latent complications - in particular, someone inspecting
the environment variables of this running process might see some
environment variables that they didn't set. But I'm OK either way.

> PS I sometimes find the normal trace a bit verbose, but I do still want
>    to see data. Do others feel the same? Particularly I find the "SSL"
>    lines totally worthless (I guess maybe you could be debugging ssl
>    stuff, but that would be the exception, I'd think). Ditto the split
>    of data into two lines: one with the size and one with the actual
>    data.
> 
>    I dunno. I haven't been debugging any git-over-http stuff lately, so
>    it hasn't been bothering me. But I definitely have written perl
>    scripts to extract the data to a more readable format. Maybe it would
>    be easier if it had a few more knobs.

Data can be turned on using GIT_TRACE_CURL=1 and refraining from setting
GIT_TRACE_CURL_NO_DATA. What knobs were you thinking of?
Jeff King May 12, 2020, 7:27 p.m. UTC | #3
On Tue, May 12, 2020 at 12:23:00PM -0700, Jonathan Tan wrote:

> > PS I sometimes find the normal trace a bit verbose, but I do still want
> >    to see data. Do others feel the same? Particularly I find the "SSL"
> >    lines totally worthless (I guess maybe you could be debugging ssl
> >    stuff, but that would be the exception, I'd think). Ditto the split
> >    of data into two lines: one with the size and one with the actual
> >    data.
> > 
> >    I dunno. I haven't been debugging any git-over-http stuff lately, so
> >    it hasn't been bothering me. But I definitely have written perl
> >    scripts to extract the data to a more readable format. Maybe it would
> >    be easier if it had a few more knobs.
> 
> Data can be turned on using GIT_TRACE_CURL=1 and refraining from setting
> GIT_TRACE_CURL_NO_DATA. What knobs were you thinking of?

I still want to see data, but less cruft. I.e., something like
"GIT_TRACE_CURL_SSL" (which I'd default to "off"), and probably just
reducing:

  15:24:01.169101 [pid=55191] http.c:702            <= Recv data, 0000000004 bytes (0x00000004)
  15:24:01.169104 [pid=55191] http.c:717            <= Recv data: 3e..

to just the second line. Actually, we might not need a knob at all for
SSL data. I was thinking that people might actually be debugging SSL
problems with it, but since all of the non-printable characters are
munged to "." anyway, it's basically useless (you can often pick out a
few strings from the cert during handshake, but you'd be much better off
to just connect with "openssl s_client" and ask it to dump the cert).

-Peff
brian m. carlson May 12, 2020, 11:13 p.m. UTC | #4
On 2020-05-11 at 17:43:10, Jonathan Tan wrote:
> Whenever GIT_CURL_VERBOSE is set, teach Git to behave as if
> GIT_TRACE_CURL=1 and GIT_TRACE_CURL_NO_DATA=1 is set, instead of setting
> CURLOPT_VERBOSE.
> 
> This is to prevent inadvertent revelation of sensitive data. In
> particular, GIT_CURL_VERBOSE redacts neither the "Authorization" header
> nor any cookies specified by GIT_REDACT_COOKIES.

I actually use GIT_CURL_VERBOSE to debug authentication problems from
time to time, so I'd like to keep an option to produce full, unredacted
output.  Since everyone uses HTTPS, it's not possible to perform this
debugging using a tool like Wireshark unless you use a MITM CA cert,
which seems excessive.
Junio C Hamano May 13, 2020, 12:10 a.m. UTC | #5
"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> On 2020-05-11 at 17:43:10, Jonathan Tan wrote:
>> Whenever GIT_CURL_VERBOSE is set, teach Git to behave as if
>> GIT_TRACE_CURL=1 and GIT_TRACE_CURL_NO_DATA=1 is set, instead of setting
>> CURLOPT_VERBOSE.
>> 
>> This is to prevent inadvertent revelation of sensitive data. In
>> particular, GIT_CURL_VERBOSE redacts neither the "Authorization" header
>> nor any cookies specified by GIT_REDACT_COOKIES.
>
> I actually use GIT_CURL_VERBOSE to debug authentication problems from
> time to time, so I'd like to keep an option to produce full, unredacted
> output.  Since everyone uses HTTPS, it's not possible to perform this
> debugging using a tool like Wireshark unless you use a MITM CA cert,
> which seems excessive.

Hmm, that is a valid concern.  Introducing yet another environment
feels a bit yucky, but something like GIT_NO_REDACT that disables
any redacting, not limited to curl but in all codepaths, might turn
out to be a useful escape hatch.

Opinions?
Jeff King May 13, 2020, 4:50 a.m. UTC | #6
On Tue, May 12, 2020 at 05:10:24PM -0700, Junio C Hamano wrote:

> > On 2020-05-11 at 17:43:10, Jonathan Tan wrote:
> >> Whenever GIT_CURL_VERBOSE is set, teach Git to behave as if
> >> GIT_TRACE_CURL=1 and GIT_TRACE_CURL_NO_DATA=1 is set, instead of setting
> >> CURLOPT_VERBOSE.
> >> 
> >> This is to prevent inadvertent revelation of sensitive data. In
> >> particular, GIT_CURL_VERBOSE redacts neither the "Authorization" header
> >> nor any cookies specified by GIT_REDACT_COOKIES.
> >
> > I actually use GIT_CURL_VERBOSE to debug authentication problems from
> > time to time, so I'd like to keep an option to produce full, unredacted
> > output.  Since everyone uses HTTPS, it's not possible to perform this
> > debugging using a tool like Wireshark unless you use a MITM CA cert,
> > which seems excessive.
> 
> Hmm, that is a valid concern.  Introducing yet another environment
> feels a bit yucky, but something like GIT_NO_REDACT that disables
> any redacting, not limited to curl but in all codepaths, might turn
> out to be a useful escape hatch.
> 
> Opinions?

Having an environment variable was my first thought, as well. I do
think it's key that the default be to redact. That makes life slightly
harder for people debugging auth problems, but prevents people from
accidentally leaking private info.

Regarding the name:

  - should it be under GIT_TRACE_CURL_* to make its impact clear? Or do
    we imagine it might eventually be applied elsewhere?

  - doing GIT_TRACE_REDACT would get rid of the negative (and it could
    just default to "true")

-Peff
Junio C Hamano May 13, 2020, 5:05 a.m. UTC | #7
Jeff King <peff@peff.net> writes:

>> Hmm, that is a valid concern.  Introducing yet another environment
>> feels a bit yucky, but something like GIT_NO_REDACT that disables
>> any redacting, not limited to curl but in all codepaths, might turn
>> out to be a useful escape hatch.
>> 
>> Opinions?
>
> Having an environment variable was my first thought, as well. I do
> think it's key that the default be to redact. That makes life slightly
> harder for people debugging auth problems, but prevents people from
> accidentally leaking private info.
>
> Regarding the name:
>
>   - should it be under GIT_TRACE_CURL_* to make its impact clear? Or do
>     we imagine it might eventually be applied elsewhere?
>
>   - doing GIT_TRACE_REDACT would get rid of the negative (and it could
>     just default to "true")

I like the latter, and I do expect it to allow the person who debugs
to temporarily disable *any* codepath that redacts its output.

Thanks.
Daniel Stenberg May 13, 2020, 6:16 a.m. UTC | #8
On Tue, 12 May 2020, brian m. carlson wrote:

Sorry for going slightly off-topic, but I figure this could help as a sort of 
PSA.

> Since everyone uses HTTPS, it's not possible to perform this debugging using 
> a tool like Wireshark unless you use a MITM CA cert, which seems excessive.

When you want to Wireshark the connection with libcurl, your friend is 
SSLKEYLOGFILE. If you set that environment variable (assuming the libcurl TLS 
backend supports it - several do), libcurl will save the TLS secrets in the 
file that environment variable mentions - at run-time in a format that 
Wireshark understands.

Then you can analyze the traffic, in real time, with Wirehark without fiddling 
with a MITM etc.
Jeff King May 13, 2020, 2:45 p.m. UTC | #9
On Wed, May 13, 2020 at 08:16:59AM +0200, Daniel Stenberg wrote:

> On Tue, 12 May 2020, brian m. carlson wrote:
> 
> Sorry for going slightly off-topic, but I figure this could help as a sort
> of PSA.
> 
> > Since everyone uses HTTPS, it's not possible to perform this debugging
> > using a tool like Wireshark unless you use a MITM CA cert, which seems
> > excessive.
> 
> When you want to Wireshark the connection with libcurl, your friend is
> SSLKEYLOGFILE. If you set that environment variable (assuming the libcurl
> TLS backend supports it - several do), libcurl will save the TLS secrets in
> the file that environment variable mentions - at run-time in a format that
> Wireshark understands.
> 
> Then you can analyze the traffic, in real time, with Wirehark without
> fiddling with a MITM etc.

Very cool. I didn't know about that at all. Now I want to debug some
https sessions... ;)

Thank you!

-Peff
diff mbox series

Patch

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 9d6769e95a..427ea70701 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -721,8 +721,6 @@  of clones and fetches.
 	Enables a curl full trace dump of all incoming and outgoing data,
 	including descriptive information, of the git transport protocol.
 	This is similar to doing curl `--trace-ascii` on the command line.
-	This option overrides setting the `GIT_CURL_VERBOSE` environment
-	variable.
 	See `GIT_TRACE` for available trace output options.
 
 `GIT_TRACE_CURL_NO_DATA`::
diff --git a/http.c b/http.c
index 62aa995245..4882c9f5b2 100644
--- a/http.c
+++ b/http.c
@@ -804,6 +804,12 @@  static int curl_trace(CURL *handle, curl_infotype type, char *data, size_t size,
 	return 0;
 }
 
+void http_trace_curl_no_data(void)
+{
+	trace_override_envvar(&trace_curl, "1");
+	trace_curl_data = 0;
+}
+
 void setup_curl_trace(CURL *handle)
 {
 	if (!trace_want(&trace_curl))
@@ -993,7 +999,7 @@  static CURL *get_curl_handle(void)
 	warning(_("Protocol restrictions not supported with cURL < 7.19.4"));
 #endif
 	if (getenv("GIT_CURL_VERBOSE"))
-		curl_easy_setopt(result, CURLOPT_VERBOSE, 1L);
+		http_trace_curl_no_data();
 	setup_curl_trace(result);
 	if (getenv("GIT_TRACE_CURL_NO_DATA"))
 		trace_curl_data = 0;
diff --git a/http.h b/http.h
index 5e0ad724f9..faf8cbb0d1 100644
--- a/http.h
+++ b/http.h
@@ -252,6 +252,13 @@  int finish_http_object_request(struct http_object_request *freq);
 void abort_http_object_request(struct http_object_request *freq);
 void release_http_object_request(struct http_object_request *freq);
 
+/*
+ * Instead of using environment variables to determine if curl tracing happens,
+ * behave as if GIT_TRACE_CURL=1 and GIT_TRACE_CURL_NO_DATA=1 is set. Call this
+ * before calling setup_curl_trace().
+ */
+void http_trace_curl_no_data(void);
+
 /* setup routine for curl_easy_setopt CURLOPT_DEBUGFUNCTION */
 void setup_curl_trace(CURL *handle);
 #endif /* HTTP_H */
diff --git a/imap-send.c b/imap-send.c
index 6c54d8c29d..52737546f3 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -1464,7 +1464,7 @@  static CURL *setup_curl(struct imap_server_conf *srvc, struct credential *cred)
 	curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
 
 	if (0 < verbosity || getenv("GIT_CURL_VERBOSE"))
-		curl_easy_setopt(curl, CURLOPT_VERBOSE, 1L);
+		http_trace_curl_no_data();
 	setup_curl_trace(curl);
 
 	return curl;
diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index acc8473a72..be01cf7bb2 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -197,6 +197,18 @@  test_expect_success 'GIT_TRACE_CURL redacts auth details' '
 	grep "Authorization: Basic <redacted>" trace
 '
 
+test_expect_success 'GIT_CURL_VERBOSE redacts auth details' '
+	rm -rf redact-auth trace &&
+	set_askpass user@host pass@host &&
+	GIT_CURL_VERBOSE=1 git clone --bare "$HTTPD_URL/auth/smart/repo.git" redact-auth 2>trace &&
+	expect_askpass both user@host &&
+
+	# Ensure that there is no "Basic" followed by a base64 string, but that
+	# the auth details are redacted
+	! grep "Authorization: Basic [0-9a-zA-Z+/]" trace &&
+	grep "Authorization: Basic <redacted>" trace
+'
+
 test_expect_success 'disable dumb http on server' '
 	git --git-dir="$HTTPD_DOCUMENT_ROOT_PATH/repo.git" \
 		config http.getanyfile false
@@ -454,6 +466,18 @@  test_expect_success 'GIT_REDACT_COOKIES redacts cookies' '
 	! grep "Cookie:.*Bar=2" err
 '
 
+test_expect_success 'GIT_REDACT_COOKIES redacts cookies when GIT_CURL_VERBOSE=1' '
+	rm -rf clone &&
+	echo "Set-Cookie: Foo=1" >cookies &&
+	echo "Set-Cookie: Bar=2" >>cookies &&
+	GIT_CURL_VERBOSE=1 GIT_REDACT_COOKIES=Bar,Baz \
+		git -c "http.cookieFile=$(pwd)/cookies" clone \
+		$HTTPD_URL/smart/repo.git clone 2>err &&
+	grep "Cookie:.*Foo=1" err &&
+	grep "Cookie:.*Bar=<redacted>" err &&
+	! grep "Cookie:.*Bar=2" err
+'
+
 test_expect_success 'GIT_REDACT_COOKIES handles empty values' '
 	rm -rf clone &&
 	echo "Set-Cookie: Foo=" >cookies &&
diff --git a/t/t5581-http-curl-verbose.sh b/t/t5581-http-curl-verbose.sh
index 5129b0724f..927aad0820 100755
--- a/t/t5581-http-curl-verbose.sh
+++ b/t/t5581-http-curl-verbose.sh
@@ -20,7 +20,7 @@  test_expect_success 'failure in git-upload-pack is shown' '
 	test_might_fail env GIT_CURL_VERBOSE=1 \
 		git clone "$HTTPD_URL/error_git_upload_pack/smart/repo.git" \
 		2>curl_log &&
-	grep "< HTTP/1.1 500 Intentional Breakage" curl_log
+	grep "<= Recv header: HTTP/1.1 500 Intentional Breakage" curl_log
 '
 
 test_done
diff --git a/trace.c b/trace.c
index b3ef0e627f..f726686fd9 100644
--- a/trace.c
+++ b/trace.c
@@ -29,7 +29,7 @@  struct trace_key trace_perf_key = TRACE_KEY_INIT(PERFORMANCE);
 struct trace_key trace_setup_key = TRACE_KEY_INIT(SETUP);
 
 /* Get a trace file descriptor from "key" env variable. */
-static int get_trace_fd(struct trace_key *key)
+static int get_trace_fd(struct trace_key *key, const char *override_envvar)
 {
 	const char *trace;
 
@@ -37,7 +37,7 @@  static int get_trace_fd(struct trace_key *key)
 	if (key->initialized)
 		return key->fd;
 
-	trace = getenv(key->key);
+	trace = override_envvar ? override_envvar : getenv(key->key);
 
 	if (!trace || !strcmp(trace, "") ||
 	    !strcmp(trace, "0") || !strcasecmp(trace, "false"))
@@ -68,6 +68,18 @@  static int get_trace_fd(struct trace_key *key)
 	return key->fd;
 }
 
+void trace_override_envvar(struct trace_key *key, const char *value)
+{
+	trace_disable(key);
+	key->initialized = 0;
+
+	/*
+	 * Invoke get_trace_fd() to initialize key using the given value
+	 * instead of the value of the environment variable.
+	 */
+	get_trace_fd(key, value);
+}
+
 void trace_disable(struct trace_key *key)
 {
 	if (key->need_close)
@@ -112,7 +124,7 @@  static int prepare_trace_line(const char *file, int line,
 
 static void trace_write(struct trace_key *key, const void *buf, unsigned len)
 {
-	if (write_in_full(get_trace_fd(key), buf, len) < 0) {
+	if (write_in_full(get_trace_fd(key, NULL), buf, len) < 0) {
 		warning("unable to write trace for %s: %s",
 			key->key, strerror(errno));
 		trace_disable(key);
@@ -383,7 +395,7 @@  void trace_repo_setup(const char *prefix)
 
 int trace_want(struct trace_key *key)
 {
-	return !!get_trace_fd(key);
+	return !!get_trace_fd(key, NULL);
 }
 
 #if defined(HAVE_CLOCK_GETTIME) && defined(HAVE_CLOCK_MONOTONIC)
diff --git a/trace.h b/trace.h
index 9826618b33..0dbbad0e41 100644
--- a/trace.h
+++ b/trace.h
@@ -101,6 +101,12 @@  void trace_repo_setup(const char *prefix);
  */
 int trace_want(struct trace_key *key);
 
+/**
+ * Enables or disables tracing for the specified key, as if the environment
+ * variable was set to the given value.
+ */
+void trace_override_envvar(struct trace_key *key, const char *value);
+
 /**
  * Disables tracing for the specified key, even if the environment variable
  * was set.