[2/2] http: update curl http/2 info matching for curl 8.3.0

Message ID	20230915113443.GB3531587@coredump.intra.peff.net (mailing list archive)
State	Accepted
Commit	0763c3a2c4f21a9e81990cc5cbee4a66d4efefcb
Headers	Return-Path: <git-owner@vger.kernel.org> Date: Fri, 15 Sep 2023 07:34:43 -0400 From: Jeff King <peff@peff.net> To: git@vger.kernel.org Subject: [PATCH 2/2] http: update curl http/2 info matching for curl 8.3.0 Message-ID: <20230915113443.GB3531587@coredump.intra.peff.net> References: <20230915113237.GA3531328@coredump.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20230915113237.GA3531328@coredump.intra.peff.net> Precedence: bulk
Series	updating curl http/2 header matching (again): [0/2] updating curl http/2 header matching (again); [1/2] http: factor out matching of curl http/2 trace lines; [2/2] http: update curl http/2 info matching for curl 8.3.0

Jeff King Sept. 15, 2023, 11:34 a.m. UTC

To redact header lines in http/2 curl traces, we have to parse past some
prefix bytes that curl sticks in the info lines it passes to us. That
changed once already, and we adapted in db30130165 (http: handle both
"h2" and "h2h3" in curl info lines, 2023-06-17).

Now it has changed again, in curl's fbacb14c4 (http2: cleanup trace
messages, 2023-08-04), which was released in curl 8.3.0. Running a build
of git linked against that version will fail to redact the trace (and as
before, t5559 notices and complains).

The format here is a little more complicated than the other ones, as it
now includes a "stream id". This is not constant but is always numeric,
so we can easily parse past it.

We'll continue to match the old versions, of course, since we want to
work with many different versions of curl. We can't even select one
format at compile time, because the behavior depends on the runtime
version of curl we use, not the version we build against.

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
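For reference, here is a self-contained sketch of how the three trace-line formats discussed in this thread could be matched. It follows the hunk quoted in the replies, but `skip_prefix()` and `skip_iprefix()` are local stand-ins for git's helpers of the same names (written out so the example compiles on its own), and the `h2h3 [` / `h2 [` branch is reconstructed from the hunk's context lines and the title of db30130165. Treat it as an illustration, not as the literal contents of http.c.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for git's skip_prefix(): on a match, point *out past the prefix. */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/* Stand-in for git's skip_iprefix(): same, but ASCII case-insensitive. */
static int skip_iprefix(const char *str, const char *prefix, const char **out)
{
	/* stops at the NUL of str, since tolower(0) never equals a prefix byte */
	for (; *prefix; str++, prefix++)
		if (tolower((unsigned char)*str) != tolower((unsigned char)*prefix))
			return 0;
	*out = str;
	return 1;
}

/*
 * Return 1 and point *out at "<header-name>: <header-val>]" if line is an
 * http/2 header trace line in one of the known curl formats.
 */
static int match_curl_h2_trace(const char *line, const char **out)
{
	const char *p;

	/* the two earlier formats: "h2h3 [<header>]" and "h2 [<header>]" */
	if (skip_iprefix(line, "h2h3 [", out) ||
	    skip_iprefix(line, "h2 [", out))
		return 1;

	/* curl 8.3.0: "[HTTP/2] [<stream-id>] [<header>]", numeric stream id */
	if (skip_iprefix(line, "[HTTP/2] [", &p)) {
		/* standard isdigit() wants an unsigned char value */
		while (isdigit((unsigned char)*p))
			p++;
		if (skip_prefix(p, "] [", out))
			return 1;
	}
	return 0;
}
```

With this shape, a line like `[HTTP/2] [PATCH] [foo]` is rejected because `PATCH` is not a run of digits, while `[HTTP/2] [] [Secret: xyz]` is still accepted, a point that comes up later in the thread.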

Junio C Hamano Sept. 15, 2023, 6:21 p.m. UTC | #1

Jeff King <peff@peff.net> writes:

> @@ -751,6 +753,18 @@ static int match_curl_h2_trace(const char *line, const char **out)
>  	    skip_iprefix(line, "h2 [", out))
>  		return 1;
>  
> +	/*
> +	 * curl 8.3.0 uses:
> +	 *   [HTTP/2] [<stream-id>] [<header-name>: <header-val>]
> +	 * where <stream-id> is numeric.
> +	 */
> +	if (skip_iprefix(line, "[HTTP/2] [", &p)) {
> +		while (isdigit(*p))
> +			p++;
> +		if (skip_prefix(p, "] [", out))
> +			return 1;
> +	}

Looking good assuming that <stream-id> part will never be updated to
allow spaces around the ID, or allow non-digits in the ID, in the
future.  Is there much harm if this code allowed false positives and
sent something that is *not* a curl trace, like "foo]" parsed out of
"[HTTP/2] [PATCH] [foo]", to redact_sensitive_header() function?

By the way, would this patch make sense?  Everybody in the function
that tries to notice a sensitive header seems to check the setting
independently, which seems error prone for those who want to add a
new header to redact.

 http.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git c/http.c w/http.c
index 8f71bf00d8..3dfa34fe65 100644
--- c/http.c
+++ w/http.c
@@ -684,8 +684,10 @@ static int redact_sensitive_header(struct strbuf *header, size_t offset)
 	int ret = 0;
 	const char *sensitive_header;
 
-	if (trace_curl_redact &&
-	    (skip_iprefix(header->buf + offset, "Authorization:", &sensitive_header) ||
+	if (!trace_curl_redact)
+		return ret;
+
+	if ((skip_iprefix(header->buf + offset, "Authorization:", &sensitive_header) ||
 	     skip_iprefix(header->buf + offset, "Proxy-Authorization:", &sensitive_header))) {
 		/* The first token is the type, which is OK to log */
 		while (isspace(*sensitive_header))
@@ -696,8 +698,7 @@ static int redact_sensitive_header(struct strbuf *header, size_t offset)
 		strbuf_setlen(header,  sensitive_header - header->buf);
 		strbuf_addstr(header, " <redacted>");
 		ret = 1;
-	} else if (trace_curl_redact &&
-		   skip_iprefix(header->buf + offset, "Cookie:", &sensitive_header)) {
+	} else if (skip_iprefix(header->buf + offset, "Cookie:", &sensitive_header)) {
 		struct strbuf redacted_header = STRBUF_INIT;
 		const char *cookie;
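The effect of this change is that the redaction setting is checked once, up front, so a new header branch no longer has to repeat `trace_curl_redact &&` in its condition. A minimal standalone sketch of that shape follows. It is a simplification under stated assumptions: it works on a plain NUL-terminated `char` buffer instead of a `struct strbuf`, takes the setting as a hypothetical `redact_enabled` parameter instead of the global, handles only the `Authorization:`-style branch (the `Cookie:` branch is omitted), and assumes the buffer has room for the ` <redacted>` suffix.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive prefix match; on success *out points past it. */
static int skip_iprefix(const char *str, const char *prefix, const char **out)
{
	for (; *prefix; str++, prefix++)
		if (tolower((unsigned char)*str) != tolower((unsigned char)*prefix))
			return 0;
	*out = str;
	return 1;
}

/*
 * Redact the credentials of an Authorization-style header starting at
 * buf + offset; return 1 if the header was rewritten. The caller must
 * leave room in buf for the appended " <redacted>" (sketch assumption).
 */
static int redact_sensitive_header(char *buf, size_t offset, int redact_enabled)
{
	const char *p;

	/* one early return instead of a "redact_enabled &&" in every branch */
	if (!redact_enabled)
		return 0;

	if (skip_iprefix(buf + offset, "Authorization:", &p) ||
	    skip_iprefix(buf + offset, "Proxy-Authorization:", &p)) {
		/* the first token is the auth type, which is OK to log */
		while (isspace((unsigned char)*p))
			p++;
		while (*p && !isspace((unsigned char)*p))
			p++;
		/* everything after it is opaque and possibly sensitive */
		strcpy(buf + (p - buf), " <redacted>");
		return 1;
	}
	/* further branches (e.g. "Cookie:") would go here, with no extra guard */
	return 0;
}
```

For example, `Authorization: Basic dXNlcjpwYXNz` becomes `Authorization: Basic <redacted>` when redaction is enabled and is left untouched when it is not.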

Taylor Blau Sept. 15, 2023, 6:38 p.m. UTC | #2

On Fri, Sep 15, 2023 at 07:34:43AM -0400, Jeff King wrote:
> @@ -751,6 +753,18 @@ static int match_curl_h2_trace(const char *line, const char **out)
>  	    skip_iprefix(line, "h2 [", out))
>  		return 1;
>
> +	/*
> +	 * curl 8.3.0 uses:
> +	 *   [HTTP/2] [<stream-id>] [<header-name>: <header-val>]
> +	 * where <stream-id> is numeric.
> +	 */
> +	if (skip_iprefix(line, "[HTTP/2] [", &p)) {
> +		while (isdigit(*p))
> +			p++;
> +		if (skip_prefix(p, "] [", out))
> +			return 1;
> +	}
> +

This looks good, too, though I do have one question. The HTTP/2
specification in 5.1 says (among other things):

    Streams are identified with an unsigned 31-bit integer. Streams
    initiated by a client MUST use odd-numbered stream identifiers; those
    initiated by the server MUST use even-numbered stream identifiers. A
    stream identifier of zero (0x0) is used for connection control messages;
    the stream identifier of zero cannot be used to establish a new stream.

So the parsing you wrote here makes sense in that we consume digits
between the pair of square brackets enclosing the stream identifier.

But I think we would happily eat a line like:

    [HTTP/2] [] [Secret: xyz]

even lacking a stream identifier. I think that's reasonably OK in
practice, because we're being over-eager in redacting instead of the
other way around. And we're unlikely to see such a line from curl
anyway, so I don't think that it matters.

If you feel otherwise, though, I think something as simple as:

    if (skip_iprefix(line, "[HTTP/2] [", &p)) {
      if (!*p)
        return 0;
      while (isdigit(*p))
        p++;
      if (skip_prefix(p, "] [", out))
        return 1;
    }

would do the trick. I *think* that this would also work:

    if (skip_iprefix(line, "[HTTP/2] [", &p)) {
      do {
        p++;
      } while (isdigit(*p));
      if (skip_prefix(p, "] [", out))
        return 1;
    }

since we know that p is non-NULL, and if it's the end of the line, *p
will be NUL and isdigit(*p) will return 0. But it's arguably less
direct, and requires some extra reasoning, so I have a vague preference
for the former.
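One detail about the first variant: its `if (!*p)` guard only fires when the line ends right after `[HTTP/2] [`. For `[HTTP/2] [] [Secret: xyz]`, `*p` is `]`, so that variant still accepts the empty identifier. If rejecting it were the goal, a sketch that demands at least one digit could look like this; the function name is hypothetical, and the `skip_*` helpers are local stand-ins for git's.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for git's skip_prefix(). */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/* Stand-in for git's skip_iprefix() (ASCII case-insensitive). */
static int skip_iprefix(const char *str, const char *prefix, const char **out)
{
	for (; *prefix; str++, prefix++)
		if (tolower((unsigned char)*str) != tolower((unsigned char)*prefix))
			return 0;
	*out = str;
	return 1;
}

/* Like the 8.3.0 branch, but requires a non-empty run of digits. */
static int match_h2_stream_line(const char *line, const char **out)
{
	const char *p, *digits;

	if (!skip_iprefix(line, "[HTTP/2] [", &p))
		return 0;
	digits = p;
	/* isdigit('\0') is false, so this never runs past the end */
	while (isdigit((unsigned char)*p))
		p++;
	if (p == digits)
		return 0; /* empty stream id, e.g. "[HTTP/2] [] [...]" */
	return skip_prefix(p, "] [", out);
}
```

As the replies conclude, the extra check was judged not worth it, since accepting an empty ID only errs toward redacting more.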

But this may all be moot anyway, I don't feel strongly one way or the
other.

Thanks,
Taylor

Jeff King Sept. 16, 2023, 5:25 a.m. UTC | #3

On Fri, Sep 15, 2023 at 11:21:55AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > @@ -751,6 +753,18 @@ static int match_curl_h2_trace(const char *line, const char **out)
> >  	    skip_iprefix(line, "h2 [", out))
> >  		return 1;
> >  
> > +	/*
> > +	 * curl 8.3.0 uses:
> > +	 *   [HTTP/2] [<stream-id>] [<header-name>: <header-val>]
> > +	 * where <stream-id> is numeric.
> > +	 */
> > +	if (skip_iprefix(line, "[HTTP/2] [", &p)) {
> > +		while (isdigit(*p))
> > +			p++;
> > +		if (skip_prefix(p, "] [", out))
> > +			return 1;
> > +	}
> 
> Looking good assuming that <stream-id> part will never be updated to
> allow spaces around the ID, or allow non-digits in the ID, in the
> future.  Is there much harm if this code allowed false positives and
> sent something that is *not* a curl trace, like "foo]" parsed out of
> "[HTTP/2] [PATCH] [foo]", to redact_sensitive_header() function?

The current code on the generating side is pretty strict. It's
literally a printf using "[HTTP/2] [%d] [%.*s: %.*s]". As far as future
changes, I'm hesitant to make any changes based on guesses of what
_could_ happen. Our chance of hitting the mark is not high (I never
would have dreamed about this format after seeing the existing h2h3
ones), and it always carries the risk of misinterpretation.
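To make the strictness point concrete: a line produced with that format string is exactly what the digit-skipping parser expects. The sketch below formats a header with `snprintf()` using the format quoted above (the stream id, header name, and value are made-up sample data) and feeds the result to a standalone copy of the 8.3.0 branch. For brevity its `skip_prefix()` stand-in matches case-sensitively, unlike the `skip_iprefix()` call in the patch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for git's skip_prefix() (case-sensitive here, for brevity). */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/* Standalone copy of the curl 8.3.0 branch of the matcher. */
static int match_h2_stream_trace(const char *line, const char **out)
{
	const char *p;

	if (!skip_prefix(line, "[HTTP/2] [", &p))
		return 0;
	while (isdigit((unsigned char)*p))
		p++;
	return skip_prefix(p, "] [", out);
}

/* Build a trace line with the format string quoted above. */
static void format_h2_trace(char *buf, size_t size, int stream_id,
			    const char *name, const char *value)
{
	snprintf(buf, size, "[HTTP/2] [%d] [%.*s: %.*s]", stream_id,
		 (int)strlen(name), name, (int)strlen(value), value);
}
```

For example, `format_h2_trace(buf, sizeof(buf), 3, "authorization", "Basic abc")` yields `[HTTP/2] [3] [authorization: Basic abc]`, which the matcher accepts, leaving `out` at `authorization: Basic abc]`.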

You are right that the cost of a false positive is probably not too high
(the absolute worst case is that we redact something that looks
header-ish in the trace output). But even still, I'd prefer not to
complicate the code with extra parsing for a format that may or may not
ever come to exist.

If we were to loosen the parsing, it would make more sense to me to
loosen _much_ more, and just look for anything inside brackets.
Something like:

	p = header->buf;
	while ((p = strchr(p, '['))) {
		if (redact_sensitive_header(header, p - header->buf + 1)) {
			/* redaction ate our closing bracket */
			strbuf_addch(header, ']');
			break;
		}
		p++; /* skip past to look for next opening bracket */
	}

Then we are relying on redact_sensitive_header() to match the header
strings, and we'll pass it lots of garbage which it will reject. But at
least we've bought something: all of the h2 formats we know about will
just work, and any future ones which retain the bracketing will as well.

That said, I'm still somewhat inclined to the stricter parsing, just
because it's possible for us to see arbitrary bytes here. So if you had
a header that happened to have brackets in it, we'd match those.
Probably nothing too bad could come of it, but it just feels sloppy to
me.
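The looser approach can also be sketched standalone. Here `looks_sensitive()` is a hypothetical stand-in for the header-name checks that `redact_sensitive_header()` performs, and the function only reports where a candidate header starts rather than rewriting anything. It shows both sides of the trade-off: every known h2 prefix is handled without format-specific code, and a bracket inside a header value is treated as a candidate too.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive prefix test. */
static int has_iprefix(const char *str, const char *prefix)
{
	for (; *prefix; str++, prefix++)
		if (tolower((unsigned char)*str) != tolower((unsigned char)*prefix))
			return 0;
	return 1;
}

/* Hypothetical stand-in for the name checks in redact_sensitive_header(). */
static int looks_sensitive(const char *s)
{
	return has_iprefix(s, "Authorization:") ||
	       has_iprefix(s, "Proxy-Authorization:") ||
	       has_iprefix(s, "Cookie:");
}

/*
 * Try the text after every '[' in line; return the offset of the first
 * candidate that looks like a sensitive header, or -1 if none does.
 */
static long find_bracketed_sensitive(const char *line)
{
	const char *p = line;

	while ((p = strchr(p, '['))) {
		if (looks_sensitive(p + 1))
			return (long)(p + 1 - line);
		p++; /* skip past to look for the next opening bracket */
	}
	return -1;
}
```

On `[HTTP/2] [1] [x-note: see [cookie: a=b]]` it reports the inner `cookie:` at offset 27, which is the kind of match inside arbitrary header bytes described above.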

> By the way, would this patch make sense?  Everybody in the function
> that tries to notice a sensitive header seems to check the setting
> independently, which seems error prone for those who want to add a
> new header to redact.
> [...]
> +	if (!trace_curl_redact)
> +		return ret;

Yeah, that looks a reasonable simplification to me (though obviously
orthogonal to the patch under discussion).

-Peff

Jeff King Sept. 16, 2023, 5:32 a.m. UTC | #4

On Fri, Sep 15, 2023 at 02:38:06PM -0400, Taylor Blau wrote:

> This looks good, too, though I do have one question. The HTTP/2
> specification in 5.1 says (among other things):
> 
>     Streams are identified with an unsigned 31-bit integer. Streams
>     initiated by a client MUST use odd-numbered stream identifiers; those
>     initiated by the server MUST use even-numbered stream identifiers. A
>     stream identifier of zero (0x0) is used for connection control messages;
>     the stream identifier of zero cannot be used to establish a new stream.
> 
> So the parsing you wrote here makes sense in that we consume digits
> between the pair of square brackets enclosing the stream identifier.

Yes, though I'm less concerned with what the standard says than with
what curl's code does (and it uses %d).

> But I think we would happily eat a line like:
> 
>     [HTTP/2] [] [Secret: xyz]
> 
> even lacking a stream identifier. I think that's reasonably OK in
> practice, because we're being over-eager in redacting instead of the
> other way around. And we're unlikely to see such a line from curl
> anyway, so I don't think that it matters.

Yes, you're correct that we'd allow an empty stream identifier. I'm
content to leave it in the name of simplicity.

> If you feel otherwise, though, I think something as simple as:
> 
>     if (skip_iprefix(line, "[HTTP/2] [", &p)) {
>       if (!*p)
>         return 0;
>       while (isdigit(*p))
>         p++;
>       if (skip_prefix(p, "] [", out))
>         return 1;
>     }

Yes, that would work, but...

> would do the trick. I *think* that this would also work:
> 
>     if (skip_iprefix(line, "[HTTP/2] [", &p)) {
>       do {
>         p++;
>       } while (isdigit(*p));
>       if (skip_prefix(p, "] [", out))
>         return 1;
>     }
>
> since we know that p is non-NULL, and if it's the end of the line, *p
> will be NUL and isdigit(*p) will return 0. But it's arguably less
> direct, and requires some extra reasoning, so I have a vague preference
> for the former.

Your do-while is too eager, I think. It advances the first "p" before
we've looked at it, so:

  - we'd match "[HTTP/2] [x1] [foo]", allowing one byte of non-digit
    cruft

  - if the string is "[HTTP/2] [", then "p" is at the NUL after the
    skip_iprefix call, and p++ walks us off the end of the array.
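The first point is easy to check with a small harness that puts the two loops side by side. The function names are hypothetical, and the `skip_prefix()` stand-in matches case-sensitively for brevity. The eager version is only ever fed lines with at least one byte after `[HTTP/2] [`, because on the bare string `[HTTP/2] [` it would read past the terminating NUL, which is undefined behavior.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for git's skip_prefix(). */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/* The loop from the patch: examine *p before advancing. */
static int match_while(const char *line, const char **out)
{
	const char *p;

	if (!skip_prefix(line, "[HTTP/2] [", &p))
		return 0;
	while (isdigit((unsigned char)*p))
		p++;
	return skip_prefix(p, "] [", out);
}

/*
 * The do-while variant: advances before the first check, so the byte right
 * after "[HTTP/2] [" is skipped unexamined. Do not call it on a line that
 * ends at "[HTTP/2] [" (the first p++ would step past the NUL).
 */
static int match_do_while(const char *line, const char **out)
{
	const char *p;

	if (!skip_prefix(line, "[HTTP/2] [", &p))
		return 0;
	do {
		p++;
	} while (isdigit((unsigned char)*p));
	return skip_prefix(p, "] [", out);
}
```

On `[HTTP/2] [x1] [foo]` the `while` version returns 0, while the `do-while` version skips the `x`, consumes the `1`, and returns 1 with `out` at `foo]`.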

> But this may all be moot anyway, I don't feel strongly one way or the
> other.

My inclination is to leave it. I was actually tempted to just allow
_anything_ in the brackets if only because it makes the code even
simpler, but the "skip past digits" seemed like a reasonable middle
ground.

-Peff

Taylor Blau Sept. 19, 2023, 5:56 p.m. UTC | #5

On Sat, Sep 16, 2023 at 01:32:01AM -0400, Jeff King wrote:
> > But I think we would happily eat a line like:
> >
> >     [HTTP/2] [] [Secret: xyz]
> >
> > even lacking a stream identifier. I think that's reasonably OK in
> > practice, because we're being over-eager in redacting instead of the
> > other way around. And we're unlikely to see such a line from curl
> > anyway, so I don't think that it matters.
>
> Yes, you're correct that we'd allow an empty stream identifier. I'm
> content to leave it in the name of simplicity.

Yeah, I am definitely OK with that as well. I don't think it's worth
being overly specific in what we accept for redaction, since we're
erring on the side of being less restrictive.

> > But this may all be moot anyway, I don't feel strongly one way or the
> > other.
>
> My inclination is to leave it. I was actually tempted to just allow
> _anything_ in the brackets if only because it makes the code even
> simpler, but the "skip past digits" seemed like a reasonable middle
> ground.

Yep, same. Thanks for the sanity check :-).

Thanks,
Taylor
