commit.c: ensure strchrnul() doesn't scan beyond range

Message ID	pull.1652.git.1707153705840.gitgitgadget@gmail.com (mailing list archive)
State	New, archived

Chandra Pratap Feb. 5, 2024, 5:21 p.m. UTC

From: Chandra Pratap <chandrapratap3519@gmail.com>

Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
---
    commit.c: ensure strchrnul() doesn't scan beyond range

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1652%2FChand-ra%2Fstrchrnul-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1652/Chand-ra/strchrnul-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1652

 commit.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)


base-commit: a54a84b333adbecf7bc4483c0e36ed5878cac17b

René Scharfe Feb. 5, 2024, 7:57 p.m. UTC | #1

On 05.02.24 at 18:21, Chandra Pratap via GitGitGadget wrote:
> From: Chandra Pratap <chandrapratap3519@gmail.com>
>
> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
> ---
>     commit.c: ensure strchrnul() doesn't scan beyond range
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1652%2FChand-ra%2Fstrchrnul-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1652/Chand-ra/strchrnul-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1652
>
>  commit.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index ef679a0b939..a65b8e92e94 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1743,15 +1743,9 @@ const char *find_header_mem(const char *msg, size_t len,
>  	int key_len = strlen(key);
>  	const char *line = msg;
>
> -	/*
> -	 * NEEDSWORK: It's possible for strchrnul() to scan beyond the range
> -	 * given by len. However, current callers are safe because they compute
> -	 * len by scanning a NUL-terminated block of memory starting at msg.
> -	 * Nonetheless, it would be better to ensure the function does not look
> -	 * at msg beyond the len provided by the caller.
> -	 */
>  	while (line && line < msg + len) {
>  		const char *eol = strchrnul(line, '\n');
> +		assert(eol - line <= len);

Something like this might work in Verse, but C is more simple-minded.
You can't undo an out-of-bounds access after the fact, and assert()
would be compiled out if the code is built with NDEBUG anyway.

If you want to make the code work with buffers that lack a terminating
NUL then you need to replace the strchrnul() call with something that
respects buffer lengths.  You could e.g. call memchr().  Don't forget
to check for NUL to preserve the original behavior.  Or you could roll
your own custom replacement, perhaps like this:

/* Like strchrnul(), but never looks at more than len bytes of s. */
char *strnchrnul(const char *s, int c, size_t len)
{
	while (len-- && *s && *s != c)
		s++;
	return (char *)s;
}

A test with the new unit-test framework would be nice.  It should be
possible to show that the current code runs over the passed len,
without causing undefined behavior.  E.g. find_header_mem("foo bar",
2, "foo", &len) is safe, but returns "bar" instead of NULL.
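
The over-read mentioned above can be reproduced outside of Git with a small model. This is only a sketch: scan_to_eol_or_nul() is a hypothetical, portable stand-in for strchrnul(line, '\n'), and the matching logic in find_header_model() is an assumption modeled on the loop quoted in the patch, not a copy of commit.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strchrnul(s, '\n'): ignores any length limit. */
static const char *scan_to_eol_or_nul(const char *s)
{
	while (*s && *s != '\n')
		s++;
	return s;
}

/*
 * Model of the loop under discussion (assumed shape, not Git's code):
 * look for "<key> <value>" at the start of a line within msg[0..len).
 */
static const char *find_header_model(const char *msg, size_t len,
				     const char *key, size_t *out_len)
{
	size_t key_len = strlen(key);
	const char *line = msg;

	while (line && line < msg + len) {
		/* The scan may run past msg + len; that is the problem. */
		const char *eol = scan_to_eol_or_nul(line);

		if (line == eol)
			return NULL; /* blank line ends the header block */

		if ((size_t)(eol - line) > key_len &&
		    !strncmp(line, key, key_len) &&
		    line[key_len] == ' ') {
			*out_len = eol - line - key_len - 1;
			return line + key_len + 1;
		}
		line = *eol ? eol + 1 : NULL;
	}
	return NULL;
}
```

With this model, find_header_model("foo bar", 2, "foo", &n) yields a pointer to "bar" with n == 3, although only "fo" lies within the given length. The read stays inside the NUL-terminated literal, so nothing undefined happens; the result is simply wrong.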

>
>  		if (line == eol)
>  			return NULL;
>
> base-commit: a54a84b333adbecf7bc4483c0e36ed5878cac17b

Kyle Lippincott Feb. 6, 2024, 1:41 a.m. UTC | #2

On Mon, Feb 5, 2024 at 9:23 AM Chandra Pratap via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Chandra Pratap <chandrapratap3519@gmail.com>
>
> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com>
> ---
>     commit.c: ensure strchrnul() doesn't scan beyond range
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1652%2FChand-ra%2Fstrchrnul-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1652/Chand-ra/strchrnul-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1652
>
>  commit.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index ef679a0b939..a65b8e92e94 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1743,15 +1743,9 @@ const char *find_header_mem(const char *msg, size_t len,
>         int key_len = strlen(key);
>         const char *line = msg;
>
> -       /*
> -        * NEEDSWORK: It's possible for strchrnul() to scan beyond the range
> -        * given by len. However, current callers are safe because they compute
> -        * len by scanning a NUL-terminated block of memory starting at msg.
> -        * Nonetheless, it would be better to ensure the function does not look
> -        * at msg beyond the len provided by the caller.
> -        */
>         while (line && line < msg + len) {
>                 const char *eol = strchrnul(line, '\n');
> +               assert(eol - line <= len);

I don't think this is sufficient to address the NEEDSWORK. `assert` is
only active in debug builds, and strchrnul would have already
potentially exceeded the bounds of its memory by the time this check
is happening. We'd need a safe version of strchrnul that took the
maximum length and never exceeded it.
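
A length-taking scan of that kind could be built on memchr(). The sketch below is one possible shape, not something from the patch: the name bounded_eol() is hypothetical, and it keeps the stop-at-NUL behavior of strchrnul() while never reading past s + len.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the first '\n' or NUL in
 * s[0..len), or s + len if neither occurs. Never reads past s + len.
 */
static const char *bounded_eol(const char *s, size_t len)
{
	const char *nl = memchr(s, '\n', len);
	/* Only search for NUL in front of the newline, if there is one. */
	const char *nul = memchr(s, '\0', nl ? (size_t)(nl - s) : len);

	if (nul)
		return nul;
	return nl ? nl : s + len;
}
```

A caller has to compare the result against s + len before dereferencing it, because the returned pointer may be one past the last byte it is allowed to read.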

>
>                 if (line == eol)
>                         return NULL;
>
> base-commit: a54a84b333adbecf7bc4483c0e36ed5878cac17b
> --
> gitgitgadget
>

Junio C Hamano Feb. 6, 2024, 6:44 p.m. UTC | #3

René Scharfe <l.s.r@web.de> writes:

>>  	while (line && line < msg + len) {
>>  		const char *eol = strchrnul(line, '\n');
>> +		assert(eol - line <= len);
>
> Something like this might work in Verse, but C is more simple-minded.
> You can't undo an out-of-bounds access after the fact, and assert()
> would be compiled out if the code is built with NDEBUG anyway.

Good comments.  Thanks.

Jeff King Feb. 8, 2024, 1 a.m. UTC | #4

On Mon, Feb 05, 2024 at 08:57:46PM +0100, René Scharfe wrote:

> If you want to make the code work with buffers that lack a terminating
> NUL then you need to replace the strchrnul() call with something that
> respects buffer lengths.  You could e.g. call memchr().  Don't forget
> to check for NUL to preserve the original behavior.  Or you could roll
> your own custom replacement, perhaps like this:

I'm not sure it is worth retaining the check for NUL. The original
function added by me in fe6eb7f2c5 (commit: provide a function to find a
header in a buffer, 2014-08-27) just took a NUL-terminated string, so
we certainly were not expecting embedded NULs.

In cfc5cf428b (receive-pack.c: consolidate find header logic,
2022-01-06) we switched to taking the "len" parameter, but the new
caller just passes strlen(msg) anyway.

I guess you could argue that before that commit, receive-pack.c's
find_header() which took a length was buggy to use strchrnul(). It gets
fed with a push-cert buffer. I guess it's possible for there to be an
embedded NUL there, but in practice there shouldn't be. If we are
thinking of malformed or malicious input, it's not clear which behavior
(finding or not finding a header past a NUL) is more harmful. So all
things being equal, I would try to reduce the number of special cases
here by not worrying about NULs.

(Though if somebody really wants to dig, it's possible there's a clever
dual-parser attack here where "\nfoo\0bar baz" finds the header "bar
baz" in one parser but not in another).
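
To make the differing visibility concrete, here is a small sketch. The function visible_lines() and its stop_at_nul flag are hypothetical, and the buffer used in the note after the code is made up for illustration; it is not the exact string from the parenthetical above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the lines a scanner sees in buf[0..len). With stop_at_nul set,
 * an embedded NUL ends the scan (strchrnul()-like); otherwise only
 * '\n' and the length matter (memchr()-like).
 */
static size_t visible_lines(const char *buf, size_t len, int stop_at_nul)
{
	const char *p = buf, *end = buf + len;
	size_t n = 0;

	while (p < end) {
		const char *nl = memchr(p, '\n', end - p);
		const char *eol = nl ? nl : end;

		n++;
		if (stop_at_nul && memchr(p, '\0', eol - p))
			break; /* this parser never looks past the NUL */
		p = nl ? nl + 1 : end;
	}
	return n;
}
```

For the 12-byte buffer "foo\0\nbar baz", the NUL-aware scanner sees one line and the length-only scanner sees two, so only the latter would ever get to consider "bar baz" as a header.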

-Peff

René Scharfe Feb. 8, 2024, 6:31 p.m. UTC | #5

On 08.02.24 at 02:00, Jeff King wrote:
> On Mon, Feb 05, 2024 at 08:57:46PM +0100, René Scharfe wrote:
>
>> If you want to make the code work with buffers that lack a terminating
>> NUL then you need to replace the strchrnul() call with something that
>> respects buffer lengths.  You could e.g. call memchr().  Don't forget
>> to check for NUL to preserve the original behavior.  Or you could roll
>> your own custom replacement, perhaps like this:
>
> I'm not sure it is worth retaining the check for NUL. The original
> function added by me in fe6eb7f2c5 (commit: provide a function to find a
> header in a buffer, 2014-08-27) just took a NUL-terminated string, so
> we certainly were not expecting embedded NULs.
>
> In cfc5cf428b (receive-pack.c: consolidate find header logic,
> 2022-01-06) we switched to taking the "len" parameter, but the new
> caller just passes strlen(msg) anyway.
>
> I guess you could argue that before that commit, receive-pack.c's
> find_header() which took a length was buggy to use strchrnul(). It gets
> fed with a push-cert buffer. I guess it's possible for there to be an
> embedded NUL there, but in practice there shouldn't be. If we are
> thinking of malformed or malicious input, it's not clear which behavior
> (finding or not finding a header past a NUL) is more harmful. So all
> things being equal, I would try to reduce the number of special cases
> here by not worrying about NULs.
>
> (Though if somebody really wants to dig, it's possible there's a clever
> dual-parser attack here where "\nfoo\0bar baz" finds the header "bar
> baz" in one parser but not in another).

Good point.  A _mem function shouldn't worry about NULs.  Its callers
are responsible for that -- if necessary.

No idea what an attacker could do with nonce and push-option headers
with varying visibility.  Version detection?  Something worse?

But anyway: If NULs are of no concern and we currently end parsing when
we see one in all cases, why do we need a _mem function at all?  The
original version of the function, find_commit_header(), should suffice.
check_nonce() could be run against the NUL-terminated sigcheck.payload
and check_cert_push_options() parses an entire strbuf, so there is no
risk of out-of-bounds access.

René

Junio C Hamano Feb. 8, 2024, 7:48 p.m. UTC | #6

René Scharfe <l.s.r@web.de> writes:

> But anyway: If NULs are of no concern and we currently end parsing when
> we see one in all cases, why do we need a _mem function at all?  The
> original version of the function, find_commit_header(), should suffice.
> check_nonce() could be run against the NUL-terminated sigcheck.payload
> and check_cert_push_options() parses an entire strbuf, so there is no
> risk of out-of-bounds access.

If I recall correctly, the caller that does not pass strlen() as the
payload length gives a length that is shorter than the buffer, i.e.
"stop the parsing here, do not get confused into thinking the
garbage after this point contains useful payload" was the reason why
we have a separate "len".

Kyle Lippincott Feb. 8, 2024, 7:52 p.m. UTC | #7

On Thu, Feb 8, 2024 at 11:48 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> René Scharfe <l.s.r@web.de> writes:
>
> > But anyway: If NULs are of no concern and we currently end parsing when
> > we see one in all cases, why do we need a _mem function at all?  The
> > original version of the function, find_commit_header(), should suffice.
> > check_nonce() could be run against the NUL-terminated sigcheck.payload
> > and check_cert_push_options() parses an entire strbuf, so there is no
> > risk of out-of-bounds access.
>
> If I recall correctly, the caller that does not pass strlen() as the
> payload length gives a length that is shorter than the buffer, i.e.
> "stop the parsing here, do not get confused into thinking the
> garbage after this point contains useful payload" was the reason why
> we have a separate "len".
>

I just rediscovered that. I think this probably should be something
that the caller (check_nonce) implements, then. Having a _mem function
implies to me (though I'm very new to this codebase) that it supports
embedded NULs, but that's not what's happening here.

Jeff King Feb. 8, 2024, 9:41 p.m. UTC | #8

On Thu, Feb 08, 2024 at 11:48:05AM -0800, Junio C Hamano wrote:

> René Scharfe <l.s.r@web.de> writes:
> 
> > But anyway: If NULs are of no concern and we currently end parsing when
> > we see one in all cases, why do we need a _mem function at all?  The
> > original version of the function, find_commit_header(), should suffice.
> > check_nonce() could be run against the NUL-terminated sigcheck.payload
> > and check_cert_push_options() parses an entire strbuf, so there is no
> > risk of out-of-bounds access.
> 
> If I recall correctly, the caller that does not pass strlen() as the
> payload length gives a length that is shorter than the buffer, i.e.
> "stop the parsing here, do not get confused into thinking the
> garbage after this point contains useful payload" was the reason why
> we have a separate "len".

Yes, check_nonce() passes in a length limited by the start of the actual
signature, as determined by parse_signed_buffer(). Though that generally
comes after a blank line, which would also stop find_header() from
parsing further.

But more interestingly: even though we pass a buf/len pair to
parse_signed_buffer(), it then calls get_format_by_sig() which takes
only a NUL-terminated string. So:

  1. It is not possible for the buf/len pair we pass to check_nonce() to
     contain a NUL. And thus there is no caller of find_header_mem()
     that can contain an embedded NUL. So switching from strchrnul() to
     just memchr() should be OK there.

  2. That raises the question of whether parse_signed_buffer() has a
     similar walk-too-far problem. ;) The answer is no, because we feed
     it from a strbuf. But it's not a great pattern overall.
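
A memchr()-only version of the loop, as suggested in point 1, might look roughly like this. It is a sketch under assumptions: find_header_bounded() is a hypothetical name, and the header-matching part is modeled on the hunk quoted earlier in the thread rather than taken from commit.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch of a length-respecting header lookup (assumed shape, not the
 * code in commit.c): find "<key> <value>" at the start of a line in
 * msg[0..len) without ever reading at or beyond msg + len.
 */
static const char *find_header_bounded(const char *msg, size_t len,
				       const char *key, size_t *out_len)
{
	size_t key_len = strlen(key);
	const char *line = msg, *end = msg + len;

	while (line < end) {
		const char *nl = memchr(line, '\n', end - line);
		const char *eol = nl ? nl : end;

		if (line == eol)
			return NULL; /* blank line ends the header block */

		if ((size_t)(eol - line) > key_len &&
		    !memcmp(line, key, key_len) &&
		    line[key_len] == ' ') {
			*out_len = eol - line - key_len - 1;
			return line + key_len + 1;
		}
		if (!nl)
			break; /* last line had no newline */
		line = nl + 1;
	}
	return NULL;
}
```

Because eol is never dereferenced, the function works on buffers without a terminating NUL, and find_header_bounded("foo bar", 2, "foo", &n) returns NULL as one would expect.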

-Peff

Junio C Hamano Feb. 8, 2024, 9:44 p.m. UTC | #9

Jeff King <peff@peff.net> writes:

>   1. It is not possible for the buf/len pair we pass to check_nonce() to
>      contain a NUL. And thus there is no caller of find_header_mem()
>      that can contain an embedded NUL. So switching from strchrnul() to
>      just memchr() should be OK there.

Correct.

>   2. That raises the question of whether parse_signed_buffer() has a
>      similar walk-too-far problem. ;) The answer is no, because we feed
>      it from a strbuf. But it's not a great pattern overall.

True, too.

Thanks.
