diff mbox series

builtin/blame: fix out-of-bounds read with excessive `--abbrev`

Message ID 20250109-b4-pks-blame-truncate-hash-length-v1-1-9ad4bb09e059@pks.im (mailing list archive)
State Superseded
Headers show
Series builtin/blame: fix out-of-bounds read with excessive `--abbrev` | expand

Commit Message

Patrick Steinhardt Jan. 9, 2025, 6:21 a.m. UTC
In 6411a0a896 (builtin/blame: fix type of `length` variable when
emitting object ID, 2024-12-06) we have fixed the type of the `length`
variable. In order to avoid a cast from `size_t` to `int` in the call to
printf(3p) with the "%.*s" formatter we have converted the code to
instead use fwrite(3p), which accepts the length as a `size_t`.

It was reported though that this makes us read over the end of the OID
array when the provided `--abbrev=` length exceeds the length of the
object ID. This is because fwrite(3p) of course doesn't stop when it
sees a NUL byte, where as printf(3p) does.

Fix the bug by reverting back to printf(3p) and culling the provided
length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an
`int`.

Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
This fixes the issue reported in [1]. Thanks!

Patrick

[1]: <4d812802-afbc-4635-7a19-73896fcda625@gmx.de>
---
 builtin/blame.c  | 4 +++-
 t/t8002-blame.sh | 4 ++++
 2 files changed, 7 insertions(+), 1 deletion(-)


---
base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e
change-id: 20250109-b4-pks-blame-truncate-hash-length-c875cac66d71

Comments

Kristoffer Haugsbakk Jan. 9, 2025, 10:31 a.m. UTC | #1
On Thu, Jan 9, 2025, at 07:21, Patrick Steinhardt wrote:
> object ID. This is because fwrite(3p) of course doesn't stop when it
> sees a NUL byte, where as printf(3p) does.

s/where as/whereas
Johannes Schindelin Jan. 9, 2025, 10:49 a.m. UTC | #2
Hi Patrick,

On Thu, 9 Jan 2025, Patrick Steinhardt wrote:

> In 6411a0a896 (builtin/blame: fix type of `length` variable when
> emitting object ID, 2024-12-06) we have fixed the type of the `length`
> variable. In order to avoid a cast from `size_t` to `int` in the call to
> printf(3p) with the "%.*s" formatter we have converted the code to
> instead use fwrite(3p), which accepts the length as a `size_t`.
>
> It was reported though that this makes us read over the end of the OID
> array when the provided `--abbrev=` length exceeds the length of the
> object ID. This is because fwrite(3p) of course doesn't stop when it
> sees a NUL byte, where as printf(3p) does.
>
> Fix the bug by reverting back to printf(3p) and culling the provided
> length to `GIT_MAX_HEXSZ` to keep it from overflowing when cast to an
> `int`.
>
> Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> This fixes the issue reported in [1]. Thanks!

Thank you for the quick fix! We will need to adjust it a little more,
though:

>
> Patrick
>
> [1]: <4d812802-afbc-4635-7a19-73896fcda625@gmx.de>
> ---
>  builtin/blame.c  | 4 +++-
>  t/t8002-blame.sh | 4 ++++
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 867032e4c16878ffd56df8a73162b89ca4bd2694..ad91fe9e97f90625dd2708fbd44bf2dd24a337a6 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -475,6 +475,8 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
>  		char ch;
>  		size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
>  			the_hash_algo->hexsz : (size_t) abbrev;
> +		if (length > GIT_MAX_HEXSZ)
> +			length = GIT_MAX_HEXSZ;

This causes a subtle change of behavior because there are a couple of
conditional code blocks between this change and the `printf()` call
decrease `length`, i.e. specifying values larger than the maximal hex size
causes potentially-desirable, different behavior (and think about
https://www.hyrumslaw.com/).

>
>  		if (opt & OUTPUT_COLOR_LINE) {
>  			if (cnt > 0) {
> @@ -505,7 +507,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
>  			length--;
>  			putchar('?');
>  		}
> -		fwrite(hex, 1, length, stdout);
> +		printf("%.*s", (int)length, hex);
>  		if (opt & OUTPUT_ANNOTATE_COMPAT) {
>  			const char *name;
>  			if (opt & OUTPUT_SHOW_EMAIL)
> diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..fcaba8c11f7ede084e069eefd292f337e8396cb4 100755
> --- a/t/t8002-blame.sh
> +++ b/t/t8002-blame.sh
> @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' '
>  	check_abbrev $hexsz --no-abbrev
>  '
>
> +test_expect_success 'blame --abbrev gets truncated' '
> +	check_abbrev 9000 --abbrev=$hexsz HEAD

This is actually incorrect: it passes `--abbrev=$hexsz` instead of a value
that needs to be truncated.

> +'
> +
>  test_expect_success '--exclude-promisor-objects does not BUG-crash' '
>  	test_must_fail git blame --exclude-promisor-objects one
>  '
>
> ---
> base-commit: 14650065b76b28d3cfa9453356ac5669b19e706e
> change-id: 20250109-b4-pks-blame-truncate-hash-length-c875cac66d71

Here is my proposed fixup:

-- snipsnap --
[PATCH] fixup! builtin/blame: fix out-of-bounds read with excessive `--abbrev`

The test case needs to actually test an excessive `--abbrev` value.

Also, when calling `git blame --abbrev=<N>` with an `N` that is larger
than the maximal OID hex size, there is a subtle side effect that makes
it behave _differently_ than specifying said maximal hex size: When the
command outputs boundary, unblamable or ignored commits' OIDs, those
outputs are prefixed with characters indicating this, and the `abbrev`
value is used to align the information that comes after the OID,
clipping it as needed. Specifying a "too large" abbrev value here tells
Git that yes, we want the full OIDs and don't you worry about alignment.

Thanks to SHA-256 being _larger_ than the default SHA-1-based OIDs, and
thanks to clipping at `GIT_MAX_HEXSZ`, this change of behavior can only
be observed when running the test in SHA-256 mode.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/blame.c  | 9 +++++++--
 t/t8002-blame.sh | 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index ad91fe9e97f9..5b4976835066 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -475,8 +475,13 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
 		char ch;
 		size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
 			the_hash_algo->hexsz : (size_t) abbrev;
-		if (length > GIT_MAX_HEXSZ)
-			length = GIT_MAX_HEXSZ;
+
+		/*
+		 * Leave enough space for ^, * and ? indicators (boundary,
+		 * unblamable, ignored).
+		 */
+		if (length > GIT_MAX_HEXSZ + 3)
+			length = GIT_MAX_HEXSZ + 3;

 		if (opt & OUTPUT_COLOR_LINE) {
 			if (cnt > 0) {
diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
index fcaba8c11f7e..71fa70a64679 100755
--- a/t/t8002-blame.sh
+++ b/t/t8002-blame.sh
@@ -127,7 +127,7 @@ test_expect_success '--no-abbrev works like --abbrev with full length' '
 '

 test_expect_success 'blame --abbrev gets truncated' '
-	check_abbrev 9000 --abbrev=$hexsz HEAD
+	check_abbrev 9000 --abbrev=9000 HEAD..
 '

 test_expect_success '--exclude-promisor-objects does not BUG-crash' '
--
2.48.0.rc0.windows.1
Patrick Steinhardt Jan. 9, 2025, 11:14 a.m. UTC | #3
On Thu, Jan 09, 2025 at 11:49:43AM +0100, Johannes Schindelin wrote:
> > diff --git a/builtin/blame.c b/builtin/blame.c
> > index 867032e4c16878ffd56df8a73162b89ca4bd2694..ad91fe9e97f90625dd2708fbd44bf2dd24a337a6 100644
> > --- a/builtin/blame.c
> > +++ b/builtin/blame.c
> > @@ -475,6 +475,8 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
> >  		char ch;
> >  		size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
> >  			the_hash_algo->hexsz : (size_t) abbrev;
> > +		if (length > GIT_MAX_HEXSZ)
> > +			length = GIT_MAX_HEXSZ;
> 
> This causes a subtle change of behavior because there are a couple of
> conditional code blocks between this change and the `printf()` call
> decrease `length`, i.e. specifying values larger than the maximal hex size
> causes potentially-desirable, different behavior (and think about
> https://www.hyrumslaw.com/).

Alternatively we can move this until after we have done the
subtractions. Then we don't have to do weird gymnastics.

> >
> >  		if (opt & OUTPUT_COLOR_LINE) {
> >  			if (cnt > 0) {
> > @@ -505,7 +507,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
> >  			length--;
> >  			putchar('?');
> >  		}
> > -		fwrite(hex, 1, length, stdout);
> > +		printf("%.*s", (int)length, hex);
> >  		if (opt & OUTPUT_ANNOTATE_COMPAT) {
> >  			const char *name;
> >  			if (opt & OUTPUT_SHOW_EMAIL)
> > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..fcaba8c11f7ede084e069eefd292f337e8396cb4 100755
> > --- a/t/t8002-blame.sh
> > +++ b/t/t8002-blame.sh
> > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' '
> >  	check_abbrev $hexsz --no-abbrev
> >  '
> >
> > +test_expect_success 'blame --abbrev gets truncated' '
> > +	check_abbrev 9000 --abbrev=$hexsz HEAD
> 
> This is actually incorrect: it passes `--abbrev=$hexsz` instead of a value
> that needs to be truncated.

Oh dear. The test did manage to catch the bug, but thinking more about
it that was only because my initial fix was broken.

> diff --git a/builtin/blame.c b/builtin/blame.c
> index ad91fe9e97f9..5b4976835066 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -475,8 +475,13 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
>  		char ch;
>  		size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
>  			the_hash_algo->hexsz : (size_t) abbrev;
> -		if (length > GIT_MAX_HEXSZ)
> -			length = GIT_MAX_HEXSZ;
> +
> +		/*
> +		 * Leave enough space for ^, * and ? indicators (boundary,
> +		 * unblamable, ignored).
> +		 */
> +		if (length > GIT_MAX_HEXSZ + 3)
> +			length = GIT_MAX_HEXSZ + 3;
> 
>  		if (opt & OUTPUT_COLOR_LINE) {
>  			if (cnt > 0) {

How about this instead?

    diff --git a/builtin/blame.c b/builtin/blame.c
    index ad91fe9e97..f92e487bed 100644
    --- a/builtin/blame.c
    +++ b/builtin/blame.c
    @@ -475,8 +475,6 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
            char ch;
            size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
                the_hash_algo->hexsz : (size_t) abbrev;
    -		if (length > GIT_MAX_HEXSZ)
    -			length = GIT_MAX_HEXSZ;
     
            if (opt & OUTPUT_COLOR_LINE) {
                if (cnt > 0) {
    @@ -507,6 +505,9 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
                length--;
                putchar('?');
            }
    +
    +		if (length > GIT_MAX_HEXSZ)
    +			length = GIT_MAX_HEXSZ;
            printf("%.*s", (int)length, hex);
            if (opt & OUTPUT_ANNOTATE_COMPAT) {
                const char *name;

In that case there's no need to juggle with the magic indicators, which
makes it a bit easier to reason about.

> diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> index fcaba8c11f7e..71fa70a64679 100755
> --- a/t/t8002-blame.sh
> +++ b/t/t8002-blame.sh
> @@ -127,7 +127,7 @@ test_expect_success '--no-abbrev works like --abbrev with full length' '
>  '
> 
>  test_expect_success 'blame --abbrev gets truncated' '
> -	check_abbrev 9000 --abbrev=$hexsz HEAD
> +	check_abbrev 9000 --abbrev=9000 HEAD..
>  '

This should be `check_abbrev $hexsz --abbrev=9000`, shouldn't it?

Patrick
Johannes Schindelin Jan. 9, 2025, 1:41 p.m. UTC | #4
Hi Patrick,

On Thu, 9 Jan 2025, Patrick Steinhardt wrote:

> On Thu, Jan 09, 2025 at 11:49:43AM +0100, Johannes Schindelin wrote:
> > > diff --git a/builtin/blame.c b/builtin/blame.c
> > > index 867032e4c16878ffd56df8a73162b89ca4bd2694..ad91fe9e97f90625dd2708fbd44bf2dd24a337a6 100644
> > > --- a/builtin/blame.c
> > > +++ b/builtin/blame.c
> > > @@ -475,6 +475,8 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
> > >  		char ch;
> > >  		size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
> > >  			the_hash_algo->hexsz : (size_t) abbrev;
> > > +		if (length > GIT_MAX_HEXSZ)
> > > +			length = GIT_MAX_HEXSZ;
> >
> > This causes a subtle change of behavior because there are a couple of
> > conditional code blocks between this change and the `printf()` call
> > decrease `length`, i.e. specifying values larger than the maximal hex size
> > causes potentially-desirable, different behavior (and think about
> > https://www.hyrumslaw.com/).
>
> Alternatively we can move this until after we have done the
> subtractions. Then we don't have to do weird gymnastics.

Or we can even avoid assiging a maximum altogether:

		if (length < GIT_MAX_HEXSZ)
			printf("%.*s", (int)length, hex);
		else
			printf("%s", hex);

Or be more consistent with Git's source code style which often prefers
ternaries, favoring succinctness over readability:

		printf("%.*s", (int)(length < GIT_MAX_HEXSZ ? length : GIT_MAX_HEXSZ), hex);

> > >  		if (opt & OUTPUT_COLOR_LINE) {
> > >  			if (cnt > 0) {
> > > @@ -505,7 +507,7 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
> > >  			length--;
> > >  			putchar('?');
> > >  		}
> > > -		fwrite(hex, 1, length, stdout);
> > > +		printf("%.*s", (int)length, hex);
> > >  		if (opt & OUTPUT_ANNOTATE_COMPAT) {
> > >  			const char *name;
> > >  			if (opt & OUTPUT_SHOW_EMAIL)
> > > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> > > index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..fcaba8c11f7ede084e069eefd292f337e8396cb4 100755
> > > --- a/t/t8002-blame.sh
> > > +++ b/t/t8002-blame.sh
> > > @@ -126,6 +126,10 @@ test_expect_success '--no-abbrev works like --abbrev with full length' '
> > >  	check_abbrev $hexsz --no-abbrev
> > >  '
> > >
> > > +test_expect_success 'blame --abbrev gets truncated' '
> > > +	check_abbrev 9000 --abbrev=$hexsz HEAD
> >
> > This is actually incorrect: it passes `--abbrev=$hexsz` instead of a value
> > that needs to be truncated.
>
> Oh dear. The test did manage to catch the bug, but thinking more about
> it that was only because my initial fix was broken.
>
> > diff --git a/builtin/blame.c b/builtin/blame.c
> > index ad91fe9e97f9..5b4976835066 100644
> > --- a/builtin/blame.c
> > +++ b/builtin/blame.c
> > @@ -475,8 +475,13 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
> >  		char ch;
> >  		size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
> >  			the_hash_algo->hexsz : (size_t) abbrev;
> > -		if (length > GIT_MAX_HEXSZ)
> > -			length = GIT_MAX_HEXSZ;
> > +
> > +		/*
> > +		 * Leave enough space for ^, * and ? indicators (boundary,
> > +		 * unblamable, ignored).
> > +		 */
> > +		if (length > GIT_MAX_HEXSZ + 3)
> > +			length = GIT_MAX_HEXSZ + 3;
> >
> >  		if (opt & OUTPUT_COLOR_LINE) {
> >  			if (cnt > 0) {
>
> How about this instead?
>
>     diff --git a/builtin/blame.c b/builtin/blame.c
>     index ad91fe9e97..f92e487bed 100644
>     --- a/builtin/blame.c
>     +++ b/builtin/blame.c
>     @@ -475,8 +475,6 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
>             char ch;
>             size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
>                 the_hash_algo->hexsz : (size_t) abbrev;
>     -		if (length > GIT_MAX_HEXSZ)
>     -			length = GIT_MAX_HEXSZ;
>
>             if (opt & OUTPUT_COLOR_LINE) {
>                 if (cnt > 0) {
>     @@ -507,6 +505,9 @@ static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
>                 length--;
>                 putchar('?');
>             }
>     +
>     +		if (length > GIT_MAX_HEXSZ)
>     +			length = GIT_MAX_HEXSZ;
>             printf("%.*s", (int)length, hex);
>             if (opt & OUTPUT_ANNOTATE_COMPAT) {
>                 const char *name;
>
> In that case there's no need to juggle with the magic indicators, which
> makes it a bit easier to reason about.
>
> > diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> > index fcaba8c11f7e..71fa70a64679 100755
> > --- a/t/t8002-blame.sh
> > +++ b/t/t8002-blame.sh
> > @@ -127,7 +127,7 @@ test_expect_success '--no-abbrev works like --abbrev with full length' '
> >  '
> >
> >  test_expect_success 'blame --abbrev gets truncated' '
> > -	check_abbrev 9000 --abbrev=$hexsz HEAD
> > +	check_abbrev 9000 --abbrev=9000 HEAD..
> >  '
>
> This should be `check_abbrev $hexsz --abbrev=9000`, shouldn't it?

I kind of liked the idea to keep the same cut-off threshold for the
validation. But I won't insist. The construct is obtuse in either case ;-)
(Not your fault, of course, you merely imitated existing code, which is
the correct thing to do).

Ciao,
Johannes
diff mbox series

Patch

diff --git a/builtin/blame.c b/builtin/blame.c
index 867032e4c16878ffd56df8a73162b89ca4bd2694..ad91fe9e97f90625dd2708fbd44bf2dd24a337a6 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -475,6 +475,8 @@  static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
 		char ch;
 		size_t length = (opt & OUTPUT_LONG_OBJECT_NAME) ?
 			the_hash_algo->hexsz : (size_t) abbrev;
+		if (length > GIT_MAX_HEXSZ)
+			length = GIT_MAX_HEXSZ;
 
 		if (opt & OUTPUT_COLOR_LINE) {
 			if (cnt > 0) {
@@ -505,7 +507,7 @@  static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, int
 			length--;
 			putchar('?');
 		}
-		fwrite(hex, 1, length, stdout);
+		printf("%.*s", (int)length, hex);
 		if (opt & OUTPUT_ANNOTATE_COMPAT) {
 			const char *name;
 			if (opt & OUTPUT_SHOW_EMAIL)
diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
index 0147de304b4d104cc7f05ea1f8d68f1a07ceb80d..fcaba8c11f7ede084e069eefd292f337e8396cb4 100755
--- a/t/t8002-blame.sh
+++ b/t/t8002-blame.sh
@@ -126,6 +126,10 @@  test_expect_success '--no-abbrev works like --abbrev with full length' '
 	check_abbrev $hexsz --no-abbrev
 '
 
+test_expect_success 'blame --abbrev gets truncated' '
+	check_abbrev 9000 --abbrev=$hexsz HEAD
+'
+
 test_expect_success '--exclude-promisor-objects does not BUG-crash' '
 	test_must_fail git blame --exclude-promisor-objects one
 '