diff mbox series

[v2,4/4] object-file: fix a unpack_loose_header() regression in 3b6a8db3b03

Message ID 20220512223218.237544-5-gitster@pobox.com (mailing list archive)
State Accepted
Commit 4627c67fa68d5669be511962a6437a11c0db3c99
Headers show
Series test fixes around valgrind | expand

Commit Message

Junio C Hamano May 12, 2022, 10:32 p.m. UTC
From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Fix a regression in my 3b6a8db3b03 (object-file.c: use "enum" return
type for unpack_loose_header(), 2021-10-01) revealed both by running
the test suite with --valgrind, and with the amended "git fsck" test.

In practice this regression in v2.34.0 caused us to claim that we
couldn't parse the header, as opposed to not being able to unpack
it. Before the change in the C code the test_cmp added here would emit:

	-error: unable to unpack header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
	+error: unable to parse header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391

I.e. we'd proceed to call parse_loose_header() on the uninitialized
"hdr" value, and it would have been very unlikely for that
uninitialized memory to be a valid git object.

The other callers of unpack_loose_header() were already checking the
enum values exhaustively. See 3b6a8db3b03 and
5848fb11acd (object-file.c: return ULHR_TOO_LONG on "header too long",
2021-10-01).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 object-file.c       |  8 ++++++--
 t/t1006-cat-file.sh | 10 ++++++++--
 t/t1450-fsck.sh     | 13 +++++++++++--
 3 files changed, 25 insertions(+), 6 deletions(-)

Comments

Junio C Hamano May 12, 2022, 11:39 p.m. UTC | #1
Junio C Hamano <gitster@pobox.com> writes:

> diff --git a/object-file.c b/object-file.c
> index 5ffbf3d4fd..b5d1d12b68 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2623,8 +2623,12 @@ int read_loose_object(const char *path,
>  		goto out;
>  	}
>  
> -	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
> -				NULL) < 0) {
> +	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
> +				    NULL)) {
> +	case ULHR_OK:
> +		break;
> +	case ULHR_BAD:
> +	case ULHR_TOO_LONG:
>  		error(_("unable to unpack header of %s"), path);
>  		goto out;
>  	}

Regarding this hunk, since we only care about a single "did we get
any error, or did we unpack OK" bit, I think this should be more
like

	if (unpack_loose_header(...) != ULHR_OK) {
		error(_("unable to..."), path);
		goto out;
	}

It is true, as Ævar mentioned, that there is another place in the
same file that uses switch() in loose_object_info(), and it should
remain to be switch() on the returned enum because it wants to
behave differnetly depending on the kind of error it gets.  But that
is not a reason to make this part that only cares about a single
"did it fail?" into a switch and force future developers to add a
useless case arm.

I left it there as posted in the previous round because I was too
lazy ;-) and also it is something we can clean up with a follow up
patch outside the series.  As my today's focus has been to reduce
the number of topics waiting for a reroll, I'd rather leave things
that are not outright broken but needs clean up as they are for the
sake of expediency.

Thanks.
Derrick Stolee May 16, 2022, 2:59 p.m. UTC | #2
On 5/12/2022 7:39 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> diff --git a/object-file.c b/object-file.c
>> index 5ffbf3d4fd..b5d1d12b68 100644
>> --- a/object-file.c
>> +++ b/object-file.c
>> @@ -2623,8 +2623,12 @@ int read_loose_object(const char *path,
>>  		goto out;
>>  	}
>>  
>> -	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
>> -				NULL) < 0) {
>> +	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
>> +				    NULL)) {
>> +	case ULHR_OK:
>> +		break;
>> +	case ULHR_BAD:
>> +	case ULHR_TOO_LONG:
>>  		error(_("unable to unpack header of %s"), path);
>>  		goto out;
>>  	}
> 
> Regarding this hunk, since we only care about a single "did we get
> any error, or did we unpack OK" bit, I think this should be more
> like
> 
> 	if (unpack_loose_header(...) != ULHR_OK) {
> 		error(_("unable to..."), path);
> 		goto out;
> 	}
> 
> It is true, as Ævar mentioned, that there is another place in the
> same file that uses switch() in loose_object_info(), and it should
> remain to be switch() on the returned enum because it wants to
> behave differnetly depending on the kind of error it gets.  But that
> is not a reason to make this part that only cares about a single
> "did it fail?" into a switch and force future developers to add a
> useless case arm.
> 
> I left it there as posted in the previous round because I was too
> lazy ;-) and also it is something we can clean up with a follow up
> patch outside the series.  As my today's focus has been to reduce
> the number of topics waiting for a reroll, I'd rather leave things
> that are not outright broken but needs clean up as they are for the
> sake of expediency.

Taking a look at your new version, I agree that this use of 'switch'
is out of place and can make things more confusing in the future.

Here is a patch doing exactly what you recommended, which you can
choose to add or squash. I made you co-author, but I expect you to
add your sign-off after mine.

-- >8 --

From 85cd37b4f23e06980ea95311067d735144fe932f Mon Sep 17 00:00:00 2001
From: Derrick Stolee <derrickstolee@github.com>
Date: Mon, 16 May 2022 10:53:27 -0400
Subject: [PATCH] object-file: convert 'switch' back to 'if'

This switch statement was recently added to make it clear that
unpack_loose_header() returns an enum value, not an int. This adds
complications for future developers if that enum gains new values, since
that developer would need to add a case statement to this switch for
little real value.

Instead, we can revert back to an 'if' statement, but make the enum
explicit by using "!= ULHR_OK" instead of assuming it has the numerical
value zero.

Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---

 object-file.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index b5d1d12b68a..52e4ae1b5f0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2623,12 +2623,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
-				    NULL)) {
-	case ULHR_OK:
-		break;
-	case ULHR_BAD:
-	case ULHR_TOO_LONG:
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) != ULHR_OK) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff mbox series

Patch

diff --git a/object-file.c b/object-file.c
index 5ffbf3d4fd..b5d1d12b68 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2623,8 +2623,12 @@  int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
-				NULL) < 0) {
+	switch (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				    NULL)) {
+	case ULHR_OK:
+		break;
+	case ULHR_BAD:
+	case ULHR_TOO_LONG:
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 1b85207694..dadf3b1458 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -681,7 +681,7 @@  test_expect_success 'cat-file -t and -s on corrupt loose object' '
 
 		# Setup and create the empty blob and its path
 		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
-		git hash-object -w --stdin </dev/null &&
+		empty_blob=$(git hash-object -w --stdin </dev/null) &&
 
 		# Create another blob and its path
 		echo other >other.blob &&
@@ -722,7 +722,13 @@  test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# content out as-is. Try to make it zlib-invalid.
 		mv -f other.blob "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "^error: inflate: data stream error (" err.fsck
+		cat >expect <<-EOF &&
+		error: inflate: data stream error (incorrect header check)
+		error: unable to unpack header of ./$empty_path
+		error: $empty_blob: object corrupt or missing: ./$empty_path
+		EOF
+		grep "^error: " err.fsck >actual &&
+		test_cmp expect actual
 	)
 '
 
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index de50c0ea01..ab7f31f1dc 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -774,10 +774,19 @@  test_expect_success 'fsck finds problems in duplicate loose objects' '
 		# no "-d" here, so we end up with duplicates
 		git repack &&
 		# now corrupt the loose copy
-		file=$(sha1_file "$(git rev-parse HEAD)") &&
+		oid="$(git rev-parse HEAD)" &&
+		file=$(sha1_file "$oid") &&
 		rm "$file" &&
 		echo broken >"$file" &&
-		test_must_fail git fsck
+		test_must_fail git fsck 2>err &&
+
+		cat >expect <<-EOF &&
+		error: inflate: data stream error (incorrect header check)
+		error: unable to unpack header of $file
+		error: $oid: object corrupt or missing: $file
+		EOF
+		grep "^error: " err >actual &&
+		test_cmp expect actual
 	)
 '