[2/4] fsck: check rev-index checksums

Message ID	7db4ec3e327ed3695f4f5409cb2dc80c72688758.1681748502.git.gitgitgadget@gmail.com (mailing list archive)
State	Accepted
Commit	d975fe1fa57d57cfd21a97f96f4a94b99f50f2f4
Headers	show Return-Path: <git-owner@vger.kernel.org> Message-Id: <7db4ec3e327ed3695f4f5409cb2dc80c72688758.1681748502.git.gitgitgadget@gmail.com> In-Reply-To: <pull.1512.git.1681748502.gitgitgadget@gmail.com> References: <pull.1512.git.1681748502.gitgitgadget@gmail.com> Date: Mon, 17 Apr 2023 16:21:39 +0000 Subject: [PATCH 2/4] fsck: check rev-index checksums Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: me@ttaylorr.com, gitster@pobox.com, Derrick Stolee <derrickstolee@github.com>, Derrick Stolee <derrickstolee@github.com> Precedence: bulk From: Derrick Stolee <derrickstolee@github.com>
Series	git fsck: check pack rev-index files \| expand [0/4] git fsck: check pack rev-index files [1/4] fsck: create scaffolding for rev-index checks [2/4] fsck: check rev-index checksums [3/4] fsck: check rev-index position values [4/4] fsck: validate .rev file header

Derrick Stolee April 17, 2023, 4:21 p.m. UTC

From: Derrick Stolee <derrickstolee@github.com>

The previous change added calls to verify_pack_revindex() in
builtin/fsck.c, but the implementation of the method was left empty. Add
the first and most-obvious check to this method: checksum verification.

While here, create a helper method in the test script that makes it easy
to adjust the .rev file and check that 'git fsck' reports the correct
error message.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 pack-revindex.c          | 10 ++++++++++
 t/t5325-reverse-index.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

Junio C Hamano April 17, 2023, 10:15 p.m. UTC | #1

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +corrupt_rev_and_verify () {
> +	(
> +		pos="$1" &&
> +...
> +		grep "$error" err
> +	)
> +}

Curious.  Would it work equally well to write

	corrupt_rev_and_verify () (
		pos="$1" &&
		...
		grep "$error" err
	)

without an extra level of indentation?

Taylor Blau April 17, 2023, 10:24 p.m. UTC | #2

On Mon, Apr 17, 2023 at 04:21:39PM +0000, Derrick Stolee via GitGitGadget wrote:
> @@ -309,6 +310,15 @@ int load_pack_revindex(struct repository *r, struct packed_git *p)
>   */
>  int verify_pack_revindex(struct packed_git *p)
>  {
> +	/* Do not bother checking if not initialized. */

Yep, makes sense; if we don't have an on-disk reverse index (which is
mmap'd into `revindex_map` we don't have anything to verify), so we can
bail here.

> +	if (!p->revindex_map)
> +		return 0;
> +
> +	if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) {
> +		error(_("invalid checksum"));
> +		return -1;
> +	}
> +
>  	return 0;
>  }
>
> diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
> index 206c412f50b..6b7c709a1f6 100755
> --- a/t/t5325-reverse-index.sh
> +++ b/t/t5325-reverse-index.sh
> @@ -145,4 +145,44 @@ test_expect_success 'fsck succeeds on good rev-index' '
>  	)
>  '
>
> +test_expect_success 'set up rev-index corruption tests' '

s/set up/setup for easy `--run`-ing (e.g., ./t5325-*.sh --run=setup,fsck).

> +	git init corrupt &&
> +	(
> +		cd corrupt &&
> +
> +		test_commit commit &&
> +		git -c pack.writeReverseIndex=true repack -ad &&
> +
> +		revfile=$(ls .git/objects/pack/pack-*.rev) &&
> +		chmod a+w $revfile &&
> +		cp $revfile $revfile.bak
> +	)
> +'
> +
> +corrupt_rev_and_verify () {
> +	(
> +		pos="$1" &&
> +		value="$2" &&
> +		error="$3" &&
> +
> +		cd corrupt &&
> +		revfile=$(ls .git/objects/pack/pack-*.rev) &&
> +
> +		# Reset to original rev-file.
> +		cp $revfile.bak $revfile &&
> +
> +		printf "$value" | dd of=$revfile bs=1 seek="$pos" conv=notrunc &&
> +		test_must_fail git fsck 2>err &&
> +		grep "$error" err
> +	)
> +}
> +
> +test_expect_success 'fsck catches invalid checksum' '
> +	revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) &&

Would this test be tighter if we introduced a sub-shell and cd'd into
"corrupt" here?

> +	orig_size=$(wc -c <$revfile) &&

I'm nitpicking, but we may want to use `test_file_size` here instead of
`wc -c`. The latter outnumbers the former in terms of number of uses,
but I think we consider `test_file_size` to be canonical these days.

> +	hashpos=$((orig_size - 10)) &&
> +	corrupt_rev_and_verify $hashpos bogus \
> +		"invalid checksum"
> +'

This looks good.

Thanks,
Taylor

Derrick Stolee April 18, 2023, 2:24 p.m. UTC | #3

On 4/17/2023 6:15 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +corrupt_rev_and_verify () {
>> +	(
>> +		pos="$1" &&
>> +...
>> +		grep "$error" err
>> +	)
>> +}
> 
> Curious.  Would it work equally well to write
> 
> 	corrupt_rev_and_verify () (
> 		pos="$1" &&
> 		...
> 		grep "$error" err
> 	)
> 
> without an extra level of indentation?

I've never seen that replacement (and it only exists in our tree
in t4018/bash-arithmetic-function and t4018/bash-subshell-function)
but it works and looks nicer.

Thanks,
-Stolee

Derrick Stolee April 18, 2023, 2:27 p.m. UTC | #4

On 4/17/2023 6:24 PM, Taylor Blau wrote:
> On Mon, Apr 17, 2023 at 04:21:39PM +0000, Derrick Stolee via GitGitGadget wrote:

>> +test_expect_success 'set up rev-index corruption tests' '
> 
> s/set up/setup for easy `--run`-ing (e.g., ./t5325-*.sh --run=setup,fsck).

Makes sense.

>> +test_expect_success 'fsck catches invalid checksum' '
>> +	revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) &&
> 
> Would this test be tighter if we introduced a sub-shell and cd'd into
> "corrupt" here?

corrupt_rev_and_verify does the subshell thing. Why should we do that
here in the test?
 
>> +	orig_size=$(wc -c <$revfile) &&
> 
> I'm nitpicking, but we may want to use `test_file_size` here instead of
> `wc -c`. The latter outnumbers the former in terms of number of uses,
> but I think we consider `test_file_size` to be canonical these days.

Makes sense.

Thanks,
-Stolee

Taylor Blau April 18, 2023, 2:51 p.m. UTC | #5

On Tue, Apr 18, 2023 at 10:27:57AM -0400, Derrick Stolee wrote:
> >> +test_expect_success 'fsck catches invalid checksum' '
> >> +	revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) &&
> >
> > Would this test be tighter if we introduced a sub-shell and cd'd into
> > "corrupt" here?
>
> corrupt_rev_and_verify does the subshell thing. Why should we do that
> here in the test?

I was thinking that it might be more concise if you moved the subshell
to the test and out of corrupt_rev_and_verify. In addition to making
corrupt_rev_and_verify work in other instances where the repository
isn't required to be in a directory named "corrupt", I think it
simplifies the result.

Here's what I was thinking, as a diff on top of this patch:

--- >8 ---
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index 6b7c709a1f..7dfbaf6b37 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -160,29 +160,30 @@ test_expect_success 'set up rev-index corruption tests' '
 '

 corrupt_rev_and_verify () {
-	(
-		pos="$1" &&
-		value="$2" &&
-		error="$3" &&
+	pos="$1" &&
+	value="$2" &&
+	error="$3" &&

-		cd corrupt &&
-		revfile=$(ls .git/objects/pack/pack-*.rev) &&
+	revfile=$(ls .git/objects/pack/pack-*.rev) &&

-		# Reset to original rev-file.
-		cp $revfile.bak $revfile &&
+	# Reset to original rev-file.
+	cp $revfile.bak $revfile &&

-		printf "$value" | dd of=$revfile bs=1 seek="$pos" conv=notrunc &&
-		test_must_fail git fsck 2>err &&
-		grep "$error" err
-	)
+	printf "$value" | dd of=$revfile bs=1 seek="$pos" conv=notrunc &&
+	test_must_fail git fsck 2>err &&
+	grep "$error" err
 }

 test_expect_success 'fsck catches invalid checksum' '
-	revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) &&
-	orig_size=$(wc -c <$revfile) &&
-	hashpos=$((orig_size - 10)) &&
-	corrupt_rev_and_verify $hashpos bogus \
-		"invalid checksum"
+	(
+		cd corrupt &&
+
+		revfile=$(ls .git/objects/pack/pack-*.rev) &&
+		orig_size=$(wc -c <$revfile) &&
+		hashpos=$((orig_size - 10)) &&
+		corrupt_rev_and_verify $hashpos bogus \
+			"invalid checksum"
+	)
 '

 test_done
--- 8< ---

If you do take my suggestion, make sure to remember to come back in
patches 3/4 and 4/4 and adjust those instances of
'corrupt_rev_and_verify' to first change into "corrupt".

Thanks,
Taylor

Derrick Stolee April 18, 2023, 2:57 p.m. UTC | #6

On 4/18/2023 10:51 AM, Taylor Blau wrote:
> On Tue, Apr 18, 2023 at 10:27:57AM -0400, Derrick Stolee wrote:
>>>> +test_expect_success 'fsck catches invalid checksum' '
>>>> +	revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) &&
>>>
>>> Would this test be tighter if we introduced a sub-shell and cd'd into
>>> "corrupt" here?
>>
>> corrupt_rev_and_verify does the subshell thing. Why should we do that
>> here in the test?
> 
> I was thinking that it might be more concise if you moved the subshell
> to the test and out of corrupt_rev_and_verify. In addition to making
> corrupt_rev_and_verify work in other instances where the repository
> isn't required to be in a directory named "corrupt", I think it
> simplifies the result.

I don't think there is a good reason to allow using a different repo
name. This is the only test that requires doing anything but calling
corrupt_rev_and_verify with different parameters, so I think this
makes the test script at the end of the series noisier.

Thanks,
-Stolee

Taylor Blau April 18, 2023, 3:03 p.m. UTC | #7

On Tue, Apr 18, 2023 at 10:57:15AM -0400, Derrick Stolee wrote:
> On 4/18/2023 10:51 AM, Taylor Blau wrote:
> > On Tue, Apr 18, 2023 at 10:27:57AM -0400, Derrick Stolee wrote:
> >>>> +test_expect_success 'fsck catches invalid checksum' '
> >>>> +	revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) &&
> >>>
> >>> Would this test be tighter if we introduced a sub-shell and cd'd into
> >>> "corrupt" here?
> >>
> >> corrupt_rev_and_verify does the subshell thing. Why should we do that
> >> here in the test?
> >
> > I was thinking that it might be more concise if you moved the subshell
> > to the test and out of corrupt_rev_and_verify. In addition to making
> > corrupt_rev_and_verify work in other instances where the repository
> > isn't required to be in a directory named "corrupt", I think it
> > simplifies the result.
>
> I don't think there is a good reason to allow using a different repo
> name. This is the only test that requires doing anything but calling
> corrupt_rev_and_verify with different parameters, so I think this
> makes the test script at the end of the series noisier.

No worries. I was thinking that it might be convenient in the future if
we wanted to corrupt a .rev file in a different repository, but that's
absolutely a bridge we can cross if/when we get to it.

Thanks,
Taylor

[2/4] fsck: check rev-index checksums

Commit Message

Comments

Patch