diff mbox series

[v3] show-index: fix uninitialized hash function

Message ID 20241026120950.72727-1-abhijeet.nkt@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v3] show-index: fix uninitialized hash function | expand

Commit Message

Abhijeet Sonar Oct. 26, 2024, 12:09 p.m. UTC
As stated in the docs, show-index should use SHA1 as the default hash algorithm
when run outsize of a repository.  However, 'the_hash_algo' is currently left
uninitialized if we are not in a repository and no explicit hash function is
specified, causing a crash.  Fix it by falling back to SHA1 when it is found
uninitialized. Also add test that verifies this behaviour.

Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
---
 builtin/show-index.c   | 3 +++
 t/t5300-pack-object.sh | 4 ++++
 2 files changed, 7 insertions(+)

Comments

Taylor Blau Oct. 28, 2024, 12:10 a.m. UTC | #1
On Sat, Oct 26, 2024 at 05:39:50PM +0530, Abhijeet Sonar wrote:
> As stated in the docs, show-index should use SHA1 as the default hash algorithm
> when run outsize of a repository.  However, 'the_hash_algo' is currently left
> uninitialized if we are not in a repository and no explicit hash function is
> specified, causing a crash.  Fix it by falling back to SHA1 when it is found
> uninitialized. Also add test that verifies this behaviour.

This commit description is good, and would benefit further from a
bisection showing where the regression began. I don't think that it is a
prerequisite for us moving this patch forward, though.

> Signed-off-by: Abhijeet Sonar <abhijeet.nkt@gmail.com>
> ---
>  builtin/show-index.c   | 3 +++
>  t/t5300-pack-object.sh | 4 ++++
>  2 files changed, 7 insertions(+)
>
> diff --git a/builtin/show-index.c b/builtin/show-index.c
> index f164c01bbe..978ae70470 100644
> --- a/builtin/show-index.c
> +++ b/builtin/show-index.c
> @@ -38,6 +38,9 @@ int cmd_show_index(int argc,
>  		repo_set_hash_algo(the_repository, hash_algo);
>  	}
>
> +	if (!the_hash_algo)
> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +
>  	hashsz = the_hash_algo->rawsz;
>
>  	if (fread(top_index, 2 * 4, 1, stdin) != 1)
> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> index 3b9dae331a..51fed26cc4 100755
> --- a/t/t5300-pack-object.sh
> +++ b/t/t5300-pack-object.sh
> @@ -523,6 +523,10 @@ test_expect_success 'index-pack --strict <pack> works in non-repo' '
>  	test_path_is_file foo.idx
>  '
>
> +test_expect_success SHA1 'show-index works OK outside a repository' '
> +	nongit git show-index <foo.idx
> +'
> +
>  test_expect_success !PTHREADS,!FAIL_PREREQS \
>  	'index-pack --threads=N or pack.threads=N warns when no pthreads' '
>  	test_must_fail git index-pack --threads=2 2>err &&
> --
> 2.47.0.107.g34b6ce9b30

These all look reasonable and as-expected to me. Patrick (CC'd) has been
reviewing similar changes elsewhere, so I'd like him to chime in as well
on whether or not this looks good to go.

Thanks,
Taylor
Patrick Steinhardt Oct. 28, 2024, 5:35 a.m. UTC | #2
On Sun, Oct 27, 2024 at 08:10:16PM -0400, Taylor Blau wrote:
> On Sat, Oct 26, 2024 at 05:39:50PM +0530, Abhijeet Sonar wrote:
> > diff --git a/builtin/show-index.c b/builtin/show-index.c
> > index f164c01bbe..978ae70470 100644
> > --- a/builtin/show-index.c
> > +++ b/builtin/show-index.c
> > @@ -38,6 +38,9 @@ int cmd_show_index(int argc,
> >  		repo_set_hash_algo(the_repository, hash_algo);
> >  	}
> >
> > +	if (!the_hash_algo)
> > +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);

Let's add a todo-comment here. The behaviour with this patch is somewhat
broken as you cannot inspect indices that use any other object hash than
SHA256 outside of a repository. This is fine from my point of view and
nothing that you have to fix here, as you simply fix up the broken
behaviour. But in the future, we should either:

  - Add logic to detect the format of the passed-in index and set that
    up as the hash algorithm.

  - If that is impossible, add a command line option to pick the hash
    algo.

> >  	hashsz = the_hash_algo->rawsz;
> >
> >  	if (fread(top_index, 2 * 4, 1, stdin) != 1)
> > diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> > index 3b9dae331a..51fed26cc4 100755
> > --- a/t/t5300-pack-object.sh
> > +++ b/t/t5300-pack-object.sh
> > @@ -523,6 +523,10 @@ test_expect_success 'index-pack --strict <pack> works in non-repo' '
> >  	test_path_is_file foo.idx
> >  '
> >
> > +test_expect_success SHA1 'show-index works OK outside a repository' '
> > +	nongit git show-index <foo.idx
> > +'

So how does this behave with SHA256? Does it raise an error? Does it
segfault?

I think it's okay to fail with SHA256 for now, but I'd like the
failure behaviour to be cleanish. So I'd prefer to not skip the test
completely, but adapt our expectations based on the hash algo. Or have
two separate tests, one for each hash, that explicitly init the repo
with `git init --ref-format=$hash`, and then exercise the behaviour for
each of them.

> >  test_expect_success !PTHREADS,!FAIL_PREREQS \
> >  	'index-pack --threads=N or pack.threads=N warns when no pthreads' '
> >  	test_must_fail git index-pack --threads=2 2>err &&
> > --
> > 2.47.0.107.g34b6ce9b30
> 
> These all look reasonable and as-expected to me. Patrick (CC'd) has been
> reviewing similar changes elsewhere, so I'd like him to chime in as well
> on whether or not this looks good to go.

Ah, thanks. I've missed this topic somehow.

Patrick
Taylor Blau Oct. 28, 2024, 5:42 p.m. UTC | #3
On Mon, Oct 28, 2024 at 06:35:15AM +0100, Patrick Steinhardt wrote:
> > >  test_expect_success !PTHREADS,!FAIL_PREREQS \
> > >  	'index-pack --threads=N or pack.threads=N warns when no pthreads' '
> > >  	test_must_fail git index-pack --threads=2 2>err &&
> > > --
> > > 2.47.0.107.g34b6ce9b30
> >
> > These all look reasonable and as-expected to me. Patrick (CC'd) has been
> > reviewing similar changes elsewhere, so I'd like him to chime in as well
> > on whether or not this looks good to go.
>
> Ah, thanks. I've missed this topic somehow.

Not a problem at all. Thanks for a very helpful review.

Abhijeet: I've gone ahead and marked this in my notes as "expecting
another round" to address the feedback from Patrick. I'll keep my eyes
out for the new version of this patch. Thanks!

Thanks,
Taylor
Abhijeet Sonar Oct. 29, 2024, 10:30 a.m. UTC | #4
On 28/10/24 05:40, Taylor Blau wrote:
> On Sat, Oct 26, 2024 at 05:39:50PM +0530, Abhijeet Sonar wrote:
>> As stated in the docs, show-index should use SHA1 as the default hash algorithm
>> when run outsize of a repository.  However, 'the_hash_algo' is currently left
>> uninitialized if we are not in a repository and no explicit hash function is
>> specified, causing a crash.  Fix it by falling back to SHA1 when it is found
>> uninitialized. Also add test that verifies this behaviour.
> 
> This commit description is good, and would benefit further from a
> bisection showing where the regression began. I don't think that it is a
> prerequisite for us moving this patch forward, though.

On bisecting, the offending commit appears to be c8aed5e8da (repository:
stop setting SHA1 as the default object hash).

Another related commit is ab274909d4 (builtin/diff: explicitly set hash
algo when there is no repo) which did something very similar to my patch.

Thanks
Abhijeet Sonar Oct. 29, 2024, 11:54 a.m. UTC | #5
On 28/10/24 11:05, Patrick Steinhardt wrote:
> On Sun, Oct 27, 2024 at 08:10:16PM -0400, Taylor Blau wrote:
>> On Sat, Oct 26, 2024 at 05:39:50PM +0530, Abhijeet Sonar wrote:
>>> diff --git a/builtin/show-index.c b/builtin/show-index.c
>>> index f164c01bbe..978ae70470 100644
>>> --- a/builtin/show-index.c
>>> +++ b/builtin/show-index.c
>>> @@ -38,6 +38,9 @@ int cmd_show_index(int argc,
>>>  		repo_set_hash_algo(the_repository, hash_algo);
>>>  	}
>>>
>>> +	if (!the_hash_algo)
>>> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> 
> Let's add a todo-comment here. The behaviour with this patch is somewhat
> broken as you cannot inspect indices that use any other object hash than
> SHA256 outside of a repository. This is fine from my point of view and
> nothing that you have to fix here, as you simply fix up the broken
> behaviour. But in the future, we should either:

I will add those comments, thanks.

> 
>   - Add logic to detect the format of the passed-in index and set that
>     up as the hash algorithm

I am very interested in hacking around and trying to implement this.
I read about the index format here:
https://git-scm.com/docs/index-format and (assuming I am looking at the
right thing) it does not seem like index files contain information about
the hash algorithm used in their headers or anywhere else. Are there
other leads I should follow?

> 
>   - If that is impossible, add a command line option to pick the hash
>     algo.
>
Actually there is already an option (--object-format) that allows
changing the hash algo used, it's just that default format is SHA1 when
one is not specified. (or at least will be, after this patch)

>>>  	hashsz = the_hash_algo->rawsz;
>>>
>>>  	if (fread(top_index, 2 * 4, 1, stdin) != 1)
>>> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
>>> index 3b9dae331a..51fed26cc4 100755
>>> --- a/t/t5300-pack-object.sh
>>> +++ b/t/t5300-pack-object.sh
>>> @@ -523,6 +523,10 @@ test_expect_success 'index-pack --strict <pack> works in non-repo' '
>>>  	test_path_is_file foo.idx
>>>  '
>>>
>>> +test_expect_success SHA1 'show-index works OK outside a repository' '
>>> +	nongit git show-index <foo.idx
>>> +'
> 
> So how does this behave with SHA256? Does it raise an error? Does it
> segfault?
> 
> I think it's okay to fail with SHA256 for now, but I'd like the
> failure behaviour to be cleanish. So I'd prefer to not skip the test
> completely, but adapt our expectations based on the hash algo. Or have
> two separate tests, one for each hash, that explicitly init the repo
> with `git init --ref-format=$hash`, and then exercise the behaviour for
> each of the>

Running show-index outside of a repository with a SHA256 based index
file gives:

$ git show-index <foo.idx
fatal: inconsistent 64b offset index

At the very least, it does not crash.

>>>  test_expect_success !PTHREADS,!FAIL_PREREQS \
>>>  	'index-pack --threads=N or pack.threads=N warns when no pthreads' '
>>>  	test_must_fail git index-pack --threads=2 2>err &&
>>> --
>>> 2.47.0.107.g34b6ce9b30
>>
>> These all look reasonable and as-expected to me. Patrick (CC'd) has been
>> reviewing similar changes elsewhere, so I'd like him to chime in as well
>> on whether or not this looks good to go.
> 
> Ah, thanks. I've missed this topic somehow.
> 
> Patrick

Thanks
diff mbox series

Patch

diff --git a/builtin/show-index.c b/builtin/show-index.c
index f164c01bbe..978ae70470 100644
--- a/builtin/show-index.c
+++ b/builtin/show-index.c
@@ -38,6 +38,9 @@  int cmd_show_index(int argc,
 		repo_set_hash_algo(the_repository, hash_algo);
 	}
 
+	if (!the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	hashsz = the_hash_algo->rawsz;
 
 	if (fread(top_index, 2 * 4, 1, stdin) != 1)
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 3b9dae331a..51fed26cc4 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -523,6 +523,10 @@  test_expect_success 'index-pack --strict <pack> works in non-repo' '
 	test_path_is_file foo.idx
 '
 
+test_expect_success SHA1 'show-index works OK outside a repository' '
+	nongit git show-index <foo.idx
+'
+
 test_expect_success !PTHREADS,!FAIL_PREREQS \
 	'index-pack --threads=N or pack.threads=N warns when no pthreads' '
 	test_must_fail git index-pack --threads=2 2>err &&