diff mbox series

[v3,4/5] builtin/hash-object: fix uninitialized hash function

Message ID 20240513224127.2042052-5-gitster@pobox.com (mailing list archive)
State Superseded
Headers show
Series Fix use of uninitialized hash algorithms | expand

Commit Message

Junio C Hamano May 13, 2024, 10:41 p.m. UTC
From: Patrick Steinhardt <ps@pks.im>

The git-hash-object(1) command allows users to hash an object even
without a repository. Starting with c8aed5e8da (repository: stop setting
SHA1 as the default object hash, 2024-05-07), this will make us hit an
uninitialized hash function, which subsequently leads to a segfault.

Fix this by falling back to SHA-1 explicitly when running outside of a
Git repository. Users can use GIT_DEFAULT_HASH_ALGORITHM environment to
specify what hash algorithm they want, so arguably this code should not
be needed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/hash-object.c   | 3 +++
 t/t1007-hash-object.sh  | 6 ++++++
 t/t1517-outside-repo.sh | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

Comments

Junio C Hamano May 13, 2024, 11:13 p.m. UTC | #1
Junio C Hamano <gitster@pobox.com> writes:

> From: Patrick Steinhardt <ps@pks.im>
>
> The git-hash-object(1) command allows users to hash an object even
> without a repository. Starting with c8aed5e8da (repository: stop setting
> SHA1 as the default object hash, 2024-05-07), this will make us hit an
> uninitialized hash function, which subsequently leads to a segfault.
>
> Fix this by falling back to SHA-1 explicitly when running outside of a
> Git repository. Users can use GIT_DEFAULT_HASH_ALGORITHM environment to
> specify what hash algorithm they want, so arguably this code should not
> be needed.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/hash-object.c   | 3 +++
>  t/t1007-hash-object.sh  | 6 ++++++
>  t/t1517-outside-repo.sh | 2 +-
>  3 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/hash-object.c b/builtin/hash-object.c
> index 82ca6d2bfd..c767414a0c 100644
> --- a/builtin/hash-object.c
> +++ b/builtin/hash-object.c
> @@ -123,6 +123,9 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
>  	else
>  		prefix = setup_git_directory_gently(&nongit);
>  
> +	if (nongit && !the_hash_algo)
> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);

This is slightly different from your version, which always used SHA-1
when nongit is true, in the hope that it would make it more useful if
the user can say "GIT_DEFAULT_HASH_ALGORITHM=sha256 git hash-objects"
outside a repository.  I am not 100% convinced it is better or we
rather should aim for predictability that you'd always and only get
SHA-1 outside a repository.
Patrick Steinhardt May 14, 2024, 4:32 a.m. UTC | #2
On Mon, May 13, 2024 at 04:13:19PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > From: Patrick Steinhardt <ps@pks.im>
> >
> > The git-hash-object(1) command allows users to hash an object even
> > without a repository. Starting with c8aed5e8da (repository: stop setting
> > SHA1 as the default object hash, 2024-05-07), this will make us hit an
> > uninitialized hash function, which subsequently leads to a segfault.
> >
> > Fix this by falling back to SHA-1 explicitly when running outside of a
> > Git repository. Users can use GIT_DEFAULT_HASH_ALGORITHM environment to
> > specify what hash algorithm they want, so arguably this code should not
> > be needed.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > ---
> >  builtin/hash-object.c   | 3 +++
> >  t/t1007-hash-object.sh  | 6 ++++++
> >  t/t1517-outside-repo.sh | 2 +-
> >  3 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/hash-object.c b/builtin/hash-object.c
> > index 82ca6d2bfd..c767414a0c 100644
> > --- a/builtin/hash-object.c
> > +++ b/builtin/hash-object.c
> > @@ -123,6 +123,9 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
> >  	else
> >  		prefix = setup_git_directory_gently(&nongit);
> >  
> > +	if (nongit && !the_hash_algo)
> > +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> 
> This is slightly different from your version, which always used SHA-1
> when nongit is true, in the hope that it would make it more useful if
> the user can say "GIT_DEFAULT_HASH_ALGORITHM=sha256 git hash-objects"
> outside a repository.  I am not 100% convinced it is better or we
> rather should aim for predictability that you'd always and only get
> SHA-1 outside a repository.

I'd prefer the latter -- always use SHA-1. As you say, it's easier to
understand and doesn't create implicit mechanisms that we'll have to
maintain going forward. Also, users didn't have a desire yet to pick a
different algorithm than SHA-1, which probably also comes from the fact
that SHA-256 repositories are still scarce.

Eventually, we should then add a new option `--object-hash=` to
git-hash-object(1) and other commands that may run outside of a Git
repository to let the user pick their desired hash.

Patrick
Junio C Hamano May 14, 2024, 3:55 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> I'd prefer the latter -- always use SHA-1. As you say, it's easier to
> understand and doesn't create implicit mechanisms that we'll have to
> maintain going forward. Also, users didn't have a desire yet to pick a
> different algorithm than SHA-1, which probably also comes from the fact
> that SHA-256 repositories are still scarce.
> ...
> Eventually, we should then add a new option `--object-hash=` to
> git-hash-object(1) and other commands that may run outside of a Git
> repository to let the user pick their desired hash.

If we were planning to allow using something other than SHA-1 in the
future, then I do not think I agree with your argument.  We can use
the same GIT_DEFAULT_HASH mechanism to specify what hash to use without
later adding an extra option just fine.
diff mbox series

Patch

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 82ca6d2bfd..c767414a0c 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -123,6 +123,9 @@  int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	else
 		prefix = setup_git_directory_gently(&nongit);
 
+	if (nongit && !the_hash_algo)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	if (vpath && prefix) {
 		vpath_free = prefix_filename(prefix, vpath);
 		vpath = vpath_free;
diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index 64aea38486..4c138c6ca4 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -260,4 +260,10 @@  test_expect_success '--literally with extra-long type' '
 	echo example | git hash-object -t $t --literally --stdin
 '
 
+test_expect_success '--stdin outside of repository' '
+	nongit git hash-object --stdin <hello >actual &&
+	echo "$(test_oid hello)" >expect &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index f1fd5c9888..80261e43c7 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -33,7 +33,7 @@  test_expect_success 'compute a patch-id outside repository' '
 	test_cmp patch-id.expect patch-id.actual
 '
 
-test_expect_failure 'hash-object outside repository' '
+test_expect_success 'hash-object outside repository' '
 	git hash-object --stdin <sample.patch >hash.expect &&
 	(
 		cd non-repo &&