mbox series

[0/2] Fix use of uninitialized hash algos

Message ID cover.1715582857.git.ps@pks.im (mailing list archive)
Headers show
Series Fix use of uninitialized hash algos | expand

Message

Patrick Steinhardt May 13, 2024, 7:15 a.m. UTC
Hi,

with c8aed5e8da (repository: stop setting SHA1 as the default object
hash, 2024-05-07), we have stopped setting up the default hash function
for `the_repository`. This change was done so that we stop implicitly
using SHA1 in places where we don't really intend to. Instead, code
where we try to access `the_hash_algo` without having `the_repository`
properly initialized will now crash hard.

I have found two more cases where this can now be triggered:

  - git-patch-id(1) can read diffs from stdin.

  - git-hash-object(1) can hash data from stdin.

Both cases can work without a repository, and if they don't have one
they will now crash.

I still consider it a good thing that we did the change regardless of
those crashes. In the case of git-patch-id(1) I would claim that using
`the_hash_algo` is wrong in the first place, as patch IDs should be
stable and are documented to always use SHA1. Thus, patch IDs in SHA256
repos are essentially broken. And in the case of git-hash-object(1), we
should expose a command line option to let the user specify the object
hash. So both cases demonstrate that there is room for improvement.

If these cases keep on popping up and we don't feel comfortable with it,
then we can still decide to drop c8aed5e8da. The remainder of the topic
that this commit was part of should in that case stay though, as those
are real bug fixes. We could then re-try in a subsequent release cycle.
But for now I don't think this would be warranted yet.

This topic depends on js/ps/undecided-is-not-necessarily-sha1 at
c8aed5e8da (repository: stop setting SHA1 as the default object hash,
2024-05-07).

Thanks!

Patrick

Patrick Steinhardt (2):
  builtin/patch-id: fix uninitialized hash function
  builtin/hash-object: fix uninitialized hash function

 builtin/hash-object.c  |  7 +++++++
 builtin/patch-id.c     | 13 +++++++++++++
 t/t1007-hash-object.sh |  6 ++++++
 t/t4204-patch-id.sh    | 34 ++++++++++++++++++++++++++++++++++
 4 files changed, 60 insertions(+)

Comments

Junio C Hamano May 13, 2024, 4:01 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> with c8aed5e8da (repository: stop setting SHA1 as the default object
> hash, 2024-05-07), we have stopped setting up the default hash function
> for `the_repository`. This change was done so that we stop implicitly
> using SHA1 in places where we don't really intend to. Instead, code
> where we try to access `the_hash_algo` without having `the_repository`
> properly initialized will now crash hard.
>
> I have found two more cases where this can now be triggered:
>
>   - git-patch-id(1) can read diffs from stdin.
>
>   - git-hash-object(1) can hash data from stdin.
>
> Both cases can work without a repository, and if they don't have one
> they will now crash.

Perhaps we should double-check with all commands that are designed
to be able to work outside a repository, e.g. "git apply", "git grep
--no-index", "git diff --no-index" (tried to be exhausitive without
consulting documentation, so the list is not exhausitive at all).

> I still consider it a good thing that we did the change regardless of
> those crashes. In the case of git-patch-id(1) I would claim that using
> `the_hash_algo` is wrong in the first place, as patch IDs should be
> stable and are documented to always use SHA1. Thus, patch IDs in SHA256
> repos are essentially broken. And in the case of git-hash-object(1), we
> should expose a command line option to let the user specify the object
> hash. So both cases demonstrate that there is room for improvement.

It is good that the topic is kept outside 'master' (and it is in
'next' to give the topic a bit wider exposure than merely in 'seen'
and the list archive).

We may want a test file that explicitly make commands that ought
to work outside a repository actually run outside a repository,
making use of the GIT_CEILING_DIRECTORIES mechanism, something along
the lines of the attached.

 t/t1517-outside-repo.sh | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git c/t/t1517-outside-repo.sh w/t/t1517-outside-repo.sh
new file mode 100755
index 0000000000..4c595c2ff7
--- /dev/null
+++ w/t/t1517-outside-repo.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+test_description='check random commands outside repo'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'set up a non-repo directory and test file' '
+	GIT_CEILING_DIRECTORIES=$(pwd) &&
+	export GIT_CEILING_DIRECTORIES &&
+	mkdir non-repo &&
+	(
+		cd non-repo &&
+		# confirm that git does not find a repo
+		test_must_fail git rev-parse --git-dir
+	) &&
+	test_write_lines one two three four >nums &&
+	git add nums &&
+	cp nums nums.old &&
+	test_write_lines five >>nums &&
+	git diff >sample.patch
+'
+
+test_expect_success 'apply a patch outside repository' '
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git apply ../sample.patch
+	) &&
+	test_cmp nums non-repo/nums
+'
+
+test_expect_success 'compute a patch-id outside repository' '
+	git patch-id <sample.patch >patch-id.expect &&
+	(
+		cd non-repo &&
+		git patch-id <../sample.patch >../patch-id.actual
+	) &&
+	test_cmp patch-id.expect patch-id.actual
+'
+
+test_done
Junio C Hamano May 13, 2024, 6:36 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

>> I still consider it a good thing that we did the change regardless of
>> those crashes. In the case of git-patch-id(1) I would claim that using
>> `the_hash_algo` is wrong in the first place, as patch IDs should be
>> stable and are documented to always use SHA1. Thus, patch IDs in SHA256
>> repos are essentially broken. And in the case of git-hash-object(1), we
>> should expose a command line option to let the user specify the object
>> hash. So both cases demonstrate that there is room for improvement.
>
> It is good that the topic is kept outside 'master' (and it is in
> 'next' to give the topic a bit wider exposure than merely in 'seen'
> and the list archive).
>
> We may want a test file that explicitly make commands that ought
> to work outside a repository actually run outside a repository,
> making use of the GIT_CEILING_DIRECTORIES mechanism, something along
> the lines of the attached.

Another thought.  Perhaps we should do, in the meantime, the
attached patch?

Without your "fix patch-id" patch, one of the tests in 1517 will
fail, and the other fails as there is no fix posted yet, but with
the GIT_TEST_DEFAULT_HASH_IS_SHA1 set explicit to 1, they keep
passing.  This way, when we run out tests, we will always leave the
default hash uninitialized to catch more errors, the binary built
out of our source by default will crash to help our users catch more
errors for us, yet when they really really need to, the users can
set and export GIT_TEST_DEFAULT_HASH_IS_SHA1=1 to keep assuming that
SHA1 is the hash algorithm when there is no specified.


 repository.c            | 24 ++++++++++++++++++++++++
 t/t1517-outside-repo.sh | 42 ++++++++++++++++++++++++++++++++++++++++++
 t/test-lib.sh           |  4 ++++
 3 files changed, 70 insertions(+)

diff --git c/repository.c w/repository.c
index 15c10015b0..6f67966e35 100644
--- c/repository.c
+++ w/repository.c
@@ -26,6 +26,30 @@ void initialize_repository(struct repository *repo)
 	repo->parsed_objects = parsed_object_pool_new();
 	ALLOC_ARRAY(repo->index, 1);
 	index_state_init(repo->index, repo);
+
+	/*
+	 * Unfortunately, we need to keep this hack around for the time being:
+	 *
+	 *   - Not setting up the hash algorithm for `the_repository` leads to
+	 *     crashes because `the_hash_algo` is a macro that expands to
+	 *     `the_repository->hash_algo`. So if Git commands try to access
+	 *     `the_hash_algo` without a Git directory we crash.
+	 *
+	 *   - Setting up the hash algorithm to be SHA1 by default breaks other
+	 *     commands when running with SHA256.
+	 *
+	 * This is another point in case why having global state is a bad idea.
+	 * Eventually, we should remove this hack and stop setting the hash
+	 * algorithm in this function altogether. Instead, it should only ever
+	 * be set via our repository setup procedures. But that requires more
+	 * work.
+	 *
+	 * As we discover the code paths that need fixing, we may remove this
+	 * code completely, but we are not there yet.
+	 */
+	if (repo == the_repository &&
+	    git_env_bool("GIT_TEST_DEFAULT_HASH_IS_SHA1", 0))
+		repo_set_hash_algo(repo, GIT_HASH_SHA1);
 }
 
 static void expand_base_dir(char **out, const char *in,
diff --git c/t/t1517-outside-repo.sh w/t/t1517-outside-repo.sh
new file mode 100755
index 0000000000..4c595c2ff7
--- /dev/null
+++ w/t/t1517-outside-repo.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+test_description='check random commands outside repo'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'set up a non-repo directory and test file' '
+	GIT_CEILING_DIRECTORIES=$(pwd) &&
+	export GIT_CEILING_DIRECTORIES &&
+	mkdir non-repo &&
+	(
+		cd non-repo &&
+		# confirm that git does not find a repo
+		test_must_fail git rev-parse --git-dir
+	) &&
+	test_write_lines one two three four >nums &&
+	git add nums &&
+	cp nums nums.old &&
+	test_write_lines five >>nums &&
+	git diff >sample.patch
+'
+
+test_expect_success 'apply a patch outside repository' '
+	(
+		cd non-repo &&
+		cp ../nums.old nums &&
+		git apply ../sample.patch
+	) &&
+	test_cmp nums non-repo/nums
+'
+
+test_expect_success 'compute a patch-id outside repository' '
+	git patch-id <sample.patch >patch-id.expect &&
+	(
+		cd non-repo &&
+		git patch-id <../sample.patch >../patch-id.actual
+	) &&
+	test_cmp patch-id.expect patch-id.actual
+'
+
+test_done
diff --git c/t/test-lib.sh w/t/test-lib.sh
index 79d3e0e7d9..36d311fb4a 100644
--- c/t/test-lib.sh
+++ w/t/test-lib.sh
@@ -542,6 +542,10 @@ export EDITOR
 
 GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-sha1}"
 export GIT_DEFAULT_HASH
+
+GIT_TEST_DEFAULT_HASH_IS_SHA1=0
+export GIT_TEST_DEFAULT_HASH_IS_SHA1
+
 GIT_DEFAULT_REF_FORMAT="${GIT_TEST_DEFAULT_REF_FORMAT:-files}"
 export GIT_DEFAULT_REF_FORMAT
 GIT_TEST_MERGE_ALGORITHM="${GIT_TEST_MERGE_ALGORITHM:-ort}"