Message ID | cover.1715582857.git.ps@pks.im (mailing list archive) |
---|---|
Headers | show |
Series | Fix use of uninitialized hash algos | expand |
Patrick Steinhardt <ps@pks.im> writes: > with c8aed5e8da (repository: stop setting SHA1 as the default object > hash, 2024-05-07), we have stopped setting up the default hash function > for `the_repository`. This change was done so that we stop implicitly > using SHA1 in places where we don't really intend to. Instead, code > where we try to access `the_hash_algo` without having `the_repository` > properly initialized will now crash hard. > > I have found two more cases where this can now be triggered: > > - git-patch-id(1) can read diffs from stdin. > > - git-hash-object(1) can hash data from stdin. > > Both cases can work without a repository, and if they don't have one > they will now crash. Perhaps we should double-check with all commands that are designed to be able to work outside a repository, e.g. "git apply", "git grep --no-index", "git diff --no-index" (tried to be exhausitive without consulting documentation, so the list is not exhausitive at all). > I still consider it a good thing that we did the change regardless of > those crashes. In the case of git-patch-id(1) I would claim that using > `the_hash_algo` is wrong in the first place, as patch IDs should be > stable and are documented to always use SHA1. Thus, patch IDs in SHA256 > repos are essentially broken. And in the case of git-hash-object(1), we > should expose a command line option to let the user specify the object > hash. So both cases demonstrate that there is room for improvement. It is good that the topic is kept outside 'master' (and it is in 'next' to give the topic a bit wider exposure than merely in 'seen' and the list archive). We may want a test file that explicitly make commands that ought to work outside a repository actually run outside a repository, making use of the GIT_CEILING_DIRECTORIES mechanism, something along the lines of the attached. t/t1517-outside-repo.sh | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git c/t/t1517-outside-repo.sh w/t/t1517-outside-repo.sh new file mode 100755 index 0000000000..4c595c2ff7 --- /dev/null +++ w/t/t1517-outside-repo.sh @@ -0,0 +1,42 @@ +#!/bin/sh + +test_description='check random commands outside repo' + +TEST_PASSES_SANITIZE_LEAK=true +. ./test-lib.sh + +test_expect_success 'set up a non-repo directory and test file' ' + GIT_CEILING_DIRECTORIES=$(pwd) && + export GIT_CEILING_DIRECTORIES && + mkdir non-repo && + ( + cd non-repo && + # confirm that git does not find a repo + test_must_fail git rev-parse --git-dir + ) && + test_write_lines one two three four >nums && + git add nums && + cp nums nums.old && + test_write_lines five >>nums && + git diff >sample.patch +' + +test_expect_success 'apply a patch outside repository' ' + ( + cd non-repo && + cp ../nums.old nums && + git apply ../sample.patch + ) && + test_cmp nums non-repo/nums +' + +test_expect_success 'compute a patch-id outside repository' ' + git patch-id <sample.patch >patch-id.expect && + ( + cd non-repo && + git patch-id <../sample.patch >../patch-id.actual + ) && + test_cmp patch-id.expect patch-id.actual +' + +test_done
Junio C Hamano <gitster@pobox.com> writes: >> I still consider it a good thing that we did the change regardless of >> those crashes. In the case of git-patch-id(1) I would claim that using >> `the_hash_algo` is wrong in the first place, as patch IDs should be >> stable and are documented to always use SHA1. Thus, patch IDs in SHA256 >> repos are essentially broken. And in the case of git-hash-object(1), we >> should expose a command line option to let the user specify the object >> hash. So both cases demonstrate that there is room for improvement. > > It is good that the topic is kept outside 'master' (and it is in > 'next' to give the topic a bit wider exposure than merely in 'seen' > and the list archive). > > We may want a test file that explicitly make commands that ought > to work outside a repository actually run outside a repository, > making use of the GIT_CEILING_DIRECTORIES mechanism, something along > the lines of the attached. Another thought. Perhaps we should do, in the meantime, the attached patch? Without your "fix patch-id" patch, one of the tests in 1517 will fail, and the other fails as there is no fix posted yet, but with the GIT_TEST_DEFAULT_HASH_IS_SHA1 set explicit to 1, they keep passing. This way, when we run out tests, we will always leave the default hash uninitialized to catch more errors, the binary built out of our source by default will crash to help our users catch more errors for us, yet when they really really need to, the users can set and export GIT_TEST_DEFAULT_HASH_IS_SHA1=1 to keep assuming that SHA1 is the hash algorithm when there is no specified. repository.c | 24 ++++++++++++++++++++++++ t/t1517-outside-repo.sh | 42 ++++++++++++++++++++++++++++++++++++++++++ t/test-lib.sh | 4 ++++ 3 files changed, 70 insertions(+) diff --git c/repository.c w/repository.c index 15c10015b0..6f67966e35 100644 --- c/repository.c +++ w/repository.c @@ -26,6 +26,30 @@ void initialize_repository(struct repository *repo) repo->parsed_objects = parsed_object_pool_new(); ALLOC_ARRAY(repo->index, 1); index_state_init(repo->index, repo); + + /* + * Unfortunately, we need to keep this hack around for the time being: + * + * - Not setting up the hash algorithm for `the_repository` leads to + * crashes because `the_hash_algo` is a macro that expands to + * `the_repository->hash_algo`. So if Git commands try to access + * `the_hash_algo` without a Git directory we crash. + * + * - Setting up the hash algorithm to be SHA1 by default breaks other + * commands when running with SHA256. + * + * This is another point in case why having global state is a bad idea. + * Eventually, we should remove this hack and stop setting the hash + * algorithm in this function altogether. Instead, it should only ever + * be set via our repository setup procedures. But that requires more + * work. + * + * As we discover the code paths that need fixing, we may remove this + * code completely, but we are not there yet. + */ + if (repo == the_repository && + git_env_bool("GIT_TEST_DEFAULT_HASH_IS_SHA1", 0)) + repo_set_hash_algo(repo, GIT_HASH_SHA1); } static void expand_base_dir(char **out, const char *in, diff --git c/t/t1517-outside-repo.sh w/t/t1517-outside-repo.sh new file mode 100755 index 0000000000..4c595c2ff7 --- /dev/null +++ w/t/t1517-outside-repo.sh @@ -0,0 +1,42 @@ +#!/bin/sh + +test_description='check random commands outside repo' + +TEST_PASSES_SANITIZE_LEAK=true +. ./test-lib.sh + +test_expect_success 'set up a non-repo directory and test file' ' + GIT_CEILING_DIRECTORIES=$(pwd) && + export GIT_CEILING_DIRECTORIES && + mkdir non-repo && + ( + cd non-repo && + # confirm that git does not find a repo + test_must_fail git rev-parse --git-dir + ) && + test_write_lines one two three four >nums && + git add nums && + cp nums nums.old && + test_write_lines five >>nums && + git diff >sample.patch +' + +test_expect_success 'apply a patch outside repository' ' + ( + cd non-repo && + cp ../nums.old nums && + git apply ../sample.patch + ) && + test_cmp nums non-repo/nums +' + +test_expect_success 'compute a patch-id outside repository' ' + git patch-id <sample.patch >patch-id.expect && + ( + cd non-repo && + git patch-id <../sample.patch >../patch-id.actual + ) && + test_cmp patch-id.expect patch-id.actual +' + +test_done diff --git c/t/test-lib.sh w/t/test-lib.sh index 79d3e0e7d9..36d311fb4a 100644 --- c/t/test-lib.sh +++ w/t/test-lib.sh @@ -542,6 +542,10 @@ export EDITOR GIT_DEFAULT_HASH="${GIT_TEST_DEFAULT_HASH:-sha1}" export GIT_DEFAULT_HASH + +GIT_TEST_DEFAULT_HASH_IS_SHA1=0 +export GIT_TEST_DEFAULT_HASH_IS_SHA1 + GIT_DEFAULT_REF_FORMAT="${GIT_TEST_DEFAULT_REF_FORMAT:-files}" export GIT_DEFAULT_REF_FORMAT GIT_TEST_MERGE_ALGORITHM="${GIT_TEST_MERGE_ALGORITHM:-ort}"