Message ID | 20240513224127.2042052-4-gitster@pobox.com (mailing list archive) |
---|---|
State | Superseded |
Series | Fix use of uninitialized hash algorithms |
Junio C Hamano <gitster@pobox.com> writes:

> We explicitly document patch IDs to be using SHA-1. Furthermore, patch
> IDs are supposed to be stable for the most part. But even with the
> same input, the patch IDs will now be different depending on the repo's
> configured object hash.
>
> Work around the issue by setting up SHA-1 when there was no startup
> repository for now. This is arguably not the correct fix, but for now we
> rather want to focus on getting the segfault fixed.

I tend to agree that the use of SHA-256 in patch-id computation is a
regression introduced when we added SHA-256 support. Even with the
GIT_DEFAULT_HASH_ALGORITHM fallback, we cannot fix it in an initialized
SHA-256 repository. We should fix it, but I agree that is probably
outside the scope of the "oops, we leave the hash algorithm totally
uninitialized" fix.

> +	/*
> +	 * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
> +	 * it means that the hash algorithm now depends on the object hash of
> +	 * the repository, even though git-patch-id(1) clearly defines that
> +	 * patch IDs always use SHA1.
> +	 *
> +	 * NEEDSWORK: This hack should be removed in favor of converting
> +	 * the code that computes patch IDs to always use SHA1.
> +	 */
> +	if (!startup_info->have_repository)
> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);

Hmph, in other places I did

	if (!the_hash_algo)
		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);

to find the case where we need a reasonable default.

Is there a practical difference? If there isn't, we should
standardise on one and use the same test consistently everywhere.

Not that it matters for this particular case, where in the longer
term we should be hardcoding the use of SHA-1 even in a SHA-256
repository for the purpose of computing the patch-id.

Thanks.
On Mon, May 13, 2024 at 04:11:01PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
[snip]
> > +	/*
> > +	 * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
> > +	 * it means that the hash algorithm now depends on the object hash of
> > +	 * the repository, even though git-patch-id(1) clearly defines that
> > +	 * patch IDs always use SHA1.
> > +	 *
> > +	 * NEEDSWORK: This hack should be removed in favor of converting
> > +	 * the code that computes patch IDs to always use SHA1.
> > +	 */
> > +	if (!startup_info->have_repository)
> > +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
>
> Hmph, in other places I did
>
> 	if (!the_hash_algo)
> 		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
>
> to find the case where we need a reasonable default.
>
> Is there a practical difference? If there isn't, we should
> standardise on one and use the same test consistently everywhere.
>
> Not that it matters for this particular case, where in the longer
> term we should be hardcoding the use of SHA-1 even in a SHA-256
> repository for the purpose of computing the patch-id.

To the best of my knowledge there isn't. What I prefer about my approach
is that it explicitly points out that this is conditional on whether or
not we have a repository. But in the end I don't mind much which of the
two versions we use.

Patrick
Patrick Steinhardt <ps@pks.im> writes:

>> Hmph, in other places I did
>>
>> 	if (!the_hash_algo)
>> 		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
>>
>> to find the case where we need a reasonable default.
>>
>> Is there a practical difference? If there isn't, we should
>> standardise on one and use the same test consistently everywhere.
>> ...
>
> To the best of my knowledge there isn't. What I prefer about my approach
> is that it explicitly points out that this is conditional on whether or
> not we have a repository. But in the end I don't mind much which of the
> two versions we use.

Ah, that makes sense. It is quite subjective, but I can see the
reasoning.

The reason I preferred to check "the_hash_algo" is exactly the
opposite. In this particular decision to call (or not call)
set_hash_algo(), we only care whether the_hash_algo is not yet set,
and that is why the_hash_algo is checked. Specifically, we do not
care *why* it is still unset. In the current codebase, the most
likely reason the_hash_algo is unset is that we haven't found the
repository yet, but we do not have to rely on that assumption
continuing to hold. Checking the_hash_algo directly would help
maintainability in a future where the_hash_algo is already set
before we get here when outside a repository, or vice versa.
diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 3894d2b970..e6ae89beab 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -5,6 +5,7 @@
 #include "hash.h"
 #include "hex.h"
 #include "parse-options.h"
+#include "setup.h"
 
 static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
 {
@@ -237,6 +238,18 @@ int cmd_patch_id(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
 			     patch_id_usage, 0);
 
+	/*
+	 * We rely on `the_hash_algo` to compute patch IDs. This is dubious as
+	 * it means that the hash algorithm now depends on the object hash of
+	 * the repository, even though git-patch-id(1) clearly defines that
+	 * patch IDs always use SHA1.
+	 *
+	 * NEEDSWORK: This hack should be removed in favor of converting
+	 * the code that computes patch IDs to always use SHA1.
+	 */
+	if (!startup_info->have_repository)
+		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
+
 	generate_id_list(opts ? opts > 1 : config.stable,
 			 opts ? opts == 3 : config.verbatim);
 	return 0;
diff --git a/t/t1517-outside-repo.sh b/t/t1517-outside-repo.sh
index 16d9714c27..f1fd5c9888 100755
--- a/t/t1517-outside-repo.sh
+++ b/t/t1517-outside-repo.sh
@@ -24,7 +24,7 @@ test_expect_success 'set up a non-repo directory and test file' '
 	git diff >sample.patch
 '
 
-test_expect_failure 'compute a patch-id outside repository' '
+test_expect_success 'compute a patch-id outside repository' '
 	git patch-id <sample.patch >patch-id.expect &&
 	(
 		cd non-repo &&
diff --git a/t/t4204-patch-id.sh b/t/t4204-patch-id.sh
index a7fa94ce0a..605faea0c7 100755
--- a/t/t4204-patch-id.sh
+++ b/t/t4204-patch-id.sh
@@ -310,4 +310,38 @@ test_expect_success 'patch-id handles diffs with one line of before/after' '
 	test_config patchid.stable true &&
 	calc_patch_id diffu1stable <diffu1
 '
+
+test_expect_failure 'patch-id computes same ID with different object hashes' '
+	test_when_finished "rm -rf repo-sha1 repo-sha256" &&
+
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+
+	git init --object-format=sha1 repo-sha1 &&
+	git -C repo-sha1 patch-id <diff >patch-id-sha1 &&
+	git init --object-format=sha256 repo-sha256 &&
+	git -C repo-sha256 patch-id <diff >patch-id-sha256 &&
+	test_cmp patch-id-sha1 patch-id-sha256
+'
+
+test_expect_success 'patch-id without repository' '
+	cat >diff <<-\EOF &&
+	diff --git a/bar b/bar
+	index bdaf90f..31051f6 100644
+	--- a/bar
+	+++ b/bar
+	@@ -2 +2,2 @@
+	 b
+	+c
+	EOF
+	nongit git patch-id <diff
+'
+
 test_done