Message ID | pull.1824.git.1731610074707.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fast-import: avoid making replace refs point to themselves | expand |
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > Most git commands that you try to run in such a repository with a > self-pointing replace object will result in an error: > > $ git log > fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264 After reading up to this point, with "Most git commands", I was afraid that you were going to say "... but I found this command that fails to stop gracefully, and instead infinitely spin". If that is not a case, then I am happy ;-) but in that case probably "Most" -> "All" is warranted. > Avoid such problems by deleting replace refs that will simply end up > pointing to themselves at the end of our writing. Warn the users when > we do so, unless they specify --quiet. This may need a bit of rephrasing. Even when they specify "--quiet", you'd refrain from creating a self-referencing replace entry, which makes sense, but it was not clear with only the above description. I had to read the patch text to find it out. Is it reasonable to expect that a self referencing replace entry is the most common thing to happen, or loops with more than one elements are equally plausible to happen but the self-referencing one is singled out by this patch because it is trivial to notice, unlike other forms of equally problematic loops? > diff --git a/builtin/fast-import.c b/builtin/fast-import.c > index 76d5c20f141..51c8228cb7b 100644 > --- a/builtin/fast-import.c > +++ b/builtin/fast-import.c > @@ -179,6 +179,7 @@ static unsigned long branch_load_count; > static int failure; > static FILE *pack_edges; > static unsigned int show_stats = 1; > +static unsigned int quiet; > static int global_argc; > static const char **global_argv; > static const char *global_prefix; > @@ -1602,7 +1603,19 @@ static int update_branch(struct branch *b) > struct ref_transaction *transaction; > struct object_id old_oid; > struct strbuf err = STRBUF_INIT; > - > + static const char *replace_prefix = "refs/replace/"; > + > + if (starts_with(b->name, replace_prefix) && Curious how the "diff" machinery decided to represent the hunk like this, instead of say > struct strbuf err = STRBUF_INIT; > + static const char *replace_prefix = "refs/replace/"; > > + if (starts_with(b->name, replace_prefix) && Of course that has nothing to do with this patch or its review. > + !strcmp(b->name + strlen(replace_prefix), > + oid_to_hex(&b->oid))) { > + if (!quiet) > + warning("Dropping %s since it would point to " > + "itself (i.e. to %s)", > + b->name, oid_to_hex(&b->oid)); > + refs_delete_ref(get_main_ref_store(the_repository), > + NULL, b->name, NULL, 0); > + return 0; I am not sure if a warning is even warranted. If you decide to replace an object A with the same object A, the result ought to be a no-op. I wonder if it is makes more sense to (1) do this unconditionally and silently, and (2) when we prepare the replace_map, ignore self-referencing ones. instead. If (2) makes sense entirely depends on the answer of an earlier question (i.e. "is there a reason why self-reference is more common than general loop?"). Thanks.
Junio C Hamano <gitster@pobox.com> writes: > I am not sure if a warning is even warranted. If you decide to > replace an object A with the same object A, the result ought to be a > no-op. I wonder if it is makes more sense to > > (1) do this unconditionally and silently, and > (2) when we prepare the replace_map, ignore self-referencing ones. > > instead. If (2) makes sense entirely depends on the answer of an > earlier question (i.e. "is there a reason why self-reference is more > common than general loop?"). Forgot to add. (1) could be done even at a lower layer, i.e. the ref API could be told to reject such a replace ref creation/update that maps an object name to itself.
On Thu, Nov 14, 2024 at 4:31 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > Most git commands that you try to run in such a repository with a > > self-pointing replace object will result in an error: > > > > $ git log > > fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264 > > After reading up to this point, with "Most git commands", I was > afraid that you were going to say "... but I found this command that > fails to stop gracefully, and instead infinitely spin". > > If that is not a case, then I am happy ;-) but in that case probably > "Most" -> "All" is warranted. Some git commands are not bothered by the existence of the problematic replace ref; e.g.: $ git show-ref 64026eaa60e0208a00946cdd3cb41b059c6675bd refs/heads/main fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264 fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/tags/msg 14f901a95ebae912feb4805f40ef68f15b0192c2 refs/tags/test So, it's not "all" git commands. Maybe I should rephrase as: """ Some git commands will ignore such replace refs, but many will fail with an error: $ git log fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264 """ ? > > Avoid such problems by deleting replace refs that will simply end up > > pointing to themselves at the end of our writing. Warn the users when > > we do so, unless they specify --quiet. > > This may need a bit of rephrasing. > > Even when they specify "--quiet", you'd refrain from creating a > self-referencing replace entry, which makes sense, but it was not > clear with only the above description. I had to read the patch text > to find it out. Would reordering the two phrases of the last sentence do it? """ Avoid such problems by deleting replace refs that will simply end up pointing to themselves at the end of our writing. Unless users specify --quiet, warn the users when we do so. """ > Is it reasonable to expect that a self referencing replace entry is > the most common thing to happen, or loops with more than one > elements are equally plausible to happen but the self-referencing > one is singled out by this patch because it is trivial to notice, > unlike other forms of equally problematic loops? Loops with more than one element are possible, but I couldn't see an angle by which they are plausible. In particular, I know how a user can (and did) accidentally trigger a self-referencing replace ref (if you're curious, see https://github.com/newren/git-filter-repo/issues/401; it's much less likely to trigger now due to an unrelated change to the default for `--replace-refs`, but could still be triggered in combination with that option). There are many variations on their theme of "do a change, then undo it" (in combination with the old `--replace-refs` default), but I don't see how to either vary their testcase or come up with some other way to get git-filter-repo to trigger a replace ref loop with more than one element. > > diff --git a/builtin/fast-import.c b/builtin/fast-import.c > > index 76d5c20f141..51c8228cb7b 100644 > > --- a/builtin/fast-import.c > > +++ b/builtin/fast-import.c > > @@ -179,6 +179,7 @@ static unsigned long branch_load_count; > > static int failure; > > static FILE *pack_edges; > > static unsigned int show_stats = 1; > > +static unsigned int quiet; > > static int global_argc; > > static const char **global_argv; > > static const char *global_prefix; > > @@ -1602,7 +1603,19 @@ static int update_branch(struct branch *b) > > struct ref_transaction *transaction; > > struct object_id old_oid; > > struct strbuf err = STRBUF_INIT; > > - > > + static const char *replace_prefix = "refs/replace/"; > > + > > + if (starts_with(b->name, replace_prefix) && > > Curious how the "diff" machinery decided to represent the hunk like > this, instead of say > > > struct strbuf err = STRBUF_INIT; > > + static const char *replace_prefix = "refs/replace/"; > > > > + if (starts_with(b->name, replace_prefix) && > > Of course that has nothing to do with this patch or its review. Oh, that is kinda weird. It's not a whitespace-change issue either; both the removed and added blank-looking lines really are blank lines. No idea why the diff machinery does that. > > + !strcmp(b->name + strlen(replace_prefix), > > + oid_to_hex(&b->oid))) { > > + if (!quiet) > > + warning("Dropping %s since it would point to " > > + "itself (i.e. to %s)", > > + b->name, oid_to_hex(&b->oid)); > > + refs_delete_ref(get_main_ref_store(the_repository), > > + NULL, b->name, NULL, 0); > > + return 0; > > I am not sure if a warning is even warranted. If you decide to > replace an object A with the same object A, the result ought to be a > no-op. I wonder if it is makes more sense to > > (1) do this unconditionally and silently, and > (2) when we prepare the replace_map, ignore self-referencing ones. > > instead. If (2) makes sense entirely depends on the answer of an > earlier question (i.e. "is there a reason why self-reference is more > common than general loop?"). > > Thanks. Since you added more in a subsequent response, I'll go respond to that.
On Thu, Nov 14, 2024 at 5:08 PM Junio C Hamano <gitster@pobox.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > > I am not sure if a warning is even warranted. If you decide to > > replace an object A with the same object A, the result ought to be a > > no-op. Do you mean that the result ought to be that there is no replacement? If so, I agree, but that's not always equivalent to a no-op... > > I wonder if it is makes more sense to > > > > (1) do this unconditionally and silently, and > > (2) when we prepare the replace_map, ignore self-referencing ones. > > > > instead. If (2) makes sense entirely depends on the answer of an > > earlier question (i.e. "is there a reason why self-reference is more > > common than general loop?"). > > Forgot to add. (1) could be done even at a lower layer, i.e. the > ref API could be told to reject such a replace ref creation/update > that maps an object name to itself. Perhaps rejecting the creation or update of a replace ref would make sense at a lower level, but a no-op isn't what I want here -- it'd just result in a different bug. In particular, note that my patch modifies fast-import to issue a delete where it would otherwise issue an update instead of merely rejecting the update. If the repository had a value for the replace ref before fast-import was run, and the replace ref was explicitly named in the fast-import stream, I don't want the replace ref to be left with a pre-fast-import value.
Elijah Newren <newren@gmail.com> writes: > ... If the repository > had a value for the replace ref before fast-import was run, and the > replace ref was explicitly named in the fast-import stream, I don't > want the replace ref to be left with a pre-fast-import value. Ah, right. We want to honor the user's latest wish, i.e., if they create a replace ref that maps A to A, when they ask for object A, instead of returning it, we need to return what the replace ref refers to, which happens to be object A. So you are right. "You are trying to map A to A, so we'll ignore that request" is a wrong thing to do. "In order to give you A when you ask for A, we will remove the existing mapping for A" is absolutely the right thing to do, which is what your patch does. Thanks.
diff --git a/builtin/fast-import.c b/builtin/fast-import.c index 76d5c20f141..51c8228cb7b 100644 --- a/builtin/fast-import.c +++ b/builtin/fast-import.c @@ -179,6 +179,7 @@ static unsigned long branch_load_count; static int failure; static FILE *pack_edges; static unsigned int show_stats = 1; +static unsigned int quiet; static int global_argc; static const char **global_argv; static const char *global_prefix; @@ -1602,7 +1603,19 @@ static int update_branch(struct branch *b) struct ref_transaction *transaction; struct object_id old_oid; struct strbuf err = STRBUF_INIT; - + static const char *replace_prefix = "refs/replace/"; + + if (starts_with(b->name, replace_prefix) && + !strcmp(b->name + strlen(replace_prefix), + oid_to_hex(&b->oid))) { + if (!quiet) + warning("Dropping %s since it would point to " + "itself (i.e. to %s)", + b->name, oid_to_hex(&b->oid)); + refs_delete_ref(get_main_ref_store(the_repository), + NULL, b->name, NULL, 0); + return 0; + } if (is_null_oid(&b->oid)) { if (b->delete) refs_delete_ref(get_main_ref_store(the_repository), @@ -3388,6 +3401,7 @@ static int parse_one_option(const char *option) option_export_pack_edges(option); } else if (!strcmp(option, "quiet")) { show_stats = 0; + quiet = 1; } else if (!strcmp(option, "stats")) { show_stats = 1; } else if (!strcmp(option, "allow-unsafe-features")) { diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh index 6224f54d4d2..425a261c161 100755 --- a/t/t9300-fast-import.sh +++ b/t/t9300-fast-import.sh @@ -3692,6 +3692,34 @@ test_expect_success ICONV 'X: handling encoding' ' git log -1 --format=%B encoding | grep $(printf "\317\200") ' +test_expect_success 'X: replace ref that becomes useless is removed' ' + git init -qb main testrepo && + cd testrepo && + ( + test_commit test && + + test_commit msg somename content && + + git mv somename othername && + NEW_TREE=$(git write-tree) && + MSG="$(git log -1 --format=%B HEAD)" && + NEW_COMMIT=$(git commit-tree -p HEAD^1 -m "$MSG" $NEW_TREE) && + git replace main $NEW_COMMIT && + + echo more >>othername && + git add othername && + git commit -qm more && + + git fast-export --all >tmp && + sed -e s/othername/somename/ tmp >tmp2 && + git fast-import --force <tmp2 2>msgs && + + grep "Dropping.*since it would point to itself" msgs && + git show-ref >refs && + ! grep refs/replace refs + ) +' + ### ### series Y (submodules and hash algorithms) ###