[v2] fast-import: avoid making replace refs point to themselves

Message ID pull.1824.v2.git.1731968389590.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 5e904f1a4ade49372970bd172d63f9c9fd7b2653
Series [v2] fast-import: avoid making replace refs point to themselves

Commit Message

Elijah Newren Nov. 18, 2024, 10:19 p.m. UTC
From: Elijah Newren <newren@gmail.com>

If someone replaces a commit with a modified version, then builds on
that commit, and then later decides to rewrite history with a pipeline like

    git fast-export --all | CMD_TO_TWEAK_THE_STREAM | git fast-import

and CMD_TO_TWEAK_THE_STREAM undoes the modifications that the
replacement did, then at the end you'd get a replace ref that points to
itself.  For example:

    $ git show-ref | grep replace
    fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264

Git commands which pay attention to replace refs will die with an error
when a self-referencing replace ref is present:

    $ git log
    fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264

Avoid such problems by deleting replace refs that will simply end up
pointing to themselves at the end of our writing.  Unless users specify
--quiet, warn them when we delete such a replace ref.
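
As a standalone illustration of the condition being detected, here is a
minimal sketch. The helper name replace_ref_points_to_itself() is
hypothetical and is not part of Git; it takes the ref name and the hex
object ID as plain strings rather than using Git's own types:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not a Git function): returns true when refname
 * has the form "refs/replace/<hex>" and <hex> is the same string as
 * target_hex, i.e. the replace ref would point to the object it replaces.
 */
static bool replace_ref_points_to_itself(const char *refname,
					 const char *target_hex)
{
	static const char prefix[] = "refs/replace/";
	size_t prefix_len = sizeof(prefix) - 1;

	/* Refs outside refs/replace/ are never replace refs. */
	if (strncmp(refname, prefix, prefix_len) != 0)
		return false;

	/* Compare the object named by the ref with the ref's target. */
	return strcmp(refname + prefix_len, target_hex) == 0;
}
```

With the example above, the ref name
refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264 and the target
fb92ebc654641b310e7d0360d0a5a49316fd7264 would satisfy this check.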

Two notes about this patch:
  * We are not ignoring the problematic update of the replace ref
    (turning it into a no-op), we are replacing the update with a delete.
    The logic here is that if the repository had a value for the replace
    ref before fast-import was run, and the replace ref was explicitly
    named in the fast-import stream, we don't want the replace ref to be
    left with a pre-fast-import value.
  * While loops with more than one element (e.g. refs/replace/A points
    to B, and refs/replace/B points to A) are possible, they seem much
    less plausible.  It is pretty easy to create a sequence of
    git-filter-repo commands that will trigger a self-referencing replace
    ref, but I do not know how to trigger a scenario with a cycle length
    greater than 1.
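
To show why both a self-referencing ref and a longer cycle end in a
"replace depth too high" style failure, here is a rough sketch of
following replacements with a depth limit. It is a toy model, not Git's
implementation: the table type, the function names, and the limit of 5
are all assumptions made for this illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_REPLACE_DEPTH 5 /* assumed limit for this sketch */

/* Toy stand-in for one replace ref: refs/replace/<from> points to <to>. */
struct replace_entry {
	const char *from;
	const char *to;
};

/* Return the replacement recorded for id, or NULL if there is none. */
static const char *find_replacement(const struct replace_entry *map,
				    size_t n, const char *id)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(map[i].from, id))
			return map[i].to;
	return NULL;
}

/*
 * Follow replacements starting at id.  Returns the first object that has
 * no replacement, or NULL if a replacement was still found on each of
 * MAX_REPLACE_DEPTH lookups (a cycle, or a chain that is too long).
 */
static const char *resolve_replacement(const struct replace_entry *map,
				       size_t n, const char *id)
{
	for (int depth = 0; depth < MAX_REPLACE_DEPTH; depth++) {
		const char *next = find_replacement(map, n, id);

		if (!next)
			return id;
		id = next;
	}
	return NULL;
}
```

In this model, a table holding only A -> A and a table holding A -> B
plus B -> A both make resolve_replacement() return NULL for A, while a
single A -> B entry resolves A to B.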

Signed-off-by: Elijah Newren <newren@gmail.com>
---
    fast-import: avoid making replace refs point to themselves
    
    Changes since v1: Clarified wording in the commit message, and added a
    little more text at the end of the commit message to address questions
    that came up in review.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1824%2Fnewren%2Ffast-import-self-pointing-replace-ref-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1824/newren/fast-import-self-pointing-replace-ref-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1824

Range-diff vs v1:

 1:  7402518e806 ! 1:  cc77a004cea fast-import: avoid making replace refs point to themselves
     @@ Commit message
              $ git show-ref | grep replace
              fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264
      
     -    Most git commands that you try to run in such a repository with a
     -    self-pointing replace object will result in an error:
     +    Git commands which pay attention to replace refs will die with an error
     +    when a self-referencing replace ref is present:
      
              $ git log
              fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264
      
          Avoid such problems by deleting replace refs that will simply end up
     -    pointing to themselves at the end of our writing.  Warn the users when
     -    we do so, unless they specify --quiet.
     +    pointing to themselves at the end of our writing.  Unless users specify
     +    --quiet, warn them when we delete such a replace ref.
     +
     +    Two notes about this patch:
     +      * We are not ignoring the problematic update of the replace ref
     +        (turning it into a no-op), we are replacing the update with a delete.
     +        The logic here is that if the repository had a value for the replace
     +        ref before fast-import was run, and the replace ref was explicitly
     +        named in the fast-import stream, we don't want the replace ref to be
     +        left with a pre-fast-import value.
     +      * While loops with more than one element (e.g. refs/replace/A points
     +        to B, and refs/replace/B points to A) are possible, they seem much
     +        less plausible.  It is pretty easy to create a sequence of
     +        git-filter-repo commands that will trigger a self-referencing replace
     +        ref, but I do not know how to trigger a scenario with a cycle length
     +        greater than 1.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      


 builtin/fast-import.c  | 16 +++++++++++++++-
 t/t9300-fast-import.sh | 28 ++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 1 deletion(-)


base-commit: 8f8d6eee531b3fa1a8ef14f169b0cb5035f7a772

Comments

Junio C Hamano Nov. 19, 2024, 12:44 a.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +		grep "Dropping.*since it would point to itself" msgs &&
> +		git show-ref >refs &&
> +		! grep refs/replace refs

"test_grep !" may make a failure case easier to diagnose.

Other than that, looking good.

Thanks, will queue.

Patch

diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 76d5c20f141..51c8228cb7b 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -179,6 +179,7 @@  static unsigned long branch_load_count;
 static int failure;
 static FILE *pack_edges;
 static unsigned int show_stats = 1;
+static unsigned int quiet;
 static int global_argc;
 static const char **global_argv;
 static const char *global_prefix;
@@ -1602,7 +1603,19 @@  static int update_branch(struct branch *b)
 	struct ref_transaction *transaction;
 	struct object_id old_oid;
 	struct strbuf err = STRBUF_INIT;
-
+	static const char *replace_prefix = "refs/replace/";
+
+	if (starts_with(b->name, replace_prefix) &&
+	    !strcmp(b->name + strlen(replace_prefix),
+		    oid_to_hex(&b->oid))) {
+		if (!quiet)
+			warning("Dropping %s since it would point to "
+				"itself (i.e. to %s)",
+				b->name, oid_to_hex(&b->oid));
+		refs_delete_ref(get_main_ref_store(the_repository),
+				NULL, b->name, NULL, 0);
+		return 0;
+	}
 	if (is_null_oid(&b->oid)) {
 		if (b->delete)
 			refs_delete_ref(get_main_ref_store(the_repository),
@@ -3388,6 +3401,7 @@  static int parse_one_option(const char *option)
 		option_export_pack_edges(option);
 	} else if (!strcmp(option, "quiet")) {
 		show_stats = 0;
+		quiet = 1;
 	} else if (!strcmp(option, "stats")) {
 		show_stats = 1;
 	} else if (!strcmp(option, "allow-unsafe-features")) {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 6224f54d4d2..425a261c161 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3692,6 +3692,34 @@  test_expect_success ICONV 'X: handling encoding' '
 	git log -1 --format=%B encoding | grep $(printf "\317\200")
 '
 
+test_expect_success 'X: replace ref that becomes useless is removed' '
+	git init -qb main testrepo &&
+	cd testrepo &&
+	(
+		test_commit test &&
+
+		test_commit msg somename content &&
+
+		git mv somename othername &&
+		NEW_TREE=$(git write-tree) &&
+		MSG="$(git log -1 --format=%B HEAD)" &&
+		NEW_COMMIT=$(git commit-tree -p HEAD^1 -m "$MSG" $NEW_TREE) &&
+		git replace main $NEW_COMMIT &&
+
+		echo more >>othername &&
+		git add othername &&
+		git commit -qm more &&
+
+		git fast-export --all >tmp &&
+		sed -e s/othername/somename/ tmp >tmp2 &&
+		git fast-import --force <tmp2 2>msgs &&
+
+		grep "Dropping.*since it would point to itself" msgs &&
+		git show-ref >refs &&
+		! grep refs/replace refs
+	)
+'
+
 ###
 ### series Y (submodules and hash algorithms)
 ###