worktree: repair copied repository and linked worktrees

Message ID 20240923075416.54289-1-ericsunshine@charter.net (mailing list archive)
State Accepted
Commit 992f7a4fdbadda9f30ff3a8995e966f0562bc73a

Commit Message

Eric Sunshine Sept. 23, 2024, 7:54 a.m. UTC
From: Eric Sunshine <sunshine@sunshineco.com>

For each linked worktree, Git maintains two pointers: (1)
<repo>/worktrees/<id>/gitdir which points at the linked worktree, and
(2) <worktree>/.git which points back at <repo>/worktrees/<id>. Both
pointers are absolute pathnames.

Aside from manually manipulating those raw files, it is possible to
easily "break" one or both pointers by ignoring the "git worktree move"
command and instead manually moving a linked worktree, moving the
repository, or moving both. The "git worktree repair" command was
invented to handle this case by restoring these pointers to sane values.

For the "repair" command, the "git worktree" manual page states:

  Repair worktree administrative files, if possible, if they have
  become corrupted or outdated due to external factors.

The "if possible" clause was chosen deliberately to convey that the
existing implementation may not be able to fix every possible breakage,
and to imply that improvements may be made to handle other types of
breakage.

A recent problem report[*] illustrates a case in which "git worktree
repair" not only fails to fix breakage, but actually causes breakage.
Specifically, if a repository / main-worktree and linked worktrees are
*copied* as a unit (rather than *moved*), then "git worktree repair" run
in the copy leaves the copy untouched but botches the pointers in the
original repository and the original worktrees.

For instance, given this directory structure:

  orig/
    main/ (main-worktree)
    linked/ (linked worktree)

if "orig" is copied (not moved) to "dup", then immediately after the
manual copy operation:

  * orig/main/.git/worktrees/linked/gitdir points at orig/linked/.git
  * orig/linked/.git points at orig/main/.git/worktrees/linked
  * dup/main/.git/worktrees/linked/gitdir points at orig/linked/.git
  * dup/linked/.git points at orig/main/.git/worktrees/linked

So, dup/main thinks its linked worktree is orig/linked, and worktree
dup/linked thinks its repository / main-worktree is orig/main.
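
To make that stale state concrete, here is a minimal C sketch (not Git
code) that models the two pointers of one linked worktree as in-memory
strings and checks whether they agree with where the directories
actually live. The names "struct linkage" and linkage_is_consistent()
are hypothetical, and the relative paths are the same shorthand used
above; the real pointers are absolute pathnames stored in files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative model only (not Git code): the two pointers kept for one
 * linked worktree, held as strings instead of being read from files.
 */
struct linkage {
	const char *repo;     /* where the repository lives, e.g. "dup/main/.git" */
	const char *worktree; /* where the linked worktree lives, e.g. "dup/linked" */
	const char *id;       /* worktree id, e.g. "linked" */
	const char *gitdir;   /* content of <repo>/worktrees/<id>/gitdir */
	const char *dotgit;   /* path recorded in <worktree>/.git */
};

/*
 * Return 1 if both pointers match the actual locations, 0 otherwise.
 * Paths longer than the local buffers would be truncated, which is
 * acceptable for a sketch but not for real code.
 */
static int linkage_is_consistent(const struct linkage *l)
{
	char want_gitdir[4096], want_dotgit[4096];

	assert(l && l->repo && l->worktree && l->id && l->gitdir && l->dotgit);

	/* Pointer (1) should name the worktree's .git file. */
	snprintf(want_gitdir, sizeof(want_gitdir), "%s/.git", l->worktree);
	/* Pointer (2) should name <repo>/worktrees/<id>. */
	snprintf(want_dotgit, sizeof(want_dotgit), "%s/worktrees/%s",
		 l->repo, l->id);

	return !strcmp(l->gitdir, want_gitdir) &&
	       !strcmp(l->dotgit, want_dotgit);
}
```

Fed the four values listed above, the model reports the "orig" pair as
consistent and the "dup" pair as inconsistent, since both of the copy's
pointers still name locations under "orig".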

"git worktree repair" is reasonably simple-minded; it wants to trust
valid-looking pointers, hence doesn't try to second-guess them. In this
case, when validating dup/linked/.git, it finds a legitimate repository
pointer, orig/main/.git/worktrees/linked, thus trusts that it is correct,
but does notice that gitdir in that directory doesn't point at
dup/linked/.git, so it (incorrectly) _fixes_
orig/main/.git/worktrees/linked/gitdir to point at dup/linked/.git.
Similarly, when validating dup/main/.git/worktrees/linked/gitdir, it
finds a legitimate worktree pointer, orig/linked/.git, but notices that
its .git file doesn't point back at dup/main, thus (incorrectly) _fixes_
orig/linked/.git to point at dup/main/.git/worktrees/linked. Hence, it
has modified and broken the linkage between orig/main and orig/linked
rather than fixing dup/main and dup/linked as expected.

Fix this problem by also checking if a plausible .git/worktrees/<id>
exists in the *current* repository -- not just in the repository pointed
at by the worktree's .git file -- and comparing whether they are the
same. If not, then it is likely because the repository / main-worktree
and linked worktrees were copied, so prefer the discovered plausible
pointer rather than the one from the existing .git file.
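
The selection rule described above can be sketched as a small pure
function. This is a simplified illustration rather than the code in
worktree.c: choose_backlink() is a hypothetical name, strcmp() stands
in for Git's fspathcmp(), a NULL "recorded" models only the case where
the .git file does not reference a repository, and the other error
paths and all memory management are left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git). Decide which
 * .git/worktrees/<id> path the worktree should be tied to.
 *
 *   recorded: path read from the worktree's .git file, or NULL if that
 *             file does not reference a repository
 *   inferred: plausible .git/worktrees/<id> found in the *current*
 *             repository, or NULL if none was found
 *
 * Returns the chosen path, or NULL if no repository can be located.
 */
static const char *choose_backlink(const char *recorded, const char *inferred)
{
	/* Nothing usable recorded: fall back on whatever was inferred. */
	if (!recorded)
		return inferred;

	/*
	 * Both exist but differ: the worktree was probably copied along
	 * with its repository, so prefer the current repository's entry.
	 */
	if (inferred && strcmp(recorded, inferred))
		return inferred;

	/* Otherwise the recorded pointer is trusted as before. */
	return recorded;
}
```

With the example layout, a recorded value of
orig/main/.git/worktrees/linked and an inferred value of
dup/main/.git/worktrees/linked differ, so the inferred one is chosen
and the original repository is left alone.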

[*]: https://lore.kernel.org/git/E1sr5iF-0007zV-2k@binarylane-bailey.stuart.id.au/

Reported-by: Russell Stuart <russell+git.vger.kernel.org@stuart.id.au>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 Documentation/git-worktree.txt |  2 +-
 t/t2406-worktree-repair.sh     | 19 ++++++++++++++++
 worktree.c                     | 40 +++++++++++++++++++++++++++++++++-
 3 files changed, 59 insertions(+), 2 deletions(-)

Comments

Eric Sunshine Sept. 23, 2024, 8:04 a.m. UTC | #1
On Mon, Sep 23, 2024 at 3:55 AM Eric Sunshine <ericsunshine@charter.net> wrote:
> [...]
> Fix this problem by also checking if a plausible .git/worktrees/<id>
> exists in the *current* repository -- not just in the repository pointed
> at by the worktree's .git file -- and comparing whether they are the
> same. If not, then it is likely because the repository / main-worktree
> and linked worktrees were copied, so prefer the discovered plausible
> pointer rather than the one from the existing .git file.
>
> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
> ---
> diff --git a/worktree.c b/worktree.c
> @@ -698,12 +699,24 @@ void repair_worktree_at_path(const char *path,
> +       inferred_backlink = infer_backlink(realdotgit.buf);
>         backlink = xstrdup_or_null(read_gitfile_gently(realdotgit.buf, &err));
>         if (err == READ_GITFILE_ERR_NOT_A_FILE) {
>                 fn(1, realdotgit.buf, _("unable to locate repository; .git is not a file"), cb_data);
>                 goto done;
>         } else if (err == READ_GITFILE_ERR_NOT_A_REPO) {
> -               if (!(backlink = infer_backlink(realdotgit.buf))) {
> +               if (inferred_backlink) {
> +                       /*
> +                        * Worktree's .git file does not point at a repository
> +                        * but we found a .git/worktrees/<id> in this
> +                        * repository with the same <id> as recorded in the
> +                        * worktree's .git file so make the worktree point at
> +                        * the discovered .git/worktrees/<id>. (Note: backlink
> +                        * is already NULL, so no need to free it first.)
> +                        */
> +                       backlink = inferred_backlink;
> +                       inferred_backlink = NULL;
> +               } else {
>                         fn(1, realdotgit.buf, _("unable to locate repository; .git file does not reference a repository"), cb_data);
>                         goto done;
>                 }

In case it's not obvious at first glance, there is no change of
behavior here. The changes in this hunk:

(1) factor out the call to infer_backlink() since the result is now
consulted in multiple code paths and the code needs to be careful to
ensure that all allocated strings get freed (at the end of the
function)

(2) add a comment explaining the *existing* behavior

The later hunk in this patch contains the actual behavior change.

> @@ -712,6 +725,30 @@ void repair_worktree_at_path(const char *path,
> +       /*
> +        * If we got this far, either the worktree's .git file pointed at a
> +        * valid repository (i.e. read_gitfile_gently() returned success) or
> +        * the .git file did not point at a repository but we were able to
> +        * infer a suitable new value for the .git file by locating a
> +        * .git/worktrees/<id> in *this* repository corresponding to the <id>
> +        * recorded in the worktree's .git file.
> +        *
> +        * However, if, at this point, inferred_backlink is non-NULL (i.e. we
> +        * found a suitable .git/worktrees/<id> in *this* repository) *and* the
> +        * worktree's .git file points at a valid repository *and* those two
> +        * paths differ, then that indicates that the user probably *copied*
> +        * the main and linked worktrees to a new location as a unit rather
> +        * than *moving* them. Thus, the copied worktree's .git file actually
> +        * points at the .git/worktrees/<id> in the *original* repository, not
> +        * in the "copy" repository. In this case, point the "copy" worktree's
> +        * .git file at the "copy" repository.
> +        */
> +       if (inferred_backlink && fspathcmp(backlink, inferred_backlink)) {
> +               free(backlink);
> +               backlink = inferred_backlink;
> +               inferred_backlink = NULL;
> +       }
> +

Junio C Hamano Sept. 23, 2024, 9:22 p.m. UTC | #2
Eric Sunshine <ericsunshine@charter.net> writes:

> Fix this problem by also checking if a plausible .git/worktrees/<id>
> exists in the *current* repository -- not just in the repository pointed
> at by the worktree's .git file -- and comparing whether they are the
> same. If not, then it is likely because the repository / main-worktree
> and linked worktrees were copied, so prefer the discovered plausible
> pointer rather than the one from the existing .git file.

In other words, before futzing with a repository to point at a
(potentially moved) worktree that points back to it, make sure that
the repository is in a corrupt state---if one of the worktrees we
know correctly points back to us already, that pointer should not be
mucked with, or we would end up spreading the damage further?

It does make sense.

> -If both the main worktree and linked worktrees have been moved manually,
> +If both the main worktree and linked worktrees have been moved or copied manually,
>  then running `repair` in the main worktree and specifying the new `<path>`
>  of each linked worktree will reestablish all connections in both
>  directions.

OK.

> diff --git a/worktree.c b/worktree.c
> index 30a947426e..7d1b60ff9f 100644
> --- a/worktree.c
> +++ b/worktree.c
> @@ -683,6 +683,7 @@ void repair_worktree_at_path(const char *path,
>  	struct strbuf gitdir = STRBUF_INIT;
>  	struct strbuf olddotgit = STRBUF_INIT;
>  	char *backlink = NULL;
> +	/*
> +	 * If we got this far, either the worktree's .git file pointed at a
> +	 * valid repository (i.e. read_gitfile_gently() returned success) or
> +	 * the .git file did not point at a repository but we were able to
> +	 * infer a suitable new value for the .git file by locating a
> +	 * .git/worktrees/<id> in *this* repository corresponding to the <id>
> +	 * recorded in the worktree's .git file.
> +	 *
> +	 * However, if, at this point, inferred_backlink is non-NULL (i.e. we
> +	 * found a suitable .git/worktrees/<id> in *this* repository) *and* the
> +	 * worktree's .git file points at a valid repository *and* those two
> +	 * paths differ, then that indicates that the user probably *copied*
> +	 * the main and linked worktrees to a new location as a unit rather
> +	 * than *moving* them. Thus, the copied worktree's .git file actually
> +	 * points at the .git/worktrees/<id> in the *original* repository, not
> +	 * in the "copy" repository. In this case, point the "copy" worktree's
> +	 * .git file at the "copy" repository.
> +	 */

Tricky, but cannot be avoided in order to help those who "copy".

> diff --git a/t/t2406-worktree-repair.sh b/t/t2406-worktree-repair.sh
> index edbf502ec5..7686e60f6a 100755
> --- a/t/t2406-worktree-repair.sh
> +++ b/t/t2406-worktree-repair.sh
> @@ -197,4 +197,23 @@ test_expect_success 'repair moved main and linked worktrees' '
>  	test_cmp expect-gitfile sidemoved/.git
>  '
>  
> +test_expect_success 'repair copied main and linked worktrees' '
> +	test_when_finished "rm -rf orig dup" &&
> +	mkdir -p orig &&
> +	git -C orig init main &&
> +	test_commit -C orig/main nothing &&
> +	git -C orig/main worktree add ../linked &&
> +	cp orig/main/.git/worktrees/linked/gitdir orig/main.expect &&
> +	cp orig/linked/.git orig/linked.expect &&
> +	cp -R orig dup &&
> +	sed "s,orig/linked/\.git$,dup/linked/.git," orig/main.expect >dup/main.expect &&
> +	sed "s,orig/main/\.git/worktrees/linked$,dup/main/.git/worktrees/linked," \
> +		orig/linked.expect >dup/linked.expect &&
> +	git -C dup/main worktree repair ../linked &&
> +	test_cmp orig/main.expect orig/main/.git/worktrees/linked/gitdir &&
> +	test_cmp orig/linked.expect orig/linked/.git &&
> +	test_cmp dup/main.expect dup/main/.git/worktrees/linked/gitdir &&
> +	test_cmp dup/linked.expect dup/linked/.git
> +'
> +
>  test_done

Patch

diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
index 2a240f53ba..70437c815f 100644
--- a/Documentation/git-worktree.txt
+++ b/Documentation/git-worktree.txt
@@ -157,7 +157,7 @@  will reestablish the connection. If multiple linked worktrees are moved,
 running `repair` from any worktree with each tree's new `<path>` as an
 argument, will reestablish the connection to all the specified paths.
 +
-If both the main worktree and linked worktrees have been moved manually,
+If both the main worktree and linked worktrees have been moved or copied manually,
 then running `repair` in the main worktree and specifying the new `<path>`
 of each linked worktree will reestablish all connections in both
 directions.
diff --git a/t/t2406-worktree-repair.sh b/t/t2406-worktree-repair.sh
index edbf502ec5..7686e60f6a 100755
--- a/t/t2406-worktree-repair.sh
+++ b/t/t2406-worktree-repair.sh
@@ -197,4 +197,23 @@  test_expect_success 'repair moved main and linked worktrees' '
 	test_cmp expect-gitfile sidemoved/.git
 '
 
+test_expect_success 'repair copied main and linked worktrees' '
+	test_when_finished "rm -rf orig dup" &&
+	mkdir -p orig &&
+	git -C orig init main &&
+	test_commit -C orig/main nothing &&
+	git -C orig/main worktree add ../linked &&
+	cp orig/main/.git/worktrees/linked/gitdir orig/main.expect &&
+	cp orig/linked/.git orig/linked.expect &&
+	cp -R orig dup &&
+	sed "s,orig/linked/\.git$,dup/linked/.git," orig/main.expect >dup/main.expect &&
+	sed "s,orig/main/\.git/worktrees/linked$,dup/main/.git/worktrees/linked," \
+		orig/linked.expect >dup/linked.expect &&
+	git -C dup/main worktree repair ../linked &&
+	test_cmp orig/main.expect orig/main/.git/worktrees/linked/gitdir &&
+	test_cmp orig/linked.expect orig/linked/.git &&
+	test_cmp dup/main.expect dup/main/.git/worktrees/linked/gitdir &&
+	test_cmp dup/linked.expect dup/linked/.git
+'
+
 test_done
diff --git a/worktree.c b/worktree.c
index 30a947426e..7d1b60ff9f 100644
--- a/worktree.c
+++ b/worktree.c
@@ -683,6 +683,7 @@  void repair_worktree_at_path(const char *path,
 	struct strbuf gitdir = STRBUF_INIT;
 	struct strbuf olddotgit = STRBUF_INIT;
 	char *backlink = NULL;
+	char *inferred_backlink = NULL;
 	const char *repair = NULL;
 	int err;
 
@@ -698,12 +699,24 @@  void repair_worktree_at_path(const char *path,
 		goto done;
 	}
 
+	inferred_backlink = infer_backlink(realdotgit.buf);
 	backlink = xstrdup_or_null(read_gitfile_gently(realdotgit.buf, &err));
 	if (err == READ_GITFILE_ERR_NOT_A_FILE) {
 		fn(1, realdotgit.buf, _("unable to locate repository; .git is not a file"), cb_data);
 		goto done;
 	} else if (err == READ_GITFILE_ERR_NOT_A_REPO) {
-		if (!(backlink = infer_backlink(realdotgit.buf))) {
+		if (inferred_backlink) {
+			/*
+			 * Worktree's .git file does not point at a repository
+			 * but we found a .git/worktrees/<id> in this
+			 * repository with the same <id> as recorded in the
+			 * worktree's .git file so make the worktree point at
+			 * the discovered .git/worktrees/<id>. (Note: backlink
+			 * is already NULL, so no need to free it first.)
+			 */
+			backlink = inferred_backlink;
+			inferred_backlink = NULL;
+		} else {
 			fn(1, realdotgit.buf, _("unable to locate repository; .git file does not reference a repository"), cb_data);
 			goto done;
 		}
@@ -712,6 +725,30 @@  void repair_worktree_at_path(const char *path,
 		goto done;
 	}
 
+	/*
+	 * If we got this far, either the worktree's .git file pointed at a
+	 * valid repository (i.e. read_gitfile_gently() returned success) or
+	 * the .git file did not point at a repository but we were able to
+	 * infer a suitable new value for the .git file by locating a
+	 * .git/worktrees/<id> in *this* repository corresponding to the <id>
+	 * recorded in the worktree's .git file.
+	 *
+	 * However, if, at this point, inferred_backlink is non-NULL (i.e. we
+	 * found a suitable .git/worktrees/<id> in *this* repository) *and* the
+	 * worktree's .git file points at a valid repository *and* those two
+	 * paths differ, then that indicates that the user probably *copied*
+	 * the main and linked worktrees to a new location as a unit rather
+	 * than *moving* them. Thus, the copied worktree's .git file actually
+	 * points at the .git/worktrees/<id> in the *original* repository, not
+	 * in the "copy" repository. In this case, point the "copy" worktree's
+	 * .git file at the "copy" repository.
+	 */
+	if (inferred_backlink && fspathcmp(backlink, inferred_backlink)) {
+		free(backlink);
+		backlink = inferred_backlink;
+		inferred_backlink = NULL;
+	}
+
 	strbuf_addf(&gitdir, "%s/gitdir", backlink);
 	if (strbuf_read_file(&olddotgit, gitdir.buf, 0) < 0)
 		repair = _("gitdir unreadable");
@@ -727,6 +764,7 @@  void repair_worktree_at_path(const char *path,
 	}
 done:
 	free(backlink);
+	free(inferred_backlink);
 	strbuf_release(&olddotgit);
 	strbuf_release(&gitdir);
 	strbuf_release(&realdotgit);