diff mbox series

[5/6] Comment important codepaths regarding nuking untracked files/dirs

Message ID 60c5d6b4615a6ac4179ec6c10e17cca480bc147a.1632006924.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Fix various issues around removal of untracked files/directories | expand

Commit Message

Elijah Newren Sept. 18, 2021, 11:15 p.m. UTC
From: Elijah Newren <newren@gmail.com>

In the last few commits we focused on code in unpack-trees.c that
mistakenly removed untracked files or directories.  There may be more of
those, but in this commit we change our focus: callers of toplevel
commands that are expected to remove untracked files or directories.

As noted previously, we have toplevel commands that are expected to
delete untracked files such as 'read-tree --reset', 'reset --hard', and
'checkout --force'.  However, that does not mean that other highlevel
commands that happen to call these other commands thought about or
conveyed to users the possibility that untracked files could be removed.
Audit the code for such callsites, and add comments near existing
callsites to mention whether these are safe or not.

My auditing is somewhat incomplete, though; it skipped several cases:
  * git-rebase--preserve-merges.sh: is in the process of being
    deprecated/removed, so I won't leave a note that there are
    likely more bugs in that script.
  * contrib/git-new-workdir: why is the -f flag being used in a new
    empty directory??  It shouldn't hurt, but it seems useless.
  * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
    not and is just superfluous), but I'm not at all familiar with
    the p4 stuff
  * git-archimport.perl: Don't care; arch is long since dead
  * git-cvs*.perl: Don't care; cvs is long since dead

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/stash.c             | 1 +
 builtin/submodule--helper.c | 4 ++++
 builtin/worktree.c          | 5 +++++
 contrib/rerere-train.sh     | 2 +-
 submodule.c                 | 1 +
 5 files changed, 12 insertions(+), 1 deletion(-)

Comments

Luke Diamand Sept. 24, 2021, 11:47 a.m. UTC | #1
On Sun, 19 Sept 2021 at 00:15, Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Elijah Newren <newren@gmail.com>
>
> In the last few commits we focused on code in unpack-trees.c that
> mistakenly removed untracked files or directories.  There may be more of
> those, but in this commit we change our focus: callers of toplevel
> commands that are expected to remove untracked files or directories.
>
> As noted previously, we have toplevel commands that are expected to
> delete untracked files such as 'read-tree --reset', 'reset --hard', and
> 'checkout --force'.  However, that does not mean that other highlevel
> commands that happen to call these other commands thought about or
> conveyed to users the possibility that untracked files could be removed.
> Audit the code for such callsites, and add comments near existing
> callsites to mention whether these are safe or not.
>
> My auditing is somewhat incomplete, though; it skipped several cases:
>   * git-rebase--preserve-merges.sh: is in the process of being
>     deprecated/removed, so I won't leave a note that there are
>     likely more bugs in that script.
>   * contrib/git-new-workdir: why is the -f flag being used in a new
>     empty directory??  It shouldn't hurt, but it seems useless.
>   * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
>     not and is just superfluous), but I'm not at all familiar with
>     the p4 stuff

Assuming you're talking about this code in git-p4.py:

            print("Synchronizing p4 checkout...")
            if new_client_dir:
                # old one was destroyed, and maybe nobody told p4
                p4_sync("...", "-f")
            else:
                p4_sync("...")

This is doing a Perforce sync in the P4 repo, not the git repo.

In the usual/happy case, this directory already exists, the Perforce
server knows about its state, and a normal "p4 sync ..." will bring it
up to date.

But, if someone manually deleted the directory then "p4 sync ..." will
only update modified files, and all sorts of things will then go wrong
(e.g. the files we updated in the git view won't be present, and
git-p4 will fall flat on its face).

So in this case, do a forced sync, which syncs everything ignoring the
P4 server's idea of what files are/not present.

Luke
Elijah Newren Sept. 24, 2021, 1:41 p.m. UTC | #2
On Fri, Sep 24, 2021 at 4:47 AM Luke Diamand <luke@diamand.org> wrote:
>
> On Sun, 19 Sept 2021 at 00:15, Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Elijah Newren <newren@gmail.com>
> >
> > In the last few commits we focused on code in unpack-trees.c that
> > mistakenly removed untracked files or directories.  There may be more of
> > those, but in this commit we change our focus: callers of toplevel
> > commands that are expected to remove untracked files or directories.
> >
> > As noted previously, we have toplevel commands that are expected to
> > delete untracked files such as 'read-tree --reset', 'reset --hard', and
> > 'checkout --force'.  However, that does not mean that other highlevel
> > commands that happen to call these other commands thought about or
> > conveyed to users the possibility that untracked files could be removed.
> > Audit the code for such callsites, and add comments near existing
> > callsites to mention whether these are safe or not.
> >
> > My auditing is somewhat incomplete, though; it skipped several cases:
> >   * git-rebase--preserve-merges.sh: is in the process of being
> >     deprecated/removed, so I won't leave a note that there are
> >     likely more bugs in that script.
> >   * contrib/git-new-workdir: why is the -f flag being used in a new
> >     empty directory??  It shouldn't hurt, but it seems useless.
> >   * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
> >     not and is just superfluous), but I'm not at all familiar with
> >     the p4 stuff
>
> Assuming you're talking about this code in git-p4.py:
>
>             print("Synchronizing p4 checkout...")
>             if new_client_dir:
>                 # old one was destroyed, and maybe nobody told p4
>                 p4_sync("...", "-f")
>             else:
>                 p4_sync("...")

No, I was talking about this code:

                system([ "git", "checkout", "-f" ])
diff mbox series

Patch

diff --git a/builtin/stash.c b/builtin/stash.c
index 4ceb3581b47..ce5e0364c68 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -1519,6 +1519,7 @@  static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q
 		} else {
 			struct child_process cp = CHILD_PROCESS_INIT;
 			cp.git_cmd = 1;
+			/* BUG: this nukes untracked files in the way */
 			strvec_pushl(&cp.args, "reset", "--hard", "-q",
 				     "--no-recurse-submodules", NULL);
 			if (run_command(&cp)) {
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index ef2776a9e45..a49242d15ae 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2864,6 +2864,10 @@  static int add_submodule(const struct add_data *add_data)
 		prepare_submodule_repo_env(&cp.env_array);
 		cp.git_cmd = 1;
 		cp.dir = add_data->sm_path;
+		/*
+		 * NOTE: we only get here if add_data->force is true, so
+		 * passing --force to checkout is reasonable.
+		 */
 		strvec_pushl(&cp.args, "checkout", "-f", "-q", NULL);
 
 		if (add_data->branch) {
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 0d0a80da61f..383947ff54f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -356,6 +356,11 @@  static int add_worktree(const char *path, const char *refname,
 	if (opts->checkout) {
 		cp.argv = NULL;
 		strvec_clear(&cp.args);
+		/*
+		 * NOTE: reset --hard is okay here, because 'worktree add'
+		 * refuses to work in an extant non-empty directory, so there
+		 * is no risk of deleting untracked files.
+		 */
 		strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
 		if (opts->quiet)
 			strvec_push(&cp.args, "--quiet");
diff --git a/contrib/rerere-train.sh b/contrib/rerere-train.sh
index eeee45dd341..75125d6ae00 100755
--- a/contrib/rerere-train.sh
+++ b/contrib/rerere-train.sh
@@ -91,7 +91,7 @@  do
 		git checkout -q $commit -- .
 		git rerere
 	fi
-	git reset -q --hard
+	git reset -q --hard  # Might nuke untracked files...
 done
 
 if test -z "$branch"
diff --git a/submodule.c b/submodule.c
index 8e611fe1dbf..a9b71d585cf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1866,6 +1866,7 @@  static void submodule_reset_index(const char *path)
 
 	strvec_pushf(&cp.args, "--super-prefix=%s%s/",
 		     get_super_prefix_or_empty(), path);
+	/* TODO: determine if this might overwright untracked files */
 	strvec_pushl(&cp.args, "read-tree", "-u", "--reset", NULL);
 
 	strvec_push(&cp.args, empty_tree_oid_hex());