diff mbox series

[diff-lib] Fix check_removed when fsmonitor is on

Message ID 20230906060241.944886-2-sokcevic@google.com (mailing list archive)
State New, archived
Headers show
Series [diff-lib] Fix check_removed when fsmonitor is on | expand

Commit Message

Josip Sokcevic Sept. 6, 2023, 6:02 a.m. UTC
git-diff-index may return incorrect deleted entries when fsmonitor is used in a
repository with git submodules. This can be observed on Mac machines, but it
can affect all other supported platforms too.

If fsmonitor is used, `stat *st` may not be initialized. Since `lstat` calls
aren't not desired when fsmonitor is on, skip the entire gitlink check using
the same condition used to initialize `stat *st`.
---
This patch is using jonathantanmy@ test case, which now passes. It's
possible there are similar edge cases with fsmonitor, so I'll do more
extensive e2e testing tomorrow.
 diff-lib.c                   | 2 +-
 t/t7527-builtin-fsmonitor.sh | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

Comments

Jonathan Tan Sept. 6, 2023, 8:37 p.m. UTC | #1
Josip Sokcevic <sokcevic@google.com> writes:
> git-diff-index may return incorrect deleted entries when fsmonitor is used in a
> repository with git submodules. This can be observed on Mac machines, but it
> can affect all other supported platforms too.
> 
> If fsmonitor is used, `stat *st` may not be initialized. Since `lstat` calls
> aren't not desired when fsmonitor is on, skip the entire gitlink check using
> the same condition used to initialize `stat *st`.

Ah, that's a great catch. Thanks for narrowing it down to
check_removed(). 

I see from "git blame" that this was introduced in 4f3d6d0261
(fsmonitor: skip lstat deletion check during git diff-index, 2021-03-
17). In that commit, subsequent code in check_removed() already used
"st" when it was possibly uninitialized, so perhaps no one noticed it.
I don't think anyone discussed it either on the mailing list thread that
introduced this patch [1].

[1] https://lore.kernel.org/git/pull.903.git.1615760258.gitgitgadget@gmail.com/

Looking at this function in more detail, though, I see that the solution
in this patch set is incomplete - both before and after this patch,
there are code paths in which "st" is never initialized, but "st->mode"
is used (not in check_removed(), but in its callers).

I think a more complete solution needs to look something like this:

        diff --git a/diff-lib.c b/diff-lib.c
        index d8aa777a73..5637237003 100644
        --- a/diff-lib.c
        +++ b/diff-lib.c
        @@ -39,10 +39,20 @@
         static int check_removed(const struct index_state *istate, const struct cache_entry *ce, struct stat *st)
         {
                assert(is_fsmonitor_refreshed(istate));
        -       if (!(ce->ce_flags & CE_FSMONITOR_VALID) && lstat(ce->name, st) < 0) {
        -               if (!is_missing_file_error(errno))
        -                       return -1;
        -               return 1;
        +       if (ce->ce_flags & CE_FSMONITOR_VALID) {
        +               /*
        +                * Both check_removed() and its callers expect lstat() to have
        +                * happened and, in particular, the st_mode field to be set.
        +                * Simulate this with the contents of ce.
        +                */
        +               memset(st, 0, sizeof(*st));
        +               st->st_mode = ce->ce_mode;
        +       } else {
        +               if (lstat(ce->name, st) < 0) {
        +                       if (!is_missing_file_error(errno))
        +                               return -1;
        +                       return 1;
        +               }
                }
                if (has_symlink_leading_path(ce->name, ce_namelen(ce)))
                        return 1;

Also, regarding the commit message, the title should be formatted like:

  diff-lib: simulate stat when using fsmonitor

or something like that (file, then colon, then message).

You'll also need a sign-off. See the [[sign-off]] section in
Documentation/SubmittingPatches for more information.

> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index 0c241d6c14..894a80fbe8 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -824,6 +824,10 @@ test_expect_success 'submodule always visited' '
>  
>  	create_super super &&
>  	create_sub sub &&
> +	git -C super --no-optional-locks diff-index --name-status HEAD >actual.with &&
> +	git -C super --no-optional-locks -c core.fsmonitor=false \
> +	    diff-index --name-status HEAD >actual.without &&
> +	    test_cmp actual.with actual.without &&
>  
>  	git -C super submodule add ../sub ./dir_1/dir_2/sub &&
>  	git -C super commit -m "add sub" &&

Any reason why these additions are here instead of where I put them (in
my_match_and_clean())? I think it makes more sense where I put them,
as then they would automatically apply to all tests.

> @@ -837,6 +841,8 @@ test_expect_success 'submodule always visited' '
>  	# some dirt in the submodule and confirm matching output.
>  
>  	# Completely clean status.
> +	echo Now running for clean status &&
> +
>  	my_match_and_clean &&
>  
>  	# .M S..U

You can remove this - I included this to show exactly where in the code
the test fails.
diff mbox series

Patch

diff --git a/diff-lib.c b/diff-lib.c
index d8aa777a73..c67aa5ff89 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -46,7 +46,7 @@  static int check_removed(const struct index_state *istate, const struct cache_en
 	}
 	if (has_symlink_leading_path(ce->name, ce_namelen(ce)))
 		return 1;
-	if (S_ISDIR(st->st_mode)) {
+	if (!(ce->ce_flags & CE_FSMONITOR_VALID) && S_ISDIR(st->st_mode)) {
 		struct object_id sub;
 
 		/*
diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index 0c241d6c14..894a80fbe8 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -824,6 +824,10 @@  test_expect_success 'submodule always visited' '
 
 	create_super super &&
 	create_sub sub &&
+	git -C super --no-optional-locks diff-index --name-status HEAD >actual.with &&
+	git -C super --no-optional-locks -c core.fsmonitor=false \
+	    diff-index --name-status HEAD >actual.without &&
+	    test_cmp actual.with actual.without &&
 
 	git -C super submodule add ../sub ./dir_1/dir_2/sub &&
 	git -C super commit -m "add sub" &&
@@ -837,6 +841,8 @@  test_expect_success 'submodule always visited' '
 	# some dirt in the submodule and confirm matching output.
 
 	# Completely clean status.
+	echo Now running for clean status &&
+
 	my_match_and_clean &&
 
 	# .M S..U