diff mbox series

[v6,23/28] fsmonitor: never set CE_FSMONITOR_VALID on submodules

Message ID d0e25f6bac663e9ae4d63322f102378dd2ecba84.1650662994.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Builtin FSMonitor Part 3 | expand

Commit Message

Jeff Hostetler April 22, 2022, 9:29 p.m. UTC
From: Jeff Hostetler <jeffhost@microsoft.com>

Never set CE_FSMONITOR_VALID on the cache-entry of submodule
directories.

During a client command like 'git status', we may need to recurse
into each submodule to compute a status summary for the submodule.
Since the purpose of the ce_flag is to let Git avoid scanning a
cache-entry, setting the flag causes the recursive call to be
avoided and we report incorrect (no status) for the submodule.

We created an OS watch on the root directory of our working
directory and we receive events for everything in the cone
under it.  When submodules are present inside our working
directory, we receive events for both our repo (the super) and
any subs within it.  Since our index doesn't have any information
for items within the submodules, we can't use those events.

We could try to truncate the paths of those events back to the
submodule boundary and mark the GITLINK as dirty, but that
feels expensive since we would have to prefix compare every FS
event that we receive against a list of submodule roots.  And
it still wouldn't be sufficient to correctly report status on
the submodule, since we don't have any space in the cache-entry
to cache the submodule's status (the 'SCMU' bits in porcelain
V2 speak).  That is, the CE_FSMONITOR_VALID bit just says that
we don't need to scan/inspect it because we already know the
answer -- it doesn't say that the item is clean -- and we
don't have space in the cache-entry to store those answers.
So we should always do the recursive scan.

Therefore, we should never set the flag on GITLINK cache-entries.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 fsmonitor.c                  |   2 +
 fsmonitor.h                  |  11 ++++
 t/t7527-builtin-fsmonitor.sh | 111 +++++++++++++++++++++++++++++++++++
 3 files changed, 124 insertions(+)

Comments

Johannes Schindelin May 12, 2022, 3:11 p.m. UTC | #1
Hi Jeff,

On Fri, 22 Apr 2022, Jeff Hostetler via GitGitGadget wrote:

> From: Jeff Hostetler <jeffhost@microsoft.com>
>
> Never set CE_FSMONITOR_VALID on the cache-entry of submodule
> directories.
>
> During a client command like 'git status', we may need to recurse
> into each submodule to compute a status summary for the submodule.
> Since the purpose of the ce_flag is to let Git avoid scanning a
> cache-entry, setting the flag causes the recursive call to be
> avoided and we report incorrect (no status) for the submodule.
>
> We created an OS watch on the root directory of our working
> directory and we receive events for everything in the cone
> under it.  When submodules are present inside our working
> directory, we receive events for both our repo (the super) and
> any subs within it.  Since our index doesn't have any information
> for items within the submodules, we can't use those events.
>
> We could try to truncate the paths of those events back to the
> submodule boundary and mark the GITLINK as dirty, but that
> feels expensive since we would have to prefix compare every FS
> event that we receive against a list of submodule roots.  And
> it still wouldn't be sufficient to correctly report status on
> the submodule, since we don't have any space in the cache-entry
> to cache the submodule's status (the 'SCMU' bits in porcelain
> V2 speak).  That is, the CE_FSMONITOR_VALID bit just says that
> we don't need to scan/inspect it because we already know the
> answer -- it doesn't say that the item is clean -- and we
> don't have space in the cache-entry to store those answers.
> So we should always do the recursive scan.
>
> Therefore, we should never set the flag on GITLINK cache-entries.
>
> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> ---
>  fsmonitor.c                  |   2 +
>  fsmonitor.h                  |  11 ++++
>  t/t7527-builtin-fsmonitor.sh | 111 +++++++++++++++++++++++++++++++++++
>  3 files changed, 124 insertions(+)
>
> diff --git a/fsmonitor.c b/fsmonitor.c
> index e1229c289cf..57d6a483bee 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -580,6 +580,8 @@ void tweak_fsmonitor(struct index_state *istate)
>  		if (fsmonitor_enabled) {
>  			/* Mark all entries valid */
>  			for (i = 0; i < istate->cache_nr; i++) {
> +				if (S_ISGITLINK(istate->cache[i]->ce_mode))
> +					continue;
>  				istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID;
>  			}
>
> diff --git a/fsmonitor.h b/fsmonitor.h
> index 3f41f653691..edf7ce5203b 100644
> --- a/fsmonitor.h
> +++ b/fsmonitor.h
> @@ -68,6 +68,15 @@ static inline int is_fsmonitor_refreshed(const struct index_state *istate)
>   * Set the given cache entries CE_FSMONITOR_VALID bit. This should be
>   * called any time the cache entry has been updated to reflect the
>   * current state of the file on disk.
> + *
> + * However, never mark submodules as valid.  When commands like "git
> + * status" run they might need to recurse into the submodule (using a
> + * child process) to get a summary of the submodule state.  We don't
> + * have (and don't want to create) the facility to translate every
> + * FS event that we receive and that happens to be deep inside of a
> + * submodule back to the submodule root, so we cannot correctly keep
> + * track of this bit on the gitlink directory.  Therefore, we never
> + * set it on submodules.
>   */
>  static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache_entry *ce)
>  {
> @@ -75,6 +84,8 @@ static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache
>
>  	if (fsm_mode > FSMONITOR_MODE_DISABLED &&
>  	    !(ce->ce_flags & CE_FSMONITOR_VALID)) {
> +		if (S_ISGITLINK(ce->ce_mode))
> +			return;
>  		istate->cache_changed = 1;
>  		ce->ce_flags |= CE_FSMONITOR_VALID;
>  		trace_printf_key(&trace_fsmonitor, "mark_fsmonitor_clean '%s'", ce->name);
> diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
> index d0e681d008f..4c49ae5a684 100755
> --- a/t/t7527-builtin-fsmonitor.sh
> +++ b/t/t7527-builtin-fsmonitor.sh
> @@ -721,4 +721,115 @@ do
>  	'
>  done
>
> +# Test fsmonitor interaction with submodules.
> +#
> +# If we start the daemon in the super, it will see FS events for
> +# everything in the working directory cone and this includes any
> +# files/directories contained *within* the submodules.
> +#
> +# A `git status` at top level will get events for items within the
> +# submodule and ignore them, since they aren't named in the index
> +# of the super repo.  This makes the fsmonitor response a little
> +# noisy, but it doesn't alter the correctness of the state of the
> +# super-proper.
> +#
> +# When we have submodules, `git status` normally does a recursive
> +# status on each of the submodules and adds a summary row for any
> +# dirty submodules.  (See the "S..." bits in porcelain V2 output.)
> +#
> +# It is therefore important that the top level status not be tricked
> +# by the FSMonitor response to skip those recursive calls.  That is,
> +# even if FSMonitor says that the mtime of the submodule directory
> +# hasn't changed and it could be implicitly marked valid, we must
> +# not take that shortcut.  We need to force the recusion into the
> +# submodule so that we get a summary of the status *within* the
> +# submodule.
> +
> +create_super () {
> +	super="$1" &&
> +
> +	git init "$super" &&
> +	echo x >"$super/file_1" &&
> +	echo y >"$super/file_2" &&
> +	echo z >"$super/file_3" &&
> +	mkdir "$super/dir_1" &&
> +	echo a >"$super/dir_1/file_11" &&
> +	echo b >"$super/dir_1/file_12" &&
> +	mkdir "$super/dir_1/dir_2" &&
> +	echo a >"$super/dir_1/dir_2/file_21" &&
> +	echo b >"$super/dir_1/dir_2/file_22" &&
> +	git -C "$super" add . &&
> +	git -C "$super" commit -m "initial $super commit"
> +}
> +
> +create_sub () {
> +	sub="$1" &&
> +
> +	git init "$sub" &&
> +	echo x >"$sub/file_x" &&
> +	echo y >"$sub/file_y" &&
> +	echo z >"$sub/file_z" &&
> +	mkdir "$sub/dir_x" &&
> +	echo a >"$sub/dir_x/file_a" &&
> +	echo b >"$sub/dir_x/file_b" &&
> +	mkdir "$sub/dir_x/dir_y" &&
> +	echo a >"$sub/dir_x/dir_y/file_a" &&
> +	echo b >"$sub/dir_x/dir_y/file_b" &&
> +	git -C "$sub" add . &&
> +	git -C "$sub" commit -m "initial $sub commit"
> +}
> +
> +my_match_and_clean () {
> +	git -C super --no-optional-locks status --porcelain=v2 >actual.with &&
> +	git -C super --no-optional-locks -c core.fsmonitor=false \
> +		status --porcelain=v2 >actual.without &&
> +	test_cmp actual.with actual.without &&
> +
> +	git -C super/dir_1/dir_2/sub reset --hard &&
> +	git -C super/dir_1/dir_2/sub clean -d -f
> +}
> +
> +test_expect_success "Submodule always visited" '

I almost feel bad offering this nit: could you use single-quotes, and
start with a lower-case `s`?

Thanks,
Dscho

> +	test_when_finished "git -C super fsmonitor--daemon stop; \
> +			    rm -rf super; \
> +			    rm -rf sub" &&
> +
> +	create_super super &&
> +	create_sub sub &&
> +
> +	git -C super submodule add ../sub ./dir_1/dir_2/sub &&
> +	git -C super commit -m "add sub" &&
> +
> +	start_daemon -C super &&
> +	git -C super config core.fsmonitor true &&
> +	git -C super update-index --fsmonitor &&
> +	git -C super status &&
> +
> +	# Now run pairs of commands w/ and w/o FSMonitor while we make
> +	# some dirt in the submodule and confirm matching output.
> +
> +	# Completely clean status.
> +	my_match_and_clean &&
> +
> +	# .M S..U
> +	echo z >super/dir_1/dir_2/sub/dir_x/dir_y/foobar_u &&
> +	my_match_and_clean &&
> +
> +	# .M S.M.
> +	echo z >super/dir_1/dir_2/sub/dir_x/dir_y/foobar_m &&
> +	git -C super/dir_1/dir_2/sub add . &&
> +	my_match_and_clean &&
> +
> +	# .M S.M.
> +	echo z >>super/dir_1/dir_2/sub/dir_x/dir_y/file_a &&
> +	git -C super/dir_1/dir_2/sub add . &&
> +	my_match_and_clean &&
> +
> +	# .M SC..
> +	echo z >>super/dir_1/dir_2/sub/dir_x/dir_y/file_a &&
> +	git -C super/dir_1/dir_2/sub add . &&
> +	git -C super/dir_1/dir_2/sub commit -m "SC.." &&
> +	my_match_and_clean
> +'
> +
>  test_done
> --
> gitgitgadget
>
>
diff mbox series

Patch

diff --git a/fsmonitor.c b/fsmonitor.c
index e1229c289cf..57d6a483bee 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -580,6 +580,8 @@  void tweak_fsmonitor(struct index_state *istate)
 		if (fsmonitor_enabled) {
 			/* Mark all entries valid */
 			for (i = 0; i < istate->cache_nr; i++) {
+				if (S_ISGITLINK(istate->cache[i]->ce_mode))
+					continue;
 				istate->cache[i]->ce_flags |= CE_FSMONITOR_VALID;
 			}
 
diff --git a/fsmonitor.h b/fsmonitor.h
index 3f41f653691..edf7ce5203b 100644
--- a/fsmonitor.h
+++ b/fsmonitor.h
@@ -68,6 +68,15 @@  static inline int is_fsmonitor_refreshed(const struct index_state *istate)
  * Set the given cache entries CE_FSMONITOR_VALID bit. This should be
  * called any time the cache entry has been updated to reflect the
  * current state of the file on disk.
+ *
+ * However, never mark submodules as valid.  When commands like "git
+ * status" run they might need to recurse into the submodule (using a
+ * child process) to get a summary of the submodule state.  We don't
+ * have (and don't want to create) the facility to translate every
+ * FS event that we receive and that happens to be deep inside of a
+ * submodule back to the submodule root, so we cannot correctly keep
+ * track of this bit on the gitlink directory.  Therefore, we never
+ * set it on submodules.
  */
 static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache_entry *ce)
 {
@@ -75,6 +84,8 @@  static inline void mark_fsmonitor_valid(struct index_state *istate, struct cache
 
 	if (fsm_mode > FSMONITOR_MODE_DISABLED &&
 	    !(ce->ce_flags & CE_FSMONITOR_VALID)) {
+		if (S_ISGITLINK(ce->ce_mode))
+			return;
 		istate->cache_changed = 1;
 		ce->ce_flags |= CE_FSMONITOR_VALID;
 		trace_printf_key(&trace_fsmonitor, "mark_fsmonitor_clean '%s'", ce->name);
diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index d0e681d008f..4c49ae5a684 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -721,4 +721,115 @@  do
 	'
 done
 
+# Test fsmonitor interaction with submodules.
+#
+# If we start the daemon in the super, it will see FS events for
+# everything in the working directory cone and this includes any
+# files/directories contained *within* the submodules.
+#
+# A `git status` at top level will get events for items within the
+# submodule and ignore them, since they aren't named in the index
+# of the super repo.  This makes the fsmonitor response a little
+# noisy, but it doesn't alter the correctness of the state of the
+# super-proper.
+#
+# When we have submodules, `git status` normally does a recursive
+# status on each of the submodules and adds a summary row for any
+# dirty submodules.  (See the "S..." bits in porcelain V2 output.)
+#
+# It is therefore important that the top level status not be tricked
+# by the FSMonitor response to skip those recursive calls.  That is,
+# even if FSMonitor says that the mtime of the submodule directory
+# hasn't changed and it could be implicitly marked valid, we must
+# not take that shortcut.  We need to force the recusion into the
+# submodule so that we get a summary of the status *within* the
+# submodule.
+
+create_super () {
+	super="$1" &&
+
+	git init "$super" &&
+	echo x >"$super/file_1" &&
+	echo y >"$super/file_2" &&
+	echo z >"$super/file_3" &&
+	mkdir "$super/dir_1" &&
+	echo a >"$super/dir_1/file_11" &&
+	echo b >"$super/dir_1/file_12" &&
+	mkdir "$super/dir_1/dir_2" &&
+	echo a >"$super/dir_1/dir_2/file_21" &&
+	echo b >"$super/dir_1/dir_2/file_22" &&
+	git -C "$super" add . &&
+	git -C "$super" commit -m "initial $super commit"
+}
+
+create_sub () {
+	sub="$1" &&
+
+	git init "$sub" &&
+	echo x >"$sub/file_x" &&
+	echo y >"$sub/file_y" &&
+	echo z >"$sub/file_z" &&
+	mkdir "$sub/dir_x" &&
+	echo a >"$sub/dir_x/file_a" &&
+	echo b >"$sub/dir_x/file_b" &&
+	mkdir "$sub/dir_x/dir_y" &&
+	echo a >"$sub/dir_x/dir_y/file_a" &&
+	echo b >"$sub/dir_x/dir_y/file_b" &&
+	git -C "$sub" add . &&
+	git -C "$sub" commit -m "initial $sub commit"
+}
+
+my_match_and_clean () {
+	git -C super --no-optional-locks status --porcelain=v2 >actual.with &&
+	git -C super --no-optional-locks -c core.fsmonitor=false \
+		status --porcelain=v2 >actual.without &&
+	test_cmp actual.with actual.without &&
+
+	git -C super/dir_1/dir_2/sub reset --hard &&
+	git -C super/dir_1/dir_2/sub clean -d -f
+}
+
+test_expect_success "Submodule always visited" '
+	test_when_finished "git -C super fsmonitor--daemon stop; \
+			    rm -rf super; \
+			    rm -rf sub" &&
+
+	create_super super &&
+	create_sub sub &&
+
+	git -C super submodule add ../sub ./dir_1/dir_2/sub &&
+	git -C super commit -m "add sub" &&
+
+	start_daemon -C super &&
+	git -C super config core.fsmonitor true &&
+	git -C super update-index --fsmonitor &&
+	git -C super status &&
+
+	# Now run pairs of commands w/ and w/o FSMonitor while we make
+	# some dirt in the submodule and confirm matching output.
+
+	# Completely clean status.
+	my_match_and_clean &&
+
+	# .M S..U
+	echo z >super/dir_1/dir_2/sub/dir_x/dir_y/foobar_u &&
+	my_match_and_clean &&
+
+	# .M S.M.
+	echo z >super/dir_1/dir_2/sub/dir_x/dir_y/foobar_m &&
+	git -C super/dir_1/dir_2/sub add . &&
+	my_match_and_clean &&
+
+	# .M S.M.
+	echo z >>super/dir_1/dir_2/sub/dir_x/dir_y/file_a &&
+	git -C super/dir_1/dir_2/sub add . &&
+	my_match_and_clean &&
+
+	# .M SC..
+	echo z >>super/dir_1/dir_2/sub/dir_x/dir_y/file_a &&
+	git -C super/dir_1/dir_2/sub add . &&
+	git -C super/dir_1/dir_2/sub commit -m "SC.." &&
+	my_match_and_clean
+'
+
 test_done