mbox series

[v6,0/2] Support untracked cache with '--untracked-files=all' if configured

Message ID pull.985.v6.git.1648742535.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Support untracked cache with '--untracked-files=all' if configured | expand

Message

Philippe Blain via GitGitGadget March 31, 2022, 4:02 p.m. UTC
Make it possible for users of the -uall flag to git status, either by
preference or by need (eg UI tooling), to benefit from the untracked cache
when they set their 'status.showuntrackedfiles' config setting to 'all'.
This is especially useful for large repos in Windows, where without
untracked cache "git status" times can easily be 5x higher, and GUI tooling
typically does use -uall.

In this sixth version, clarify the main commit message and some of the code
comments, and adjust the logic slightly such that an existing untracked
cache structure, when consistent with the requested flags in the current
run, can be reused even though the current config would set/store/use the
other set of flags on a new untracked cache structure.

I'm comfortable with this patch as-is, but am still interested in any
thoughts as to whether it makes sense and is likely to be accepted to do
this as a simple enhancement as proposed here, or whether people be more
comfortable with a new configuration option, given the potential for worse
performance under specific (and, I believe, vanishingly rare) circumstances.

Tao Klerks (2):
  untracked-cache: test untracked-cache-bypassing behavior with -uall
  untracked-cache: support '--untracked-files=all' if configured

 dir.c                             |  88 ++++++++++++++++++-----
 t/t7063-status-untracked-cache.sh | 113 ++++++++++++++++++++++++++++++
 2 files changed, 185 insertions(+), 16 deletions(-)


base-commit: 805e0a68082a217f0112db9ee86a022227a9c81b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-985%2FTaoK%2Ftaok-untracked-cache-with-uall-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-985/TaoK/taok-untracked-cache-with-uall-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/985

Range-diff vs v5:

 1:  98a4d8f35c5 ! 1:  f27f018493a untracked-cache: test untracked-cache-bypassing behavior with -uall
     @@ Commit message
          '--untracked-files=normal', and it gets ignored when
          '--untracked-files=all' is specified instead.
      
     -    Add explicit tests for this behavior.
     +    Add explicit tests for this known as-designed behavior.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'untracked cache after se
      +?? three
      +EOF
      +
     -+# Bypassing the untracked cache here is not desirable, but it expected
     -+# in the current implementation
     ++# Bypassing the untracked cache here is not desirable from an
     ++# end-user perspective, but is expected in the current design.
     ++# The untracked cache data stored for a -unormal run cannot be
     ++# correctly used in a -uall run - it would yield incorrect output.
      +test_expect_success 'untracked cache is bypassed with -uall' '
      +	: >../trace.output &&
      +	GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace.output" \
 2:  f60d2c6e36c ! 2:  0095ec49c5f untracked-cache: support '--untracked-files=all' if configured
     @@ Commit message
          untracked-cache: support '--untracked-files=all' if configured
      
          Untracked cache was originally designed to only work with
     -    '--untracked-files=normal', but this causes performance issues for UI
     -    tooling that wants to see "all" on a frequent basis. On the other hand,
     -    the conditions that prevented applicability to the "all" mode no
     -    longer seem to apply.
     +    "--untracked-files=normal", and is bypassed when
     +    "--untracked-files=all" is requested, but this causes performance
     +    issues for UI tooling that wants to see "all" on a frequent basis.
      
     -    The disqualification of untracked cache has a particularly significant
     -    impact on Windows with the defaulted fscache, where the introduction
     -    of fsmonitor can make the first and costly directory-iteration happen
     -    in untracked file detection, single-threaded, rather than in
     -    preload-index on multiple threads. Improving the performance of a
     -    "normal" 'git status' run with fsmonitor can make
     -    'git status --untracked-files=all' perform much worse.
     +    On the other hand, the conditions that altogether prevented
     +    applicability to the "all" mode no longer seem to apply, after
     +    several major refactors in recent years; this possibility was
     +    discussed in
     +    81153d02-8e7a-be59-e709-e90cd5906f3a@jeffhostetler.com and
     +    CABPp-BFiwzzUgiTj_zu+vF5x20L0=1cf25cHwk7KZQj2YkVzXw@mail.gmail.com,
     +    and somewhat confirmed experimentally by several users using a
     +    version of this patch to use untracked cache with -uall for about a
     +    year.
     +
     +    When 'git status' runs without using the untracked cache, on a large
     +    repo, on windows, with fsmonitor, it can run very slowly. This can
     +    make GUIs that need to use "-uall" (and therefore currently bypass
     +    untracked cache) unusable when fsmonitor is enabled, on such large
     +    repos.
      
          To partially address this, align the supported directory flags for the
          stored untracked cache data with the git config. If a user specifies
     @@ dir.c: static void new_untracked_cache(struct index_state *istate)
       		}
       	}
       }
     -@@ dir.c: void remove_untracked_cache(struct index_state *istate)
     - }
     - 
     - static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *dir,
     --						      int base_len,
     --						      const struct pathspec *pathspec,
     --						      struct index_state *istate)
     -+							    int base_len,
     -+							    const struct pathspec *pathspec,
     -+							    struct index_state *istate)
     - {
     - 	struct untracked_cache_dir *root;
     - 	static int untracked_cache_disabled = -1;
      @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_struct *d
       	if (base_len || (pathspec && pathspec->nr))
       		return NULL;
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
       		return NULL;
       	}
       
     -+	/*
     -+	 * We don't support using or preparing the untracked cache if
     -+	 * the current effective flags don't match the configured
     -+	 * flags.
     -+	 */
     -+	if (dir->flags != new_untracked_cache_flags(istate))
     -+		return NULL;
     -+
      +	/*
      +	 * If the untracked structure we received does not have the same flags
     -+	 * as configured, but the configured flags do match the effective flags,
     -+	 * then we need to reset / create a new "untracked" structure to match
     -+	 * the new config.
     -+	 * Keeping the saved and used untracked cache in-line with the
     -+	 * configuration provides an opportunity for frequent users of
     -+	 * "git status -uall" to leverage the untracked cache by aligning their
     -+	 * configuration (setting "status.showuntrackedfiles" to "all" or
     -+	 * "normal" as appropriate), where previously this option was
     -+	 * incompatible with untracked cache and *consistently* caused
     -+	 * surprisingly bad performance (with fscache and fsmonitor enabled) on
     -+	 * Windows.
     -+	 *
     -+	 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
     -+	 * to not be as bound up with the desired output in a given run,
     -+	 * and instead iterated through and stored enough information to
     -+	 * correctly serve both "modes", then users could get peak performance
     -+	 * with or without '-uall' regardless of their
     -+	 * "status.showuntrackedfiles" config.
     ++	 * as requested in this run, we're going to need to either discard the
     ++	 * existing structure (and potentially later recreate), or bypass the
     ++	 * untracked cache mechanism for this run.
      +	 */
      +	if (dir->flags != dir->untracked->dir_flags) {
     -+		free_untracked_cache(istate->untracked);
     -+		new_untracked_cache(istate, dir->flags);
     -+		dir->untracked = istate->untracked;
     ++		/*
     ++		 * If the untracked structure we received does not have the same flags
     ++		 * as configured, then we need to reset / create a new "untracked"
     ++		 * structure to match the new config.
     ++		 *
     ++		 * Keeping the saved and used untracked cache consistent with the
     ++		 * configuration provides an opportunity for frequent users of
     ++		 * "git status -uall" to leverage the untracked cache by aligning their
     ++		 * configuration - setting "status.showuntrackedfiles" to "all" or
     ++		 * "normal" as appropriate.
     ++		 *
     ++		 * Previously using -uall (or setting "status.showuntrackedfiles" to
     ++		 * "all") was incompatible with untracked cache and *consistently*
     ++		 * caused surprisingly bad performance (with fscache and fsmonitor
     ++		 * enabled) on Windows.
     ++		 *
     ++		 * IMPROVEMENT OPPORTUNITY: If we reworked the untracked cache storage
     ++		 * to not be as bound up with the desired output in a given run,
     ++		 * and instead iterated through and stored enough information to
     ++		 * correctly serve both "modes", then users could get peak performance
     ++		 * with or without '-uall' regardless of their
     ++		 * "status.showuntrackedfiles" config.
     ++		 */
     ++		if (dir->untracked->dir_flags != new_untracked_cache_flags(istate)) {
     ++			free_untracked_cache(istate->untracked);
     ++			new_untracked_cache(istate, dir->flags);
     ++			dir->untracked = istate->untracked;
     ++		}
     ++		else {
     ++			/*
     ++			 * Current untracked cache data is consistent with config, but not
     ++			 * usable in this request/run; just bypass untracked cache.
     ++			 */
     ++			return NULL;
     ++		}
      +	}
      +
       	if (!dir->untracked->root) {
     @@ t/t7063-status-untracked-cache.sh: test_expect_success 'untracked cache remains
      +	test_cmp ../trace.expect ../trace.relevant
      +'
      +
     -+# Bypassing the untracked cache here is not desirable, but it expected
     -+# in the current implementation
     ++# Bypassing the untracked cache here is not desirable from an
     ++# end-user perspective, but is expected in the current design.
     ++# The untracked cache data stored for a -all run cannot be
     ++# correctly used in a -unormal run - it would yield incorrect
     ++# output.
      +test_expect_success 'if -uall is configured, untracked cache is bypassed with -unormal' '
      +	test_config status.showuntrackedfiles all &&
      +	: >../trace.output &&