mbox series

[v2,0/3] Empty untracked cache performance issue

Message ID pull.986.v2.git.1645809015.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Empty untracked cache performance issue | expand

Message

Philippe Blain via GitGitGadget Feb. 25, 2022, 5:10 p.m. UTC
This patchset addresses a performance issue with untracked cache. When a new
untracked cache structure is added to the index but remains empty,
subsequent "git status" calls populate it but do not write the index - so
they perform as though the index were disabled.

This situation can be caused in several different ways:

 * Running "git update-index --untracked-cache" on a repo that did not have
   the untracked cache
 * Modifying the git configuration to enable untracked cache, but then
   immediately running "git status -uall", causing the untracked cache to be
   created, but not used/populated (and the index written).
 * (likely others)

The patchset includes fixes to t7519, which otherwise starts failing because
it wasn't testing what it intended to.

Tao Klerks (3):
  t7519: avoid file to index mtime race for untracked cache
  t7519: populate untracked cache before test
  untracked-cache: write index when populating empty untracked cache

 dir.c                       | 10 +++++++---
 t/t7519-status-fsmonitor.sh |  7 +++++++
 2 files changed, 14 insertions(+), 3 deletions(-)


base-commit: 4c53a8c20f8984adb226293a3ffd7b88c3f4ac1a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-986%2FTaoK%2Ftaok-empty-untracked-cache-bug-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-986/TaoK/taok-empty-untracked-cache-bug-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/986

Range-diff vs v1:

 1:  c4d2a3cab4b ! 1:  9421b71540d Add a second's delay to t7519 for untracked cache
     @@ Metadata
      Author: Tao Klerks <tao@klerks.biz>
      
       ## Commit message ##
     -    Add a second's delay to t7519 for untracked cache
     +    t7519: avoid file to index mtime race for untracked cache
      
          In t7519 there is a test that writes files to disk, and immediately
     -    writes the untracked index. Because of mtime-comparison logic
     -    that uses a 1-second resolution, this means the cached entries
     -    are not trusted/used under some circumstances (see
     -    read-cache.c#is_racy_stat()).
     +    writes the untracked index. Because of mtime-comparison logic that
     +    uses a 1-second resolution, this means the cached entries are not
     +    trusted/used under some circumstances
     +    (see read-cache.c#is_racy_stat()).
      
     -    Untracked cache tests in t7063 use a 1-second delay to avoid
     -    this issue. We should do the same here.
     +    Untracked cache tests in t7063 use a 1-second delay to avoid this
     +    issue, but we don't want to introduce arbitrary slowdowns, so instead
     +    use test-tool chmtime to backdate the files slightly.
      
     -    This change doesn't actually affect the outcome of the test,
     -    but does enhance its validity, and becomes relevant after
     -    later changes
     +    This change doesn't actually affect the outcome of the test, but does
     +    enhance its validity, and becomes relevant after later changes.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      
       ## t/t7519-status-fsmonitor.sh ##
     -@@ t/t7519-status-fsmonitor.sh: clean_repo () {
     - 	git clean -fd
     - }
     - 
     -+avoid_racy() {
     -+	sleep 1
     -+}
     -+
     - dirty_repo () {
     - 	: >untracked &&
     - 	: >dir1/untracked &&
      @@ t/t7519-status-fsmonitor.sh: test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
     + 		cd dot-git &&
     + 		mkdir -p .git/hooks &&
     + 		: >tracked &&
     ++		test-tool chmtime =-60 tracked &&
     + 		: >modified &&
     ++		test-tool chmtime =-60 modified &&
     + 		mkdir dir1 &&
     + 		: >dir1/tracked &&
     ++		test-tool chmtime =-60 dir1/tracked &&
     + 		: >dir1/modified &&
     ++		test-tool chmtime =-60 dir1/modified &&
     + 		mkdir dir2 &&
     + 		: >dir2/tracked &&
     ++		test-tool chmtime =-60 dir2/tracked &&
       		: >dir2/modified &&
     ++		test-tool chmtime =-60 dir2/modified &&
       		write_integration_script &&
       		git config core.fsmonitor .git/hooks/fsmonitor-test &&
     -+		avoid_racy &&
       		git update-index --untracked-cache &&
     - 		git update-index --fsmonitor &&
     - 		GIT_TRACE2_PERF="$TRASH_DIRECTORY/trace-before" \
 2:  d63faad03a4 ! 2:  d29a68e65a0 In t7519, populate untracked cache before test
     @@ Metadata
      Author: Tao Klerks <tao@klerks.biz>
      
       ## Commit message ##
     -    In t7519, populate untracked cache before test
     +    t7519: populate untracked cache before test
      
     -    In its current state, the t7519 test dealing with untracked
     -    cache assumes that
     -    "git update-index --untracked-cache" will *populate*
     -    the untracked cache. This is not correct - it will only add
     -    an empty untracked cache structure to the index.
     +    In its current state, the t7519 test dealing with untracked cache
     +    assumes that "git update-index --untracked-cache" will *populate* the
     +    untracked cache. This is not correct - it will only add an empty
     +    untracked cache structure to the index.
      
     -    If we're going to compare two git status runs with
     -    something interesting happening in-between, we
     -    need to ensure that the index is in a stable/steady
     -    state *before* that first run.
     +    If we're going to compare two git status runs with something
     +    interesting happening in-between, we need to ensure that the index is
     +    in a stable/steady state *before* that first run.
      
     -    We achieve this by adding another prior "git status"
     -    run.
     +    Achieve this by adding another prior "git status" run.
      
     -    At this stage this change does nothing, because there
     -    is a bug, addressed in the next patch. whereby once
     -    the empty untracked cache structure is added by the
     -    update-index invocation, the untracked cache gets
     -    updated in every subsequent "git status" call, but the
     -    index with these updates does not get written down.
     +    At this stage this change does nothing, because there is a bug,
     +    addressed in the next patch, whereby once the empty untracked cache
     +    structure is added by the update-index invocation, the untracked cache
     +    gets updated in every subsequent "git status" call, but the index with
     +    these updates does not get written down.
      
     -    That bug actually invalidates this entire test case -
     -    but we're fixing that next.
     +    That bug actually invalidates this entire test case - but we're fixing
     +    that next.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      
       ## t/t7519-status-fsmonitor.sh ##
      @@ t/t7519-status-fsmonitor.sh: test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR'
     - 		avoid_racy &&
     + 		git config core.fsmonitor .git/hooks/fsmonitor-test &&
       		git update-index --untracked-cache &&
       		git update-index --fsmonitor &&
      +		git status &&
 3:  627f1952fd8 ! 3:  190b27e518a Write index when populating empty untracked cache
     @@ Metadata
      Author: Tao Klerks <tao@klerks.biz>
      
       ## Commit message ##
     -    Write index when populating empty untracked cache
     +    untracked-cache: write index when populating empty untracked cache
      
     -    It is expected that an empty/unpopulated untracked
     -    cache structure can be written to the index - by update-
     -    index, or by a "git status" call that sees the untracked cache
     -    should be enabled, but is running with options that make
     -    it non-applicable in that run.
     +    It is expected that an empty/unpopulated untracked cache structure can
     +    be written to the index - by update-index, or by a "git status" call
     +    that sees the untracked cache should be enabled and is not, but is
     +    running with options that make the untracked cache non-applicable in
     +    that run (eg a pathspec).
      
     -    Currently, if that happens, then subsequent "git status"
     -    calls end up populating the untracked cache, but not
     -    writing the index (not saving their work) - so the
     -    performance outcome is almost identical to the cache
     -    being altogether disabled.
     +    Currently, if that happens, then subsequent "git status" calls end up
     +    populating the untracked cache, but not writing the index (not saving
     +    their work) - so the performance outcome is almost identical to the
     +    cache being altogether disabled.
      
     -    This continues until the index gets written with the cache
     -    populated, for some *other* reason.
     +    This continues until the index gets written with the untracked cache
     +    populated, for some *other* reason, such as a working tree change.
      
     -    In this change, we detect the "existing cache is empty
     -    and it looks like we are using it" condition, and queues
     -    an index write when this happens.
     +    Detect the condition where an empty untracked cache exists in the
     +    index and we will collect the list of untracked paths, and queue an
     +    index write under that condition, so that the collected untracked
     +    paths can be written out to the untracked cache extension in the
     +    index.
      
     -    This change depends on previous fixes to t7519 for the
     -    "ignore .git changes when invalidating UNTR" test case to
     -    pass - before this fix, the test never actually did anything
     -    as it was not set up correctly.
     +    This change depends on previous fixes to t7519 for the "ignore .git
     +    changes when invalidating UNTR" test case to pass - before this fix,
     +    the test never actually did anything as it was not set up correctly.
      
          Signed-off-by: Tao Klerks <tao@klerks.biz>
      
     @@ dir.c: static struct untracked_cache_dir *validate_untracked_cache(struct dir_st
       
      -	if (!dir->untracked->root)
      +	if (!dir->untracked->root) {
     ++		/* Untracked cache existed but is not initialized; fix that */
       		FLEX_ALLOC_STR(dir->untracked->root, name, "");
     -+		/*
     -+		 * If we've had to initialize the root, then what we had was an
     -+		 * empty uninitialized untracked cache structure. We will be
     -+		 * populating it now, so we should trigger an index write.
     -+		 */
      +		istate->cache_changed |= UNTRACKED_CHANGED;
      +	}