Message ID | 20220803045118.1243087-5-shaoxuan.yuan02@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | rm: integrate with sparse-index | expand |
On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote: > Enable the sparse index within the `git-rm` command. > > The `p2000` tests demonstrate a ~96% execution time reduction for > 'git rm' using a sparse index. Sorry that I got sidetracked yesterday when I was reviewing this series, but I noticed something looking at these results: > Test before after > ------------------------------------------------------------- > 2000.74: git rm -f f2/f4/a (full-v3) 0.66 0.88 +33.0% > 2000.75: git rm -f f2/f4/a (full-v4) 0.67 0.75 +12.0% The range of _growth_ here seemed odd, so I wanted to check if this was due to a small sample size or not. > 2000.76: git rm -f f2/f4/a (sparse-v3) 1.99 0.08 -96.0% > 2000.77: git rm -f f2/f4/a (sparse-v4) 2.06 0.07 -96.6% These numbers are as expected. > test_perf_on_all git read-tree -mu HEAD > test_perf_on_all git checkout-index -f --all > test_perf_on_all git update-index --add --remove $SPARSE_CONE/a > +test_perf_on_all git rm -f $SPARSE_CONE/a At first, I was confused why we needed '-f' and thought that maybe this was turning into a no-op after the first deletion. However, the test_perf_on_all helper does an "echo >>$SPARSE_CONE/a" before hand, so the file exists _in the worktree_ every time. That requires '-f' since otherwise Git complains that we have modifications. However, after the first instance the file no longer exists in the index, so we are losing some testing of the index modification. We can fix this by resetting the index in each test loop: test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a" Running this version of the test with GIT_PERF_REPEAT_COUNT=10 and using the Git repository itself, I get these numbers: Test HEAD~1 HEAD -------------------------------------------------------------------------- 2000.74: git rm ... (full-v3) 0.41(0.37+0.05) 0.43(0.36+0.07) +4.9% 2000.75: git rm ... (full-v4) 0.38(0.34+0.05) 0.39(0.35+0.05) +2.6% 2000.76: git rm ... (sparse-v3) 0.57(0.56+0.01) 0.05(0.05+0.00) -91.2% 2000.77: git rm ... (sparse-v4) 0.57(0.55+0.02) 0.03(0.03+0.00) -94.7% Yes, the 'git checkout' command is contributing to the overall numbers, but it also already has the performance improvements of the sparse-index, so it contributes only a little to the performance on the left. (Also note that the full index cases change only by amounts within reasonable noise. The repeat count helps there.) Thanks, -Stolee
On 8/4/2022 10:48 PM, Derrick Stolee wrote: > On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote: >> Enable the sparse index within the `git-rm` command. >> >> The `p2000` tests demonstrate a ~96% execution time reduction for >> 'git rm' using a sparse index. > > Sorry that I got sidetracked yesterday when I was reviewing this > series, but I noticed something looking at these results: > >> Test before after >> ------------------------------------------------------------- >> 2000.74: git rm -f f2/f4/a (full-v3) 0.66 0.88 +33.0% >> 2000.75: git rm -f f2/f4/a (full-v4) 0.67 0.75 +12.0% > > The range of _growth_ here seemed odd, so I wanted to check if this was > due to a small sample size or not. Yes, I do feel they are odd, as I've been checking pervious integrations and p2000 results, they usuallly fall below 10% range. But I was not discerning enough to name a problem :-( >> 2000.76: git rm -f f2/f4/a (sparse-v3) 1.99 0.08 -96.0% >> 2000.77: git rm -f f2/f4/a (sparse-v4) 2.06 0.07 -96.6% > > These numbers are as expected. > >> test_perf_on_all git read-tree -mu HEAD >> test_perf_on_all git checkout-index -f --all >> test_perf_on_all git update-index --add --remove $SPARSE_CONE/a >> +test_perf_on_all git rm -f $SPARSE_CONE/a > > At first, I was confused why we needed '-f' and thought that maybe > this was turning into a no-op after the first deletion. However, the > test_perf_on_all helper does an "echo >>$SPARSE_CONE/a" before hand, > so the file exists _in the worktree_ every time. That requires '-f' > since otherwise Git complains that we have modifications. Yeah, it took me some time to find out. > However, after the first instance the file no longer exists in the > index, so we are losing some testing of the index modification. So true, I didn't realize at all. > We can fix this by resetting the index in each test loop: > > test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a" > > Running this version of the test with GIT_PERF_REPEAT_COUNT=10 and > using the Git repository itself, I get these numbers: > > Test HEAD~1 HEAD > -------------------------------------------------------------------------- > 2000.74: git rm ... (full-v3) 0.41(0.37+0.05) 0.43(0.36+0.07) +4.9% > 2000.75: git rm ... (full-v4) 0.38(0.34+0.05) 0.39(0.35+0.05) +2.6% > 2000.76: git rm ... (sparse-v3) 0.57(0.56+0.01) 0.05(0.05+0.00) -91.2% > 2000.77: git rm ... (sparse-v4) 0.57(0.55+0.02) 0.03(0.03+0.00) -94.7% > > Yes, the 'git checkout' command is contributing to the overall > numbers, but it also already has the performance improvements of > the sparse-index, so it contributes only a little to the performance > on the left. > > (Also note that the full index cases change only by amounts within > reasonable noise. The repeat count helps there.) New thing learned, repeat to average out noise. -- Thanks, Shaoxuan
diff --git a/builtin/rm.c b/builtin/rm.c index 58ed924f0d..b6ba859fe4 100644 --- a/builtin/rm.c +++ b/builtin/rm.c @@ -287,6 +287,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix) if (!index_only) setup_work_tree(); + prepare_repo_settings(the_repository); + the_repository->settings.command_requires_full_index = 0; hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR); if (read_cache() < 0) diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh index c181110a43..853513eb9b 100755 --- a/t/perf/p2000-sparse-operations.sh +++ b/t/perf/p2000-sparse-operations.sh @@ -123,5 +123,6 @@ test_perf_on_all git blame $SPARSE_CONE/f3/a test_perf_on_all git read-tree -mu HEAD test_perf_on_all git checkout-index -f --all test_perf_on_all git update-index --add --remove $SPARSE_CONE/a +test_perf_on_all git rm -f $SPARSE_CONE/a test_done diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index 75649e3265..58632fe483 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -912,7 +912,7 @@ test_expect_success 'read-tree --prefix' ' test_all_match git read-tree --prefix=deep/deeper1/deepest -u deepest && test_all_match git status --porcelain=v2 && - test_all_match git rm -rf --sparse folder1/ && + run_on_all git rm -rf --sparse folder1/ && test_all_match git read-tree --prefix=folder1/ -u update-folder1 && test_all_match git status --porcelain=v2 && @@ -1870,7 +1870,7 @@ test_expect_success 'rm pathspec inside sparse definition' ' test_all_match git status --porcelain=v2 ' -test_expect_failure 'rm pathspec outside sparse definition' ' +test_expect_success 'rm pathspec outside sparse definition' ' init_repos && for file in folder1/a folder1/0/1 @@ -1910,7 +1910,7 @@ test_expect_failure 'rm pathspec outside sparse definition' ' test_sparse_match git status --porcelain=v2 ' -test_expect_failure 'sparse index is not expanded: rm' ' +test_expect_success 'sparse index is not expanded: rm' ' init_repos && ensure_not_expanded rm deep/a &&
Enable the sparse index within the `git-rm` command. The `p2000` tests demonstrate a ~96% execution time reduction for 'git rm' using a sparse index. Test before after ------------------------------------------------------------- 2000.74: git rm -f f2/f4/a (full-v3) 0.66 0.88 +33.0% 2000.75: git rm -f f2/f4/a (full-v4) 0.67 0.75 +12.0% 2000.76: git rm -f f2/f4/a (sparse-v3) 1.99 0.08 -96.0% 2000.77: git rm -f f2/f4/a (sparse-v4) 2.06 0.07 -96.6% Also, normalize a behavioral difference of `git-rm` under sparse-index. See related discussion [1]. `git-rm` a sparse-directory entry within a sparse-index enabled repo behaves differently from a sparse directory within a sparse-checkout enabled repo. For example, in a sparse-index repo, where 'folder1' is a sparse-directory entry, `git rm -r --sparse folder1` provides this: rm 'folder1/' Whereas in a sparse-checkout repo *without* sparse-index, doing so provides this: rm 'folder1/0/0/0' rm 'folder1/0/1' rm 'folder1/a' Because `git rm` a sparse-directory entry does not need to expand the index, therefore we should accept the current behavior, which is faster than "expand the sparse-directory entry to match the sparse-checkout situation". Modify a previous test so such difference is not considered as an error. [1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398 Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> --- builtin/rm.c | 2 ++ t/perf/p2000-sparse-operations.sh | 1 + t/t1092-sparse-checkout-compatibility.sh | 6 +++--- 3 files changed, 6 insertions(+), 3 deletions(-)