diff mbox series

[v1,4/4] rm: integrate with sparse-index

Message ID 20220803045118.1243087-5-shaoxuan.yuan02@gmail.com (mailing list archive)
State Superseded
Headers show
Series rm: integrate with sparse-index | expand

Commit Message

Shaoxuan Yuan Aug. 3, 2022, 4:51 a.m. UTC
Enable the sparse index within the `git-rm` command.

The `p2000` tests demonstrate a ~96% execution time reduction for
'git rm' using a sparse index.

Test                                     before  after
-------------------------------------------------------------
2000.74: git rm -f f2/f4/a (full-v3)     0.66    0.88 +33.0%
2000.75: git rm -f f2/f4/a (full-v4)     0.67    0.75 +12.0%
2000.76: git rm -f f2/f4/a (sparse-v3)   1.99    0.08 -96.0%
2000.77: git rm -f f2/f4/a (sparse-v4)   2.06    0.07 -96.6%

Also, normalize a behavioral difference of `git-rm` under sparse-index.
See related discussion [1].

`git-rm` a sparse-directory entry within a sparse-index enabled repo
behaves differently from a sparse directory within a sparse-checkout
enabled repo.

For example, in a sparse-index repo, where 'folder1' is a
sparse-directory entry, `git rm -r --sparse folder1` provides this:

        rm 'folder1/'

Whereas in a sparse-checkout repo *without* sparse-index, doing so
provides this:

        rm 'folder1/0/0/0'
        rm 'folder1/0/1'
        rm 'folder1/a'

Because `git rm` a sparse-directory entry does not need to expand the
index, therefore we should accept the current behavior, which is faster
than "expand the sparse-directory entry to match the sparse-checkout
situation".

Modify a previous test so such difference is not considered as an error.

[1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398

Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/rm.c                             | 2 ++
 t/perf/p2000-sparse-operations.sh        | 1 +
 t/t1092-sparse-checkout-compatibility.sh | 6 +++---
 3 files changed, 6 insertions(+), 3 deletions(-)

Comments

Derrick Stolee Aug. 4, 2022, 2:48 p.m. UTC | #1
On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote:
> Enable the sparse index within the `git-rm` command.
> 
> The `p2000` tests demonstrate a ~96% execution time reduction for
> 'git rm' using a sparse index.

Sorry that I got sidetracked yesterday when I was reviewing this
series, but I noticed something looking at these results:
 
> Test                                     before  after
> -------------------------------------------------------------
> 2000.74: git rm -f f2/f4/a (full-v3)     0.66    0.88 +33.0%
> 2000.75: git rm -f f2/f4/a (full-v4)     0.67    0.75 +12.0%

The range of _growth_ here seemed odd, so I wanted to check if this was
due to a small sample size or not.

> 2000.76: git rm -f f2/f4/a (sparse-v3)   1.99    0.08 -96.0%
> 2000.77: git rm -f f2/f4/a (sparse-v4)   2.06    0.07 -96.6%

These numbers are as expected.

>  test_perf_on_all git read-tree -mu HEAD
>  test_perf_on_all git checkout-index -f --all
>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
> +test_perf_on_all git rm -f $SPARSE_CONE/a

At first, I was confused why we needed '-f' and thought that maybe
this was turning into a no-op after the first deletion. However, the
test_perf_on_all helper does an "echo >>$SPARSE_CONE/a" before hand,
so the file exists _in the worktree_ every time. That requires '-f'
since otherwise Git complains that we have modifications.

However, after the first instance the file no longer exists in the
index, so we are losing some testing of the index modification.

We can fix this by resetting the index in each test loop:

  test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"

Running this version of the test with GIT_PERF_REPEAT_COUNT=10 and
using the Git repository itself, I get these numbers:

Test                              HEAD~1            HEAD
--------------------------------------------------------------------------
2000.74: git rm ... (full-v3)     0.41(0.37+0.05)   0.43(0.36+0.07) +4.9% 
2000.75: git rm ... (full-v4)     0.38(0.34+0.05)   0.39(0.35+0.05) +2.6% 
2000.76: git rm ... (sparse-v3)   0.57(0.56+0.01)   0.05(0.05+0.00) -91.2%
2000.77: git rm ... (sparse-v4)   0.57(0.55+0.02)   0.03(0.03+0.00) -94.7%

Yes, the 'git checkout' command is contributing to the overall
numbers, but it also already has the performance improvements of
the sparse-index, so it contributes only a little to the performance
on the left.

(Also note that the full index cases change only by amounts within
reasonable noise. The repeat count helps there.)

Thanks,
-Stolee
Shaoxuan Yuan Aug. 6, 2022, 3:18 a.m. UTC | #2
On 8/4/2022 10:48 PM, Derrick Stolee wrote:
 > On 8/3/2022 12:51 AM, Shaoxuan Yuan wrote:
 >> Enable the sparse index within the `git-rm` command.
 >>
 >> The `p2000` tests demonstrate a ~96% execution time reduction for
 >> 'git rm' using a sparse index.
 >
 > Sorry that I got sidetracked yesterday when I was reviewing this
 > series, but I noticed something looking at these results:
 >
 >> Test                                     before  after
 >> -------------------------------------------------------------
 >> 2000.74: git rm -f f2/f4/a (full-v3)     0.66    0.88 +33.0%
 >> 2000.75: git rm -f f2/f4/a (full-v4)     0.67    0.75 +12.0%
 >
 > The range of _growth_ here seemed odd, so I wanted to check if this was
 > due to a small sample size or not.

Yes, I do feel they are odd, as I've been checking pervious
integrations and p2000 results, they usuallly fall below 10% range.
But I was not discerning enough to name a problem :-(

 >> 2000.76: git rm -f f2/f4/a (sparse-v3)   1.99    0.08 -96.0%
 >> 2000.77: git rm -f f2/f4/a (sparse-v4)   2.06    0.07 -96.6%
 >
 > These numbers are as expected.
 >
 >>  test_perf_on_all git read-tree -mu HEAD
 >>  test_perf_on_all git checkout-index -f --all
 >>  test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
 >> +test_perf_on_all git rm -f $SPARSE_CONE/a
 >
 > At first, I was confused why we needed '-f' and thought that maybe
 > this was turning into a no-op after the first deletion. However, the
 > test_perf_on_all helper does an "echo >>$SPARSE_CONE/a" before hand,
 > so the file exists _in the worktree_ every time. That requires '-f'
 > since otherwise Git complains that we have modifications.

Yeah, it took me some time to find out.

 > However, after the first instance the file no longer exists in the
 > index, so we are losing some testing of the index modification.

So true, I didn't realize at all.

 > We can fix this by resetting the index in each test loop:
 >
 >   test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- 
$SPARSE_CONE/a"
 >
 > Running this version of the test with GIT_PERF_REPEAT_COUNT=10 and
 > using the Git repository itself, I get these numbers:
 >
 > Test                              HEAD~1            HEAD
 > 
--------------------------------------------------------------------------
 > 2000.74: git rm ... (full-v3)     0.41(0.37+0.05) 0.43(0.36+0.07) +4.9%
 > 2000.75: git rm ... (full-v4)     0.38(0.34+0.05) 0.39(0.35+0.05) +2.6%
 > 2000.76: git rm ... (sparse-v3)   0.57(0.56+0.01) 0.05(0.05+0.00) -91.2%
 > 2000.77: git rm ... (sparse-v4)   0.57(0.55+0.02) 0.03(0.03+0.00) -94.7%
 >
 > Yes, the 'git checkout' command is contributing to the overall
 > numbers, but it also already has the performance improvements of
 > the sparse-index, so it contributes only a little to the performance
 > on the left.
 >
 > (Also note that the full index cases change only by amounts within
 > reasonable noise. The repeat count helps there.)

New thing learned, repeat to average out noise.

--
Thanks,
Shaoxuan
diff mbox series

Patch

diff --git a/builtin/rm.c b/builtin/rm.c
index 58ed924f0d..b6ba859fe4 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -287,6 +287,8 @@  int cmd_rm(int argc, const char **argv, const char *prefix)
 	if (!index_only)
 		setup_work_tree();
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
 	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
 
 	if (read_cache() < 0)
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index c181110a43..853513eb9b 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -123,5 +123,6 @@  test_perf_on_all git blame $SPARSE_CONE/f3/a
 test_perf_on_all git read-tree -mu HEAD
 test_perf_on_all git checkout-index -f --all
 test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
+test_perf_on_all git rm -f $SPARSE_CONE/a
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 75649e3265..58632fe483 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -912,7 +912,7 @@  test_expect_success 'read-tree --prefix' '
 	test_all_match git read-tree --prefix=deep/deeper1/deepest -u deepest &&
 	test_all_match git status --porcelain=v2 &&
 
-	test_all_match git rm -rf --sparse folder1/ &&
+	run_on_all git rm -rf --sparse folder1/ &&
 	test_all_match git read-tree --prefix=folder1/ -u update-folder1 &&
 	test_all_match git status --porcelain=v2 &&
 
@@ -1870,7 +1870,7 @@  test_expect_success 'rm pathspec inside sparse definition' '
 	test_all_match git status --porcelain=v2
 '
 
-test_expect_failure 'rm pathspec outside sparse definition' '
+test_expect_success 'rm pathspec outside sparse definition' '
 	init_repos &&
 
 	for file in folder1/a folder1/0/1
@@ -1910,7 +1910,7 @@  test_expect_failure 'rm pathspec outside sparse definition' '
 	test_sparse_match git status --porcelain=v2
 '
 
-test_expect_failure 'sparse index is not expanded: rm' '
+test_expect_success 'sparse index is not expanded: rm' '
 	init_repos &&
 
 	ensure_not_expanded rm deep/a &&