[v2,2/2] builtin/grep.c: integrate with sparse index

Message ID 20220829232843.183711-3-shaoxuan.yuan02@gmail.com
State Superseded
Series grep: integrate with sparse index

Commit Message

Shaoxuan Yuan Aug. 29, 2022, 11:28 p.m. UTC
Turn on sparse index and remove ensure_full_index().

Change it to only expands the index when using --sparse.

The p2000 tests demonstrate a ~99.4% execution time reduction for
`git grep` using a sparse index.

Test                                  Before       After
-----------------------------------------------------------------------------
git grep --cached bogus (full-v3)     0.019        0.018  (-5.2%)
git grep --cached bogus (full-v4)     0.017        0.016  (-5.8%)
git grep --cached bogus (sparse-v3)   0.29         0.0015 (-99.4%)
git grep --cached bogus (sparse-v4)   0.30         0.0018 (-99.4%)

Optional reading about performance test results
-----------------------------------------------
Note that because `git-grep` needs to parse blobs in the index, the
index reading time is minuscule compared to the object parsing time.
Because of this, the p2000 test results cannot clearly reflect the
speedup for index reading: combined with the object parsing time,
the aggregate time difference between HEAD~1 and HEAD is extremely
small.

Hence, the results presented here are not directly extracted from the
p2000 test results. Instead, to make the performance difference more
visible, the test command was run manually with GIT_TRACE2_PERF in the
four repos (full-v3, sparse-v3, full-v4, sparse-v4). The numbers here
are the time differences between the "region_enter" and
"region_leave" events with the label "do_read_index".

Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/grep.c                           | 10 ++++++++--
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 18 ++++++++++++++++++
 3 files changed, 27 insertions(+), 2 deletions(-)
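
The measurement described in the commit message could be reproduced
with something like the sketch below. It is not the script the author
used. The function name read_index_time and the file name trace.perf
are hypothetical, and the script assumes the perf-format column order
event | repo | t_abs | t_rel | category | label, so the elapsed time of
a region is taken from the t_rel column of its "region_leave" line.

```sh
#!/bin/sh
# Sum the elapsed time (in seconds) of every "do_read_index" region
# found in a GIT_TRACE2_PERF log file (hypothetical helper).
read_index_time () {
	awk -F'|' '
	/region_leave/ && /label:do_read_index/ {
		for (i = 1; i <= NF; i++) {
			field = $i
			gsub(/^[ \t]+|[ \t]+$/, "", field)
			if (field == "region_leave") {
				# Assumed layout: event | repo | t_abs | t_rel | category | label
				total += $(i + 3)
				break
			}
		}
	}
	END { printf "%.6f\n", total }
	' "$1"
}

# Example usage, run inside one of the test repositories:
#   GIT_TRACE2_PERF="$PWD/trace.perf" git grep --cached bogus
#   read_index_time trace.perf
```

Searching for the "region_leave" field instead of using a fixed column
number keeps the sketch usable whether or not GIT_TRACE2_PERF_BRIEF is
set, since the non-brief format prepends extra columns.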

Comments

Derrick Stolee Aug. 30, 2022, 1:45 p.m. UTC | #1
On 8/29/2022 7:28 PM, Shaoxuan Yuan wrote:
> Turn on sparse index and remove ensure_full_index().
> 
> Change it to only expands the index when using --sparse.

s/expands/expand/

These two sentences should be combined, anyway.

  Enable the sparse index for 'git grep', and only call
  ensure_full_index() when the --sparse argument is provided.

> The p2000 tests demonstrate a ~99.4% execution time reduction for
> `git grep` using a sparse index.
> 
> Test                                  Before       After
> -----------------------------------------------------------------------------
> git grep --cached bogus (full-v3)     0.019        0.018  (-5.2%)
> git grep --cached bogus (full-v4)     0.017        0.016  (-5.8%)
> git grep --cached bogus (sparse-v3)   0.29         0.0015 (-99.4%)
> git grep --cached bogus (sparse-v4)   0.30         0.0018 (-99.4%)

Last time, I asked you not to present this as if it were a
performance test, to make it clear that it is not the end-to-end
process time. You removed the test numbers, but the table still
looks like end-to-end process time, and you only elaborate after it.

Instead, you can prepare the reader before the table using
something like this:

  The p2000 tests do not demonstrate a significant improvement,
  because the index read is a small portion of the full process
  time, compared to the blob parsing. The times below reflect the
  time spent in the "do_read_index" trace region as shown using
  GIT_TRACE2_PERF=1.

> -	/* TODO: audit for interaction with sparse-index. */
> -	ensure_full_index(repo->index);
> +	if (grep_sparse)
> +		ensure_full_index(repo->index);
> +

As we've discussed, there are ways to remove even this call, but
that shouldn't hold up this series which is already an improvement.

Thanks,
-Stolee
Patch

diff --git a/builtin/grep.c b/builtin/grep.c
index 12abd832fa..a0b4dbc1dc 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -522,8 +522,9 @@  static int grep_cache(struct grep_opt *opt,
 	if (repo_read_index(repo) < 0)
 		die(_("index file corrupt"));
 
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(repo->index);
+	if (grep_sparse)
+		ensure_full_index(repo->index);
+
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
@@ -992,6 +993,11 @@  int cmd_grep(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_KEEP_DASHDASH |
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (the_repository->gitdir) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
+
 	if (use_index && !startup_info->have_repository) {
 		int fallback = 0;
 		git_config_get_bool("grep.fallbacktonoindex", &fallback);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index fce8151d41..9a466fcbbe 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -124,5 +124,6 @@  test_perf_on_all git read-tree -mu HEAD
 test_perf_on_all git checkout-index -f --all
 test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
 test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"
+test_perf_on_all git grep --cached bogus
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index a6a14c8a21..270b47840b 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1972,4 +1972,22 @@  test_expect_success 'sparse index is not expanded: rm' '
 	ensure_not_expanded rm -r deep
 '
 
+test_expect_success 'grep with --sparse and --cached' '
+	init_repos &&
+
+	test_all_match git grep --sparse --cached a &&
+	test_all_match git grep --sparse --cached a -- "folder1/*"
+'
+
+test_expect_success 'grep is not expanded' '
+	init_repos &&
+
+	ensure_not_expanded grep a &&
+	ensure_not_expanded grep a -- deep/* &&
+
+	# All files within the folder1/* pathspec are sparse,
+	# so this command does not find any matches
+	ensure_not_expanded ! grep a -- folder1/*
+'
+
 test_done
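
The "is not expanded" tests above rely on the ensure_not_expanded
helper, which inspects a trace2 event log for an "ensure_full_index"
region. Outside the test suite, a similar check could look like the
sketch below. The function name index_was_expanded and the file name
trace2.txt are hypothetical, and the pattern assumes the JSON event
format writes the "category" and "label" keys next to each other.

```sh
#!/bin/sh
# Succeed if the given GIT_TRACE2_EVENT log shows that the sparse
# index was expanded to a full index (hypothetical helper).
index_was_expanded () {
	grep -q '"region_enter".*"category":"index","label":"ensure_full_index"' "$1"
}

# Example usage, run inside a sparse-index repository:
#   rm -f trace2.txt
#   GIT_TRACE2_EVENT="$PWD/trace2.txt" git grep a
#   if index_was_expanded trace2.txt
#   then
#           echo "index was expanded"
#   else
#           echo "index stayed sparse"
#   fi
```

GIT_TRACE2_EVENT is given an absolute path here so that the log ends up
in a known location regardless of the directory Git runs in.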