From patchwork Mon Aug 29 23:28:42 2022
X-Patchwork-Submitter: Shaoxuan Yuan
X-Patchwork-Id: 12958534
From: Shaoxuan Yuan
To: git@vger.kernel.org
Cc: derrickstolee@github.com, vdye@github.com, Shaoxuan Yuan
Subject: [PATCH v2 1/2] builtin/grep.c: add --sparse option
Date: Mon, 29 Aug 2022 16:28:42 -0700
Message-Id: <20220829232843.183711-2-shaoxuan.yuan02@gmail.com>
In-Reply-To: <20220829232843.183711-1-shaoxuan.yuan02@gmail.com>
References: <20220817075633.217934-1-shaoxuan.yuan02@gmail.com> <20220829232843.183711-1-shaoxuan.yuan02@gmail.com>

Add a --sparse option to `git-grep`.

When the '--cached' option is used with the 'git grep' command, the
search is limited to the blobs found in the index, not in the worktree.
If the user has enabled sparse-checkout, this might present more results
than they would like, since the files outside of the sparse-checkout are
unlikely to be important to them.

Change the default behavior of 'git grep --cached' to focus on the files
within the sparse-checkout definition. To restore the previous behavior,
add a '--sparse' option to 'git grep' that, when paired with the
'--cached' option, also inspects paths outside of the sparse-checkout
definition.

Helped-by: Derrick Stolee
Suggested-by: Victoria Dye
Signed-off-by: Shaoxuan Yuan
---
 Documentation/git-grep.txt      |  5 ++++-
 builtin/grep.c                  | 10 +++++++++-
 t/t7817-grep-sparse-checkout.sh | 34 +++++++++++++++++++++++++++------
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 58d944bd57..bdd3d5b8a6 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -28,7 +28,7 @@ SYNOPSIS
 	   [-f ] [-e]
 	   [--and|--or|--not|(|)|-e ...]
 	   [--recurse-submodules] [--parent-basename ]
-	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...]
+	   [ [--[no-]exclude-standard] [--cached [--sparse] | --no-index | --untracked] | ...]
 	   [--] [...]
 
 DESCRIPTION
@@ -45,6 +45,9 @@ OPTIONS
 	Instead of searching tracked files in the working tree, search
 	blobs registered in the index file.
 
+--sparse::
+	Use with --cached. Search outside of the sparse-checkout definition.
+
 --no-index::
 	Search files in the current directory that is not managed by Git.
 
diff --git a/builtin/grep.c b/builtin/grep.c
index e6bcdf860c..12abd832fa 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -96,6 +96,8 @@ static pthread_cond_t cond_result;
 
 static int skip_first_line;
 
+static int grep_sparse = 0;
+
 static void add_work(struct grep_opt *opt, struct grep_source *gs)
 {
 	if (opt->binary != GREP_BINARY_TEXT)
@@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt,
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
-		if (!cached && ce_skip_worktree(ce))
+		/*
+		 * Skip entries with SKIP_WORKTREE unless both --sparse and
+		 * --cached are given.
+		 */
+		if (!(grep_sparse && cached) && ce_skip_worktree(ce))
 			continue;
 
 		strbuf_setlen(&name, name_base_len);
@@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 				  PARSE_OPT_NOCOMPLETE),
 		OPT_INTEGER('m', "max-count", &opt.max_count,
 			N_("maximum number of results per file")),
+		OPT_BOOL(0, "sparse", &grep_sparse,
+			 N_("search the contents of files outside the sparse-checkout definition")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
index eb59564565..a9879cc980 100755
--- a/t/t7817-grep-sparse-checkout.sh
+++ b/t/t7817-grep-sparse-checkout.sh
@@ -118,13 +118,19 @@ test_expect_success 'grep searches unmerged file despite not matching sparsity p
 	test_cmp expect actual
 '
 
-test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit' '
+test_expect_success 'grep --cached and --sparse searches entries with the SKIP_WORKTREE bit' '
+	cat >expect <<-EOF &&
+	a:text
+	EOF
+	git grep --cached "text" >actual &&
+	test_cmp expect actual &&
+
 	cat >expect <<-EOF &&
 	a:text
 	b:text
 	dir/c:text
 	EOF
-	git grep --cached "text" >actual &&
+	git grep --cached --sparse "text" >actual &&
 	test_cmp expect actual
 '
 
@@ -143,7 +149,15 @@ test_expect_success 'grep --recurse-submodules honors sparse checkout in submodu
 	test_cmp expect actual
 '
 
-test_expect_success 'grep --recurse-submodules --cached searches entries with the SKIP_WORKTREE bit' '
+test_expect_success 'grep --recurse-submodules --cached and --sparse searches entries with the SKIP_WORKTREE bit' '
+	cat >expect <<-EOF &&
+	a:text
+	sub/B/b:text
+	sub2/a:text
+	EOF
+	git grep --recurse-submodules --cached "text" >actual &&
+	test_cmp expect actual &&
+
 	cat >expect <<-EOF &&
 	a:text
 	b:text
@@ -152,7 +166,7 @@ test_expect_success 'grep --recurse-submodules --cached searches entries with th
 	sub/B/b:text
 	sub2/a:text
 	EOF
-	git grep --recurse-submodules --cached "text" >actual &&
+	git grep --recurse-submodules --cached --sparse "text" >actual &&
 	test_cmp expect actual
 '
 
@@ -166,7 +180,15 @@ test_expect_success 'working tree grep does not search the index with CE_VALID a
 	test_cmp expect actual
 '
 
-test_expect_success 'grep --cached searches index entries with both CE_VALID and SKIP_WORKTREE' '
+test_expect_success 'grep --cached and --sparse searches index entries with both CE_VALID and SKIP_WORKTREE' '
+	cat >expect <<-EOF &&
+	a:text
+	EOF
+	test_when_finished "git update-index --no-assume-unchanged b" &&
+	git update-index --assume-unchanged b &&
+	git grep --cached text >actual &&
+	test_cmp expect actual &&
+
 	cat >expect <<-EOF &&
 	a:text
 	b:text
@@ -174,7 +196,7 @@ test_expect_success 'grep --cached searches index entries with both CE_VALID and
 	EOF
 	test_when_finished "git update-index --no-assume-unchanged b" &&
 	git update-index --assume-unchanged b &&
-	git grep --cached text >actual &&
+	git grep --cached --sparse text >actual &&
 	test_cmp expect actual
 '
 

From patchwork Mon Aug 29 23:28:43 2022
X-Patchwork-Submitter: Shaoxuan Yuan
X-Patchwork-Id: 12958533
From: Shaoxuan Yuan
To: git@vger.kernel.org
Cc: derrickstolee@github.com, vdye@github.com, Shaoxuan Yuan
Subject: [PATCH v2 2/2] builtin/grep.c: integrate with sparse index
Date: Mon, 29 Aug 2022 16:28:43 -0700
Message-Id: <20220829232843.183711-3-shaoxuan.yuan02@gmail.com>
In-Reply-To: <20220829232843.183711-1-shaoxuan.yuan02@gmail.com>
References: <20220817075633.217934-1-shaoxuan.yuan02@gmail.com> <20220829232843.183711-1-shaoxuan.yuan02@gmail.com>

Turn on the sparse index for 'git grep' and stop expanding it
unconditionally: call ensure_full_index() only when --sparse is used.

Running the p2000 'git grep --cached bogus' command with GIT_TRACE2_PERF
shows a ~99.4% reduction in index-reading time with a sparse index.
Test                                     Before  After
-----------------------------------------------------------------------------
git grep --cached bogus (full-v3)        0.019   0.018  (-5.2%)
git grep --cached bogus (full-v4)        0.017   0.016  (-5.8%)
git grep --cached bogus (sparse-v3)      0.29    0.0015 (-99.4%)
git grep --cached bogus (sparse-v4)      0.30    0.0018 (-99.4%)

Optional reading about performance test results
-----------------------------------------------
Because `git-grep` needs to parse the blobs in the index, the time spent
reading the index is minuscule compared to the time spent parsing
objects. As a result, the p2000 test results cannot clearly reflect the
speedup in index reading: once the object-parsing time is included, the
aggregate times for HEAD~1 and HEAD are nearly identical.

Hence, the results presented here are not taken directly from the p2000
test results. Instead, to make the performance difference more visible,
the test command was run manually with GIT_TRACE2_PERF in the four repos
(full-v3, sparse-v3, full-v4, sparse-v4). The numbers above are the time
between "region_enter" and "region_leave" for the "do_read_index"
region.

Helped-by: Derrick Stolee
Signed-off-by: Shaoxuan Yuan
---
 builtin/grep.c                           | 10 ++++++++--
 t/perf/p2000-sparse-operations.sh        |  1 +
 t/t1092-sparse-checkout-compatibility.sh | 18 ++++++++++++++++++
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 12abd832fa..a0b4dbc1dc 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -522,8 +522,9 @@ static int grep_cache(struct grep_opt *opt,
 	if (repo_read_index(repo) < 0)
 		die(_("index file corrupt"));
 
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(repo->index);
+	if (grep_sparse)
+		ensure_full_index(repo->index);
+
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
@@ -992,6 +993,11 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_KEEP_DASHDASH |
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (the_repository->gitdir) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
+
 	if (use_index && !startup_info->have_repository) {
 		int fallback = 0;
 		git_config_get_bool("grep.fallbacktonoindex", &fallback);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index fce8151d41..9a466fcbbe 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -124,5 +124,6 @@ test_perf_on_all git read-tree -mu HEAD
 test_perf_on_all git checkout-index -f --all
 test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
 test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"
+test_perf_on_all git grep --cached bogus
 
 test_done
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index a6a14c8a21..270b47840b 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1972,4 +1972,22 @@ test_expect_success 'sparse index is not expanded: rm' '
 	ensure_not_expanded rm -r deep
 '
 
+test_expect_success 'grep with --sparse and --cached' '
+	init_repos &&
+
+	test_all_match git grep --sparse --cached a &&
+	test_all_match git grep --sparse --cached a -- "folder1/*"
+'
+
+test_expect_success 'grep is not expanded' '
+	init_repos &&
+
+	ensure_not_expanded grep a &&
+	ensure_not_expanded grep a -- deep/* &&
+
+	# All files within the folder1/* pathspec are sparse,
+	# so this command does not find any matches
+	ensure_not_expanded ! grep a -- folder1/*
+'
+
 test_done