From patchwork Thu Sep 1 04:57:34 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shaoxuan Yuan
X-Patchwork-Id: 12961774
From: Shaoxuan Yuan
To: git@vger.kernel.org
Cc: derrickstolee@github.com, vdye@github.com, Shaoxuan Yuan
Subject: [PATCH v3 1/3] builtin/grep.c: add --sparse option
Date: Wed, 31 Aug 2022 21:57:34 -0700
Message-Id: <20220901045736.523371-2-shaoxuan.yuan02@gmail.com>
X-Mailer: git-send-email 2.37.0
In-Reply-To: <20220901045736.523371-1-shaoxuan.yuan02@gmail.com>
References: <20220817075633.217934-1-shaoxuan.yuan02@gmail.com>
 <20220901045736.523371-1-shaoxuan.yuan02@gmail.com>
X-Mailing-List: git@vger.kernel.org

Add a --sparse option to `git-grep`.

When the '--cached' option is used with the 'git grep' command, the
search is limited to the blobs found in the index, not in the worktree.
If the user has enabled sparse-checkout, this might present more results
than they would like, since the files outside of the sparse-checkout are
unlikely to be important to them.

Change the default behavior of 'git grep --cached' to focus on the files
within the sparse-checkout definition. To restore the previous behavior,
add a '--sparse' option to 'git grep' that, when paired with the
'--cached' option, also inspects paths outside of the sparse-checkout
definition.

Helped-by: Derrick Stolee
Suggested-by: Victoria Dye
Signed-off-by: Shaoxuan Yuan
---
 Documentation/git-grep.txt      |  5 ++++-
 builtin/grep.c                  | 10 +++++++++-
 t/t7817-grep-sparse-checkout.sh | 34 +++++++++++++++++++++++++++------
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 58d944bd57..bdd3d5b8a6 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -28,7 +28,7 @@ SYNOPSIS
 	   [-f ] [-e]
 	   [--and|--or|--not|(|)|-e ...]
 	   [--recurse-submodules] [--parent-basename ]
-	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | ...]
+	   [ [--[no-]exclude-standard] [--cached [--sparse] | --no-index | --untracked] | ...]
 	   [--] [...]
 
 DESCRIPTION
@@ -45,6 +45,9 @@ OPTIONS
 	Instead of searching tracked files in the working tree, search
 	blobs registered in the index file.
 
+--sparse::
+	Use with --cached. Search outside of sparse-checkout definition.
+
 --no-index::
 	Search files in the current directory that is not managed
 	by Git.
diff --git a/builtin/grep.c b/builtin/grep.c
index e6bcdf860c..12abd832fa 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -96,6 +96,8 @@ static pthread_cond_t cond_result;
 
 static int skip_first_line;
 
+static int grep_sparse = 0;
+
 static void add_work(struct grep_opt *opt, struct grep_source *gs)
 {
 	if (opt->binary != GREP_BINARY_TEXT)
@@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt,
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
-		if (!cached && ce_skip_worktree(ce))
+		/*
+		 * Skip entries with SKIP_WORKTREE unless both --sparse and
+		 * --cached are given.
+		 */
+		if (!(grep_sparse && cached) && ce_skip_worktree(ce))
 			continue;
 
 		strbuf_setlen(&name, name_base_len);
@@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			   PARSE_OPT_NOCOMPLETE),
 		OPT_INTEGER('m', "max-count", &opt.max_count,
 			N_("maximum number of results per file")),
+		OPT_BOOL(0, "sparse", &grep_sparse,
+			 N_("search the contents of files outside the sparse-checkout definition")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
index eb59564565..a9879cc980 100755
--- a/t/t7817-grep-sparse-checkout.sh
+++ b/t/t7817-grep-sparse-checkout.sh
@@ -118,13 +118,19 @@ test_expect_success 'grep searches unmerged file despite not matching sparsity p
 	test_cmp expect actual
 '
 
-test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit' '
+test_expect_success 'grep --cached and --sparse searches entries with the SKIP_WORKTREE bit' '
+	cat >expect <<-EOF &&
+	a:text
+	EOF
+	git grep --cached "text" >actual &&
+	test_cmp expect actual &&
+
 	cat >expect <<-EOF &&
 	a:text
 	b:text
 	dir/c:text
 	EOF
-	git grep --cached "text" >actual &&
+	git grep --cached --sparse "text" >actual &&
 	test_cmp expect actual
 '
 
@@ -143,7 +149,15 @@ test_expect_success 'grep --recurse-submodules honors sparse checkout in submodu
 	test_cmp expect actual
 '
 
-test_expect_success 'grep --recurse-submodules --cached searches entries with the SKIP_WORKTREE bit' '
+test_expect_success 'grep --recurse-submodules --cached and --sparse searches entries with the SKIP_WORKTREE bit' '
+	cat >expect <<-EOF &&
+	a:text
+	sub/B/b:text
+	sub2/a:text
+	EOF
+	git grep --recurse-submodules --cached "text" >actual &&
+	test_cmp expect actual &&
+
 	cat >expect <<-EOF &&
 	a:text
 	b:text
@@ -152,7 +166,7 @@ test_expect_success 'grep --recurse-submodules --cached searches entries with th
 	sub/B/b:text
 	sub2/a:text
 	EOF
-	git grep --recurse-submodules --cached "text" >actual &&
+	git grep --recurse-submodules --cached --sparse "text" >actual &&
 	test_cmp expect actual
 '
 
@@ -166,7 +180,15 @@ test_expect_success 'working tree grep does not search the index with CE_VALID a
 	test_cmp expect actual
 '
 
-test_expect_success 'grep --cached searches index entries with both CE_VALID and SKIP_WORKTREE' '
+test_expect_success 'grep --cached and --sparse searches index entries with both CE_VALID and SKIP_WORKTREE' '
+	cat >expect <<-EOF &&
+	a:text
+	EOF
+	test_when_finished "git update-index --no-assume-unchanged b" &&
+	git update-index --assume-unchanged b &&
+	git grep --cached text >actual &&
+	test_cmp expect actual &&
+
 	cat >expect <<-EOF &&
 	a:text
 	b:text
@@ -174,7 +196,7 @@ test_expect_success 'grep --cached searches index entries with both CE_VALID and
 	EOF
 	test_when_finished "git update-index --no-assume-unchanged b" &&
 	git update-index --assume-unchanged b &&
-	git grep --cached text >actual &&
+	git grep --cached --sparse text >actual &&
 	test_cmp expect actual
 '
 

From patchwork Thu Sep 1 04:57:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shaoxuan Yuan
X-Patchwork-Id: 12961775
From: Shaoxuan Yuan
To: git@vger.kernel.org
Cc: derrickstolee@github.com, vdye@github.com, Shaoxuan Yuan
Subject: [PATCH v3 2/3] builtin/grep.c: integrate with sparse index
Date: Wed, 31 Aug 2022 21:57:35 -0700
Message-Id: <20220901045736.523371-3-shaoxuan.yuan02@gmail.com>
X-Mailer: git-send-email 2.37.0
In-Reply-To: <20220901045736.523371-1-shaoxuan.yuan02@gmail.com>
References: <20220817075633.217934-1-shaoxuan.yuan02@gmail.com>
 <20220901045736.523371-1-shaoxuan.yuan02@gmail.com>
X-Mailing-List: git@vger.kernel.org

Turn on sparse index and stop calling ensure_full_index()
unconditionally: only expand the index when --sparse is given.

The p2000 tests do not demonstrate a significant improvement, because
the index read is a small portion of the full process time, compared to
the blob parsing. The times below reflect the time spent in the
"do_read_index" trace region as shown using GIT_TRACE2_PERF=1.

The tests demonstrate a ~99.4% reduction in index read time for
`git grep --cached` using a sparse index.

Test                                  HEAD~   HEAD
-----------------------------------------------------------------------------
git grep --cached bogus (full-v3)     0.019   0.018  (-5.2%)
git grep --cached bogus (full-v4)     0.017   0.016  (-5.8%)
git grep --cached bogus (sparse-v3)   0.29    0.0015 (-99.4%)
git grep --cached bogus (sparse-v4)   0.30    0.0018 (-99.4%)

Optional reading about performance test results
-----------------------------------------------
Notice that because `git-grep` needs to parse blobs in the index, the
index reading time is minuscule compared to the object parsing time.
Because of this, the p2000 test results cannot clearly reflect the
speedup for index reading: combined with the object parsing time, the
aggregated time difference between HEAD~1 and HEAD is extremely small.

Hence, the results presented here are not directly extracted from the
p2000 test results. Instead, to make the performance difference more
visible, the test command was run manually with GIT_TRACE2_PERF in the
four repos (full-v3, sparse-v3, full-v4, sparse-v4). The numbers here
are then extracted from the time difference between "region_enter" and
"region_leave" of label "do_read_index".

Helped-by: Derrick Stolee
Signed-off-by: Shaoxuan Yuan
---
 builtin/grep.c                           | 10 ++++++++--
 t/t1092-sparse-checkout-compatibility.sh | 18 ++++++++++++++++++
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 12abd832fa..a0b4dbc1dc 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -522,8 +522,9 @@ static int grep_cache(struct grep_opt *opt,
 	if (repo_read_index(repo) < 0)
 		die(_("index file corrupt"));
 
-	/* TODO: audit for interaction with sparse-index. */
-	ensure_full_index(repo->index);
+	if (grep_sparse)
+		ensure_full_index(repo->index);
+
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
@@ -992,6 +993,11 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_KEEP_DASHDASH |
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (the_repository->gitdir) {
+		prepare_repo_settings(the_repository);
+		the_repository->settings.command_requires_full_index = 0;
+	}
+
 	if (use_index && !startup_info->have_repository) {
 		int fallback = 0;
 		git_config_get_bool("grep.fallbacktonoindex", &fallback);
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 0302e36fd6..63becc3138 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1972,4 +1972,22 @@ test_expect_success 'sparse index is not expanded: rm' '
 	ensure_not_expanded rm -r deep
 '
 
+test_expect_success 'grep with --sparse and --cached' '
+	init_repos &&
+
+	test_all_match git grep --sparse --cached a &&
+	test_all_match git grep --sparse --cached a -- "folder1/*"
+'
+
+test_expect_success 'grep is not expanded' '
+	init_repos &&
+
+	ensure_not_expanded grep a &&
+	ensure_not_expanded grep a -- deep/* &&
+
+	# All files within the folder1/* pathspec are sparse,
+	# so this command does not find any matches
+	ensure_not_expanded ! grep a -- folder1/*
+'
+
 test_done

From patchwork Thu Sep 1 04:57:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Shaoxuan Yuan
X-Patchwork-Id: 12961776
From: Shaoxuan Yuan
To: git@vger.kernel.org
Cc: derrickstolee@github.com, vdye@github.com, Shaoxuan Yuan
Subject: [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse
Date: Wed, 31 Aug 2022 21:57:36 -0700
Message-Id: <20220901045736.523371-4-shaoxuan.yuan02@gmail.com>
X-Mailer: git-send-email 2.37.0
In-Reply-To: <20220901045736.523371-1-shaoxuan.yuan02@gmail.com>
References: <20220817075633.217934-1-shaoxuan.yuan02@gmail.com>
 <20220901045736.523371-1-shaoxuan.yuan02@gmail.com>
X-Mailing-List: git@vger.kernel.org

Before this patch, whenever --sparse is used, `git-grep` utilizes the
ensure_full_index() method to expand the index and search all the
entries. Because this method requires walking all the trees and
constructing the index, it is the slow part within the whole command.

To achieve better performance, this patch uses grep_tree() to search the
sparse directory entries and gets rid of the ensure_full_index() call.

Why is grep_tree() a better choice than ensure_full_index()?

1) grep_tree() is as correct as ensure_full_index(). grep_tree() looks
   into every sparse-directory entry (represented by a tree) recursively
   when looping over the index, and the result of doing so matches the
   result of expanding the index.

2) grep_tree() utilizes pathspecs to limit the scope of searching.
   ensure_full_index() always expands the index when --sparse is used,
   which means it always walks all the trees and blobs in the repo
   without caring whether the user only wants a subset of the content,
   i.e. using a pathspec. On the other hand, grep_tree() only searches
   the contents that match the pathspec, and thus possibly walks fewer
   trees.

3) grep_tree() does not construct and copy back a new index, while
   ensure_full_index() does. This also saves some time.

----------------
Performance test

- Summary:

p2000 tests demonstrate a ~91% execution time reduction for
`git grep --cached --sparse -- ` using tree-walking logic.

Test                                                                          HEAD~  HEAD
---------------------------------------------------------------------------------------------------
2000.78: git grep --cached --sparse bogus -- f2/f1/f1/builtin/* (full-v3)     0.11   0.09 (≈)
2000.79: git grep --cached --sparse bogus -- f2/f1/f1/builtin/* (full-v4)     0.08   0.09 (≈)
2000.80: git grep --cached --sparse bogus -- f2/f1/f1/builtin/* (sparse-v3)   0.44   0.04 (-90.9%)
2000.81: git grep --cached --sparse bogus -- f2/f1/f1/builtin/* (sparse-v4)   0.46   0.04 (-91.3%)

- Command used for testing:

	git grep --cached --sparse bogus -- f2/f1/f1/builtin/*

The reason for specifying a pathspec is that, without one, grep_tree()
walks all the trees and blobs to find the pattern, and the time consumed
doing so is not too different from using the original
ensure_full_index() method, which also spends most of its time walking
trees. However, when a pathspec is specified, the new logic only walks
the area of trees enclosed by the pathspec, and the time consumed is
considerably less.

That is, if we don't specify a pathspec, the performance difference [1]
is quite small: both methods walk all the trees and take roughly the
same amount of time (even with the index construction time included for
ensure_full_index()).
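
The effect of the pathspec on the amount of work can be illustrated with
a toy model. The sketch below is not Git code: `struct node`,
`blobs_searched()` and `count_blobs()` are hypothetical names, the tree
is made up, and the "pathspec" is simplified to a literal directory
prefix. It only shows why a walk that prunes subtrees by prefix touches
fewer blobs than one that visits everything.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy tree; this is not Git's struct tree_desc. */
struct node {
	const char *name;        /* one path component */
	const struct node *kids; /* NULL for a blob */
	int nr_kids;
};

static const struct node f2_f1_kids[] = { { "c", NULL, 0 }, { "d", NULL, 0 } };
static const struct node f2_kids[] = { { "f1", f2_f1_kids, 2 }, { "e", NULL, 0 } };
static const struct node f1_kids[] = { { "a", NULL, 0 }, { "b", NULL, 0 } };
static const struct node top[] = {
	{ "f1", f1_kids, 2 }, { "f2", f2_kids, 2 }, { "x", NULL, 0 },
};

/* Count the blobs a walk has to search for a literal directory prefix. */
static int blobs_searched(const struct node *n, const char *base,
			  const char *prefix)
{
	char path[256];
	size_t len, plen = strlen(prefix);
	int i, count = 0;
	int written = snprintf(path, sizeof(path), "%s%s%s", base, n->name,
			       n->kids ? "/" : "");

	assert(written >= 0 && (size_t)written < sizeof(path));
	len = (size_t)written;

	if (!n->kids) /* blob: searched only when it lies under the prefix */
		return len >= plen && !strncmp(path, prefix, plen);

	/* tree: descend only while it can still lead to the prefix */
	if (strncmp(path, prefix, len < plen ? len : plen))
		return 0;
	for (i = 0; i < n->nr_kids; i++)
		count += blobs_searched(&n->kids[i], path, prefix);
	return count;
}

/* An empty prefix stands for "no pathspec": every blob is searched. */
static int count_blobs(const char *prefix)
{
	size_t i;
	int count = 0;

	for (i = 0; i < sizeof(top) / sizeof(top[0]); i++)
		count += blobs_searched(&top[i], "", prefix);
	return count;
}
```

With this toy tree, count_blobs("") searches all six blobs, while
count_blobs("f2/f1/") searches only the two blobs under that directory
and never descends into the top-level f1/ tree.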

[1] Performance test result without pathspec:

Test                                                  HEAD~  HEAD
-----------------------------------------------------------------------------
2000.78: git grep --cached --sparse bogus (full-v3)   6.17   5.19 (≈)
2000.79: git grep --cached --sparse bogus (full-v4)   6.19   5.46 (≈)
2000.80: git grep --cached --sparse bogus (sparse-v3) 6.57   6.44 (≈)
2000.81: git grep --cached --sparse bogus (sparse-v4) 6.65   6.28 (≈)

Suggested-by: Derrick Stolee
Helped-by: Derrick Stolee
Helped-by: Victoria Dye
Signed-off-by: Shaoxuan Yuan
---
 builtin/grep.c                    | 32 ++++++++++++++++++++++++++-----
 t/perf/p2000-sparse-operations.sh |  1 +
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index a0b4dbc1dc..8c0edccd8e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -522,9 +522,6 @@ static int grep_cache(struct grep_opt *opt,
 	if (repo_read_index(repo) < 0)
 		die(_("index file corrupt"));
 
-	if (grep_sparse)
-		ensure_full_index(repo->index);
-
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
@@ -537,8 +534,26 @@ static int grep_cache(struct grep_opt *opt,
 
 		strbuf_setlen(&name, name_base_len);
 		strbuf_addstr(&name, ce->name);
+		if (S_ISSPARSEDIR(ce->ce_mode)) {
+			enum object_type type;
+			struct tree_desc tree;
+			void *data;
+			unsigned long size;
+			struct strbuf base = STRBUF_INIT;
+
+			strbuf_addstr(&base, ce->name);
+
+			data = read_object_file(&ce->oid, &type, &size);
+			init_tree_desc(&tree, data, size);
 
-		if (S_ISREG(ce->ce_mode) &&
+			/*
+			 * sneak in the ce_mode using check_attr parameter
+			 */
+			hit |= grep_tree(opt, pathspec, &tree, &base,
+					 base.len, ce->ce_mode);
+			strbuf_release(&base);
+			free(data);
+		} else if (S_ISREG(ce->ce_mode) &&
 		    match_pathspec(repo->index, pathspec, name.buf, name.len, 0, NULL,
 				   S_ISDIR(ce->ce_mode) ||
 				   S_ISGITLINK(ce->ce_mode))) {
@@ -598,7 +613,14 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		int te_len = tree_entry_len(&entry);
 
 		if (match != all_entries_interesting) {
-			strbuf_addstr(&name, base->buf + tn_len);
+			if (S_ISSPARSEDIR(check_attr)) {
+				// object is a sparse directory entry
+				strbuf_addbuf(&name, base);
+			} else {
+				// object is a commit or a root tree
+				strbuf_addstr(&name, base->buf + tn_len);
+			}
+
 			match = tree_entry_interesting(repo->index,
 						       &entry, &name,
 						       0, pathspec);
diff --git a/t/perf/p2000-sparse-operations.sh b/t/perf/p2000-sparse-operations.sh
index fce8151d41..a0b71bb3b4 100755
--- a/t/perf/p2000-sparse-operations.sh
+++ b/t/perf/p2000-sparse-operations.sh
@@ -124,5 +124,6 @@ test_perf_on_all git read-tree -mu HEAD
 test_perf_on_all git checkout-index -f --all
 test_perf_on_all git update-index --add --remove $SPARSE_CONE/a
 test_perf_on_all "git rm -f $SPARSE_CONE/a && git checkout HEAD -- $SPARSE_CONE/a"
+test_perf_on_all git grep --cached --sparse bogus -- "f2/f1/f1/builtin/*"
 
 test_done