Message ID | 6d6b230e3318007150aebefebc16dfb8b9b6c401.1614111270.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Sparse Index: Design, Format, Tests | expand |
On Wed, 24 Feb 2021 at 00:57, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote: > +that is not completely understood by other tools. Enabling sparse index > +enables the `extensions.spareseIndex` config value, which might cause s/sparese/sparse > +other tools to stop working with your repository. If you have trouble with > +this compatibility, then run `git sparse-checkout sparse-index disable` to > +remove this config and rewrite your index to not be sparse. While I'm commenting on this..: There are several "layers" here, for lack of a better term. "Enabling foo enables bar which may cause baz. If you fail due to baz, try dropping bar by dropping foo." If I remove any mention of the config variable from your text, I get the following. Enabling sparse index might cause other tools to stop working with your repository. If you have trouble with this compatibility, then run `git sparse-checkout sparse-index disable` to rewrite your index to not be sparse. I'm tempted to suggest such a rewrite to relieve readers of knowing of the middle step, which you could say is more of an implementation detail. But if we think that the symptoms / error messages might involve "extensions.sparseIndex" or, as would be the case with an older Git installation, fatal: unknown repository extensions found: sparseindex maybe there is some value in mentioning the config item by name. Just thinking out loud, really, and I don't have any strong opinion. I only came here to point out the typo in the docs. Martin
On 2/24/2021 2:11 PM, Martin Ågren wrote: > On Wed, 24 Feb 2021 at 00:57, Derrick Stolee via GitGitGadget > <gitgitgadget@gmail.com> wrote: >> +that is not completely understood by other tools. Enabling sparse index >> +enables the `extensions.spareseIndex` config value, which might cause > > s/sparese/sparse Thanks! >> +other tools to stop working with your repository. If you have trouble with >> +this compatibility, then run `git sparse-checkout sparse-index disable` to >> +remove this config and rewrite your index to not be sparse. > > While I'm commenting on this..: > > There are several "layers" here, for lack of a better term. "Enabling foo > enables bar which may cause baz. If you fail due to baz, try dropping > bar by dropping foo." If I remove any mention of the config variable from > your text, I get the following. > > Enabling sparse index might cause other tools to stop working with your > repository. If you have trouble with this compatibility, then run `git > sparse-checkout sparse-index disable` to rewrite your index to not be > sparse. > > I'm tempted to suggest such a rewrite to relieve readers of knowing of > the middle step, which you could say is more of an implementation > detail. But if we think that the symptoms / error messages might involve > "extensions.sparseIndex" or, as would be the case with an older Git > installation, > > fatal: unknown repository extensions found: > sparseindex > > maybe there is some value in mentioning the config item by name. Just > thinking out loud, really, and I don't have any strong opinion. I only > came here to point out the typo in the docs. I agree that the layers are confusing. We could rearrange and have a similar flow to what you recommend by mentioning the extension at the end: **WARNING:** Using a sparse index requires modifying the index in a way that is not completely understood by other tools. If you have trouble with this compatibility, then run `git sparse-checkout sparse-index disable` to rewrite your index to not be sparse. Older versions of Git will not understand the `sparseIndex` repository extension and may fail to interact with your repository until it is disabled. Thanks, -Stolee
On Tue, Mar 9, 2021 at 12:52 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 2/24/2021 2:11 PM, Martin Ågren wrote: > > On Wed, 24 Feb 2021 at 00:57, Derrick Stolee via GitGitGadget > > <gitgitgadget@gmail.com> wrote: > >> +that is not completely understood by other tools. Enabling sparse index > >> +enables the `extensions.spareseIndex` config value, which might cause > > > > s/sparese/sparse > > Thanks! > > > >> +other tools to stop working with your repository. If you have trouble with > >> +this compatibility, then run `git sparse-checkout sparse-index disable` to > >> +remove this config and rewrite your index to not be sparse. > > > > While I'm commenting on this..: > > > > There are several "layers" here, for lack of a better term. "Enabling foo > > enables bar which may cause baz. If you fail due to baz, try dropping > > bar by dropping foo." If I remove any mention of the config variable from > > your text, I get the following. > > > > Enabling sparse index might cause other tools to stop working with your > > repository. If you have trouble with this compatibility, then run `git > > sparse-checkout sparse-index disable` to rewrite your index to not be > > sparse. > > > > I'm tempted to suggest such a rewrite to relieve readers of knowing of > > the middle step, which you could say is more of an implementation > > detail. But if we think that the symptoms / error messages might involve > > "extensions.sparseIndex" or, as would be the case with an older Git > > installation, > > > > fatal: unknown repository extensions found: > > sparseindex > > > > maybe there is some value in mentioning the config item by name. Just > > thinking out loud, really, and I don't have any strong opinion. I only > > came here to point out the typo in the docs. > > I agree that the layers are confusing. We could rearrange and have > a similar flow to what you recommend by mentioning the extension at > the end: > > **WARNING:** Using a sparse index requires modifying the index in a way > that is not completely understood by other tools. If you have trouble with > this compatibility, then run `git sparse-checkout sparse-index disable` to > rewrite your index to not be sparse. Older versions of Git will not > understand the `sparseIndex` repository extension and may fail to interact > with your repository until it is disabled. > > Thanks, > -Stolee This looks pretty good to me, but could we change the first sentence to read "...modifying the index in a way that may not yet be understood by external tools." ? I'm worried "other tools" might make people worry about different builtin commands (e.g. fast-export, log). I also prefer "may" and "yet" because I suspect most external tools (e.g. git filter-repo just to name a personal example) won't need to read an index format and will thus be unaffected, and any tools that do read the index format will probably eventually learn how to work with the new format.
On 3/9/2021 4:03 PM, Elijah Newren wrote: > On Tue, Mar 9, 2021 at 12:52 PM Derrick Stolee <stolee@gmail.com> wrote: >> >> On 2/24/2021 2:11 PM, Martin Ågren wrote: >>> There are several "layers" here, for lack of a better term. "Enabling foo >>> enables bar which may cause baz. If you fail due to baz, try dropping >>> bar by dropping foo." If I remove any mention of the config variable from >>> your text, I get the following. >>> >>> Enabling sparse index might cause other tools to stop working with your >>> repository. If you have trouble with this compatibility, then run `git >>> sparse-checkout sparse-index disable` to rewrite your index to not be >>> sparse. >>> >>> I'm tempted to suggest such a rewrite to relieve readers of knowing of >>> the middle step, which you could say is more of an implementation >>> detail. But if we think that the symptoms / error messages might involve >>> "extensions.sparseIndex" or, as would be the case with an older Git >>> installation, >>> >>> fatal: unknown repository extensions found: >>> sparseindex >>> >>> maybe there is some value in mentioning the config item by name. Just >>> thinking out loud, really, and I don't have any strong opinion. I only >>> came here to point out the typo in the docs. >> >> I agree that the layers are confusing. We could rearrange and have >> a similar flow to what you recommend by mentioning the extension at >> the end: >> >> **WARNING:** Using a sparse index requires modifying the index in a way >> that is not completely understood by other tools. If you have trouble with >> this compatibility, then run `git sparse-checkout sparse-index disable` to >> rewrite your index to not be sparse. Older versions of Git will not >> understand the `sparseIndex` repository extension and may fail to interact >> with your repository until it is disabled. >> >> Thanks, >> -Stolee > > This looks pretty good to me, but could we change the first sentence > to read "...modifying the index in a way that may not yet be > understood by external tools." ? I'm worried "other tools" might make > people worry about different builtin commands (e.g. fast-export, log). > I also prefer "may" and "yet" because I suspect most external tools > (e.g. git filter-repo just to name a personal example) won't need to > read an index format and will thus be unaffected, and any tools that > do read the index format will probably eventually learn how to work > with the new format. I can make the change, but I do want to point out that the current use of a repository extension _does_ mean that tools that (correctly) interact with a Git repository should fail even if they don't try to access the index file. This is only something to make this work until we introduce a new index file format version and then can drop the extension. "git filter-repo" _should_ be safe because it's really just shelling to Git, right? I'm more concerned about tools like libgit2. Thanks, -Stolee
On Tue, Mar 9, 2021 at 1:10 PM Derrick Stolee <stolee@gmail.com> wrote: > > On 3/9/2021 4:03 PM, Elijah Newren wrote: > > On Tue, Mar 9, 2021 at 12:52 PM Derrick Stolee <stolee@gmail.com> wrote: > >> > >> On 2/24/2021 2:11 PM, Martin Ågren wrote: > >>> There are several "layers" here, for lack of a better term. "Enabling foo > >>> enables bar which may cause baz. If you fail due to baz, try dropping > >>> bar by dropping foo." If I remove any mention of the config variable from > >>> your text, I get the following. > >>> > >>> Enabling sparse index might cause other tools to stop working with your > >>> repository. If you have trouble with this compatibility, then run `git > >>> sparse-checkout sparse-index disable` to rewrite your index to not be > >>> sparse. > >>> > >>> I'm tempted to suggest such a rewrite to relieve readers of knowing of > >>> the middle step, which you could say is more of an implementation > >>> detail. But if we think that the symptoms / error messages might involve > >>> "extensions.sparseIndex" or, as would be the case with an older Git > >>> installation, > >>> > >>> fatal: unknown repository extensions found: > >>> sparseindex > >>> > >>> maybe there is some value in mentioning the config item by name. Just > >>> thinking out loud, really, and I don't have any strong opinion. I only > >>> came here to point out the typo in the docs. > >> > >> I agree that the layers are confusing. We could rearrange and have > >> a similar flow to what you recommend by mentioning the extension at > >> the end: > >> > >> **WARNING:** Using a sparse index requires modifying the index in a way > >> that is not completely understood by other tools. If you have trouble with > >> this compatibility, then run `git sparse-checkout sparse-index disable` to > >> rewrite your index to not be sparse. Older versions of Git will not > >> understand the `sparseIndex` repository extension and may fail to interact > >> with your repository until it is disabled. > >> > >> Thanks, > >> -Stolee > > > > This looks pretty good to me, but could we change the first sentence > > to read "...modifying the index in a way that may not yet be > > understood by external tools." ? I'm worried "other tools" might make > > people worry about different builtin commands (e.g. fast-export, log). > > I also prefer "may" and "yet" because I suspect most external tools > > (e.g. git filter-repo just to name a personal example) won't need to > > read an index format and will thus be unaffected, and any tools that > > do read the index format will probably eventually learn how to work > > with the new format. > > I can make the change, but I do want to point out that the current > use of a repository extension _does_ mean that tools that (correctly) > interact with a Git repository should fail even if they don't try to > access the index file. This is only something to make this work until > we introduce a new index file format version and then can drop the > extension. Good point, though... > "git filter-repo" _should_ be safe because it's really just shelling > to Git, right? I'm more concerned about tools like libgit2. Yes, libgit2 and jgit and similar tools are clearly going to be affected and deeply. Those are of concern, but I suspect most users when they see "external tools" will be thinking of the large multitude of scripts out there that just shell out to git under the hood to provide some higher level wrapper of some sort. And anything that operates that way won't be affected directly by the repository extension. I'm not sure I'd even mark things that shell out to git as _should_ be safe. In general, scripts can make all kinds of assumptions on interpreting output, and I suspect some of those may become invalidated by this new feature. We have a recent guidepost that's very close to home on that too -- git stash had *3* different bugs in it once sparse-checkouts were introduced, based on the fact that it was designed as a just-shell-out-to-low-level-git-commands script and it made assumptions on how those commands worked together. See https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/. Sure git-stash is a builtin (supposedly, anyway), but external tools can make similar logical jumps.
On Tue, 9 Mar 2021 at 21:52, Derrick Stolee <stolee@gmail.com> wrote: > > I agree that the layers are confusing. We could rearrange and have > a similar flow to what you recommend by mentioning the extension at > the end: > > **WARNING:** Using a sparse index requires modifying the index in a way > that is not completely understood by other tools. If you have trouble with > this compatibility, then run `git sparse-checkout sparse-index disable` to > rewrite your index to not be sparse. Older versions of Git will not > understand the `sparseIndex` repository extension and may fail to interact > with your repository until it is disabled. I like it. I find this easier to read than the previous version. That said, is `git sparse-index sparse-checkout disable` really the way to do this? I don't see a "sparse-index" subcommand of git-sparse-checkout. ... Hmm, no, after building and installing your patches, I get $ git sparse-checkout sparse-index disable usage: git sparse-checkout (init|list|set|add|reapply|disable) <options> Should that be `git sparse-checkout init --no-sparse-index`? I just tried that on a fresh, empty repo. It seems to work in the sense that it drops the config item. I'm guessing re-initing a sparse checkout is a safe and sane thing to do? I don't find any tests for this. If re-initing should be ok and in particular if it should allow toggling the use of sparse index, it might be good having a test. At a minimum to see that the command passes and that the config item goes away? And check that the actual index is rewritten back to the "old" format? (Sorry if you have that already and I'm just bad at finding it.) Martin
On 3/14/2021 4:08 PM, Martin Ågren wrote: > On Tue, 9 Mar 2021 at 21:52, Derrick Stolee <stolee@gmail.com> wrote: >> >> I agree that the layers are confusing. We could rearrange and have >> a similar flow to what you recommend by mentioning the extension at >> the end: >> >> **WARNING:** Using a sparse index requires modifying the index in a way >> that is not completely understood by other tools. If you have trouble with >> this compatibility, then run `git sparse-checkout sparse-index disable` to >> rewrite your index to not be sparse. Older versions of Git will not >> understand the `sparseIndex` repository extension and may fail to interact >> with your repository until it is disabled. > > I like it. I find this easier to read than the previous version. That > said, is `git sparse-index sparse-checkout disable` really the way to do > this? I don't see a "sparse-index" subcommand of git-sparse-checkout. > ... Hmm, no, after building and installing your patches, I get > > $ git sparse-checkout sparse-index disable > usage: git sparse-checkout (init|list|set|add|reapply|disable) <options> > > Should that be `git sparse-checkout init --no-sparse-index`? I just > tried that on a fresh, empty repo. It seems to work in the sense that it > drops the config item. I'm guessing re-initing a sparse checkout is a > safe and sane thing to do? Yes! Sorry I missed updating this instance when changing the design. Your suggestion is indeed the proper way to disable the sparse-index. > I don't find any tests for this. If re-initing should be ok and in > particular if it should allow toggling the use of sparse index, it might > be good having a test. At a minimum to see that the command passes and > that the config item goes away? And check that the actual index is > rewritten back to the "old" format? (Sorry if you have that already and > I'm just bad at finding it.) We have tests already that 'git sparse-checkout init' will preserve existing sparse-checkout patterns. I should definitely have a test to ensure that '--no-sparse-index' rewrites the index to be a full one. Thanks! -Stolee
diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt index a0eeaeb02ee3..b51b8450cfd9 100644 --- a/Documentation/git-sparse-checkout.txt +++ b/Documentation/git-sparse-checkout.txt @@ -45,6 +45,20 @@ To avoid interfering with other worktrees, it first enables the When `--cone` is provided, the `core.sparseCheckoutCone` setting is also set, allowing for better performance with a limited set of patterns (see 'CONE PATTERN SET' below). ++ +Use the `--[no-]sparse-index` option to toggle the use of the sparse +index format. This reduces the size of the index to be more closely +aligned with your sparse-checkout definition. This can have significant +performance advantages for commands such as `git status` or `git add`. +This feature is still experimental. Some commands might be slower with +a sparse index until they are properly integrated with the feature. ++ +**WARNING:** Using a sparse index requires modifying the index in a way +that is not completely understood by other tools. Enabling sparse index +enables the `extensions.spareseIndex` config value, which might cause +other tools to stop working with your repository. If you have trouble with +this compatibility, then run `git sparse-checkout sparse-index disable` to +remove this config and rewrite your index to not be sparse. 'set':: Write a set of patterns to the sparse-checkout file, as given as diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c index e00b82af727b..ca63e2c64e95 100644 --- a/builtin/sparse-checkout.c +++ b/builtin/sparse-checkout.c @@ -14,6 +14,7 @@ #include "unpack-trees.h" #include "wt-status.h" #include "quote.h" +#include "sparse-index.h" static const char *empty_base = ""; @@ -283,12 +284,13 @@ static int set_config(enum sparse_checkout_mode mode) } static char const * const builtin_sparse_checkout_init_usage[] = { - N_("git sparse-checkout init [--cone]"), + N_("git sparse-checkout init [--cone] [--[no-]sparse-index]"), NULL }; static struct sparse_checkout_init_opts { int cone_mode; + int sparse_index; } init_opts; static int sparse_checkout_init(int argc, const char **argv) @@ -303,11 +305,15 @@ static int sparse_checkout_init(int argc, const char **argv) static struct option builtin_sparse_checkout_init_options[] = { OPT_BOOL(0, "cone", &init_opts.cone_mode, N_("initialize the sparse-checkout in cone mode")), + OPT_BOOL(0, "sparse-index", &init_opts.sparse_index, + N_("toggle the use of a sparse index")), OPT_END(), }; repo_read_index(the_repository); + init_opts.sparse_index = -1; + argc = parse_options(argc, argv, NULL, builtin_sparse_checkout_init_options, builtin_sparse_checkout_init_usage, 0); @@ -326,6 +332,15 @@ static int sparse_checkout_init(int argc, const char **argv) sparse_filename = get_sparse_checkout_filename(); res = add_patterns_from_file_to_list(sparse_filename, "", 0, &pl, NULL); + if (init_opts.sparse_index >= 0) { + if (set_sparse_index_config(the_repository, init_opts.sparse_index) < 0) + die(_("failed to modify sparse-index config")); + + /* force an index rewrite */ + repo_read_index(the_repository); + the_repository->index->updated_workdir = 1; + } + /* If we already have a sparse-checkout file, use it. */ if (res >= 0) { free(sparse_filename); diff --git a/sparse-index.c b/sparse-index.c index 97b0d0c57857..a991c5331e9e 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -104,23 +104,37 @@ static int convert_to_sparse_rec(struct index_state *istate, static int enable_sparse_index(struct repository *repo) { - const char *config_path = repo_git_path(repo, "config.worktree"); + int res; if (upgrade_repository_format(1) < 0) { warning(_("unable to upgrade repository format to enable sparse-index")); return -1; } - git_config_set_in_file_gently(config_path, - "extensions.sparseIndex", - "true"); + res = git_config_set_gently("extensions.sparseindex", "true"); prepare_repo_settings(repo); repo->settings.sparse_index = 1; - return 0; + return res; +} + +int set_sparse_index_config(struct repository *repo, int enable) +{ + int res; + + if (enable) + return enable_sparse_index(repo); + + /* Don't downgrade repository format, just remove the extension. */ + res = git_config_set_gently("extensions.sparseindex", NULL); + + prepare_repo_settings(repo); + repo->settings.sparse_index = 0; + return res; } int convert_to_sparse(struct index_state *istate) { + int test_env; if (istate->split_index || istate->sparse_index || !core_apply_sparse_checkout || !core_sparse_checkout_cone) return 0; @@ -129,14 +143,13 @@ int convert_to_sparse(struct index_state *istate) istate->repo = the_repository; /* - * The GIT_TEST_SPARSE_INDEX environment variable triggers the - * extensions.sparseIndex config variable to be on. + * If GIT_TEST_SPARSE_INDEX=1, then trigger extensions.sparseIndex + * to be fully enabled. If GIT_TEST_SPARSE_INDEX=0 (set explicitly), + * then purposefully disable the setting. */ - if (git_env_bool("GIT_TEST_SPARSE_INDEX", 0)) { - int err = enable_sparse_index(istate->repo); - if (err < 0) - return err; - } + test_env = git_env_bool("GIT_TEST_SPARSE_INDEX", -1); + if (test_env >= 0) + set_sparse_index_config(istate->repo, test_env); /* * Only convert to sparse if extensions.sparseIndex is set. diff --git a/sparse-index.h b/sparse-index.h index 64380e121d80..39dcc859735e 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -5,4 +5,7 @@ struct index_state; void ensure_full_index(struct index_state *istate); int convert_to_sparse(struct index_state *istate); +struct repository; +int set_sparse_index_config(struct repository *repo, int enable); + #endif diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh index bfc9e28ef0e1..9c2bc4d25f66 100755 --- a/t/t1092-sparse-checkout-compatibility.sh +++ b/t/t1092-sparse-checkout-compatibility.sh @@ -4,6 +4,7 @@ test_description='compare full workdir to sparse workdir' GIT_TEST_CHECK_CACHE_TREE=0 GIT_TEST_SPLIT_INDEX=0 +GIT_TEST_SPARSE_INDEX= . ./test-lib.sh @@ -98,25 +99,26 @@ init_repos () { # initialize sparse-checkout definitions git -C sparse-checkout sparse-checkout init --cone && git -C sparse-checkout sparse-checkout set deep && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout init --cone && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep + git -C sparse-index sparse-checkout init --cone --sparse-index && + test_cmp_config -C sparse-index true extensions.sparseindex && + git -C sparse-index sparse-checkout set deep } run_on_sparse () { ( cd sparse-checkout && - GIT_TEST_SPARSE_INDEX=0 "$@" >../sparse-checkout-out 2>../sparse-checkout-err + "$@" >../sparse-checkout-out 2>../sparse-checkout-err ) && ( cd sparse-index && - GIT_TEST_SPARSE_INDEX=1 "$@" >../sparse-index-out 2>../sparse-index-err + "$@" >../sparse-index-out 2>../sparse-index-err ) } run_on_all () { ( cd full-checkout && - GIT_TEST_SPARSE_INDEX=0 "$@" >../full-checkout-out 2>../full-checkout-err + "$@" >../full-checkout-out 2>../full-checkout-err ) && run_on_sparse "$@" } @@ -146,7 +148,7 @@ test_expect_success 'sparse-index contents' ' || return 1 done && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set folder1 && + git -C sparse-index sparse-checkout set folder1 && test-tool -C sparse-index read-cache --table >cache && for dir in deep folder2 x @@ -156,7 +158,7 @@ test_expect_success 'sparse-index contents' ' || return 1 done && - GIT_TEST_SPARSE_INDEX=1 git -C sparse-index sparse-checkout set deep/deeper1 && + git -C sparse-index sparse-checkout set deep/deeper1 && test-tool -C sparse-index read-cache --table >cache && for dir in deep/deeper2 folder1 folder2 x @@ -394,19 +396,15 @@ test_expect_success 'submodule handling' ' test_expect_success 'sparse-index is expanded and converted back' ' init_repos && - ( - GIT_TEST_SPARSE_INDEX=1 && - export GIT_TEST_SPARSE_INDEX && - GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" reset --hard && - test_region index convert_to_sparse trace2.txt && - test_region index ensure_full_index trace2.txt && - - rm trace2.txt && - GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ - git -C sparse-index -c core.fsmonitor="" status -uno && - test_region index ensure_full_index trace2.txt - ) + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" reset --hard && + test_region index convert_to_sparse trace2.txt && + test_region index ensure_full_index trace2.txt && + + rm trace2.txt && + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \ + git -C sparse-index -c core.fsmonitor="" status -uno && + test_region index ensure_full_index trace2.txt ' test_done