Message ID | 77bf5d5ff27729a39ac00d52af3c09610d733b14.1670433958.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | Optionally skip hashing index on write |
On Wed, Dec 07 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]
> diff --git a/read-cache.c b/read-cache.c
> index fb4d6fb6387..1844953fba7 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2923,12 +2923,13 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>  	int ieot_entries = 1;
>  	struct index_entry_offset_table *ieot = NULL;
>  	int nr, nr_threads;
> -	int skip_hash;
>
>  	f = hashfd(tempfile->fd, tempfile->filename.buf);
>
> -	if (!git_config_get_maybe_bool("index.skiphash", &skip_hash))
> -		f->skip_hash = skip_hash;
> +	if (istate->repo) {
> +		prepare_repo_settings(istate->repo);
> +		f->skip_hash = istate->repo->settings.index_skip_hash;
> +	}

Urm, are we ever going to find ourselves in a situation where:

 * We have read the settings for the_repository
 * We have an index we're about to write out as our "main index", but
   the istate->repo *isn't* the_repository.
 * Even then, wouldn't the two copies of the repos have read the same
   repo settings?

But maybe there's a really obvious submodule / worktree / whatever edge
case I'm missing.

But if not, shouldn't we just always read/write this from
the_repository?

> +		rm -f .git/index &&
> +		git -c feature.manyFiles=true \
> +			-c index.skipHash=false add a &&
> +		test_trailing_hash .git/index >hash &&
> +		! test_cmp expect hash

We had a parallel thread where we discussed "! test_cmp" being an
anti-pattern, i.e. you want them not to be the same, but you want it to
still show a diff, Maybe just "! cmp" ?

I.e. either the diff will be meaningless, or we really should be
asserting the actual value we want, not what it shouldn't be.

so in this case, shouldn't we assert that it's the 0000... value, or the
actual hash (depending on which way around we're testing this)?
On 12/7/2022 5:30 PM, Ævar Arnfjörð Bjarmason wrote:
>
> On Wed, Dec 07 2022, Derrick Stolee via GitGitGadget wrote:
>
>> From: Derrick Stolee <derrickstolee@github.com>
>> [...]
>> diff --git a/read-cache.c b/read-cache.c
>> index fb4d6fb6387..1844953fba7 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -2923,12 +2923,13 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>>  	int ieot_entries = 1;
>>  	struct index_entry_offset_table *ieot = NULL;
>>  	int nr, nr_threads;
>> -	int skip_hash;
>>
>>  	f = hashfd(tempfile->fd, tempfile->filename.buf);
>>
>> -	if (!git_config_get_maybe_bool("index.skiphash", &skip_hash))
>> -		f->skip_hash = skip_hash;
>> +	if (istate->repo) {
>> +		prepare_repo_settings(istate->repo);
>> +		f->skip_hash = istate->repo->settings.index_skip_hash;
>> +	}
>
> Urm, are we ever going to find ourselves in a situation where:
>
>  * We have read the settings for the_repository
>  * We have an index we're about to write out as our "main index", but
>    the istate->repo *isn't* the_repository.
>  * Even then, wouldn't the two copies of the repos have read the same
>    repo settings?
>
> But maybe there's a really obvious submodule / worktree / whatever edge
> case I'm missing.
>
> But if not, shouldn't we just always read/write this from
> the_repository?

I don't understand your concern. We call prepare_repo_settings(istate->repo)
just before using these settings, so we are using whatever repository-local
config we have available to us.

If you're thinking that we could be writing an index but istate->repo is
somehow the "wrong" repo, then that is a larger problem. This patch is
doing the best thing it can with the information it is given.

>> +		rm -f .git/index &&
>> +		git -c feature.manyFiles=true \
>> +			-c index.skipHash=false add a &&
>> +		test_trailing_hash .git/index >hash &&
>> +		! test_cmp expect hash
>
> We had a parallel thread where we discussed "! test_cmp" being an
> anti-pattern, i.e. you want them not to be the same, but you want it to
> still show a diff, Maybe just "! cmp" ?

I couldn't tell from this sentence whether test_cmp or cmp would show
the diff, but from testing I see that test_cmp shows the diff (for
debugging purposes, I'm sure) while cmp shows the position of the first
difference. "! cmp" would work here, since we don't care about what the
real hash is.

> I.e. either the diff will be meaningless, or we really should be
> asserting the actual value we want, not what it shouldn't be.
>
> so in this case, shouldn't we assert that it's the 0000... value, or the
> actual hash (depending on which way around we're testing this)?

When it should be the null hash, we assert that it is that value.

When it isn't, we do not assert the exact hash because we do not want
other modifications to the index (or surrounding tests) to cause that
hash to change, causing toil for future contributors. "! cmp" suffices
for this case to show that the config inheritance is working correctly.

Thanks,
-Stolee
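The two kinds of assertion discussed above (exact match against the null hash, and "anything but the null hash" via "! cmp") can be sketched outside the test suite. This is only an illustration under stated assumptions: `trailing_hash_hex` and `null_hash_hex` are hypothetical stand-ins, not the suite's `test_trailing_hash` or `test_oid zero`; the 20-byte default assumes a SHA-1 sized trailer; and the demo files are fabricated byte strings, not real index files.

```shell
# Hypothetical stand-in for test_trailing_hash: print the last N bytes of
# a file as lowercase hex (N defaults to 20, the raw size of a SHA-1).
trailing_hash_hex () {
	file=$1
	rawsz=${2:-20}
	tail -c "$rawsz" "$file" | od -An -v -tx1 | tr -d ' \n'
	echo
}

# Hypothetical helper: print N zero bytes as hex, i.e. the "null hash".
null_hash_hex () {
	rawsz=${1:-20}
	head -c "$rawsz" /dev/zero | od -An -v -tx1 | tr -d ' \n'
	echo
}

# Demo on two fabricated files: one ends in 20 zero bytes, one does not.
demo=$(mktemp -d)
{ printf 'entries'; head -c 20 /dev/zero; } >"$demo/skipped"
{ printf 'entries'; printf '%s' 'abcdefghijklmnopqrst'; } >"$demo/hashed"
null_hash_hex >"$demo/expect"

# Positive case: assert the exact expected value.
trailing_hash_hex "$demo/skipped" >"$demo/hash"
cmp -s "$demo/expect" "$demo/hash" && echo "skipped: trailer is the null hash"

# Negative case: only assert that the trailer differs from the null hash.
trailing_hash_hex "$demo/hashed" >"$demo/hash"
! cmp -s "$demo/expect" "$demo/hash" && echo "hashed: trailer is not the null hash"

rm -rf "$demo"
```

The negative case deliberately avoids pinning a concrete hash value, which matches the reasoning in the reply: the exact trailer would change whenever the file contents change.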
On Mon, Dec 12 2022, Derrick Stolee wrote:

> On 12/7/2022 5:30 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Wed, Dec 07 2022, Derrick Stolee via GitGitGadget wrote:
>>
>>> From: Derrick Stolee <derrickstolee@github.com>
>>> [...]
>>> diff --git a/read-cache.c b/read-cache.c
>>> index fb4d6fb6387..1844953fba7 100644
>>> --- a/read-cache.c
>>> +++ b/read-cache.c
>>> @@ -2923,12 +2923,13 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>>>  	int ieot_entries = 1;
>>>  	struct index_entry_offset_table *ieot = NULL;
>>>  	int nr, nr_threads;
>>> -	int skip_hash;
>>>
>>>  	f = hashfd(tempfile->fd, tempfile->filename.buf);
>>>
>>> -	if (!git_config_get_maybe_bool("index.skiphash", &skip_hash))
>>> -		f->skip_hash = skip_hash;
>>> +	if (istate->repo) {
>>> +		prepare_repo_settings(istate->repo);
>>> +		f->skip_hash = istate->repo->settings.index_skip_hash;
>>> +	}
>>
>> Urm, are we ever going to find ourselves in a situation where:
>>
>>  * We have read the settings for the_repository
>>  * We have an index we're about to write out as our "main index", but
>>    the istate->repo *isn't* the_repository.
>>  * Even then, wouldn't the two copies of the repos have read the same
>>    repo settings?
>>
>> But maybe there's a really obvious submodule / worktree / whatever edge
>> case I'm missing.
>>
>> But if not, shouldn't we just always read/write this from
>> the_repository?
>
> I don't understand your concern. We call prepare_repo_settings(istate->repo)
> just before using these settings, so we are using whatever repository-local
> config we have available to us.
>
> If you're thinking that we could be writing an index but istate->repo is
> somehow the "wrong" repo, then that is a larger problem. This patch is
> doing the best thing it can with the information it is given.

It's not a concern, just confusion :)

In the preceding step (and this is still the case in your v2) we used
git_config_get_maybe_bool(), if we meant to use istate->repo shouldn't
we have used repo_config_get_maybe_bool() to begin with?

And will we ever get !istate->repo? If not should we BUG() here?

Otherwise the 4/4 changes this to a state where we'll no longer read the
index.skipHash setting if that "repo" is NULL, but our previous
the_repository was non-NULL...
diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index 95975e50912..f0e1d4cb2be 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -23,6 +23,9 @@ feature.manyFiles::
 	working directory. With many files, commands such as `git status` and
 	`git checkout` may be slow and these new defaults improve performance:
 +
+* `index.skipHash=true` speeds up index writes by not computing a trailing
+  checksum.
++
 * `index.version=4` enables path-prefix compression in the index.
 +
 * `core.untrackedCache=true` enables the untracked cache. This setting assumes
diff --git a/read-cache.c b/read-cache.c
index fb4d6fb6387..1844953fba7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2923,12 +2923,13 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	int ieot_entries = 1;
 	struct index_entry_offset_table *ieot = NULL;
 	int nr, nr_threads;
-	int skip_hash;
 
 	f = hashfd(tempfile->fd, tempfile->filename.buf);
 
-	if (!git_config_get_maybe_bool("index.skiphash", &skip_hash))
-		f->skip_hash = skip_hash;
+	if (istate->repo) {
+		prepare_repo_settings(istate->repo);
+		f->skip_hash = istate->repo->settings.index_skip_hash;
+	}
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
diff --git a/repo-settings.c b/repo-settings.c
index 3021921c53d..3dbd3f0e2ec 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -47,6 +47,7 @@ void prepare_repo_settings(struct repository *r)
 	}
 	if (manyfiles) {
 		r->settings.index_version = 4;
+		r->settings.index_skip_hash = 1;
 		r->settings.core_untracked_cache = UNTRACKED_CACHE_WRITE;
 	}
 
@@ -61,6 +62,7 @@ void prepare_repo_settings(struct repository *r)
 	repo_cfg_bool(r, "pack.usesparse", &r->settings.pack_use_sparse, 1);
 	repo_cfg_bool(r, "core.multipackindex", &r->settings.core_multi_pack_index, 1);
 	repo_cfg_bool(r, "index.sparse", &r->settings.sparse_index, 0);
+	repo_cfg_bool(r, "index.skiphash", &r->settings.index_skip_hash, r->settings.index_skip_hash);
 
 	/*
 	 * The GIT_TEST_MULTI_PACK_INDEX variable is special in that
diff --git a/repository.h b/repository.h
index 6c461c5b9de..e8c67ffe165 100644
--- a/repository.h
+++ b/repository.h
@@ -42,6 +42,7 @@ struct repo_settings {
 	struct fsmonitor_settings *fsmonitor; /* lazily loaded */
 
 	int index_version;
+	int index_skip_hash;
 	enum untracked_cache_setting core_untracked_cache;
 
 	int pack_use_sparse;
diff --git a/scalar.c b/scalar.c
index 6c52243cdf1..b49bb8c24ec 100644
--- a/scalar.c
+++ b/scalar.c
@@ -143,6 +143,7 @@ static int set_recommended_config(int reconfigure)
 		{ "credential.validate", "false", 1 }, /* GCM4W-only */
 		{ "gc.auto", "0", 1 },
 		{ "gui.GCWarning", "false", 1 },
+		{ "index.skipHash", "false", 1 },
 		{ "index.threads", "true", 1 },
 		{ "index.version", "4", 1 },
 		{ "merge.stat", "false", 1 },
diff --git a/t/t1600-index.sh b/t/t1600-index.sh
index 55816756607..be0a0a8a008 100755
--- a/t/t1600-index.sh
+++ b/t/t1600-index.sh
@@ -72,7 +72,18 @@ test_expect_success 'index.skipHash config option' '
 		test_trailing_hash .git/index >hash &&
 		echo $(test_oid zero) >expect &&
 		test_cmp expect hash &&
-		git fsck
+		git fsck &&
+
+		rm -f .git/index &&
+		git -c feature.manyFiles=true add a &&
+		test_trailing_hash .git/index >hash &&
+		test_cmp expect hash &&
+
+		rm -f .git/index &&
+		git -c feature.manyFiles=true \
+			-c index.skipHash=false add a &&
+		test_trailing_hash .git/index >hash &&
+		! test_cmp expect hash
 	)
 '
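
The precedence that the repo-settings.c hunks set up, and that the two new test cases exercise, can be modelled in a few lines of shell. This is a rough sketch rather than git's code: `resolve_skip_hash` is a hypothetical name, and it only understands the literal strings "true" and "false" (git's boolean parsing accepts more spellings). An empty argument stands for "not configured".

```shell
# Hypothetical model of how the patch resolves index.skipHash:
#   $1 = feature.manyFiles value ("true", "false", or "" if unset)
#   $2 = index.skipHash value    ("true", "false", or "" if unset)
# Prints 1 if the trailing hash would be skipped, 0 otherwise.
resolve_skip_hash () {
	manyfiles=$1
	explicit=$2
	skip=0			# built-in default: compute the hash
	if test "$manyfiles" = true
	then
		skip=1		# feature.manyFiles raises the default
	fi
	case "$explicit" in
	true)	skip=1 ;;	# an explicit index.skipHash always wins
	false)	skip=0 ;;
	esac
	echo "$skip"
}

# Mirrors the two new test cases in t1600:
resolve_skip_hash true ""	# feature.manyFiles alone
resolve_skip_hash true false	# explicit index.skipHash=false overrides it
```

The key design point is that the feature flag only changes the default passed to the explicit lookup, so an explicit index.skipHash value takes effect regardless of feature.manyFiles.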