Message ID | 94db10a4e5943d689113693c64633ddffa5508cc.1628888668.git.jonathantanmy@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | In grep, no adding submodule ODB as alternates | expand |
On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > When reading the config of a submodule, if reading from a blob, read > using an explicitly specified repository instead of by adding the > submodule's ODB as an alternate and then reading an object from > the_repository. Great! At first, I thought this would also allow us to remove another NEEDSWORK comment in grep_submodule(), together with a lock protection: /* * NEEDSWORK: repo_read_gitmodules() might call * add_to_alternates_memory() via config_from_gitmodules(). This * operation causes a race condition with concurrent object readings * performed by the worker threads. That's why we need obj_read_lock() * here. It should be removed once it's no longer necessary to add the * subrepo's odbs to the in-memory alternates list. */ obj_read_lock(); repo_read_gitmodules(subrepo, 0); Back when I wrote this comment, my conclusion was that the alternates mechanics were the only thread-unsafe object-reading operations in repo_read_gitmodules()'s call chains. So once the add-to-alternates mechanics were gone, we could also remove the lock. But with further inspection now, I see that this is not really the case. For example, we have a few global variables in packfile.c collecting some statistics (pack_mmap_calls, pack_open_windows, etc.) which are updated on obj readings from both the_repository *and* submodules. So I no longer think its safe to remove the obj_read_lock() protection here, as the NEEDSWORK comment suggests, even if we are not using the alternates list anymore. Do you want to remove this comment in your patchset? I can also send a follow-up patch explaining this situation and removing the comment (but not the locking), if you prefer. > Signed-off-by: Jonathan Tan <jonathantanmy@google.com> > --- [...] > diff --git a/submodule-config.c b/submodule-config.c > index 2026120fb3..f95344028b 100644 > --- a/submodule-config.c > +++ b/submodule-config.c > @@ -649,9 +649,10 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void > config_source.file = file; > } else if (repo_get_oid(repo, GITMODULES_INDEX, &oid) >= 0 || > repo_get_oid(repo, GITMODULES_HEAD, &oid) >= 0) { > + config_source.repo = repo; > config_source.blob = oidstr = xstrdup(oid_to_hex(&oid)); > if (repo != the_repository) > - add_to_alternates_memory(repo->objects->odb->path); > + add_submodule_odb_by_path(repo->objects->odb->path); Ok. Like in grep_submodule(), this should no longer add the submodule ODB to the alternates list, so this call is now mostly used as a fallback and also for testing. To see if we are indeed testing this add-to-alternates case, I reverted the change that made the code read from the submodule instead of the_repository: diff --git a/config.c b/config.c index a85c12e6cc..cd37a9dcd9 100644 --- a/config.c +++ b/config.c @@ -1805,7 +1805,7 @@ int git_config_from_blob_oid(config_fn_t fn, unsigned long size; int ret; - buf = repo_read_object_file(repo, oid, &type, &size); + buf = read_object_file(oid, &type, &size); Then, I ran t7814-grep-recurse-submodules.sh , where you've added the GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1 envvar. This correctly produced the following error: BUG: submodule.c:205: register_all_submodule_odb_as_alternates() called [...] not ok 23 - grep --recurse-submodules with submodules without .gitmodules in the working tree Nice! So the change made by this patch is covered by test 23. I think it would be nice to mention that in this patch's message.
On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote: > Oops, I accidentally deleted this part in my previous reply: > diff --git a/config.c b/config.c > index f33abeab85..a85c12e6cc 100644 > --- a/config.c > +++ b/config.c > @@ -1820,6 +1821,7 @@ int git_config_from_blob_oid(config_fn_t fn, > } > > static int git_config_from_blob_ref(config_fn_t fn, > + struct repository *repo, > const char *name, > void *data) > { > @@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn, > > if (get_oid(name, &oid) < 0) This should be `repo_get_oid(repo, ...)` now. > return error(_("unable to resolve config blob '%s'"), name); > - return git_config_from_blob_oid(fn, name, &oid, data); > + return git_config_from_blob_oid(fn, name, repo, &oid, data); > }
On Mon, Aug 16, 2021 at 11:32 AM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > > > When reading the config of a submodule, if reading from a blob, read > > using an explicitly specified repository instead of by adding the > > submodule's ODB as an alternate and then reading an object from > > the_repository. > > Great! > > At first, I thought this would also allow us to remove another > NEEDSWORK comment in grep_submodule(), together with a lock > protection: > > /* > * NEEDSWORK: repo_read_gitmodules() might call > * add_to_alternates_memory() via config_from_gitmodules(). This > * operation causes a race condition with concurrent object readings > * performed by the worker threads. That's why we need obj_read_lock() > * here. It should be removed once it's no longer necessary to add the > * subrepo's odbs to the in-memory alternates list. > */ > obj_read_lock(); > repo_read_gitmodules(subrepo, 0); > > Back when I wrote this comment, my conclusion was that the alternates > mechanics were the only thread-unsafe object-reading operations in > repo_read_gitmodules()'s call chains. So once the add-to-alternates > mechanics were gone, we could also remove the lock. > > But with further inspection now, I see that this is not really the > case. For example, we have a few global variables in packfile.c > collecting some statistics (pack_mmap_calls, pack_open_windows, etc.) > which are updated on obj readings from both the_repository *and* > submodules. Sorry, this is incorrect. I forgot that repo_read_object_file() (which is part of repo_read_gitmodules()'s call chain) also acquires the obj_read_mutex before accessing those global variables. So the NEEDSWORK might be right. Nevertheless, I think it might be better to look into repo_read_gitmodules() more carefully before removing this lock. And this is something for another series. Sorry about the noise.
> /* > * NEEDSWORK: repo_read_gitmodules() might call > * add_to_alternates_memory() via config_from_gitmodules(). This > * operation causes a race condition with concurrent object readings > * performed by the worker threads. That's why we need obj_read_lock() > * here. It should be removed once it's no longer necessary to add the > * subrepo's odbs to the in-memory alternates list. > */ > obj_read_lock(); > repo_read_gitmodules(subrepo, 0); > > Back when I wrote this comment, my conclusion was that the alternates > mechanics were the only thread-unsafe object-reading operations in > repo_read_gitmodules()'s call chains. So once the add-to-alternates > mechanics were gone, we could also remove the lock. > > But with further inspection now, I see that this is not really the > case. For example, we have a few global variables in packfile.c > collecting some statistics (pack_mmap_calls, pack_open_windows, etc.) > which are updated on obj readings from both the_repository *and* > submodules. So I no longer think its safe to remove the > obj_read_lock() protection here, as the NEEDSWORK comment suggests, > even if we are not using the alternates list anymore. > > Do you want to remove this comment in your patchset? I can also send a > follow-up patch explaining this situation and removing the comment > (but not the locking), if you prefer. I think you can make the patch yourself - the comment change you describe seems unrelated to this patch set. > Ok. Like in grep_submodule(), this should no longer add the submodule > ODB to the alternates list, so this call is now mostly used as a > fallback and also for testing. > > To see if we are indeed testing this add-to-alternates case, I > reverted the change that made the code read from the submodule instead > of the_repository: > > diff --git a/config.c b/config.c > index a85c12e6cc..cd37a9dcd9 100644 > --- a/config.c > +++ b/config.c > @@ -1805,7 +1805,7 @@ int git_config_from_blob_oid(config_fn_t fn, > unsigned long size; > int ret; > > - buf = repo_read_object_file(repo, oid, &type, &size); > + buf = read_object_file(oid, &type, &size); > > Then, I ran t7814-grep-recurse-submodules.sh , where you've added the > GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1 envvar. This correctly > produced the following error: > > BUG: submodule.c:205: register_all_submodule_odb_as_alternates() called > [...] > not ok 23 - grep --recurse-submodules with submodules without > .gitmodules in the working tree > > Nice! So the change made by this patch is covered by test 23. I think > it would be nice to mention that in this patch's message. Thanks for checking this. I'll mention this in the commit message.
> > @@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn, > > > > if (get_oid(name, &oid) < 0) > > This should be `repo_get_oid(repo, ...)` now. Ah, good catch! This wasn't caught by the tests because the submodule config mechanism always passes a full-length hexadecimal string hash as "name" - and probably would never be caught because git_config_from_blob_ref() is only called from config_with_options(), which is called with non-NULL source from only 2 files: submodule-config.c (this one) and builtin/config.c (which most likely will never operate on any repo other than the_repository). I'll refactor the API to avoid this situation in the first place.
> > > @@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn, > > > > > > if (get_oid(name, &oid) < 0) > > > > This should be `repo_get_oid(repo, ...)` now. > > Ah, good catch! This wasn't caught by the tests because the submodule > config mechanism always passes a full-length hexadecimal string hash as > "name" - and probably would never be caught because > git_config_from_blob_ref() is only called from config_with_options(), > which is called with non-NULL source from only 2 files: > submodule-config.c (this one) and builtin/config.c (which most likely > will never operate on any repo other than the_repository). I'll refactor > the API to avoid this situation in the first place. The refactoring I was thinking of was parsing the OID in builtin/config.c and then passing only the OID (instead of a user-provided string that could contain anything), but that doesn't really work because the user-provided string is used in certain outputs. I'll just change it to repo_get_oid() in this patch.
diff --git a/config.c b/config.c index f33abeab85..a85c12e6cc 100644 --- a/config.c +++ b/config.c @@ -1796,6 +1796,7 @@ int git_config_from_mem(config_fn_t fn, int git_config_from_blob_oid(config_fn_t fn, const char *name, + struct repository *repo, const struct object_id *oid, void *data) { @@ -1804,7 +1805,7 @@ int git_config_from_blob_oid(config_fn_t fn, unsigned long size; int ret; - buf = read_object_file(oid, &type, &size); + buf = repo_read_object_file(repo, oid, &type, &size); if (!buf) return error(_("unable to load config blob object '%s'"), name); if (type != OBJ_BLOB) { @@ -1820,6 +1821,7 @@ int git_config_from_blob_oid(config_fn_t fn, } static int git_config_from_blob_ref(config_fn_t fn, + struct repository *repo, const char *name, void *data) { @@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn, if (get_oid(name, &oid) < 0) return error(_("unable to resolve config blob '%s'"), name); - return git_config_from_blob_oid(fn, name, &oid, data); + return git_config_from_blob_oid(fn, name, repo, &oid, data); } char *git_system_config(void) @@ -1958,12 +1960,16 @@ int config_with_options(config_fn_t fn, void *data, * If we have a specific filename, use it. Otherwise, follow the * regular lookup sequence. */ - if (config_source && config_source->use_stdin) + if (config_source && config_source->use_stdin) { return git_config_from_stdin(fn, data); - else if (config_source && config_source->file) + } else if (config_source && config_source->file) { return git_config_from_file(fn, config_source->file, data); - else if (config_source && config_source->blob) - return git_config_from_blob_ref(fn, config_source->blob, data); + } else if (config_source && config_source->blob) { + struct repository *repo = config_source->repo ? + config_source->repo : the_repository; + return git_config_from_blob_ref(fn, repo, config_source->blob, + data); + } return do_git_config_sequence(opts, fn, data); } diff --git a/config.h b/config.h index a2200f3111..147f5e0490 100644 --- a/config.h +++ b/config.h @@ -49,6 +49,8 @@ const char *config_scope_name(enum config_scope scope); struct git_config_source { unsigned int use_stdin:1; const char *file; + /* The repository if blob is not NULL; leave blank for the_repository */ + struct repository *repo; const char *blob; enum config_scope scope; }; @@ -136,6 +138,7 @@ int git_config_from_mem(config_fn_t fn, const char *buf, size_t len, void *data, const struct config_options *opts); int git_config_from_blob_oid(config_fn_t fn, const char *name, + struct repository *repo, const struct object_id *oid, void *data); void git_config_push_parameter(const char *text); void git_config_push_env(const char *spec); diff --git a/submodule-config.c b/submodule-config.c index 2026120fb3..f95344028b 100644 --- a/submodule-config.c +++ b/submodule-config.c @@ -649,9 +649,10 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void config_source.file = file; } else if (repo_get_oid(repo, GITMODULES_INDEX, &oid) >= 0 || repo_get_oid(repo, GITMODULES_HEAD, &oid) >= 0) { + config_source.repo = repo; config_source.blob = oidstr = xstrdup(oid_to_hex(&oid)); if (repo != the_repository) - add_to_alternates_memory(repo->objects->odb->path); + add_submodule_odb_by_path(repo->objects->odb->path); } else { goto out; } @@ -702,7 +703,7 @@ void gitmodules_config_oid(const struct object_id *commit_oid) if (gitmodule_oid_from_commit(commit_oid, &oid, &rev)) { git_config_from_blob_oid(gitmodules_cb, rev.buf, - &oid, the_repository); + the_repository, &oid, the_repository); } strbuf_release(&rev);
When reading the config of a submodule, if reading from a blob, read using an explicitly specified repository instead of by adding the submodule's ODB as an alternate and then reading an object from the_repository. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- config.c | 18 ++++++++++++------ config.h | 3 +++ submodule-config.c | 5 +++-- 3 files changed, 18 insertions(+), 8 deletions(-)