diff mbox series

[8/8] submodule: fix bug and remove add_submodule_odb()

Message ID 20220210044152.78352-9-chooglen@google.com (mailing list archive)
State Superseded
Headers show
Series fetch --recurse-submodules: fetch unpopulated submodules | expand

Commit Message

Glen Choo Feb. 10, 2022, 4:41 a.m. UTC
add_submodule_odb() is a hack - it adds a submodule's odb as an
alternate, allowing the submodule's objects to be read via
the_repository. Its last caller is submodule_has_commits(), which calls
add_submodule_odb() to prepare for check_has_commit(). This used to be
necessary because check_has_commit() used the_repository's odb, but this
is longer true as of 13a2f620b2 (submodule: pass repo to
check_has_commit(), 2021-10-08).

Removing add_submodule_odb() reveals a bug in check_has_commit(), where
check_has_commit() will segfault if the submodule is missing (e.g. the
user has not init-ed the submodule). This happens because the
submodule's struct repository cannot be initialized, but
check_has_commit() tries to cleanup the uninitialized struct anyway.
This was masked by add_submodule_odb(), because add_submodule_odb()
fails when the submodule is missing, causing the caller to return early
and avoid calling check_has_commit().

Fix the bug and remove the call to add_submodule_odb(). Since
add_submodule_odb() has no more callers, remove it too.

Note that submodule odbs can still by added as alternates via
add_submodule_odb_by_path().

Signed-off-by: Glen Choo <chooglen@google.com>
---
This bug only exists because we can't call repo_clear() twice on the
same struct repository. So instead of just fixing this site, an
alternative (and maybe better) fix would be to fix repo_clear(). If
others think that's a good idea, I'll do that instead.

 submodule.c | 35 ++---------------------------------
 submodule.h |  9 ++++-----
 2 files changed, 6 insertions(+), 38 deletions(-)

Comments

Junio C Hamano Feb. 10, 2022, 10:54 p.m. UTC | #1
Glen Choo <chooglen@google.com> writes:

> add_submodule_odb() is a hack - it adds a submodule's odb as an
> alternate, allowing the submodule's objects to be read via
> the_repository. Its last caller is submodule_has_commits(), which calls
> add_submodule_odb() to prepare for check_has_commit(). This used to be
> necessary because check_has_commit() used the_repository's odb, but this
> is longer true as of 13a2f620b2 (submodule: pass repo to
> check_has_commit(), 2021-10-08).

Yes!  I wonder if we can do this much earlier in the series (or even
an independent clean-up that the rest of the series depends on) and
have it graduate earlier?
Jonathan Tan Feb. 10, 2022, 11:04 p.m. UTC | #2
Glen Choo <chooglen@google.com> writes:
> add_submodule_odb() is a hack - it adds a submodule's odb as an
> alternate, allowing the submodule's objects to be read via
> the_repository. Its last caller is submodule_has_commits(), which calls
> add_submodule_odb() to prepare for check_has_commit(). This used to be
> necessary because check_has_commit() used the_repository's odb, but this
> is longer true as of 13a2f620b2 (submodule: pass repo to
> check_has_commit(), 2021-10-08).
> 
> Removing add_submodule_odb() reveals a bug in check_has_commit(), where
> check_has_commit() will segfault if the submodule is missing (e.g. the
> user has not init-ed the submodule). This happens because the
> submodule's struct repository cannot be initialized, but
> check_has_commit() tries to cleanup the uninitialized struct anyway.
> This was masked by add_submodule_odb(), because add_submodule_odb()
> fails when the submodule is missing, causing the caller to return early
> and avoid calling check_has_commit().
> 
> Fix the bug and remove the call to add_submodule_odb(). Since
> add_submodule_odb() has no more callers, remove it too.
> 
> Note that submodule odbs can still by added as alternates via
> add_submodule_odb_by_path().
> 
> Signed-off-by: Glen Choo <chooglen@google.com>
> ---
> This bug only exists because we can't call repo_clear() twice on the
> same struct repository. So instead of just fixing this site, an
> alternative (and maybe better) fix would be to fix repo_clear(). If
> others think that's a good idea, I'll do that instead.

Reading the first paragraph of the commit message, I'm given the
impression that this is the last site in which we attempt to add a
submodule ODB as an alternate, but that is not true. This is indeed the
last usage of add_submodule_odb(), but add_submodule_odb_by_path() still
exists.

I think the primary point of this commit is to fix a latent bug in
check_has_commit(), and add_submodule_odb()'s role here is just hiding
it. Its hacky behavior does not play a role.

I would write the commit message like this:

  submodule: fix latent check_has_commit() bug

  check_has_commit() will attempt to clear a non-initialized struct
  repository if initialization fails. This bug is masked by its only
  caller, submodule_has_commits(), first calling add_submodule_odb().
  The latter fails if the repo does not exist, making
  submodule_has_commits() exit early and not invoke check_has_commit()
  in a situation in which initialization would fail.

  Fix this bug, and because calling add_submodule_odb() is no longer
  necessary as of 13a2f620b2 (submodule: pass repo to
  check_has_commit(), 2021-10-08), remove that call too.

  This is the last caller of add_submodule_odb(), so remove that
  function. (Adding submodule ODBs as alternates is still present in the
  form of add_submodule_odb_by_path().)
Glen Choo Feb. 11, 2022, 3:13 a.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> add_submodule_odb() is a hack - it adds a submodule's odb as an
>> alternate, allowing the submodule's objects to be read via
>> the_repository. Its last caller is submodule_has_commits(), which calls
>> add_submodule_odb() to prepare for check_has_commit(). This used to be
>> necessary because check_has_commit() used the_repository's odb, but this
>> is longer true as of 13a2f620b2 (submodule: pass repo to
>> check_has_commit(), 2021-10-08).
>
> Yes!  I wonder if we can do this much earlier in the series (or even
> an independent clean-up that the rest of the series depends on) and
> have it graduate earlier?

This patch is totally conflict-free and dependency-free with regard to
the rest of the series, so there's almost no overhead to sending this as
an independent clean up.

And since you seem interested in seeing this graduate early, I'll send
it out independently :)
Glen Choo Feb. 11, 2022, 3:18 a.m. UTC | #4
Jonathan Tan <jonathantanmy@google.com> writes:

> Glen Choo <chooglen@google.com> writes:
>> add_submodule_odb() is a hack - it adds a submodule's odb as an
>> alternate, allowing the submodule's objects to be read via
>> the_repository. Its last caller is submodule_has_commits(), which calls
>> add_submodule_odb() to prepare for check_has_commit(). This used to be
>> necessary because check_has_commit() used the_repository's odb, but this
>> is longer true as of 13a2f620b2 (submodule: pass repo to
>> check_has_commit(), 2021-10-08).
>> 
>> Removing add_submodule_odb() reveals a bug in check_has_commit(), where
>> check_has_commit() will segfault if the submodule is missing (e.g. the
>> user has not init-ed the submodule). This happens because the
>> submodule's struct repository cannot be initialized, but
>> check_has_commit() tries to cleanup the uninitialized struct anyway.
>> This was masked by add_submodule_odb(), because add_submodule_odb()
>> fails when the submodule is missing, causing the caller to return early
>> and avoid calling check_has_commit().
>> 
>> Fix the bug and remove the call to add_submodule_odb(). Since
>> add_submodule_odb() has no more callers, remove it too.
>> 
>> Note that submodule odbs can still by added as alternates via
>> add_submodule_odb_by_path().
>> 
>> Signed-off-by: Glen Choo <chooglen@google.com>
>> ---
>> This bug only exists because we can't call repo_clear() twice on the
>> same struct repository. So instead of just fixing this site, an
>> alternative (and maybe better) fix would be to fix repo_clear(). If
>> others think that's a good idea, I'll do that instead.
>
> Reading the first paragraph of the commit message, I'm given the
> impression that this is the last site in which we attempt to add a
> submodule ODB as an alternate, but that is not true. This is indeed the
> last usage of add_submodule_odb(), but add_submodule_odb_by_path() still
> exists.
>
> I think the primary point of this commit is to fix a latent bug in
> check_has_commit(), and add_submodule_odb()'s role here is just hiding
> it. Its hacky behavior does not play a role.
>
> I would write the commit message like this:
>
>   submodule: fix latent check_has_commit() bug
>
>   check_has_commit() will attempt to clear a non-initialized struct
>   repository if initialization fails. This bug is masked by its only
>   caller, submodule_has_commits(), first calling add_submodule_odb().
>   The latter fails if the repo does not exist, making
>   submodule_has_commits() exit early and not invoke check_has_commit()
>   in a situation in which initialization would fail.
>
>   Fix this bug, and because calling add_submodule_odb() is no longer
>   necessary as of 13a2f620b2 (submodule: pass repo to
>   check_has_commit(), 2021-10-08), remove that call too.
>
>   This is the last caller of add_submodule_odb(), so remove that
>   function. (Adding submodule ODBs as alternates is still present in the
>   form of add_submodule_odb_by_path().)

Hm.. that is a good point, the commit message seems to promise more than
what it actually delivers. I'll take your suggestion, thanks!
Junio C Hamano Feb. 11, 2022, 5:19 p.m. UTC | #5
Jonathan Tan <jonathantanmy@google.com> writes:

> I would write the commit message like this:
>
>   submodule: fix latent check_has_commit() bug
>
>   check_has_commit() will attempt to clear a non-initialized struct
>   repository if initialization fails. This bug is masked by its only
>   caller, submodule_has_commits(), first calling add_submodule_odb().
>   The latter fails if the repo does not exist, making
>   submodule_has_commits() exit early and not invoke check_has_commit()
>   in a situation in which initialization would fail.
>
>   Fix this bug, and because calling add_submodule_odb() is no longer
>   necessary as of 13a2f620b2 (submodule: pass repo to
>   check_has_commit(), 2021-10-08), remove that call too.
>
>   This is the last caller of add_submodule_odb(), so remove that
>   function. (Adding submodule ODBs as alternates is still present in the
>   form of add_submodule_odb_by_path().)

Looks more clearly explained.  

We still end up calling add_to_alternate_memory(), so I take the
"let's have this early" back.
Glen Choo Feb. 14, 2022, 2:52 a.m. UTC | #6
Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> I would write the commit message like this:
>>
>>   submodule: fix latent check_has_commit() bug
>>
>>   check_has_commit() will attempt to clear a non-initialized struct
>>   repository if initialization fails. This bug is masked by its only
>>   caller, submodule_has_commits(), first calling add_submodule_odb().
>>   The latter fails if the repo does not exist, making
>>   submodule_has_commits() exit early and not invoke check_has_commit()
>>   in a situation in which initialization would fail.
>>
>>   Fix this bug, and because calling add_submodule_odb() is no longer
>>   necessary as of 13a2f620b2 (submodule: pass repo to
>>   check_has_commit(), 2021-10-08), remove that call too.
>>
>>   This is the last caller of add_submodule_odb(), so remove that
>>   function. (Adding submodule ODBs as alternates is still present in the
>>   form of add_submodule_odb_by_path().)
>
> Looks more clearly explained.  
>
> We still end up calling add_to_alternate_memory(), so I take the
> "let's have this early" back.

Ok, so I won't send this patch separately. Thanks Jonathan for making
things clearer :)
diff mbox series

Patch

diff --git a/submodule.c b/submodule.c
index 0c02bbc9c3..fdfddd3aac 100644
--- a/submodule.c
+++ b/submodule.c
@@ -168,26 +168,6 @@  void stage_updated_gitmodules(struct index_state *istate)
 
 static struct string_list added_submodule_odb_paths = STRING_LIST_INIT_NODUP;
 
-/* TODO: remove this function, use repo_submodule_init instead. */
-int add_submodule_odb(const char *path)
-{
-	struct strbuf objects_directory = STRBUF_INIT;
-	int ret = 0;
-
-	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
-	if (ret)
-		goto done;
-	if (!is_directory(objects_directory.buf)) {
-		ret = -1;
-		goto done;
-	}
-	string_list_insert(&added_submodule_odb_paths,
-			   strbuf_detach(&objects_directory, NULL));
-done:
-	strbuf_release(&objects_directory);
-	return ret;
-}
-
 void add_submodule_odb_by_path(const char *path)
 {
 	string_list_insert(&added_submodule_odb_paths, xstrdup(path));
@@ -972,7 +952,8 @@  static int check_has_commit(const struct object_id *oid, void *data)
 
 	if (repo_submodule_init(&subrepo, cb->repo, cb->path, cb->super_oid)) {
 		cb->result = 0;
-		goto cleanup;
+		/* subrepo failed to init, so don't clean it up. */
+		return 0;
 	}
 
 	type = oid_object_info(&subrepo, oid, NULL);
@@ -1003,18 +984,6 @@  static int submodule_has_commits(struct repository *r,
 {
 	struct has_commit_data has_commit = { r, 1, path, super_oid };
 
-	/*
-	 * Perform a cheap, but incorrect check for the existence of 'commits'.
-	 * This is done by adding the submodule's object store to the in-core
-	 * object store, and then querying for each commit's existence.  If we
-	 * do not have the commit object anywhere, there is no chance we have
-	 * it in the object store of the correct submodule and have it
-	 * reachable from a ref, so we can fail early without spawning rev-list
-	 * which is expensive.
-	 */
-	if (add_submodule_odb(path))
-		return 0;
-
 	oid_array_for_each_unique(commits, check_has_commit, &has_commit);
 
 	if (has_commit.result) {
diff --git a/submodule.h b/submodule.h
index 784ceffc0e..ca1f12b78b 100644
--- a/submodule.h
+++ b/submodule.h
@@ -103,12 +103,11 @@  int submodule_uses_gitfile(const char *path);
 int bad_to_remove_submodule(const char *path, unsigned flags);
 
 /*
- * Call add_submodule_odb() to add the submodule at the given path to a list.
- * When register_all_submodule_odb_as_alternates() is called, the object stores
- * of all submodules in that list will be added as alternates in
- * the_repository.
+ * Call add_submodule_odb_by_path() to add the submodule at the given
+ * path to a list. When register_all_submodule_odb_as_alternates() is
+ * called, the object stores of all submodules in that list will be
+ * added as alternates in the_repository.
  */
-int add_submodule_odb(const char *path);
 void add_submodule_odb_by_path(const char *path);
 int register_all_submodule_odb_as_alternates(void);