diff mbox series

[v5,2/4] introduce submodule.superprojectGitDir record

Message ID 20211104234942.3473650-3-emilyshaffer@google.com (mailing list archive)
State New, archived
Headers show
Series cache parent project's gitdir in submodules | expand

Commit Message

Emily Shaffer Nov. 4, 2021, 11:49 p.m. UTC
Teach submodules a reference to their superproject's gitdir. This allows
us to A) know that we're running from a submodule, and B) have a
shortcut to the superproject's vitals, for example, configs.

By using a relative path instead of an absolute path, we can move the
superproject directory around on the filesystem without breaking the
submodule's cache. And by using the path from gitdir to gitdir, we can
move the submodule within the superproject's tree structure without
breaking the submodule's cache, too. Finally, by pointing at
"get_git_common_dir()" instead of "get_git_dir()", we ensure the link
will refer to the parent repo, not to a specific worktree.

Since this hint value is only introduced during new submodule creation
via `git submodule add`, though, there is more work to do to allow the
record to be created at other times.

If the new config is present, we can do some optional value-added
behavior, like letting "git status" print additional info about the
submodule's status in relation to its superproject, or like letting the
superproject and submodule share an additional config file separate from
either one's local config.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config/submodule.txt | 12 +++++++++++
 builtin/submodule--helper.c        |  4 ++++
 t/t7400-submodule-basic.sh         | 32 ++++++++++++++++++++----------
 3 files changed, 38 insertions(+), 10 deletions(-)

Comments

Junio C Hamano Nov. 5, 2021, 7:50 a.m. UTC | #1
Emily Shaffer <emilyshaffer@google.com> writes:

> By using a relative path instead of an absolute path, we can move the
> superproject directory around on the filesystem without breaking the
> submodule's cache. And by using the path from gitdir to gitdir, we can
> move the submodule within the superproject's tree structure without
> breaking the submodule's cache, too.

All of the above explains what the design choice is, and what
benefit the chosen design brings us.  But this last one ...

> Finally, by pointing at "get_git_common_dir()" instead of
> "get_git_dir()", we ensure the link will refer to the parent repo,
> not to a specific worktree.

... only says that we choose to point at the common one, not a
specific worktree (i.e. what behaviour was chosen by the design),
but it is left unclear what benefit it is trying to bring us.

Thinking aloud, imagine that there are two worktrees for the
superproject.  For simplicity, let's further imagine that these
worktrees have a clean check-out of the same commit, hence, these
two worktrees have the same commit from the same submodule checked
out at the same location relative to the project root.

The subdirectory in each of these two superproject worktrees that
checks out the submodule has .git file (as we "absorb" the gitdir in
the modern submodule layout) pointing at somewhere.  It probably is
OK if they point at the same place, but it might be more natural if
these two submodule checkouts are two worktrees for a single
submodule repository (this part I am not very clear, and that is why
I labeled the discussion "thinking aloud").

It seems to me that both design choices would have equally valid
arguments for them.  If both of these submodule worktrees point at
the "common" dir of the superproject, because the "common" one is
part of the primary worktree, which is harder to lose than secondary
worktrees, of the superproject, it is presumably harder to go stale
when "worktree rm" is done at superproject, which may be a plus.
But then from the "cached" pointer, each of these submodule
worktrees cannot tell which worktree of the superproject they are
checked out as a part of.  Of course we can go to the root level of
the submodule worktree and do the usual repository discovery to
learn where the root level of the superproject worktree it belongs
to, but it somehow feels that it defeats half the point of having
this "cache" information.

If we instead point at the git-dir, from there, it is just one level
of indirection to find out where the common dir of the superproject
is.

> If the new config is present, we can do some optional value-added
> behavior, like letting "git status" print additional info about the
> submodule's status in relation to its superproject, or like letting the
> superproject and submodule share an additional config file separate from
> either one's local config.

And one value-add that I would think of off the top of my head is to
show "we have commit X checked out, which is 4 commits on top of
what the superproject's index records for this submodule" when "git
status" is run in the submodule's worktree.  I do not know that is
an operation to specifically optimize for, but by choosing to
"cache" the common dir, not the one specific to the worktree of the
superporject our submodule checkout is a part of, the chosen design
seems to make it harder to implement?

Thanks.
Emily Shaffer Nov. 8, 2021, 11:16 p.m. UTC | #2
On Fri, Nov 05, 2021 at 12:50:22AM -0700, Junio C Hamano wrote:
> 
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > By using a relative path instead of an absolute path, we can move the
> > superproject directory around on the filesystem without breaking the
> > submodule's cache. And by using the path from gitdir to gitdir, we can
> > move the submodule within the superproject's tree structure without
> > breaking the submodule's cache, too.
> 
> All of the above explains what the design choice is, and what
> benefit the chosen design brings us.  But this last one ...
> 
> > Finally, by pointing at "get_git_common_dir()" instead of
> > "get_git_dir()", we ensure the link will refer to the parent repo,
> > not to a specific worktree.
> 
> ... only says that we choose to point at the common one, not a
> specific worktree (i.e. what behaviour was chosen by the design),
> but it is left unclear what benefit it is trying to bring us.
> 
> Thinking aloud, imagine that there are two worktrees for the
> superproject.  For simplicity, let's further imagine that these
> worktrees have a clean check-out of the same commit, hence, these
> two worktrees have the same commit from the same submodule checked
> out at the same location relative to the project root.
> 
> The subdirectory in each of these two superproject worktrees that
> checks out the submodule has .git file (as we "absorb" the gitdir in
> the modern submodule layout) pointing at somewhere.  It probably is
> OK if they point at the same place, but it might be more natural if
> these two submodule checkouts are two worktrees for a single
> submodule repository (this part I am not very clear, and that is why
> I labeled the discussion "thinking aloud").
> 
> It seems to me that both design choices would have equally valid
> arguments for them.  If both of these submodule worktrees point at
> the "common" dir of the superproject, because the "common" one is
> part of the primary worktree, which is harder to lose than secondary
> worktrees, of the superproject, it is presumably harder to go stale
> when "worktree rm" is done at superproject, which may be a plus.
> But then from the "cached" pointer, each of these submodule
> worktrees cannot tell which worktree of the superproject they are
> checked out as a part of.  Of course we can go to the root level of
> the submodule worktree and do the usual repository discovery to
> learn where the root level of the superproject worktree it belongs
> to, but it somehow feels that it defeats half the point of having
> this "cache" information.
> 
> If we instead point at the git-dir, from there, it is just one level
> of indirection to find out where the common dir of the superproject
> is.
> 
> > If the new config is present, we can do some optional value-added
> > behavior, like letting "git status" print additional info about the
> > submodule's status in relation to its superproject, or like letting the
> > superproject and submodule share an additional config file separate from
> > either one's local config.
> 
> And one value-add that I would think of off the top of my head is to
> show "we have commit X checked out, which is 4 commits on top of
> what the superproject's index records for this submodule" when "git
> status" is run in the submodule's worktree.  I do not know that is
> an operation to specifically optimize for, but by choosing to
> "cache" the common dir, not the one specific to the worktree of the
> superporject our submodule checkout is a part of, the chosen design
> seems to make it harder to implement?

Yeah, that is compelling. Sorry not to reply in depth, but I am
convinced.

 - Emily
diff mbox series

Patch

diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt
index ee454f8126..8cc57fe1c1 100644
--- a/Documentation/config/submodule.txt
+++ b/Documentation/config/submodule.txt
@@ -91,3 +91,15 @@  submodule.alternateErrorStrategy::
 	`ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore`
 	or `info`, and if there is an error with the computed alternate, the
 	clone proceeds as if no alternate was specified.
+
+submodule.superprojectGitDir::
+	The relative path from the submodule's gitdir to its superproject's
+	common dir. When Git is run in a repository, it usually makes no
+	difference whether this repository is standalone or a submodule, but if
+	this configuration variable is present, additional behavior may be
+	possible, such as "git status" printing additional information about
+	this submodule's status with respect to its superproject. This config
+	should only be present in projects which are submodules, but is not
+	guaranteed to be present in every submodule, so only optional
+	value-added behavior should be linked to it. It is set automatically
+	during submodule creation.
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 6298cbdd4e..f803812225 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1838,6 +1838,10 @@  static int clone_submodule(struct module_clone_data *clone_data)
 		git_config_set_in_file(p, "submodule.alternateErrorStrategy",
 				       error_strategy);
 
+	git_config_set_in_file(p, "submodule.superprojectGitdir",
+			       relative_path(absolute_path(get_git_common_dir()),
+					     sm_gitdir, &sb));
+
 	free(sm_alternate);
 	free(error_strategy);
 
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index d69a5c0032..3c20d42fbe 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -109,12 +109,24 @@  submodurl=$(pwd -P)
 
 inspect() {
 	sub_dir=$1 &&
+	super_dir=$2 &&
 
 	git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads &&
 	{ git -C "$sub_dir" symbolic-ref HEAD || :; } >head &&
 	git -C "$sub_dir" rev-parse HEAD >head-sha1 &&
 	git -C "$sub_dir" update-index --refresh &&
 	git -C "$sub_dir" diff-files --exit-code &&
+
+	# Ensure that submodule.superprojectGitDir contains the path from the
+	# submodule's gitdir to the superproject's gitdir.
+
+	super_abs_gitdir=$(git -C "$super_dir" rev-parse --path-format=absolute --git-common-dir) &&
+	sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --path-format=absolute --git-common-dir) &&
+
+	[ "$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" = \
+	  "$(test-tool path-utils relative_path "$super_abs_gitdir" \
+						"$sub_abs_gitdir")" ] &&
+
 	git -C "$sub_dir" clean -n -d -x >untracked
 }
 
@@ -138,7 +150,7 @@  test_expect_success 'submodule add' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod &&
+	inspect addtest/submod addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -240,7 +252,7 @@  test_expect_success 'submodule add --branch' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/submod-branch &&
+	inspect addtest/submod-branch addtest &&
 	test_cmp expect-heads heads &&
 	test_cmp expect-head head &&
 	test_must_be_empty untracked
@@ -256,7 +268,7 @@  test_expect_success 'submodule add with ./ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotsubmod/frotz &&
+	inspect addtest/dotsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -272,7 +284,7 @@  test_expect_success 'submodule add with /././ in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/dotslashdotsubmod/frotz &&
+	inspect addtest/dotslashdotsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -288,7 +300,7 @@  test_expect_success 'submodule add with // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/slashslashsubmod/frotz &&
+	inspect addtest/slashslashsubmod/frotz addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -304,7 +316,7 @@  test_expect_success 'submodule add with /.. in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod &&
+	inspect addtest/realsubmod addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -320,7 +332,7 @@  test_expect_success 'submodule add with ./, /.. and // in path' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod2 &&
+	inspect addtest/realsubmod2 addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -351,7 +363,7 @@  test_expect_success 'submodule add in subdirectory' '
 	) &&
 
 	rm -f heads head untracked &&
-	inspect addtest/realsubmod3 &&
+	inspect addtest/realsubmod3 addtest &&
 	test_cmp expect heads &&
 	test_cmp expect head &&
 	test_must_be_empty untracked
@@ -492,7 +504,7 @@  test_expect_success 'update should work when path is an empty dir' '
 	git submodule update -q >update.out &&
 	test_must_be_empty update.out &&
 
-	inspect init &&
+	inspect init . &&
 	test_cmp expect head-sha1
 '
 
@@ -551,7 +563,7 @@  test_expect_success 'update should checkout rev1' '
 	echo "$rev1" >expect &&
 
 	git submodule update init &&
-	inspect init &&
+	inspect init . &&
 
 	test_cmp expect head-sha1
 '