Message ID | 20211104234942.3473650-3-emilyshaffer@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cache parent project's gitdir in submodules | expand |
Emily Shaffer <emilyshaffer@google.com> writes: > By using a relative path instead of an absolute path, we can move the > superproject directory around on the filesystem without breaking the > submodule's cache. And by using the path from gitdir to gitdir, we can > move the submodule within the superproject's tree structure without > breaking the submodule's cache, too. All of the above explains what the design choice is, and what benefit the chosen design brings us. But this last one ... > Finally, by pointing at "get_git_common_dir()" instead of > "get_git_dir()", we ensure the link will refer to the parent repo, > not to a specific worktree. ... only says that we choose to point at the common one, not a specific worktree (i.e. what behaviour was chosen by the design), but it is left unclear what benefit it is trying to bring us. Thinking aloud, imagine that there are two worktrees for the superproject. For simplicity, let's further imagine that these worktrees have a clean check-out of the same commit, hence, these two worktrees have the same commit from the same submodule checked out at the same location relative to the project root. The subdirectory in each of these two superproject worktrees that checks out the submodule has .git file (as we "absorb" the gitdir in the modern submodule layout) pointing at somewhere. It probably is OK if they point at the same place, but it might be more natural if these two submodule checkouts are two worktrees for a single submodule repository (this part I am not very clear, and that is why I labeled the discussion "thinking aloud"). It seems to me that both design choices would have equally valid arguments for them. If both of these submodule worktrees point at the "common" dir of the superproject, because the "common" one is part of the primary worktree, which is harder to lose than secondary worktrees, of the superproject, it is presumably harder to go stale when "worktree rm" is done at superproject, which may be a plus. But then from the "cached" pointer, each of these submodule worktrees cannot tell which worktree of the superproject they are checked out as a part of. Of course we can go to the root level of the submodule worktree and do the usual repository discovery to learn where the root level of the superproject worktree it belongs to, but it somehow feels that it defeats half the point of having this "cache" information. If we instead point at the git-dir, from there, it is just one level of indirection to find out where the common dir of the superproject is. > If the new config is present, we can do some optional value-added > behavior, like letting "git status" print additional info about the > submodule's status in relation to its superproject, or like letting the > superproject and submodule share an additional config file separate from > either one's local config. And one value-add that I would think of off the top of my head is to show "we have commit X checked out, which is 4 commits on top of what the superproject's index records for this submodule" when "git status" is run in the submodule's worktree. I do not know that is an operation to specifically optimize for, but by choosing to "cache" the common dir, not the one specific to the worktree of the superporject our submodule checkout is a part of, the chosen design seems to make it harder to implement? Thanks.
On Fri, Nov 05, 2021 at 12:50:22AM -0700, Junio C Hamano wrote: > > Emily Shaffer <emilyshaffer@google.com> writes: > > > By using a relative path instead of an absolute path, we can move the > > superproject directory around on the filesystem without breaking the > > submodule's cache. And by using the path from gitdir to gitdir, we can > > move the submodule within the superproject's tree structure without > > breaking the submodule's cache, too. > > All of the above explains what the design choice is, and what > benefit the chosen design brings us. But this last one ... > > > Finally, by pointing at "get_git_common_dir()" instead of > > "get_git_dir()", we ensure the link will refer to the parent repo, > > not to a specific worktree. > > ... only says that we choose to point at the common one, not a > specific worktree (i.e. what behaviour was chosen by the design), > but it is left unclear what benefit it is trying to bring us. > > Thinking aloud, imagine that there are two worktrees for the > superproject. For simplicity, let's further imagine that these > worktrees have a clean check-out of the same commit, hence, these > two worktrees have the same commit from the same submodule checked > out at the same location relative to the project root. > > The subdirectory in each of these two superproject worktrees that > checks out the submodule has .git file (as we "absorb" the gitdir in > the modern submodule layout) pointing at somewhere. It probably is > OK if they point at the same place, but it might be more natural if > these two submodule checkouts are two worktrees for a single > submodule repository (this part I am not very clear, and that is why > I labeled the discussion "thinking aloud"). > > It seems to me that both design choices would have equally valid > arguments for them. If both of these submodule worktrees point at > the "common" dir of the superproject, because the "common" one is > part of the primary worktree, which is harder to lose than secondary > worktrees, of the superproject, it is presumably harder to go stale > when "worktree rm" is done at superproject, which may be a plus. > But then from the "cached" pointer, each of these submodule > worktrees cannot tell which worktree of the superproject they are > checked out as a part of. Of course we can go to the root level of > the submodule worktree and do the usual repository discovery to > learn where the root level of the superproject worktree it belongs > to, but it somehow feels that it defeats half the point of having > this "cache" information. > > If we instead point at the git-dir, from there, it is just one level > of indirection to find out where the common dir of the superproject > is. > > > If the new config is present, we can do some optional value-added > > behavior, like letting "git status" print additional info about the > > submodule's status in relation to its superproject, or like letting the > > superproject and submodule share an additional config file separate from > > either one's local config. > > And one value-add that I would think of off the top of my head is to > show "we have commit X checked out, which is 4 commits on top of > what the superproject's index records for this submodule" when "git > status" is run in the submodule's worktree. I do not know that is > an operation to specifically optimize for, but by choosing to > "cache" the common dir, not the one specific to the worktree of the > superporject our submodule checkout is a part of, the chosen design > seems to make it harder to implement? Yeah, that is compelling. Sorry not to reply in depth, but I am convinced. - Emily
diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt index ee454f8126..8cc57fe1c1 100644 --- a/Documentation/config/submodule.txt +++ b/Documentation/config/submodule.txt @@ -91,3 +91,15 @@ submodule.alternateErrorStrategy:: `ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore` or `info`, and if there is an error with the computed alternate, the clone proceeds as if no alternate was specified. + +submodule.superprojectGitDir:: + The relative path from the submodule's gitdir to its superproject's + common dir. When Git is run in a repository, it usually makes no + difference whether this repository is standalone or a submodule, but if + this configuration variable is present, additional behavior may be + possible, such as "git status" printing additional information about + this submodule's status with respect to its superproject. This config + should only be present in projects which are submodules, but is not + guaranteed to be present in every submodule, so only optional + value-added behavior should be linked to it. It is set automatically + during submodule creation. diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index 6298cbdd4e..f803812225 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -1838,6 +1838,10 @@ static int clone_submodule(struct module_clone_data *clone_data) git_config_set_in_file(p, "submodule.alternateErrorStrategy", error_strategy); + git_config_set_in_file(p, "submodule.superprojectGitdir", + relative_path(absolute_path(get_git_common_dir()), + sm_gitdir, &sb)); + free(sm_alternate); free(error_strategy); diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh index d69a5c0032..3c20d42fbe 100755 --- a/t/t7400-submodule-basic.sh +++ b/t/t7400-submodule-basic.sh @@ -109,12 +109,24 @@ submodurl=$(pwd -P) inspect() { sub_dir=$1 && + super_dir=$2 && git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads && { git -C "$sub_dir" symbolic-ref HEAD || :; } >head && git -C "$sub_dir" rev-parse HEAD >head-sha1 && git -C "$sub_dir" update-index --refresh && git -C "$sub_dir" diff-files --exit-code && + + # Ensure that submodule.superprojectGitDir contains the path from the + # submodule's gitdir to the superproject's gitdir. + + super_abs_gitdir=$(git -C "$super_dir" rev-parse --path-format=absolute --git-common-dir) && + sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --path-format=absolute --git-common-dir) && + + [ "$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" = \ + "$(test-tool path-utils relative_path "$super_abs_gitdir" \ + "$sub_abs_gitdir")" ] && + git -C "$sub_dir" clean -n -d -x >untracked } @@ -138,7 +150,7 @@ test_expect_success 'submodule add' ' ) && rm -f heads head untracked && - inspect addtest/submod && + inspect addtest/submod addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -240,7 +252,7 @@ test_expect_success 'submodule add --branch' ' ) && rm -f heads head untracked && - inspect addtest/submod-branch && + inspect addtest/submod-branch addtest && test_cmp expect-heads heads && test_cmp expect-head head && test_must_be_empty untracked @@ -256,7 +268,7 @@ test_expect_success 'submodule add with ./ in path' ' ) && rm -f heads head untracked && - inspect addtest/dotsubmod/frotz && + inspect addtest/dotsubmod/frotz addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -272,7 +284,7 @@ test_expect_success 'submodule add with /././ in path' ' ) && rm -f heads head untracked && - inspect addtest/dotslashdotsubmod/frotz && + inspect addtest/dotslashdotsubmod/frotz addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -288,7 +300,7 @@ test_expect_success 'submodule add with // in path' ' ) && rm -f heads head untracked && - inspect addtest/slashslashsubmod/frotz && + inspect addtest/slashslashsubmod/frotz addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -304,7 +316,7 @@ test_expect_success 'submodule add with /.. in path' ' ) && rm -f heads head untracked && - inspect addtest/realsubmod && + inspect addtest/realsubmod addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -320,7 +332,7 @@ test_expect_success 'submodule add with ./, /.. and // in path' ' ) && rm -f heads head untracked && - inspect addtest/realsubmod2 && + inspect addtest/realsubmod2 addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -351,7 +363,7 @@ test_expect_success 'submodule add in subdirectory' ' ) && rm -f heads head untracked && - inspect addtest/realsubmod3 && + inspect addtest/realsubmod3 addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -492,7 +504,7 @@ test_expect_success 'update should work when path is an empty dir' ' git submodule update -q >update.out && test_must_be_empty update.out && - inspect init && + inspect init . && test_cmp expect head-sha1 ' @@ -551,7 +563,7 @@ test_expect_success 'update should checkout rev1' ' echo "$rev1" >expect && git submodule update init && - inspect init && + inspect init . && test_cmp expect head-sha1 '
Teach submodules a reference to their superproject's gitdir. This allows us to A) know that we're running from a submodule, and B) have a shortcut to the superproject's vitals, for example, configs. By using a relative path instead of an absolute path, we can move the superproject directory around on the filesystem without breaking the submodule's cache. And by using the path from gitdir to gitdir, we can move the submodule within the superproject's tree structure without breaking the submodule's cache, too. Finally, by pointing at "get_git_common_dir()" instead of "get_git_dir()", we ensure the link will refer to the parent repo, not to a specific worktree. Since this hint value is only introduced during new submodule creation via `git submodule add`, though, there is more work to do to allow the record to be created at other times. If the new config is present, we can do some optional value-added behavior, like letting "git status" print additional info about the submodule's status in relation to its superproject, or like letting the superproject and submodule share an additional config file separate from either one's local config. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Helped-by: Junio C Hamano <gitster@pobox.com> --- Documentation/config/submodule.txt | 12 +++++++++++ builtin/submodule--helper.c | 4 ++++ t/t7400-submodule-basic.sh | 32 ++++++++++++++++++++---------- 3 files changed, 38 insertions(+), 10 deletions(-)