Message ID | 20210616004508.87186-3-emilyshaffer@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cache parent project's gitdir in submodules | expand |
Emily Shaffer <emilyshaffer@google.com> writes: > Teach submodules a reference to their superproject's gitdir. This allows > us to A) know that we're running from a submodule, and B) have a > shortcut to the superproject's vitals, for example, configs. > > By using a relative path instead of an absolute path, we can move the > superproject directory around on the filesystem without breaking the > submodule's cache. > > Since this cached value is only introduced during new submodule creation > via `git submodule add`, though, there is more work to do to allow the > cache to be created at other times. > > Signed-off-by: Emily Shaffer <emilyshaffer@google.com> > --- > Documentation/config/submodule.txt | 12 +++++++++ > builtin/submodule--helper.c | 4 +++ > t/t7400-submodule-basic.sh | 40 ++++++++++++++++-------------- > 3 files changed, 38 insertions(+), 18 deletions(-) > > diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt > index d7a63c8c12..7c459cc19e 100644 > --- a/Documentation/config/submodule.txt > +++ b/Documentation/config/submodule.txt > @@ -90,3 +90,15 @@ submodule.alternateErrorStrategy:: > `ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore` > or `info`, and if there is an error with the computed alternate, the > clone proceeds as if no alternate was specified. > + > +submodule.superprojectGitDir:: > + The relative path from the submodule's worktree to the superproject's > + gitdir. This config should only be present in projects which are > + submodules, but is not guaranteed to be present in every submodule. It > + is set automatically during submodule creation. > ++ > + In situations where more than one superproject references the same > + submodule worktree, the value of this config and the behavior of > + operations which use it are undefined. To reference a single project > + from multiple superprojects, it is better to create a worktree of the > + submodule for each superproject. You'd need to dedent the second paragraph that follows a lone '+' sign to typeset this correctly. The new paragraph suggests separate worktrees for the same submodule repository, but for that to work correctly, - "git clone [--recurse-submodules]" that clones the second superproject that shares the same submodule with a superproject that we already locally has to support a way for users to tell where to grab that existing submodule from and arrange a new worktree, instead of creating another instance of the submodule repository by cloning it afresh. - The "submodule.superprojectGitDir" needs to be set to per-worktree half of the repo-local configuration file. Because I usually do not pay much attention to the submodule part of the toolset, I may well be mistaken, but I suspect that the former does not exist yet. If I recall correctly, the latter was a NEEDSWORK item in the previous round of this patchset? As I said, I think it is OK for now to stop at declaring that you cannot simply do it without hinting something that may not fully work. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > As I said, I think it is OK for now to stop at declaring that you > cannot simply do it without hinting something that may not fully > work. I'll add the following to the tip of the topic for now. Documentation/config/submodule.txt | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git c/Documentation/config/submodule.txt w/Documentation/config/submodule.txt index 7c459cc19e..58ba63a75e 100644 --- c/Documentation/config/submodule.txt +++ w/Documentation/config/submodule.txt @@ -97,8 +97,5 @@ submodule.superprojectGitDir:: submodules, but is not guaranteed to be present in every submodule. It is set automatically during submodule creation. + - In situations where more than one superproject references the same - submodule worktree, the value of this config and the behavior of - operations which use it are undefined. To reference a single project - from multiple superprojects, it is better to create a worktree of the - submodule for each superproject. +Because of this configuration variable, it is forbidden to use the +same submodule worktree shared by multiple superprojects.
On Wed, Jun 16, 2021 at 01:40:36PM +0900, Junio C Hamano wrote: > > Emily Shaffer <emilyshaffer@google.com> writes: > > +submodule.superprojectGitDir:: > > + The relative path from the submodule's worktree to the superproject's > > + gitdir. This config should only be present in projects which are > > + submodules, but is not guaranteed to be present in every submodule. It > > + is set automatically during submodule creation. > > ++ > > + In situations where more than one superproject references the same > > + submodule worktree, the value of this config and the behavior of > > + operations which use it are undefined. To reference a single project > > + from multiple superprojects, it is better to create a worktree of the > > + submodule for each superproject. > > You'd need to dedent the second paragraph that follows a lone '+' > sign to typeset this correctly. Ok. > > The new paragraph suggests separate worktrees for the same submodule > repository, but for that to work correctly, > > - "git clone [--recurse-submodules]" that clones the second > superproject that shares the same submodule with a superproject > that we already locally has to support a way for users to tell > where to grab that existing submodule from and arrange a new > worktree, instead of creating another instance of the submodule > repository by cloning it afresh. > > - The "submodule.superprojectGitDir" needs to be set to > per-worktree half of the repo-local configuration file. > > Because I usually do not pay much attention to the submodule part of > the toolset, I may well be mistaken, but I suspect that the former > does not exist yet. If I recall correctly, the latter was a NEEDSWORK > item in the previous round of this patchset? > > As I said, I think it is OK for now to stop at declaring that you > cannot simply do it without hinting something that may not fully > work. Yeah, that is all correct. Ok, I will drop the broken suggestion. - Emily
On Wed, Jun 16, 2021 at 01:43:36PM +0900, Junio C Hamano wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > > As I said, I think it is OK for now to stop at declaring that you > > cannot simply do it without hinting something that may not fully > > work. > > I'll add the following to the tip of the topic for now. Just saw this - yes, it looks fine to me. I'll squash that locally in case anybody has more comments and wants a reroll. > > > Documentation/config/submodule.txt | 7 ++----- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git c/Documentation/config/submodule.txt w/Documentation/config/submodule.txt > index 7c459cc19e..58ba63a75e 100644 > --- c/Documentation/config/submodule.txt > +++ w/Documentation/config/submodule.txt > @@ -97,8 +97,5 @@ submodule.superprojectGitDir:: > submodules, but is not guaranteed to be present in every submodule. It > is set automatically during submodule creation. > + > - In situations where more than one superproject references the same > - submodule worktree, the value of this config and the behavior of > - operations which use it are undefined. To reference a single project > - from multiple superprojects, it is better to create a worktree of the > - submodule for each superproject. > +Because of this configuration variable, it is forbidden to use the > +same submodule worktree shared by multiple superprojects.
> Teach submodules a reference to their superproject's gitdir. This allows > us to A) know that we're running from a submodule, and B) have a > shortcut to the superproject's vitals, for example, configs. The first sentence is probably better "Introduce a new config variable storing a submodule's reference to its superproject's gitdir, and teach 'git submodule add' to write it". Also, I think there should be more detail about the planned use both here in the commit message and in the config documentation. Is the plan just to use it for best-effort explanatory messages? (Using it as a true cache is probably too performance-intensive, I would think, since in its absence, we have no idea whether the repo is a submodule and would always have to search to the root of the filesystem.) If it is just for best-effort explanatory messages, maybe write: If present, commands like "git status" in this submodule may print additional information about this submodule's status with respect to its superproject. This would further reinforce that this variable being missing is OK. > diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt > index d7a63c8c12..7c459cc19e 100644 > --- a/Documentation/config/submodule.txt > +++ b/Documentation/config/submodule.txt > @@ -90,3 +90,15 @@ submodule.alternateErrorStrategy:: > `ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore` > or `info`, and if there is an error with the computed alternate, the > clone proceeds as if no alternate was specified. > + > +submodule.superprojectGitDir:: > + The relative path from the submodule's worktree to the superproject's > + gitdir. This config should only be present in projects which are > + submodules, but is not guaranteed to be present in every submodule. It > + is set automatically during submodule creation. > ++ > + In situations where more than one superproject references the same > + submodule worktree, the value of this config and the behavior of > + operations which use it are undefined. To reference a single project > + from multiple superprojects, it is better to create a worktree of the > + submodule for each superproject. So my suggestion would be: The relative path from this repository's worktree to its superproject's gitdir. When Git is run in a repository, it usually makes no difference whether this repository is standalone or a submodule, but if this configuration variable is present, commands like "git status" in this submodule may print additional information about this submodule's status with respect to its superproject. (I believe Junio has commented on the second paragraph - I don't have additional comments on that.)
On Tue, Jul 27, 2021 at 10:46:50AM -0700, Jonathan Tan wrote: > > > Teach submodules a reference to their superproject's gitdir. This allows > > us to A) know that we're running from a submodule, and B) have a > > shortcut to the superproject's vitals, for example, configs. > > The first sentence is probably better "Introduce a new config variable > storing a submodule's reference to its superproject's gitdir, and teach > 'git submodule add' to write it". > > Also, I think there should be more detail about the planned use both > here in the commit message and in the config documentation. Is the plan > just to use it for best-effort explanatory messages? (Using it as a true > cache is probably too performance-intensive, I would think, since in its > absence, we have no idea whether the repo is a submodule and would > always have to search to the root of the filesystem.) If it is just for > best-effort explanatory messages, maybe write: > > If present, commands like "git status" in this submodule may print > additional information about this submodule's status with respect to > its superproject. > > This would further reinforce that this variable being missing is OK. Ok, I'll expand the commit message. The first use case for this extra pointer is for a shared config between superproject and submodule, which I've sent a series for already; I'll mention that in the commit message too. > > > diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt > > index d7a63c8c12..7c459cc19e 100644 > > --- a/Documentation/config/submodule.txt > > +++ b/Documentation/config/submodule.txt > > @@ -90,3 +90,15 @@ submodule.alternateErrorStrategy:: > > `ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore` > > or `info`, and if there is an error with the computed alternate, the > > clone proceeds as if no alternate was specified. > > + > > +submodule.superprojectGitDir:: > > + The relative path from the submodule's worktree to the superproject's > > + gitdir. This config should only be present in projects which are > > + submodules, but is not guaranteed to be present in every submodule. It > > + is set automatically during submodule creation. > > ++ > > + In situations where more than one superproject references the same > > + submodule worktree, the value of this config and the behavior of > > + operations which use it are undefined. To reference a single project > > + from multiple superprojects, it is better to create a worktree of the > > + submodule for each superproject. > > So my suggestion would be: > > The relative path from this repository's worktree to its > superproject's gitdir. When Git is run in a repository, it usually > makes no difference whether this repository is standalone or a > submodule, but if this configuration variable is present, commands > like "git status" in this submodule may print additional information > about this submodule's status with respect to its superproject. > > (I believe Junio has commented on the second paragraph - I don't have > additional comments on that.) The spirit of this suggestion seems to be "describe a possible side effect of this config", so I'll do that, although the wording may not be exactly the same. Thanks. - Emily
On Tue, Jun 15 2021, Emily Shaffer wrote: > Teach submodules a reference to their superproject's gitdir. This allows > us to A) know that we're running from a submodule, and B) have a > shortcut to the superproject's vitals, for example, configs. > > By using a relative path instead of an absolute path, we can move the > superproject directory around on the filesystem without breaking the > submodule's cache. > > Since this cached value is only introduced during new submodule creation > via `git submodule add`, though, there is more work to do to allow the > cache to be created at other times. > > Signed-off-by: Emily Shaffer <emilyshaffer@google.com> > --- > Documentation/config/submodule.txt | 12 +++++++++ > builtin/submodule--helper.c | 4 +++ > t/t7400-submodule-basic.sh | 40 ++++++++++++++++-------------- > 3 files changed, 38 insertions(+), 18 deletions(-) > > diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt > index d7a63c8c12..7c459cc19e 100644 > --- a/Documentation/config/submodule.txt > +++ b/Documentation/config/submodule.txt > @@ -90,3 +90,15 @@ submodule.alternateErrorStrategy:: > `ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore` > or `info`, and if there is an error with the computed alternate, the > clone proceeds as if no alternate was specified. > + > +submodule.superprojectGitDir:: > + The relative path from the submodule's worktree to the superproject's > + gitdir. This config should only be present in projects which are > + submodules, but is not guaranteed to be present in every submodule. It > + is set automatically during submodule creation. > ++ > + In situations where more than one superproject references the same > + submodule worktree, the value of this config and the behavior of > + operations which use it are undefined. To reference a single project > + from multiple superprojects, it is better to create a worktree of the > + submodule for each superproject. > diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c > index d55f6262e9..d60fcd2c7d 100644 > --- a/builtin/submodule--helper.c > +++ b/builtin/submodule--helper.c > @@ -1910,6 +1910,10 @@ static int module_clone(int argc, const char **argv, const char *prefix) > git_config_set_in_file(p, "submodule.alternateErrorStrategy", > error_strategy); > > + git_config_set_in_file(p, "submodule.superprojectGitdir", > + relative_path(absolute_path(get_git_dir()), > + path, &sb)); > + > free(sm_alternate); > free(error_strategy); > Am I correct that what this series really does is avoid needing to: 1. Run the equivalent of $(git rev-parse --absolute-git-dir), let's call the result of that $X. 2. Feed that to the equvialent of $(git -C $X/../ rev-parse --absolute-git-dir) 3. Check if the relationship between the two is really that of a submodule, i.e. running "git submodule status", check if $X contains ".git/modules/" etc. I see an earlier iteration of this series had such a shell-out, and that this is the "cache": https://lore.kernel.org/git/20210423001539.4059524-5-emilyshaffer@google.com/; and your v1 cover letter seems to support the above summary: https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer@google.com/ I think it's fine to fine to add such a cache in principle if it's needed. But I'm a bit iffy on a series that's structured in a way as to not start by asserting that we should have given semantics without the cache, and then adds the cache later as an optional optimization. Particularly as the whole submodule business is moving to C, so isn't this whole caching business something we can avoid doing, and instead just call a C function? The optimization part of this was not calling "git rev-parse" on every submodule invocation wasn't it, not avoiding a few syscalls that deal with the FS. Your initial RFC had modifications to git-submodule.sh, in the interim it seems that's been moved sufficiently to C that we're modifying just the submodule.c here. I'm not very familiar with setup_git_directory_gently_1(), discover_git_directory() etc, but wherever you are in a git worktree we'll chdir() to the top-level when running built-ins. So that gives us step #1 of the above. And #2 would be adding "/../" to the end of that path and running those functions again? Perhaps with a #3 for "is there a submodule relationship?". Even if we still have some bits in shellscript etc, couldn't we then setenv() that in some GIT_* variable, e.g. GIT_SUPERPROJECT_DIR? Or is the problem really that this isn't a cache, because we can't say with absolute certainty that there is such a gitdir/submodule relationship, except at the time of running "git submodule" in the former for some reason?
diff --git a/Documentation/config/submodule.txt b/Documentation/config/submodule.txt index d7a63c8c12..7c459cc19e 100644 --- a/Documentation/config/submodule.txt +++ b/Documentation/config/submodule.txt @@ -90,3 +90,15 @@ submodule.alternateErrorStrategy:: `ignore`, `info`, `die`. Default is `die`. Note that if set to `ignore` or `info`, and if there is an error with the computed alternate, the clone proceeds as if no alternate was specified. + +submodule.superprojectGitDir:: + The relative path from the submodule's worktree to the superproject's + gitdir. This config should only be present in projects which are + submodules, but is not guaranteed to be present in every submodule. It + is set automatically during submodule creation. ++ + In situations where more than one superproject references the same + submodule worktree, the value of this config and the behavior of + operations which use it are undefined. To reference a single project + from multiple superprojects, it is better to create a worktree of the + submodule for each superproject. diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index d55f6262e9..d60fcd2c7d 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -1910,6 +1910,10 @@ static int module_clone(int argc, const char **argv, const char *prefix) git_config_set_in_file(p, "submodule.alternateErrorStrategy", error_strategy); + git_config_set_in_file(p, "submodule.superprojectGitdir", + relative_path(absolute_path(get_git_dir()), + path, &sb)); + free(sm_alternate); free(error_strategy); diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh index f5dc051a6e..e45f42588f 100755 --- a/t/t7400-submodule-basic.sh +++ b/t/t7400-submodule-basic.sh @@ -108,14 +108,18 @@ test_expect_success 'setup - repository to add submodules to' ' submodurl=$(pwd -P) inspect() { - dir=$1 && - - git -C "$dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads && - { git -C "$dir" symbolic-ref HEAD || :; } >head && - git -C "$dir" rev-parse HEAD >head-sha1 && - git -C "$dir" update-index --refresh && - git -C "$dir" diff-files --exit-code && - git -C "$dir" clean -n -d -x >untracked + sub_dir=$1 && + super_dir=$2 && + + git -C "$sub_dir" for-each-ref --format='%(refname)' 'refs/heads/*' >heads && + { git -C "$sub_dir" symbolic-ref HEAD || :; } >head && + git -C "$sub_dir" rev-parse HEAD >head-sha1 && + git -C "$sub_dir" update-index --refresh && + git -C "$sub_dir" diff-files --exit-code && + cached_super_dir="$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" && + [ "$(git -C "$super_dir" rev-parse --absolute-git-dir)" \ + -ef "$sub_dir/$cached_super_dir" ] && + git -C "$sub_dir" clean -n -d -x >untracked } @@ -139,7 +143,7 @@ test_expect_success 'submodule add' ' ) && rm -f heads head untracked && - inspect addtest/submod && + inspect addtest/submod addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -230,7 +234,7 @@ test_expect_success 'submodule add --branch' ' ) && rm -f heads head untracked && - inspect addtest/submod-branch && + inspect addtest/submod-branch addtest && test_cmp expect-heads heads && test_cmp expect-head head && test_must_be_empty untracked @@ -246,7 +250,7 @@ test_expect_success 'submodule add with ./ in path' ' ) && rm -f heads head untracked && - inspect addtest/dotsubmod/frotz && + inspect addtest/dotsubmod/frotz addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -262,7 +266,7 @@ test_expect_success 'submodule add with /././ in path' ' ) && rm -f heads head untracked && - inspect addtest/dotslashdotsubmod/frotz && + inspect addtest/dotslashdotsubmod/frotz addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -278,7 +282,7 @@ test_expect_success 'submodule add with // in path' ' ) && rm -f heads head untracked && - inspect addtest/slashslashsubmod/frotz && + inspect addtest/slashslashsubmod/frotz addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -294,7 +298,7 @@ test_expect_success 'submodule add with /.. in path' ' ) && rm -f heads head untracked && - inspect addtest/realsubmod && + inspect addtest/realsubmod addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -310,7 +314,7 @@ test_expect_success 'submodule add with ./, /.. and // in path' ' ) && rm -f heads head untracked && - inspect addtest/realsubmod2 && + inspect addtest/realsubmod2 addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -341,7 +345,7 @@ test_expect_success 'submodule add in subdirectory' ' ) && rm -f heads head untracked && - inspect addtest/realsubmod3 && + inspect addtest/realsubmod3 addtest && test_cmp expect heads && test_cmp expect head && test_must_be_empty untracked @@ -482,7 +486,7 @@ test_expect_success 'update should work when path is an empty dir' ' git submodule update -q >update.out && test_must_be_empty update.out && - inspect init && + inspect init . && test_cmp expect head-sha1 ' @@ -541,7 +545,7 @@ test_expect_success 'update should checkout rev1' ' echo "$rev1" >expect && git submodule update init && - inspect init && + inspect init . && test_cmp expect head-sha1 '
Teach submodules a reference to their superproject's gitdir. This allows us to A) know that we're running from a submodule, and B) have a shortcut to the superproject's vitals, for example, configs. By using a relative path instead of an absolute path, we can move the superproject directory around on the filesystem without breaking the submodule's cache. Since this cached value is only introduced during new submodule creation via `git submodule add`, though, there is more work to do to allow the cache to be created at other times. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> --- Documentation/config/submodule.txt | 12 +++++++++ builtin/submodule--helper.c | 4 +++ t/t7400-submodule-basic.sh | 40 ++++++++++++++++-------------- 3 files changed, 38 insertions(+), 18 deletions(-)