Message ID | 20211104234942.3473650-1-emilyshaffer@google.com (mailing list archive) |
---|---|
Headers | show |
Series | cache parent project's gitdir in submodules | expand |
On 11/4/2021 7:49 PM, Emily Shaffer wrote: > The only real change here is a slight semantics change to map from > <submodule gitdir> to <superproject common git dir>. In every case > *except* for when the superproject has a worktree, this changes nothing. > For the case when the superproject has a worktree, this means that now > submodules will refer to the general superproject common dir (e.g. no > worktree-specific refs or configs or whatnot). > > I *think* that because a submodule should exist in the context of the > common dir, not the worktree gitdir, that is ok. However, it does mean > it would be difficult to do something like sharing a config specific to > the worktree (the initial goal of this series). > > $ROOT/.git > $ROOT/.git/config.superproject <- shared by $ROOT/.git/modules/sub > $ROOT/.git/modules/sub <- points to $ROOT/.git > $ROOT/.git/worktrees/wt > $ROOT/.git/worktrees/wt/config.superproject <- contains a certain config-based pre-commit hook > > If the submodule only knows about the common dir, that is tough, because > the submodule would basically have to guess which worktree it's in from > its own path. There would be no way for '$WT/sub' to inherit > '$ROOT/.git/worktrees/wt/config.superproject'. > > That said... right now, we don't support submodules in worktrees very > well at all. A submodule in a worktree will get a brand new gitdir in > $ROOT/.git/worktrees/modules/ (and that brand new gitdir would point to > the super's common dir). So I think we can punt on this entire question > until we teach submodules and worktrees to play more gracefully together > (it's on my long list...), (I omit a portion that will be discussed later.) > Or, to summarize the long ramble above: "this is still kind of weird > with worktrees, but let's fix it later when we fix worktrees more > thoroughly". I'm concerned about punting here, because making a messy situation worse is unlikely to have a clean way out. Could we set up a design that works with superproject worktrees? You mentioned that submodules cannot have worktrees. At least, you said that 'absorbgitdirs' does not allow them. Could those subprojects still exist and be registered as submodules without using that command? What I'm trying to hint at is that if the submodules can't have worktrees, then maybe we could make their 'config.worktree' files be relative to the superproject worktrees. Then, these submodules could point to the commondir in their base config and _also_ to the worktree gitdir in their config.worktree. The issue that is immediately obvious here is that my definition is circular: we need to know the superproject worktree in order to discover the config.worktree which contains the information about the superproject worktree. > and at that time we can probably introduce a > pointer from $ROOT/.git/modules/sub/worktrees/wt/ to > $ROOT/.git/worktrees/wt/.... Your idea here appears to assume that if the superproject has worktrees, then the submodule is divided into worktrees in an exact correspondence. This would allow the submodule's config.worktree to point to the superproject's worktree (or possibly it could be inferred from the submodule's worktree relative to the submodule's commondir). This seems like an interesting way forward, but requires changing how 'git absorbgitdirs' works, along with changes to 'git worktree' or other submodule commands when the submodule first appears during a 'git checkout' in a worktree. I imagine there are a lot of "gotchas" here. It is worth spending some time imagining how to create this setup and/or enforce it as submodules are added in the lifecycle of a repository, if only to validate the config design presented by this series. Thanks, -Stolee
On Sun, Nov 07, 2021 at 08:24:43PM -0500, Derrick Stolee wrote: > > On 11/4/2021 7:49 PM, Emily Shaffer wrote: > > > The only real change here is a slight semantics change to map from > > <submodule gitdir> to <superproject common git dir>. In every case > > *except* for when the superproject has a worktree, this changes nothing. > > For the case when the superproject has a worktree, this means that now > > submodules will refer to the general superproject common dir (e.g. no > > worktree-specific refs or configs or whatnot). > > > > I *think* that because a submodule should exist in the context of the > > common dir, not the worktree gitdir, that is ok. However, it does mean > > it would be difficult to do something like sharing a config specific to > > the worktree (the initial goal of this series). > > > > $ROOT/.git > > $ROOT/.git/config.superproject <- shared by $ROOT/.git/modules/sub > > $ROOT/.git/modules/sub <- points to $ROOT/.git > > $ROOT/.git/worktrees/wt > > $ROOT/.git/worktrees/wt/config.superproject <- contains a certain config-based pre-commit hook > > > > If the submodule only knows about the common dir, that is tough, because > > the submodule would basically have to guess which worktree it's in from > > its own path. There would be no way for '$WT/sub' to inherit > > '$ROOT/.git/worktrees/wt/config.superproject'. > > > > That said... right now, we don't support submodules in worktrees very > > well at all. A submodule in a worktree will get a brand new gitdir in > > $ROOT/.git/worktrees/modules/ (and that brand new gitdir would point to > > the super's common dir). So I think we can punt on this entire question > > until we teach submodules and worktrees to play more gracefully together > > (it's on my long list...), > > (I omit a portion that will be discussed later.) > > > Or, to summarize the long ramble above: "this is still kind of weird > > with worktrees, but let's fix it later when we fix worktrees more > > thoroughly". > > I'm concerned about punting here, because making a messy situation worse > is unlikely to have a clean way out. Could we set up a design that works > with superproject worktrees? > > You mentioned that submodules cannot have worktrees. At least, you said > that 'absorbgitdirs' does not allow them. Could those subprojects still > exist and be registered as submodules without using that command? > > What I'm trying to hint at is that if the submodules can't have > worktrees, then maybe we could make their 'config.worktree' files be > relative to the superproject worktrees. Then, these submodules could > point to the commondir in their base config and _also_ to the worktree > gitdir in their config.worktree. > > The issue that is immediately obvious here is that my definition is > circular: we need to know the superproject worktree in order to discover > the config.worktree which contains the information about the superproject > worktree. > > > and at that time we can probably introduce a > > pointer from $ROOT/.git/modules/sub/worktrees/wt/ to > > $ROOT/.git/worktrees/wt/.... > > Your idea here appears to assume that if the superproject has worktrees, > then the submodule is divided into worktrees in an exact correspondence. > This would allow the submodule's config.worktree to point to the > superproject's worktree (or possibly it could be inferred from the > submodule's worktree relative to the submodule's commondir). > > This seems like an interesting way forward, but requires changing how > 'git absorbgitdirs' works, along with changes to 'git worktree' or other > submodule commands when the submodule first appears during a 'git checkout' > in a worktree. I imagine there are a lot of "gotchas" here. It is worth > spending some time imagining how to create this setup and/or enforce it > as submodules are added in the lifecycle of a repository, if only to > validate the config design presented by this series. Yeah, I think we may be overthinking it, especially with the concerns about common dir vs. gitdir. More specifically - I think we accidentally did the right thing in the previous iteration by using the gitdir :) I think we can probably put it pretty simply: submodule.superprojectGitDir should point from the most local gitdir of the submodule to the most local gitdir of the superproject. Luckily there are not so many permutations to worry about here. Super doesn't have worktrees, sub doesn't have worktrees: .git/ modules/ sub/ config <- contains a pointer to ".git/" config Super doesn't have worktrees, sub does have worktrees (and as you suggest above, right now this would have to be created carefully and manually, but later we probably want this to Just Work): .git/ modules/ sub/ config config.worktree <- contains a pointer to ".git/" worktrees/ sub-wt/ config <- contains a pointer to ".git/" config Super has worktrees, sub doesn't have worktrees: Actually, I think in the future this might not be possible, if we want to make `git worktree add --recurse-submodules` work gracefully (and I do want that). But in the interim, in practice it looks like this: .git/ modules/ sub/ config <- contains a pointer to ".git/" worktrees/ super-wt/ modules/ sub/ config <- contains a pointer to ".git/worktrees/super-wt" config config This case is pretty weird anyway, because in order for a submodule to be present in multiple worktrees of the superproject, the submodule itself needs to either have multiple worktrees or multiple repos. But on the flip side, today it's impossible for a single submodule gitdir to need to know about more superproject worktrees than the one it was initted/whatever into. Both super and sub have worktrees: And this won't exist until we have graceful support of `git worktree add --recurse-submodules` or with some manual effort, now. .git/ modules/ sub/ worktrees/ sub-wt/ config <- contains a pointer to ".git/worktrees/super-wt/" config config.worktree <- contains a pointer to ".git/" worktrees/ super-wt/ config config config.worktree I think this will give us access to both the worktree gitdir *and* the common gitdir: ~/git/.git/worktrees/git-second [GIT_DIR!]$ git rev-parse --git-common-dir /usr/local/google/home/emilyshaffer/git/.git So that means from any submodule, we can determine: - submodule's gitdir (from the .git link in the submodule wt) - submodule's common dir (from existing commands) - gitdir of superproject which submodule inhabits (from the config in the submodule's gitdir, or the submodule's config.worktree) - common dir of superproject (from existing commands + prior config) The upshot to me, then, means that we should be 1) making sure to get the path to the gitdir, not the common dir, of the superproject; and 2) using helpers to write to the worktree config, not to the local config, of the submodule. In other words, we want to avoid the following: .git/ modules/ sub/ worktrees/ wt/ config config <- "submodule.superprojectGitDir = ../../../.." as written by the worktree Will take a look at the rest of the comments too, but this sounds like a reasonable approach to me. - Emily > > Thanks, > -Stolee
On 11/8/2021 6:11 PM, Emily Shaffer wrote: > Yeah, I think we may be overthinking it, especially with the concerns > about common dir vs. gitdir. More specifically - I think we accidentally > did the right thing in the previous iteration by using the gitdir :) > > I think we can probably put it pretty simply: > submodule.superprojectGitDir should point from the most local gitdir of > the submodule to the most local gitdir of the superproject. > > Luckily there are not so many permutations to worry about here. > > Super doesn't have worktrees, sub doesn't have worktrees: > Super doesn't have worktrees, sub does have worktrees (and as you > suggest above, right now this would have to be created carefully and > manually, but later we probably want this to Just Work): > Super has worktrees, sub doesn't have worktrees: > Actually, I think in the future this might not be possible, if we want > to make `git worktree add --recurse-submodules` work gracefully (and I > do want that). But in the interim, in practice it looks like this: > Both super and sub have worktrees: > And this won't exist until we have graceful support of `git worktree add > --recurse-submodules` or with some manual effort, now. > I think this will give us access to both the worktree gitdir *and* the > common gitdir: > > ~/git/.git/worktrees/git-second [GIT_DIR!]$ git rev-parse --git-common-dir > /usr/local/google/home/emilyshaffer/git/.git > > So that means from any submodule, we can determine: > - submodule's gitdir (from the .git link in the submodule wt) > - submodule's common dir (from existing commands) > - gitdir of superproject which submodule inhabits (from the config in > the submodule's gitdir, or the submodule's config.worktree) > - common dir of superproject (from existing commands + prior config) > > The upshot to me, then, means that we should be 1) making sure to get > the path to the gitdir, not the common dir, of the superproject; and 2) > using helpers to write to the worktree config, not to the local config, > of the submodule. In other words, we want to avoid the following: > > .git/ > modules/ > sub/ > worktrees/ > wt/ > config > config <- "submodule.superprojectGitDir = ../../../.." as written by the worktree > > Will take a look at the rest of the comments too, but this sounds like a > reasonable approach to me. I agree, that this seems reasonable. Spelling it out carefully like this, along with your list of possibilities, clarifies where the data is located and how we can construct any information we need from that. You point out that there are cases that can be a bit tricky to get into with current features, but this config approach won't make that any worse right now. Thanks, -Stolee
For the original cover letter, see https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer%40google.com. The CI run is here: https://github.com/nasamuffin/git/actions/runs/1423475715 It shows a broken (pedantic,fedora) test, but those are known broken: https://lore.kernel.org/git/nycvar.QRO.7.76.6.2111040007170.56%40tvgsbejvaqbjf.bet [v4 cover letter] Since v3, a pretty major change: the semantics of submodule.superprojectGitDir has changed, to point from the submodule's gitdir to the superproject's gitdir (in v3 and earlier, we kept a path from the submodule's *worktree* to the superproject's gitdir instead). This cleans up some of the confusions about the behavior when a submodule worktree moves around in the superproject's tree, or in a future when we support submodules having multiple worktrees. I also tried to simplify the tests to use 'test-tool path-utils relative_path' everywhere - I think that makes them much more clear for a test reader, but if you're reviewing and it isn't obvious what we're testing for, please speak up. I think this is pretty mature and there was a lot of general agreement that the gitdir->gitdir association was the way to go, so please be brutal and look for nits, leaks, etc. this round ;) [/v4 cover letter] Since v4: The only real change here is a slight semantics change to map from <submodule gitdir> to <superproject common git dir>. In every case *except* for when the superproject has a worktree, this changes nothing. For the case when the superproject has a worktree, this means that now submodules will refer to the general superproject common dir (e.g. no worktree-specific refs or configs or whatnot). I *think* that because a submodule should exist in the context of the common dir, not the worktree gitdir, that is ok. However, it does mean it would be difficult to do something like sharing a config specific to the worktree (the initial goal of this series). $ROOT/.git $ROOT/.git/config.superproject <- shared by $ROOT/.git/modules/sub $ROOT/.git/modules/sub <- points to $ROOT/.git $ROOT/.git/worktrees/wt $ROOT/.git/worktrees/wt/config.superproject <- contains a certain config-based pre-commit hook If the submodule only knows about the common dir, that is tough, because the submodule would basically have to guess which worktree it's in from its own path. There would be no way for '$WT/sub' to inherit '$ROOT/.git/worktrees/wt/config.superproject'. That said... right now, we don't support submodules in worktrees very well at all. A submodule in a worktree will get a brand new gitdir in $ROOT/.git/worktrees/modules/ (and that brand new gitdir would point to the super's common dir). So I think we can punt on this entire question until we teach submodules and worktrees to play more gracefully together (it's on my long list...), and at that time we can probably introduce a pointer from $ROOT/.git/modules/sub/worktrees/wt/ to $ROOT/.git/worktrees/wt/.... Or, to summarize the long ramble above: "this is still kind of weird with worktrees, but let's fix it later when we fix worktrees more thoroughly". (More rambling about worktree weirdness here: https://lore.kernel.org/git/YYRaII8YWVxlBqsF%40google.com ) - Emily Emily Shaffer (4): t7400-submodule-basic: modernize inspect() helper introduce submodule.superprojectGitDir record submodule: record superproject gitdir during absorbgitdirs submodule: record superproject gitdir during 'update' Documentation/config/submodule.txt | 12 +++++++ builtin/submodule--helper.c | 4 +++ git-submodule.sh | 14 ++++++++ submodule.c | 9 +++++ t/t7400-submodule-basic.sh | 54 ++++++++++++++++-------------- t/t7406-submodule-update.sh | 12 +++++++ t/t7412-submodule-absorbgitdirs.sh | 50 +++++++++++++++++++++++++-- 7 files changed, 128 insertions(+), 27 deletions(-) Range-diff against v4: 1: 2ff151aaa2 = 1: 6ff10beaf2 t7400-submodule-basic: modernize inspect() helper 2: ed5479ad5d ! 2: d4f4627585 introduce submodule.superprojectGitDir record @@ Commit message superproject directory around on the filesystem without breaking the submodule's cache. And by using the path from gitdir to gitdir, we can move the submodule within the superproject's tree structure without - breaking the submodule's cache, too. + breaking the submodule's cache, too. Finally, by pointing at + "get_git_common_dir()" instead of "get_git_dir()", we ensure the link + will refer to the parent repo, not to a specific worktree. Since this hint value is only introduced during new submodule creation via `git submodule add`, though, there is more work to do to allow the @@ Documentation/config/submodule.txt: submodule.alternateErrorStrategy:: + +submodule.superprojectGitDir:: + The relative path from the submodule's gitdir to its superproject's -+ gitdir. When Git is run in a repository, it usually makes no difference -+ whether this repository is standalone or a submodule, but if this -+ configuration variable is present, additional behavior may be possible, -+ such as "git status" printing additional information about this -+ submodule's status with respect to its superproject. This config should -+ only be present in projects which are submodules, but is not guaranteed -+ to be present in every submodule, so only optional value-added behavior -+ should be linked to it. It is set automatically during -+ submodule creation. ++ common dir. When Git is run in a repository, it usually makes no ++ difference whether this repository is standalone or a submodule, but if ++ this configuration variable is present, additional behavior may be ++ possible, such as "git status" printing additional information about ++ this submodule's status with respect to its superproject. This config ++ should only be present in projects which are submodules, but is not ++ guaranteed to be present in every submodule, so only optional ++ value-added behavior should be linked to it. It is set automatically ++ during submodule creation. ## builtin/submodule--helper.c ## @@ builtin/submodule--helper.c: static int clone_submodule(struct module_clone_data *clone_data) @@ builtin/submodule--helper.c: static int clone_submodule(struct module_clone_data error_strategy); + git_config_set_in_file(p, "submodule.superprojectGitdir", -+ relative_path(absolute_path(get_git_dir()), ++ relative_path(absolute_path(get_git_common_dir()), + sm_gitdir, &sb)); + free(sm_alternate); @@ t/t7400-submodule-basic.sh: submodurl=$(pwd -P) + # Ensure that submodule.superprojectGitDir contains the path from the + # submodule's gitdir to the superproject's gitdir. + -+ super_abs_gitdir=$(git -C "$super_dir" rev-parse --absolute-git-dir) && -+ sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --absolute-git-dir) && ++ super_abs_gitdir=$(git -C "$super_dir" rev-parse --path-format=absolute --git-common-dir) && ++ sub_abs_gitdir=$(git -C "$sub_dir" rev-parse --path-format=absolute --git-common-dir) && + + [ "$(git -C "$sub_dir" config --get submodule.superprojectGitDir)" = \ + "$(test-tool path-utils relative_path "$super_abs_gitdir" \ 3: 60e6cf77c5 ! 3: 2dae297943 submodule: record superproject gitdir during absorbgitdirs @@ Commit message Already during 'git submodule add' we record a pointer to the superproject's gitdir. However, this doesn't help brand-new submodules created with 'git init' and later absorbed with 'git - submodule absorbgitdir'. Let's start adding that pointer during 'git - submodule absorbgitdir' too. + submodule absorbgitdirs'. Let's start adding that pointer during 'git + submodule absorbgitdirs' too. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> @@ submodule.c: static void relocate_single_git_dir_into_superproject(const char *p relocate_gitdir(path, real_old_git_dir, real_new_git_dir); + /* cache pointer to superproject's gitdir */ -+ /* NEEDSWORK: this may differ if experimental.worktreeConfig is enabled */ + strbuf_addf(&config_path, "%s/config", real_new_git_dir); + git_config_set_in_file(config_path.buf, "submodule.superprojectGitdir", -+ relative_path(absolute_path(get_git_dir()), ++ relative_path(absolute_path(get_git_common_dir()), + real_new_git_dir, &sb)); + + strbuf_release(&config_path); @@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorb the git dir' ' + test_cmp expect.2 actual.2 && + + # make sure the submodule cached the superproject gitdir correctly -+ submodule_gitdir="$(git -C sub1 rev-parse --absolute-git-dir)" && -+ superproject_gitdir="$(git rev-parse --absolute-git-dir)" && ++ submodule_gitdir="$(git -C sub1 rev-parse --path-format=absolute --git-common-dir)" && ++ superproject_gitdir="$(git rev-parse --path-format=absolute --git-common-dir)" && + + test-tool path-utils relative_path "$superproject_gitdir" \ + "$submodule_gitdir" >expect && @@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorb the git dir in a - test_cmp expect.2 actual.2 + test_cmp expect.2 actual.2 && + -+ sub1_gitdir="$(git -C sub1 rev-parse --absolute-git-dir)" && -+ sub1_nested_gitdir="$(git -C sub1/nested rev-parse --absolute-git-dir)" && ++ sub1_gitdir="$(git -C sub1 rev-parse --path-format=absolute --git-common-dir)" && ++ sub1_nested_gitdir="$(git -C sub1/nested rev-parse --path-format=absolute --git-common-dir)" && + + test-tool path-utils relative_path "$sub1_gitdir" "$sub1_nested_gitdir" \ + >expect && @@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorb the git dir in a ' test_expect_success 're-setup nested submodule' ' +@@ t/t7412-submodule-absorbgitdirs.sh: test_expect_success 'absorbing fails for a submodule with multiple worktrees' ' + test_i18ngrep "not supported" error + ' + ++test_expect_success 'absorbgitdirs works when called from a superproject worktree' ' ++ # set up a worktree of the superproject ++ git worktree add wt && ++ ( ++ cd wt && ++ ++ # create a new unembedded git dir ++ git init sub4 && ++ test_commit -C sub4 first && ++ git submodule add ./sub4 && ++ test_tick && ++ ++ # absorb the git dir ++ git submodule absorbgitdirs sub4 && ++ ++ # make sure the submodule cached the superproject gitdir correctly ++ submodule_gitdir="$(git -C sub4 rev-parse --path-format=absolute --git-common-dir)" && ++ superproject_gitdir="$(git rev-parse --path-format=absolute --git-common-dir)" && ++ ++ test-tool path-utils relative_path "$superproject_gitdir" \ ++ "$submodule_gitdir" >expect && ++ git -C sub4 config submodule.superprojectGitDir >actual && ++ ++ test_cmp expect actual ++ ) ++' ++ + test_done 4: df9879ff93 ! 4: f0412d6d34 submodule: record superproject gitdir during 'update' @@ git-submodule.sh: cmd_update() ;; esac -+ # Cache a pointer to the superproject's gitdir. This may have ++ # Cache a pointer to the superproject's common dir. This may have + # changed, unless it's a fresh clone. Writes it to worktree + # if applicable, otherwise to local. + if test -z "$just_cloned" @@ git-submodule.sh: cmd_update() + sm_gitdir="$(git -C "$sm_path" rev-parse --absolute-git-dir)" + relative_gitdir="$(git rev-parse --path-format=relative \ + --prefix "${sm_gitdir}" \ -+ --git-dir)" ++ --git-common-dir)" + + git -C "$sm_path" config --worktree \ + submodule.superprojectgitdir "$relative_gitdir" @@ t/t7406-submodule-update.sh: test_expect_success 'submodule update --quiet passe + git -C submodule config --unset submodule.superprojectGitdir && + git submodule update && + test-tool path-utils relative_path \ -+ "$(git rev-parse --absolute-git-dir)" \ -+ "$(git -C submodule rev-parse --absolute-git-dir)" >expect && ++ "$(git rev-parse --path-format=absolute --git-common-dir)" \ ++ "$(git -C submodule rev-parse --path-format=absolute --git-common-dir)" >expect && + git -C submodule config submodule.superprojectGitdir >actual && + test_cmp expect actual + )