Message ID | 20210611225428.1208973-1-emilyshaffer@google.com (mailing list archive) |
---|---|
Headers | show |
Series | cache parent project's gitdir in submodules | expand |
On Fri, Jun 11, 2021 at 3:54 PM Emily Shaffer <emilyshaffer@google.com> wrote: > > It's necessary for a superproject to know which submodules it contains. > However, historically submodules do not know they are anything but a > normal single-repo Git project (or a superproject, in nested-submodules > cases). This decision does help prevent us from having to support > divergent behaviors in submodule projects vs. superprojects, which makes > sure Git is (somewhat) less confusing for the reader, and helps simplify > our code. > > One could imagine, though, some conveniences we could gain from > submodules learning added behavior (though not necessarily *different* > behavior) to provide more context about the state of the project as a > whole, and to make large submodule-based projects easier to work with. > One example is a series[1] I sent some time ago, adding a config to be > shared between the superproject and all its submodules. The RFC[2] I > sent around the same time mentions a few other examples, such as "git > status" run in the submodule noticing whether the superproject has a > commit referencing the submodule's current HEAD. > > It's expensive and non-definitive to try and guess whether or not the > current repo is a submodule. submodule.c:get_superproject_working_tree() > does so by essentially running 'git -C .. ls-files -- <own-path>', > invoking an additional process. get_superproject_working_tree() is not > called often, so that's mostly fine. However, [1] attempted to include > an additional config located in the superproject's gitdir by running > 'git -C .. rev-parse --git-dir' during startup - a little expensive in > the best case, because it's an extra process, but extremely expensive in > the case when the current repo is *not* a submodule, because we hunt all > the way up the filesystem looking for a '.git'. Adding that cost to > every startup is infeasible. > It also adds cost for no benefit to the normal case where it's not a submodule. The cost added for non-submodules ought to be as near-zero as possible. > To that end, in this series I propose caching a path to the > superproject's gitdir - by having the superproject write that relative > path to the submodule's config on creation or update. The goal here is > *not* to say "If I am a submodule, I must have > submodule.superprojectGitDir set" - but instead to say "If I have > submodule.superprojectGitDir set, then I must be a submodule." That is, > I expect we will find edge cases where a submodule was introduced in > some interesting way that bypassed any of the patches below, and > therefore doesn't have the superproject's gitdir cached. > > The combination of these two rules: > - Anything relying on submodule.superprojectGitDir must be nice to > have, but not essential, because > - It's possible for a submodule to be valid without having > submodule.superprojectGitDir set > makes me feel more comfortable with the idea of submodules learning > additional behavior based on this config. I feel pretty unconfident in > our ability to ensure that *every* submodule has this config set. > I think this is a good direction. I do think having some information about being a submodule could be very useful for tools such as git status, and making it more of a "if we know for sure, we get some small benefit, but if we don't know then it's no harm" is a good direction. > The series covers a few paths for introducing that config, which I'm > hoping covers most cases. > - "git submodule update" (which seems to be part of the "git submodule > init" flow) > - "git submodule absorbgitdir" to convert a "git init"'d repo into a > submodule > > Notably, we can only really set this config when 'the_repository' is the > superproject - that appears to be the only time when we know the gitdirs > of both the superproject and the submodule. We could also presumably add a new command for setting this? > > I'm expecting folks may have a lot to say about this, so I look forward > to discussion :) > > - Emily > > 1: https://lore.kernel.org/git/20210423001539.4059524-1-emilyshaffer@google.com > 2: https://lore.kernel.org/git/YHofmWcIAidkvJiD@google.com > > Emily Shaffer (4): > t7400-submodule-basic: modernize inspect() helper > introduce submodule.superprojectGitDir cache > submodule: cache superproject gitdir during absorbgitdirs > submodule: cache superproject gitdir during 'update' > > builtin/submodule--helper.c | 4 +++ > git-submodule.sh | 9 ++++++ > submodule.c | 10 ++++++ > t/t7400-submodule-basic.sh | 49 ++++++++++++++---------------- > t/t7406-submodule-update.sh | 10 ++++++ > t/t7412-submodule-absorbgitdirs.sh | 1 + > 6 files changed, 57 insertions(+), 26 deletions(-) > > -- > 2.32.0.272.g935e593368-goog >
Emily Shaffer <emilyshaffer@google.com> writes: > It's necessary for a superproject to know which submodules it contains. > However, historically submodules do not know they are anything but a > normal single-repo Git project (or a superproject, in nested-submodules > cases). This decision does help prevent us from having to support > divergent behaviors in submodule projects vs. superprojects, which makes > sure Git is (somewhat) less confusing for the reader, and helps simplify > our code. https://github.com/git/git/actions/runs/934803365 is the CI run of 'seen' that passes. With this and another topic merged to 'seen', as can be seen in https://github.com/git/git/actions/runs/934680687, a handful of tests fail. Thanks.
On Mon, Jun 14, 2021 at 04:26:17PM +0900, Junio C Hamano wrote: > > Emily Shaffer <emilyshaffer@google.com> writes: > > > It's necessary for a superproject to know which submodules it contains. > > However, historically submodules do not know they are anything but a > > normal single-repo Git project (or a superproject, in nested-submodules > > cases). This decision does help prevent us from having to support > > divergent behaviors in submodule projects vs. superprojects, which makes > > sure Git is (somewhat) less confusing for the reader, and helps simplify > > our code. > > https://github.com/git/git/actions/runs/934803365 is the CI run of > 'seen' that passes. With this and another topic merged to 'seen', > as can be seen in https://github.com/git/git/actions/runs/934680687, > a handful of tests fail. Thanks, I will investigate.