Message ID | 20220310004423.2627181-1-emilyshaffer@google.com (mailing list archive) |
---|---|
Series | teach submodules to know they're submodules |
On Wed, Mar 09 2022, Emily Shaffer wrote:

> For the original cover letter, see
> https://lore.kernel.org/git/20210611225428.1208973-1-emilyshaffer%40google.com.
>
> CI run: https://github.com/nasamuffin/git/actions/runs/1954710601
>
> Since v8:
>
> Only a couple of minor fixes.
>
> Junio pointed out that I could write the tests better using --type=bool
> and 'test_cmp_config', and that we could be a little more careful about
> when to give up on 'git rev-parse --show-superproject-working-dir'.
>
> Glen mentioned that builtin/submodule--helper.c:run_update_procedure() is called
> unconditionally earlier in the same function where I had added the
> config in git-submodule.sh. So, I moved the config set into
> submodule--helper.c to reduce possible edge cases where the config might
> not be set.
>
> Otherwise, this series is pretty much unchanged.
>
> Since v7:
>
> Actually a fairly large rework. Rather than keeping the path from gitdir
> to gitdir, just keep a boolean under 'submodule.hasSuperproject'. The
> idea is that from this boolean, we can decide whether to traverse the
> filesystem looking for a superproject.
>
> Because this simplifies the implementation, I compressed the three
> middle commits into one. As proof-of-concept, I added a patch at the end
> to check for this boolean when running `git rev-parse
> --show-superproject-working-tree`.
>
> One thing I'm not sure about: in the tests, I check whether the config
> is set, but not what the boolean value of it is. Is there a better way
> to do that? For example, I could imagine someone deciding to set
> `submodule.hasSuperproject = false` and the tests would not function
> correctly in that case. I think we don't really normalize the value on a
> boolean config like that, so I didn't want to write a lot of comparison
> to check if the value is 1 or true or True or TRUE or Yes or .... Am I
> overthinking it?
>
> The other thing I'm not sure about: since it's just a bool, we're not
> restricted to setting this config only when we have both gitdir paths
> available. That makes me want to set the config any time we are doing
> something with submodules anyway, like any time 'git-submodule--helper'
> is used. But that helper seems to be called in the context of the
> superproject, not of the submodules, so adding this config for each
> submodule we touch would be a second child process. Is there some other
> common entry point for submodules that we can use?

I really don't mean to bring up the same points again, but I'm still
genuinely unsure what this is intended to solve in the end.

I.e. from the original RFC we went from it being for optimizations for
the shellscript "git rev-parse", to suggestions that the configured path
would be "canonical" in a way we couldn't discover on-the-fly (i.e. some
of Jonathan's noted edge cases [1]).

But now it's a boolean indicating "it's there, discover it", and the
implied (but not really explicitly stated) reason in 2/3 is that it's
purely for optimization purposes at this point.

But it's an optimization without a benchmark. In [1] Jonathan (if I
understood it correctly, see [2]) might have suggested this is important
to deal with some Google in-house NFS-a-like auto-mounting software,
i.e. the "walking up" is truly expensive in some scenarios.

I do worry a bit that we'll be creating behavior edge cases related to
this, and if the problem being solved is for a relatively obscure setup
is it worth it, and in that case perhaps there should be a "I need this
optimization" setting guarding it?

But I don't know, a concrete case where this series makes a difference
would really help.
I tried to come up with one before[3] and all I could find was fleeting
cases we'd see go away with the migration of the remaining parts of
git-submodule.sh to C, which we already have in-flight patches for (or
rather, Glen is AFAIK at series 1/2 of submitting those, with 1/2
in-flight).

In any case I think lifting the bits of [3] where we assert that this
doesn't introduce any behavior change with a GIT_TEST_* knob would be
valuable.

I.e. as long as the intent isn't a behavior change let's test that
get_superproject_working_tree() doesn't need this across the entire test
suite, with specific tests that opt-in to the behavior (or do a whole
test suite run in that mode), rather than the default being opt-out.

An opt-out is just a recipe for growing accidental implicit
dependencies, which explicitly isn't what we want for a "just an
optimization" knob.

We do the same sort of opt-in/opt-out testing for e.g. split index,
untracked cache etc (see the GIT_TEST_* bits in
ci/run-build-and-tests.sh).

AFAICT a fix-up of just adding the git_env_bool() here to this code in
your 3/3 would do it:

	if (!git_env_bool("GIT_TEST_NO_SUBMODULE_HAS_SUPERPROJECT", 0) &&
	    !git_config_get_bool("submodule.hassuperproject", &has_superproject_cfg) &&
	    !has_superproject_cfg)

And then adding GIT_TEST_NO_SUBMODULE_HAS_SUPERPROJECT=true to
linux-TEST-vars in ci/run-build-and-tests.sh.

The tests that do rely on submodule.hassuperproject would need to set
GIT_TEST_NO_SUBMODULE_HAS_SUPERPROJECT=false of course...

1. https://lore.kernel.org/git/YgF5V2Y0Btr8B4cd@google.com/
2. https://lore.kernel.org/git/220212.864k53yfws.gmgdl@evledraar.gmail.com/
3. https://lore.kernel.org/git/RFC-cover-0.2-00000000000-20211117T113134Z-avarab@gmail.com/
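
For readers who want to see how the suggested knob would combine with the config lookup, here is a minimal standalone sketch. It does not use git's internals: sketch_env_bool() is a hypothetical, simplified stand-in for git_env_bool() (it falls back to the default on an unparseable value), struct sketch_cfg is a hypothetical model of "was the key found, and what was its value", and the assumption that the guarded branch is the "skip the filesystem walk" path is mine, since the body of the `if` is not quoted in this thread.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical stand-in for git_env_bool(): read a boolean from the
 * environment, returning `def` when the variable is unset or when its
 * value is not a recognized boolean spelling.
 */
static int sketch_env_bool(const char *name, int def)
{
	const char *v = getenv(name);

	if (!v)
		return def;
	if (!strcasecmp(v, "true") || !strcasecmp(v, "yes") ||
	    !strcasecmp(v, "on") || !strcmp(v, "1"))
		return 1;
	if (!*v || !strcasecmp(v, "false") || !strcasecmp(v, "no") ||
	    !strcasecmp(v, "off") || !strcmp(v, "0"))
		return 0;
	return def;
}

/* Hypothetical model of a config lookup result. */
struct sketch_cfg {
	int is_set;	/* non-zero if the key was found */
	int value;	/* boolean value, meaningful only if is_set */
};

/*
 * Same shape as the condition proposed above: the shortcut is taken
 * only when the test knob is off AND the key is set AND it is false.
 * With the knob on, the config is ignored and the walk always happens.
 */
static int sketch_skip_walk(const struct sketch_cfg *cfg)
{
	return !sketch_env_bool("GIT_TEST_NO_SUBMODULE_HAS_SUPERPROJECT", 0) &&
	       cfg->is_set && !cfg->value;
}
```

With the knob exported as true, sketch_skip_walk() returns 0 for every config state, which is the property a whole-suite CI run in that mode would rely on.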
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> But now it's a boolean indicating "it's there, discover it", and the
> implied (but not really explicitly stated) reason in 2/3 is that it's
> purely for optimization purposes at this point.

You may know that I have a separate checkout of the 'todo' branch at
path "Meta" in my working tree. I could use the hasSuperproject=false
setting there, to say "this is *NOT* a submodule, even the parent
directory is a working tree of a different repository, it is not our
superproject, so do *NOT* bother to go up to discover anything".

If that configuration weren't there in the "Meta/.git/config", the
parent directory of "Meta" (which has its own ".git") cannot tell if
that "Meta" thing is a submodule being prepared that hasn't been added
yet, or is never intended to be a submodule. I would imagine that
"git add X" can later be taught to refuse to add X if there is X/.git
and X/.git/config explicitly says that it does not have a superproject.

So, I am not sure if it is a good characterization that it is for
optimization at all.
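
The distinction drawn here between "unset" and "explicitly false" can be sketched as a tri-state. The enum and both helper functions below are hypothetical illustrations, not git code, and the "git add X" check models only the idea imagined above, not any existing behavior.

```c
/* Hypothetical tri-state view of submodule.hasSuperproject. */
enum sketch_superproject_hint {
	SKETCH_HINT_UNSET,	/* key absent: nothing is known either way */
	SKETCH_HINT_NO,		/* explicitly false: not a submodule */
	SKETCH_HINT_YES		/* explicitly true: a superproject exists */
};

/* Fold a "found?" flag and a boolean value into the tri-state. */
static enum sketch_superproject_hint
sketch_hint_from_config(int is_set, int value)
{
	if (!is_set)
		return SKETCH_HINT_UNSET;
	return value ? SKETCH_HINT_YES : SKETCH_HINT_NO;
}

/*
 * The imagined "git add X" check: refuse only when X's own config
 * explicitly says it has no superproject. An unset key stays allowed,
 * because X may be a submodule that is still being prepared.
 */
static int sketch_may_add_as_submodule(enum sketch_superproject_hint hint)
{
	return hint != SKETCH_HINT_NO;
}
```

Under this reading, a plain boolean loses information only if "unset" and "false" are treated the same; keeping them apart is what would let the "Meta" checkout opt out explicitly.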