Message ID | 20220308001433.94995-1-chooglen@google.com (mailing list archive) |
---|---|
Headers | show |
Series | fetch --recurse-submodules: fetch unpopulated submodules | expand |
Glen Choo <chooglen@google.com> writes: > It's true that we don't need <.super_oid, .path> in order to init the > subrepo, but it turns out that recursive fetch reads some > configuration values from .gitmodules (via submodule_from_path()), so > we still need to store super_oid in order to read the correct > .gitmodules file. OK, but then do we know which .gitmodules file is the "correct" one, when there are more than one .super_oid? Or do we assume that .gitmodules does not change in the range of superproject commits we have fetched before deciding what commits need to be fetched in the submodules? > == Since v4 > - Rename test helpers (s/check_/write_expected_) > - Test style fixes > - Update test comments > - Remove the manual test_cmp in the test that checks sub2 (but we still > construct expect.err.super manually). All of these changes looked sensible. Will queue. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > Glen Choo <chooglen@google.com> writes: > >> It's true that we don't need <.super_oid, .path> in order to init the >> subrepo, but it turns out that recursive fetch reads some >> configuration values from .gitmodules (via submodule_from_path()), so >> we still need to store super_oid in order to read the correct >> .gitmodules file. > > OK, but then do we know which .gitmodules file is the "correct" one, > when there are more than one .super_oid? Or do we assume that > .gitmodules does not change in the range of superproject commits we > have fetched before deciding what commits need to be fetched in the > submodules? This uses a "first one wins approach", which obviously doesn't have correctness guarantees. But in practice, I don't think this is likely to cause problems: - As far as I can tell, the only value we read from .gitmodules is 'submodule.<name>.fetchRecurseSubmodules', and this value gets overridden by two other values: the CLI option, and the config variable with the same name in .git/config. During "git submodule init", we copy the config values from .gitmodules to .git/config. Since we can only fetch init-ed submodules anyway, it's quite unlikely that we will ever actually make use of the .gitmodules config. - Even if we do use the .gitmodules config values, it's unlikely that the values in .gitmodules will change often, so it _probably_ won't matter which one we choose. - This only matters when the submodule is not in the index. If the submodule _is_ in the index, we read .gitmodules from the filesystem i.e. these patches shouldn't change the behavior for submodules in the index.
Glen Choo <chooglen@google.com> writes: > - <20220304235328.649768-1-jonathantanmy@google.com> I've described the > differences between the no-submodule-in-index test and the > other-submodule-in-index test (their comments now refer to one > another, so the contrast is more obvious), but didn't reorder them > because I thought that made the test setup less intuitive to read. Thanks - the comments make sense. > - <20220304234622.647776-1-jonathantanmy@google.com> I added > expect.err.sub2 to verify_test_result() but didn't change > write_expected_super() to account for sub2. It turned out to be tricky > to predict the output when 'super' fetches >1 branch because each > fetched branch can affect the formatting. e.g. > > OLD_HEAD..super super -> origin/super > > can become > > OLD_HEAD..super super -> origin/super > OLD_HEAD..super some-other-branch -> origin/some-other-branch > > (I could work around this by replacing the whitespace with sed, but it > seemed like too much overhead for a one-off test). Overwriting just the super part works for me, thanks. The only thing remaining from me is my comment about fetching OIDs from one submodule into another (of the same name but different URL) [1], but I looked into it myself and we can probably postpone handling this to another patch set. In such a patch set, we would probably need to store the URLs that are reported by upstream .gitmodules somewhere. (I forgot that we don't use them in this patch.) And then, either implement an autosync function (like "git submodule sync", perhaps gated by a "--sync-submodules" argument so that users can include it when fetching new commits and exclude it when fetching historical commits) and/or use those URLs in a diagnostic message to be printed when the fetch fails. As it is, the existing fetch-into-submodules-at-HEAD also suffers from the same flaw, so I'm OK postponing this to another patch set. So, Reviewed-by: Jonathan Tan <jonathantanmy@google.com> [1] https://lore.kernel.org/git/20220304234622.647776-1-jonathantanmy@google.com/
Glen Choo <chooglen@google.com> writes: > This uses a "first one wins approach", which obviously doesn't have > correctness guarantees. But in practice, I don't think this is likely to > cause problems: > > - As far as I can tell, the only value we read from .gitmodules is > 'submodule.<name>.fetchRecurseSubmodules', and this value gets > overridden by two other values: the CLI option, and the config > variable with the same name in .git/config. > > During "git submodule init", we copy the config values from > .gitmodules to .git/config. Since we can only fetch init-ed submodules > anyway, it's quite unlikely that we will ever actually make use of the > .gitmodules config. These are reasonable. > - Even if we do use the .gitmodules config values, it's unlikely that > the values in .gitmodules will change often, so it _probably_ won't > matter which one we choose. What bad things would we see if the value changes during the span of history of the superproject we fetched? How often we would see broken behaviour is immaterial and breakage being rare is a no excuse to import a new code with designed-in flaw. Unless the "rare" is "never", that is. I would think using ANY values from .gitmodules without having the end-user agree with the settings and copying the settings to the .git/config is a BUG. So if it mattered from which superproject commit we took .gitmodules from, that would mean we already have such a bug and it is not a new problem. That would be a reasonable argument for this topic. Together with the previous point, i.e. we do not copy values we see in the in-tree .gitmodules file to .git/config anyway, it would make a good enough assurance, I would think. > - This only matters when the submodule is not in the index. If the > submodule _is_ in the index, we read .gitmodules from the filesystem > i.e. these patches shouldn't change the behavior for submodules in the > index. How often we would see broken behaviour does not matter. If it is broken when the submodule is not in the index, we need to know. But as you said, it does not sound likely that in-tree .gitmodules matters. It leads to a possible #leftoverbit clean-up. Because we only fetch submodules that are initialized, the API functions we are using in this series has no reason to require us to feed _a_ commit in the superproject to them so that they can find .gitmodules in them. Fixing the API can probably be left outside the scope of the topic, to be done soon after the dust from the topic settles, I think, to avoid distracting us from the topic. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > Glen Choo <chooglen@google.com> writes: > >> This uses a "first one wins approach", which obviously doesn't have >> correctness guarantees. But in practice, I don't think this is likely to >> cause problems: >> >> - As far as I can tell, the only value we read from .gitmodules is >> 'submodule.<name>.fetchRecurseSubmodules', and this value gets >> overridden by two other values: the CLI option, and the config >> variable with the same name in .git/config. >> >> During "git submodule init", we copy the config values from >> .gitmodules to .git/config. Since we can only fetch init-ed submodules >> anyway, it's quite unlikely that we will ever actually make use of the >> .gitmodules config. > > These are reasonable. > >> - Even if we do use the .gitmodules config values, it's unlikely that >> the values in .gitmodules will change often, so it _probably_ won't >> matter which one we choose. > > What bad things would we see if the value changes during the span of > history of the superproject we fetched? How often we would see > broken behaviour is immaterial and breakage being rare is a no excuse > to import a new code with designed-in flaw. Unless the "rare" is > "never", that is. Makes sense, I'll keep this mind. > I would think using ANY values from .gitmodules without having the > end-user agree with the settings and copying the settings to the > .git/config is a BUG. So if it mattered from which superproject > commit we took .gitmodules from, that would mean we already have > such a bug and it is not a new problem. > > That would be a reasonable argument for this topic. Together with > the previous point, i.e. we do not copy values we see in the in-tree > .gitmodules file to .git/config anyway, it would make a good enough > assurance, I would think. To clarify, does this opinion of "don't use config values that aren't copied into .git/config" extend to in-tree .gitmodules? Prior to this series, we always read the in-tree .gitmodules to get the config - the user does not need to copy the settings to .git/config, but we don't pick a commit to read .gitmodules from. If we still want to consider in-tree .gitmodules e.g. by merging .git/config and .gitmodules, then we still have the new problem of choosing the right .gitmodules. If the answer is "no, we don't even consider in-tree .gitmodules" (unless we really have to, like cloning a new submodule), that seems pretty safe and predictable because we wouldn't have to look in two different places to figure out what the user wants. And more crucially, we'd never have to guess which .gitmodules to read - which will become more of an issue as we add more support for init-ed but unpopulated submodules. > It leads to a possible #leftoverbit clean-up. Because we only fetch > submodules that are initialized, the API functions we are using in > this series has no reason to require us to feed _a_ commit in the > superproject to them so that they can find .gitmodules in them. Hm, this is true; an initialized submodule should already have the 'expected' information in .git/config. And if we no longer have to fret about whether we're reading the correct .gitmodules, we can revisit the idea of "init a subrepo using only its name". > Fixing the API can probably be left outside the scope of the topic, > to be done soon after the dust from the topic settles, I think, to > avoid distracting us from the topic. > > Thanks.
Glen Choo <chooglen@google.com> writes: > To clarify, does this opinion of "don't use config values that aren't > copied into .git/config" extend to in-tree .gitmodules? Prior to this > series, we always read the in-tree .gitmodules to get the config - the > user does not need to copy the settings to .git/config, but we don't > pick a commit to read .gitmodules from. I think we do, but I also think it was a huge mistake to allow repository data to directly affect the behaviour of local checkout. Fixing that is most likely outside the scope of this series, though.
Junio C Hamano <gitster@pobox.com> writes: > Glen Choo <chooglen@google.com> writes: > >> To clarify, does this opinion of "don't use config values that aren't >> copied into .git/config" extend to in-tree .gitmodules? Prior to this >> series, we always read the in-tree .gitmodules to get the config - the >> user does not need to copy the settings to .git/config, but we don't >> pick a commit to read .gitmodules from. > > I think we do, but I also think it was a huge mistake to allow > repository data to directly affect the behaviour of local checkout. I'm inclined to agree. > Fixing that is most likely outside the scope of this series, though. Agree. Thanks!
Glen Choo <chooglen@google.com> writes: > Junio C Hamano <gitster@pobox.com> writes: > >> Glen Choo <chooglen@google.com> writes: >> >>> To clarify, does this opinion of "don't use config values that aren't >>> copied into .git/config" extend to in-tree .gitmodules? Prior to this >>> series, we always read the in-tree .gitmodules to get the config - the >>> user does not need to copy the settings to .git/config, but we don't >>> pick a commit to read .gitmodules from. >> >> I think we do, but I also think it was a huge mistake to allow >> repository data to directly affect the behaviour of local checkout. > > I'm inclined to agree. > >> Fixing that is most likely outside the scope of this series, though. > > Agree. Thanks! I thought that this would have been the end of the discussion, but after reading <xmqqa6dpllmc.fsf@gitster.g>, I guess I had the wrong impression (oops). If I am reading everything correctly, we both agree that it's not good to read _any_ config values from .gitmodules (even if it's in-tree), and that we should clean it up outside of this topic. So for this topic to be merged into 'next', is it enough to say that I will fix this behavior in a follow up topic?
Glen Choo <chooglen@google.com> writes: > Glen Choo <chooglen@google.com> writes: > >> Junio C Hamano <gitster@pobox.com> writes: >> >>> Glen Choo <chooglen@google.com> writes: >>> >>>> To clarify, does this opinion of "don't use config values that aren't >>>> copied into .git/config" extend to in-tree .gitmodules? Prior to this >>>> series, we always read the in-tree .gitmodules to get the config - the >>>> user does not need to copy the settings to .git/config, but we don't >>>> pick a commit to read .gitmodules from. >>> >>> I think we do, but I also think it was a huge mistake to allow >>> repository data to directly affect the behaviour of local checkout. >> >> I'm inclined to agree. >> >>> Fixing that is most likely outside the scope of this series, though. >> >> Agree. Thanks! > > I thought that this would have been the end of the discussion, but after > reading <xmqqa6dpllmc.fsf@gitster.g>, I guess I had the wrong impression > (oops). > > If I am reading everything correctly, we both agree that it's not > good to read _any_ config values from .gitmodules (even if it's > in-tree), and that we should clean it up outside of this topic. So for > this topic to be merged into 'next', is it enough to say that I will fix > this behavior in a follow up topic? At least we should remember that is something to be fixed. It may not be you personally who addresses that issue, though ;-)
Junio C Hamano <gitster@pobox.com> writes: > Glen Choo <chooglen@google.com> writes: > >> Glen Choo <chooglen@google.com> writes: >> >>> Junio C Hamano <gitster@pobox.com> writes: >>> >>>> Glen Choo <chooglen@google.com> writes: >>>> >>>>> To clarify, does this opinion of "don't use config values that aren't >>>>> copied into .git/config" extend to in-tree .gitmodules? Prior to this >>>>> series, we always read the in-tree .gitmodules to get the config - the >>>>> user does not need to copy the settings to .git/config, but we don't >>>>> pick a commit to read .gitmodules from. >>>> >>>> I think we do, but I also think it was a huge mistake to allow >>>> repository data to directly affect the behaviour of local checkout. >>> >>> I'm inclined to agree. >>> >>>> Fixing that is most likely outside the scope of this series, though. >>> >>> Agree. Thanks! >> >> I thought that this would have been the end of the discussion, but after >> reading <xmqqa6dpllmc.fsf@gitster.g>, I guess I had the wrong impression >> (oops). >> >> If I am reading everything correctly, we both agree that it's not >> good to read _any_ config values from .gitmodules (even if it's >> in-tree), and that we should clean it up outside of this topic. So for >> this topic to be merged into 'next', is it enough to say that I will fix >> this behavior in a follow up topic? > > At least we should remember that is something to be fixed. It may > not be you personally who addresses that issue, though ;-) Perhaps squashing in a NEEDSWORK comment into [PATCH v5 09/10] will suffice? I can also resend this series if preferred. ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ---- diff --git a/submodule.c b/submodule.c index 6e6b2d04e4..93c78a4dc3 100644 --- a/submodule.c +++ b/submodule.c @@ -795,6 +795,21 @@ static const char *default_name_or_path(const char *path_or_name) * superproject commit that points to the submodule, but this is * arbitrary - we can choose any (super_oid, path) that matches the * submodule's name. + * + * NEEDSWORK: Storing an arbitrary commit is undesirable because we can't + * guarantee that we're reading the commit that the user would expect. A better + * scheme would be to just fetch a submodule by its name. This requires two + * steps: + * - Create a function that behaves like repo_submodule_init(), but accepts a + * submodule name instead of treeish_name and path. This should be easy + * because repo_submodule_init() internally uses the submodule's name. + * + * - Replace most instances of 'struct submodule' (which is the .gitmodules + * config) with just the submodule name. This is OK because we expect + * submodule settings to be stored in .git/config (via "git submodule init"), + * not .gitmodules. This also lets us delete get_non_gitmodules_submodule(), + * which constructs a bogus 'struct submodule' for the sake of giving a + * placeholder name to a gitlink. */ struct changed_submodule_data { /*
Glen Choo <chooglen@google.com> writes: > Perhaps squashing in a NEEDSWORK comment into [PATCH v5 09/10] will > suffice? I can also resend this series if preferred. It should work. Let me try it in the last integration cycle of today. > ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ---- > > diff --git a/submodule.c b/submodule.c > index 6e6b2d04e4..93c78a4dc3 100644 > --- a/submodule.c > +++ b/submodule.c > @@ -795,6 +795,21 @@ static const char *default_name_or_path(const char *path_or_name) > * superproject commit that points to the submodule, but this is > * arbitrary - we can choose any (super_oid, path) that matches the > * submodule's name. > + * > + * NEEDSWORK: Storing an arbitrary commit is undesirable because we can't > + * guarantee that we're reading the commit that the user would expect. A better > + * scheme would be to just fetch a submodule by its name. This requires two > + * steps: > + * - Create a function that behaves like repo_submodule_init(), but accepts a > + * submodule name instead of treeish_name and path. This should be easy > + * because repo_submodule_init() internally uses the submodule's name. > + * > + * - Replace most instances of 'struct submodule' (which is the .gitmodules > + * config) with just the submodule name. This is OK because we expect > + * submodule settings to be stored in .git/config (via "git submodule init"), > + * not .gitmodules. This also lets us delete get_non_gitmodules_submodule(), > + * which constructs a bogus 'struct submodule' for the sake of giving a > + * placeholder name to a gitlink. > */ > struct changed_submodule_data { > /*