Message ID | 20200108231900.192476-1-emilyshaffer@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | clone: teach --single-branch and --branch during --recurse | expand |
On Wed, Jan 08, 2020 at 03:19:00PM -0800, Emily Shaffer wrote: Bah, in my attempt to keep the subject brief I was aiming to write "--recursive" but instead just wrote "--recurse" which is wrong. I can push another version with the fix if desired. > Previously, performing "git clone --recurse-submodules --single-branch" > resulted in submodules cloning all branches even though the superproject > cloned only one branch. Pipe --single-branch and its friend, --branch, > through the submodule helper framework to make it to 'clone' later on. > > Signed-off-by: Emily Shaffer <emilyshaffer@google.com> > --- > Note that 'branch' was already in use in git-submodules.sh, so > "submodule branch" aka 'sm_branch' was used to disambiguate the two. > > Documentation/git-submodule.txt | 6 +++++- > builtin/clone.c | 6 ++++++ > builtin/submodule--helper.c | 28 +++++++++++++++++++++++++--- > git-submodule.sh | 17 ++++++++++++++++- > t/t5617-clone-submodules-remote.sh | 26 ++++++++++++++++++++++++-- > 5 files changed, 76 insertions(+), 7 deletions(-)
On Wed, Jan 08, 2020 at 03:19:00PM -0800, Emily Shaffer wrote: > Subject: clone: teach --single-branch and --branch during --recurse A minor nit, but this subject confused me for a moment. I think we'd usually say "teach" for a new feature being implemented, and this is really just about passing along the existing features. Something like: clone: pass --single-branch and --branch when recursing submodules would have been a bit more obvious (to me anyway). > Previously, performing "git clone --recurse-submodules --single-branch" > resulted in submodules cloning all branches even though the superproject > cloned only one branch. Pipe --single-branch and its friend, --branch, > through the submodule helper framework to make it to 'clone' later on. Since I don't really use submodules, I don't have much data or even intuition to go on. But could this be a regression for some situations? E.g., imagine I have a repo "parent" whose branch "foo" has a submodule "child", but "child" only has a branch "bar". What happens now if I "git clone --recurse-submodules --single-branch -b foo parent", and what will happen after this patch? I think it works before, but doesn't now. Setting up like this: git init child ( cd child git checkout -b bar echo whatever >file git add file git commit -m 'child commit' ) git init parent cd parent git checkout -b foo git submodule add $PWD/../child git commit -m 'add submodule' if I use the current tip of master, I get: $ git clone --recurse-submodules --single-branch -b foo parent cur Cloning into 'cur'... done. Submodule 'child' (/home/peff/tmp/parent/../child) registered for path 'child' Cloning into '/home/peff/tmp/cur/child'... done. Submodule path 'child': checked out 'b5cbfcc9fec3b7d67e305468624fed2ba1aa4758' $ git -C cur/child log -1 --oneline | cat b5cbfcc child commit with your patch, I get: $ git.compile clone --recurse-submodules --single-branch -b foo parent new Cloning into 'new'... done. Submodule 'child' (/home/peff/tmp/parent/../child) registered for path 'child' Cloning into '/home/peff/tmp/new/child'... warning: Could not find remote branch foo to clone. fatal: Remote branch foo not found in upstream origin fatal: clone of '/home/peff/tmp/parent/../child' into submodule path '/home/peff/tmp/new/child' failed Failed to clone 'child'. Retry scheduled Cloning into '/home/peff/tmp/new/child'... warning: Could not find remote branch foo to clone. fatal: Remote branch foo not found in upstream origin fatal: clone of '/home/peff/tmp/parent/../child' into submodule path '/home/peff/tmp/new/child' failed Failed to clone 'child' a second time, aborting $ git -C new/child log -1 --oneline | cat 11acb3a add submodule (there's nothing checked out in the submodule). I have no idea how common this kind of thing would be, and I expect in most cases your patch would do what people want. But we might need to be better about retrying without those options when the first clone fails. -Peff
On Thu, Jan 09, 2020 at 03:11:50AM -0500, Jeff King wrote: > On Wed, Jan 08, 2020 at 03:19:00PM -0800, Emily Shaffer wrote: > > > Subject: clone: teach --single-branch and --branch during --recurse > > A minor nit, but this subject confused me for a moment. I think we'd > usually say "teach" for a new feature being implemented, and this is > really just about passing along the existing features. Something like: > > clone: pass --single-branch and --branch when recursing submodules > > would have been a bit more obvious (to me anyway). > > > Previously, performing "git clone --recurse-submodules --single-branch" > > resulted in submodules cloning all branches even though the superproject > > cloned only one branch. Pipe --single-branch and its friend, --branch, > > through the submodule helper framework to make it to 'clone' later on. > > Since I don't really use submodules, I don't have much data or even > intuition to go on. But could this be a regression for some situations? > E.g., imagine I have a repo "parent" whose branch "foo" has a submodule > "child", but "child" only has a branch "bar". What happens now if I "git > clone --recurse-submodules --single-branch -b foo parent", and what will > happen after this patch? > > I think it works before, but doesn't now. > > Setting up like this: > > git init child > ( > cd child > git checkout -b bar > echo whatever >file > git add file > git commit -m 'child commit' > ) > > git init parent > cd parent > git checkout -b foo > git submodule add $PWD/../child > git commit -m 'add submodule' > > if I use the current tip of master, I get: > > $ git clone --recurse-submodules --single-branch -b foo parent cur > Cloning into 'cur'... > done. > Submodule 'child' (/home/peff/tmp/parent/../child) registered for path 'child' > Cloning into '/home/peff/tmp/cur/child'... > done. > Submodule path 'child': checked out 'b5cbfcc9fec3b7d67e305468624fed2ba1aa4758' > > $ git -C cur/child log -1 --oneline | cat > b5cbfcc child commit > > with your patch, I get: > > $ git.compile clone --recurse-submodules --single-branch -b foo parent new > Cloning into 'new'... > done. > Submodule 'child' (/home/peff/tmp/parent/../child) registered for path 'child' > Cloning into '/home/peff/tmp/new/child'... > warning: Could not find remote branch foo to clone. > fatal: Remote branch foo not found in upstream origin > fatal: clone of '/home/peff/tmp/parent/../child' into submodule path '/home/peff/tmp/new/child' failed > Failed to clone 'child'. Retry scheduled > Cloning into '/home/peff/tmp/new/child'... > warning: Could not find remote branch foo to clone. > fatal: Remote branch foo not found in upstream origin > fatal: clone of '/home/peff/tmp/parent/../child' into submodule path '/home/peff/tmp/new/child' failed > Failed to clone 'child' a second time, aborting > > $ git -C new/child log -1 --oneline | cat > 11acb3a add submodule > > (there's nothing checked out in the submodule). > > I have no idea how common this kind of thing would be, and I expect in > most cases your patch would do what people want. But we might need to be > better about retrying without those options when the first clone fails. Yeah, that's interesting. A retry sounds like a pretty solid approach, although if someone's being cautious and using --single-branch I wonder if that's really something they want (since that's avoiding grabbing extraneous branches). I suppose at the very least, --single-branch without --branch should become recursive. Whether --branch should become recursive, I'm not totally sure. The two scenarios I see in conflict are this: - Superproject branch "foo" - submodule A branch "foo" - submodule B branch "foo" and - Superproject branch "foo" - submodule A branch "bar" - submodule B branch "baz" If we propagate --branch, the first scenario Just Works, and the second scenario requires something like: git clone --recurse-submodules=no --single-branch --branch foo https://superproject git submodule update --init --single-branch --branch bar A/ git submodule update --init --single-branch --branch baz B/ (I guess if the superproject knows what branch it wants for all the submodules, you could also just do "git submodule update --init --single-branch" and have it go and ask for all of them.) If we don't propagate --branch, the second scenario Just Works, and the first scenario requires something like: git clone --recurse-submodules=no --single-branch --branch foo https://superproject git submodule update --init --single-branch --branch foo (That is, I think as long as 'update' takes --branch, even if it's not passed along by 'clone', it should still work OK when delegating to everyone.) Let me know if I misunderstood what you were worried about. I don't use submodules heavily for myself either. I'll try and ask around a little to see what folks want, at least here. The ergonomics seem pretty similar, so I guess it comes down to having explicit documentation. > > -Peff
On Thu, Jan 16, 2020 at 02:38:00PM -0800, Emily Shaffer wrote: > > I have no idea how common this kind of thing would be, and I expect in > > most cases your patch would do what people want. But we might need to be > > better about retrying without those options when the first clone fails. > > Yeah, that's interesting. A retry sounds like a pretty solid approach, > although if someone's being cautious and using --single-branch I wonder > if that's really something they want (since that's avoiding grabbing > extraneous branches). > > I suppose at the very least, --single-branch without --branch should > become recursive. Whether --branch should become recursive, I'm not > totally sure. Yeah, I think it may be worth separating out how to think about the two options. It's a lot more plausible to me that --single-branch would want to recurse than --branch. > The two scenarios I see in conflict are this: > - Superproject branch "foo" > - submodule A branch "foo" > - submodule B branch "foo" > and > - Superproject branch "foo" > - submodule A branch "bar" > - submodule B branch "baz" > > If we propagate --branch, the first scenario Just Works, and the second > scenario requires something like: > > git clone --recurse-submodules=no --single-branch --branch foo https://superproject > git submodule update --init --single-branch --branch bar A/ > git submodule update --init --single-branch --branch baz B/ > > (I guess if the superproject knows what branch it wants for all the submodules, > you could also just do "git submodule update --init --single-branch" and > have it go and ask for all of them.) > > If we don't propagate --branch, the second scenario Just Works, and the > first scenario requires something like: > > git clone --recurse-submodules=no --single-branch --branch foo https://superproject > git submodule update --init --single-branch --branch foo > > (That is, I think as long as 'update' takes --branch, even if it's not > passed along by 'clone', it should still work OK when delegating to > everyone.) > > Let me know if I misunderstood what you were worried about. Right, that's exactly my concern. You're making one case work but breaking the other. It really seems like there's no particular reason to assume that the superproject branch corresponds to the submodule branch (or even submodules of submodules). I imagine it would in some cases (like trying to replace the use of "repo" in Android), but that's just one model. It would make more sense to me to either (or both): - make sure that .gitmodules has enough information about which branch to use for each submodule - offer an extra option for the default branch to use for any submodules. This is still not general enough to cover all situations (e.g., the bar/baz you showed above), but it at least makes it relatively easy to cover the simple cases, without breaking any existing ones. -Peff
On Fri, Jan 17, 2020 at 04:03:19PM -0500, Jeff King wrote: > (like trying to replace the use of "repo" in Android) Oops, you saw right through us ;) > It would make more sense to me to either (or both): > > - make sure that .gitmodules has enough information about which branch > to use for each submodule Hum. I don't work with them day to day, but aren't we already in that state? Is that not what the 'branch' option for each submodule means? > > - offer an extra option for the default branch to use for any > submodules. This is still not general enough to cover all situations > (e.g., the bar/baz you showed above), but it at least makes it > relatively easy to cover the simple cases, without breaking any > existing ones. Yeah, this is sort of the direction my mind went too - "not --branch recursively, but --submodule-branch". But that breaks down when you've got a nontrivial number of submodules, at which point you're gonna have a hard time unless you've got the .gitmodules configured correctly. Well, as for this patch, let me try it with just --single-branch and see whether that works for the case the user reported. I can head back to the drawing board if not. - Emily
On Mon, Jan 27, 2020 at 02:20:19PM -0800, Emily Shaffer wrote: > On Fri, Jan 17, 2020 at 04:03:19PM -0500, Jeff King wrote: > > > (like trying to replace the use of "repo" in Android) > > Oops, you saw right through us ;) > > > It would make more sense to me to either (or both): > > > > - make sure that .gitmodules has enough information about which branch > > to use for each submodule > > Hum. I don't work with them day to day, but aren't we already in that > state? Is that not what the 'branch' option for each submodule means? I've been corrected off-list that the 'branch' in .gitmodules is used during 'git submodule update --remote', but not during 'git submodule init' or 'git clone --recurse-submodules'. Then, for the problem in discussion for this thread, it seems like a better choice is something like 'git clone --recurse-submdoules --use-gitmodules' or whatever we want to call it - e.g., rather than fetching the branch where the server knows HEAD, ask the .gitmodules to figure out which branch? It seems like that ought to live separately from --single-branch. In the case where you very strictly only want to fetch one branch (not two branches) I suppose you'd want something like 'git clone --recurse-submodules --single-branch --branch=mysuperprojectbranch --use-gitmodules' to make sure that only one branch per repo comes down. With n submodules of various naming schemas, provenance, etc., I don't think there's a good case for recursing --branch one way or another; it seems like filling out some config is the way to go. I guess we could also teach it to take some input like --submodule-branch-spec=foo.txt, and/or a multiply provided --submodule-branch foo=foobranch --submodule-branch bar/baz=bazbranch. [foo.txt] foo=foobranch bar/baz=bazbranch With that approach, then someone gets a little more flexibility than relying on what the .gitmodules has set up. > > - offer an extra option for the default branch to use for any > > submodules. This is still not general enough to cover all situations > > (e.g., the bar/baz you showed above), but it at least makes it > > relatively easy to cover the simple cases, without breaking any > > existing ones. > > Yeah, this is sort of the direction my mind went too - "not > --branch recursively, but --submodule-branch". But that breaks down when you've > got a nontrivial number of submodules, at which point you're gonna have > a hard time unless you've got the .gitmodules configured correctly. > > > Well, as for this patch, let me try it with just --single-branch and see > whether that works for the case the user reported. I can head back to > the drawing board if not. With only half the rework of my patch done, I'm starting to convince myself it's not actually going to work :) Well, I'll still try and see. - Emily
On Mon, Jan 27, 2020 at 02:20:19PM -0800, Emily Shaffer wrote: > > It would make more sense to me to either (or both): > > > > - make sure that .gitmodules has enough information about which branch > > to use for each submodule > > Hum. I don't work with them day to day, but aren't we already in that > state? Is that not what the 'branch' option for each submodule means? Probably? :) I've never used the feature myself, but it does seem like the right thing (and should definitely take precedence over any "-b" option passed to the superproject). > > - offer an extra option for the default branch to use for any > > submodules. This is still not general enough to cover all situations > > (e.g., the bar/baz you showed above), but it at least makes it > > relatively easy to cover the simple cases, without breaking any > > existing ones. > > Yeah, this is sort of the direction my mind went too - "not > --branch recursively, but --submodule-branch". But that breaks down when you've > got a nontrivial number of submodules, at which point you're gonna have > a hard time unless you've got the .gitmodules configured correctly. Right. Probably the right answer for that bar/baz case is to complain to the superproject owner that they didn't put branch fields into their .gitmodules. So... > Well, as for this patch, let me try it with just --single-branch and see > whether that works for the case the user reported. I can head back to > the drawing board if not. Yeah, that makes perfect sense to me. -Peff
On Mon, Jan 27, 2020 at 02:49:14PM -0800, Emily Shaffer wrote: > > > - make sure that .gitmodules has enough information about which branch > > > to use for each submodule > > > > Hum. I don't work with them day to day, but aren't we already in that > > state? Is that not what the 'branch' option for each submodule means? > > I've been corrected off-list that the 'branch' in .gitmodules is used > during 'git submodule update --remote', but not during 'git submodule > init' or 'git clone --recurse-submodules'. Then, for the problem in > discussion for this thread, it seems like a better choice is something > like 'git clone --recurse-submdoules --use-gitmodules' or whatever we > want to call it - e.g., rather than fetching the branch where the server > knows HEAD, ask the .gitmodules to figure out which branch? Oof, I should have read this message before responding to the other one. ;) > It seems like that ought to live separately from --single-branch. In the > case where you very strictly only want to fetch one branch (not two > branches) I suppose you'd want something like 'git clone > --recurse-submodules --single-branch --branch=mysuperprojectbranch > --use-gitmodules' to make sure that only one branch per repo comes down. > > With n submodules of various naming schemas, provenance, etc., I don't > think there's a good case for recursing --branch one way or another; it > seems like filling out some config is the way to go. Yeah, I do still think that it makes sense for clone to pass along --single-branch, regardless, and then deal with branch selection problem separately on top. > I guess we could also teach it to take some input like > --submodule-branch-spec=foo.txt, and/or a multiply provided > --submodule-branch foo=foobranch --submodule-branch bar/baz=bazbranch. > > [foo.txt] > foo=foobranch > bar/baz=bazbranch > > With that approach, then someone gets a little more flexibility than > relying on what the .gitmodules has set up. Yeah, I agree that the most general form is being able to specify the mapping for each individually. At first I wondered why you'd ever _not_ want to just use the branches specified in .gitmodules. But I suppose that gets embedded in the superproject history, which gets awkward as those commits move between branches. E.g., for an android-like project, you don't want to make a commit that says "use branch devel for all submodules" on the devel branch of your superproject. Eventually that will get merged to master, and you'd have to remember to switch it back to "master". So for the simple case, you probably do want to be able to say "use this branch for cloning all submodules". For the complex cases, you'd need that full mapping. But I think it may be worth it to punt on that for now. Even if we eventually added such a feature, I think we'd still want the simpler version anyway (because when you _can_ use it, it's going to be much easier). So there's nothing lost by doing the minimal thing now and waiting to see if more complex use cases even show up. Another thing occurs to me, though: should the binding of this submodule default branch be written to disk (e.g., a config option)? I'm thinking specifically that if you do: git clone --submodule-branch=devel -b devel superproject and then later, you "git fetch" and find that somebody has added a new submodule, you'd presumably want the devel branch of that, too? -Peff
On Mon, Jan 27, 2020 at 06:10:07PM -0500, Jeff King wrote: > On Mon, Jan 27, 2020 at 02:49:14PM -0800, Emily Shaffer wrote: > > > > > - make sure that .gitmodules has enough information about which branch > > > > to use for each submodule > > > > > > Hum. I don't work with them day to day, but aren't we already in that > > > state? Is that not what the 'branch' option for each submodule means? > > > > I've been corrected off-list that the 'branch' in .gitmodules is used > > during 'git submodule update --remote', but not during 'git submodule > > init' or 'git clone --recurse-submodules'. Then, for the problem in > > discussion for this thread, it seems like a better choice is something > > like 'git clone --recurse-submdoules --use-gitmodules' or whatever we > > want to call it - e.g., rather than fetching the branch where the server > > knows HEAD, ask the .gitmodules to figure out which branch? > > Oof, I should have read this message before responding to the other one. ;) > > > It seems like that ought to live separately from --single-branch. In the > > case where you very strictly only want to fetch one branch (not two > > branches) I suppose you'd want something like 'git clone > > --recurse-submodules --single-branch --branch=mysuperprojectbranch > > --use-gitmodules' to make sure that only one branch per repo comes down. > > > > With n submodules of various naming schemas, provenance, etc., I don't > > think there's a good case for recursing --branch one way or another; it > > seems like filling out some config is the way to go. > > Yeah, I do still think that it makes sense for clone to pass along > --single-branch, regardless, and then deal with branch selection problem > separately on top. Sure; I've got that ready to send shortly. It seems to grab HEAD of the remote for each submodule, and then checkout the specific commit ID the superproject wants - in my test case, that commit ID was a direct ancestor of 'master', so the single branch only got 'master'. I'm not sure how it would work with a commit ID which doesn't exist in the single branch that was fetched; I'll write a test and have a look. > > > I guess we could also teach it to take some input like > > --submodule-branch-spec=foo.txt, and/or a multiply provided > > --submodule-branch foo=foobranch --submodule-branch bar/baz=bazbranch. > > > > [foo.txt] > > foo=foobranch > > bar/baz=bazbranch > > > > With that approach, then someone gets a little more flexibility than > > relying on what the .gitmodules has set up. > > Yeah, I agree that the most general form is being able to specify the > mapping for each individually. At first I wondered why you'd ever _not_ > want to just use the branches specified in .gitmodules. But I suppose > that gets embedded in the superproject history, which gets awkward as > those commits move between branches. E.g., for an android-like project, > you don't want to make a commit that says "use branch devel for all > submodules" on the devel branch of your superproject. Eventually that > will get merged to master, and you'd have to remember to switch it back > to "master". Yeah, or I suppose I might be doing something weird, like wanting to run integration tests for the whole project on changes in just one submodule, or something. > So for the simple case, you probably do want to be able to say "use this > branch for cloning all submodules". I think it still makes sense to call this out explicitly, yes? Or do you think that should just be the default? > > For the complex cases, you'd need that full mapping. But I think it may > be worth it to punt on that for now. Even if we eventually added such a > feature, I think we'd still want the simpler version anyway (because > when you _can_ use it, it's going to be much easier). So there's nothing > lost by doing the minimal thing now and waiting to see if more complex > use cases even show up. > > Another thing occurs to me, though: should the binding of this submodule > default branch be written to disk (e.g., a config option)? I'm thinking > specifically that if you do: > > git clone --submodule-branch=devel -b devel superproject > > and then later, you "git fetch" and find that somebody has added a new > submodule, you'd presumably want the devel branch of that, too? This made me think - I wonder if it makes sense to take --submodule-branch as a wildcarded spec instead. So in your case, I could say, git clone --submodule-branch *=devel -b devel superproject And then I don't need to do anything differently for 'git fetch' later. This also opens the door for some repos getting special treatment: git clone --submodule-branch-file=foo.txt -b dev example foo.txt: curl=stable-1.2.3 nlohmann=v2.28 example-*=dev *=master (specifying specific versions for some source dependencies, dev branches for submodules which are associated with with 'example' superproject and might be getting active development, and a wild guess for everything else) I think that also tends to match the glob-expansion configs we use for other things. One thing sticking out to me about the idea of providing --submodule-branch is that you need to know what's in the repo before you clone it the first time, which being able to use globbing like this kind of helps with. But then, I suppose if you don't know what you're looking for, you're not also looking for a very precise filter on your clone ;) - Emily
On Mon, Jan 27, 2020 at 05:08:41PM -0800, Emily Shaffer wrote: > > Yeah, I do still think that it makes sense for clone to pass along > > --single-branch, regardless, and then deal with branch selection problem > > separately on top. > > Sure; I've got that ready to send shortly. It seems to grab HEAD of the > remote for each submodule, and then checkout the specific commit ID the > superproject wants - in my test case, that commit ID was a direct > ancestor of 'master', so the single branch only got 'master'. I'm not > sure how it would work with a commit ID which doesn't exist in the > single branch that was fetched; I'll write a test and have a look. Yeah, it's definitely worth exploring how that works. I thought we had some kind of fallback for when we didn't manage to fetch the object. But maybe I am confusing it with the fallback for "we tried to fetch this specific object, but the other side doesn't allow that, so we grabbed a branch instead". > > So for the simple case, you probably do want to be able to say "use this > > branch for cloning all submodules". > > I think it still makes sense to call this out explicitly, yes? Or do you > think that should just be the default? Yes, I think it should be a separate option from "--branch". > This made me think - I wonder if it makes sense to take > --submodule-branch as a wildcarded spec instead. So in your case, I > could say, > > git clone --submodule-branch *=devel -b devel superproject > > And then I don't need to do anything differently for 'git fetch' later. > This also opens the door for some repos getting special treatment: > > git clone --submodule-branch-file=foo.txt -b dev example > > foo.txt: > curl=stable-1.2.3 > nlohmann=v2.28 > example-*=dev > *=master If we write it all as config, I think things may get simpler. IIRC, there is already submodule.*.foo config in .git/config (that can mirror and override what's in .gitmodules). So if we had some config option for "clone this branch for the submodule instead of HEAD", then that means you can do: git clone -c submodule.foo.clonehead=devel ... and the result would be used by the submodule code, but also saved for future invocations. Likewise, if there's no "clonehead" config for a particular submodule, if we fall back to submodule.defaultclonehead, then you could do: git clone -c submodule.defaultclonehead=devel ... and it would also be saved as the default for future submodules. And all without having to invent a new submodule-branch-file format. The name "clonehead" isn't great. I'm not sure if this ought to be submodule.*.branch (since I don't quite know what that's used for). I think you'll have to explore that a bit. > I think that also tends to match the glob-expansion configs we use for > other things. One thing sticking out to me about the idea of providing > --submodule-branch is that you need to know what's in the repo before > you clone it the first time, which being able to use globbing like this > kind of helps with. But then, I suppose if you don't know what you're > looking for, you're not also looking for a very precise filter on your > clone ;) Yeah; the scheme I outlined above only allow specifying the value for one submodule, or the fallback default. It wouldn't allow arbitrary globbing. But I also suspect nobody wants that. If you know what the submodules are, then you can set up config for each. If you don't, then "everything" is the only glob that makes sense. -Peff
On Mon, Jan 27, 2020 at 08:31:39PM -0500, Jeff King wrote: > On Mon, Jan 27, 2020 at 05:08:41PM -0800, Emily Shaffer wrote: > > > > Yeah, I do still think that it makes sense for clone to pass along > > > --single-branch, regardless, and then deal with branch selection problem > > > separately on top. > > > > Sure; I've got that ready to send shortly. It seems to grab HEAD of the > > remote for each submodule, and then checkout the specific commit ID the > > superproject wants - in my test case, that commit ID was a direct > > ancestor of 'master', so the single branch only got 'master'. I'm not > > sure how it would work with a commit ID which doesn't exist in the > > single branch that was fetched; I'll write a test and have a look. > > Yeah, it's definitely worth exploring how that works. I thought we had > some kind of fallback for when we didn't manage to fetch the object. But > maybe I am confusing it with the fallback for "we tried to fetch this > specific object, but the other side doesn't allow that, so we grabbed a > branch instead". Ok, so I gave it a try. Some well-trimmed trace output: 1) git clone --recurse-submodules --single-branch <url> (the branch in question is remote's HEAD) - Normal clone of superproject - git submodule--helper update-clone --progress --require-init --single-branch -- - ultimately... - git clone --no-checkout --progress --separate-git-dir '/.../super_clone/.git/modules/sub' --single-branch -- '/path/to/submodule/source' '/path/to/submodule/destination' - git checkout -f -q <ID of submodule's HEAD> 2) git clone --recurse-submodules --single-branch --branch other <url> 'other' points to a commit of 'sub' which is not an ancestor of 'sub''s current HEAD. - Normal clone of superproject identical to 1) - git submodule--helper update-clone --progress --require-init --single-branch -- - ultimately... - git clone --no-checkout --progress --separate-git-dir '/.../super_clone/.git/modules/sub' --single-branch -- '/path/to/submodule/source' '/path/to/submodule/destination' - git fetch origin <ID of submodule's other branch> - git checkout -f -q <ID of submodule's other branch> So, somewhere in the submodule machinery, it looks like we check if we have the commit in question, and if not, we do another fetch. So in this case, we reach to the server twice per submodule. On the bright side, it doesn't fall over; on the dim side, I'd think we could ask for this ref up front along with whatever branch HEAD is. I thought there was a way we could tell the server we want 'master' as well as '58c34ed'? > > > > So for the simple case, you probably do want to be able to say "use this > > > branch for cloning all submodules". > > > > I think it still makes sense to call this out explicitly, yes? Or do you > > think that should just be the default? > > Yes, I think it should be a separate option from "--branch". > > > This made me think - I wonder if it makes sense to take > > --submodule-branch as a wildcarded spec instead. So in your case, I > > could say, > > > > git clone --submodule-branch *=devel -b devel superproject > > > > And then I don't need to do anything differently for 'git fetch' later. > > This also opens the door for some repos getting special treatment: > > > > git clone --submodule-branch-file=foo.txt -b dev example > > > > foo.txt: > > curl=stable-1.2.3 > > nlohmann=v2.28 > > example-*=dev > > *=master > > If we write it all as config, I think things may get simpler. IIRC, > there is already submodule.*.foo config in .git/config (that can mirror > and override what's in .gitmodules). Hm. But at clone time, there is no .git/config yet, which is why I proposed a file passed in at the command line. Although it does seem to make sense to write down those preferences in the .git/config after. I guess you could pass in configs at the command line, though, and then you don't have to massage it to write your config after fetch. > So if we had some config option for "clone this branch for the submodule > instead of HEAD", then that means you can do: > > git clone -c submodule.foo.clonehead=devel ... > > and the result would be used by the submodule code, but also saved for > future invocations. Likewise, if there's no "clonehead" config for a > particular submodule, if we fall back to submodule.defaultclonehead, > then you could do: > > git clone -c submodule.defaultclonehead=devel ... > > and it would also be saved as the default for future submodules. And > all without having to invent a new submodule-branch-file format. > > The name "clonehead" isn't great. Au contraire - it might be my new go-to insult. ;) > I'm not sure if this ought to be submodule.*.branch (since I don't > quite know what that's used for). I think you'll have to explore that > a bit. > > > I think that also tends to match the glob-expansion configs we use for > > other things. One thing sticking out to me about the idea of providing > > --submodule-branch is that you need to know what's in the repo before > > you clone it the first time, which being able to use globbing like this > > kind of helps with. But then, I suppose if you don't know what you're > > looking for, you're not also looking for a very precise filter on your > > clone ;) > > Yeah; the scheme I outlined above only allow specifying the value for > one submodule, or the fallback default. It wouldn't allow arbitrary > globbing. But I also suspect nobody wants that. If you know what the > submodules are, then you can set up config for each. If you don't, then > "everything" is the only glob that makes sense. Yeah, I suspect you're right and this fancy globbing falls under YAGNI. - Emily
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt index 22425cbc76..8c516a9670 100644 --- a/Documentation/git-submodule.txt +++ b/Documentation/git-submodule.txt @@ -133,7 +133,7 @@ If you really want to remove a submodule from the repository and commit that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal options. -update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--] [<path>...]:: +update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--single-branch] [-b|--branch <name>] [--] [<path>...]:: + -- Update the registered submodules to match what the superproject @@ -430,6 +430,10 @@ options carefully. Clone new submodules in parallel with as many jobs. Defaults to the `submodule.fetchJobs` option. +--single-branch:: + This option is only valid for the update command. + Clone only one branch during update, HEAD or --branch. + <path>...:: Paths to submodule(s). When specified this will restrict the command to only operate on the submodules found at the specified paths. diff --git a/builtin/clone.c b/builtin/clone.c index 6dee265cc9..293cef8b30 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -808,6 +808,12 @@ static int checkout(int submodule_progress) argv_array_push(&args, "--no-fetch"); } + if (option_single_branch) + argv_array_push(&args, "--single-branch"); + + if (option_branch) + argv_array_pushf(&args, "--branch=%s", option_branch); + err = run_command_v_opt(args.argv, RUN_GIT_CMD); argv_array_clear(&args); } diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index c72931ecd7..92bd823d38 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -1225,7 +1225,8 @@ static int module_deinit(int argc, const char **argv, const char *prefix) static int clone_submodule(const char *path, const char *gitdir, const char *url, const char *depth, struct string_list *reference, int dissociate, - int quiet, int progress) + int quiet, int progress, int single_branch, + const char *branch) { struct child_process cp = CHILD_PROCESS_INIT; @@ -1247,6 +1248,10 @@ static int clone_submodule(const char *path, const char *gitdir, const char *url argv_array_push(&cp.args, "--dissociate"); if (gitdir && *gitdir) argv_array_pushl(&cp.args, "--separate-git-dir", gitdir, NULL); + if (single_branch) + argv_array_push(&cp.args, "--single-branch"); + if (branch) + argv_array_pushl(&cp.args, "--branch", branch, NULL); argv_array_push(&cp.args, "--"); argv_array_push(&cp.args, url); @@ -1373,6 +1378,8 @@ static int module_clone(int argc, const char **argv, const char *prefix) struct string_list reference = STRING_LIST_INIT_NODUP; int dissociate = 0, require_init = 0; char *sm_alternate = NULL, *error_strategy = NULL; + int single_branch = 0; + char *branch = NULL; struct option module_clone_options[] = { OPT_STRING(0, "prefix", &prefix, @@ -1400,12 +1407,17 @@ static int module_clone(int argc, const char **argv, const char *prefix) N_("force cloning progress")), OPT_BOOL(0, "require-init", &require_init, N_("disallow cloning into non-empty directory")), + OPT_BOOL(0, "single-branch", &single_branch, + N_("clone only one branch, HEAD or --branch")), + OPT_STRING('b', "branch", &branch, "<branch>", + N_("checkout <branch> instead of the remote's HEAD")), OPT_END() }; const char *const git_submodule_helper_usage[] = { N_("git submodule--helper clone [--prefix=<path>] [--quiet] " "[--reference <repository>] [--name <name>] [--depth <depth>] " + "[--single-branch] [-b | --branch <name>] " "--url <url> --path <path>"), NULL }; @@ -1438,7 +1450,7 @@ static int module_clone(int argc, const char **argv, const char *prefix) prepare_possible_alternates(name, &reference); if (clone_submodule(path, sm_gitdir, url, depth, &reference, dissociate, - quiet, progress)) + quiet, progress, single_branch, branch)) die(_("clone of '%s' into submodule path '%s' failed"), url, path); } else { @@ -1562,6 +1574,8 @@ struct submodule_update_clone { const char *depth; const char *recursive_prefix; const char *prefix; + int single_branch; + const char *branch; /* to be consumed by git-submodule.sh */ struct update_clone_data *update_clone; @@ -1578,7 +1592,7 @@ struct submodule_update_clone { }; #define SUBMODULE_UPDATE_CLONE_INIT {0, MODULE_LIST_INIT, 0, \ SUBMODULE_UPDATE_STRATEGY_INIT, 0, 0, -1, STRING_LIST_INIT_DUP, 0, 0, \ - NULL, NULL, NULL, \ + NULL, NULL, NULL, 0, NULL,\ NULL, 0, 0, 0, NULL, 0, 0, 1} @@ -1718,6 +1732,10 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce, argv_array_push(&child->args, "--dissociate"); if (suc->depth) argv_array_push(&child->args, suc->depth); + if (suc->single_branch) + argv_array_push(&child->args, "--single-branch"); + if (suc->branch) + argv_array_pushl(&child->args, "--branch", suc->branch, NULL); cleanup: strbuf_reset(&displaypath_sb); @@ -1897,6 +1915,10 @@ static int update_clone(int argc, const char **argv, const char *prefix) N_("force cloning progress")), OPT_BOOL(0, "require-init", &suc.require_init, N_("disallow cloning into non-empty directory")), + OPT_BOOL(0, "single-branch", &suc.single_branch, + N_("clone only one branch, HEAD or --branch")), + OPT_STRING('b', "branch", &suc.branch, "<branch>", + N_("checkout <branch> instead of the remote's HEAD")), OPT_END() }; diff --git a/git-submodule.sh b/git-submodule.sh index aaa1809d24..c2eadbb930 100755 --- a/git-submodule.sh +++ b/git-submodule.sh @@ -10,7 +10,7 @@ USAGE="[--quiet] [--cached] or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...] or: $dashless [--quiet] init [--] [<path>...] or: $dashless [--quiet] deinit [-f|--force] (--all| [--] <path>...) - or: $dashless [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--] [<path>...] + or: $dashless [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--single-branch] [-b|--branch <name>] [--] [<path>...] or: $dashless [--quiet] set-branch (--default|--branch <branch>) [--] <path> or: $dashless [--quiet] set-url [--] <path> <newurl> or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...] @@ -47,6 +47,8 @@ custom_name= depth= progress= dissociate= +single_branch= +sm_branch= die_if_unmatched () { @@ -526,6 +528,17 @@ cmd_update() --jobs=*) jobs=$1 ;; + --single-branch) + single_branch=1 + ;; + -b|--branch) + case "$2" in '') usage ;; esac + sm_branch="--branch=$2" + shift + ;; + -b=*|--branch=*) + sm_branch=$1 + ;; --) shift break @@ -555,6 +568,8 @@ cmd_update() ${dissociate:+"--dissociate"} \ ${depth:+--depth "$depth"} \ ${require_init:+--require-init} \ + ${single_branch:+--single-branch} \ + ${sm_branch:+"$sm_branch"} \ $recommend_shallow \ $jobs \ -- \ diff --git a/t/t5617-clone-submodules-remote.sh b/t/t5617-clone-submodules-remote.sh index 37fcce9c40..390645897d 100755 --- a/t/t5617-clone-submodules-remote.sh +++ b/t/t5617-clone-submodules-remote.sh @@ -14,14 +14,16 @@ test_expect_success 'setup' ' cd sub && git init && test_commit subcommit1 && - git tag sub_when_added_to_super + git tag sub_when_added_to_super && + git branch other ) && git submodule add "file://$pwd/sub" sub && git commit -m "add submodule" && ( cd sub && test_commit subcommit2 - ) + ) && + git branch other ' test_expect_success 'clone with --no-remote-submodules' ' @@ -51,4 +53,24 @@ test_expect_success 'check the default is --no-remote-submodules' ' ) ' +test_expect_success 'clone with --single-branch' ' + test_when_finished "rm -rf super_clone" && + git clone --recurse-submodules --single-branch "file://$pwd/." super_clone && + ( + cd super_clone/sub && + git branch -a >branches && + test_must_fail grep other branches + ) +' + +test_expect_success 'clone with --single-branch and --branch' ' + test_when_finished "rm -rf super_clone" && + git clone --recurse-submodules --single-branch --branch other "file://$pwd/." super_clone && + ( + cd super_clone/sub && + git branch -a >branches && + test_must_fail grep master branches + ) +' + test_done
Previously, performing "git clone --recurse-submodules --single-branch" resulted in submodules cloning all branches even though the superproject cloned only one branch. Pipe --single-branch and its friend, --branch, through the submodule helper framework to make it to 'clone' later on. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> --- Note that 'branch' was already in use in git-submodules.sh, so "submodule branch" aka 'sm_branch' was used to disambiguate the two. Documentation/git-submodule.txt | 6 +++++- builtin/clone.c | 6 ++++++ builtin/submodule--helper.c | 28 +++++++++++++++++++++++++--- git-submodule.sh | 17 ++++++++++++++++- t/t5617-clone-submodules-remote.sh | 26 ++++++++++++++++++++++++-- 5 files changed, 76 insertions(+), 7 deletions(-)