Message ID | 2dbcd5419073f06def007be3746ce90fffaf6a6d.1538108385.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Filter alternate references | expand |
On Thu, Sep 27, 2018 at 09:25:42PM -0700, Taylor Blau wrote: > Let the repository that has alternates configure this command to avoid > trusting the alternate to provide us a safe command to run in the shell. > To behave differently on each alternate (e.g., only list tags from > alternate A, only heads from B) provide the path of the alternate as the > first argument. Well, you also need to pass the path so it knows which repo to look at. Which I think is the primary reason we do it, but behaving differently for each alternate is another option. > +core.alternateRefsCommand:: > + When advertising tips of available history from an alternate, use the shell to > + execute the specified command instead of linkgit:git-for-each-ref[1]. The > + first argument is the absolute path of the alternate. Output must be of the > + form: `%(objectname)`, where multiple tips are separated by newlines. I wonder if people may be confused about the %(objectname) syntax, since it's specific to for-each-ref. Now that we've simplified the output format to a single value, perhaps we should define it more directly. E.g., like: The output should contain one hex object id per line (i.e., the same as produced by `git for-each-ref --format='%(objectname)'`). Now that we've dropped the refname requirement from the output, it is more clear that this really does not have to be about refs at all. In the most technical sense, what we really allow in the output is any object id X for which the alternate promises it has all objects reachable from X. Ref tips are a convenient and efficient way of providing that, but they are not the only possibility (and likewise, it is fine to omit duplicates or even tips that are ancestors of other tips). I think that's probably getting _too_ technical, though. It probably makes sense to just keep thinking of these as "what are the ref tips". > +This is useful when a repository only wishes to advertise some of its > +alternate's references as ".have"'s. For example, to only advertise branch Maybe put ".have" into backticks for formatting? > +heads, configure `core.alternateRefsCommand` to the path of a script which runs > +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`. Does that script actually work? Because of the way we invoke shell commands with arguments, I think we'd end up with: git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads "$@" Possibly for-each-ref would ignore the extra path argument (thinking it's a ref pattern that just doesn't match), but it's definitely not what you intended. You'd have to write: f() { git --git-dir=$1 ...etc; } f in the usual way. That's a minor pain, but it's what makes the more direct: /my/script work. The other alternative is to pass $GIT_DIR in the environment on behalf of the program. Then writing: git for-each-ref --format='%(objectname)' refs/heads would Just Work. But it's a bit subtle, since it is not immediately obvious that the command is meant to run in a different repository. > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh > new file mode 100755 > index 0000000000..503dde35a4 > --- /dev/null > +++ b/t/t5410-receive-pack.sh > @@ -0,0 +1,49 @@ > +#!/bin/sh > + > +test_description='git receive-pack test' The name of this test file and the description are pretty vague. Can we say something like "test handling of receive-pack with alternate-refs config"? > +test_expect_success 'setup' ' > + test_commit one && > + git update-ref refs/heads/a HEAD && > + test_commit two && > + git update-ref refs/heads/b HEAD && > + test_commit three && > + git update-ref refs/heads/c HEAD && > + git clone --bare . fork && > + git clone fork pusher && > + ( > + cd fork && > + git update-ref --stdin <<-\EOF && > + delete refs/heads/a > + delete refs/heads/b > + delete refs/heads/c > + delete refs/heads/master > + delete refs/tags/one > + delete refs/tags/two > + delete refs/tags/three > + EOF > + echo "../../.git/objects" >objects/info/alternates > + ) > +' This setup is kind of convoluted. You're deleting those refs in the fork, I think, because we don't want them to suppress the duplicate .have lines from the alternate. Might it be easier to just create the .have lines we're interested in after the fact? I think we can also use "clone -s" to make the setup of the alternate a little simpler. I don't see the "pusher" repo being used for anything here. Leftover cruft from when you were using "git push" to test? So all together, perhaps something like: # we have a fork which points back to us as an alternate test_commit base && git clone -s . fork && # the alternate has two refs with new tips, in two separate hierarchies git checkout -b public/branch master && test_commit public && git checkout -b private/branch master && test_commit private And then... > +test_expect_success 'with core.alternateRefsCommand' ' > + write_script fork/alternate-refs <<-\EOF && > + git --git-dir="$1" for-each-ref \ > + --format="%(objectname)" \ > + refs/heads/a \ > + refs/heads/c > + EOF ...this can just look for refs/heads/public/, and... > + test_config -C fork core.alternateRefsCommand alternate-refs && > + git rev-parse a c >expect && ...we verify that we saw public/branch but not private/branch. It's not that much shorter, but I had trouble understanding from the setup why we needed to delete all those refs (and why we cared about those tags in the first place). > diff --git a/transport.c b/transport.c > index 2825debac5..e271b66603 100644 > --- a/transport.c > +++ b/transport.c > @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url) > static void fill_alternate_refs_command(struct child_process *cmd, > const char *repo_path) The code change itself looks good to me. -Peff
On Fri, Sep 28, 2018 at 01:26:13AM -0400, Jeff King wrote: > On Thu, Sep 27, 2018 at 09:25:42PM -0700, Taylor Blau wrote: > > > Let the repository that has alternates configure this command to avoid > > trusting the alternate to provide us a safe command to run in the shell. > > To behave differently on each alternate (e.g., only list tags from > > alternate A, only heads from B) provide the path of the alternate as the > > first argument. > > Well, you also need to pass the path so it knows which repo to look at. > Which I think is the primary reason we do it, but behaving differently > for each alternate is another option. Yeah. I think that the clearer argument is yours, so I'll amend my copy. I am thinking of: To find the alternate, pass its absolute path as the first argument. How does that sound? > > +core.alternateRefsCommand:: > > + When advertising tips of available history from an alternate, use the shell to > > + execute the specified command instead of linkgit:git-for-each-ref[1]. The > > + first argument is the absolute path of the alternate. Output must be of the > > + form: `%(objectname)`, where multiple tips are separated by newlines. > > I wonder if people may be confused about the %(objectname) syntax, since > it's specific to for-each-ref. Now that we've simplified the output > format to a single value, perhaps we should define it more directly. > E.g., like: > > The output should contain one hex object id per line (i.e., the same > as produced by `git for-each-ref --format='%(objectname)'`). I think that that's clearer, thanks. I applied it pretty much as you suggested, but changed 'should' to 'must' and dropped the leading 'the'. > Now that we've dropped the refname requirement from the output, it is > more clear that this really does not have to be about refs at all. In > the most technical sense, what we really allow in the output is any > object id X for which the alternate promises it has all objects > reachable from X. Ref tips are a convenient and efficient way of > providing that, but they are not the only possibility (and likewise, it > is fine to omit duplicates or even tips that are ancestors of other > tips). > > I think that's probably getting _too_ technical, though. It probably > makes sense to just keep thinking of these as "what are the ref tips". Yep, I agree completely. > > +This is useful when a repository only wishes to advertise some of its > > +alternate's references as ".have"'s. For example, to only advertise branch > > Maybe put ".have" into backticks for formatting? Good idea, thanks. I took this locally as suggested. > > +heads, configure `core.alternateRefsCommand` to the path of a script which runs > > +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`. > > Does that script actually work? Because of the way we invoke shell > commands with arguments, I think we'd end up with: > > git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads "$@" I think that you're right... > Possibly for-each-ref would ignore the extra path argument (thinking > it's a ref pattern that just doesn't match), but it's definitely not > what you intended. You'd have to write: > > f() { git --git-dir=$1 ...etc; } f > > in the usual way. That's a minor pain, but it's what makes the more > direct: > > /my/script > > work. ...but this was what I was trying to get across with saying "...to the path of a script which runs...", such that we would get the implicit scoping that you make explicit in your example with "f() { ... }; f". Does that seem OK as-is after the additional context? I think that after reading your response, it seems to be confusing, so perhaps it should be changed... > The other alternative is to pass $GIT_DIR in the environment on behalf > of the program. Then writing: > > git for-each-ref --format='%(objectname)' refs/heads > > would Just Work. But it's a bit subtle, since it is not immediately > obvious that the command is meant to run in a different repository. I think that we discussed this approach a bit off-list, and I had the idea that it was too fragile to work in practice, and that it would be too surprising for callers to suddenly be in a different world. I say this not because it wouldn't make this particular scenario more convenient, which it uncountably would, but because it would make other scenarios _more_ complicated. For example, if a caller uses an alternate reference backed, perhaps, MySQL (or anything that _isn't_ Git), they're not going to want to have these GIT_ environment variable set. So, I think that the greatest common denominator between the two is to pass the alternate's absolute path as the first argument. > > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh > > new file mode 100755 > > index 0000000000..503dde35a4 > > --- /dev/null > > +++ b/t/t5410-receive-pack.sh > > @@ -0,0 +1,49 @@ > > +#!/bin/sh > > + > > +test_description='git receive-pack test' > > The name of this test file and the description are pretty vague. Can we > say something like "test handling of receive-pack with alternate-refs > config"? I left it intentionally vague, since I'd like for it to contain more tests about 'git receive-pack'-specific things in the future. I'm happy to change the name, though I wonder if we should change the filename accordingly, and if so, to what. > > +test_expect_success 'setup' ' > > + test_commit one && > > + git update-ref refs/heads/a HEAD && > > + test_commit two && > > + git update-ref refs/heads/b HEAD && > > + test_commit three && > > + git update-ref refs/heads/c HEAD && > > + git clone --bare . fork && > > + git clone fork pusher && > > + ( > > + cd fork && > > + git update-ref --stdin <<-\EOF && > > + delete refs/heads/a > > + delete refs/heads/b > > + delete refs/heads/c > > + delete refs/heads/master > > + delete refs/tags/one > > + delete refs/tags/two > > + delete refs/tags/three > > + EOF > > + echo "../../.git/objects" >objects/info/alternates > > + ) > > +' > > This setup is kind of convoluted. You're deleting those refs in the > fork, I think, because we don't want them to suppress the duplicate > .have lines from the alternate. Might it be easier to just create the > .have lines we're interested in after the fact? > I think we can also use "clone -s" to make the setup of the alternate a > little simpler. > > I don't see the "pusher" repo being used for anything here. Leftover > cruft from when you were using "git push" to test? > > So all together, perhaps something like: > > # we have a fork which points back to us as an alternate > test_commit base && > git clone -s . fork && > > # the alternate has two refs with new tips, in two separate hierarchies > git checkout -b public/branch master && > test_commit public && > git checkout -b private/branch master && > test_commit private > > And then... > > > +test_expect_success 'with core.alternateRefsCommand' ' > > + write_script fork/alternate-refs <<-\EOF && > > + git --git-dir="$1" for-each-ref \ > > + --format="%(objectname)" \ > > + refs/heads/a \ > > + refs/heads/c > > + EOF > > ...this can just look for refs/heads/public/, and... > > > + test_config -C fork core.alternateRefsCommand alternate-refs && > > + git rev-parse a c >expect && > > ...we verify that we saw public/branch but not private/branch. > > It's not that much shorter, but I had trouble understanding from the > setup why we needed to delete all those refs (and why we cared about > those tags in the first place). I agree with all of this. It's certainly roughly the same length, but I think that it makes it much easier to grok, and it addresses a comment that Junio made in an earlier response to this thread. So, two wins for the price of one :-). I had to make a couple of other changes that you didn't recommend: - Since we used to create fork with 'git clone --bare', the path of `core.alternateRefsCommand` grew an extra `../`, since we have to also traverse _out_ of the .git directory in a non-bare repository. Instead of this, I opted for both, with 'git clone -s --bare . fork', which means we don't have to check out a working copy, and we can avoid changing the line mentioned above. - Another thing that I had to decide on was what to give as a prefix for the test exercising 'core.alternateRefsPrefixes', which I decided to use 'refs/heads/private' for, which makes sure that we're seeing something different than 'core.alternateRefsCommand'. The diff is kind of long (so I'm avoiding sending it here), but I think that it's mostly self-explanatory from what you recommended to me and what I said above. > > diff --git a/transport.c b/transport.c > > index 2825debac5..e271b66603 100644 > > --- a/transport.c > > +++ b/transport.c > > @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url) > > static void fill_alternate_refs_command(struct child_process *cmd, > > const char *repo_path) > > The code change itself looks good to me. Thanks for your review, as always. I'll wait until Monday to re-roll, just to make sure that there isn't any new feedback between now and then. Thanks, Taylor
On Fri, Sep 28, 2018 at 03:04:10PM -0700, Taylor Blau wrote: > > Well, you also need to pass the path so it knows which repo to look at. > > Which I think is the primary reason we do it, but behaving differently > > for each alternate is another option. > > Yeah. I think that the clearer argument is yours, so I'll amend my copy. > I am thinking of: > > To find the alternate, pass its absolute path as the first argument. > > How does that sound? Sounds good. > > > +heads, configure `core.alternateRefsCommand` to the path of a script which runs > > > +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`. > > > > Does that script actually work? Because of the way we invoke shell > > commands with arguments, I think we'd end up with: > > > > git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads "$@" > [...] > ...but this was what I was trying to get across with saying "...to the > path of a script which runs...", such that we would get the implicit > scoping that you make explicit in your example with "f() { ... }; f". > > Does that seem OK as-is after the additional context? I think that after > reading your response, it seems to be confusing, so perhaps it should be > changed... Ah, OK. I totally missed that "path of a script" part. What you have is correct, then, but I do wonder if we could make it less subtle. Maybe something like: For example, if `/path/to/script` runs `git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads/`, then putting `/path/to/script` in `core.alternateRefsCommand` will show only the branch heads from the alternate. I dunno. It's certainly clunkier. I wonder if we would be less awkward to show the sample script in a fenced block, with the `#!/bin/sh` and everything. Or maybe just keep the text you have and add a note at the end like: Note that writing that `for-each-ref` command directly in the config option doesn't quite work, as it has to handle the path argument specially. I don't think we need to hand-hold a user through the f() shell-snippet trickery. I just don't want somebody thinking they can blindly paste that into their config. > > The other alternative is to pass $GIT_DIR in the environment on behalf > > of the program. Then writing: > > > > git for-each-ref --format='%(objectname)' refs/heads > > > > would Just Work. But it's a bit subtle, since it is not immediately > > obvious that the command is meant to run in a different repository. > > I think that we discussed this approach a bit off-list, and I had the > idea that it was too fragile to work in practice, and that it would be > too surprising for callers to suddenly be in a different world. > > I say this not because it wouldn't make this particular scenario more > convenient, which it uncountably would, but because it would make other > scenarios _more_ complicated. > > For example, if a caller uses an alternate reference backed, perhaps, > MySQL (or anything that _isn't_ Git), they're not going to want to have > these GIT_ environment variable set. If they're not using Git under the hood, then GIT_* probably isn't hurting anything. But it is still pretty subtle. Let's forget I mentioned it. Just chaining for-each-ref with a prefix is pretty awkward, but that's why we have the next patch with alternateRefsPrefixes. Your response did make me think of one other thing, though. The alternate file points to a directory with objects, and the for_each_alternate_ref() code checks to see if that looks vaguely like the objects/ directory of a git repo. But would anybody want to run something like alternateRefsCommand on _just_ the object directory? I.e., you don't have a real git repo there, but your script can "somehow" come up with a list of valid tips. That isn't inconceivable to me for the kind of multi-fork storage we do at GitHub. E.g., imagine a shared object directory with no refs, and then a script that goes out to the other related forks to look at their ref tips. I don't think we have any immediate plans for it, though (and there are a lot of subtle bits that I won't go into here that make it non-trivial). So I'm OK to punt on it for now. I also think in a pinch that you could easily fool the alternates code by just having a dummy "refs/" directory. > > > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh > [...] > > > +test_description='git receive-pack test' > > > > The name of this test file and the description are pretty vague. Can we > > say something like "test handling of receive-pack with alternate-refs > > config"? > > I left it intentionally vague, since I'd like for it to contain more > tests about 'git receive-pack'-specific things in the future. > > I'm happy to change the name, though I wonder if we should change the > filename accordingly, and if so, to what. I think we'd want to have a separate script for other receive-pack tests that aren't related to alternates. There's some startup overhead to each script so we don't want to make them _too_ small, but there are benefits to having small test scripts: - they're our unit of parallelism, so we want to be able to keep a reasonable number of processors full - each test script starts with a clean slate, so there's less chance for unexpected interactions between individual tests (e.g., when modifying or adding a test in the middle of the script) - it's less annoying when you're debugging a failing test near the end of a script ;) I actually think we'd benefit from splitting up a few of the longer scripts. On my quad-core laptop, running the tests in slow-to-fast order keeps the processors pretty busy, and the slowest test takes less time than the whole suite. But I've also tried running on a 40-core box. It burns through the short tests quickly, but you can never get faster than the slowest single test, which takes something like 35 seconds. So instead of being 10 times faster, it's more like two times faster, as most of the processors idle waiting for that one script to finish. But that's all pretty tangential here. My point is just that this probably ought to be remain its own script. :) I'd probably name it "t5410-receive-pack-alternates" or similar. > I'll wait until Monday to re-roll, just to make sure that there isn't > any new feedback between now and then. Sounds good. Thanks for working on this. -Peff
On Sat, Sep 29, 2018 at 03:31:38AM -0400, Jeff King wrote: > On Fri, Sep 28, 2018 at 03:04:10PM -0700, Taylor Blau wrote: > > > > Well, you also need to pass the path so it knows which repo to look at. > > > Which I think is the primary reason we do it, but behaving differently > > > for each alternate is another option. > > > > Yeah. I think that the clearer argument is yours, so I'll amend my copy. > > I am thinking of: > > > > To find the alternate, pass its absolute path as the first argument. > > > > How does that sound? > > Sounds good. > > > > > +heads, configure `core.alternateRefsCommand` to the path of a script which runs > > > > +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`. > > > > > > Does that script actually work? Because of the way we invoke shell > > > commands with arguments, I think we'd end up with: > > > > > > git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads "$@" > > [...] > > ...but this was what I was trying to get across with saying "...to the > > path of a script which runs...", such that we would get the implicit > > scoping that you make explicit in your example with "f() { ... }; f". > > > > Does that seem OK as-is after the additional context? I think that after > > reading your response, it seems to be confusing, so perhaps it should be > > changed... > > Ah, OK. I totally missed that "path of a script" part. What you have is > correct, then, but I do wonder if we could make it less subtle. > > Maybe something like: > > For example, if `/path/to/script` runs `git --git-dir="$1" > for-each-ref --format='%(objectname)' refs/heads/`, then putting > `/path/to/script` in `core.alternateRefsCommand` will show only the > branch heads from the alternate. > > I dunno. It's certainly clunkier. I wonder if we would be less awkward > to show the sample script in a fenced block, with the `#!/bin/sh` and > everything. > > Or maybe just keep the text you have and add a note at the end like: > > Note that writing that `for-each-ref` command directly in the config > option doesn't quite work, as it has to handle the path argument > specially. > > I don't think we need to hand-hold a user through the f() shell-snippet > trickery. I just don't want somebody thinking they can blindly paste > that into their config. Yeah, I agree with your later suggestion, and I'm glad that we're on the same page. I certianly don't think that we need to do an extra amount of hand holding through the 'f() { ... }; f' pattern, but I added an extra bit to say that 'git for-each-ref' by itself doesn't work, since you have to handle the path argument. > > > The other alternative is to pass $GIT_DIR in the environment on behalf > > > of the program. Then writing: > > > > > > git for-each-ref --format='%(objectname)' refs/heads > > > > > > would Just Work. But it's a bit subtle, since it is not immediately > > > obvious that the command is meant to run in a different repository. > > > > I think that we discussed this approach a bit off-list, and I had the > > idea that it was too fragile to work in practice, and that it would be > > too surprising for callers to suddenly be in a different world. > > > > I say this not because it wouldn't make this particular scenario more > > convenient, which it uncountably would, but because it would make other > > scenarios _more_ complicated. > > > > For example, if a caller uses an alternate reference backed, perhaps, > > MySQL (or anything that _isn't_ Git), they're not going to want to have > > these GIT_ environment variable set. > > If they're not using Git under the hood, then GIT_* probably isn't > hurting anything. But it is still pretty subtle. Let's forget I > mentioned it. Just chaining for-each-ref with a prefix is pretty > awkward, but that's why we have the next patch with > alternateRefsPrefixes. > > Your response did make me think of one other thing, though. The > alternate file points to a directory with objects, and the > for_each_alternate_ref() code checks to see if that looks vaguely like > the objects/ directory of a git repo. But would anybody want to run > something like alternateRefsCommand on _just_ the object directory? > I.e., you don't have a real git repo there, but your script can > "somehow" come up with a list of valid tips. > > That isn't inconceivable to me for the kind of multi-fork storage we do > at GitHub. E.g., imagine a shared object directory with no refs, and > then a script that goes out to the other related forks to look at their > ref tips. I don't think we have any immediate plans for it, though (and > there are a lot of subtle bits that I won't go into here that make it > non-trivial). So I'm OK to punt on it for now. I also think in a pinch > that you could easily fool the alternates code by just having a dummy > "refs/" directory. I'm not opposed to the idea in general, and I think that it's a good one, but I am opposed to it in this series. I think that the series as-is is concise, and unlocks a path towards implementing this feature at GitHub, and for other users, too. Certainly we can invent more complicated examples, and I think that many of them (yours included) are worth building the extra support for. But in this initial version, I think that we'd be fine to leave it off. > > > > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh > > [...] > > > > +test_description='git receive-pack test' > > > > > > The name of this test file and the description are pretty vague. Can we > > > say something like "test handling of receive-pack with alternate-refs > > > config"? > > > > I left it intentionally vague, since I'd like for it to contain more > > tests about 'git receive-pack'-specific things in the future. > > > > I'm happy to change the name, though I wonder if we should change the > > filename accordingly, and if so, to what. > > I think we'd want to have a separate script for other receive-pack tests > that aren't related to alternates. There's some startup overhead to each > script so we don't want to make them _too_ small, but there are benefits > to having small test scripts: > > - they're our unit of parallelism, so we want to be able to keep a > reasonable number of processors full > > - each test script starts with a clean slate, so there's less chance > for unexpected interactions between individual tests (e.g., when > modifying or adding a test in the middle of the script) > > - it's less annoying when you're debugging a failing test near the end > of a script ;) All good points, so I'm convinced ;-). > I actually think we'd benefit from splitting up a few of the longer > scripts. On my quad-core laptop, running the tests in slow-to-fast order > keeps the processors pretty busy, and the slowest test takes less time > than the whole suite. But I've also tried running on a 40-core box. It > burns through the short tests quickly, but you can never get faster than > the slowest single test, which takes something like 35 seconds. So > instead of being 10 times faster, it's more like two times faster, as > most of the processors idle waiting for that one script to finish. > > But that's all pretty tangential here. My point is just that this > probably ought to be remain its own script. :) > > I'd probably name it "t5410-receive-pack-alternates" or similar. Sounds good, I'll do that and update the name of the test to be 'receive-pack with alternate ref filtering'. > > I'll wait until Monday to re-roll, just to make sure that there isn't > > any new feedback between now and then. > > Sounds good. Thanks for working on this. It's been my pleasure. Thanks for all of your help. Thanks, Taylor
diff --git a/Documentation/config.txt b/Documentation/config.txt index ad0f4510c3..afcb18331a 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -616,6 +616,17 @@ core.preferSymlinkRefs:: This is sometimes needed to work with old scripts that expect HEAD to be a symbolic link. +core.alternateRefsCommand:: + When advertising tips of available history from an alternate, use the shell to + execute the specified command instead of linkgit:git-for-each-ref[1]. The + first argument is the absolute path of the alternate. Output must be of the + form: `%(objectname)`, where multiple tips are separated by newlines. ++ +This is useful when a repository only wishes to advertise some of its +alternate's references as ".have"'s. For example, to only advertise branch +heads, configure `core.alternateRefsCommand` to the path of a script which runs +`git --git-dir="$1" for-each-ref --format='%(objectname)' refs/heads`. + core.bare:: If true this repository is assumed to be 'bare' and has no working directory associated with it. If this is the case a diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh new file mode 100755 index 0000000000..503dde35a4 --- /dev/null +++ b/t/t5410-receive-pack.sh @@ -0,0 +1,49 @@ +#!/bin/sh + +test_description='git receive-pack test' + +. ./test-lib.sh + +test_expect_success 'setup' ' + test_commit one && + git update-ref refs/heads/a HEAD && + test_commit two && + git update-ref refs/heads/b HEAD && + test_commit three && + git update-ref refs/heads/c HEAD && + git clone --bare . fork && + git clone fork pusher && + ( + cd fork && + git update-ref --stdin <<-\EOF && + delete refs/heads/a + delete refs/heads/b + delete refs/heads/c + delete refs/heads/master + delete refs/tags/one + delete refs/tags/two + delete refs/tags/three + EOF + echo "../../.git/objects" >objects/info/alternates + ) +' + +extract_haves () { + depacketize | perl -lne '/^(\S+) \.have/ and print $1' +} + +test_expect_success 'with core.alternateRefsCommand' ' + write_script fork/alternate-refs <<-\EOF && + git --git-dir="$1" for-each-ref \ + --format="%(objectname)" \ + refs/heads/a \ + refs/heads/c + EOF + test_config -C fork core.alternateRefsCommand alternate-refs && + git rev-parse a c >expect && + printf "0000" | git receive-pack fork >actual && + extract_haves <actual >actual.haves && + test_cmp expect actual.haves +' + +test_done diff --git a/transport.c b/transport.c index 2825debac5..e271b66603 100644 --- a/transport.c +++ b/transport.c @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url) static void fill_alternate_refs_command(struct child_process *cmd, const char *repo_path) { - cmd->git_cmd = 1; - argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); - argv_array_push(&cmd->args, "for-each-ref"); - argv_array_push(&cmd->args, "--format=%(objectname)"); + const char *value; + + if (!git_config_get_value("core.alternateRefsCommand", &value)) { + cmd->use_shell = 1; + + argv_array_push(&cmd->args, value); + argv_array_push(&cmd->args, repo_path); + } else { + cmd->git_cmd = 1; + + argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); + argv_array_push(&cmd->args, "for-each-ref"); + argv_array_push(&cmd->args, "--format=%(objectname)"); + } + cmd->env = local_repo_env; cmd->out = -1; }
When in a repository containing one or more alternates, Git would sometimes like to list references from those alternates. For example, 'git receive-pack' lists the "tips" pointed to by references in those alternates as special ".have" references. Listing ".have" references is designed to make pushing changes from upstream to a fork a lightweight operation, by advertising to the pusher that the fork already has the objects (via its alternate). Thus, the client can avoid sending them. However, when the alternate (upstream, in the previous example) has a pathologically large number of references, the initial advertisement is too expensive. In fact, it can dominate any such optimization where the pusher avoids sending certain objects. Introduce "core.alternateRefsCommand" in order to provide a facility to limit or filter alternate references. This can be used, for example, to filter out references the alternate does not wish to send (for space concerns, or otherwise) during the initial advertisement. Let the repository that has alternates configure this command to avoid trusting the alternate to provide us a safe command to run in the shell. To behave differently on each alternate (e.g., only list tags from alternate A, only heads from B) provide the path of the alternate as the first argument. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/config.txt | 11 +++++++++ t/t5410-receive-pack.sh | 49 ++++++++++++++++++++++++++++++++++++++++ transport.c | 19 ++++++++++++---- 3 files changed, 75 insertions(+), 4 deletions(-) create mode 100755 t/t5410-receive-pack.sh