Message ID | 4c4900722cab253b3ce33cb28910c4602ce44536.1537466087.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Filter alternate references | expand |
On Thu, Sep 20, 2018 at 02:04:11PM -0400, Taylor Blau wrote: > diff --git a/Documentation/config.txt b/Documentation/config.txt > index 112041f407..b908bc5825 100644 > --- a/Documentation/config.txt > +++ b/Documentation/config.txt > @@ -616,6 +616,12 @@ core.preferSymlinkRefs:: > This is sometimes needed to work with old scripts that > expect HEAD to be a symbolic link. > > +core.alternateRefsCommand:: > + When listing references from an alternate (e.g., in the case of ".have"), use > + the shell to execute the specified command instead of > + linkgit:git-for-each-ref[1]. The first argument is the path of the alternate. > + Output must be of the form: `%(objectname) SPC %(refname)`. We discussed off-list the notion that this could just be the objectname, since the ".have" mechanism doesn't care about the actual refnames. There's a little prior discussion from the list: https://public-inbox.org/git/xmqqefzraqbu.fsf@gitster.mtv.corp.google.com/ My "rev-list --alternate-refs" patches _do_ use the refnames, since you could do something like "--source" that cares about them. But there's some awkwardness there, because the names are in a different namespace than the rest of the refs. If we were to just say "nope, you do not get to see the names of the alternates" then that awkwardness goes away. But it also loses some information that could _possibly_ be of use to a caller. Back in that earlier discussion I did not have a strong opinion, but here we are cementing that decision into a user-visible interface. So it probably makes sense to revisit and decide once and for all. > +test_description='git receive-pack test' > + > +. ./test-lib.sh > + > +test_expect_success 'setup' ' > + test_commit one && > + git update-ref refs/heads/a HEAD && > + test_commit two && > + git update-ref refs/heads/b HEAD && > + test_commit three && > + git update-ref refs/heads/c HEAD && > + git clone --bare . fork && > + git clone fork pusher && > + ( > + cd fork && > + git config receive.advertisealternates true && > + git update-ref -d refs/heads/a && > + git update-ref -d refs/heads/b && > + git update-ref -d refs/heads/c && > + git update-ref -d refs/heads/master && > + git update-ref -d refs/tags/one && > + git update-ref -d refs/tags/two && > + git update-ref -d refs/tags/three && Probably not worth nit-picking process count, but this could done with a single "update-ref --stdin". > + printf "../../.git/objects" >objects/info/alternates Also a nitpick, but I think "echo" would be more usual here (we handle the lack of a trailing newline just fine, but any use of printf makes me wonder if something tricky is going on with line endings). > +test_expect_success 'with core.alternateRefsCommand' ' > + test_config -C fork core.alternateRefsCommand \ > + "git --git-dir=\"\$1\" for-each-ref \ > + --format=\"%(objectname) %(refname)\" \ > + refs/heads/a refs/heads/c;:" && This is cute and all, but might it be more readable to use write_script() to stick it into its own script? > + cat >expect <<-EOF && > + $(git rev-parse a) .have > + $(git rev-parse c) .have > + EOF > + printf "0000" | git receive-pack fork | extract_haves >actual && There's been a push lately to avoid having git on the left-hand side of a fork, since we might otherwise miss its exit code (including things like asan/valgrind errors). So maybe: ... receive-pack fork >actual && extract_haves <actual >actual.haves && test_cmp expect actual.haves or similar? > diff --git a/transport.c b/transport.c > index 24ae3f375d..e7d2cdf00b 100644 > --- a/transport.c > +++ b/transport.c > @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url) > static void fill_alternate_refs_command(struct child_process *cmd, > const char *repo_path) > { > - cmd->git_cmd = 1; > - argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); > - argv_array_push(&cmd->args, "for-each-ref"); > - argv_array_push(&cmd->args, "--format=%(objectname) %(refname)"); > + const char *value; > + > + if (!git_config_get_value("core.alternateRefsCommand", &value)) { > + cmd->use_shell = 1; > + > + argv_array_push(&cmd->args, value); > + argv_array_push(&cmd->args, repo_path); Setting use_shell allows the shell trickery in your test, and matches the modern way we run config-based commands. Good. > + } else { > + cmd->git_cmd = 1; > + > + argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); > + argv_array_push(&cmd->args, "for-each-ref"); > + argv_array_push(&cmd->args, "--format=%(objectname) %(refname)"); > + } > + > cmd->env = local_repo_env; > cmd->out = -1; And we still clear local_repo_env for the custom command, which is good to avoid confusion like $GIT_DIR being set when the custom command does "cd $1 && git ...". Good. -Peff
On Thu, Sep 20, 2018 at 03:37:51PM -0400, Jeff King wrote: > On Thu, Sep 20, 2018 at 02:04:11PM -0400, Taylor Blau wrote: > > > diff --git a/Documentation/config.txt b/Documentation/config.txt > > index 112041f407..b908bc5825 100644 > > --- a/Documentation/config.txt > > +++ b/Documentation/config.txt > > @@ -616,6 +616,12 @@ core.preferSymlinkRefs:: > > This is sometimes needed to work with old scripts that > > expect HEAD to be a symbolic link. > > > > +core.alternateRefsCommand:: > > + When listing references from an alternate (e.g., in the case of ".have"), use > > + the shell to execute the specified command instead of > > + linkgit:git-for-each-ref[1]. The first argument is the path of the alternate. > > + Output must be of the form: `%(objectname) SPC %(refname)`. > > We discussed off-list the notion that this could just be the objectname, > since the ".have" mechanism doesn't care about the actual refnames. > > There's a little prior discussion from the list: > > https://public-inbox.org/git/xmqqefzraqbu.fsf@gitster.mtv.corp.google.com/ > > My "rev-list --alternate-refs" patches _do_ use the refnames, since you > could do something like "--source" that cares about them. But there's > some awkwardness there, because the names are in a different namespace > than the rest of the refs. If we were to just say "nope, you do not get > to see the names of the alternates" then that awkwardness goes away. But > it also loses some information that could _possibly_ be of use to a > caller. > > Back in that earlier discussion I did not have a strong opinion, but > here we are cementing that decision into a user-visible interface. So it > probably makes sense to revisit and decide once and for all. Interesting, and thanks for the link to the prior discussion. I think that I agree mostly with your rationale in [1], which boils down (for me) to: - Other callers (like 'rev-list --alternate-refs') might care about them. Even if we don't have those patches in Git today, it's worth keeping their use case(s) in mind. - I didn't measure either, but I can't imagine that we're paying a huge price for this. So, it might be easy enough to keep saying, "please write output as '%(objectname) SP %(refname)'", even if we end up throwing out the refname, anyway. > > +test_description='git receive-pack test' > > + > > +. ./test-lib.sh > > + > > +test_expect_success 'setup' ' > > + test_commit one && > > + git update-ref refs/heads/a HEAD && > > + test_commit two && > > + git update-ref refs/heads/b HEAD && > > + test_commit three && > > + git update-ref refs/heads/c HEAD && > > + git clone --bare . fork && > > + git clone fork pusher && > > + ( > > + cd fork && > > + git config receive.advertisealternates true && > > + git update-ref -d refs/heads/a && > > + git update-ref -d refs/heads/b && > > + git update-ref -d refs/heads/c && > > + git update-ref -d refs/heads/master && > > + git update-ref -d refs/tags/one && > > + git update-ref -d refs/tags/two && > > + git update-ref -d refs/tags/three && > > Probably not worth nit-picking process count, but this could done with a > single "update-ref --stdin". Sure, I don't think that 7 `update-ref`'s vs 2 (`cat` + `git update-ref --stdin`) will make or break the series, but I can happily shorten it as you suggest ;-). > > + printf "../../.git/objects" >objects/info/alternates > > Also a nitpick, but I think "echo" would be more usual here (we handle > the lack of a trailing newline just fine, but any use of printf makes me > wonder if something tricky is going on with line endings). 'echo' indeed seems to be the way to go. This 'printf' preference is a Git LFS-ism ;-). > > +test_expect_success 'with core.alternateRefsCommand' ' > > + test_config -C fork core.alternateRefsCommand \ > > + "git --git-dir=\"\$1\" for-each-ref \ > > + --format=\"%(objectname) %(refname)\" \ > > + refs/heads/a refs/heads/c;:" && > > This is cute and all, but might it be more readable to use > write_script() to stick it into its own script? Good idea, I'll do that. > > + cat >expect <<-EOF && > > + $(git rev-parse a) .have > > + $(git rev-parse c) .have > > + EOF > > + printf "0000" | git receive-pack fork | extract_haves >actual && > > There's been a push lately to avoid having git on the left-hand side of > a fork, since we might otherwise miss its exit code (including things > like asan/valgrind errors). So maybe: > > ... receive-pack fork >actual && > extract_haves <actual >actual.haves && > test_cmp expect actual.haves > > or similar? Sure, I agree that it's a good idea to not miss the exit code (since we don't have pipefail on), etc. I adopted your suggestion into my local copy. > > diff --git a/transport.c b/transport.c > > index 24ae3f375d..e7d2cdf00b 100644 > > --- a/transport.c > > +++ b/transport.c > > @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url) > > static void fill_alternate_refs_command(struct child_process *cmd, > > const char *repo_path) > > { > > - cmd->git_cmd = 1; > > - argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); > > - argv_array_push(&cmd->args, "for-each-ref"); > > - argv_array_push(&cmd->args, "--format=%(objectname) %(refname)"); > > + const char *value; > > + > > + if (!git_config_get_value("core.alternateRefsCommand", &value)) { > > + cmd->use_shell = 1; > > + > > + argv_array_push(&cmd->args, value); > > + argv_array_push(&cmd->args, repo_path); > > Setting use_shell allows the shell trickery in your test, and matches > the modern way we run config-based commands. Good. > > > + } else { > > + cmd->git_cmd = 1; > > + > > + argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); > > + argv_array_push(&cmd->args, "for-each-ref"); > > + argv_array_push(&cmd->args, "--format=%(objectname) %(refname)"); > > + } > > + > > cmd->env = local_repo_env; > > cmd->out = -1; > > And we still clear local_repo_env for the custom command, which is good > to avoid confusion like $GIT_DIR being set when the custom command does > "cd $1 && git ...". Good. Thanks, Taylor [1]: https://public-inbox.org/git/20170125195425.q4fpvc4ten5mfjgl@sigill.intra.peff.net/
On Thu, Sep 20, 2018 at 04:00:34PM -0400, Taylor Blau wrote: > > My "rev-list --alternate-refs" patches _do_ use the refnames, since you > > could do something like "--source" that cares about them. But there's > > some awkwardness there, because the names are in a different namespace > > than the rest of the refs. If we were to just say "nope, you do not get > > to see the names of the alternates" then that awkwardness goes away. But > > it also loses some information that could _possibly_ be of use to a > > caller. > > > > Back in that earlier discussion I did not have a strong opinion, but > > here we are cementing that decision into a user-visible interface. So it > > probably makes sense to revisit and decide once and for all. > > Interesting, and thanks for the link to the prior discussion. I think > that I agree mostly with your rationale in [1], which boils down (for > me) to: > > - Other callers (like 'rev-list --alternate-refs') might care about > them. Even if we don't have those patches in Git today, it's worth > keeping their use case(s) in mind. > > - I didn't measure either, but I can't imagine that we're paying a > huge price for this. So, it might be easy enough to keep saying, > "please write output as '%(objectname) SP %(refname)'", even if we > end up throwing out the refname, anyway. TBH, the main advantage to me is that it makes the user-visible interface way simpler. We just say "give us a list of object ids, one per line". I guess the current spec is not too bad, especially given that we can just provide a for-each-ref format that generates it. > > Probably not worth nit-picking process count, but this could done with a > > single "update-ref --stdin". > > Sure, I don't think that 7 `update-ref`'s vs 2 (`cat` + `git update-ref > --stdin`) will make or break the series, but I can happily shorten it as > you suggest ;-). Yeah, in retrospect I should have not have even mentioned it. test_commit() already adds a bunch of extra processes you may or may not care about (e.g., by making tags, or using "git add" when "commit -a" might do). > > > + cat >expect <<-EOF && > > > + $(git rev-parse a) .have > > > + $(git rev-parse c) .have > > > + EOF > > > + printf "0000" | git receive-pack fork | extract_haves >actual && > > > > There's been a push lately to avoid having git on the left-hand side of > > a fork, since we might otherwise miss its exit code (including things Heh, I meant to say "left-hand side of a pipe", but you obviously figured out what I meant. :) -Peff
Taylor Blau <ttaylorr@github.com> writes: > +extract_haves () { > + depacketize - | grep -o '^.* \.have' Not portable, isn't it? cf. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html
On Fri, Sep 21, 2018 at 09:39:14AM -0700, Junio C Hamano wrote: > Taylor Blau <ttaylorr@github.com> writes: > > > +extract_haves () { > > + depacketize - | grep -o '^.* \.have' > > Not portable, isn't it? > > cf. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html Good catch. Definitely not portable, per the link that you shared above. Since 'depacketize()' will give us a "\0", we can pull it and anything after it out with 'sed', instead. Any lines that don't contain a "\0" only contain an OID and the literal, ".have", and are fine as-is. Something like this: extract_haves () { depacketize - | grep '^.* \.have' | sed -e 's/\\0.*$//g' } Harder to read--at least for me--but infinitely more portable. I'll wait until a little later today, and then send you v2. Thanks for reviewing :-). Thanks, Taylor
On Fri, Sep 21, 2018 at 01:48:25PM -0400, Taylor Blau wrote: > On Fri, Sep 21, 2018 at 09:39:14AM -0700, Junio C Hamano wrote: > > Taylor Blau <ttaylorr@github.com> writes: > > > > > +extract_haves () { > > > + depacketize - | grep -o '^.* \.have' > > > > Not portable, isn't it? > > > > cf. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html > > Good catch. Definitely not portable, per the link that you shared above. > > Since 'depacketize()' will give us a "\0", we can pull it and anything > after it out with 'sed', instead. Any lines that don't contain a "\0" > only contain an OID and the literal, ".have", and are fine as-is. > > Something like this: > > extract_haves () { > depacketize - | grep '^.* \.have' | sed -e 's/\\0.*$//g' > } > > Harder to read--at least for me--but infinitely more portable. In fact, I think that we can go even further: since we don't need to catch the beginning '^.*' (without -o), we can instead: extract_haves () { depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' } Thanks, Taylor
Taylor Blau <me@ttaylorr.com> writes: > In fact, I think that we can go even further: since we don't need to > catch the beginning '^.*' (without -o), we can instead: > > extract_haves () { > depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' > } Do not pipe grep into sed, unless you have an overly elaborate set of patterns to filter with, e.g. something along the lines of... sed -ne '/\.have/s/...//p'
On Fri, Sep 21, 2018 at 12:59:16PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > In fact, I think that we can go even further: since we don't need to > > catch the beginning '^.*' (without -o), we can instead: > > > > extract_haves () { > > depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' > > } > > Do not pipe grep into sed, unless you have an overly elaborate set > of patterns to filter with, e.g. something along the lines of... > > sed -ne '/\.have/s/...//p' Thanks, I'm not sure why I thought that this was a good idea to send (even after discussing it to myself twice publicly on the list beforehand). Anyway, in my local copy, I adopted Peff's suggestion below in the thread, which is: extract_haves () { depacketize - | perl -lne '/^(\S+) \.have/ and print $1' } I think that that should be OK, but I sent it here to double check before sending you real patches. Thanks, Taylor
diff --git a/Documentation/config.txt b/Documentation/config.txt index 112041f407..b908bc5825 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -616,6 +616,12 @@ core.preferSymlinkRefs:: This is sometimes needed to work with old scripts that expect HEAD to be a symbolic link. +core.alternateRefsCommand:: + When listing references from an alternate (e.g., in the case of ".have"), use + the shell to execute the specified command instead of + linkgit:git-for-each-ref[1]. The first argument is the path of the alternate. + Output must be of the form: `%(objectname) SPC %(refname)`. + core.bare:: If true this repository is assumed to be 'bare' and has no working directory associated with it. If this is the case a diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh new file mode 100755 index 0000000000..09fb3f39a1 --- /dev/null +++ b/t/t5410-receive-pack.sh @@ -0,0 +1,47 @@ +#!/bin/sh + +test_description='git receive-pack test' + +. ./test-lib.sh + +test_expect_success 'setup' ' + test_commit one && + git update-ref refs/heads/a HEAD && + test_commit two && + git update-ref refs/heads/b HEAD && + test_commit three && + git update-ref refs/heads/c HEAD && + git clone --bare . fork && + git clone fork pusher && + ( + cd fork && + git config receive.advertisealternates true && + git update-ref -d refs/heads/a && + git update-ref -d refs/heads/b && + git update-ref -d refs/heads/c && + git update-ref -d refs/heads/master && + git update-ref -d refs/tags/one && + git update-ref -d refs/tags/two && + git update-ref -d refs/tags/three && + printf "../../.git/objects" >objects/info/alternates + ) +' + +extract_haves () { + depacketize - | grep -o '^.* \.have' +} + +test_expect_success 'with core.alternateRefsCommand' ' + test_config -C fork core.alternateRefsCommand \ + "git --git-dir=\"\$1\" for-each-ref \ + --format=\"%(objectname) %(refname)\" \ + refs/heads/a refs/heads/c;:" && + cat >expect <<-EOF && + $(git rev-parse a) .have + $(git rev-parse c) .have + EOF + printf "0000" | git receive-pack fork | extract_haves >actual && + test_cmp expect actual +' + +test_done diff --git a/transport.c b/transport.c index 24ae3f375d..e7d2cdf00b 100644 --- a/transport.c +++ b/transport.c @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url) static void fill_alternate_refs_command(struct child_process *cmd, const char *repo_path) { - cmd->git_cmd = 1; - argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); - argv_array_push(&cmd->args, "for-each-ref"); - argv_array_push(&cmd->args, "--format=%(objectname) %(refname)"); + const char *value; + + if (!git_config_get_value("core.alternateRefsCommand", &value)) { + cmd->use_shell = 1; + + argv_array_push(&cmd->args, value); + argv_array_push(&cmd->args, repo_path); + } else { + cmd->git_cmd = 1; + + argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); + argv_array_push(&cmd->args, "for-each-ref"); + argv_array_push(&cmd->args, "--format=%(objectname) %(refname)"); + } + cmd->env = local_repo_env; cmd->out = -1; }
When in a repository containing one or more alternates, Git would sometimes like to list references from its alternates. For example, 'git receive-pack' list the objects pointed to by alternate references as special ".have" references. Listing ".have" references is designed to make pushing changes from upstream to a fork a lightweight operation, by advertising to the pusher that the fork already has the objects (via its alternate). Thus, the client can avoid sending them. However, when the alternate has a pathologically large number of references, the initial advertisement is too expensive. In fact, it can dominate any such optimization where the pusher avoids sending certain objects. Introduce "core.alternateRefsCommand" in order to provide a facility to limit or filter alternate references. This can be used, for example, to filter out "uninteresting" references from the initial advertisement in the above scenario. Let the repository that has alternates configure this command to avoid trusting the alternate to provide us a safe command to run in the shell. To behave differently on each alternate (e.g., only list tags from alternate A, only heads from B) provide the path of the alternate as the first argument. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/config.txt | 6 +++++ t/t5410-receive-pack.sh | 47 ++++++++++++++++++++++++++++++++++++++++ transport.c | 19 ++++++++++++---- 3 files changed, 68 insertions(+), 4 deletions(-) create mode 100755 t/t5410-receive-pack.sh