Message ID | 9797f525517142b3494cfbd17a10dfeb3bf586e2.1537555544.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Filter alternate references | expand |
On Fri, Sep 21, 2018 at 2:47 PM Taylor Blau <me@ttaylorr.com> wrote: > When in a repository containing one or more alternates, Git would > sometimes like to list references from its alternates. For example, 'git > receive-pack' list the objects pointed to by alternate references as > special ".have" references. > [...] > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh > @@ -0,0 +1,54 @@ > +expect_haves () { > + printf "%s .have\n" $(git rev-parse $@) >expect > +} Magic quoting behavior only kicks in when $@ is itself quoted, so this should be: printf "%s .have\n" $(git rev-parse "$@") >expect However, as it's unlikely that you need magic quoting in this case, you might get by with plain $* (unquoted).
Taylor Blau <me@ttaylorr.com> writes: > +core.alternateRefsCommand:: > + When listing references from an alternate (e.g., in the case of ".have"), use It is not clear how (e.g.,...) connects to what is said in the sentence. "When advertising tips of available history from an alternate, use ..." without saying ".have" may be less cryptic. I dunno. > + the shell to execute the specified command instead of > + linkgit:git-for-each-ref[1]. The first argument is the path of the alternate. "The path" meaning the absolute path? Relative to the original object store? Something else? > + Output must be of the form: `%(objectname) SPC %(refname)`. > ++ > +This is useful when a repository only wishes to advertise some of its > +alternate's references as ".have"'s. For example, to only advertise branch > +heads, configure `core.alternateRefsCommand` to the path of a script which runs > +`git --git-dir="$1" for-each-ref refs/heads`. > + > core.bare:: > If true this repository is assumed to be 'bare' and has no > working directory associated with it. If this is the case a > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh > new file mode 100755 > index 0000000000..2f21f1cb8f > --- /dev/null > +++ b/t/t5410-receive-pack.sh > @@ -0,0 +1,54 @@ > +#!/bin/sh > + > +test_description='git receive-pack test' > + > +. ./test-lib.sh > + > +test_expect_success 'setup' ' > + test_commit one && > + git update-ref refs/heads/a HEAD && > + test_commit two && > + git update-ref refs/heads/b HEAD && > + test_commit three && > + git update-ref refs/heads/c HEAD && > + git clone --bare . fork && > + git clone fork pusher && > + ( > + cd fork && > + git config receive.advertisealternates true && Hmph. Do we have code to support this configuration variable? > + cat <<-EOF | git update-ref --stdin && Style: writing "<<-\EOF" instead would allow readers' eyes to coast over without having to look for $variable_references in the here-doc. > + delete refs/heads/a > + delete refs/heads/b > + delete refs/heads/c > + delete refs/heads/master > + delete refs/tags/one > + delete refs/tags/two > + delete refs/tags/three So, the original created one/two/three/a/b/c/master, fork is a bare clone of it and has all these things, and then you deleted all of these? What does fork have after this is done? HEAD that is dangling? > + EOF > + echo "../../.git/objects" >objects/info/alternates When viewed from fork/objects, ../../.git is the GIT_DIR of the primary test repository, so that is where we borrow objects from. If we pruned the objects from fork's object store before this echo, we would have an almost empty repository that borrows from its alternates everything, which may make a more realistic sample case, but because you are only focusing on the ref advertisement, it does not matter that your fork is full of duplicate objects that are available from the alternates. > +expect_haves () { > + printf "%s .have\n" $(git rev-parse $@) >expect Quote $@ inside dq pair, like $(git rev-parse "$@"). > +extract_haves () { > + depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' > +} Don't pipe grep into sed, especially when both the pattern to filter and the operation to perform are simple. I am not sure what you are trying to achive with 'g' in s/pattern$//g; The anchor at the rightmost end of the pattern makes sure that the pattern matches only once per line at the end anyway, so "do this howmanyever times as we have match on each line" would not make any difference, no? > diff --git a/transport.c b/transport.c > index 24ae3f375d..e7d2cdf00b 100644 > --- a/transport.c > +++ b/transport.c > @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url) > static void fill_alternate_refs_command(struct child_process *cmd, > const char *repo_path) > { > - cmd->git_cmd = 1; > - argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); > - argv_array_push(&cmd->args, "for-each-ref"); > - argv_array_push(&cmd->args, "--format=%(objectname) %(refname)"); > + const char *value; > + > + if (!git_config_get_value("core.alternateRefsCommand", &value)) { > + cmd->use_shell = 1; > + > + argv_array_push(&cmd->args, value); > + argv_array_push(&cmd->args, repo_path); > + } else { > + cmd->git_cmd = 1; > + > + argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); > + argv_array_push(&cmd->args, "for-each-ref"); > + argv_array_push(&cmd->args, "--format=%(objectname) %(refname)"); > + } > + > cmd->env = local_repo_env; > cmd->out = -1; > }
On Fri, Sep 21, 2018 at 2:47 PM Taylor Blau <me@ttaylorr.com> wrote: > When in a repository containing one or more alternates, Git would > sometimes like to list references from its alternates. For example, 'git > receive-pack' list the objects pointed to by alternate references as > special ".have" references. > [...] > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh > @@ -0,0 +1,54 @@ > +expect_haves () { > + printf "%s .have\n" $(git rev-parse $@) >expect > +} > + > +test_expect_success 'with core.alternateRefsCommand' ' > + [...] > + expect_haves a c >expect && This is not great. Both the caller of expect_haves() and expect_haves() itself redirect to a file named "expect". This works, but only by accident. Better would be to make expect_haves() simply a generator to stdout and let the caller redirect to the file rather than hardcoding the filename in the function itself (much as extract_haves() takes it its input on stdin rather than hardcoding a filename). If you take this approach, then you'd probably want to rename the function, as well; perhaps call it emit_haves() or something. > + printf "0000" | git receive-pack fork >actual && > + extract_haves <actual >actual.haves && > + test_cmp expect actual.haves > +'
On Fri, Sep 21, 2018 at 02:09:08PM -0700, Junio C Hamano wrote: > > +test_expect_success 'setup' ' > > + test_commit one && > > + git update-ref refs/heads/a HEAD && > > + test_commit two && > > + git update-ref refs/heads/b HEAD && > > + test_commit three && > > + git update-ref refs/heads/c HEAD && > > + git clone --bare . fork && > > + git clone fork pusher && > > + ( > > + cd fork && > > + git config receive.advertisealternates true && > > Hmph. Do we have code to support this configuration variable? Sorry, I should have caught that. Our existing solution is to disable alternates in the advertisement entirely (since the optimization backfires for us). So this line is a leftover from testing it against our fork, and should be dropped. If anybody is interested, we can share those patches, though they're unsurprisingly trivial. I suspect we may end up discarding them if this custom-command thing works, but it's possible we'll still need to be able to shut them off completely for some truly pathological cases. > > + cat <<-EOF | git update-ref --stdin && > > Style: writing "<<-\EOF" instead would allow readers' eyes to > coast over without having to look for $variable_references in > the here-doc. Also, useless-use-of-cat in the original, which could be: git update-ref --stdin <<-\EOF > [...] Yeah, I second all the other bits you mentioned. -Peff
Jeff King <peff@peff.net> writes: > On Fri, Sep 21, 2018 at 02:09:08PM -0700, Junio C Hamano wrote: > >> > +test_expect_success 'setup' ' >> > + test_commit one && >> > + git update-ref refs/heads/a HEAD && >> > + test_commit two && >> > + git update-ref refs/heads/b HEAD && >> > + test_commit three && >> > + git update-ref refs/heads/c HEAD && >> > + git clone --bare . fork && >> > + git clone fork pusher && >> > + ( >> > + cd fork && >> > + git config receive.advertisealternates true && >> >> Hmph. Do we have code to support this configuration variable? > > Sorry, I should have caught that. Our existing solution is to disable > alternates in the advertisement entirely (since the optimization > backfires for us). So this line is a leftover from testing it against > our fork, and should be dropped. > > If anybody is interested, we can share those patches, though they're > unsurprisingly trivial. Heh, I guessed correctly what is going on ;-) Even though there may not be much interest in the "all-or-none" boolean configuration, in order to upstream this custom thing, it may be the cleanest to upstream that all-or-none thing as well. Otherwise, you'd need to keep a patch to this test script that is private for your "all-or-none" feature. That's your maintenance burden so it ultimately is your call ;-) > Also, useless-use-of-cat in the original, which could be: > > git update-ref --stdin <<-\EOF Yup. Thanks.
On Fri, Sep 21, 2018 at 03:23:40PM -0700, Junio C Hamano wrote: > >> > + git config receive.advertisealternates true && > >> > >> Hmph. Do we have code to support this configuration variable? > > > > Sorry, I should have caught that. Our existing solution is to disable > > alternates in the advertisement entirely (since the optimization > > backfires for us). So this line is a leftover from testing it against > > our fork, and should be dropped. > > > > If anybody is interested, we can share those patches, though they're > > unsurprisingly trivial. > > Heh, I guessed correctly what is going on ;-) > > Even though there may not be much interest in the "all-or-none" > boolean configuration, in order to upstream this custom thing, it > may be the cleanest to upstream that all-or-none thing as well. > Otherwise, you'd need to keep a patch to this test script that is > private for your "all-or-none" feature. That's your maintenance > burden so it ultimately is your call ;-) Easy one-liners in test scripts are the least of my ongoing maintenance burden. ;) I think in this case, though, the line is not even necessary, as our patches leave the default as "true" (which is certainly what we would want upstream, as well, for compatibility). -Peff
On Fri, Sep 21, 2018 at 02:47:43PM -0400, Taylor Blau wrote: > +expect_haves () { > + printf "%s .have\n" $(git rev-parse $@) >expect > +} > + > +extract_haves () { > + depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' It looks like you're trying to match a NUL here in the sed expression, but from my reading of it, POSIX doesn't permit BREs to match NUL. Perhaps someone can come up with a better solution, but I'd write this as the following: depacketize - | perl -ne 'next unless /\.have/; s/\0.*$//g; print'
On Sat, Sep 22, 2018 at 06:02:31PM +0000, brian m. carlson wrote: > On Fri, Sep 21, 2018 at 02:47:43PM -0400, Taylor Blau wrote: > > +expect_haves () { > > + printf "%s .have\n" $(git rev-parse $@) >expect > > +} > > + > > +extract_haves () { > > + depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' > > It looks like you're trying to match a NUL here in the sed expression, > but from my reading of it, POSIX doesn't permit BREs to match NUL. No, it's trying to literally match backslash followed by 0. The depacketize() script will have undone the NUL already. In perl, no less, making it more or less equivalent to your suggestion. ;) So I think this is fine (modulo that the grep and sed can be combined). Yet another option would be to simply strip away everything except the object id (which is all we care about), like: depacketize | perl -lne '/^(\S+) \.have/ and print $1' Or the equivalent in sed. I am happy with any solution that does the correct thing. -Peff
On Sat, Sep 22, 2018 at 03:52:58PM -0400, Jeff King wrote: > On Sat, Sep 22, 2018 at 06:02:31PM +0000, brian m. carlson wrote: > > > On Fri, Sep 21, 2018 at 02:47:43PM -0400, Taylor Blau wrote: > > > +expect_haves () { > > > + printf "%s .have\n" $(git rev-parse $@) >expect > > > +} > > > + > > > +extract_haves () { > > > + depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' > > > > It looks like you're trying to match a NUL here in the sed expression, > > but from my reading of it, POSIX doesn't permit BREs to match NUL. > > No, it's trying to literally match backslash followed by 0. The > depacketize() script will have undone the NUL already. In perl, no less, > making it more or less equivalent to your suggestion. ;) Ah, okay. That makes more sense. > So I think this is fine (modulo that the grep and sed can be combined). > Yet another option would be to simply strip away everything except the > object id (which is all we care about), like: > > depacketize | perl -lne '/^(\S+) \.have/ and print $1' > > Or the equivalent in sed. I am happy with any solution that does the > correct thing. Yeah, I agree that with that context, no change is needed.
On Fri, Sep 21, 2018 at 04:18:03PM -0400, Eric Sunshine wrote: > On Fri, Sep 21, 2018 at 2:47 PM Taylor Blau <me@ttaylorr.com> wrote: > > When in a repository containing one or more alternates, Git would > > sometimes like to list references from its alternates. For example, 'git > > receive-pack' list the objects pointed to by alternate references as > > special ".have" references. > > [...] > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > > --- > > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh > > @@ -0,0 +1,54 @@ > > +expect_haves () { > > + printf "%s .have\n" $(git rev-parse $@) >expect > > +} > > Magic quoting behavior only kicks in when $@ is itself quoted, so this > should be: > > printf "%s .have\n" $(git rev-parse "$@") >expect > > However, as it's unlikely that you need magic quoting in this case, > you might get by with plain $* (unquoted). Yep, thanks for catching my mistake. I rewrote my local copy with "$@" (instead of $@), and also applied your suggestion of not redirecting to `>expect`, and renaming the function. These both ended up becoming moot points, though, because of the Perl-ism that Peff suggested and I adopted throughout this thread. The Perl Peff wrote does not capture the " .have" suffix at all, and instead only the object identifiers. Hence, all we really need is a call to 'git-rev-parse(1)'. I doubt that this will ever change, so I removed the function entirely. Thanks, Taylor
On Fri, Sep 21, 2018 at 02:09:08PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > > +core.alternateRefsCommand:: > > + When listing references from an alternate (e.g., in the case of ".have"), use > > It is not clear how (e.g.,...) connects to what is said in the > sentence. "When advertising tips of available history from an > alternate, use ..." without saying ".have" may be less cryptic. > > I dunno. Thanks, I think that I tend to overuse both "e.g.," and "i.e.,". I took your suggestion as above, which I think looks better than my original prose. > > + the shell to execute the specified command instead of > > + linkgit:git-for-each-ref[1]. The first argument is the path of the alternate. > > "The path" meaning the absolute path? Relative to the original > object store? Something else? It's the absolute path, and I've updated the documentation to clarify it as such. > > + Output must be of the form: `%(objectname) SPC %(refname)`. > > ++ > > +This is useful when a repository only wishes to advertise some of its > > +alternate's references as ".have"'s. For example, to only advertise branch > > +heads, configure `core.alternateRefsCommand` to the path of a script which runs > > +`git --git-dir="$1" for-each-ref refs/heads`. > > + > > core.bare:: > > If true this repository is assumed to be 'bare' and has no > > working directory associated with it. If this is the case a > > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh > > new file mode 100755 > > index 0000000000..2f21f1cb8f > > --- /dev/null > > +++ b/t/t5410-receive-pack.sh > > @@ -0,0 +1,54 @@ > > +#!/bin/sh > > + > > +test_description='git receive-pack test' > > + > > +. ./test-lib.sh > > + > > +test_expect_success 'setup' ' > > + test_commit one && > > + git update-ref refs/heads/a HEAD && > > + test_commit two && > > + git update-ref refs/heads/b HEAD && > > + test_commit three && > > + git update-ref refs/heads/c HEAD && > > + git clone --bare . fork && > > + git clone fork pusher && > > + ( > > + cd fork && > > + git config receive.advertisealternates true && > > Hmph. Do we have code to support this configuration variable? We don't ;-). Peff's explanation of why is accurate, and the mistake is mine. > > + cat <<-EOF | git update-ref --stdin && > > Style: writing "<<-\EOF" instead would allow readers' eyes to > coast over without having to look for $variable_references in > the here-doc. > > > + delete refs/heads/a > > + delete refs/heads/b > > + delete refs/heads/c > > + delete refs/heads/master > > + delete refs/tags/one > > + delete refs/tags/two > > + delete refs/tags/three Thanks, it ended up being much cleaner to write <<-\EOF, and avoid the unnecessary cat(1) entirely. > So, the original created one/two/three/a/b/c/master, fork is a bare > clone of it and has all these things, and then you deleted all of > these? What does fork have after this is done? HEAD that is > dangling? > > > + EOF > > + echo "../../.git/objects" >objects/info/alternates > > When viewed from fork/objects, ../../.git is the GIT_DIR of the > primary test repository, so that is where we borrow objects from. > > If we pruned the objects from fork's object store before this echo, > we would have an almost empty repository that borrows from its > alternates everything, which may make a more realistic sample case, > but because you are only focusing on the ref advertisement, it does > not matter that your fork is full of duplicate objects that are > available from the alternates. I could go either way. You're right in that we have only a dangling HEAD reference in the fork, and that all of the objects are still there. I suppose that we could gc the objects that are there, but I think (as you note above) that it doesn't make a huge difference either way. > > +expect_haves () { > > + printf "%s .have\n" $(git rev-parse $@) >expect > > Quote $@ inside dq pair, like $(git rev-parse "$@"). Thanks, I fixed this (per your and Eric's suggestion), but ended up removing the function entirely anyway. > > +extract_haves () { > > + depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' > > +} > > Don't pipe grep into sed, especially when both the pattern to filter > and the operation to perform are simple. > > I am not sure what you are trying to achive with 'g' in > s/pattern$//g; The anchor at the rightmost end of the pattern makes > sure that the pattern matches only once per line at the end anyway, > so "do this howmanyever times as we have match on each line" would > not make any difference, no? I admit to not fully understanding when the trailing `/g` is and is not useful. Anyway, I took Peff's suggestion below to convert this 'grep | sed' pipeline into a Perl invocation, which I think ended up much cleaner. Thanks, Taylor
On Sat, Sep 22, 2018 at 03:52:58PM -0400, Jeff King wrote: > On Sat, Sep 22, 2018 at 06:02:31PM +0000, brian m. carlson wrote: > > > On Fri, Sep 21, 2018 at 02:47:43PM -0400, Taylor Blau wrote: > > > +expect_haves () { > > > + printf "%s .have\n" $(git rev-parse $@) >expect > > > +} > > > + > > > +extract_haves () { > > > + depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' > > > > It looks like you're trying to match a NUL here in the sed expression, > > but from my reading of it, POSIX doesn't permit BREs to match NUL. > > No, it's trying to literally match backslash followed by 0. The > depacketize() script will have undone the NUL already. In perl, no less, > making it more or less equivalent to your suggestion. ;) > > So I think this is fine (modulo that the grep and sed can be combined). > Yet another option would be to simply strip away everything except the > object id (which is all we care about), like: > > depacketize | perl -lne '/^(\S+) \.have/ and print $1' Thanks for this. This is the suggestion I ended up taking (modulo taking '-' as the first argument to 'depacketize'). The 'print $1' part of this makes things a lot nicer, actually, having removed the " .have" suffix. We can get rid of the expect_haves() function above, and instead call 'git rev-parse' inline and get the right results. > Or the equivalent in sed. I am happy with any solution that does the > correct thing. Me too :-). Thanks again. Thanks, Taylor
On Tue, Sep 25, 2018 at 06:06:06PM -0700, Taylor Blau wrote: > > > +extract_haves () { > > > + depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' > > > +} > > > > Don't pipe grep into sed, especially when both the pattern to filter > > and the operation to perform are simple. > > > > I am not sure what you are trying to achive with 'g' in > > s/pattern$//g; The anchor at the rightmost end of the pattern makes > > sure that the pattern matches only once per line at the end anyway, > > so "do this howmanyever times as we have match on each line" would > > not make any difference, no? > > I admit to not fully understanding when the trailing `/g` is and is not > useful. Anyway, I took Peff's suggestion below to convert this 'grep | > sed' pipeline into a Perl invocation, which I think ended up much > cleaner. It makes the replacement global in the line. Without we substitute only the first match. So try: echo foo | sed s/o/X/ versus: echo foo | sed s/o/X/g -Peff
On Tue, Sep 25, 2018 at 06:09:35PM -0700, Taylor Blau wrote: > > So I think this is fine (modulo that the grep and sed can be combined). > > Yet another option would be to simply strip away everything except the > > object id (which is all we care about), like: > > > > depacketize | perl -lne '/^(\S+) \.have/ and print $1' > > Thanks for this. This is the suggestion I ended up taking (modulo taking > '-' as the first argument to 'depacketize'). I don't think depacketize takes any arguments. It always reads from stdin directly, doesn't it? Your "-" is not hurting anything, but it is totally ignored. A perl tangent if you're interested: Normally for shell functions like this that are just wrappers around perl snippets, I would suggest to pass "$@" from the function's arguments to perl. So for example if we had: haves_from_packets () { perl -lne '/^(\S+) \.have/ and print $1' "$@" } then you could call it with a filename: haves_from_packets packets or input on stdin: haves_from_packets <packets and either works (this is magic from perl's "-p" loop, but you get the same if you write "while (<>)" explicitly in your program). But because depacketize() has to use byte-wise read() calls, it doesn't get that magic for free. And it did not seem worth the effort to implement, when shell redirections are so easy. ;) Just skimming through test-lib-functions.sh, though, it does seem that we often deviate from that pattern (e.g., all of the q_to_nul family). And has seemed to mind. > The 'print $1' part of this makes things a lot nicer, actually, having > removed the " .have" suffix. We can get rid of the expect_haves() > function above, and instead call 'git rev-parse' inline and get the > right results. Yes. You can even do it all in a single rev-parse call. -Peff
On Tue, Sep 25, 2018 at 11:33:37PM -0400, Jeff King wrote: > On Tue, Sep 25, 2018 at 06:09:35PM -0700, Taylor Blau wrote: > > > > So I think this is fine (modulo that the grep and sed can be combined). > > > Yet another option would be to simply strip away everything except the > > > object id (which is all we care about), like: > > > > > > depacketize | perl -lne '/^(\S+) \.have/ and print $1' > > > > Thanks for this. This is the suggestion I ended up taking (modulo taking > > '-' as the first argument to 'depacketize'). > > I don't think depacketize takes any arguments. It always reads from > stdin directly, doesn't it? Your "-" is not hurting anything, but it is > totally ignored. Yep, certainly. I think that I was drawn to this claim because I watched t5410 fail after applying the above recommendation, so thusly assumed that it was my fault for not passing `-` to 'depacketize()`. In the end, I'm not sure why the test failed originally (it's likely that I hadn't removed the ".have" part of 'expect_haves()', yet). But, I removed the `-` in my local copy of v3, and the tests passes on all revisions of this series that have it. > A perl tangent if you're interested: > > Normally for shell functions like this that are just wrappers around > perl snippets, I would suggest to pass "$@" from the function's > arguments to perl. So for example if we had: > > haves_from_packets () { > perl -lne '/^(\S+) \.have/ and print $1' "$@" > } > > then you could call it with a filename: > > haves_from_packets packets > > or input on stdin: > > haves_from_packets <packets > > and either works (this is magic from perl's "-p" loop, but you get the > same if you write "while (<>)" explicitly in your program). > > But because depacketize() has to use byte-wise read() calls, it > doesn't get that magic for free. And it did not seem worth the effort > to implement, when shell redirections are so easy. ;) To be clear, we ought to leave this function as: extract_haves () { depacketize | perl -lne '/^(\S+) \.have/ and print $1' } Or are you suggesting that we change it to: extract_haves () { perl -lne '/^(\S+) \.have/ and print $1' } And call it as: printf "0000" | git receive-pack fork >actual && depacketize <actual >actual.packets extract_haves <actual.packets >actual.haves && Frankly, (and I think that this is what you're getting at in your reply above), I think that the former (e.g., calling 'depacketize()' in 'extract_haves()') is cleaner. This approach leaves us with "actual" and "actual.haves", and obviates the need for another intermediary, "actual.packets". > > The 'print $1' part of this makes things a lot nicer, actually, having > > removed the " .have" suffix. We can get rid of the expect_haves() > > function above, and instead call 'git rev-parse' inline and get the > > right results. > > Yes. You can even do it all in a single rev-parse call. Indeed. Thanks, Taylor
On Wed, Sep 26, 2018 at 06:39:56AM -0700, Taylor Blau wrote: > > A perl tangent if you're interested: > [...] > > To be clear, we ought to leave this function as: > > extract_haves () { > depacketize | perl -lne '/^(\S+) \.have/ and print $1' > } Yes, I agree. You cannot do the "$@" there because it relies on depacketize, which only handles stdin. > Or are you suggesting that we change it to: > > extract_haves () { > perl -lne '/^(\S+) \.have/ and print $1' > } No, sorry. I just used the ".have" snippet as filler text, but I see that muddied my meaning considerably. This really was just a tangent for the future. What you've written above is the best thing for this case. > And call it as: > > printf "0000" | git receive-pack fork >actual && > depacketize <actual >actual.packets > extract_haves <actual.packets >actual.haves && > > Frankly, (and I think that this is what you're getting at in your reply > above), I think that the former (e.g., calling 'depacketize()' in > 'extract_haves()') is cleaner. This approach leaves us with "actual" and > "actual.haves", and obviates the need for another intermediary, > "actual.packets". Yeah. I have no problem with the three-liner you wrote above, but I do not see any particular reason for it. -Peff
On Wed, Sep 26, 2018 at 02:38:53PM -0400, Jeff King wrote: > On Wed, Sep 26, 2018 at 06:39:56AM -0700, Taylor Blau wrote: > > > > A perl tangent if you're interested: > > [...] > > > > To be clear, we ought to leave this function as: > > > > extract_haves () { > > depacketize | perl -lne '/^(\S+) \.have/ and print $1' > > } > > Yes, I agree. You cannot do the "$@" there because it relies on > depacketize, which only handles stdin. > > > Or are you suggesting that we change it to: > > > > extract_haves () { > > perl -lne '/^(\S+) \.have/ and print $1' > > } > > No, sorry. I just used the ".have" snippet as filler text, but I see > that muddied my meaning considerably. This really was just a tangent for > the future. What you've written above is the best thing for this case. I see, and I had assumed that you meant the later, not that including " .have" was a good way to go forward. So I think that we're in agreement here. > > And call it as: > > > > printf "0000" | git receive-pack fork >actual && > > depacketize <actual >actual.packets > > extract_haves <actual.packets >actual.haves && > > > > Frankly, (and I think that this is what you're getting at in your reply > > above), I think that the former (e.g., calling 'depacketize()' in > > 'extract_haves()') is cleaner. This approach leaves us with "actual" and > > "actual.haves", and obviates the need for another intermediary, > > "actual.packets". > > Yeah. I have no problem with the three-liner you wrote above, but I do > not see any particular reason for it. Good. That's the version that I'll send shortly, then. Thanks, Taylor
diff --git a/Documentation/config.txt b/Documentation/config.txt index 112041f407..526557e494 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -616,6 +616,17 @@ core.preferSymlinkRefs:: This is sometimes needed to work with old scripts that expect HEAD to be a symbolic link. +core.alternateRefsCommand:: + When listing references from an alternate (e.g., in the case of ".have"), use + the shell to execute the specified command instead of + linkgit:git-for-each-ref[1]. The first argument is the path of the alternate. + Output must be of the form: `%(objectname) SPC %(refname)`. ++ +This is useful when a repository only wishes to advertise some of its +alternate's references as ".have"'s. For example, to only advertise branch +heads, configure `core.alternateRefsCommand` to the path of a script which runs +`git --git-dir="$1" for-each-ref refs/heads`. + core.bare:: If true this repository is assumed to be 'bare' and has no working directory associated with it. If this is the case a diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh new file mode 100755 index 0000000000..2f21f1cb8f --- /dev/null +++ b/t/t5410-receive-pack.sh @@ -0,0 +1,54 @@ +#!/bin/sh + +test_description='git receive-pack test' + +. ./test-lib.sh + +test_expect_success 'setup' ' + test_commit one && + git update-ref refs/heads/a HEAD && + test_commit two && + git update-ref refs/heads/b HEAD && + test_commit three && + git update-ref refs/heads/c HEAD && + git clone --bare . fork && + git clone fork pusher && + ( + cd fork && + git config receive.advertisealternates true && + cat <<-EOF | git update-ref --stdin && + delete refs/heads/a + delete refs/heads/b + delete refs/heads/c + delete refs/heads/master + delete refs/tags/one + delete refs/tags/two + delete refs/tags/three + EOF + echo "../../.git/objects" >objects/info/alternates + ) +' + +expect_haves () { + printf "%s .have\n" $(git rev-parse $@) >expect +} + +extract_haves () { + depacketize - | grep '\.have' | sed -e 's/\\0.*$//g' +} + +test_expect_success 'with core.alternateRefsCommand' ' + write_script fork/alternate-refs <<-\EOF && + git --git-dir="$1" for-each-ref \ + --format="%(objectname) %(refname)" \ + refs/heads/a \ + refs/heads/c + EOF + test_config -C fork core.alternateRefsCommand alternate-refs && + expect_haves a c >expect && + printf "0000" | git receive-pack fork >actual && + extract_haves <actual >actual.haves && + test_cmp expect actual.haves +' + +test_done diff --git a/transport.c b/transport.c index 24ae3f375d..e7d2cdf00b 100644 --- a/transport.c +++ b/transport.c @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url) static void fill_alternate_refs_command(struct child_process *cmd, const char *repo_path) { - cmd->git_cmd = 1; - argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); - argv_array_push(&cmd->args, "for-each-ref"); - argv_array_push(&cmd->args, "--format=%(objectname) %(refname)"); + const char *value; + + if (!git_config_get_value("core.alternateRefsCommand", &value)) { + cmd->use_shell = 1; + + argv_array_push(&cmd->args, value); + argv_array_push(&cmd->args, repo_path); + } else { + cmd->git_cmd = 1; + + argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path); + argv_array_push(&cmd->args, "for-each-ref"); + argv_array_push(&cmd->args, "--format=%(objectname) %(refname)"); + } + cmd->env = local_repo_env; cmd->out = -1; }
When in a repository containing one or more alternates, Git would sometimes like to list references from its alternates. For example, 'git receive-pack' list the objects pointed to by alternate references as special ".have" references. Listing ".have" references is designed to make pushing changes from upstream to a fork a lightweight operation, by advertising to the pusher that the fork already has the objects (via its alternate). Thus, the client can avoid sending them. However, when the alternate has a pathologically large number of references, the initial advertisement is too expensive. In fact, it can dominate any such optimization where the pusher avoids sending certain objects. Introduce "core.alternateRefsCommand" in order to provide a facility to limit or filter alternate references. This can be used, for example, to filter out "uninteresting" references from the initial advertisement in the above scenario. Let the repository that has alternates configure this command to avoid trusting the alternate to provide us a safe command to run in the shell. To behave differently on each alternate (e.g., only list tags from alternate A, only heads from B) provide the path of the alternate as the first argument. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- Documentation/config.txt | 11 ++++++++ t/t5410-receive-pack.sh | 54 ++++++++++++++++++++++++++++++++++++++++ transport.c | 19 +++++++++++--- 3 files changed, 80 insertions(+), 4 deletions(-) create mode 100755 t/t5410-receive-pack.sh