diff mbox series

[v2,2/3] transport.c: introduce core.alternateRefsCommand

Message ID 9797f525517142b3494cfbd17a10dfeb3bf586e2.1537555544.git.me@ttaylorr.com (mailing list archive)
State New, archived
Headers show
Series Filter alternate references | expand

Commit Message

Taylor Blau Sept. 21, 2018, 6:47 p.m. UTC
When in a repository containing one or more alternates, Git would
sometimes like to list references from its alternates. For example, 'git
receive-pack' list the objects pointed to by alternate references as
special ".have" references.

Listing ".have" references is designed to make pushing changes from
upstream to a fork a lightweight operation, by advertising to the pusher
that the fork already has the objects (via its alternate). Thus, the
client can avoid sending them.

However, when the alternate has a pathologically large number of
references, the initial advertisement is too expensive. In fact, it can
dominate any such optimization where the pusher avoids sending certain
objects.

Introduce "core.alternateRefsCommand" in order to provide a facility to
limit or filter alternate references. This can be used, for example, to
filter out "uninteresting" references from the initial advertisement in
the above scenario.

Let the repository that has alternates configure this command to avoid
trusting the alternate to provide us a safe command to run in the shell.
To behave differently on each alternate (e.g., only list tags from
alternate A, only heads from B) provide the path of the alternate as the
first argument.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config.txt | 11 ++++++++
 t/t5410-receive-pack.sh  | 54 ++++++++++++++++++++++++++++++++++++++++
 transport.c              | 19 +++++++++++---
 3 files changed, 80 insertions(+), 4 deletions(-)
 create mode 100755 t/t5410-receive-pack.sh

Comments

Eric Sunshine Sept. 21, 2018, 8:18 p.m. UTC | #1
On Fri, Sep 21, 2018 at 2:47 PM Taylor Blau <me@ttaylorr.com> wrote:
> When in a repository containing one or more alternates, Git would
> sometimes like to list references from its alternates. For example, 'git
> receive-pack' list the objects pointed to by alternate references as
> special ".have" references.
> [...]
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> @@ -0,0 +1,54 @@
> +expect_haves () {
> +       printf "%s .have\n" $(git rev-parse $@) >expect
> +}

Magic quoting behavior only kicks in when $@ is itself quoted, so this
should be:

    printf "%s .have\n" $(git rev-parse "$@") >expect

However, as it's unlikely that you need magic quoting in this case,
you might get by with plain $* (unquoted).
Junio C Hamano Sept. 21, 2018, 9:09 p.m. UTC | #2
Taylor Blau <me@ttaylorr.com> writes:

> +core.alternateRefsCommand::
> +	When listing references from an alternate (e.g., in the case of ".have"), use

It is not clear how (e.g.,...) connects to what is said in the
sentence.  "When advertising tips of available history from an
alternate, use ..." without saying ".have" may be less cryptic.  

I dunno.

> +	the shell to execute the specified command instead of
> +	linkgit:git-for-each-ref[1]. The first argument is the path of the alternate.

"The path" meaning the absolute path?  Relative to the original
object store?  Something else?

> +	Output must be of the form: `%(objectname) SPC %(refname)`.
> ++
> +This is useful when a repository only wishes to advertise some of its
> +alternate's references as ".have"'s. For example, to only advertise branch
> +heads, configure `core.alternateRefsCommand` to the path of a script which runs
> +`git --git-dir="$1" for-each-ref refs/heads`.
> +
>  core.bare::
>  	If true this repository is assumed to be 'bare' and has no
>  	working directory associated with it.  If this is the case a
> diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> new file mode 100755
> index 0000000000..2f21f1cb8f
> --- /dev/null
> +++ b/t/t5410-receive-pack.sh
> @@ -0,0 +1,54 @@
> +#!/bin/sh
> +
> +test_description='git receive-pack test'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +	test_commit one &&
> +	git update-ref refs/heads/a HEAD &&
> +	test_commit two &&
> +	git update-ref refs/heads/b HEAD &&
> +	test_commit three &&
> +	git update-ref refs/heads/c HEAD &&
> +	git clone --bare . fork &&
> +	git clone fork pusher &&
> +	(
> +		cd fork &&
> +		git config receive.advertisealternates true &&

Hmph.  Do we have code to support this configuration variable?

> +		cat <<-EOF | git update-ref --stdin &&

Style: writing "<<-\EOF" instead would allow readers' eyes to
coast over without having to look for $variable_references in
the here-doc.

> +		delete refs/heads/a
> +		delete refs/heads/b
> +		delete refs/heads/c
> +		delete refs/heads/master
> +		delete refs/tags/one
> +		delete refs/tags/two
> +		delete refs/tags/three

So, the original created one/two/three/a/b/c/master, fork is a bare
clone of it and has all these things, and then you deleted all of
these?  What does fork have after this is done?  HEAD that is
dangling?

> +		EOF
> +		echo "../../.git/objects" >objects/info/alternates

When viewed from fork/objects, ../../.git is the GIT_DIR of the
primary test repository, so that is where we borrow objects from.

If we pruned the objects from fork's object store before this echo,
we would have an almost empty repository that borrows from its
alternates everything, which may make a more realistic sample case,
but because you are only focusing on the ref advertisement, it does
not matter that your fork is full of duplicate objects that are
available from the alternates.

> +expect_haves () {
> +	printf "%s .have\n" $(git rev-parse $@) >expect

Quote $@ inside dq pair, like $(git rev-parse "$@").

> +extract_haves () {
> +	depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
> +}


Don't pipe grep into sed, especially when both the pattern to filter
and the operation to perform are simple.

I am not sure what you are trying to achive with 'g' in
s/pattern$//g; The anchor at the rightmost end of the pattern makes
sure that the pattern matches only once per line at the end anyway,
so "do this howmanyever times as we have match on each line" would
not make any difference, no?

> diff --git a/transport.c b/transport.c
> index 24ae3f375d..e7d2cdf00b 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -1328,10 +1328,21 @@ char *transport_anonymize_url(const char *url)
>  static void fill_alternate_refs_command(struct child_process *cmd,
>  					const char *repo_path)
>  {
> -	cmd->git_cmd = 1;
> -	argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
> -	argv_array_push(&cmd->args, "for-each-ref");
> -	argv_array_push(&cmd->args, "--format=%(objectname) %(refname)");
> +	const char *value;
> +
> +	if (!git_config_get_value("core.alternateRefsCommand", &value)) {
> +		cmd->use_shell = 1;
> +
> +		argv_array_push(&cmd->args, value);
> +		argv_array_push(&cmd->args, repo_path);
> +	} else {
> +		cmd->git_cmd = 1;
> +
> +		argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
> +		argv_array_push(&cmd->args, "for-each-ref");
> +		argv_array_push(&cmd->args, "--format=%(objectname) %(refname)");
> +	}
> +
>  	cmd->env = local_repo_env;
>  	cmd->out = -1;
>  }
Eric Sunshine Sept. 21, 2018, 9:10 p.m. UTC | #3
On Fri, Sep 21, 2018 at 2:47 PM Taylor Blau <me@ttaylorr.com> wrote:
> When in a repository containing one or more alternates, Git would
> sometimes like to list references from its alternates. For example, 'git
> receive-pack' list the objects pointed to by alternate references as
> special ".have" references.
> [...]
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> @@ -0,0 +1,54 @@
> +expect_haves () {
> +       printf "%s .have\n" $(git rev-parse $@) >expect
> +}
> +
> +test_expect_success 'with core.alternateRefsCommand' '
> +       [...]
> +       expect_haves a c >expect &&

This is not great. Both the caller of expect_haves() and
expect_haves() itself redirect to a file named "expect". This works,
but only by accident.

Better would be to make expect_haves() simply a generator to stdout
and let the caller redirect to the file rather than hardcoding the
filename in the function itself (much as extract_haves() takes it its
input on stdin rather than hardcoding a filename). If you take this
approach, then you'd probably want to rename the function, as well;
perhaps call it emit_haves() or something.

> +       printf "0000" | git receive-pack fork >actual &&
> +       extract_haves <actual >actual.haves &&
> +       test_cmp expect actual.haves
> +'
Jeff King Sept. 21, 2018, 10:13 p.m. UTC | #4
On Fri, Sep 21, 2018 at 02:09:08PM -0700, Junio C Hamano wrote:

> > +test_expect_success 'setup' '
> > +	test_commit one &&
> > +	git update-ref refs/heads/a HEAD &&
> > +	test_commit two &&
> > +	git update-ref refs/heads/b HEAD &&
> > +	test_commit three &&
> > +	git update-ref refs/heads/c HEAD &&
> > +	git clone --bare . fork &&
> > +	git clone fork pusher &&
> > +	(
> > +		cd fork &&
> > +		git config receive.advertisealternates true &&
> 
> Hmph.  Do we have code to support this configuration variable?

Sorry, I should have caught that. Our existing solution is to disable
alternates in the advertisement entirely (since the optimization
backfires for us). So this line is a leftover from testing it against
our fork, and should be dropped.

If anybody is interested, we can share those patches, though they're
unsurprisingly trivial. I suspect we may end up discarding them if this
custom-command thing works, but it's possible we'll still need to be
able to shut them off completely for some truly pathological cases.

> > +		cat <<-EOF | git update-ref --stdin &&
> 
> Style: writing "<<-\EOF" instead would allow readers' eyes to
> coast over without having to look for $variable_references in
> the here-doc.

Also, useless-use-of-cat in the original, which could be:

  git update-ref --stdin <<-\EOF

> [...]

Yeah, I second all the other bits you mentioned.

-Peff
Junio C Hamano Sept. 21, 2018, 10:23 p.m. UTC | #5
Jeff King <peff@peff.net> writes:

> On Fri, Sep 21, 2018 at 02:09:08PM -0700, Junio C Hamano wrote:
>
>> > +test_expect_success 'setup' '
>> > +	test_commit one &&
>> > +	git update-ref refs/heads/a HEAD &&
>> > +	test_commit two &&
>> > +	git update-ref refs/heads/b HEAD &&
>> > +	test_commit three &&
>> > +	git update-ref refs/heads/c HEAD &&
>> > +	git clone --bare . fork &&
>> > +	git clone fork pusher &&
>> > +	(
>> > +		cd fork &&
>> > +		git config receive.advertisealternates true &&
>> 
>> Hmph.  Do we have code to support this configuration variable?
>
> Sorry, I should have caught that. Our existing solution is to disable
> alternates in the advertisement entirely (since the optimization
> backfires for us). So this line is a leftover from testing it against
> our fork, and should be dropped.
>
> If anybody is interested, we can share those patches, though they're
> unsurprisingly trivial.

Heh, I guessed correctly what is going on ;-)

Even though there may not be much interest in the "all-or-none"
boolean configuration, in order to upstream this custom thing, it
may be the cleanest to upstream that all-or-none thing as well.
Otherwise, you'd need to keep a patch to this test script that is
private for your "all-or-none" feature.  That's your maintenance
burden so it ultimately is your call ;-)
> Also, useless-use-of-cat in the original, which could be:
>
>   git update-ref --stdin <<-\EOF

Yup.

Thanks.
Jeff King Sept. 21, 2018, 10:27 p.m. UTC | #6
On Fri, Sep 21, 2018 at 03:23:40PM -0700, Junio C Hamano wrote:

> >> > +		git config receive.advertisealternates true &&
> >> 
> >> Hmph.  Do we have code to support this configuration variable?
> >
> > Sorry, I should have caught that. Our existing solution is to disable
> > alternates in the advertisement entirely (since the optimization
> > backfires for us). So this line is a leftover from testing it against
> > our fork, and should be dropped.
> >
> > If anybody is interested, we can share those patches, though they're
> > unsurprisingly trivial.
> 
> Heh, I guessed correctly what is going on ;-)
> 
> Even though there may not be much interest in the "all-or-none"
> boolean configuration, in order to upstream this custom thing, it
> may be the cleanest to upstream that all-or-none thing as well.
> Otherwise, you'd need to keep a patch to this test script that is
> private for your "all-or-none" feature.  That's your maintenance
> burden so it ultimately is your call ;-)

Easy one-liners in test scripts are the least of my ongoing maintenance
burden. ;)

I think in this case, though, the line is not even necessary, as our
patches leave the default as "true" (which is certainly what we would
want upstream, as well, for compatibility).

-Peff
brian m. carlson Sept. 22, 2018, 6:02 p.m. UTC | #7
On Fri, Sep 21, 2018 at 02:47:43PM -0400, Taylor Blau wrote:
> +expect_haves () {
> +	printf "%s .have\n" $(git rev-parse $@) >expect
> +}
> +
> +extract_haves () {
> +	depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'

It looks like you're trying to match a NUL here in the sed expression,
but from my reading of it, POSIX doesn't permit BREs to match NUL.

Perhaps someone can come up with a better solution, but I'd write this
as the following:

  depacketize - | perl -ne 'next unless /\.have/; s/\0.*$//g; print'
Jeff King Sept. 22, 2018, 7:52 p.m. UTC | #8
On Sat, Sep 22, 2018 at 06:02:31PM +0000, brian m. carlson wrote:

> On Fri, Sep 21, 2018 at 02:47:43PM -0400, Taylor Blau wrote:
> > +expect_haves () {
> > +	printf "%s .have\n" $(git rev-parse $@) >expect
> > +}
> > +
> > +extract_haves () {
> > +	depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
> 
> It looks like you're trying to match a NUL here in the sed expression,
> but from my reading of it, POSIX doesn't permit BREs to match NUL.

No, it's trying to literally match backslash followed by 0. The
depacketize() script will have undone the NUL already. In perl, no less,
making it more or less equivalent to your suggestion. ;)

So I think this is fine (modulo that the grep and sed can be combined).
Yet another option would be to simply strip away everything except the
object id (which is all we care about), like:

  depacketize | perl -lne '/^(\S+) \.have/ and print $1'

Or the equivalent in sed. I am happy with any solution that does the
correct thing.

-Peff
brian m. carlson Sept. 23, 2018, 2:53 p.m. UTC | #9
On Sat, Sep 22, 2018 at 03:52:58PM -0400, Jeff King wrote:
> On Sat, Sep 22, 2018 at 06:02:31PM +0000, brian m. carlson wrote:
> 
> > On Fri, Sep 21, 2018 at 02:47:43PM -0400, Taylor Blau wrote:
> > > +expect_haves () {
> > > +	printf "%s .have\n" $(git rev-parse $@) >expect
> > > +}
> > > +
> > > +extract_haves () {
> > > +	depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
> > 
> > It looks like you're trying to match a NUL here in the sed expression,
> > but from my reading of it, POSIX doesn't permit BREs to match NUL.
> 
> No, it's trying to literally match backslash followed by 0. The
> depacketize() script will have undone the NUL already. In perl, no less,
> making it more or less equivalent to your suggestion. ;)

Ah, okay.  That makes more sense.

> So I think this is fine (modulo that the grep and sed can be combined).
> Yet another option would be to simply strip away everything except the
> object id (which is all we care about), like:
> 
>   depacketize | perl -lne '/^(\S+) \.have/ and print $1'
> 
> Or the equivalent in sed. I am happy with any solution that does the
> correct thing.

Yeah, I agree that with that context, no change is needed.
Taylor Blau Sept. 26, 2018, 12:59 a.m. UTC | #10
On Fri, Sep 21, 2018 at 04:18:03PM -0400, Eric Sunshine wrote:
> On Fri, Sep 21, 2018 at 2:47 PM Taylor Blau <me@ttaylorr.com> wrote:
> > When in a repository containing one or more alternates, Git would
> > sometimes like to list references from its alternates. For example, 'git
> > receive-pack' list the objects pointed to by alternate references as
> > special ".have" references.
> > [...]
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> > @@ -0,0 +1,54 @@
> > +expect_haves () {
> > +       printf "%s .have\n" $(git rev-parse $@) >expect
> > +}
>
> Magic quoting behavior only kicks in when $@ is itself quoted, so this
> should be:
>
>     printf "%s .have\n" $(git rev-parse "$@") >expect
>
> However, as it's unlikely that you need magic quoting in this case,
> you might get by with plain $* (unquoted).

Yep, thanks for catching my mistake. I rewrote my local copy with "$@"
(instead of $@), and also applied your suggestion of not redirecting to
`>expect`, and renaming the function.

These both ended up becoming moot points, though, because of the
Perl-ism that Peff suggested and I adopted throughout this thread.

The Perl Peff wrote does not capture the " .have" suffix at all, and
instead only the object identifiers. Hence, all we really need is a call
to 'git-rev-parse(1)'. I doubt that this will ever change, so I removed
the function entirely.

Thanks,
Taylor
Taylor Blau Sept. 26, 2018, 1:06 a.m. UTC | #11
On Fri, Sep 21, 2018 at 02:09:08PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > +core.alternateRefsCommand::
> > +	When listing references from an alternate (e.g., in the case of ".have"), use
>
> It is not clear how (e.g.,...) connects to what is said in the
> sentence.  "When advertising tips of available history from an
> alternate, use ..." without saying ".have" may be less cryptic.
>
> I dunno.

Thanks, I think that I tend to overuse both "e.g.," and "i.e.,". I took
your suggestion as above, which I think looks better than my original
prose.

> > +	the shell to execute the specified command instead of
> > +	linkgit:git-for-each-ref[1]. The first argument is the path of the alternate.
>
> "The path" meaning the absolute path?  Relative to the original
> object store?  Something else?

It's the absolute path, and I've updated the documentation to clarify it
as such.

> > +	Output must be of the form: `%(objectname) SPC %(refname)`.
> > ++
> > +This is useful when a repository only wishes to advertise some of its
> > +alternate's references as ".have"'s. For example, to only advertise branch
> > +heads, configure `core.alternateRefsCommand` to the path of a script which runs
> > +`git --git-dir="$1" for-each-ref refs/heads`.
> > +
> >  core.bare::
> >  	If true this repository is assumed to be 'bare' and has no
> >  	working directory associated with it.  If this is the case a
> > diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
> > new file mode 100755
> > index 0000000000..2f21f1cb8f
> > --- /dev/null
> > +++ b/t/t5410-receive-pack.sh
> > @@ -0,0 +1,54 @@
> > +#!/bin/sh
> > +
> > +test_description='git receive-pack test'
> > +
> > +. ./test-lib.sh
> > +
> > +test_expect_success 'setup' '
> > +	test_commit one &&
> > +	git update-ref refs/heads/a HEAD &&
> > +	test_commit two &&
> > +	git update-ref refs/heads/b HEAD &&
> > +	test_commit three &&
> > +	git update-ref refs/heads/c HEAD &&
> > +	git clone --bare . fork &&
> > +	git clone fork pusher &&
> > +	(
> > +		cd fork &&
> > +		git config receive.advertisealternates true &&
>
> Hmph.  Do we have code to support this configuration variable?

We don't ;-). Peff's explanation of why is accurate, and the mistake is
mine.

> > +		cat <<-EOF | git update-ref --stdin &&
>
> Style: writing "<<-\EOF" instead would allow readers' eyes to
> coast over without having to look for $variable_references in
> the here-doc.
>
> > +		delete refs/heads/a
> > +		delete refs/heads/b
> > +		delete refs/heads/c
> > +		delete refs/heads/master
> > +		delete refs/tags/one
> > +		delete refs/tags/two
> > +		delete refs/tags/three

Thanks, it ended up being much cleaner to write <<-\EOF, and avoid the
unnecessary cat(1) entirely.

> So, the original created one/two/three/a/b/c/master, fork is a bare
> clone of it and has all these things, and then you deleted all of
> these?  What does fork have after this is done?  HEAD that is
> dangling?
>
> > +		EOF
> > +		echo "../../.git/objects" >objects/info/alternates
>
> When viewed from fork/objects, ../../.git is the GIT_DIR of the
> primary test repository, so that is where we borrow objects from.
>
> If we pruned the objects from fork's object store before this echo,
> we would have an almost empty repository that borrows from its
> alternates everything, which may make a more realistic sample case,
> but because you are only focusing on the ref advertisement, it does
> not matter that your fork is full of duplicate objects that are
> available from the alternates.

I could go either way. You're right in that we have only a dangling HEAD
reference in the fork, and that all of the objects are still there. I
suppose that we could gc the objects that are there, but I think (as you
note above) that it doesn't make a huge difference either way.

> > +expect_haves () {
> > +	printf "%s .have\n" $(git rev-parse $@) >expect
>
> Quote $@ inside dq pair, like $(git rev-parse "$@").

Thanks, I fixed this (per your and Eric's suggestion), but ended up
removing the function entirely anyway.

> > +extract_haves () {
> > +	depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
> > +}
>
> Don't pipe grep into sed, especially when both the pattern to filter
> and the operation to perform are simple.
>
> I am not sure what you are trying to achive with 'g' in
> s/pattern$//g; The anchor at the rightmost end of the pattern makes
> sure that the pattern matches only once per line at the end anyway,
> so "do this howmanyever times as we have match on each line" would
> not make any difference, no?

I admit to not fully understanding when the trailing `/g` is and is not
useful. Anyway, I took Peff's suggestion below to convert this 'grep |
sed' pipeline into a Perl invocation, which I think ended up much
cleaner.

Thanks,
Taylor
Taylor Blau Sept. 26, 2018, 1:09 a.m. UTC | #12
On Sat, Sep 22, 2018 at 03:52:58PM -0400, Jeff King wrote:
> On Sat, Sep 22, 2018 at 06:02:31PM +0000, brian m. carlson wrote:
>
> > On Fri, Sep 21, 2018 at 02:47:43PM -0400, Taylor Blau wrote:
> > > +expect_haves () {
> > > +	printf "%s .have\n" $(git rev-parse $@) >expect
> > > +}
> > > +
> > > +extract_haves () {
> > > +	depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
> >
> > It looks like you're trying to match a NUL here in the sed expression,
> > but from my reading of it, POSIX doesn't permit BREs to match NUL.
>
> No, it's trying to literally match backslash followed by 0. The
> depacketize() script will have undone the NUL already. In perl, no less,
> making it more or less equivalent to your suggestion. ;)
>
> So I think this is fine (modulo that the grep and sed can be combined).
> Yet another option would be to simply strip away everything except the
> object id (which is all we care about), like:
>
>   depacketize | perl -lne '/^(\S+) \.have/ and print $1'

Thanks for this. This is the suggestion I ended up taking (modulo taking
'-' as the first argument to 'depacketize').

The 'print $1' part of this makes things a lot nicer, actually, having
removed the " .have" suffix. We can get rid of the expect_haves()
function above, and instead call 'git rev-parse' inline and get the
right results.

> Or the equivalent in sed. I am happy with any solution that does the
> correct thing.

Me too :-). Thanks again.

Thanks,
Taylor
Jeff King Sept. 26, 2018, 3:21 a.m. UTC | #13
On Tue, Sep 25, 2018 at 06:06:06PM -0700, Taylor Blau wrote:

> > > +extract_haves () {
> > > +	depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
> > > +}
> >
> > Don't pipe grep into sed, especially when both the pattern to filter
> > and the operation to perform are simple.
> >
> > I am not sure what you are trying to achive with 'g' in
> > s/pattern$//g; The anchor at the rightmost end of the pattern makes
> > sure that the pattern matches only once per line at the end anyway,
> > so "do this howmanyever times as we have match on each line" would
> > not make any difference, no?
> 
> I admit to not fully understanding when the trailing `/g` is and is not
> useful. Anyway, I took Peff's suggestion below to convert this 'grep |
> sed' pipeline into a Perl invocation, which I think ended up much
> cleaner.

It makes the replacement global in the line. Without we substitute only
the first match. So try:

  echo foo | sed s/o/X/

versus:

  echo foo | sed s/o/X/g

-Peff
Jeff King Sept. 26, 2018, 3:33 a.m. UTC | #14
On Tue, Sep 25, 2018 at 06:09:35PM -0700, Taylor Blau wrote:

> > So I think this is fine (modulo that the grep and sed can be combined).
> > Yet another option would be to simply strip away everything except the
> > object id (which is all we care about), like:
> >
> >   depacketize | perl -lne '/^(\S+) \.have/ and print $1'
> 
> Thanks for this. This is the suggestion I ended up taking (modulo taking
> '-' as the first argument to 'depacketize').

I don't think depacketize takes any arguments. It always reads from
stdin directly, doesn't it? Your "-" is not hurting anything, but it is
totally ignored.

A perl tangent if you're interested:

  Normally for shell functions like this that are just wrappers around
  perl snippets, I would suggest to pass "$@" from the function's
  arguments to perl. So for example if we had:

    haves_from_packets () {
	perl -lne '/^(\S+) \.have/ and print $1' "$@"
    }

  then you could call it with a filename:

    haves_from_packets packets

  or input on stdin:

    haves_from_packets <packets

  and either works (this is magic from perl's "-p" loop, but you get the
  same if you write "while (<>)" explicitly in your program).

  But because depacketize() has to use byte-wise read() calls, it
  doesn't get that magic for free. And it did not seem worth the effort
  to implement, when shell redirections are so easy. ;)

  Just skimming through test-lib-functions.sh, though, it does seem that
  we often deviate from that pattern (e.g., all of the q_to_nul family).
  And has seemed to mind.

> The 'print $1' part of this makes things a lot nicer, actually, having
> removed the " .have" suffix. We can get rid of the expect_haves()
> function above, and instead call 'git rev-parse' inline and get the
> right results.

Yes. You can even do it all in a single rev-parse call.

-Peff
Taylor Blau Sept. 26, 2018, 1:39 p.m. UTC | #15
On Tue, Sep 25, 2018 at 11:33:37PM -0400, Jeff King wrote:
> On Tue, Sep 25, 2018 at 06:09:35PM -0700, Taylor Blau wrote:
>
> > > So I think this is fine (modulo that the grep and sed can be combined).
> > > Yet another option would be to simply strip away everything except the
> > > object id (which is all we care about), like:
> > >
> > >   depacketize | perl -lne '/^(\S+) \.have/ and print $1'
> >
> > Thanks for this. This is the suggestion I ended up taking (modulo taking
> > '-' as the first argument to 'depacketize').
>
> I don't think depacketize takes any arguments. It always reads from
> stdin directly, doesn't it? Your "-" is not hurting anything, but it is
> totally ignored.

Yep, certainly. I think that I was drawn to this claim because I watched
t5410 fail after applying the above recommendation, so thusly assumed
that it was my fault for not passing `-` to 'depacketize()`.

In the end, I'm not sure why the test failed originally (it's likely
that I hadn't removed the ".have" part of 'expect_haves()', yet). But, I
removed the `-` in my local copy of v3, and the tests passes on all
revisions of this series that have it.

> A perl tangent if you're interested:
>
>   Normally for shell functions like this that are just wrappers around
>   perl snippets, I would suggest to pass "$@" from the function's
>   arguments to perl. So for example if we had:
>
>     haves_from_packets () {
> 	perl -lne '/^(\S+) \.have/ and print $1' "$@"
>     }
>
>   then you could call it with a filename:
>
>     haves_from_packets packets
>
>   or input on stdin:
>
>     haves_from_packets <packets
>
>   and either works (this is magic from perl's "-p" loop, but you get the
>   same if you write "while (<>)" explicitly in your program).
>
>   But because depacketize() has to use byte-wise read() calls, it
>   doesn't get that magic for free. And it did not seem worth the effort
>   to implement, when shell redirections are so easy. ;)

To be clear, we ought to leave this function as:

  extract_haves () {
    depacketize | perl -lne '/^(\S+) \.have/ and print $1'
  }

Or are you suggesting that we change it to:

  extract_haves () {
    perl -lne '/^(\S+) \.have/ and print $1'
  }

And call it as:

  printf "0000" | git receive-pack fork >actual &&
  depacketize <actual >actual.packets
  extract_haves <actual.packets >actual.haves &&

Frankly, (and I think that this is what you're getting at in your reply
above), I think that the former (e.g., calling 'depacketize()' in
'extract_haves()') is cleaner. This approach leaves us with "actual" and
"actual.haves", and obviates the need for another intermediary,
"actual.packets".

> > The 'print $1' part of this makes things a lot nicer, actually, having
> > removed the " .have" suffix. We can get rid of the expect_haves()
> > function above, and instead call 'git rev-parse' inline and get the
> > right results.
>
> Yes. You can even do it all in a single rev-parse call.

Indeed.

Thanks,
Taylor
Jeff King Sept. 26, 2018, 6:38 p.m. UTC | #16
On Wed, Sep 26, 2018 at 06:39:56AM -0700, Taylor Blau wrote:

> > A perl tangent if you're interested:
> [...]
> 
> To be clear, we ought to leave this function as:
> 
>   extract_haves () {
>     depacketize | perl -lne '/^(\S+) \.have/ and print $1'
>   }

Yes, I agree. You cannot do the "$@" there because it relies on
depacketize, which only handles stdin.

> Or are you suggesting that we change it to:
> 
>   extract_haves () {
>     perl -lne '/^(\S+) \.have/ and print $1'
>   }

No, sorry. I just used the ".have" snippet as filler text, but I see
that muddied my meaning considerably. This really was just a tangent for
the future. What you've written above is the best thing for this case.

> And call it as:
> 
>   printf "0000" | git receive-pack fork >actual &&
>   depacketize <actual >actual.packets
>   extract_haves <actual.packets >actual.haves &&
> 
> Frankly, (and I think that this is what you're getting at in your reply
> above), I think that the former (e.g., calling 'depacketize()' in
> 'extract_haves()') is cleaner. This approach leaves us with "actual" and
> "actual.haves", and obviates the need for another intermediary,
> "actual.packets".

Yeah. I have no problem with the three-liner you wrote above, but I do
not see any particular reason for it.

-Peff
Taylor Blau Sept. 28, 2018, 2:39 a.m. UTC | #17
On Wed, Sep 26, 2018 at 02:38:53PM -0400, Jeff King wrote:
> On Wed, Sep 26, 2018 at 06:39:56AM -0700, Taylor Blau wrote:
>
> > > A perl tangent if you're interested:
> > [...]
> >
> > To be clear, we ought to leave this function as:
> >
> >   extract_haves () {
> >     depacketize | perl -lne '/^(\S+) \.have/ and print $1'
> >   }
>
> Yes, I agree. You cannot do the "$@" there because it relies on
> depacketize, which only handles stdin.
>
> > Or are you suggesting that we change it to:
> >
> >   extract_haves () {
> >     perl -lne '/^(\S+) \.have/ and print $1'
> >   }
>
> No, sorry. I just used the ".have" snippet as filler text, but I see
> that muddied my meaning considerably. This really was just a tangent for
> the future. What you've written above is the best thing for this case.

I see, and I had assumed that you meant the later, not that including
" .have" was a good way to go forward. So I think that we're in
agreement here.

> > And call it as:
> >
> >   printf "0000" | git receive-pack fork >actual &&
> >   depacketize <actual >actual.packets
> >   extract_haves <actual.packets >actual.haves &&
> >
> > Frankly, (and I think that this is what you're getting at in your reply
> > above), I think that the former (e.g., calling 'depacketize()' in
> > 'extract_haves()') is cleaner. This approach leaves us with "actual" and
> > "actual.haves", and obviates the need for another intermediary,
> > "actual.packets".
>
> Yeah. I have no problem with the three-liner you wrote above, but I do
> not see any particular reason for it.

Good. That's the version that I'll send shortly, then.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 112041f407..526557e494 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -616,6 +616,17 @@  core.preferSymlinkRefs::
 	This is sometimes needed to work with old scripts that
 	expect HEAD to be a symbolic link.
 
+core.alternateRefsCommand::
+	When listing references from an alternate (e.g., in the case of ".have"), use
+	the shell to execute the specified command instead of
+	linkgit:git-for-each-ref[1]. The first argument is the path of the alternate.
+	Output must be of the form: `%(objectname) SPC %(refname)`.
++
+This is useful when a repository only wishes to advertise some of its
+alternate's references as ".have"'s. For example, to only advertise branch
+heads, configure `core.alternateRefsCommand` to the path of a script which runs
+`git --git-dir="$1" for-each-ref refs/heads`.
+
 core.bare::
 	If true this repository is assumed to be 'bare' and has no
 	working directory associated with it.  If this is the case a
diff --git a/t/t5410-receive-pack.sh b/t/t5410-receive-pack.sh
new file mode 100755
index 0000000000..2f21f1cb8f
--- /dev/null
+++ b/t/t5410-receive-pack.sh
@@ -0,0 +1,54 @@ 
+#!/bin/sh
+
+test_description='git receive-pack test'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_commit one &&
+	git update-ref refs/heads/a HEAD &&
+	test_commit two &&
+	git update-ref refs/heads/b HEAD &&
+	test_commit three &&
+	git update-ref refs/heads/c HEAD &&
+	git clone --bare . fork &&
+	git clone fork pusher &&
+	(
+		cd fork &&
+		git config receive.advertisealternates true &&
+		cat <<-EOF | git update-ref --stdin &&
+		delete refs/heads/a
+		delete refs/heads/b
+		delete refs/heads/c
+		delete refs/heads/master
+		delete refs/tags/one
+		delete refs/tags/two
+		delete refs/tags/three
+		EOF
+		echo "../../.git/objects" >objects/info/alternates
+	)
+'
+
+expect_haves () {
+	printf "%s .have\n" $(git rev-parse $@) >expect
+}
+
+extract_haves () {
+	depacketize - | grep '\.have' | sed -e 's/\\0.*$//g'
+}
+
+test_expect_success 'with core.alternateRefsCommand' '
+	write_script fork/alternate-refs <<-\EOF &&
+		git --git-dir="$1" for-each-ref \
+			--format="%(objectname) %(refname)" \
+			refs/heads/a \
+			refs/heads/c
+	EOF
+	test_config -C fork core.alternateRefsCommand alternate-refs &&
+	expect_haves a c >expect &&
+	printf "0000" | git receive-pack fork >actual &&
+	extract_haves <actual >actual.haves &&
+	test_cmp expect actual.haves
+'
+
+test_done
diff --git a/transport.c b/transport.c
index 24ae3f375d..e7d2cdf00b 100644
--- a/transport.c
+++ b/transport.c
@@ -1328,10 +1328,21 @@  char *transport_anonymize_url(const char *url)
 static void fill_alternate_refs_command(struct child_process *cmd,
 					const char *repo_path)
 {
-	cmd->git_cmd = 1;
-	argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
-	argv_array_push(&cmd->args, "for-each-ref");
-	argv_array_push(&cmd->args, "--format=%(objectname) %(refname)");
+	const char *value;
+
+	if (!git_config_get_value("core.alternateRefsCommand", &value)) {
+		cmd->use_shell = 1;
+
+		argv_array_push(&cmd->args, value);
+		argv_array_push(&cmd->args, repo_path);
+	} else {
+		cmd->git_cmd = 1;
+
+		argv_array_pushf(&cmd->args, "--git-dir=%s", repo_path);
+		argv_array_push(&cmd->args, "for-each-ref");
+		argv_array_push(&cmd->args, "--format=%(objectname) %(refname)");
+	}
+
 	cmd->env = local_repo_env;
 	cmd->out = -1;
 }