diff mbox series

[3/5] refspec: output a refspec item

Message ID e10007e1cf8ff0005295f648b9489c11a9427122.1617627856.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Maintenance: adapt custom refspecs | expand

Commit Message

Derrick Stolee April 5, 2021, 1:04 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

Add a new method, refspec_item_format(), that takes a 'struct
refspec_item' pointer as input and returns a string for how that refspec
item should be written to Git's config or a subcommand, such as 'git
fetch'.

There are several subtleties regarding special-case refspecs that can
occur and are represented in t5511-refspec.sh. These cases will be
explored in new tests in the following change. It requires adding a new
test helper in order to test this format directly, so that is saved for
a separate change to keep this one focused on the logic of the format
method.

A future change will consume this method when translating refspecs in
the 'prefetch' task of the 'git maintenance' builtin.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 refspec.c | 25 +++++++++++++++++++++++++
 refspec.h |  5 +++++
 2 files changed, 30 insertions(+)

Comments

Tom Saeger April 5, 2021, 4:57 p.m. UTC | #1
On Mon, Apr 05, 2021 at 01:04:13PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> Add a new method, refspec_item_format(), that takes a 'struct
> refspec_item' pointer as input and returns a string for how that refspec
> item should be written to Git's config or a subcommand, such as 'git
> fetch'.
> 
> There are several subtleties regarding special-case refspecs that can
> occur and are represented in t5511-refspec.sh. These cases will be
> explored in new tests in the following change. It requires adding a new
> test helper in order to test this format directly, so that is saved for
> a separate change to keep this one focused on the logic of the format
> method.
> 
> A future change will consume this method when translating refspecs in
> the 'prefetch' task of the 'git maintenance' builtin.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  refspec.c | 25 +++++++++++++++++++++++++
>  refspec.h |  5 +++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/refspec.c b/refspec.c
> index e3d852c0bfec..ca65ba01bfe6 100644
> --- a/refspec.c
> +++ b/refspec.c
> @@ -180,6 +180,31 @@ void refspec_item_clear(struct refspec_item *item)
>  	item->exact_sha1 = 0;
>  }
>  
> +const char *refspec_item_format(const struct refspec_item *rsi)
> +{
> +	static struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_reset(&buf);

is this even needed?

> +
> +	if (rsi->matching)
> +		return ":";
> +
> +	if (rsi->negative)
> +		strbuf_addch(&buf, '^');
> +	else if (rsi->force)
> +		strbuf_addch(&buf, '+');
> +
> +	if (rsi->src)
> +		strbuf_addstr(&buf, rsi->src);
> +
> +	if (rsi->dst) {
> +		strbuf_addch(&buf, ':');
> +		strbuf_addstr(&buf, rsi->dst);
> +	}
> +
> +	return buf.buf;

should this be strbuf_detach?

> +}
> +
>  void refspec_init(struct refspec *rs, int fetch)
>  {
>  	memset(rs, 0, sizeof(*rs));
> diff --git a/refspec.h b/refspec.h
> index 8b79891d3218..92a312f5b4e6 100644
> --- a/refspec.h
> +++ b/refspec.h
> @@ -56,6 +56,11 @@ int refspec_item_init(struct refspec_item *item, const char *refspec,
>  void refspec_item_init_or_die(struct refspec_item *item, const char *refspec,
>  			      int fetch);
>  void refspec_item_clear(struct refspec_item *item);
> +/*
> + * Output a given refspec item to a string.
> + */
> +const char *refspec_item_format(const struct refspec_item *rsi);
> +
>  void refspec_init(struct refspec *rs, int fetch);
>  void refspec_append(struct refspec *rs, const char *refspec);
>  __attribute__((format (printf,2,3)))
> -- 
> gitgitgadget
>
Eric Sunshine April 5, 2021, 5:40 p.m. UTC | #2
On Mon, Apr 5, 2021 at 12:58 PM Tom Saeger <tom.saeger@oracle.com> wrote:
> On Mon, Apr 05, 2021 at 01:04:13PM +0000, Derrick Stolee via GitGitGadget wrote:
> > +const char *refspec_item_format(const struct refspec_item *rsi)
> > +{
> > +     static struct strbuf buf = STRBUF_INIT;
> > +
> > +     strbuf_reset(&buf);
>
> is this even needed?

This is needed due to the `static` strbuf declaration (which is easy
to overlook).

> > +     if (rsi->matching)
> > +             return ":";
> > +
> > +     if (rsi->negative)
> > +             strbuf_addch(&buf, '^');
> > +     else if (rsi->force)
> > +             strbuf_addch(&buf, '+');
> > +
> > +     if (rsi->src)
> > +             strbuf_addstr(&buf, rsi->src);
> > +
> > +     if (rsi->dst) {
> > +             strbuf_addch(&buf, ':');
> > +             strbuf_addstr(&buf, rsi->dst);
> > +     }
> > +
> > +     return buf.buf;
>
> should this be strbuf_detach?

In normal circumstances, yes, however, with the `static` strbuf, this
is correct.

However, a more significant question, perhaps, is why this is using a
`static` strbuf in the first place? Does this need to be optimized
because it is on a hot path? If not, then the only obvious reason why
`static` was chosen was that sometimes the function returns a string
literal and sometimes a constructed string. However, that's minor, and
it would feel cleaner to avoid the `static` strbuf altogether by using
strbuf_detach() for the constructed case and xstrdup() for the string
literal case, and making it the caller's responsibility to free the
result. (The comment in the header file would need to be updated to
say as much.)
Junio C Hamano April 5, 2021, 5:44 p.m. UTC | #3
Eric Sunshine <sunshine@sunshineco.com> writes:

> On Mon, Apr 5, 2021 at 12:58 PM Tom Saeger <tom.saeger@oracle.com> wrote:
>> On Mon, Apr 05, 2021 at 01:04:13PM +0000, Derrick Stolee via GitGitGadget wrote:
>> > +const char *refspec_item_format(const struct refspec_item *rsi)
>> > +{
>> > +     static struct strbuf buf = STRBUF_INIT;
>> > +
>> > +     strbuf_reset(&buf);
>>
>> is this even needed?
>
> This is needed due to the `static` strbuf declaration (which is easy
> to overlook).
>
>> > +     if (rsi->matching)
>> > +             return ":";
>> > +
>> > +     if (rsi->negative)
>> > +             strbuf_addch(&buf, '^');
>> > +     else if (rsi->force)
>> > +             strbuf_addch(&buf, '+');
>> > +
>> > +     if (rsi->src)
>> > +             strbuf_addstr(&buf, rsi->src);
>> > +
>> > +     if (rsi->dst) {
>> > +             strbuf_addch(&buf, ':');
>> > +             strbuf_addstr(&buf, rsi->dst);
>> > +     }
>> > +
>> > +     return buf.buf;
>>
>> should this be strbuf_detach?
>
> In normal circumstances, yes, however, with the `static` strbuf, this
> is correct.
>
> However, a more significant question, perhaps, is why this is using a
> `static` strbuf in the first place? Does this need to be optimized
> because it is on a hot path? If not, then the only obvious reason why
> `static` was chosen was that sometimes the function returns a string
> literal and sometimes a constructed string. However, that's minor, and
> it would feel cleaner to avoid the `static` strbuf altogether by using
> strbuf_detach() for the constructed case and xstrdup() for the string
> literal case, and making it the caller's responsibility to free the
> result. (The comment in the header file would need to be updated to
> say as much.)

Very good suggestion.  That would also make this codepath
thread-safe (I do not offhand know how important that is, though).

Thanks.
Derrick Stolee April 6, 2021, 11:21 a.m. UTC | #4
On 4/5/2021 1:44 PM, Junio C Hamano wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> 
>> On Mon, Apr 5, 2021 at 12:58 PM Tom Saeger <tom.saeger@oracle.com> wrote:
>>> On Mon, Apr 05, 2021 at 01:04:13PM +0000, Derrick Stolee via GitGitGadget wrote:
>>>> +const char *refspec_item_format(const struct refspec_item *rsi)
>>>> +{
>>>> +     static struct strbuf buf = STRBUF_INIT;
>>>> +
>>>> +     strbuf_reset(&buf);
>>>
>>> is this even needed?
>>
>> This is needed due to the `static` strbuf declaration (which is easy
>> to overlook).
>>
>>>> +     if (rsi->matching)
>>>> +             return ":";
>>>> +
>>>> +     if (rsi->negative)
>>>> +             strbuf_addch(&buf, '^');
>>>> +     else if (rsi->force)
>>>> +             strbuf_addch(&buf, '+');
>>>> +
>>>> +     if (rsi->src)
>>>> +             strbuf_addstr(&buf, rsi->src);
>>>> +
>>>> +     if (rsi->dst) {
>>>> +             strbuf_addch(&buf, ':');
>>>> +             strbuf_addstr(&buf, rsi->dst);
>>>> +     }
>>>> +
>>>> +     return buf.buf;
>>>
>>> should this be strbuf_detach?
>>
>> In normal circumstances, yes, however, with the `static` strbuf, this
>> is correct.
>>
>> However, a more significant question, perhaps, is why this is using a
>> `static` strbuf in the first place? Does this need to be optimized
>> because it is on a hot path? If not, then the only obvious reason why
>> `static` was chosen was that sometimes the function returns a string
>> literal and sometimes a constructed string. However, that's minor, and
>> it would feel cleaner to avoid the `static` strbuf altogether by using
>> strbuf_detach() for the constructed case and xstrdup() for the string
>> literal case, and making it the caller's responsibility to free the
>> result. (The comment in the header file would need to be updated to
>> say as much.)

Yes, we could get around the return of ":" very easily.

> Very good suggestion.  That would also make this codepath
> thread-safe (I do not offhand know how important that is, though).

I was not intending to make this re-entrant/thread safe. The intention
was to make it easy to consume the formatted string into output such
as a printf without needing to store a temporary 'char *' and free() it
afterwards. This ensures that the only lost memory over the life of the
process is at most one buffer. At minimum, these are things that could
be part of the message to justify this design.

So, I'm torn. This seems like a case where there is value in having
the return buffer be "owned" by this method, and the expected
consumers will use the buffer before calling it again. I'm not sure
how important it is to do this the other way.

Would it be sufficient to justify this choice in the commit message
and comment about it in the method declaration? Or is it worth adding
this templating around every caller:

	char *buf = refspec_item_format(rsi);
	...
	<use 'buf'>
	...
	free(buf);

I don't need much convincing to do this, but I hadn't properly
described my opinion before. Just a small nudge would convince me to
do it this way.

Thanks,
-Stolee
Eric Sunshine April 6, 2021, 3:23 p.m. UTC | #5
On Tue, Apr 6, 2021 at 7:21 AM Derrick Stolee <stolee@gmail.com> wrote:
> I was not intending to make this re-entrant/thread safe. The intention
> was to make it easy to consume the formatted string into output such
> as a printf without needing to store a temporary 'char *' and free() it
> afterwards. This ensures that the only lost memory over the life of the
> process is at most one buffer. At minimum, these are things that could
> be part of the message to justify this design.

This has the failing that it won't work if someone calls it twice in
the same printf() or calls it again before even consuming the first
returned value, so this fails:

    printf("foo: %s\nbar: %s\n",
        refspec_item_format(...),
        refspec_item_format(...));

as does this:

    const char *a = refspec_item_format(...);
    const char *b = refspec_item_format(...);

Historically this project would "work around" that problem by using
rotating static buffers in the function, but we've mostly been moving
away from that for several reasons (can't predict how many buffers
will be needed, re-entrancy, etc.).

> So, I'm torn. This seems like a case where there is value in having
> the return buffer be "owned" by this method, and the expected
> consumers will use the buffer before calling it again. I'm not sure
> how important it is to do this the other way.

If history is any indication, we'd probably end up moving away from
such an API eventually anyhow.

> Would it be sufficient to justify this choice in the commit message
> and comment about it in the method declaration? Or is it worth adding
> this templating around every caller:
>
>         char *buf = refspec_item_format(rsi);
>         ...
>         <use 'buf'>
>         ...
>         free(buf);

An alternative would be to have the caller pass in a strbuf to be
populated by the function. It doesn't reduce the boilerplate needed by
the caller (still need to create and release the strbuf), but may
avoid some memory allocations. But if this isn't a critical path and
won't likely ever be, then passing in strbuf may be overkill.

> I don't need much convincing to do this, but I hadn't properly
> described my opinion before. Just a small nudge would convince me to
> do it this way.

For the reasons described above and earlier in the thread, avoiding
the static buffer seems the best course of action.
Derrick Stolee April 6, 2021, 4:51 p.m. UTC | #6
On 4/6/2021 11:23 AM, Eric Sunshine wrote:
> On Tue, Apr 6, 2021 at 7:21 AM Derrick Stolee <stolee@gmail.com> wrote:
>> I was not intending to make this re-entrant/thread safe. The intention
>> was to make it easy to consume the formatted string into output such
>> as a printf without needing to store a temporary 'char *' and free() it
>> afterwards. This ensures that the only lost memory over the life of the
>> process is at most one buffer. At minimum, these are things that could
>> be part of the message to justify this design.
> 
> This has the failing that it won't work if someone calls it twice in
> the same printf() or calls it again before even consuming the first
> returned value, so this fails:
> 
>     printf("foo: %s\nbar: %s\n",
>         refspec_item_format(...),
>         refspec_item_format(...));
> 
> as does this:
> 
>     const char *a = refspec_item_format(...);
>     const char *b = refspec_item_format(...);
> 
> Historically this project would "work around" that problem by using
> rotating static buffers in the function, but we've mostly been moving
> away from that for several reasons (can't predict how many buffers
> will be needed, re-entrancy, etc.).
> 
>> So, I'm torn. This seems like a case where there is value in having
>> the return buffer be "owned" by this method, and the expected
>> consumers will use the buffer before calling it again. I'm not sure
>> how important it is to do this the other way.
> 
> If history is any indication, we'd probably end up moving away from
> such an API eventually anyhow.
> 
>> Would it be sufficient to justify this choice in the commit message
>> and comment about it in the method declaration? Or is it worth adding
>> this templating around every caller:
>>
>>         char *buf = refspec_item_format(rsi);
>>         ...
>>         <use 'buf'>
>>         ...
>>         free(buf);
> 
> An alternative would be to have the caller pass in a strbuf to be
> populated by the function. It doesn't reduce the boilerplate needed by
> the caller (still need to create and release the strbuf), but may
> avoid some memory allocations. But if this isn't a critical path and
> won't likely ever be, then passing in strbuf may be overkill.
> 
>> I don't need much convincing to do this, but I hadn't properly
>> described my opinion before. Just a small nudge would convince me to
>> do it this way.
> 
> For the reasons described above and earlier in the thread, avoiding
> the static buffer seems the best course of action.

OK, convinced. I'll return a string that must be freed in my
next version. Thanks!

-Stolee
Ævar Arnfjörð Bjarmason April 7, 2021, 8:46 a.m. UTC | #7
On Mon, Apr 05 2021, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Add a new method, refspec_item_format(), that takes a 'struct
> refspec_item' pointer as input and returns a string for how that refspec
> item should be written to Git's config or a subcommand, such as 'git
> fetch'.
>
> There are several subtleties regarding special-case refspecs that can
> occur and are represented in t5511-refspec.sh. These cases will be
> explored in new tests in the following change. It requires adding a new
> test helper in order to test this format directly, so that is saved for
> a separate change to keep this one focused on the logic of the format
> method.
>
> A future change will consume this method when translating refspecs in
> the 'prefetch' task of the 'git maintenance' builtin.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  refspec.c | 25 +++++++++++++++++++++++++
>  refspec.h |  5 +++++
>  2 files changed, 30 insertions(+)
>
> diff --git a/refspec.c b/refspec.c
> index e3d852c0bfec..ca65ba01bfe6 100644
> --- a/refspec.c
> +++ b/refspec.c
> @@ -180,6 +180,31 @@ void refspec_item_clear(struct refspec_item *item)
>  	item->exact_sha1 = 0;
>  }
>  
> +const char *refspec_item_format(const struct refspec_item *rsi)
> +{
> +	static struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_reset(&buf);
> +
> +	if (rsi->matching)
> +		return ":";
> +
> +	if (rsi->negative)
> +		strbuf_addch(&buf, '^');
> +	else if (rsi->force)
> +		strbuf_addch(&buf, '+');
> +
> +	if (rsi->src)
> +		strbuf_addstr(&buf, rsi->src);
> +
> +	if (rsi->dst) {
> +		strbuf_addch(&buf, ':');
> +		strbuf_addstr(&buf, rsi->dst);
> +	}
> +
> +	return buf.buf;

There's a downthread discussion about the strbuf usage here so that's
covered.

But I'm still confused about the need for this function and the
following two patches. If we apply this on top of your series:
    
    diff --git a/t/helper/test-refspec.c b/t/helper/test-refspec.c
    index 08cf441a0a0..9e099e43ebf 100644
    --- a/t/helper/test-refspec.c
    +++ b/t/helper/test-refspec.c
    @@ -31,7 +31,7 @@ int cmd__refspec(int argc, const char **argv)
                            continue;
                    }
    
    -               printf("%s\n", refspec_item_format(&rsi));
    +               puts(line.buf);
                    refspec_item_clear(&rsi);
            }

The only failing test is:
    
    + diff -u expect output
    --- expect      2021-04-07 08:12:05.577598038 +0000
    +++ output      2021-04-07 08:12:05.577598038 +0000
    @@ -11,5 +11,5 @@
     refs/heads*/for-linus:refs/remotes/mine/*
     2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48:refs/heads/fixed
     HEAD
    -HEAD
    +@
     :

Other than that applying this on top makes everything pass:
    
    diff --git a/builtin/gc.c b/builtin/gc.c
    index 92cb8b4e0bf..ea5572d15c5 100644
    --- a/builtin/gc.c
    +++ b/builtin/gc.c
    @@ -889,35 +889,21 @@ static int fetch_remote(struct remote *remote, void *cbdata)
     		strvec_push(&child.args, "--quiet");
     
     	for (i = 0; i < remote->fetch.nr; i++) {
    -		struct refspec_item replace;
     		struct refspec_item *rsi = &remote->fetch.items[i];
    -		struct strbuf new_dst = STRBUF_INIT;
    -		size_t ignore_len = 0;
    +		struct strbuf new_spec = STRBUF_INIT;
    +		char *pos;
     
     		if (rsi->negative) {
     			strvec_push(&child.args, remote->fetch.raw[i]);
     			continue;
     		}
     
    -		refspec_item_init(&replace, remote->fetch.raw[i], 1);
    -
    -		/*
    -		 * If a refspec dst starts with "refs/" at the start,
    -		 * then we will replace "refs/" with "refs/prefetch/".
    -		 * Otherwise, we will prepend the dst string with
    -		 * "refs/prefetch/".
    -		 */
    -		if (!strncmp(replace.dst, "refs/", 5))
    -			ignore_len = 5;
    -
    -		strbuf_addstr(&new_dst, "refs/prefetch/");
    -		strbuf_addstr(&new_dst, replace.dst + ignore_len);
    -		free(replace.dst);
    -		replace.dst = strbuf_detach(&new_dst, NULL);
    -
    -		strvec_push(&child.args, refspec_item_format(&replace));
    -
    -		refspec_item_clear(&replace);
    +		strbuf_addstr(&new_spec, remote->fetch.raw[i]);
    +		if ((pos = strrchr(new_spec.buf, ':')) != NULL)
    +			strbuf_splice(&new_spec, pos - new_spec.buf + 1, sizeof("refs/") - 1,
    +				      "refs/prefetch/", sizeof("refs/prefetch/") - 1);
    +		strvec_push(&child.args, new_spec.buf);
    +		strbuf_release(&new_spec);
     	}
     
     	return !!run_command(&child);

So the purpose of this new API is that we don't want to make the
assumption that strrchr(buf, ':') is a safe way to find the delimiter in
the refspec, or is there some case where we grok "HEAD" but not "@"
that's buggy, but not tested for in this series?
Derrick Stolee April 7, 2021, 8:53 p.m. UTC | #8
On 4/7/2021 4:46 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Apr 05 2021, Derrick Stolee via GitGitGadget wrote:
>> +	return buf.buf;
> 
> There's a downthread discussion about the strbuf usage here so that's
> covered.

And it's fixed in v2.

> But I'm still confused about the need for this function and the
> following two patches. If we apply this on top of your series:
>     
>     diff --git a/t/helper/test-refspec.c b/t/helper/test-refspec.c
>     index 08cf441a0a0..9e099e43ebf 100644
>     --- a/t/helper/test-refspec.c
>     +++ b/t/helper/test-refspec.c
>     @@ -31,7 +31,7 @@ int cmd__refspec(int argc, const char **argv)
>                             continue;
>                     }
>     
>     -               printf("%s\n", refspec_item_format(&rsi));
>     +               puts(line.buf);
>                     refspec_item_clear(&rsi);
>             }
> 
> The only failing test is:
>     
>     + diff -u expect output
>     --- expect      2021-04-07 08:12:05.577598038 +0000
>     +++ output      2021-04-07 08:12:05.577598038 +0000
>     @@ -11,5 +11,5 @@
>      refs/heads*/for-linus:refs/remotes/mine/*
>      2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48:refs/heads/fixed
>      HEAD
>     -HEAD
>     +@
>      :

It should be obvious that taking refspecs as input, parsing them,
then reformatting them for output should be almost equivalent to
printing the input line.

The point is to exercise the logic that actually formats the
refspec for output. The test-tool clearly does this.

The logic for converting a 'struct refspec_item' to a string is
non-trivial and worth testing. I don't understand why you are
concerned that the black-box of the test-tool could be done
more easily to "trick" the test script.

> So the purpose of this new API is that we don't want to make the
> assumption that strrchr(buf, ':') is a safe way to find the delimiter in
> the refspec, or is there some case where we grok "HEAD" but not "@"
> that's buggy, but not tested for in this series?

The purpose is to allow us to modify a 'struct refspec_item' andproduce a refspec string instead of munging a refspec string
directly.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason April 7, 2021, 10:05 p.m. UTC | #9
On Wed, Apr 07 2021, Derrick Stolee wrote:

> On 4/7/2021 4:46 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Apr 05 2021, Derrick Stolee via GitGitGadget wrote:
>>> +	return buf.buf;
>> 
>> There's a downthread discussion about the strbuf usage here so that's
>> covered.
>
> And it's fixed in v2.
>
>> But I'm still confused about the need for this function and the
>> following two patches. If we apply this on top of your series:
>>     
>>     diff --git a/t/helper/test-refspec.c b/t/helper/test-refspec.c
>>     index 08cf441a0a0..9e099e43ebf 100644
>>     --- a/t/helper/test-refspec.c
>>     +++ b/t/helper/test-refspec.c
>>     @@ -31,7 +31,7 @@ int cmd__refspec(int argc, const char **argv)
>>                             continue;
>>                     }
>>     
>>     -               printf("%s\n", refspec_item_format(&rsi));
>>     +               puts(line.buf);
>>                     refspec_item_clear(&rsi);
>>             }
>> 
>> The only failing test is:
>>     
>>     + diff -u expect output
>>     --- expect      2021-04-07 08:12:05.577598038 +0000
>>     +++ output      2021-04-07 08:12:05.577598038 +0000
>>     @@ -11,5 +11,5 @@
>>      refs/heads*/for-linus:refs/remotes/mine/*
>>      2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48:refs/heads/fixed
>>      HEAD
>>     -HEAD
>>     +@
>>      :
>
> It should be obvious that taking refspecs as input, parsing them,
> then reformatting them for output should be almost equivalent to
> printing the input line.
>
> The point is to exercise the logic that actually formats the
> refspec for output. The test-tool clearly does this.
>
> The logic for converting a 'struct refspec_item' to a string is
> non-trivial and worth testing. I don't understand why you are
> concerned that the black-box of the test-tool could be done
> more easily to "trick" the test script.

Yes, but why do we need to convert it to a struct refspec_item in the
first place?

Maybe I'm just overly comfortable with string munging but I think the
smaller patch-on-top to use strbuf_splice() is simpler than adding a new
API just for this use-case.

But I'm still wondering if that @ v.s. HEAD case is something this
series actually needs in its end goal (but then has a missing test?), or
if it was just a "let's test the guts of the refspec.c while we're at
it".

>> So the purpose of this new API is that we don't want to make the
>> assumption that strrchr(buf, ':') is a safe way to find the delimiter in
>> the refspec, or is there some case where we grok "HEAD" but not "@"
>> that's buggy, but not tested for in this series?
>
> The purpose is to allow us to modify a 'struct refspec_item' andproduce a refspec string instead of munging a refspec string
> directly.

But aren't we doing that all over the place, e.g. the grep results for
"refspec_appendf". Even for things purely constructed on the C API level
we pass a const char* now.

I'm not saying it wouldn't be nice to have the refspec.c API changed to
have a clear delimitation between its const char* handling, and C-level
uses which could construct and pass a "struct refspec_item" instead.

But is it *needed* here in a way that I've missed, or is this just a
partial testing/refactoring of that API while we're at it?

[Guessing ahead here because of our TZ difference]:

It seems to me that if this is such a partial refactoring it's a strange
way to go about it.

We're left with freeing/munging the "struct refspec" src/dst in-place
and re-constructing a string that has "+" etc., but we already had that
data in parse_refspec() just before we'd call
refspec_item_format(). That function could then just spew out a
pre-formatted string we'd squirreled away in "struct refspec_item".

If the lengthy paragraph you have at the end of 4/5 holds true, then
such an internal representation doesn't need to have the "refs/" prefix
stores as a const char* (in cases where it's not just "*" or whatever),
no?. We'd then be able to more easily init/copy/munge the refspec for
formatting.
Junio C Hamano April 7, 2021, 10:49 p.m. UTC | #10
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Apr 07 2021, Derrick Stolee wrote:
>
>> The purpose is to allow us to modify a 'struct refspec_item'
>> andproduce a refspec string instead of munging a refspec string
>> directly.

Ouch.  I thought the goal was to take

    [remote "origin"]
	fetch = $src:$dst

let the code that is used in the actual fetching to parse it into
the in-core "refspec_item", and then transform the refspec_item by

 - discarding it if the item does not result in storing in the real
   fetch

 - tweaking $dst side so that it won't touch anywhere outside
   refs/prefetch/ to avoid disturbing end-user's notion of what the
   latest state of the remote ref is.

so that the "parsed" refspec_item is passed to the fetch machinery
without ever having to be converted back to textual form.

Why do we even need to "andproduce a refspec string"?
Ævar Arnfjörð Bjarmason April 7, 2021, 11:01 p.m. UTC | #11
On Thu, Apr 08 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Wed, Apr 07 2021, Derrick Stolee wrote:
>>
>>> The purpose is to allow us to modify a 'struct refspec_item'
>>> andproduce a refspec string instead of munging a refspec string
>>> directly.
>
> Ouch.  I thought the goal was to take
>
>     [remote "origin"]
> 	fetch = $src:$dst
>
> let the code that is used in the actual fetching to parse it into
> the in-core "refspec_item", and then transform the refspec_item by
>
>  - discarding it if the item does not result in storing in the real
>    fetch
>
>  - tweaking $dst side so that it won't touch anywhere outside
>    refs/prefetch/ to avoid disturbing end-user's notion of what the
>    latest state of the remote ref is.
>
> so that the "parsed" refspec_item is passed to the fetch machinery
> without ever having to be converted back to textual form.
>
> Why do we even need to "andproduce a refspec string"?

We're shelling out to "git fetch", but we first munge the refspec on in
"git gc".

But yes, it seems even more straightforward to do away with passing the
refspec at all to "git fetch", and instead pass some (maybe internal
only, and documented as such) "--refspec-dst-prefix=refs/prefetch/"
option to "git fetch".

I.e. get_ref_map() over there is already doing a version of this loop
over "remote->fetch.nr".

So instead of "git gc" doing the loop, then passing all the refspecs on
the command-line, it could tell "git fetch" to do the same munging when
it does the same iteration.

Doing the munging in builtin/gc.c's fetch_remote() just seems like a
relic from when we didn't care what decision builtin/fetch.c made about
refspecs, we always wanted our custom one.
Junio C Hamano April 8, 2021, 7:33 a.m. UTC | #12
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> But yes, it seems even more straightforward to do away with passing the
> refspec at all to "git fetch", and instead pass some (maybe internal
> only, and documented as such) "--refspec-dst-prefix=refs/prefetch/"
> option to "git fetch".
>
> I.e. get_ref_map() over there is already doing a version of this loop
> over "remote->fetch.nr".
>
> So instead of "git gc" doing the loop, then passing all the refspecs on
> the command-line, it could tell "git fetch" to do the same munging when
> it does the same iteration.

That direction makes quite a lot of sense to me.

> Doing the munging in builtin/gc.c's fetch_remote() just seems like a
> relic from when we didn't care what decision builtin/fetch.c made about
> refspecs, we always wanted our custom one.
diff mbox series

Patch

diff --git a/refspec.c b/refspec.c
index e3d852c0bfec..ca65ba01bfe6 100644
--- a/refspec.c
+++ b/refspec.c
@@ -180,6 +180,31 @@  void refspec_item_clear(struct refspec_item *item)
 	item->exact_sha1 = 0;
 }
 
+const char *refspec_item_format(const struct refspec_item *rsi)
+{
+	static struct strbuf buf = STRBUF_INIT;
+
+	strbuf_reset(&buf);
+
+	if (rsi->matching)
+		return ":";
+
+	if (rsi->negative)
+		strbuf_addch(&buf, '^');
+	else if (rsi->force)
+		strbuf_addch(&buf, '+');
+
+	if (rsi->src)
+		strbuf_addstr(&buf, rsi->src);
+
+	if (rsi->dst) {
+		strbuf_addch(&buf, ':');
+		strbuf_addstr(&buf, rsi->dst);
+	}
+
+	return buf.buf;
+}
+
 void refspec_init(struct refspec *rs, int fetch)
 {
 	memset(rs, 0, sizeof(*rs));
diff --git a/refspec.h b/refspec.h
index 8b79891d3218..92a312f5b4e6 100644
--- a/refspec.h
+++ b/refspec.h
@@ -56,6 +56,11 @@  int refspec_item_init(struct refspec_item *item, const char *refspec,
 void refspec_item_init_or_die(struct refspec_item *item, const char *refspec,
 			      int fetch);
 void refspec_item_clear(struct refspec_item *item);
+/*
+ * Output a given refspec item to a string.
+ */
+const char *refspec_item_format(const struct refspec_item *rsi);
+
 void refspec_init(struct refspec *rs, int fetch);
 void refspec_append(struct refspec *rs, const char *refspec);
 __attribute__((format (printf,2,3)))