diff mbox series

[v4,03/10] trailer: teach iterator about non-trailer lines

Message ID 4aeb48050b14e44ec65cfa651a4d98587a6cd860.1714625668.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit 3be65e6ee2f585a0aad0363c8ce7d966a6f8c2b3
Headers show
Series Make trailer_info struct private (plus sequencer cleanup) | expand

Commit Message

Linus Arver May 2, 2024, 4:54 a.m. UTC
From: Linus Arver <linus@ucla.edu>

Previously the iterator did not iterate over non-trailer lines. This was
somewhat unfortunate, because trailer blocks could have non-trailer
lines in them since 146245063e (trailer: allow non-trailers in trailer
block, 2016-10-21), which was before the iterator was created in
f0939a0eb1 (trailer: add interface for iterating over commit trailers,
2020-09-27).

So if trailer API users wanted to iterate over all lines in a trailer
block (including non-trailer lines), they could not use the iterator and
were forced to use the lower-level trailer_info struct directly (which
provides a raw string array that includes all lines in the trailer
block).

Change the iterator's behavior so that we also iterate over non-trailer
lines, instead of skipping over them. The new "raw" member of the
iterator allows API users to access previously inaccessible non-trailer
lines. Reword the variable "trailer" to just "line" because this
variable can now hold both trailer lines _and_ non-trailer lines.

The new "raw" member is important because anyone currently not using the
iterator is using trailer_info's raw string array directly to access
lines to check what the combined key + value looks like. If we didn't
provide a "raw" member here, iterator users would have to re-construct
the unparsed line by concatenating the key and value back together again
--- which places an undue burden for iterator users.

The next commit demonstrates the use of the iterator in sequencer.c as an
example of where "raw" will be useful, so that it can start using the
iterator.

For the existing use of the iterator in builtin/shortlog.c, we don't
have to change the code there because that code does

    trailer_iterator_init(&iter, body);
    while (trailer_iterator_advance(&iter)) {
        const char *value = iter.val.buf;

        if (!string_list_has_string(&log->trailers, iter.key.buf))
            continue;

        ...

and the

        if (!string_list_has_string(&log->trailers, iter.key.buf))

condition already skips over non-trailer lines (iter.key.buf is empty
for non-trailer lines, making the comparison still work even with this
commit).

Rename "num_expected_trailers" to "num_expected" in
t/unit-tests/t-trailer.c because the items we iterate over now include
non-trailer lines.

Signed-off-by: Linus Arver <linus@ucla.edu>
---
 t/unit-tests/t-trailer.c | 16 +++++++++++-----
 trailer.c                | 12 +++++-------
 trailer.h                |  7 +++++++
 3 files changed, 23 insertions(+), 12 deletions(-)

Comments

Phillip Wood May 4, 2024, 3:33 p.m. UTC | #1
Hi Linus

Sorry I'm late to the party here I've left a couple of thoughts below 
but I don't want to derail this series if everyone else is happy.

On 02/05/2024 05:54, Linus Arver via GitGitGadget wrote:
> From: Linus Arver <linus@ucla.edu>
> 
> Previously the iterator did not iterate over non-trailer lines. This was
> somewhat unfortunate, because trailer blocks could have non-trailer
> lines in them since 146245063e (trailer: allow non-trailers in trailer
> block, 2016-10-21), which was before the iterator was created in
> f0939a0eb1 (trailer: add interface for iterating over commit trailers,
> 2020-09-27).
> 
> So if trailer API users wanted to iterate over all lines in a trailer
> block (including non-trailer lines), they could not use the iterator and
> were forced to use the lower-level trailer_info struct directly (which
> provides a raw string array that includes all lines in the trailer
> block).
> 
> Change the iterator's behavior so that we also iterate over non-trailer
> lines, instead of skipping over them. The new "raw" member of the
> iterator allows API users to access previously inaccessible non-trailer
> lines. Reword the variable "trailer" to just "line" because this
> variable can now hold both trailer lines _and_ non-trailer lines.
> 
> The new "raw" member is important because anyone currently not using the
> iterator is using trailer_info's raw string array directly to access
> lines to check what the combined key + value looks like. If we didn't
> provide a "raw" member here, iterator users would have to re-construct
> the unparsed line by concatenating the key and value back together again
> --- which places an undue burden for iterator users.

Comparing the raw line is error prone as it ignores custom separators 
and variations in the amount of space between the key and the value. 
Therefore I'd argue that the sequencer should in fact be comparing the 
trailer key and value separately rather than comparing the whole line. 
There is an issue that we want to add a new Signed-off-by: trailer for 
"C.O. Mitter" when the trailers look like

	Signed-off-by: C.O. Mitter <c.o.mitter@example.com>
	non-trailer-line

but not when they look like

	Signed-off-by: C.O. Mitter <c.o.mitter@example.com>

so we still need some way of indicating that there was a non-trailer 
line after the last trailer though.

> The next commit demonstrates the use of the iterator in sequencer.c as an
> example of where "raw" will be useful, so that it can start using the
> iterator.
> 
> For the existing use of the iterator in builtin/shortlog.c, we don't
> have to change the code there because that code does

An interface that lets the caller pass a flag if they want to know about 
non-trailer lines might be easier to use for the callers that don't want 
to worry about such lines and wouldn't need a justification as to why it 
was safe for existing callers.

Best Wishes

Phillip

>      trailer_iterator_init(&iter, body);
>      while (trailer_iterator_advance(&iter)) {
>          const char *value = iter.val.buf;
> 
>          if (!string_list_has_string(&log->trailers, iter.key.buf))
>              continue;
> 
>          ...
> 
> and the
> 
>          if (!string_list_has_string(&log->trailers, iter.key.buf))
> 
> condition already skips over non-trailer lines (iter.key.buf is empty
> for non-trailer lines, making the comparison still work even with this
> commit).
> 
> Rename "num_expected_trailers" to "num_expected" in
> t/unit-tests/t-trailer.c because the items we iterate over now include
> non-trailer lines.
> 
> Signed-off-by: Linus Arver <linus@ucla.edu>
> ---
>   t/unit-tests/t-trailer.c | 16 +++++++++++-----
>   trailer.c                | 12 +++++-------
>   trailer.h                |  7 +++++++
>   3 files changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/t/unit-tests/t-trailer.c b/t/unit-tests/t-trailer.c
> index c1f897235c7..4f640d2a4b8 100644
> --- a/t/unit-tests/t-trailer.c
> +++ b/t/unit-tests/t-trailer.c
> @@ -1,7 +1,7 @@
>   #include "test-lib.h"
>   #include "trailer.h"
>   
> -static void t_trailer_iterator(const char *msg, size_t num_expected_trailers)
> +static void t_trailer_iterator(const char *msg, size_t num_expected)
>   {
>   	struct trailer_iterator iter;
>   	size_t i = 0;
> @@ -11,7 +11,7 @@ static void t_trailer_iterator(const char *msg, size_t num_expected_trailers)
>   		i++;
>   	trailer_iterator_release(&iter);
>   
> -	check_uint(i, ==, num_expected_trailers);
> +	check_uint(i, ==, num_expected);
>   }
>   
>   static void run_t_trailer_iterator(void)
> @@ -19,7 +19,7 @@ static void run_t_trailer_iterator(void)
>   	static struct test_cases {
>   		const char *name;
>   		const char *msg;
> -		size_t num_expected_trailers;
> +		size_t num_expected;
>   	} tc[] = {
>   		{
>   			"empty input",
> @@ -119,7 +119,13 @@ static void run_t_trailer_iterator(void)
>   			"not a trailer line\n"
>   			"not a trailer line\n"
>   			"Signed-off-by: x\n",
> -			1
> +			/*
> +			 * Even though there is only really 1 real "trailer"
> +			 * (Signed-off-by), we still have 4 trailer objects
> +			 * because we still want to iterate through the entire
> +			 * block.
> +			 */
> +			4
>   		},
>   		{
>   			"with non-trailer lines (one too many) in trailer block",
> @@ -162,7 +168,7 @@ static void run_t_trailer_iterator(void)
>   
>   	for (int i = 0; i < sizeof(tc) / sizeof(tc[0]); i++) {
>   		TEST(t_trailer_iterator(tc[i].msg,
> -					tc[i].num_expected_trailers),
> +					tc[i].num_expected),
>   		     "%s", tc[i].name);
>   	}
>   }
> diff --git a/trailer.c b/trailer.c
> index 3e4dab9c065..4700c441442 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -1146,17 +1146,15 @@ void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
>   
>   int trailer_iterator_advance(struct trailer_iterator *iter)
>   {
> -	while (iter->internal.cur < iter->internal.info.trailer_nr) {
> -		char *trailer = iter->internal.info.trailers[iter->internal.cur++];
> -		int separator_pos = find_separator(trailer, separators);
> -
> -		if (separator_pos < 1)
> -			continue; /* not a real trailer */
> +	if (iter->internal.cur < iter->internal.info.trailer_nr) {
> +		char *line = iter->internal.info.trailers[iter->internal.cur++];
> +		int separator_pos = find_separator(line, separators);
>   
> +		iter->raw = line;
>   		strbuf_reset(&iter->key);
>   		strbuf_reset(&iter->val);
>   		parse_trailer(&iter->key, &iter->val, NULL,
> -			      trailer, separator_pos);
> +			      line, separator_pos);
>   		/* Always unfold values during iteration. */
>   		unfold_value(&iter->val);
>   		return 1;
> diff --git a/trailer.h b/trailer.h
> index 9f42aa75994..7e36da7d13c 100644
> --- a/trailer.h
> +++ b/trailer.h
> @@ -125,6 +125,13 @@ void format_trailers_from_commit(const struct process_trailer_options *,
>    *   trailer_iterator_release(&iter);
>    */
>   struct trailer_iterator {
> +	/*
> +	 * Raw line (e.g., "foo: bar baz") before being parsed as a trailer
> +	 * key/val pair as part of a trailer block (as the "key" and "val"
> +	 * fields below). If a line fails to parse as a trailer, then the "key"
> +	 * will be the entire line and "val" will be the empty string.
> +	 */
> +	const char *raw;
>   	struct strbuf key;
>   	struct strbuf val;
>
Linus Arver May 5, 2024, 1:37 a.m. UTC | #2
Phillip Wood <phillip.wood123@gmail.com> writes:

> Hi Linus
>
> Sorry I'm late to the party here I've left a couple of thoughts below
> but I don't want to derail this series if everyone else is happy.

Hi Phillip, no problem.

> On 02/05/2024 05:54, Linus Arver via GitGitGadget wrote:
>> From: Linus Arver <linus@ucla.edu>
>>
>> Previously the iterator did not iterate over non-trailer lines. This was
>> somewhat unfortunate, because trailer blocks could have non-trailer
>> lines in them since 146245063e (trailer: allow non-trailers in trailer
>> block, 2016-10-21), which was before the iterator was created in
>> f0939a0eb1 (trailer: add interface for iterating over commit trailers,
>> 2020-09-27).
>>
>> So if trailer API users wanted to iterate over all lines in a trailer
>> block (including non-trailer lines), they could not use the iterator and
>> were forced to use the lower-level trailer_info struct directly (which
>> provides a raw string array that includes all lines in the trailer
>> block).
>>
>> Change the iterator's behavior so that we also iterate over non-trailer
>> lines, instead of skipping over them. The new "raw" member of the
>> iterator allows API users to access previously inaccessible non-trailer
>> lines. Reword the variable "trailer" to just "line" because this
>> variable can now hold both trailer lines _and_ non-trailer lines.
>>
>> The new "raw" member is important because anyone currently not using the
>> iterator is using trailer_info's raw string array directly to access
>> lines to check what the combined key + value looks like. If we didn't
>> provide a "raw" member here, iterator users would have to re-construct
>> the unparsed line by concatenating the key and value back together again
>> --- which places an undue burden for iterator users.
>
> Comparing the raw line is error prone as it ignores custom separators
> and variations in the amount of space between the key and the value.
> Therefore I'd argue that the sequencer should in fact be comparing the
> trailer key and value separately rather than comparing the whole line.

I agree, but that is likely beyond the scope of this series as the
behavior of comparing the whole line was preserved (not introduced) by
this series.

For reference, the "Signed-off-by: " is hardcoded in "sign_off_header"
in sequencer.c, and it is again hardcoded in "git_generated_prefixes" in
trailer.c. We always use the hardcoded key and colon ":" separator in a
few areas, so changing the code to be more precise to check for only the
key (to account for variability in the separator and space around it as
you pointed out) would be a more involved change (I think many tests
would need to be updated).

> There is an issue that we want to add a new Signed-off-by: trailer for
> "C.O. Mitter" when the trailers look like
>
> 	Signed-off-by: C.O. Mitter <c.o.mitter@example.com>
> 	non-trailer-line
>
> but not when they look like
>
> 	Signed-off-by: C.O. Mitter <c.o.mitter@example.com>
>
> so we still need some way of indicating that there was a non-trailer
> line after the last trailer though.

What is the issue, exactly? Also can you clarify if the issue is
introduced by this series (did you spot a regression)?

>> The next commit demonstrates the use of the iterator in sequencer.c as an
>> example of where "raw" will be useful, so that it can start using the
>> iterator.
>>
>> For the existing use of the iterator in builtin/shortlog.c, we don't
>> have to change the code there because that code does
>
> An interface that lets the caller pass a flag if they want to know about
> non-trailer lines might be easier to use for the callers that don't want
> to worry about such lines and wouldn't need a justification as to why it
> was safe for existing callers.

Makes sense. But perhaps such API enhancements belong in a future
series, when other callers that need such flexibility could benefit from
it?

> Best Wishes
>
> Phillip
>
>>      trailer_iterator_init(&iter, body);
>>      while (trailer_iterator_advance(&iter)) {
>>          const char *value = iter.val.buf;
>>
>>          if (!string_list_has_string(&log->trailers, iter.key.buf))
>>              continue;
>>
>>          ...
>>
>> and the
>>
>>          if (!string_list_has_string(&log->trailers, iter.key.buf))
>>
>> condition already skips over non-trailer lines (iter.key.buf is empty
>> for non-trailer lines, making the comparison still work even with this
>> commit).
>>
>> Rename "num_expected_trailers" to "num_expected" in
>> t/unit-tests/t-trailer.c because the items we iterate over now include
>> non-trailer lines.
>>
>> Signed-off-by: Linus Arver <linus@ucla.edu>
>> [...]
Phillip Wood May 5, 2024, 2:09 p.m. UTC | #3
Hi Linus

On 05/05/2024 02:37, Linus Arver wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
>> On 02/05/2024 05:54, Linus Arver via GitGitGadget wrote:
>>> From: Linus Arver <linus@ucla.edu>
>>>
>>> The new "raw" member is important because anyone currently not using the
>>> iterator is using trailer_info's raw string array directly to access
>>> lines to check what the combined key + value looks like. If we didn't
>>> provide a "raw" member here, iterator users would have to re-construct
>>> the unparsed line by concatenating the key and value back together again
>>> --- which places an undue burden for iterator users.
>>
>> Comparing the raw line is error prone as it ignores custom separators
>> and variations in the amount of space between the key and the value.
>> Therefore I'd argue that the sequencer should in fact be comparing the
>> trailer key and value separately rather than comparing the whole line.
> 
> I agree, but that is likely beyond the scope of this series as the
> behavior of comparing the whole line was preserved (not introduced) by
> this series.

Right but this series is changing the trailer iterator api to 
accommodate the sub-optimal sequencer code. My thought was that if the 
sequencer did the right thing we wouldn't need to expose the raw line in 
the iterator in the first place.

> For reference, the "Signed-off-by: " is hardcoded in "sign_off_header"
> in sequencer.c, and it is again hardcoded in "git_generated_prefixes" in
> trailer.c. We always use the hardcoded key and colon ":" separator in a
> few areas, so changing the code to be more precise to check for only the
> key (to account for variability in the separator and space around it as
> you pointed out) would be a more involved change (I think many tests
> would need to be updated).

So the worry is that we'd create a "Signed-off-by: " trailer that we 
then couldn't parse because the user didn't have ':' in trailer.separators?

>> There is an issue that we want to add a new Signed-off-by: trailer for
>> "C.O. Mitter" when the trailers look like
>>
>> 	Signed-off-by: C.O. Mitter <c.o.mitter@example.com>
>> 	non-trailer-line
>>
>> but not when they look like
>>
>> 	Signed-off-by: C.O. Mitter <c.o.mitter@example.com>
>>
>> so we still need some way of indicating that there was a non-trailer
>> line after the last trailer though.
> 
> What is the issue, exactly? Also can you clarify if the issue is
> introduced by this series (did you spot a regression)?

There is no regression - the issue is with my suggestion. We only want 
to add an SOB trailer if the last trailer does not match the SOB we're 
adding. If we were to use the existing trailer iterator api in the 
sequencer we would not know that we should add an SOB in the first 
example above as we'd only see the last trailer which matches the SOB 
we're trying to add. We'd still need some way to tell the caller that 
there was a non-trailer line following the last trailer.

>>> The next commit demonstrates the use of the iterator in sequencer.c as an
>>> example of where "raw" will be useful, so that it can start using the
>>> iterator.
>>>
>>> For the existing use of the iterator in builtin/shortlog.c, we don't
>>> have to change the code there because that code does
>>
>> An interface that lets the caller pass a flag if they want to know about
>> non-trailer lines might be easier to use for the callers that don't want
>> to worry about such lines and wouldn't need a justification as to why it
>> was safe for existing callers.
> 
> Makes sense. But perhaps such API enhancements belong in a future
> series, when other callers that need such flexibility could benefit from
> it?

For me the main benefit would be that you don't have to spend time 
explaining why the changes are safe for existing callers because they 
would keep the existing iterator behavor.

Best Wishes

Phillip
Linus Arver May 9, 2024, 7:11 a.m. UTC | #4
Phillip Wood <phillip.wood123@gmail.com> writes:

Sorry for the delay.

> Hi Linus
>
> On 05/05/2024 02:37, Linus Arver wrote:
>> Phillip Wood <phillip.wood123@gmail.com> writes:
>>> On 02/05/2024 05:54, Linus Arver via GitGitGadget wrote:
>>>> From: Linus Arver <linus@ucla.edu>
>>>>
>>>> The new "raw" member is important because anyone currently not using the
>>>> iterator is using trailer_info's raw string array directly to access
>>>> lines to check what the combined key + value looks like. If we didn't
>>>> provide a "raw" member here, iterator users would have to re-construct
>>>> the unparsed line by concatenating the key and value back together again
>>>> --- which places an undue burden for iterator users.
>>>
>>> Comparing the raw line is error prone as it ignores custom separators
>>> and variations in the amount of space between the key and the value.
>>> Therefore I'd argue that the sequencer should in fact be comparing the
>>> trailer key and value separately rather than comparing the whole line.
>>
>> I agree, but that is likely beyond the scope of this series as the
>> behavior of comparing the whole line was preserved (not introduced) by
>> this series.
>
> Right but this series is changing the trailer iterator api to
> accommodate the sub-optimal sequencer code. My thought was that if the
> sequencer did the right thing we wouldn't need to expose the raw line in
> the iterator in the first place.

Well, having familiarized myself with the trailer machinery I was more
comfortable updating this area than reworking the details of the
sequencer code.

To me the sequencer code is a bit hard to read so I feel more
comfortable updating the trailer code as I did here in this series. I
also have another 40, 50 patches in the trailer area I want to continue
pushing up for review, so I would rather focus on that first before
coming back to the sequencer to try to clean it up. My other trailer
patches make the trailer API more precise and aware of the separator and
spaces around it, so using those richer interfaces later would make it
easier to clean up the sequencer area, I think.

So in summary I would rather not get into refactoring the sequencer at
this time.

>> For reference, the "Signed-off-by: " is hardcoded in "sign_off_header"
>> in sequencer.c, and it is again hardcoded in "git_generated_prefixes" in
>> trailer.c. We always use the hardcoded key and colon ":" separator in a
>> few areas, so changing the code to be more precise to check for only the
>> key (to account for variability in the separator and space around it as
>> you pointed out) would be a more involved change (I think many tests
>> would need to be updated).
>
> So the worry is that we'd create a "Signed-off-by: " trailer that we
> then couldn't parse because the user didn't have ':' in trailer.separators?

Even if the user currently doesn't have ':' in trailer.separators, we
still hardcode it in as a separator. So the trailer.separators setting
doesn't matter.

The worry is that any further refactorings of sequencer would be quickly
obsoleted by the to-be-reviewed patches which enrich the trailer API
further which are sitting in my local branch.

>>> There is an issue that we want to add a new Signed-off-by: trailer for
>>> "C.O. Mitter" when the trailers look like
>>>
>>> 	Signed-off-by: C.O. Mitter <c.o.mitter@example.com>
>>> 	non-trailer-line
>>>
>>> but not when they look like
>>>
>>> 	Signed-off-by: C.O. Mitter <c.o.mitter@example.com>
>>>
>>> so we still need some way of indicating that there was a non-trailer
>>> line after the last trailer though.
>>
>> What is the issue, exactly? Also can you clarify if the issue is
>> introduced by this series (did you spot a regression)?
>
> There is no regression - the issue is with my suggestion. We only want
> to add an SOB trailer if the last trailer does not match the SOB we're
> adding.

If there is no regression then I don't understand the concern.

> If we were to use the existing trailer iterator api in the
> sequencer we would not know that we should add an SOB in the first
> example above as we'd only see the last trailer which matches the SOB
> we're trying to add.

Hmm, both the original code and the code in this patch iterate over
non-trailer lines. So the behavior is the same.

> We'd still need some way to tell the caller that
> there was a non-trailer line following the last trailer.

FWIW in one of the patches I already have currently (to be sent in a
future series), I expand the trailer API to let the caller check if the
current iteration is on a trailer or non-trailer object (they can do the
check by looking into the key and value). And in another patch I make it
so that the key field is never populated if the line is a non-trailer
line. So the capability you seek is achievable with those patches.

>>>> The next commit demonstrates the use of the iterator in sequencer.c as an
>>>> example of where "raw" will be useful, so that it can start using the
>>>> iterator.
>>>>
>>>> For the existing use of the iterator in builtin/shortlog.c, we don't
>>>> have to change the code there because that code does
>>>
>>> An interface that lets the caller pass a flag if they want to know about
>>> non-trailer lines might be easier to use for the callers that don't want
>>> to worry about such lines and wouldn't need a justification as to why it
>>> was safe for existing callers.
>>
>> Makes sense. But perhaps such API enhancements belong in a future
>> series, when other callers that need such flexibility could benefit from
>> it?
>
> For me the main benefit would be that you don't have to spend time
> explaining why the changes are safe for existing callers because they
> would keep the existing iterator behavor.

But, I've already written the explanation so this justification seems a bit
moot, no?

But ultimately I think it makes sense for the iterator to be able to
iterate over non-trailer lines because that just brings more power to
the callers that need or want it. The sequencer code is such an example
--- whether it is suboptimal or not is a separate matter, I think
(indeed I did not look too much into why the sequencer stuff wanted to
iterate over non-trailer lines when I was writing this patch).

So in summary, I'd prefer to keep this series as is. We can of course
revisit the sequencer code's use of the trailer API in the future. Thanks.

> Best Wishes
>
> Phillip
Phillip Wood May 13, 2024, 3:11 p.m. UTC | #5
Hi Linus

On 09/05/2024 08:11, Linus Arver wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
>> On 05/05/2024 02:37, Linus Arver wrote:
>>> Phillip Wood <phillip.wood123@gmail.com> writes:
>>>> On 02/05/2024 05:54, Linus Arver via GitGitGadget wrote:
>>>>> From: Linus Arver <linus@ucla.edu>
>>>>>
>>>>> The new "raw" member is important because anyone currently not using the
>>>>> iterator is using trailer_info's raw string array directly to access
>>>>> lines to check what the combined key + value looks like. If we didn't
>>>>> provide a "raw" member here, iterator users would have to re-construct
>>>>> the unparsed line by concatenating the key and value back together again
>>>>> --- which places an undue burden for iterator users.
>>>>
>>>> Comparing the raw line is error prone as it ignores custom separators
>>>> and variations in the amount of space between the key and the value.
>>>> Therefore I'd argue that the sequencer should in fact be comparing the
>>>> trailer key and value separately rather than comparing the whole line.
>>>
>>> I agree, but that is likely beyond the scope of this series as the
>>> behavior of comparing the whole line was preserved (not introduced) by
>>> this series.
>>
>> Right but this series is changing the trailer iterator api to
>> accommodate the sub-optimal sequencer code. My thought was that if the
>> sequencer did the right thing we wouldn't need to expose the raw line in
>> the iterator in the first place.
> 
> Well, having familiarized myself with the trailer machinery I was more
> comfortable updating this area than reworking the details of the
> sequencer code.
> 
> To me the sequencer code is a bit hard to read so I feel more
> comfortable updating the trailer code as I did here in this series. I
> also have another 40, 50 patches in the trailer area I want to continue
> pushing up for review, so I would rather focus on that first before
> coming back to the sequencer to try to clean it up. My other trailer
> patches make the trailer API more precise and aware of the separator and
> spaces around it,

I'm a bit confused by this as it looks like the 
trailer_iterator_advance() already respects trailer.separators and 
trims the key and value.

> so using those richer interfaces later would make it
> easier to clean up the sequencer area, I think.
> 
> So in summary I would rather not get into refactoring the sequencer at
> this time.

Far enough, I hoped I might have time over the weekend to look at the 
sequencer code in more detail but that didn't work out. So long as we're 
not building bad abstractions into the trailer iterator to accommodate 
the sequencer we can come back and clean up append_signoff() later.

Best Wishes

Phillip
Phillip Wood May 13, 2024, 3:13 p.m. UTC | #6
On 13/05/2024 16:11, Phillip Wood wrote:
> Far enough,

Sorry that should have been "Fair enough"

Phillip
diff mbox series

Patch

diff --git a/t/unit-tests/t-trailer.c b/t/unit-tests/t-trailer.c
index c1f897235c7..4f640d2a4b8 100644
--- a/t/unit-tests/t-trailer.c
+++ b/t/unit-tests/t-trailer.c
@@ -1,7 +1,7 @@ 
 #include "test-lib.h"
 #include "trailer.h"
 
-static void t_trailer_iterator(const char *msg, size_t num_expected_trailers)
+static void t_trailer_iterator(const char *msg, size_t num_expected)
 {
 	struct trailer_iterator iter;
 	size_t i = 0;
@@ -11,7 +11,7 @@  static void t_trailer_iterator(const char *msg, size_t num_expected_trailers)
 		i++;
 	trailer_iterator_release(&iter);
 
-	check_uint(i, ==, num_expected_trailers);
+	check_uint(i, ==, num_expected);
 }
 
 static void run_t_trailer_iterator(void)
@@ -19,7 +19,7 @@  static void run_t_trailer_iterator(void)
 	static struct test_cases {
 		const char *name;
 		const char *msg;
-		size_t num_expected_trailers;
+		size_t num_expected;
 	} tc[] = {
 		{
 			"empty input",
@@ -119,7 +119,13 @@  static void run_t_trailer_iterator(void)
 			"not a trailer line\n"
 			"not a trailer line\n"
 			"Signed-off-by: x\n",
-			1
+			/*
+			 * Even though there is only really 1 real "trailer"
+			 * (Signed-off-by), we still have 4 trailer objects
+			 * because we still want to iterate through the entire
+			 * block.
+			 */
+			4
 		},
 		{
 			"with non-trailer lines (one too many) in trailer block",
@@ -162,7 +168,7 @@  static void run_t_trailer_iterator(void)
 
 	for (int i = 0; i < sizeof(tc) / sizeof(tc[0]); i++) {
 		TEST(t_trailer_iterator(tc[i].msg,
-					tc[i].num_expected_trailers),
+					tc[i].num_expected),
 		     "%s", tc[i].name);
 	}
 }
diff --git a/trailer.c b/trailer.c
index 3e4dab9c065..4700c441442 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1146,17 +1146,15 @@  void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
 
 int trailer_iterator_advance(struct trailer_iterator *iter)
 {
-	while (iter->internal.cur < iter->internal.info.trailer_nr) {
-		char *trailer = iter->internal.info.trailers[iter->internal.cur++];
-		int separator_pos = find_separator(trailer, separators);
-
-		if (separator_pos < 1)
-			continue; /* not a real trailer */
+	if (iter->internal.cur < iter->internal.info.trailer_nr) {
+		char *line = iter->internal.info.trailers[iter->internal.cur++];
+		int separator_pos = find_separator(line, separators);
 
+		iter->raw = line;
 		strbuf_reset(&iter->key);
 		strbuf_reset(&iter->val);
 		parse_trailer(&iter->key, &iter->val, NULL,
-			      trailer, separator_pos);
+			      line, separator_pos);
 		/* Always unfold values during iteration. */
 		unfold_value(&iter->val);
 		return 1;
diff --git a/trailer.h b/trailer.h
index 9f42aa75994..7e36da7d13c 100644
--- a/trailer.h
+++ b/trailer.h
@@ -125,6 +125,13 @@  void format_trailers_from_commit(const struct process_trailer_options *,
  *   trailer_iterator_release(&iter);
  */
 struct trailer_iterator {
+	/*
+	 * Raw line (e.g., "foo: bar baz") before being parsed as a trailer
+	 * key/val pair as part of a trailer block (as the "key" and "val"
+	 * fields below). If a line fails to parse as a trailer, then the "key"
+	 * will be the entire line and "val" will be the empty string.
+	 */
+	const char *raw;
 	struct strbuf key;
 	struct strbuf val;