diff mbox series

[v3,03/10] trailer: teach iterator about non-trailer lines

Message ID 9077d5a315d0d7272266856bf75a75b0a24df91d.1714091170.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Make trailer_info struct private (plus sequencer cleanup) | expand

Commit Message

Linus Arver April 26, 2024, 12:26 a.m. UTC
From: Linus Arver <linusa@google.com>

Previously the iterator did not iterate over non-trailer lines. This was
somewhat unfortunate, because trailer blocks could have non-trailer
lines in them since 146245063e (trailer: allow non-trailers in trailer
block, 2016-10-21), which was before the iterator was created in
f0939a0eb1 (trailer: add interface for iterating over commit trailers,
2020-09-27).

So if trailer API users wanted to iterate over all lines in a trailer
block (including non-trailer lines), they could not use the iterator and
were forced to use the lower-level trailer_info struct directly (which
provides a raw string array that includes all lines in the trailer
block).

Change the iterator's behavior so that we also iterate over non-trailer
lines, instead of skipping over them. The new "raw" member of the
iterator allows API users to access previously inaccessible non-trailer
lines. Reword the variable "trailer" to just "line" because this
variable can now hold both trailer lines _and_ non-trailer lines.

The new "raw" member is important because anyone currently not using the
iterator is using trailer_info's raw string array directly to access
lines to check what the combined key + value looks like. If we didn't
provide a "raw" member here, iterator users would have to re-construct
the unparsed line by concatenating the key and value back together again
--- which places an undue burden for iterator users.

The next commit demonstrates the use of the iterator in sequencer.c as an
example of where "raw" will be useful, so that it can start using the
iterator.

For the existing use of the iterator in builtin/shortlog.c, we don't
have to change the code there because that code does

    trailer_iterator_init(&iter, body);
    while (trailer_iterator_advance(&iter)) {
        const char *value = iter.val.buf;

        if (!string_list_has_string(&log->trailers, iter.key.buf))
            continue;

        ...

and the

        if (!string_list_has_string(&log->trailers, iter.key.buf))

condition already skips over non-trailer lines (iter.key.buf is empty
for non-trailer lines, making the comparison still work even with this
commit).

Rename "num_expected_trailers" to "num_expected_objects" in
t/unit-tests/t-trailer.c because the items we iterate over now include
non-trailer lines.

Signed-off-by: Linus Arver <linusa@google.com>
---
 t/unit-tests/t-trailer.c | 16 +++++++++++-----
 trailer.c                | 12 +++++-------
 trailer.h                |  8 ++++++++
 3 files changed, 24 insertions(+), 12 deletions(-)

Comments

Christian Couder April 27, 2024, 12:50 p.m. UTC | #1
(Sorry I just realized that I had sent this email to Linus only.)

On Fri, Apr 26, 2024 at 2:26 AM Linus Arver via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Linus Arver <linusa@google.com>
>
> Previously the iterator did not iterate over non-trailer lines. This was
> somewhat unfortunate, because trailer blocks could have non-trailer
> lines in them since 146245063e (trailer: allow non-trailers in trailer
> block, 2016-10-21), which was before the iterator was created in
> f0939a0eb1 (trailer: add interface for iterating over commit trailers,
> 2020-09-27).
>
> So if trailer API users wanted to iterate over all lines in a trailer
> block (including non-trailer lines), they could not use the iterator and
> were forced to use the lower-level trailer_info struct directly (which
> provides a raw string array that includes all lines in the trailer
> block).
>
> Change the iterator's behavior so that we also iterate over non-trailer
> lines, instead of skipping over them. The new "raw" member of the
> iterator allows API users to access previously inaccessible non-trailer
> lines. Reword the variable "trailer" to just "line" because this
> variable can now hold both trailer lines _and_ non-trailer lines.
>
> The new "raw" member is important because anyone currently not using the
> iterator is using trailer_info's raw string array directly to access
> lines to check what the combined key + value looks like. If we didn't
> provide a "raw" member here, iterator users would have to re-construct
> the unparsed line by concatenating the key and value back together again
> --- which places an undue burden for iterator users.
>
> The next commit demonstrates the use of the iterator in sequencer.c as an
> example of where "raw" will be useful, so that it can start using the
> iterator.
>
> For the existing use of the iterator in builtin/shortlog.c, we don't
> have to change the code there because that code does
>
>     trailer_iterator_init(&iter, body);
>     while (trailer_iterator_advance(&iter)) {
>         const char *value = iter.val.buf;
>
>         if (!string_list_has_string(&log->trailers, iter.key.buf))
>             continue;
>
>         ...
>
> and the
>
>         if (!string_list_has_string(&log->trailers, iter.key.buf))
>
> condition already skips over non-trailer lines (iter.key.buf is empty
> for non-trailer lines, making the comparison still work even with this
> commit).
>
> Rename "num_expected_trailers" to "num_expected_objects" in
> t/unit-tests/t-trailer.c because the items we iterate over now include
> non-trailer lines.

I think it would be simpler if the previous patch used just
"num_expected" or "expected". It's not like the other fields in the
struct ("msg" and "name") are very explicit, so why this one only?

> Signed-off-by: Linus Arver <linusa@google.com>


> diff --git a/trailer.c b/trailer.c
> index 3e4dab9c065..4700c441442 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -1146,17 +1146,15 @@ void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
>
>  int trailer_iterator_advance(struct trailer_iterator *iter)
>  {
> -       while (iter->internal.cur < iter->internal.info.trailer_nr) {
> -               char *trailer = iter->internal.info.trailers[iter->internal.cur++];
> -               int separator_pos = find_separator(trailer, separators);
> -
> -               if (separator_pos < 1)
> -                       continue; /* not a real trailer */
> +       if (iter->internal.cur < iter->internal.info.trailer_nr) {
> +               char *line = iter->internal.info.trailers[iter->internal.cur++];
> +               int separator_pos = find_separator(line, separators);
>
> +               iter->raw = line;
>                 strbuf_reset(&iter->key);
>                 strbuf_reset(&iter->val);
>                 parse_trailer(&iter->key, &iter->val, NULL,
> -                             trailer, separator_pos);
> +                             line, separator_pos);
>                 /* Always unfold values during iteration. */
>                 unfold_value(&iter->val);
>                 return 1;
> diff --git a/trailer.h b/trailer.h
> index 9f42aa75994..ebafa3657e4 100644
> --- a/trailer.h
> +++ b/trailer.h
> @@ -125,6 +125,14 @@ void format_trailers_from_commit(const struct process_trailer_options *,
>   *   trailer_iterator_release(&iter);
>   */
>  struct trailer_iterator {
> +       /*
> +        * Raw line (e.g., "foo: bar baz") before being parsed as a trailer
> +        * key/val pair as part of a trailer block. A trailer block can be
> +        * either 100% trailer lines, or mixed in with non-trailer lines (in
> +        * which case at least 25% must be trailer lines).

I don't think 25% is important here. What is more important is to just
say that this field could not be an actual trailer, and to tell what
the 'key' and 'val' fields below will contain then.


> +        */
> +       const char *raw;
> +
>         struct strbuf key;
>         struct strbuf val;
Linus Arver April 30, 2024, 4:42 a.m. UTC | #2
Christian Couder <christian.couder@gmail.com> writes:

> (Sorry I just realized that I had sent this email to Linus only.)
>
> On Fri, Apr 26, 2024 at 2:26 AM Linus Arver via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Linus Arver <linusa@google.com>
>>
>> Previously the iterator did not iterate over non-trailer lines. This was
>> somewhat unfortunate, because trailer blocks could have non-trailer
>> lines in them since 146245063e (trailer: allow non-trailers in trailer
>> block, 2016-10-21), which was before the iterator was created in
>> f0939a0eb1 (trailer: add interface for iterating over commit trailers,
>> 2020-09-27).
>>
>> So if trailer API users wanted to iterate over all lines in a trailer
>> block (including non-trailer lines), they could not use the iterator and
>> were forced to use the lower-level trailer_info struct directly (which
>> provides a raw string array that includes all lines in the trailer
>> block).
>>
>> Change the iterator's behavior so that we also iterate over non-trailer
>> lines, instead of skipping over them. The new "raw" member of the
>> iterator allows API users to access previously inaccessible non-trailer
>> lines. Reword the variable "trailer" to just "line" because this
>> variable can now hold both trailer lines _and_ non-trailer lines.
>>
>> The new "raw" member is important because anyone currently not using the
>> iterator is using trailer_info's raw string array directly to access
>> lines to check what the combined key + value looks like. If we didn't
>> provide a "raw" member here, iterator users would have to re-construct
>> the unparsed line by concatenating the key and value back together again
>> --- which places an undue burden for iterator users.
>>
>> The next commit demonstrates the use of the iterator in sequencer.c as an
>> example of where "raw" will be useful, so that it can start using the
>> iterator.
>>
>> For the existing use of the iterator in builtin/shortlog.c, we don't
>> have to change the code there because that code does
>>
>>     trailer_iterator_init(&iter, body);
>>     while (trailer_iterator_advance(&iter)) {
>>         const char *value = iter.val.buf;
>>
>>         if (!string_list_has_string(&log->trailers, iter.key.buf))
>>             continue;
>>
>>         ...
>>
>> and the
>>
>>         if (!string_list_has_string(&log->trailers, iter.key.buf))
>>
>> condition already skips over non-trailer lines (iter.key.buf is empty
>> for non-trailer lines, making the comparison still work even with this
>> commit).
>>
>> Rename "num_expected_trailers" to "num_expected_objects" in
>> t/unit-tests/t-trailer.c because the items we iterate over now include
>> non-trailer lines.
>
> I think it would be simpler if the previous patch used just
> "num_expected" or "expected". It's not like the other fields in the
> struct ("msg" and "name") are very explicit, so why this one only?

I didn't give it much thought TBH. "num_expected" SGTM. Will update.

>> Signed-off-by: Linus Arver <linusa@google.com>
>
>
>> diff --git a/trailer.c b/trailer.c
>> index 3e4dab9c065..4700c441442 100644
>> --- a/trailer.c
>> +++ b/trailer.c
>> @@ -1146,17 +1146,15 @@ void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
>>
>>  int trailer_iterator_advance(struct trailer_iterator *iter)
>>  {
>> -       while (iter->internal.cur < iter->internal.info.trailer_nr) {
>> -               char *trailer = iter->internal.info.trailers[iter->internal.cur++];
>> -               int separator_pos = find_separator(trailer, separators);
>> -
>> -               if (separator_pos < 1)
>> -                       continue; /* not a real trailer */
>> +       if (iter->internal.cur < iter->internal.info.trailer_nr) {
>> +               char *line = iter->internal.info.trailers[iter->internal.cur++];
>> +               int separator_pos = find_separator(line, separators);
>>
>> +               iter->raw = line;
>>                 strbuf_reset(&iter->key);
>>                 strbuf_reset(&iter->val);
>>                 parse_trailer(&iter->key, &iter->val, NULL,
>> -                             trailer, separator_pos);
>> +                             line, separator_pos);
>>                 /* Always unfold values during iteration. */
>>                 unfold_value(&iter->val);
>>                 return 1;
>> diff --git a/trailer.h b/trailer.h
>> index 9f42aa75994..ebafa3657e4 100644
>> --- a/trailer.h
>> +++ b/trailer.h
>> @@ -125,6 +125,14 @@ void format_trailers_from_commit(const struct process_trailer_options *,
>>   *   trailer_iterator_release(&iter);
>>   */
>>  struct trailer_iterator {
>> +       /*
>> +        * Raw line (e.g., "foo: bar baz") before being parsed as a trailer
>> +        * key/val pair as part of a trailer block. A trailer block can be
>> +        * either 100% trailer lines, or mixed in with non-trailer lines (in
>> +        * which case at least 25% must be trailer lines).
>
> I don't think 25% is important here.

SG, will remove 25% language (FWIW we already have such language in
trailer.c if devs want to take a more closer look, so it's not like
we're losing any info overall).

> What is more important is to just
> say that this field could not be an actual trailer, and to tell what
> the 'key' and 'val' fields below will contain then.

Will update.

>
>> +        */
>> +       const char *raw;
>> +
>>         struct strbuf key;
>>         struct strbuf val;


BTW I will be on vacation for the next several weeks. However as the
suggested changes are minor, I think I can still get to them and push up
a v4 sometime this week. Cheers.
Linus Arver April 30, 2024, 4:55 a.m. UTC | #3
Linus Arver <linus@ucla.edu> writes:

> Christian Couder <christian.couder@gmail.com> writes:
>
>> (Sorry I just realized that I had sent this email to Linus only.)
>>
>> On Fri, Apr 26, 2024 at 2:26 AM Linus Arver via GitGitGadget
>> <gitgitgadget@gmail.com> wrote:
>>>
>>> From: Linus Arver <linusa@google.com>
>>>
>>> Previously the iterator did not iterate over non-trailer lines. This was
>>> somewhat unfortunate, because trailer blocks could have non-trailer
>>> lines in them since 146245063e (trailer: allow non-trailers in trailer
>>> block, 2016-10-21), which was before the iterator was created in
>>> f0939a0eb1 (trailer: add interface for iterating over commit trailers,
>>> 2020-09-27).
>>>
>>> So if trailer API users wanted to iterate over all lines in a trailer
>>> block (including non-trailer lines), they could not use the iterator and
>>> were forced to use the lower-level trailer_info struct directly (which
>>> provides a raw string array that includes all lines in the trailer
>>> block).
>>>
>>> Change the iterator's behavior so that we also iterate over non-trailer
>>> lines, instead of skipping over them. The new "raw" member of the
>>> iterator allows API users to access previously inaccessible non-trailer
>>> lines. Reword the variable "trailer" to just "line" because this
>>> variable can now hold both trailer lines _and_ non-trailer lines.
>>>
>>> The new "raw" member is important because anyone currently not using the
>>> iterator is using trailer_info's raw string array directly to access
>>> lines to check what the combined key + value looks like. If we didn't
>>> provide a "raw" member here, iterator users would have to re-construct
>>> the unparsed line by concatenating the key and value back together again
>>> --- which places an undue burden for iterator users.
>>>
>>> The next commit demonstrates the use of the iterator in sequencer.c as an
>>> example of where "raw" will be useful, so that it can start using the
>>> iterator.
>>>
>>> For the existing use of the iterator in builtin/shortlog.c, we don't
>>> have to change the code there because that code does
>>>
>>>     trailer_iterator_init(&iter, body);
>>>     while (trailer_iterator_advance(&iter)) {
>>>         const char *value = iter.val.buf;
>>>
>>>         if (!string_list_has_string(&log->trailers, iter.key.buf))
>>>             continue;
>>>
>>>         ...
>>>
>>> and the
>>>
>>>         if (!string_list_has_string(&log->trailers, iter.key.buf))
>>>
>>> condition already skips over non-trailer lines (iter.key.buf is empty
>>> for non-trailer lines, making the comparison still work even with this
>>> commit).
>>>
>>> Rename "num_expected_trailers" to "num_expected_objects" in
>>> t/unit-tests/t-trailer.c because the items we iterate over now include
>>> non-trailer lines.
>>
>> I think it would be simpler if the previous patch used just
>> "num_expected" or "expected". It's not like the other fields in the
>> struct ("msg" and "name") are very explicit, so why this one only?
>
> I didn't give it much thought TBH. "num_expected" SGTM. Will update.

Another thing: I will rename "trailer_assertions" in Path 10 to probably
"trailer_contents" because it sounds simpler. I am replying here instead
of on Patch 10 because my mail setup still has some rough edges for the
transition away from @google.com (I no longer work there).

And on that note, I'll have to update the SOB lines to match my new
email address.
diff mbox series

Patch

diff --git a/t/unit-tests/t-trailer.c b/t/unit-tests/t-trailer.c
index c1f897235c7..262e2838273 100644
--- a/t/unit-tests/t-trailer.c
+++ b/t/unit-tests/t-trailer.c
@@ -1,7 +1,7 @@ 
 #include "test-lib.h"
 #include "trailer.h"
 
-static void t_trailer_iterator(const char *msg, size_t num_expected_trailers)
+static void t_trailer_iterator(const char *msg, size_t num_expected_objects)
 {
 	struct trailer_iterator iter;
 	size_t i = 0;
@@ -11,7 +11,7 @@  static void t_trailer_iterator(const char *msg, size_t num_expected_trailers)
 		i++;
 	trailer_iterator_release(&iter);
 
-	check_uint(i, ==, num_expected_trailers);
+	check_uint(i, ==, num_expected_objects);
 }
 
 static void run_t_trailer_iterator(void)
@@ -19,7 +19,7 @@  static void run_t_trailer_iterator(void)
 	static struct test_cases {
 		const char *name;
 		const char *msg;
-		size_t num_expected_trailers;
+		size_t num_expected_objects;
 	} tc[] = {
 		{
 			"empty input",
@@ -119,7 +119,13 @@  static void run_t_trailer_iterator(void)
 			"not a trailer line\n"
 			"not a trailer line\n"
 			"Signed-off-by: x\n",
-			1
+			/*
+			 * Even though there is only really 1 real "trailer"
+			 * (Signed-off-by), we still have 4 trailer objects
+			 * because we still want to iterate through the entire
+			 * block.
+			 */
+			4
 		},
 		{
 			"with non-trailer lines (one too many) in trailer block",
@@ -162,7 +168,7 @@  static void run_t_trailer_iterator(void)
 
 	for (int i = 0; i < sizeof(tc) / sizeof(tc[0]); i++) {
 		TEST(t_trailer_iterator(tc[i].msg,
-					tc[i].num_expected_trailers),
+					tc[i].num_expected_objects),
 		     "%s", tc[i].name);
 	}
 }
diff --git a/trailer.c b/trailer.c
index 3e4dab9c065..4700c441442 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1146,17 +1146,15 @@  void trailer_iterator_init(struct trailer_iterator *iter, const char *msg)
 
 int trailer_iterator_advance(struct trailer_iterator *iter)
 {
-	while (iter->internal.cur < iter->internal.info.trailer_nr) {
-		char *trailer = iter->internal.info.trailers[iter->internal.cur++];
-		int separator_pos = find_separator(trailer, separators);
-
-		if (separator_pos < 1)
-			continue; /* not a real trailer */
+	if (iter->internal.cur < iter->internal.info.trailer_nr) {
+		char *line = iter->internal.info.trailers[iter->internal.cur++];
+		int separator_pos = find_separator(line, separators);
 
+		iter->raw = line;
 		strbuf_reset(&iter->key);
 		strbuf_reset(&iter->val);
 		parse_trailer(&iter->key, &iter->val, NULL,
-			      trailer, separator_pos);
+			      line, separator_pos);
 		/* Always unfold values during iteration. */
 		unfold_value(&iter->val);
 		return 1;
diff --git a/trailer.h b/trailer.h
index 9f42aa75994..ebafa3657e4 100644
--- a/trailer.h
+++ b/trailer.h
@@ -125,6 +125,14 @@  void format_trailers_from_commit(const struct process_trailer_options *,
  *   trailer_iterator_release(&iter);
  */
 struct trailer_iterator {
+	/*
+	 * Raw line (e.g., "foo: bar baz") before being parsed as a trailer
+	 * key/val pair as part of a trailer block. A trailer block can be
+	 * either 100% trailer lines, or mixed in with non-trailer lines (in
+	 * which case at least 25% must be trailer lines).
+	 */
+	const char *raw;
+
 	struct strbuf key;
 	struct strbuf val;