Message ID | d023c297dcac0bb96f681dc1fc0116a649c2efec.1691211879.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Trailer readability cleanups | expand |
"Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes: > Currently, process_input_file does three things: > > (1) parse the input string for trailers, > (2) print text before the trailers, and > (3) calculate the position of the input where the trailers end. > > Rename this function to parse_trailers(), and make it only do > (1). Okay, process_input_file() is a very unhelpful name (What does it mean to "process a file"?). In contrast, parse_trailers() is more self-descriptive (It parses trailers into some appropriate format, so it shouldn't do things like print.) Makes sense. Is there some additional, unstated purpose behind this change besides "move things around for readability"? E.g. do you intend to move parse_trailers() to a future trailer parsing library? If so, that would be useful context to evaluate the goodness of this split. > The caller of this function, process_trailers, becomes responsible > for (2) and (3). These items belong inside process_trailers because they > are both concerned with printing the surrounding text around > trailers (which is already one of the immediate concerns of > process_trailers). I agree that (2) doesn't belong in parse_trailers(). OTOH, (3) sounds like something that belongs in parse_trailers() - you have to parse trailers in order to tell where the trailers start and end, so it makes sense for the parsing function to give those values. > diff --git a/trailer.c b/trailer.c > index dff3fafe865..16fbba03d07 100644 > --- a/trailer.c > +++ b/trailer.c > @@ -961,28 +961,23 @@ static void unfold_value(struct strbuf *val) > strbuf_release(&out); > } > > -static size_t process_input_file(FILE *outfile, > - const char *str, > - struct list_head *head, > - const struct process_trailer_options *opts) > +/* > + * Parse trailers in "str" and populate the "head" linked list structure. > + */ > +static void parse_trailers(struct trailer_info *info, "info" is an out parameter, and IIRC we typically put out parameters towards the end. I didn't find a callout in CodingGuidelines, though, so idk if this is an ironclad rule or not. > + const char *str, > + struct list_head *head, > + const struct process_trailer_options *opts) > { > - struct trailer_info info; > struct strbuf tok = STRBUF_INIT; > struct strbuf val = STRBUF_INIT; > size_t i; > > - trailer_info_get(&info, str, opts); > - > - /* Print lines before the trailers as is */ > - if (!opts->only_trailers) > - fwrite(str, 1, info.trailer_start - str, outfile); We no longer fwrite the contents before the trailer, okay. > + trailer_info_get(info, str, opts); This is where we actually get the start and end of trailers, and each trailer string. This is parsing out the trailers from a string, so what other parsing is left? Reading ahead shows that we're actually parsing the trailer string into a "struct trailer_item". Okay, so this function is basically a wrapper around trailer_info_get() that also "returns" the parsed trailer_items. > - if (!opts->only_trailers && !info.blank_line_before_trailer) > - fprintf(outfile, "\n"); > - So we don't print the trailing line. Also makes sense. > @@ -1003,9 +998,7 @@ static size_t process_input_file(FILE *outfile, > } > } > > - trailer_info_release(&info); > - > - return info.trailer_end - str; > + trailer_info_release(info); > } > Even though "info" is a pointer passed into this function, we are _release-ing it. This is not an umabiguously good change, IMO. Before, "info" was never used outside of this function, so we should obviously release it before returning. However, now that "info" is an out parameter, we should be more careful about releasing it. I don't think it's obvious that the caller will see the right values for info.trailer_end and info.trailer_start, but free()-d values for info.trailers, and a meaningless value for info.trailer_nr (since the items were free()-d). I think it might be better to update the comment on parse_trailers() like so: /* * Parse trailers in "str", populating the trailer info and "head" * linked list structure. */ and make it the caller's responsibility to call trailer_info_release(). We could move this call to where we "free_all(head)". > static void free_all(struct list_head *head) > @@ -1054,6 +1047,7 @@ void process_trailers(const char *file, > { > LIST_HEAD(head); > struct strbuf sb = STRBUF_INIT; > + struct trailer_info info; > size_t trailer_end; > FILE *outfile = stdout; > > @@ -1064,8 +1058,16 @@ void process_trailers(const char *file, > if (opts->in_place) > outfile = create_in_place_tempfile(file); Thinking out loud, should we move the creation of outfile next to where we first use it? > + parse_trailers(&info, sb.buf, &head, opts); > + trailer_end = info.trailer_end - sb.buf; > + > /* Print the lines before the trailers */ > - trailer_end = process_input_file(outfile, sb.buf, &head, opts); > + if (!opts->only_trailers) > + fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile); I'm not sure if it is an unambiguously good change for the caller to learn how to compute the start and end of the trailer sections by doing pointer arithmetic, but I guess format_trailer_info() does this anyway, so your proposal to move (3) outside of the parse_trailers() makes sense. It feels a bit non-obvious that trailer_start and trailer_end are pointing inside the input string. I wonder if we should just return the _start and _end offsets directly instead of returning pointers. I.e.: struct trailer_info { int blank_line_before_trailer; - /* - * Pointers to the start and end of the trailer block found. If there - * is no trailer block found, these 2 pointers point to the end of the - * input string. - */ - const char *trailer_start, *trailer_end; + /* Offsets to the trailer block start and end in the input string */ + size_t *trailer_start, *trailer_end; Which makes their intended use fairly unambiguous. A quick grep suggests that in trailer.c, we're roughly as likely to use the pointer directly vs using it to do pointer arithmetic, so converging on one use might be a win for readability. The only other user outside of trailer.c is sequencer.c, which doesn't care about the return type - it only checks if there are trailers.
Glen Choo <chooglen@google.com> writes: > "Linus Arver via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> Currently, process_input_file does three things: >> >> (1) parse the input string for trailers, >> (2) print text before the trailers, and >> (3) calculate the position of the input where the trailers end. >> >> Rename this function to parse_trailers(), and make it only do >> (1). > > [...] > > Is there some additional, unstated purpose behind this change besides > "move things around for readability"? E.g. do you intend to move > parse_trailers() to a future trailer parsing library? If so, that would > be useful context to evaluate the goodness of this split. I think it's still too early to say whether certain functions will make it (unmodified) into the public, libified API. So currently "move things around for readability" is the most concrete reason. >> The caller of this function, process_trailers, becomes responsible >> for (2) and (3). These items belong inside process_trailers because they >> are both concerned with printing the surrounding text around >> trailers (which is already one of the immediate concerns of >> process_trailers). > > I agree that (2) doesn't belong in parse_trailers(). OTOH, (3) sounds > like something that belongs in parse_trailers() - you have to parse > trailers in order to tell where the trailers start and end, so it makes > sense for the parsing function to give those values. I don't think (3) should belong in parse_trailers, mainly because the "info" struct we pass into it gets this information populated by parse_trailers already. Which is why we can do (3) in the caller with parse_trailers(&info, sb.buf, &head, opts); trailer_end = info.trailer_end - sb.buf; to get the same information. Also, the endpoint of the trailers is no more inherently special than, for example, the following other possible return values: - the number of trailers that were recognized and parsed - whether we encountered any trailers or not - the start position of when we first encountered a trailer in the input which makes me want to avoid returning this "trailer_end" value from parse_trailers. One more thing: we already have a function named "find_trailer_end" which is supposed to do this already. But it uses "ignore_non_trailer" from commit.c (that function should probably use the trailer API later on to figure this out...). I wanted to clean that part up in the future as part of libifcation. >> -static size_t process_input_file(FILE *outfile, >> - const char *str, >> - struct list_head *head, >> - const struct process_trailer_options *opts) >> +/* >> + * Parse trailers in "str" and populate the "head" linked list structure. >> + */ >> +static void parse_trailers(struct trailer_info *info, > > "info" is an out parameter, and IIRC we typically put out parameters > towards the end. I didn't find a callout in CodingGuidelines, though, so > idk if this is an ironclad rule or not. I wanted to minimize churn as much as possible (hence my hesitation with changing around the order of these parameters). But also, trailer_info_get uses "info" as the first parameter, so I wanted to align with that usage. >> @@ -1003,9 +998,7 @@ static size_t process_input_file(FILE *outfile, >> } >> } >> >> - trailer_info_release(&info); >> - >> - return info.trailer_end - str; >> + trailer_info_release(info); >> } >> > > Even though "info" is a pointer passed into this function, we are > _release-ing it. This is not an umabiguously good change, IMO. Before, > "info" was never used outside of this function, so we should obviously > release it before returning. However, now that "info" is an out > parameter, we should be more careful about releasing it. Hmm, agreed. > I don't think > it's obvious that the caller will see the right values for > info.trailer_end and info.trailer_start, but free()-d values for > info.trailers, and a meaningless value for info.trailer_nr (since the > items were free()-d). Agreed. Will update to avoid calling trailer_info_release() inside parse_trailers() because the caller might still need that information. I think the fix is to move the trailer_info_get outside to the caller, much like how format_trailers_from_commit() does it. > I think it might be better to update the comment on parse_trailers() > like so: > > /* > * Parse trailers in "str", populating the trailer info and "head" > * linked list structure. > */ > > and make it the caller's responsibility to call trailer_info_release(). > We could move this call to where we "free_all(head)". SGTM. (I regret not reading this text before drafting my response above.) >> static void free_all(struct list_head *head) >> @@ -1054,6 +1047,7 @@ void process_trailers(const char *file, >> { >> LIST_HEAD(head); >> struct strbuf sb = STRBUF_INIT; >> + struct trailer_info info; >> size_t trailer_end; >> FILE *outfile = stdout; >> >> @@ -1064,8 +1058,16 @@ void process_trailers(const char *file, >> if (opts->in_place) >> outfile = create_in_place_tempfile(file); > > Thinking out loud, should we move the creation of outfile next to where > we first use it? Not sure what you mean here. Can you clarify? >> + parse_trailers(&info, sb.buf, &head, opts); >> + trailer_end = info.trailer_end - sb.buf; >> + >> /* Print the lines before the trailers */ >> - trailer_end = process_input_file(outfile, sb.buf, &head, opts); >> + if (!opts->only_trailers) >> + fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile); > > I'm not sure if it is an unambiguously good change for the caller to > learn how to compute the start and end of the trailer sections by doing > pointer arithmetic, I think a future cleanup (in a follow-up series) involving find_trailer_end should simplify this area and avoid the need for pointer arithmetic in the caller. > It feels a bit non-obvious that trailer_start and trailer_end are > pointing inside the input string. I wonder if we should just return the > _start and _end offsets directly instead of returning pointers. I.e.: > > struct trailer_info { > int blank_line_before_trailer; > - /* > - * Pointers to the start and end of the trailer block found. If there > - * is no trailer block found, these 2 pointers point to the end of the > - * input string. > - */ > - const char *trailer_start, *trailer_end; > + /* Offsets to the trailer block start and end in the input string */ > + size_t *trailer_start, *trailer_end; > > Which makes their intended use fairly unambiguous. A quick grep suggests > that in trailer.c, we're roughly as likely to use the pointer directly > vs using it to do pointer arithmetic, so converging on one use might be > a win for readability. Agreed! I would prefer to use offsets everywhere, as I think that is more direct (because we are concerned with locations in the input).
diff --git a/trailer.c b/trailer.c index dff3fafe865..16fbba03d07 100644 --- a/trailer.c +++ b/trailer.c @@ -961,28 +961,23 @@ static void unfold_value(struct strbuf *val) strbuf_release(&out); } -static size_t process_input_file(FILE *outfile, - const char *str, - struct list_head *head, - const struct process_trailer_options *opts) +/* + * Parse trailers in "str" and populate the "head" linked list structure. + */ +static void parse_trailers(struct trailer_info *info, + const char *str, + struct list_head *head, + const struct process_trailer_options *opts) { - struct trailer_info info; struct strbuf tok = STRBUF_INIT; struct strbuf val = STRBUF_INIT; size_t i; - trailer_info_get(&info, str, opts); - - /* Print lines before the trailers as is */ - if (!opts->only_trailers) - fwrite(str, 1, info.trailer_start - str, outfile); + trailer_info_get(info, str, opts); - if (!opts->only_trailers && !info.blank_line_before_trailer) - fprintf(outfile, "\n"); - - for (i = 0; i < info.trailer_nr; i++) { + for (i = 0; i < info->trailer_nr; i++) { int separator_pos; - char *trailer = info.trailers[i]; + char *trailer = info->trailers[i]; if (trailer[0] == comment_line_char) continue; separator_pos = find_separator(trailer, separators); @@ -1003,9 +998,7 @@ static size_t process_input_file(FILE *outfile, } } - trailer_info_release(&info); - - return info.trailer_end - str; + trailer_info_release(info); } static void free_all(struct list_head *head) @@ -1054,6 +1047,7 @@ void process_trailers(const char *file, { LIST_HEAD(head); struct strbuf sb = STRBUF_INIT; + struct trailer_info info; size_t trailer_end; FILE *outfile = stdout; @@ -1064,8 +1058,16 @@ void process_trailers(const char *file, if (opts->in_place) outfile = create_in_place_tempfile(file); + parse_trailers(&info, sb.buf, &head, opts); + trailer_end = info.trailer_end - sb.buf; + /* Print the lines before the trailers */ - trailer_end = process_input_file(outfile, sb.buf, &head, opts); + if (!opts->only_trailers) + fwrite(sb.buf, 1, info.trailer_start - sb.buf, outfile); + + if (!opts->only_trailers && !info.blank_line_before_trailer) + fprintf(outfile, "\n"); + if (!opts->only_input) { LIST_HEAD(arg_head);