Message ID | 20240307183743.219951-1-flosch@nutanix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | wt-status: Don't find scissors line beyond buf len | expand |
Florian Schmidt <flosch@nutanix.com> writes: > Currently, if > (a) There is a "---" divider in a commit message, > (b) At some point beyond that divider, there is a cut-line (that is, > "# ------------------------ >8 ------------------------") in the > commit message, > (c) the user does not explicitly set the "no-divider" option, > then "git interpret-trailers" will hang indefinitively. You do not have to say "Currently, if"; just "If" is sufficient. Cf. Documentation/SubmittingPatches[[present-tense]] > This is because when (a) is true, find_end_of_log_message() will invoke > ignored_log_message_bytes() with a len that is intended to make it > ignore the part of the commit message beyond the divider. However, > ignored_log_message_bytes() calls wt_status_locate_end(), and that > function ignores the length restriction when it tries to locate the cut > line. If it manages to find one, the returned cutoff value is greater > than len. At this point, ignored_log_message_bytes() goes into an > infinite loop, because it won't advance the string parsing beyond len, > but the exit condition expects to reach cutoff. Good finding. > It seems sensible to expect that wt_status_locate_end() should honour > the length parameter passed in, and doing so fixes this issue. Thanks. This is an ancient bug, not a retression from recent changes to the trailer library [linusa CC'ed to save him from wasting his time wondering if he broke anything]. > diff --git a/wt-status.c b/wt-status.c > index b5a29083df..51a84575ed 100644 > --- a/wt-status.c > +++ b/wt-status.c > @@ -1089,14 +1089,19 @@ size_t wt_status_locate_end(const char *s, size_t len) > { > const char *p; > struct strbuf pattern = STRBUF_INIT; > + size_t result = len; > > strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line); > if (starts_with(s, pattern.buf + 1)) > - len = 0; > - else if ((p = strstr(s, pattern.buf))) > - len = p - s + 1; > + result = 0; > + else if ((p = strstr(s, pattern.buf))) { > + result = p - s + 1; > + if (result > len) { > + result = len; > + } > + } > strbuf_release(&pattern); > - return len; > + return result; > } Looks correct, but we probably can make the fix a lot more isolated into a single block, like the attached patch. How does this look? wt-status.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git c/wt-status.c w/wt-status.c index b5a29083df..511f37cfe0 100644 --- c/wt-status.c +++ w/wt-status.c @@ -1093,8 +1093,11 @@ size_t wt_status_locate_end(const char *s, size_t len) strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line); if (starts_with(s, pattern.buf + 1)) len = 0; - else if ((p = strstr(s, pattern.buf))) - len = p - s + 1; + else if ((p = strstr(s, pattern.buf))) { + int newlen = p - s + 1; + if (newlen < len) + len = newlen; + } strbuf_release(&pattern); return len; }
Florian Schmidt <flosch@nutanix.com> writes: > Side remark: Since strstr() doesn't consider len, and will always search > up to a null byte, I now wonder whether it would be safer to create a > new strbuf that only contains the len bytes we want to operate on. That is a valid concern in general, but does not seem to apply to the current codebase. Thanks for being careful. Two of the three callers of wt_status_locate_end() feed the pointer into a piece of memory that is owned by strbuf, which guarantees that the memory has an extra NUL to terminate it as a string even if you did strbuf buf = STRBUF_INIT; strbuf_addch(&buf, 'A'); The other one is in commit.c:ignored_log_message_bytes() that still takes <buf, len> as input, but again, two of its three callers call it with a pointer that points at the beginning of memory held by an instance of strbuf. That leaves us trailer.c:find_end_of_log_message() the only one to worry about, but it uses strlen() on the pointer before calling ignored_log_message_bytes() so the region of the memory pointed at by the pointer is assumed to be NUL-terminated already, and presumably (I didn't follow the logic there too closely) the length is also computed within that NUL-terminated string.
From: Florian Schmidt <flosch@nutanix.com> Date: Thu, 7 Mar 2024 18:37:38 +0000 Subject: [PATCH] wt-status: don't find scissors line beyond buf len If (a) There is a "---" divider in a commit message, (b) At some point beyond that divider, there is a cut-line (that is, "# ------------------------ >8 ------------------------") in the commit message, (c) the user does not explicitly set the "no-divider" option, then "git interpret-trailers" will hang indefinitively. This is because when (a) is true, find_end_of_log_message() will invoke ignored_log_message_bytes() with a len that is intended to make it ignore the part of the commit message beyond the divider. However, ignored_log_message_bytes() calls wt_status_locate_end(), and that function ignores the length restriction when it tries to locate the cut line. If it manages to find one, the returned cutoff value is greater than len. At this point, ignored_log_message_bytes() goes into an infinite loop, because it won't advance the string parsing beyond len, but the exit condition expects to reach cutoff. Make wt_status_locate_end() honor the length parameter passed in, to fix this issue. In general, if wt_status_locate_end() is given a piece of the memory that lacks NUL at all, strstr() may continue across page boundaries and run into an unmapped page. For our current callers, this is not a problem, as all of them except one uses a memory owned by a strbuf (which guarantees an implicit NUL-termination after its payload), and the one exeption in trailer.c:find_end_of_log_message() uses strlen() to compute the length before calling this function. Signed-off-by: Florian Schmidt <flosch@nutanix.com> Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com> [jc: tweaked the commit log message and the implementation a bit] Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * So here is the version I queued. I have a new paragraph at the end of the log message to talk about use of strstr() and how it is OK in the current codebase. t/t7513-interpret-trailers.sh | 14 ++++++++++++++ wt-status.c | 7 +++++-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh index 6602790b5f..5efe70d675 100755 --- a/t/t7513-interpret-trailers.sh +++ b/t/t7513-interpret-trailers.sh @@ -1476,4 +1476,18 @@ test_expect_success 'suppress --- handling' ' test_cmp expected actual ' +test_expect_success 'handling of --- lines in conjunction with cut-lines' ' + echo "my-trailer: here" >expected && + + git interpret-trailers --parse >actual <<-\EOF && + subject + + my-trailer: here + --- + # ------------------------ >8 ------------------------ + EOF + + test_cmp expected actual +' + test_done diff --git a/wt-status.c b/wt-status.c index 40b59be478..16c1b9b7ee 100644 --- a/wt-status.c +++ b/wt-status.c @@ -1007,8 +1007,11 @@ size_t wt_status_locate_end(const char *s, size_t len) strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line); if (starts_with(s, pattern.buf + 1)) len = 0; - else if ((p = strstr(s, pattern.buf))) - len = p - s + 1; + else if ((p = strstr(s, pattern.buf))) { + size_t newlen = p - s + 1; + if (newlen < len) + len = newlen; + } strbuf_release(&pattern); return len; }
On Thu, Mar 7, 2024 at 3:47 PM Junio C Hamano <gitster@pobox.com> wrote: > * So here is the version I queued. I have a new paragraph at the > end of the log message to talk about use of strstr() and how it > is OK in the current codebase. > [jc: tweaked the commit log message and the implementation a bit] > > From: Florian Schmidt <flosch@nutanix.com> > > In general, if wt_status_locate_end() is given a piece of the memory > that lacks NUL at all, strstr() may continue across page boundaries > and run into an unmapped page. For our current callers, this is not > a problem, as all of them except one uses a memory owned by a strbuf > (which guarantees an implicit NUL-termination after its payload), > and the one exeption in trailer.c:find_end_of_log_message() uses > strlen() to compute the length before calling this function. s/exeption/exception/
On Thu, Mar 7, 2024, at 21:47, Junio C Hamano wrote: > Signed-off-by: Florian Schmidt <flosch@nutanix.com> > Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com> > [jc: tweaked the commit log message and the implementation a bit] Just a question. Given the imperative mood principle/rule, why are these bracket changelog lines always written in the past tense? Cheers
"Kristoffer Haugsbakk" <code@khaugsbakk.name> writes: > On Thu, Mar 7, 2024, at 21:47, Junio C Hamano wrote: >> Signed-off-by: Florian Schmidt <flosch@nutanix.com> >> Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com> >> [jc: tweaked the commit log message and the implementation a bit] > > Just a question. Given the imperative mood principle/rule, why are these > bracket changelog lines always written in the past tense? These are not giving orders to the code to become like so. The trailer block records what happend to the patch in chronological order---think of those written there at one level higher level, "meta" comments.
On Thu, Mar 7, 2024 at 4:24 PM Junio C Hamano <gitster@pobox.com> wrote: > "Kristoffer Haugsbakk" <code@khaugsbakk.name> writes: > > On Thu, Mar 7, 2024, at 21:47, Junio C Hamano wrote: > >> [jc: tweaked the commit log message and the implementation a bit] > > > > Just a question. Given the imperative mood principle/rule, why are these > > bracket changelog lines always written in the past tense? > > These are not giving orders to the code to become like so. The > trailer block records what happend to the patch in chronological > order---think of those written there at one level higher level, > "meta" comments. Also, they are not always written in past tense[*]. [*]: https://lore.kernel.org/git/20240112171910.11131-1-ericsunshine@charter.net/
On Thu, Mar 7, 2024 at 4:24 PM Junio C Hamano <gitster@pobox.com> wrote: >> "Kristoffer Haugsbakk" <code@khaugsbakk.name> writes: >> > On Thu, Mar 7, 2024, at 21:47, Junio C Hamano wrote: >> >> [jc: tweaked the commit log message and the implementation a bit] >> > >> > Just a question. Given the imperative mood principle/rule, why are these >> > bracket changelog lines always written in the past tense? >> >> These are not giving orders to the code to become like so. The >> trailer block records what happend to the patch in chronological >> order---think of those written there at one level higher level, >> "meta" comments. And when I think about it the trailers themselves are of course in the past tense… Thanks guys On Thu, Mar 7, 2024, at 22:26, Eric Sunshine wrote: > > Also, they are not always written in past tense[*]. > > [*]: > https://lore.kernel.org/git/20240112171910.11131-1-ericsunshine@charter.net/
That's a quick review cycle, thanks! On 07/03/2024 19:20, Junio C Hamano wrote: > You do not have to say "Currently, if"; just "If" is sufficient. > Cf. Documentation/SubmittingPatches[[present-tense]] ack > Looks correct, but we probably can make the fix a lot more isolated > into a single block, like the attached patch. How does this look? > > wt-status.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git c/wt-status.c w/wt-status.c > index b5a29083df..511f37cfe0 100644 > --- c/wt-status.c > +++ w/wt-status.c > @@ -1093,8 +1093,11 @@ size_t wt_status_locate_end(const char *s, size_t len) > strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line); > if (starts_with(s, pattern.buf + 1)) > len = 0; > - else if ((p = strstr(s, pattern.buf))) > - len = p - s + 1; > + else if ((p = strstr(s, pattern.buf))) { > + int newlen = p - s + 1; > + if (newlen < len) > + len = newlen; > + } > strbuf_release(&pattern); > return len; > } That looks good to me, thanks! For context, I had sent in the slightly larger patch because at first look, I got confused by the fact that wt_status_locate_end takes len as a parameter, but then seemingly doesn't read it at all and only writes to it. (Of course, if neither the if nor the if-else branch are hit, len is in fact read when it is returned unchanged.) So I made the patch a bit larger in the hope that the next cursory reader might not get confused. But this option is a much more minimal patch, and functionally the same. So it comes down to style, and you have a much better feeling for that in this code base, so I'm happy to go with this. Do you want me to send this version as a v2 to you + the list as per the documentation? Cheers, flosch
On 07/03/2024 19:35, Junio C Hamano wrote: > Florian Schmidt <flosch@nutanix.com> writes: > >> Side remark: Since strstr() doesn't consider len, and will always search >> up to a null byte, I now wonder whether it would be safer to create a >> new strbuf that only contains the len bytes we want to operate on. > > That is a valid concern in general, but does not seem to apply to > the current codebase. Thanks for being careful. Thanks, that confirms my cursory look at the consumers of the function. If you think that it's unlikely that in the future, a new user of this function would provide a non-terminated string, then there is no need for action. I guess the aim is to use strbufs wherever suitable in the first place, anyway, and those won't have this issue? Cheers, flosch
Florian Schmidt <flosch@nutanix.com> writes: > Do you want me to send this version as a v2 to you + the list as per > the documentation? It would be the technically correct way to do so, but as a short-cut to reduce a round-trip, if you are happy with the version I queued (should be found by fetching the 'seen' branch from any of the mirrors), you can just say "that looks fine" and we can be done with this patch. Thanks.
On 08/03/2024 15:26, Junio C Hamano wrote: > It would be the technically correct way to do so, but as a short-cut > to reduce a round-trip, if you are happy with the version I queued > (should be found by fetching the 'seen' branch from any of the > mirrors), you can just say "that looks fine" and we can be done with > this patch. I had a look, and it looks fine (though there's the one typo already pointed out by Eric: s/exeption/exception/). You can go ahead and use that version of the patch. Cheers, flosch
Junio C Hamano <gitster@pobox.com> writes: > Florian Schmidt <flosch@nutanix.com> writes: > >> It seems sensible to expect that wt_status_locate_end() should honour >> the length parameter passed in, and doing so fixes this issue. > > Thanks. This is an ancient bug, not a retression from recent > changes to the trailer library [linusa CC'ed to save him from > wasting his time wondering if he broke anything]. Ack. Very much appreciated! Also, sorry for not responding sooner --- my email search queries so far were using "replied" as a condition so I only saw this just now while searching for the "trailer" term. Cheers
diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh index ec9c6de114..3d3e13ccf8 100755 --- a/t/t7513-interpret-trailers.sh +++ b/t/t7513-interpret-trailers.sh @@ -1935,4 +1935,18 @@ test_expect_success 'suppressing --- does not disable cut-line handling' ' test_cmp expected actual ' +test_expect_success 'handling of --- lines in conjunction with cut-lines' ' + echo "my-trailer: here" >expected && + + git interpret-trailers --parse >actual <<-\EOF && + subject + + my-trailer: here + --- + # ------------------------ >8 ------------------------ + EOF + + test_cmp expected actual +' + test_done diff --git a/wt-status.c b/wt-status.c index b5a29083df..51a84575ed 100644 --- a/wt-status.c +++ b/wt-status.c @@ -1089,14 +1089,19 @@ size_t wt_status_locate_end(const char *s, size_t len) { const char *p; struct strbuf pattern = STRBUF_INIT; + size_t result = len; strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line); if (starts_with(s, pattern.buf + 1)) - len = 0; - else if ((p = strstr(s, pattern.buf))) - len = p - s + 1; + result = 0; + else if ((p = strstr(s, pattern.buf))) { + result = p - s + 1; + if (result > len) { + result = len; + } + } strbuf_release(&pattern); - return len; + return result; } void wt_status_append_cut_line(struct strbuf *buf)