Message ID | pull.1763.git.1721312619822.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | add-patch: handle splitting hunks with diff.suppressBlankEmpty | expand |
"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Phillip Wood <phillip.wood@dunelm.org.uk> > > When "add -p" parses diffs, it looks for context lines starting with a > single space. But when diff.suppressBlankEmpty is in effect, an empty > context line will omit the space, giving us a true empty line. This > confuses the parser, which is unable to split based on such a line. > > It's tempting to say that we should just make sure that we generate a > diff without that option. However, although we do not parse hunks that > the user has manually edited with parse_diff() we do allow the user > to split such hunks. As POSIX calls the decision of whether to print the > space here "implementation-defined" we need to handle edited hunks where > empty context lines omit the space. > > So let's handle both cases: a context line either starts with a space or > consists of a totally empty line by normalizing the first character to a > space when we parse them. Normalizing the first character rather than > changing the code to check for a space or newline will hopefully future > proof against introducing similar bugs if the code is changed. > > Reported-by: Ilya Tumaykin <itumaykin@gmail.com> > Helped-by: Jeff King <peff@peff.net> > Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> > --- Well written. Thanks for a pleasant read. > @@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff, > context_line_count = 0; > > while (splittable_into > 1) { > - ch = s->plain.buf[current]; > + ch = normalize_marker(s->plain.buf + current); I wondered if &s->plain.buf[current] is easier to grok, but what's written already is good enough ;-) There is another explicit mention of ' ' in merge_hunks(). Unless we are normalizing the buffer here (which I do not think we do), wouldn't we have to make sure that the code over there also knows that an empty line in a patch is an unmodified empty line? if (plain[overlap_end] != ' ') return error(_("expected context line " "#%d in\n%.*s"),
Hi Junio On 18/07/2024 17:29, Junio C Hamano wrote: > "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Phillip Wood <phillip.wood@dunelm.org.uk> >> >> When "add -p" parses diffs, it looks for context lines starting with a >> single space. But when diff.suppressBlankEmpty is in effect, an empty >> context line will omit the space, giving us a true empty line. This >> confuses the parser, which is unable to split based on such a line. >> >> It's tempting to say that we should just make sure that we generate a >> diff without that option. However, although we do not parse hunks that >> the user has manually edited with parse_diff() we do allow the user >> to split such hunks. As POSIX calls the decision of whether to print the >> space here "implementation-defined" we need to handle edited hunks where >> empty context lines omit the space. >> >> So let's handle both cases: a context line either starts with a space or >> consists of a totally empty line by normalizing the first character to a >> space when we parse them. Normalizing the first character rather than >> changing the code to check for a space or newline will hopefully future >> proof against introducing similar bugs if the code is changed. >> >> Reported-by: Ilya Tumaykin <itumaykin@gmail.com> >> Helped-by: Jeff King <peff@peff.net> >> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> >> --- > > Well written. Thanks for a pleasant read. Thanks to Peff for letting me steal his commit message >> @@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff, >> context_line_count = 0; >> >> while (splittable_into > 1) { >> - ch = s->plain.buf[current]; >> + ch = normalize_marker(s->plain.buf + current); > > I wondered if &s->plain.buf[current] is easier to grok, but what's > written already is good enough ;-) Yes I think that would be better > There is another explicit mention of ' ' in merge_hunks(). Unless > we are normalizing the buffer here (which I do not think we do), > wouldn't we have to make sure that the code over there also knows > that an empty line in a patch is an unmodified empty line? > > if (plain[overlap_end] != ' ') > return error(_("expected context line " > "#%d in\n%.*s"), Oh dear, I'm not sure how I missed that. I'll fix that and update the test to make sure it exercises that code path as well. Thanks for your comments Phillip
diff without that option. However, although we do not parse hunks that the user has manually edited with parse_diff() we do allow the user to split such hunks. As POSIX calls the decision of whether to print the space here "implementation-defined" we need to handle edited hunks where empty context lines omit the space. So let's handle both cases: a context line either starts with a space or consists of a totally empty line by normalizing the first character to a space when we parse them. Normalizing the first character rather than changing the code to check for a space or newline will hopefully future proof against introducing similar bugs if the code is changed. Reported-by: Ilya Tumaykin <itumaykin@gmail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> --- add-patch: handle splitting hunks with diff.suppressBlankEmpty This is an alternative to jk/add-patch-with-suppress-blank-empty which was recently discarded from next. I hope that normalizing the context marker will simplify any future changes to the code. While I was writing this I realized that we should be recalculating hunk->splittable_into when we re-count the hunk header of it is edited but I've left to for a future series. Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1763%2Fphillipwood%2Fadd-p-suppress-blank-empty-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1763/phillipwood/add-p-suppress-blank-empty-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/1763 add-patch.c | 17 ++++++++++++----- t/t3701-add-interactive.sh | 11 +++++++++++ 2 files changed, 23 insertions(+), 5 deletions(-) diff --git a/add-patch.c b/add-patch.c index d8ea05ff108..13b2607f544 100644 --- a/add-patch.c +++ b/add-patch.c @@ -400,6 +400,12 @@ static void complete_file(char marker, struct hunk *hunk) hunk->splittable_into++; } +/* Empty context lines may omit the leading ' ' */ +static int normalize_marker(char *p) +{ + return p[0] == '\n' || (p[0] == '\r' && p[1] == '\n') ? ' ' : p[0]; +} + static int parse_diff(struct add_p_state *s, const struct pathspec *ps) { struct strvec args = STRVEC_INIT; @@ -485,6 +491,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps) while (p != pend) { char *eol = memchr(p, '\n', pend - p); const char *deleted = NULL, *mode_change = NULL; + char ch = normalize_marker(p); if (!eol) eol = pend; @@ -532,7 +539,7 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps) * Start counting into how many hunks this one can be * split */ - marker = *p; + marker = ch; } else if (hunk == &file_diff->head && starts_with(p, "new file")) { file_diff->added = 1; @@ -586,10 +593,10 @@ static int parse_diff(struct add_p_state *s, const struct pathspec *ps) (int)(eol - (plain->buf + file_diff->head.start)), plain->buf + file_diff->head.start); - if ((marker == '-' || marker == '+') && *p == ' ') + if ((marker == '-' || marker == '+') && ch == ' ') hunk->splittable_into++; - if (marker && *p != '\\') - marker = *p; + if (marker && ch != '\\') + marker = ch; p = eol == pend ? pend : eol + 1; hunk->end = p - plain->buf; @@ -953,7 +960,7 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff, context_line_count = 0; while (splittable_into > 1) { - ch = s->plain.buf[current]; + ch = normalize_marker(s->plain.buf + current); if (!ch) BUG("buffer overrun while splitting hunks"); diff --git a/t/t3701-add-interactive.sh b/t/t3701-add-interactive.sh index 5d78868ac16..351dd2b4332 100755 --- a/t/t3701-add-interactive.sh +++ b/t/t3701-add-interactive.sh @@ -1164,4 +1164,15 @@ test_expect_success 'reset -p with unmerged files' ' test_must_be_empty staged ' +test_expect_success 'hunk splitting works with diff.suppressBlankEmpty' ' + test_config diff.suppressBlankEmpty true && + test_write_lines a b c "" d e f >file && + git add file && + test_write_lines p q r "" s t u >file && + test_write_lines s n y q | git add -p && + git cat-file blob :file >actual && + test_write_lines a b c "" s t u >expect && + test_cmp expect actual +' + test_done
From: Phillip Wood <phillip.wood@dunelm.org.uk> When "add -p" parses diffs, it looks for context lines starting with a single space. But when diff.suppressBlankEmpty is in effect, an empty context line will omit the space, giving us a true empty line. This confuses the parser, which is unable to split based on such a line. It's tempting to say that we should just make sure that we generate a base-commit: 790a17fb19d6eadd16c52e5d284a5c6921744766