Message ID | 20211128091521.7ysocurtt4qlgcf6@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | git diff -I<regex> does not work with empty +/- lines | expand |
Johannes Altmanninger <aclopte@gmail.com> writes: > diff -I<regex> suppresses hunks where all +/- lines match <regex>. > it is useful to filter away boilerplate changes. > > Unfortunately, it doesn't help if a hunk has a blank line, because the one > obvious way to filter out blank lines (^$) match *every* line, surprisingly. > This is because for a line "01\n" > we have a zero-width match here ^$ (offset 3). > > IOW, while we succesfully ignore deleted blank lines > > printf '\n' | git diff --no-index - /dev/null -I'^$' > diff --git a/- b/- > deleted file mode 100644 > > we also ignore non-blank lines (very surprising) > > printf 'non-blank-line\n' | git diff --no-index - /dev/null -I'^$' > diff --git a/- b/- > deleted file mode 100644 > > unless they don't end in a newline (special case) > > printf 'line without ending newline' | git diff --no-index - /dev/null -I'^$' > diff --git a/- b/- > deleted file mode 100644 > --- a/- > +++ /dev/null > @@ -1 +0,0 @@ > -line without ending newline > \ No newline at end of file > > This patch fixes the second example. Is this the right direction? I do not know where in the code the breakage in the first example comes from. It sounds like a bug if a pattern is not matched honoring the anchor, whether the beginning-of-line "^" or the end-of-line "$" one. > Do we want to honor core.eol, so we preserve the \r when we have Unix endings? Absolutely. I think the code paths that work on blob data (i.e. Git internal, before the smudge operation is applied to externalize it to a working tree file) should only consider LF as the end of line (and any existing code that doesn't is a bug). > In any case -I<regex> won't be able to discern between "line\n" and "line" > but that's not important to me. > > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index a4542c05b6..23325022b9 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -1016,10 +1016,17 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags) > static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) { > regmatch_t regmatch; > int i; > + const char *end = rec->ptr + rec->size; > + > + if (rec->size >= 2 && end[-2] == '\r' && end[-1] == '\n') { > + end -= 2; > + } else if (rec->size && end[-1] == '\n') { > + end -= 1; > + } > > for (i = 0; i < xpp->ignore_regex_nr; i++) > - if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1, > - ®match, 0)) > + if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, > + end - rec->ptr, 1, ®match, 0)) > return 1; > > return 0;
On Sun, Nov 28, 2021 at 03:26:10PM -0800, Junio C Hamano wrote: > Johannes Altmanninger <aclopte@gmail.com> writes: > > > diff -I<regex> suppresses hunks where all +/- lines match <regex>. > > it is useful to filter away boilerplate changes. > > > > Unfortunately, it doesn't help if a hunk has a blank line, because the one > > obvious way to filter out blank lines (^$) match *every* line, surprisingly. > > This is because for a line "01\n" > > we have a zero-width match here ^$ (offset 3). > > > > IOW, while we succesfully ignore deleted blank lines > > > > printf '\n' | git diff --no-index - /dev/null -I'^$' > > diff --git a/- b/- > > deleted file mode 100644 > > > > we also ignore non-blank lines (very surprising) > > > > printf 'non-blank-line\n' | git diff --no-index - /dev/null -I'^$' > > diff --git a/- b/- > > deleted file mode 100644 > > > > unless they don't end in a newline (special case) > > > > printf 'line without ending newline' | git diff --no-index - /dev/null -I'^$' > > diff --git a/- b/- > > deleted file mode 100644 > > --- a/- > > +++ /dev/null > > @@ -1 +0,0 @@ > > -line without ending newline > > \ No newline at end of file > > > > This patch fixes the second example. Is this the right direction? > > I do not know where in the code the breakage in the first example > comes from. It sounds like a bug if a pattern is not matched > honoring the anchor, whether the beginning-of-line "^" or the > end-of-line "$" one. THe first example (printf '\n' | ... -I'^$') works fine AFAICT. The regex matches the empty line, so the hunk is ignored (but the file header is still printed). Only the second example (printf 'non-blank-line\n' | ... -I'^$') shows that we ignore too much, because ^$ (incorrectly) matches, but it should only match blank lines. Thanks
diff -I<regex> suppresses hunks where all +/- lines match <regex>. it is useful to filter away boilerplate changes. Unfortunately, it doesn't help if a hunk has a blank line, because the one obvious way to filter out blank lines (^$) match *every* line, surprisingly. This is because for a line "01\n" we have a zero-width match here ^$ (offset 3). IOW, while we succesfully ignore deleted blank lines printf '\n' | git diff --no-index - /dev/null -I'^$' diff --git a/- b/- deleted file mode 100644 we also ignore non-blank lines (very surprising) printf 'non-blank-line\n' | git diff --no-index - /dev/null -I'^$' diff --git a/- b/- deleted file mode 100644 unless they don't end in a newline (special case) printf 'line without ending newline' | git diff --no-index - /dev/null -I'^$' diff --git a/- b/- deleted file mode 100644 --- a/- +++ /dev/null @@ -1 +0,0 @@ -line without ending newline \ No newline at end of file This patch fixes the second example. Is this the right direction? Do we want to honor core.eol, so we preserve the \r when we have Unix endings? In any case -I<regex> won't be able to discern between "line\n" and "line" but that's not important to me. diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index a4542c05b6..23325022b9 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -1016,10 +1016,17 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags) static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) { regmatch_t regmatch; int i; + const char *end = rec->ptr + rec->size; + + if (rec->size >= 2 && end[-2] == '\r' && end[-1] == '\n') { + end -= 2; + } else if (rec->size && end[-1] == '\n') { + end -= 1; + } for (i = 0; i < xpp->ignore_regex_nr; i++) - if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1, - ®match, 0)) + if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, + end - rec->ptr, 1, ®match, 0)) return 1; return 0;