diff mbox series

git diff -I<regex> does not work with empty +/- lines

Message ID 20211128091521.7ysocurtt4qlgcf6@gmail.com (mailing list archive)
State New, archived
Headers show
Series git diff -I<regex> does not work with empty +/- lines | expand

Commit Message

Johannes Altmanninger Nov. 28, 2021, 9:15 a.m. UTC

Comments

Junio C Hamano Nov. 28, 2021, 11:26 p.m. UTC | #1
Johannes Altmanninger <aclopte@gmail.com> writes:

> diff -I<regex> suppresses hunks where all +/- lines match <regex>.
> it is useful to filter away boilerplate changes.
>
> Unfortunately, it doesn't help if a hunk has a blank line, because the one
> obvious way to filter out blank lines (^$) match *every* line, surprisingly.
> This is because for a line "01\n"
> we have a zero-width match here ^$ (offset 3).
>
> IOW, while we succesfully ignore deleted blank lines
>
> 	printf '\n' | git diff --no-index - /dev/null -I'^$'
> 	diff --git a/- b/-
> 	deleted file mode 100644
>
> we also ignore non-blank lines (very surprising)
>
> 	printf 'non-blank-line\n' | git diff --no-index - /dev/null -I'^$'
> 	diff --git a/- b/-
> 	deleted file mode 100644
>
> unless they don't end in a newline (special case)
>
> 	printf 'line without ending newline' | git diff --no-index - /dev/null -I'^$'
> 	diff --git a/- b/-
> 	deleted file mode 100644
> 	--- a/-
> 	+++ /dev/null
> 	@@ -1 +0,0 @@
> 	-line without ending newline
> 	\ No newline at end of file
>
> This patch fixes the second example. Is this the right direction?

I do not know where in the code the breakage in the first example
comes from.  It sounds like a bug if a pattern is not matched
honoring the anchor, whether the beginning-of-line "^" or the
end-of-line "$" one.

> Do we want to honor core.eol, so we preserve the \r when we have Unix endings?

Absolutely.  I think the code paths that work on blob data (i.e. Git
internal, before the smudge operation is applied to externalize it
to a working tree file) should only consider LF as the end of line
(and any existing code that doesn't is a bug).

> In any case -I<regex> won't be able to discern between "line\n" and "line"
> but that's not important to me.
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index a4542c05b6..23325022b9 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -1016,10 +1016,17 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
>  static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) {
>  	regmatch_t regmatch;
>  	int i;
> +	const char *end = rec->ptr + rec->size;
> +
> +	if (rec->size >= 2 && end[-2] == '\r' && end[-1] == '\n') {
> +		end -= 2;
> +	} else if (rec->size && end[-1] == '\n') {
> +		end -= 1;
> +	}
>  
>  	for (i = 0; i < xpp->ignore_regex_nr; i++)
> -		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1,
> -				 &regmatch, 0))
> +		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr,
> +				 end - rec->ptr, 1, &regmatch, 0))
>  			return 1;
>  
>  	return 0;
Johannes Altmanninger Nov. 29, 2021, 4:54 a.m. UTC | #2
On Sun, Nov 28, 2021 at 03:26:10PM -0800, Junio C Hamano wrote:
> Johannes Altmanninger <aclopte@gmail.com> writes:
> 
> > diff -I<regex> suppresses hunks where all +/- lines match <regex>.
> > it is useful to filter away boilerplate changes.
> >
> > Unfortunately, it doesn't help if a hunk has a blank line, because the one
> > obvious way to filter out blank lines (^$) match *every* line, surprisingly.
> > This is because for a line "01\n"
> > we have a zero-width match here ^$ (offset 3).
> >
> > IOW, while we succesfully ignore deleted blank lines
> >
> > 	printf '\n' | git diff --no-index - /dev/null -I'^$'
> > 	diff --git a/- b/-
> > 	deleted file mode 100644
> >
> > we also ignore non-blank lines (very surprising)
> >
> > 	printf 'non-blank-line\n' | git diff --no-index - /dev/null -I'^$'
> > 	diff --git a/- b/-
> > 	deleted file mode 100644
> >
> > unless they don't end in a newline (special case)
> >
> > 	printf 'line without ending newline' | git diff --no-index - /dev/null -I'^$'
> > 	diff --git a/- b/-
> > 	deleted file mode 100644
> > 	--- a/-
> > 	+++ /dev/null
> > 	@@ -1 +0,0 @@
> > 	-line without ending newline
> > 	\ No newline at end of file
> >
> > This patch fixes the second example. Is this the right direction?
> 
> I do not know where in the code the breakage in the first example
> comes from.  It sounds like a bug if a pattern is not matched
> honoring the anchor, whether the beginning-of-line "^" or the
> end-of-line "$" one.

THe first example (printf '\n' | ... -I'^$') works fine AFAICT.
The regex matches the empty line, so the hunk is ignored
(but the file header is still printed).
Only the second example (printf 'non-blank-line\n' | ... -I'^$')
shows that we ignore too much, because ^$ (incorrectly) matches,
but it should only match blank lines.

Thanks
diff mbox series

Patch

diff -I<regex> suppresses hunks where all +/- lines match <regex>.
it is useful to filter away boilerplate changes.

Unfortunately, it doesn't help if a hunk has a blank line, because the one
obvious way to filter out blank lines (^$) match *every* line, surprisingly.
This is because for a line "01\n"
we have a zero-width match here ^$ (offset 3).

IOW, while we succesfully ignore deleted blank lines

	printf '\n' | git diff --no-index - /dev/null -I'^$'
	diff --git a/- b/-
	deleted file mode 100644

we also ignore non-blank lines (very surprising)

	printf 'non-blank-line\n' | git diff --no-index - /dev/null -I'^$'
	diff --git a/- b/-
	deleted file mode 100644

unless they don't end in a newline (special case)

	printf 'line without ending newline' | git diff --no-index - /dev/null -I'^$'
	diff --git a/- b/-
	deleted file mode 100644
	--- a/-
	+++ /dev/null
	@@ -1 +0,0 @@
	-line without ending newline
	\ No newline at end of file

This patch fixes the second example. Is this the right direction?
Do we want to honor core.eol, so we preserve the \r when we have Unix endings?

In any case -I<regex> won't be able to discern between "line\n" and "line"
but that's not important to me.

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index a4542c05b6..23325022b9 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -1016,10 +1016,17 @@  static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
 static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) {
 	regmatch_t regmatch;
 	int i;
+	const char *end = rec->ptr + rec->size;
+
+	if (rec->size >= 2 && end[-2] == '\r' && end[-1] == '\n') {
+		end -= 2;
+	} else if (rec->size && end[-1] == '\n') {
+		end -= 1;
+	}
 
 	for (i = 0; i < xpp->ignore_regex_nr; i++)
-		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1,
-				 &regmatch, 0))
+		if (!regexec_buf(xpp->ignore_regex[i], rec->ptr,
+				 end - rec->ptr, 1, &regmatch, 0))
 			return 1;
 
 	return 0;