diff mbox series

xdiff: avoid arithmetic overflow in xdl_get_hunk()

Message ID 4e9b6b4c-aaa1-4c6f-93f4-7bb04607e843@web.de (mailing list archive)
State New
Headers show
Series xdiff: avoid arithmetic overflow in xdl_get_hunk() | expand

Commit Message

René Scharfe March 14, 2025, 10 p.m. UTC
xdl_get_hunk() calculates the maximum number of common lines between two
changes that would fit into the same hunk for the given context options.
It involves doubling and addition and thus can overflow if the terms are
huge.

The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while
the type of the corresponding context and interhunkcontext in struct
diff_options is int.  On many platforms longs are bigger that ints,
which prevents the overflow.  On Windows they have the same range and
the overflow manifests as hunks that are split erroneously and lines
being repeated between them.

Fix the overflow by checking and not going beyond LONG_MAX.  This allows
specifying a huge context line count and getting all lines of a changed
files in a single hunk, as expected.

Reported-by: Jason Cho <jason11choca@proton.me>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t4055-diff-context.sh | 10 ++++++++++
 xdiff/xemit.c           |  8 +++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

--
2.48.1

Comments

Junio C Hamano March 14, 2025, 10:28 p.m. UTC | #1
René Scharfe <l.s.r@web.de> writes:

> xdl_get_hunk() calculates the maximum number of common lines between two
> changes that would fit into the same hunk for the given context options.
> It involves doubling and addition and thus can overflow if the terms are
> huge.
>
> The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while
> the type of the corresponding context and interhunkcontext in struct
> diff_options is int.  On many platforms longs are bigger that ints,
> which prevents the overflow.  On Windows they have the same range and
> the overflow manifests as hunks that are split erroneously and lines
> being repeated between them.
>
> Fix the overflow by checking and not going beyond LONG_MAX.  This allows
> specifying a huge context line count and getting all lines of a changed
> files in a single hunk, as expected.
>
> Reported-by: Jason Cho <jason11choca@proton.me>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  t/t4055-diff-context.sh | 10 ++++++++++
>  xdiff/xemit.c           |  8 +++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)

Oh, I love a patch like this that is well thought out to carefully
check the bounds, instead of blindly say "ah, counting number of
things in size_t solves everything" ;-)

> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index f8e3f25b03..1d40c9cb40 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -43,6 +43,10 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
>  	return 0;
>  }
>
> +static long saturating_add(long a, long b)
> +{
> +	return signed_add_overflows(a, b) ? LONG_MAX : a + b;
> +}
>
>  /*
>   * Starting at the passed change atom, find the latest change atom to be included
> @@ -52,7 +56,9 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
>  xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg)
>  {
>  	xdchange_t *xch, *xchp, *lxch;
> -	long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen;
> +	long max_common = saturating_add(saturating_add(xecfg->ctxlen,
> +							xecfg->ctxlen),
> +					 xecfg->interhunkctxlen);

Looking good.

Thanks.  Will queue.
Jason Cho March 14, 2025, 11:14 p.m. UTC | #2
Dear René,

What a thorough analysis. I didn't know this bug is only on Windows.

Thank you for submitting the fix. I am excited to see it in a new release.

For now I will cherrypick your patch to my fork of git.

Best regards,
Jason Cho


On Friday, March 14th, 2025 at 11:00 PM, René Scharfe <l.s.r@web.de> wrote:

> xdl_get_hunk() calculates the maximum number of common lines between two
> changes that would fit into the same hunk for the given context options.
> It involves doubling and addition and thus can overflow if the terms are
> huge.
> 
> The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while
> the type of the corresponding context and interhunkcontext in struct
> diff_options is int. On many platforms longs are bigger that ints,
> which prevents the overflow. On Windows they have the same range and
> the overflow manifests as hunks that are split erroneously and lines
> being repeated between them.
> 
> Fix the overflow by checking and not going beyond LONG_MAX. This allows
> specifying a huge context line count and getting all lines of a changed
> files in a single hunk, as expected.
> 
> Reported-by: Jason Cho jason11choca@proton.me
> 
> Signed-off-by: René Scharfe l.s.r@web.de
> 
> ---
> t/t4055-diff-context.sh | 10 ++++++++++
> xdiff/xemit.c | 8 +++++++-
> 2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/t/t4055-diff-context.sh b/t/t4055-diff-context.sh
> index f7ff234cf9..ec2804eea6 100755
> --- a/t/t4055-diff-context.sh
> +++ b/t/t4055-diff-context.sh
> @@ -89,4 +89,14 @@ test_expect_success '-U0 is valid, so is diff.context=0' '
> grep "^+MODIFIED" output
> '
> 
> +test_expect_success '-U2147483647 works' '
> + echo APPENDED >>x &&
> 
> + test_line_count = 16 x &&
> + git diff -U2147483647 >output &&
> 
> + test_line_count = 22 output &&
> + grep "^-ADDED" output &&
> + grep "^+MODIFIED" output &&
> + grep "^+APPENDED" output
> +'
> +
> test_done
> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index f8e3f25b03..1d40c9cb40 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -43,6 +43,10 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const pre, xdemitcb_t *
> return 0;
> }
> 
> +static long saturating_add(long a, long b)
> +{
> + return signed_add_overflows(a, b) ? LONG_MAX : a + b;
> +}
> 
> /
> * Starting at the passed change atom, find the latest change atom to be included
> @@ -52,7 +56,9 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
> xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg)
> {
> xdchange_t *xch, *xchp, *lxch;
> - long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen;
> 
> + long max_common = saturating_add(saturating_add(xecfg->ctxlen,
> 
> + xecfg->ctxlen),
> 
> + xecfg->interhunkctxlen);
> 
> long max_ignorable = xecfg->ctxlen;
> 
> long ignored = 0; /* number of ignored blank lines */
> 
> --
> 2.48.1
René Scharfe March 15, 2025, 6:38 a.m. UTC | #3
Am 14.03.25 um 23:28 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>>  t/t4055-diff-context.sh | 10 ++++++++++
>>  xdiff/xemit.c           |  8 +++++++-
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> Oh, I love a patch like this that is well thought out to carefully
> check the bounds, instead of blindly say "ah, counting number of
> things in size_t solves everything" ;-)

Converting xdiff from long to size_t is still a good idea, I think, but
would be lot more effort and thus more risky.  Comparisons to upstream
would become a lot more noisy as well.

René
Junio C Hamano March 16, 2025, 7:53 p.m. UTC | #4
René Scharfe <l.s.r@web.de> writes:

> Am 14.03.25 um 23:28 schrieb Junio C Hamano:
>> René Scharfe <l.s.r@web.de> writes:
>>
>>>  t/t4055-diff-context.sh | 10 ++++++++++
>>>  xdiff/xemit.c           |  8 +++++++-
>>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> Oh, I love a patch like this that is well thought out to carefully
>> check the bounds, instead of blindly say "ah, counting number of
>> things in size_t solves everything" ;-)
>
> Converting xdiff from long to size_t is still a good idea, I think, ...

Oh, no question about that, especially if the number of counted
things are somehow proportionally related to the size of in-core
memory regions in any way.

What I do *not* like is the recent trend in the patches I see.  They
stop thinking there once they blindly replace int or unsigned or
whatever with size_t and the compiler stops warning.  Even though
compiler warnings can be a useful tool when there is very little
false positives, they are merely tools to improve the code, but I
see more and more confused patches that seem to think squelching
warnings is the goal in itself, without thinking if the resulting
code is actually improved.

And I didn't see that in this patch.  The patch was actually written
with real goal of improving the code in mind.

> but
> would be lot more effort and thus more risky.  

Perhaps, perhaps not.

> Comparisons to upstream
> would become a lot more noisy as well.

I am not sure how much of that matters these days, though.  Are they
still active, or is the code perfect and pretty much done?  I somehow
had the impression it has been the latter for a long time...

Thanks.
René Scharfe March 17, 2025, 4:50 p.m. UTC | #5
Am 16.03.25 um 20:53 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> Comparisons to upstream
>> would become a lot more noisy as well.
>
> I am not sure how much of that matters these days, though.  Are they
> still active, or is the code perfect and pretty much done?  I somehow
> had the impression it has been the latter for a long time...

http://www.xmailserver.org/xdiff-lib.html offers libxdiff-0.23.tar.gz,
whose entries bear the timestamp 2008-11-12.  Solid.

René
diff mbox series

Patch

diff --git a/t/t4055-diff-context.sh b/t/t4055-diff-context.sh
index f7ff234cf9..ec2804eea6 100755
--- a/t/t4055-diff-context.sh
+++ b/t/t4055-diff-context.sh
@@ -89,4 +89,14 @@  test_expect_success '-U0 is valid, so is diff.context=0' '
 	grep "^+MODIFIED" output
 '

+test_expect_success '-U2147483647 works' '
+	echo APPENDED >>x &&
+	test_line_count = 16 x &&
+	git diff -U2147483647 >output &&
+	test_line_count = 22 output &&
+	grep "^-ADDED" output &&
+	grep "^+MODIFIED" output &&
+	grep "^+APPENDED" output
+'
+
 test_done
diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index f8e3f25b03..1d40c9cb40 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -43,6 +43,10 @@  static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
 	return 0;
 }

+static long saturating_add(long a, long b)
+{
+	return signed_add_overflows(a, b) ? LONG_MAX : a + b;
+}

 /*
  * Starting at the passed change atom, find the latest change atom to be included
@@ -52,7 +56,9 @@  static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
 xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg)
 {
 	xdchange_t *xch, *xchp, *lxch;
-	long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen;
+	long max_common = saturating_add(saturating_add(xecfg->ctxlen,
+							xecfg->ctxlen),
+					 xecfg->interhunkctxlen);
 	long max_ignorable = xecfg->ctxlen;
 	long ignored = 0; /* number of ignored blank lines */