diff mbox series

cleanup: fix possible overflow errors in binary search, part 2

Message ID f0059613-d3e4-a1e2-2c01-727579e069d2@web.de (mailing list archive)
State New, archived
Headers show
Series cleanup: fix possible overflow errors in binary search, part 2 | expand

Commit Message

René Scharfe June 13, 2019, 5:51 p.m. UTC
Calculating the sum of two array indexes to find the midpoint between
them can overflow, i.e. code like this is unsafe for big arrays:

	mid = (first + last) >> 1;

Make sure the intermediate value stays within the boundaries instead,
like this:

	mid = first + ((last - first) >> 1);

The loop condition of the binary search makes sure that 'last' is
always greater than 'first', so this is safe as long as 'first' is
not negative.  And that can be verified easily using the pre-context
of each change, except for name-hash.c, so add an assertion to that
effect there.

The unsafe calculations were found with:

	git grep '(.*+.*) *>> *1'

This is a continuation of 19716b21a4 (cleanup: fix possible overflow
errors in binary search, 2017-10-08).

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 builtin/ls-files.c  | 2 +-
 diffcore-rename.c   | 4 ++--
 dir.c               | 2 +-
 name-hash.c         | 3 ++-
 read-cache.c        | 2 +-
 sh-i18n--envsubst.c | 2 +-
 6 files changed, 8 insertions(+), 7 deletions(-)

--
2.22.0

Comments

Derrick Stolee June 13, 2019, 6:06 p.m. UTC | #1
On 6/13/2019 1:51 PM, René Scharfe wrote:
> Calculating the sum of two array indexes to find the midpoint between
> them can overflow, i.e. code like this is unsafe for big arrays:
> 
> 	mid = (first + last) >> 1;
> 
> Make sure the intermediate value stays within the boundaries instead,
> like this:
> 
> 	mid = first + ((last - first) >> 1);
> 
> The loop condition of the binary search makes sure that 'last' is
> always greater than 'first', so this is safe as long as 'first' is
> not negative.  And that can be verified easily using the pre-context
> of each change, except for name-hash.c, so add an assertion to that
> effect there.
> 
> The unsafe calculations were found with:
> 
> 	git grep '(.*+.*) *>> *1'
> 
> This is a continuation of 19716b21a4 (cleanup: fix possible overflow
> errors in binary search, 2017-10-08).

Thank you for finding these additional examples!

LGTM. Pretty easy to see this is correct.

-Stolee
Martin Ågren June 13, 2019, 7:42 p.m. UTC | #2
On Thu, 13 Jun 2019 at 19:54, René Scharfe <l.s.r@web.de> wrote:
>
> Calculating the sum of two array indexes to find the midpoint between
> them can overflow, i.e. code like this is unsafe for big arrays:
>
>         mid = (first + last) >> 1;
>
> Make sure the intermediate value stays within the boundaries instead,
> like this:
>
>         mid = first + ((last - first) >> 1);
>
> The loop condition of the binary search makes sure that 'last' is
> always greater than 'first', so this is safe as long as 'first' is
> not negative.  And that can be verified easily using the pre-context
> of each change, except for name-hash.c, so add an assertion to that
> effect there.

Right, with "safe", one might mean something like "no undefined behavior
due to shifting a signed value with the high bit set". Especially since
we're worrying about overflows, we're obviously having large values in
mind, so we're right to consider the sign bit. But, we're fine as you
note.  Because we subtract, and `last` doesn't have its sign bit set,
and `first` is non-negative and not greater than `last`, the sign bit of
`(last - first)` is always zero.

So all is well. But maybe we should write `(last - first) / 2` anyway.
We could then drop the extra parenthesis, and we would keep future
readers (and static analysis?) from wondering whether we might ever be
shifting a signed value with the sign bit set. A few spots fewer to
audit in the future...

Martin
René Scharfe June 13, 2019, 9:33 p.m. UTC | #3
Am 13.06.19 um 21:42 schrieb Martin Ågren:
> On Thu, 13 Jun 2019 at 19:54, René Scharfe <l.s.r@web.de> wrote:
>>
>> Calculating the sum of two array indexes to find the midpoint between
>> them can overflow, i.e. code like this is unsafe for big arrays:
>>
>>         mid = (first + last) >> 1;
>>
>> Make sure the intermediate value stays within the boundaries instead,
>> like this:
>>
>>         mid = first + ((last - first) >> 1);
>>
>> The loop condition of the binary search makes sure that 'last' is
>> always greater than 'first', so this is safe as long as 'first' is
>> not negative.  And that can be verified easily using the pre-context
>> of each change, except for name-hash.c, so add an assertion to that
>> effect there.
>
> Right, with "safe", one might mean something like "no undefined behavior
> due to shifting a signed value with the high bit set". Especially since
> we're worrying about overflows, we're obviously having large values in
> mind, so we're right to consider the sign bit. But, we're fine as you
> note.  Because we subtract, and `last` doesn't have its sign bit set,
> and `first` is non-negative and not greater than `last`, the sign bit of
> `(last - first)` is always zero.
>
> So all is well. But maybe we should write `(last - first) / 2` anyway.
> We could then drop the extra parenthesis, and we would keep future
> readers (and static analysis?) from wondering whether we might ever be
> shifting a signed value with the sign bit set. A few spots fewer to
> audit in the future...

Yes, thought about that.  When I saw Clang 8 emitting extra opcodes for
handling the sign for the version with division I decided to restrict
the patch to just do overflow prevention and leave the right shifts in
place..

René
Martin Ågren June 14, 2019, 4:18 a.m. UTC | #4
On Thu, 13 Jun 2019 at 23:33, René Scharfe <l.s.r@web.de> wrote:
>
> Am 13.06.19 um 21:42 schrieb Martin Ågren:
> > On Thu, 13 Jun 2019 at 19:54, René Scharfe <l.s.r@web.de> wrote:

> >> Make sure the intermediate value stays within the boundaries instead,
> >> like this:
> >>
> >>         mid = first + ((last - first) >> 1);
> >>
> >> The loop condition of the binary search makes sure that 'last' is
> >> always greater than 'first', so this is safe as long as 'first' is
> >> not negative.  And that can be verified easily using the pre-context
> >> of each change, except for name-hash.c, so add an assertion to that
> >> effect there.

> > So all is well. But maybe we should write `(last - first) / 2` anyway.
> > We could then drop the extra parenthesis, and we would keep future
> > readers (and static analysis?) from wondering whether we might ever be
> > shifting a signed value with the sign bit set. A few spots fewer to
> > audit in the future...
>
> Yes, thought about that.  When I saw Clang 8 emitting extra opcodes for
> handling the sign for the version with division I decided to restrict
> the patch to just do overflow prevention and leave the right shifts in
> place..

Ah, ok, I should have known you were on top of it. I guess that means
that clang isn't able to figure out we're guaranteed to be working with
non-negative numbers. Which I guess means that it can't be sure the
shift is safe, but with undefined behavior it has the leeway it needs,
so...

Martin
diff mbox series

Patch

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 7f83c9a6f2..670e8fb93c 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -373,7 +373,7 @@  static void prune_index(struct index_state *istate,
 	first = pos;
 	last = istate->cache_nr;
 	while (last > first) {
-		int next = (last + first) >> 1;
+		int next = first + ((last - first) >> 1);
 		const struct cache_entry *ce = istate->cache[next];
 		if (!strncmp(ce->name, prefix, prefixlen)) {
 			first = next+1;
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 07bd34b631..6af92d5eba 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -23,7 +23,7 @@  static int find_rename_dst(struct diff_filespec *two)
 	first = 0;
 	last = rename_dst_nr;
 	while (last > first) {
-		int next = (last + first) >> 1;
+		int next = first + ((last - first) >> 1);
 		struct diff_rename_dst *dst = &(rename_dst[next]);
 		int cmp = strcmp(two->path, dst->two->path);
 		if (!cmp)
@@ -83,7 +83,7 @@  static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
 	first = 0;
 	last = rename_src_nr;
 	while (last > first) {
-		int next = (last + first) >> 1;
+		int next = first + ((last - first) >> 1);
 		struct diff_rename_src *src = &(rename_src[next]);
 		int cmp = strcmp(one->path, src->p->one->path);
 		if (!cmp)
diff --git a/dir.c b/dir.c
index ba4a51c296..d021c908e5 100644
--- a/dir.c
+++ b/dir.c
@@ -701,7 +701,7 @@  static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
 	first = 0;
 	last = dir->dirs_nr;
 	while (last > first) {
-		int cmp, next = (last + first) >> 1;
+		int cmp, next = first + ((last - first) >> 1);
 		d = dir->dirs[next];
 		cmp = strncmp(name, d->name, len);
 		if (!cmp && strlen(d->name) > len)
diff --git a/name-hash.c b/name-hash.c
index b4861bc7b0..695908609f 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -345,8 +345,9 @@  static int handle_range_dir(
 	else {
 		int begin = k_start;
 		int end = k_end;
+		assert(begin >= 0);
 		while (begin < end) {
-			int mid = (begin + end) >> 1;
+			int mid = begin + ((end - begin) >> 1);
 			int cmp = strncmp(istate->cache[mid]->name, prefix->buf, prefix->len);
 			if (cmp == 0) /* mid has same prefix; look in second part */
 				begin = mid + 1;
diff --git a/read-cache.c b/read-cache.c
index 22e7b9944e..4f81fca320 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -549,7 +549,7 @@  static int index_name_stage_pos(const struct index_state *istate, const char *na
 	first = 0;
 	last = istate->cache_nr;
 	while (last > first) {
-		int next = (last + first) >> 1;
+		int next = first + ((last - first) >> 1);
 		struct cache_entry *ce = istate->cache[next];
 		int cmp = cache_name_stage_compare(name, namelen, stage, ce->name, ce_namelen(ce), ce_stage(ce));
 		if (!cmp)
diff --git a/sh-i18n--envsubst.c b/sh-i18n--envsubst.c
index cecfdd36c7..e7430b9aa8 100644
--- a/sh-i18n--envsubst.c
+++ b/sh-i18n--envsubst.c
@@ -249,7 +249,7 @@  sorted_string_list_member (const string_list_ty *slp, const char *s)
 	{
 	  /* Here we know that if s is in the list, it is at an index j
 	     with j1 <= j < j2.  */
-	  size_t j = (j1 + j2) >> 1;
+	  size_t j = j1 + ((j2 - j1) >> 1);
 	  int result = strcmp (slp->item[j], s);

 	  if (result > 0)