Message ID | f0059613-d3e4-a1e2-2c01-727579e069d2@web.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cleanup: fix possible overflow errors in binary search, part 2 | expand |
On 6/13/2019 1:51 PM, René Scharfe wrote: > Calculating the sum of two array indexes to find the midpoint between > them can overflow, i.e. code like this is unsafe for big arrays: > > mid = (first + last) >> 1; > > Make sure the intermediate value stays within the boundaries instead, > like this: > > mid = first + ((last - first) >> 1); > > The loop condition of the binary search makes sure that 'last' is > always greater than 'first', so this is safe as long as 'first' is > not negative. And that can be verified easily using the pre-context > of each change, except for name-hash.c, so add an assertion to that > effect there. > > The unsafe calculations were found with: > > git grep '(.*+.*) *>> *1' > > This is a continuation of 19716b21a4 (cleanup: fix possible overflow > errors in binary search, 2017-10-08). Thank you for finding these additional examples! LGTM. Pretty easy to see this is correct. -Stolee
On Thu, 13 Jun 2019 at 19:54, René Scharfe <l.s.r@web.de> wrote: > > Calculating the sum of two array indexes to find the midpoint between > them can overflow, i.e. code like this is unsafe for big arrays: > > mid = (first + last) >> 1; > > Make sure the intermediate value stays within the boundaries instead, > like this: > > mid = first + ((last - first) >> 1); > > The loop condition of the binary search makes sure that 'last' is > always greater than 'first', so this is safe as long as 'first' is > not negative. And that can be verified easily using the pre-context > of each change, except for name-hash.c, so add an assertion to that > effect there. Right, with "safe", one might mean something like "no undefined behavior due to shifting a signed value with the high bit set". Especially since we're worrying about overflows, we're obviously having large values in mind, so we're right to consider the sign bit. But, we're fine as you note. Because we subtract, and `last` doesn't have its sign bit set, and `first` is non-negative and not greater than `last`, the sign bit of `(last - first)` is always zero. So all is well. But maybe we should write `(last - first) / 2` anyway. We could then drop the extra parenthesis, and we would keep future readers (and static analysis?) from wondering whether we might ever be shifting a signed value with the sign bit set. A few spots fewer to audit in the future... Martin
Am 13.06.19 um 21:42 schrieb Martin Ågren: > On Thu, 13 Jun 2019 at 19:54, René Scharfe <l.s.r@web.de> wrote: >> >> Calculating the sum of two array indexes to find the midpoint between >> them can overflow, i.e. code like this is unsafe for big arrays: >> >> mid = (first + last) >> 1; >> >> Make sure the intermediate value stays within the boundaries instead, >> like this: >> >> mid = first + ((last - first) >> 1); >> >> The loop condition of the binary search makes sure that 'last' is >> always greater than 'first', so this is safe as long as 'first' is >> not negative. And that can be verified easily using the pre-context >> of each change, except for name-hash.c, so add an assertion to that >> effect there. > > Right, with "safe", one might mean something like "no undefined behavior > due to shifting a signed value with the high bit set". Especially since > we're worrying about overflows, we're obviously having large values in > mind, so we're right to consider the sign bit. But, we're fine as you > note. Because we subtract, and `last` doesn't have its sign bit set, > and `first` is non-negative and not greater than `last`, the sign bit of > `(last - first)` is always zero. > > So all is well. But maybe we should write `(last - first) / 2` anyway. > We could then drop the extra parenthesis, and we would keep future > readers (and static analysis?) from wondering whether we might ever be > shifting a signed value with the sign bit set. A few spots fewer to > audit in the future... Yes, thought about that. When I saw Clang 8 emitting extra opcodes for handling the sign for the version with division I decided to restrict the patch to just do overflow prevention and leave the right shifts in place.. René
On Thu, 13 Jun 2019 at 23:33, René Scharfe <l.s.r@web.de> wrote: > > Am 13.06.19 um 21:42 schrieb Martin Ågren: > > On Thu, 13 Jun 2019 at 19:54, René Scharfe <l.s.r@web.de> wrote: > >> Make sure the intermediate value stays within the boundaries instead, > >> like this: > >> > >> mid = first + ((last - first) >> 1); > >> > >> The loop condition of the binary search makes sure that 'last' is > >> always greater than 'first', so this is safe as long as 'first' is > >> not negative. And that can be verified easily using the pre-context > >> of each change, except for name-hash.c, so add an assertion to that > >> effect there. > > So all is well. But maybe we should write `(last - first) / 2` anyway. > > We could then drop the extra parenthesis, and we would keep future > > readers (and static analysis?) from wondering whether we might ever be > > shifting a signed value with the sign bit set. A few spots fewer to > > audit in the future... > > Yes, thought about that. When I saw Clang 8 emitting extra opcodes for > handling the sign for the version with division I decided to restrict > the patch to just do overflow prevention and leave the right shifts in > place.. Ah, ok, I should have known you were on top of it. I guess that means that clang isn't able to figure out we're guaranteed to be working with non-negative numbers. Which I guess means that it can't be sure the shift is safe, but with undefined behavior it has the leeway it needs, so... Martin
diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 7f83c9a6f2..670e8fb93c 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -373,7 +373,7 @@ static void prune_index(struct index_state *istate, first = pos; last = istate->cache_nr; while (last > first) { - int next = (last + first) >> 1; + int next = first + ((last - first) >> 1); const struct cache_entry *ce = istate->cache[next]; if (!strncmp(ce->name, prefix, prefixlen)) { first = next+1; diff --git a/diffcore-rename.c b/diffcore-rename.c index 07bd34b631..6af92d5eba 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -23,7 +23,7 @@ static int find_rename_dst(struct diff_filespec *two) first = 0; last = rename_dst_nr; while (last > first) { - int next = (last + first) >> 1; + int next = first + ((last - first) >> 1); struct diff_rename_dst *dst = &(rename_dst[next]); int cmp = strcmp(two->path, dst->two->path); if (!cmp) @@ -83,7 +83,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p) first = 0; last = rename_src_nr; while (last > first) { - int next = (last + first) >> 1; + int next = first + ((last - first) >> 1); struct diff_rename_src *src = &(rename_src[next]); int cmp = strcmp(one->path, src->p->one->path); if (!cmp) diff --git a/dir.c b/dir.c index ba4a51c296..d021c908e5 100644 --- a/dir.c +++ b/dir.c @@ -701,7 +701,7 @@ static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc, first = 0; last = dir->dirs_nr; while (last > first) { - int cmp, next = (last + first) >> 1; + int cmp, next = first + ((last - first) >> 1); d = dir->dirs[next]; cmp = strncmp(name, d->name, len); if (!cmp && strlen(d->name) > len) diff --git a/name-hash.c b/name-hash.c index b4861bc7b0..695908609f 100644 --- a/name-hash.c +++ b/name-hash.c @@ -345,8 +345,9 @@ static int handle_range_dir( else { int begin = k_start; int end = k_end; + assert(begin >= 0); while (begin < end) { - int mid = (begin + end) >> 1; + int mid = begin + ((end - begin) >> 1); int cmp = strncmp(istate->cache[mid]->name, prefix->buf, prefix->len); if (cmp == 0) /* mid has same prefix; look in second part */ begin = mid + 1; diff --git a/read-cache.c b/read-cache.c index 22e7b9944e..4f81fca320 100644 --- a/read-cache.c +++ b/read-cache.c @@ -549,7 +549,7 @@ static int index_name_stage_pos(const struct index_state *istate, const char *na first = 0; last = istate->cache_nr; while (last > first) { - int next = (last + first) >> 1; + int next = first + ((last - first) >> 1); struct cache_entry *ce = istate->cache[next]; int cmp = cache_name_stage_compare(name, namelen, stage, ce->name, ce_namelen(ce), ce_stage(ce)); if (!cmp) diff --git a/sh-i18n--envsubst.c b/sh-i18n--envsubst.c index cecfdd36c7..e7430b9aa8 100644 --- a/sh-i18n--envsubst.c +++ b/sh-i18n--envsubst.c @@ -249,7 +249,7 @@ sorted_string_list_member (const string_list_ty *slp, const char *s) { /* Here we know that if s is in the list, it is at an index j with j1 <= j < j2. */ - size_t j = (j1 + j2) >> 1; + size_t j = j1 + ((j2 - j1) >> 1); int result = strcmp (slp->item[j], s); if (result > 0)
Calculating the sum of two array indexes to find the midpoint between them can overflow, i.e. code like this is unsafe for big arrays: mid = (first + last) >> 1; Make sure the intermediate value stays within the boundaries instead, like this: mid = first + ((last - first) >> 1); The loop condition of the binary search makes sure that 'last' is always greater than 'first', so this is safe as long as 'first' is not negative. And that can be verified easily using the pre-context of each change, except for name-hash.c, so add an assertion to that effect there. The unsafe calculations were found with: git grep '(.*+.*) *>> *1' This is a continuation of 19716b21a4 (cleanup: fix possible overflow errors in binary search, 2017-10-08). Signed-off-by: Rene Scharfe <l.s.r@web.de> --- builtin/ls-files.c | 2 +- diffcore-rename.c | 4 ++-- dir.c | 2 +- name-hash.c | 3 ++- read-cache.c | 2 +- sh-i18n--envsubst.c | 2 +- 6 files changed, 8 insertions(+), 7 deletions(-) -- 2.22.0