Message ID | e16bf0c5212ae85daa0d6aa2c78d551824b542bd.1640199396.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Reftable coverity fixes | expand |
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Han-Wen Nienhuys <hanwen@google.com> > > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> > --- > reftable/record.c | 10 +++++++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/reftable/record.c b/reftable/record.c > index fbaa1fbef56..423e687b220 100644 > --- a/reftable/record.c > +++ b/reftable/record.c > @@ -126,7 +126,8 @@ static int encode_string(char *str, struct string_view s) > string_view_consume(&s, n); > if (s.len < l) > return -1; > - memcpy(s.buf, str, l); > + if (l) > + memcpy(s.buf, str, l); > string_view_consume(&s, l); > > return start.len - s.len; > @@ -153,7 +154,9 @@ int reftable_encode_key(int *restart, struct string_view dest, > > if (dest.len < suffix_len) > return -1; > - memcpy(dest.buf, key.buf + prefix_len, suffix_len); > + > + if (suffix_len) > + memcpy(dest.buf, key.buf + prefix_len, suffix_len); > string_view_consume(&dest, suffix_len); > > return start.len - dest.len; > @@ -569,7 +572,8 @@ static int reftable_obj_record_decode(void *rec, struct strbuf key, > uint64_t last; > int j; > r->hash_prefix = reftable_malloc(key.len); > - memcpy(r->hash_prefix, key.buf, key.len); > + if (key.len) > + memcpy(r->hash_prefix, key.buf, key.len); > r->hash_prefix_len = key.len; > > if (val_type == 0) { I am not sure why any of these are needed. For a code path that involves a <ptr, len> pair, where ptr is lazily allocated only when we have contents to store, i.e. ptr can remain NULL until len becomes non-zero, memcpy(dst, ptr, len) can become a problem as the standard requires ptr to be a valid even when len is 0 (ISO/IEC 9899:1999, 7.21.1 String function conventions, paragraph 2), but none of these three calls to memcpy() do any such thing. Puzzled.
Am 22.12.21 um 23:50 schrieb Junio C Hamano: > "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> From: Han-Wen Nienhuys <hanwen@google.com> >> >> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> >> --- >> reftable/record.c | 10 +++++++--- >> 1 file changed, 7 insertions(+), 3 deletions(-) >> >> diff --git a/reftable/record.c b/reftable/record.c >> index fbaa1fbef56..423e687b220 100644 >> --- a/reftable/record.c >> +++ b/reftable/record.c >> @@ -126,7 +126,8 @@ static int encode_string(char *str, struct string_view s) >> string_view_consume(&s, n); >> if (s.len < l) >> return -1; >> - memcpy(s.buf, str, l); >> + if (l) >> + memcpy(s.buf, str, l); >> string_view_consume(&s, l); >> >> return start.len - s.len; >> @@ -153,7 +154,9 @@ int reftable_encode_key(int *restart, struct string_view dest, >> >> if (dest.len < suffix_len) >> return -1; >> - memcpy(dest.buf, key.buf + prefix_len, suffix_len); >> + >> + if (suffix_len) >> + memcpy(dest.buf, key.buf + prefix_len, suffix_len); >> string_view_consume(&dest, suffix_len); >> >> return start.len - dest.len; >> @@ -569,7 +572,8 @@ static int reftable_obj_record_decode(void *rec, struct strbuf key, >> uint64_t last; >> int j; >> r->hash_prefix = reftable_malloc(key.len); >> - memcpy(r->hash_prefix, key.buf, key.len); >> + if (key.len) >> + memcpy(r->hash_prefix, key.buf, key.len); >> r->hash_prefix_len = key.len; >> >> if (val_type == 0) { > > I am not sure why any of these are needed. > > For a code path that involves a <ptr, len> pair, where ptr is lazily > allocated only when we have contents to store, i.e. ptr can remain > NULL until len becomes non-zero, memcpy(dst, ptr, len) can become a > problem as the standard requires ptr to be a valid even when len is > 0 (ISO/IEC 9899:1999, 7.21.1 String function conventions, paragraph > 2), but none of these three calls to memcpy() do any such thing. > > Puzzled. I don't know about the first two, but in the third case dst (i.e. r->hash_prefix) might be NULL if key.len == 0, reftable_malloc is malloc (which it is, because reftable_set_alloc is never called) and malloc(0) returns NULL (which it might do according to https://www.man7.org/linux/man-pages/man3/malloc.3.html). malloc can return NULL on failure, too, of course, and none of the reftable_malloc callers check for that. That seems a bit too optimistic. René
On Wed, Dec 22, 2021 at 11:50 PM Junio C Hamano <gitster@pobox.com> wrote: > > - memcpy(r->hash_prefix, key.buf, key.len); > > + if (key.len) > > + memcpy(r->hash_prefix, key.buf, key.len); > > r->hash_prefix_len = key.len; > > > > if (val_type == 0) { > > I am not sure why any of these are needed. I'm not sure they are needed, but IMO it's not worth spending brain cycles on deciding either way. Checking the length is always a safe alternative. I would support having a safe_memcpy() that does this check centrally (note how our array_copy() array function also does this check, even if it's not always needed.)
René Scharfe <l.s.r@web.de> writes: >>> @@ -569,7 +572,8 @@ static int reftable_obj_record_decode(void *rec, struct strbuf key, >>> uint64_t last; >>> int j; >>> r->hash_prefix = reftable_malloc(key.len); >>> - memcpy(r->hash_prefix, key.buf, key.len); >>> + if (key.len) >>> + memcpy(r->hash_prefix, key.buf, key.len); >>> r->hash_prefix_len = key.len; >>> >>> if (val_type == 0) { >> >> I am not sure why any of these are needed. >> ... > I don't know about the first two, but in the third case dst (i.e. > r->hash_prefix) might be NULL if key.len == 0, reftable_malloc is malloc > (which it is, because reftable_set_alloc is never called) and malloc(0) > returns NULL (which it might do according to > https://www.man7.org/linux/man-pages/man3/malloc.3.html). > > malloc can return NULL on failure, too, of course, and none of the > reftable_malloc callers check for that. That seems a bit too > optimistic. Yeah, I agree that the real bug in this code is that the returned value of malloc() is not checked. But checking if key.len is zero before using memcpy() would not help fix it, or hide it. So I am not sure if that is a counter-argument against "this check is pointless". Thanks.
Han-Wen Nienhuys <hanwen@google.com> writes: > On Wed, Dec 22, 2021 at 11:50 PM Junio C Hamano <gitster@pobox.com> wrote: > >> > - memcpy(r->hash_prefix, key.buf, key.len); >> > + if (key.len) >> > + memcpy(r->hash_prefix, key.buf, key.len); >> > r->hash_prefix_len = key.len; >> > >> > if (val_type == 0) { >> >> I am not sure why any of these are needed. > > I'm not sure they are needed, but IMO it's not worth spending brain > cycles on deciding either way. Checking the length is always a safe > alternative. > > I would support having a safe_memcpy() that does this check centrally > (note how our array_copy() array function also does this check, even > if it's not always needed.) But your safe_memcpy() should not be safe_memcpy(dst, src, n) { if (n) memcpy(dst, src, n); return dst; } Using memcpy() with size==0 is not a crime wrt the language. Passing an invalid pointer while doing so is. It should rather be safe_memcpy(dst, src, n) { if (dst) memcpy(dst, src, n); else if (n) BUG(...); return dst; } to support a common pattern of <ptr, len> that lazily allocates the ptr only when len goes more than zero from triggering an error from picky runtime, compiler and tools. We want to allow "dst==NULL && n==0", while still catching "dst==NULL && n!=0" as an error. And these places, I do not think use of safe_memcpy() is relevant.
Am 23.12.21 um 19:59 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >>>> @@ -569,7 +572,8 @@ static int reftable_obj_record_decode(void *rec, struct strbuf key, >>>> uint64_t last; >>>> int j; >>>> r->hash_prefix = reftable_malloc(key.len); >>>> - memcpy(r->hash_prefix, key.buf, key.len); >>>> + if (key.len) >>>> + memcpy(r->hash_prefix, key.buf, key.len); >>>> r->hash_prefix_len = key.len; >>>> >>>> if (val_type == 0) { >>> >>> I am not sure why any of these are needed. >>> ... >> I don't know about the first two, but in the third case dst (i.e. >> r->hash_prefix) might be NULL if key.len == 0, reftable_malloc is malloc >> (which it is, because reftable_set_alloc is never called) and malloc(0) >> returns NULL (which it might do according to >> https://www.man7.org/linux/man-pages/man3/malloc.3.html). >> >> malloc can return NULL on failure, too, of course, and none of the >> reftable_malloc callers check for that. That seems a bit too >> optimistic. > > Yeah, I agree that the real bug in this code is that the returned > value of malloc() is not checked. But checking if key.len is zero > before using memcpy() would not help fix it, or hide it. So I am > not sure if that is a counter-argument against "this check is > pointless". It's not -- I was just trying to find a scenario which would cause a Coverity warning that could be silenced by such a zero check. We can use xmalloc and xrealloc to handle allocation failures and to get valid pointers for zero-sized allocations, like in the patch below. I don't understand why reftable_set_alloc exists, though -- is there really a use case that requires changing the allocator at runtime? diff --git a/reftable/publicbasics.c b/reftable/publicbasics.c index 0ad7d5c0ff..438a1b0a7d 100644 --- a/reftable/publicbasics.c +++ b/reftable/publicbasics.c @@ -19,14 +19,14 @@ void *reftable_malloc(size_t sz) { if (reftable_malloc_ptr) return (*reftable_malloc_ptr)(sz); - return malloc(sz); + return xmalloc(sz); } void *reftable_realloc(void *p, size_t sz) { if (reftable_realloc_ptr) return (*reftable_realloc_ptr)(p, sz); - return realloc(p, sz); + return xrealloc(p, sz); } void reftable_free(void *p)
On Sun, Dec 26 2021, René Scharfe wrote: > Am 23.12.21 um 19:59 schrieb Junio C Hamano: >> René Scharfe <l.s.r@web.de> writes: >> >>>>> @@ -569,7 +572,8 @@ static int reftable_obj_record_decode(void *rec, struct strbuf key, >>>>> uint64_t last; >>>>> int j; >>>>> r->hash_prefix = reftable_malloc(key.len); >>>>> - memcpy(r->hash_prefix, key.buf, key.len); >>>>> + if (key.len) >>>>> + memcpy(r->hash_prefix, key.buf, key.len); >>>>> r->hash_prefix_len = key.len; >>>>> >>>>> if (val_type == 0) { >>>> >>>> I am not sure why any of these are needed. >>>> ... >>> I don't know about the first two, but in the third case dst (i.e. >>> r->hash_prefix) might be NULL if key.len == 0, reftable_malloc is malloc >>> (which it is, because reftable_set_alloc is never called) and malloc(0) >>> returns NULL (which it might do according to >>> https://www.man7.org/linux/man-pages/man3/malloc.3.html). >>> >>> malloc can return NULL on failure, too, of course, and none of the >>> reftable_malloc callers check for that. That seems a bit too >>> optimistic. >> >> Yeah, I agree that the real bug in this code is that the returned >> value of malloc() is not checked. But checking if key.len is zero >> before using memcpy() would not help fix it, or hide it. So I am >> not sure if that is a counter-argument against "this check is >> pointless". > > It's not -- I was just trying to find a scenario which would cause a > Coverity warning that could be silenced by such a zero check. > > We can use xmalloc and xrealloc to handle allocation failures and to get > valid pointers for zero-sized allocations, like in the patch below. > > I don't understand why reftable_set_alloc exists, though -- is there > really a use case that requires changing the allocator at runtime? I don't know why it's there, but I suppose it's for the same reason as PCREv2's integration, whose rabbit hole starts at 513f2b0bbd4 (grep: make PCRE2 aware of custom allocator, 2019-10-16). I.e. for building it as part of git.git we can just replace malloc()/free(), but for other uses as a general linkable library you'd want to replace it at runtime.
On Fri, Dec 24, 2021 at 5:16 AM Junio C Hamano <gitster@pobox.com> wrote: > > But your safe_memcpy() should not be > > safe_memcpy(dst, src, n) > { > if (n) > memcpy(dst, src, n); > return dst; > } > > Using memcpy() with size==0 is not a crime wrt the language. > Passing an invalid pointer while doing so is. It's not a crime, but what is the benefit of calling memcpy with n == 0 ? > safe_memcpy(dst, src, n) > { > if (dst) > memcpy(dst, src, n); > else if (n) > BUG(...); I think this is suboptimal. Sure, a segfault is uglier than "out of memory" error, but both effectively crash the program, so the difference isn't that big. The nice way is if the reftable library grows an error-code REFTABLE_OOM, which is propagated once a malloc or realloc returns NULL. We could test this exhaustively in the unittest by swapping in a malloc that starts failing after N allocations, and then running a transaction in a loop, increasing N. I'll have to look more closely if this is possible throughout, so for this series, I'll just take a closer look at the current call-sites to see if NULL can really occur or not.
On Wed, Jan 12, 2022 at 12:39 PM Han-Wen Nienhuys <hanwen@google.com> wrote: > this series, I'll just take a closer look at the current call-sites to > see if NULL can really occur or not. I've dropped this patch from the series. The only suspect callsite is in obj_record_decode(), and that is better addressed by other changes to make sure we never try to do a 0 length copy.
diff --git a/reftable/record.c b/reftable/record.c index fbaa1fbef56..423e687b220 100644 --- a/reftable/record.c +++ b/reftable/record.c @@ -126,7 +126,8 @@ static int encode_string(char *str, struct string_view s) string_view_consume(&s, n); if (s.len < l) return -1; - memcpy(s.buf, str, l); + if (l) + memcpy(s.buf, str, l); string_view_consume(&s, l); return start.len - s.len; @@ -153,7 +154,9 @@ int reftable_encode_key(int *restart, struct string_view dest, if (dest.len < suffix_len) return -1; - memcpy(dest.buf, key.buf + prefix_len, suffix_len); + + if (suffix_len) + memcpy(dest.buf, key.buf + prefix_len, suffix_len); string_view_consume(&dest, suffix_len); return start.len - dest.len; @@ -569,7 +572,8 @@ static int reftable_obj_record_decode(void *rec, struct strbuf key, uint64_t last; int j; r->hash_prefix = reftable_malloc(key.len); - memcpy(r->hash_prefix, key.buf, key.len); + if (key.len) + memcpy(r->hash_prefix, key.buf, key.len); r->hash_prefix_len = key.len; if (val_type == 0) {