khash: clarify that allocations never fail

Message ID	d13b50fd-5944-0bbe-d28e-8232a2932598@web.de (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@kernel.org> To: Git List <git@vger.kernel.org> Cc: Junio C Hamano <gitster@pobox.com>, Eric Wong <e@80x24.org>, Jeff King <peff@peff.net> From: =?utf-8?q?Ren=C3=A9_Scharfe?= <l.s.r@web.de> Subject: [PATCH] khash: clarify that allocations never fail Message-ID: <d13b50fd-5944-0bbe-d28e-8232a2932598@web.de> Date: Sat, 3 Jul 2021 12:05:46 +0200 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable Precedence: bulk
Series	khash: clarify that allocations never fail \| expand khash: clarify that allocations never fail

René Scharfe July 3, 2021, 10:05 a.m. UTC

We use our standard allocation functions and macros (xcalloc,
ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
the program on error, so code that's using them doesn't have to handle
allocation failures.  Make this behavior explicit by replacing the code
that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 khash.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--
2.32.0

Jeff King July 3, 2021, 10:38 a.m. UTC | #1

On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:

> We use our standard allocation functions and macros (xcalloc,
> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
> the program on error, so code that's using them doesn't have to handle
> allocation failures.  Make this behavior explicit by replacing the code
> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.

Seems like a good idea.

We're very sloppy about checking the "ret" field from kh_put_* for
errors (it's a tri-state for "already existed", "newly added", or
"error"). I think that's not a problem because as you show here, we
can't actually hit the error case. This makes that much more obvious.

Two nits if we wanted to go further:

> diff --git a/khash.h b/khash.h
> index 21c2095216..84ff7230b6 100644
> --- a/khash.h
> +++ b/khash.h
> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
> -				if (!new_flags) return -1;								\
> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\

I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
2016-02-22), but left the now-obsolete error-check.

But a few lines below...

>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>  					REALLOC_ARRAY(h->keys, new_n_buckets); \

These REALLOC_ARRAY() calls are in the same boat. You dropped the error
check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
of arrays, 2014-09-16).

Should we make the two match? I'd probably do so by making the former
match the latter, and just drop the conditional and BUG entirely.

> @@ -181,10 +181,10 @@ static const double __ac_HASH_UPPER = 0.77;
>  		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
>  			if (h->n_buckets > (h->size<<1)) {							\
>  				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
> -					*ret = -1; return h->n_buckets;						\
> +					BUG("kh_resize_" #name " failed");					\
>  				}														\
>  			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
> -				*ret = -1; return h->n_buckets;							\
> +				BUG("kh_resize_" #name " failed");						\

After the first hunk, does kh_resize_*() ever return anything but 0? If
not, can we drop its return entirely, making it more clear that it's not
expected to fail? Both for human readers, but also for the compiler
(which could then alert us at compile-time if we missed any error
cases).

-Peff

Jeff King July 3, 2021, 10:44 a.m. UTC | #2

On Sat, Jul 03, 2021 at 06:38:27AM -0400, Jeff King wrote:

> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
> 
> > We use our standard allocation functions and macros (xcalloc,
> > ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
> > the program on error, so code that's using them doesn't have to handle
> > allocation failures.  Make this behavior explicit by replacing the code
> > that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
> 
> Seems like a good idea.
> 
> We're very sloppy about checking the "ret" field from kh_put_* for
> errors (it's a tri-state for "already existed", "newly added", or
> "error"). I think that's not a problem because as you show here, we
> can't actually hit the error case. This makes that much more obvious.

Actually a quad-state, looking at the code (it distinguishes "in table
but deleted", though I don't think I've ever seen a case where that is
useful).

> Two nits if we wanted to go further:

In patch form (mostly because I was curious if I was missing any cases;
I'd probably squash it in with yours):

diff --git a/khash.h b/khash.h
index 84ff7230b6..fad486c966 100644
--- a/khash.h
+++ b/khash.h
@@ -74,7 +74,7 @@ static const double __ac_HASH_UPPER = 0.77;
 	void kh_destroy_##name(kh_##name##_t *h);					\
 	void kh_clear_##name(kh_##name##_t *h);						\
 	khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key); \
-	int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets); \
+	void kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets); \
 	khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret); \
 	void kh_del_##name(kh_##name##_t *h, khint_t x);
 
@@ -116,7 +116,7 @@ static const double __ac_HASH_UPPER = 0.77;
 			return __ac_iseither(h->flags, i)? h->n_buckets : i;		\
 		} else return 0;												\
 	}																	\
-	SCOPE int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets) \
+	SCOPE void kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets) \
 	{ /* This function uses 0.25*n_buckets bytes of working space instead of [sizeof(key_t+val_t)+.25]*n_buckets. */ \
 		khint32_t *new_flags = NULL;										\
 		khint_t j = 1;													\
@@ -173,18 +173,15 @@ static const double __ac_HASH_UPPER = 0.77;
 			h->n_occupied = h->size;									\
 			h->upper_bound = (khint_t)(h->n_buckets * __ac_HASH_UPPER + 0.5); \
 		}																\
-		return 0;														\
 	}																	\
 	SCOPE khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret) \
 	{																	\
 		khint_t x;														\
 		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
 			if (h->n_buckets > (h->size<<1)) {							\
-				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
-					BUG("kh_resize_" #name " failed");					\
-				}														\
-			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
-				BUG("kh_resize_" #name " failed");						\
+				kh_resize_##name(h, h->n_buckets - 1); /* clear "deleted" elements */ \
+			} else { \
+				kh_resize_##name(h, h->n_buckets + 1); /* expand the hash table */ \
 			}															\
 		} /* TODO: to implement automatically shrinking; resize() already support shrinking */ \
 		{																\

Ævar Arnfjörð Bjarmason July 3, 2021, 11:35 a.m. UTC | #3

On Sat, Jul 03 2021, Jeff King wrote:

> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
>
>> We use our standard allocation functions and macros (xcalloc,
>> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
>> the program on error, so code that's using them doesn't have to handle
>> allocation failures.  Make this behavior explicit by replacing the code
>> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
>
> Seems like a good idea.
>
> We're very sloppy about checking the "ret" field from kh_put_* for
> errors (it's a tri-state for "already existed", "newly added", or
> "error"). I think that's not a problem because as you show here, we
> can't actually hit the error case. This makes that much more obvious.
>
> Two nits if we wanted to go further:
>
>> diff --git a/khash.h b/khash.h
>> index 21c2095216..84ff7230b6 100644
>> --- a/khash.h
>> +++ b/khash.h
>> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
>> -				if (!new_flags) return -1;								\
>> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\
>
> I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
> 2016-02-22), but left the now-obsolete error-check.
>
> But a few lines below...
>
>>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>>  					REALLOC_ARRAY(h->keys, new_n_buckets); \
>
> These REALLOC_ARRAY() calls are in the same boat. You dropped the error
> check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
> of arrays, 2014-09-16).
>
> Should we make the two match? I'd probably do so by making the former
> match the latter, and just drop the conditional and BUG entirely.

Yes, I don't see why we should be guarding theis anymore than we do
xmalloc() or other x*() functions in various places (which is what it
resolves to).

If anything we might consider renaming it via coccinelle to
XALLOC_ARRAY(), XREALLOC_ARRAY() etc. to make it clear that they handle
any errors themselves.

René Scharfe July 3, 2021, 12:56 p.m. UTC | #4

Am 03.07.21 um 13:35 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sat, Jul 03 2021, Jeff King wrote:
>
>> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
>>
>>> We use our standard allocation functions and macros (xcalloc,
>>> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
>>> the program on error, so code that's using them doesn't have to handle
>>> allocation failures.  Make this behavior explicit by replacing the code
>>> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
>>
>> Seems like a good idea.
>>
>> We're very sloppy about checking the "ret" field from kh_put_* for
>> errors (it's a tri-state for "already existed", "newly added", or
>> "error"). I think that's not a problem because as you show here, we
>> can't actually hit the error case. This makes that much more obvious.
>>
>> Two nits if we wanted to go further:
>>
>>> diff --git a/khash.h b/khash.h
>>> index 21c2095216..84ff7230b6 100644
>>> --- a/khash.h
>>> +++ b/khash.h
>>> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>>>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>>>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>>>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
>>> -				if (!new_flags) return -1;								\
>>> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\
>>
>> I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
>> 2016-02-22), but left the now-obsolete error-check.
>>
>> But a few lines below...
>>
>>>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>>>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>>>  					REALLOC_ARRAY(h->keys, new_n_buckets); \
>>
>> These REALLOC_ARRAY() calls are in the same boat. You dropped the error
>> check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
>> of arrays, 2014-09-16).
>>
>> Should we make the two match? I'd probably do so by making the former
>> match the latter, and just drop the conditional and BUG entirely.
>
> Yes, I don't see why we should be guarding theis anymore than we do
> xmalloc() or other x*() functions in various places (which is what it
> resolves to).

Agreed.

> If anything we might consider renaming it via coccinelle to
> XALLOC_ARRAY(), XREALLOC_ARRAY() etc. to make it clear that they handle
> any errors themselves.

I don't think there's any confusion in our internal code about the macros'
handling of allocation errors.

The following semantic patch finds a leery xmalloc() caller in
compat/mmap.c, though:

@@
expression PTR, SIZE, SIZE2;
@@
(
  PTR = xmalloc(SIZE);
|
  PTR = xcalloc(SIZE, SIZE2);
|
  PTR = xrealloc(SIZE);
|
  ALLOC_ARRAY(PTR, SIZE);
|
  CALLOC_ARRAY(PTR, SIZE);
|
  REALLOC_ARRAY(PTR, SIZE);
)
  if (
- PTR == NULL
+ 0
  ) {...}

René

René Scharfe July 3, 2021, 12:57 p.m. UTC | #5

Am 03.07.21 um 12:38 schrieb Jeff King:
> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
>
>> We use our standard allocation functions and macros (xcalloc,
>> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
>> the program on error, so code that's using them doesn't have to handle
>> allocation failures.  Make this behavior explicit by replacing the code
>> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
>
> Seems like a good idea.
>
> We're very sloppy about checking the "ret" field from kh_put_* for
> errors (it's a tri-state for "already existed", "newly added", or
> "error"). I think that's not a problem because as you show here, we
> can't actually hit the error case. This makes that much more obvious.
>
> Two nits if we wanted to go further:
>
>> diff --git a/khash.h b/khash.h
>> index 21c2095216..84ff7230b6 100644
>> --- a/khash.h
>> +++ b/khash.h
>> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
>> -				if (!new_flags) return -1;								\
>> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\
>
> I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
> 2016-02-22), but left the now-obsolete error-check.
>
> But a few lines below...
>
>>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>>  					REALLOC_ARRAY(h->keys, new_n_buckets); \
>
> These REALLOC_ARRAY() calls are in the same boat. You dropped the error
> check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
> of arrays, 2014-09-16).
>
> Should we make the two match? I'd probably do so by making the former
> match the latter, and just drop the conditional and BUG entirely.

Yeah, makes sense, thank you.

>
>> @@ -181,10 +181,10 @@ static const double __ac_HASH_UPPER = 0.77;
>>  		if (h->n_occupied >= h->upper_bound) { /* update the hash table */ \
>>  			if (h->n_buckets > (h->size<<1)) {							\
>>  				if (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear "deleted" elements */ \
>> -					*ret = -1; return h->n_buckets;						\
>> +					BUG("kh_resize_" #name " failed");					\
>>  				}														\
>>  			} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \
>> -				*ret = -1; return h->n_buckets;							\
>> +				BUG("kh_resize_" #name " failed");						\
>
> After the first hunk, does kh_resize_*() ever return anything but 0? If
> not, can we drop its return entirely, making it more clear that it's not
> expected to fail? Both for human readers, but also for the compiler
> (which could then alert us at compile-time if we missed any error
> cases).

Good idea.  Both type of changes make syncing with upstream a bit
harder, but even though the return type change bleeds into the caller,
the overall change affects only a small area.

René

Ævar Arnfjörð Bjarmason July 3, 2021, 1:05 p.m. UTC | #6

On Sat, Jul 03 2021, René Scharfe wrote:

> Am 03.07.21 um 13:35 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Sat, Jul 03 2021, Jeff King wrote:
>>
>>> On Sat, Jul 03, 2021 at 12:05:46PM +0200, René Scharfe wrote:
>>>
>>>> We use our standard allocation functions and macros (xcalloc,
>>>> ALLOC_ARRAY, REALLOC_ARRAY) in our version of khash.h.  They terminate
>>>> the program on error, so code that's using them doesn't have to handle
>>>> allocation failures.  Make this behavior explicit by replacing the code
>>>> that handles allocation errors in kh_resize_ and kh_put_ with BUG calls.
>>>
>>> Seems like a good idea.
>>>
>>> We're very sloppy about checking the "ret" field from kh_put_* for
>>> errors (it's a tri-state for "already existed", "newly added", or
>>> "error"). I think that's not a problem because as you show here, we
>>> can't actually hit the error case. This makes that much more obvious.
>>>
>>> Two nits if we wanted to go further:
>>>
>>>> diff --git a/khash.h b/khash.h
>>>> index 21c2095216..84ff7230b6 100644
>>>> --- a/khash.h
>>>> +++ b/khash.h
>>>> @@ -126,7 +126,7 @@ static const double __ac_HASH_UPPER = 0.77;
>>>>  			if (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;	/* requested size is too small */ \
>>>>  			else { /* hash table size to be changed (shrink or expand); rehash */ \
>>>>  				ALLOC_ARRAY(new_flags, __ac_fsize(new_n_buckets)); \
>>>> -				if (!new_flags) return -1;								\
>>>> +				if (!new_flags) BUG("ALLOC_ARRAY failed");				\
>>>
>>> I converted this in b32fa95fd8 (convert trivial cases to ALLOC_ARRAY,
>>> 2016-02-22), but left the now-obsolete error-check.
>>>
>>> But a few lines below...
>>>
>>>>  				memset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \
>>>>  				if (h->n_buckets < new_n_buckets) {	/* expand */		\
>>>>  					REALLOC_ARRAY(h->keys, new_n_buckets); \
>>>
>>> These REALLOC_ARRAY() calls are in the same boat. You dropped the error
>>> check in 2756ca4347 (use REALLOC_ARRAY for changing the allocation size
>>> of arrays, 2014-09-16).
>>>
>>> Should we make the two match? I'd probably do so by making the former
>>> match the latter, and just drop the conditional and BUG entirely.
>>
>> Yes, I don't see why we should be guarding theis anymore than we do
>> xmalloc() or other x*() functions in various places (which is what it
>> resolves to).
>
> Agreed.
>
>> If anything we might consider renaming it via coccinelle to
>> XALLOC_ARRAY(), XREALLOC_ARRAY() etc. to make it clear that they handle
>> any errors themselves.
>
> I don't think there's any confusion in our internal code about the macros'
> handling of allocation errors.
>
> The following semantic patch finds a leery xmalloc() caller in
> compat/mmap.c, though:
>
> @@
> expression PTR, SIZE, SIZE2;
> @@
> (
>   PTR = xmalloc(SIZE);
> |
>   PTR = xcalloc(SIZE, SIZE2);
> |
>   PTR = xrealloc(SIZE);
> |
>   ALLOC_ARRAY(PTR, SIZE);
> |
>   CALLOC_ARRAY(PTR, SIZE);
> |
>   REALLOC_ARRAY(PTR, SIZE);
> )
>   if (
> - PTR == NULL
> + 0
>   ) {...}
>
> René

Good catch, a bug as old as 730d48a2ef8 ([PATCH] If NO_MMAP is defined,
fake mmap() and munmap(), 2005-10-08).

It would be nice to have that coccinelle patch as a follow-up
patch. Perhaps along with changing that "0" to something that's simply a
syntax error, or just:

    if (0 /* always false due to (implicit?) x*() function call above */)

Jeff King July 4, 2021, 9:01 a.m. UTC | #7

On Sat, Jul 03, 2021 at 02:56:50PM +0200, René Scharfe wrote:

> > If anything we might consider renaming it via coccinelle to
> > XALLOC_ARRAY(), XREALLOC_ARRAY() etc. to make it clear that they handle
> > any errors themselves.
> 
> I don't think there's any confusion in our internal code about the macros'
> handling of allocation errors.

Agreed.

> The following semantic patch finds a leery xmalloc() caller in
> compat/mmap.c, though:
> 
> @@
> expression PTR, SIZE, SIZE2;
> @@
> (
>   PTR = xmalloc(SIZE);
> |
>   PTR = xcalloc(SIZE, SIZE2);
> |
>   PTR = xrealloc(SIZE);
> |
>   ALLOC_ARRAY(PTR, SIZE);
> |
>   CALLOC_ARRAY(PTR, SIZE);
> |
>   REALLOC_ARRAY(PTR, SIZE);
> )
>   if (
> - PTR == NULL
> + 0
>   ) {...}

IMHO that should not be using xmalloc() at all. It is a replacement for
system mmap, which can fail with ENOMEM, and we should be able to do the
same. Using xmalloc here is probably losing an opportunity to close
another pack window to free up memory for a new one.

I doubt it matters that much in practice (most systems are not even
compiling or using this code, and it would only matter in a tight memory
situation). But as a general rule, I think compat/ wrappers should
behave as much like (sensible) system equivalents as possible.

-Peff

René Scharfe July 4, 2021, 9:41 a.m. UTC | #8

Am 04.07.21 um 11:01 schrieb Jeff King:
> On Sat, Jul 03, 2021 at 02:56:50PM +0200, René Scharfe wrote:
>
>> The following semantic patch finds a leery xmalloc() caller in
>> compat/mmap.c, though:
>>
>> @@
>> expression PTR, SIZE, SIZE2;
>> @@
>> (
>>   PTR = xmalloc(SIZE);
>> |
>>   PTR = xcalloc(SIZE, SIZE2);
>> |
>>   PTR = xrealloc(SIZE);
>> |
>>   ALLOC_ARRAY(PTR, SIZE);
>> |
>>   CALLOC_ARRAY(PTR, SIZE);
>> |
>>   REALLOC_ARRAY(PTR, SIZE);
>> )
>>   if (
>> - PTR == NULL
>> + 0
>>   ) {...}
>

Btw. the found code is:

	start = xmalloc(length);
	if (start == NULL) {
		errno = ENOMEM;
		return MAP_FAILED;
	}

start cannot be NULL, so the check is dead code.

> IMHO that should not be using xmalloc() at all. It is a replacement for
> system mmap, which can fail with ENOMEM, and we should be able to do the
> same. Using xmalloc here is probably losing an opportunity to close
> another pack window to free up memory for a new one.

Do you mean using malloc(3) directly instead of xmalloc() would no
longer try to release pack windows?  xmalloc() hasn't done that anymore
since 9827d4c185 (packfile: drop release_pack_memory(), 2019-08-12).

xmalloc() still brings support for zero-sized allocations on platforms
whose malloc(3) doesn't like them, and it enforces GIT_ALLOC_LIMIT.
mmap() is supposed give up with EINVAL if the length is 0, so the
first feature is not actually helping.  And GIT_ALLOC_LIMIT is not
documented and only useful for testing, right?

> I doubt it matters that much in practice (most systems are not even
> compiling or using this code, and it would only matter in a tight memory
> situation). But as a general rule, I think compat/ wrappers should
> behave as much like (sensible) system equivalents as possible.

Makes sense.

René

Jeff King July 4, 2021, 10:11 a.m. UTC | #9

On Sun, Jul 04, 2021 at 11:41:23AM +0200, René Scharfe wrote:

> Btw. the found code is:
> 
> 	start = xmalloc(length);
> 	if (start == NULL) {
> 		errno = ENOMEM;
> 		return MAP_FAILED;
> 	}
> 
> start cannot be NULL, so the check is dead code.

Yep, so it's doubly silly.

> > IMHO that should not be using xmalloc() at all. It is a replacement for
> > system mmap, which can fail with ENOMEM, and we should be able to do the
> > same. Using xmalloc here is probably losing an opportunity to close
> > another pack window to free up memory for a new one.
> 
> Do you mean using malloc(3) directly instead of xmalloc() would no
> longer try to release pack windows?  xmalloc() hasn't done that anymore
> since 9827d4c185 (packfile: drop release_pack_memory(), 2019-08-12).

No, I meant that by using xmalloc() here, if the allocation fails, we'll
immediately die(). Whereas the caller of mmap() could get the ENOMEM
error, then unmap an in-use pack window and try again.

However, I forgot that we don't actually do that (yet). We unmap windows
if we go over our own packed_git_window_size counter, but not when mmap
fails. Eric posted a patch recently to change that, though (at which
point the die() in xmalloc() would be working against us).

(Actually, we did _try_ to do something like that in xmmap prior to
9827d4c185, but I don't think it actually worked since it was based on
our own internal limit, and not what the OS would allow).

> xmalloc() still brings support for zero-sized allocations on platforms
> whose malloc(3) doesn't like them, and it enforces GIT_ALLOC_LIMIT.
> mmap() is supposed give up with EINVAL if the length is 0, so the
> first feature is not actually helping.  And GIT_ALLOC_LIMIT is not
> documented and only useful for testing, right?

Yeah, I think failing on a 0-length mmap is OK, since that's what real
systems do (and this is a wrapper). Our xmmap_gently() handles this,
which is the right spot.

I don't think of GIT_ALLOC_LIMIT as something we're committed to
publicly supporting, though I have (ab)used it once or twice myself.
However, I'm not sure if it makes sense here. True, on a system without
mmap it is a heap allocation, but to me it's conceptually very different
than a normal allocation (and on a system with mmap, we have no problem
at all requesting arbitrarily large maps). If we did want to put a limit
here, I think we'd want to do it at the xmmap layer, using a separate
variable.

-Peff

khash: clarify that allocations never fail

Commit Message

Comments

Patch