[v2,4/4] mm: prohibit NULL dereference exposed for unsupported non-blockable __GFP_NOFAIL

Message ID 20240731000155.109583-5-21cnbao@gmail.com (mailing list archive)
State New
Series mm: clarify nofail memory allocation

Commit Message

Barry Song July 31, 2024, 12:01 a.m. UTC
From: Barry Song <v-songbaohua@oppo.com>

When users allocate memory with the __GFP_NOFAIL flag, they might
incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind
of non-blockable __GFP_NOFAIL is not supported and is pointless. If
we attempt and still fail to allocate memory for these users, we have
two choices:

    1. We could busy-loop and hope that some other direct reclamation or
    kswapd rescues the current process. However, this is unreliable
    and could ultimately lead to hard or soft lockups, which might not
    be well supported by some architectures.

    2. We could use BUG_ON to trigger a reliable system crash, avoiding
    exposing NULL dereference.

This patch chooses the second option because the first is unreliable. Even
if the process incorrectly using __GFP_NOFAIL is sometimes rescued, the
long latency might be unacceptable, especially considering that misusing
GFP_ATOMIC and __GFP_NOFAIL is likely to occur in atomic contexts with
strict timing requirements.
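
As a rough illustration of the case described above, here is a small userspace model (not kernel code, and not part of the patch). The bit values and every M_*/model_* name are assumptions made up for the sketch; only the composition of the masks with respect to the reclaim bits follows the kernel's GFP_ATOMIC, GFP_NOWAIT and GFP_KERNEL. It shows why GFP_ATOMIC | __GFP_NOFAIL ends up with can_direct_reclaim false, and how the outcome changes from "return NULL" to "BUG" with this patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model, not kernel code. Bit values are made up for
 * illustration; the real definitions are in include/linux/gfp_types.h.
 */
typedef unsigned int model_gfp_t;

#define M_HIGH            0x01u
#define M_IO              0x02u
#define M_FS              0x04u
#define M_DIRECT_RECLAIM  0x08u
#define M_KSWAPD_RECLAIM  0x10u
#define M_NOFAIL          0x20u

/* Same composition as the kernel masks with respect to the reclaim bits. */
#define M_GFP_ATOMIC  (M_HIGH | M_KSWAPD_RECLAIM)
#define M_GFP_NOWAIT  (M_KSWAPD_RECLAIM)
#define M_GFP_KERNEL  (M_DIRECT_RECLAIM | M_KSWAPD_RECLAIM | M_IO | M_FS)

/* The non-blocking masks never carry the direct reclaim bit. */
static_assert((M_GFP_ATOMIC & M_DIRECT_RECLAIM) == 0, "atomic cannot block");
static_assert((M_GFP_NOWAIT & M_DIRECT_RECLAIM) == 0, "nowait cannot block");

enum model_outcome { M_RETRY, M_RETURN_NULL, M_BUG };

bool model_can_direct_reclaim(model_gfp_t gfp)
{
	return (gfp & M_DIRECT_RECLAIM) != 0;
}

/* Outcome once the slowpath has run out of options, before the patch. */
enum model_outcome model_before_patch(model_gfp_t gfp)
{
	if (!(gfp & M_NOFAIL))
		return M_RETURN_NULL;
	if (!model_can_direct_reclaim(gfp))
		return M_RETURN_NULL;	/* WARN_ON_ONCE_GFP(), then goto fail */
	return M_RETRY;
}

/* Same point in the slowpath, with the patch applied. */
enum model_outcome model_after_patch(model_gfp_t gfp)
{
	if (!(gfp & M_NOFAIL))
		return M_RETURN_NULL;
	if (!model_can_direct_reclaim(gfp))
		return M_BUG;		/* BUG_ON(!can_direct_reclaim) */
	return M_RETRY;
}
```

In this model, M_GFP_ATOMIC | M_NOFAIL yields M_RETURN_NULL before the patch (the caller, who assumed the allocation cannot fail, then dereferences NULL) and M_BUG after it, while M_GFP_KERNEL | M_NOFAIL keeps retrying in both cases.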

Cc: Michal Hocko <mhocko@suse.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Kees Cook <kees@kernel.org>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 mm/page_alloc.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Michal Hocko July 31, 2024, 7:15 a.m. UTC | #1
On Wed 31-07-24 12:01:55, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> When users allocate memory with the __GFP_NOFAIL flag, they might
> incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind
> of non-blockable __GFP_NOFAIL is not supported and is pointless. If
> we attempt and still fail to allocate memory for these users, we have
> two choices:
> 
>     1. We could busy-loop and hope that some other direct reclamation or
>     kswapd rescues the current process. However, this is unreliable
>     and could ultimately lead to hard or soft lockups, which might not
>     be well supported by some architectures.
> 
>     2. We could use BUG_ON to trigger a reliable system crash, avoiding
>     exposing NULL dereference.
> 
> This patch chooses the second option because the first is unreliable. Even
> if the process incorrectly using __GFP_NOFAIL is sometimes rescued, the
> long latency might be unacceptable, especially considering that misusing
> GFP_ATOMIC and __GFP_NOFAIL is likely to occur in atomic contexts with
> strict timing requirements.

Well, any latency arguments are off the table with BUG_ON crashing the
system. So this is not about reliability but rather about making those
incorrect uses more obvious.

With your GFP_NOFAIL follow up this should be simply impossible to
trigger though. I am still not sure which of the bad solutions is more
appropriate so I am not giving this an ack. Either of them is better
than allowing the allocation to fail though.

> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Kees Cook <kees@kernel.org>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  mm/page_alloc.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cc179c3e68df..ed1bd8f595bd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4439,11 +4439,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	 */
>  	if (gfp_mask & __GFP_NOFAIL) {
>  		/*
> -		 * All existing users of the __GFP_NOFAIL are blockable, so warn
> -		 * of any new users that actually require GFP_NOWAIT
> +		 * All existing users of the __GFP_NOFAIL are blockable
> +		 * otherwise we introduce a busy loop with inside the page
> +		 * allocator from non-sleepable contexts
>  		 */
> -		if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> -			goto fail;
> +		BUG_ON(!can_direct_reclaim);
>  
>  		/*
>  		 * PF_MEMALLOC request from this context is rather bizarre
> @@ -4474,7 +4474,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		cond_resched();
>  		goto retry;
>  	}
> -fail:
> +
>  	warn_alloc(gfp_mask, ac->nodemask,
>  			"page allocation failure: order:%u", order);
>  got_pg:
> -- 
> 2.34.1
Vlastimil Babka July 31, 2024, 10:55 a.m. UTC | #2
On 7/31/24 2:01 AM, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> When users allocate memory with the __GFP_NOFAIL flag, they might
> incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind
> of non-blockable __GFP_NOFAIL is not supported and is pointless. If
> we attempt and still fail to allocate memory for these users, we have
> two choices:
> 
>     1. We could busy-loop and hope that some other direct reclamation or
>     kswapd rescues the current process. However, this is unreliable
>     and could ultimately lead to hard or soft lockups, which might not
>     be well supported by some architectures.
> 
>     2. We could use BUG_ON to trigger a reliable system crash, avoiding
>     exposing NULL dereference.
> 
> This patch chooses the second option because the first is unreliable. Even
> if the process incorrectly using __GFP_NOFAIL is sometimes rescued, the
> long latency might be unacceptable, especially considering that misusing
> GFP_ATOMIC and __GFP_NOFAIL is likely to occur in atomic contexts with
> strict timing requirements.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Kees Cook <kees@kernel.org>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  mm/page_alloc.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cc179c3e68df..ed1bd8f595bd 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4439,11 +4439,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	 */
>  	if (gfp_mask & __GFP_NOFAIL) {
>  		/*
> -		 * All existing users of the __GFP_NOFAIL are blockable, so warn
> -		 * of any new users that actually require GFP_NOWAIT
> +		 * All existing users of the __GFP_NOFAIL are blockable
> +		 * otherwise we introduce a busy loop with inside the page
> +		 * allocator from non-sleepable contexts
>  		 */
> -		if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> -			goto fail;
> +		BUG_ON(!can_direct_reclaim);

We might get more useful output if here we did just "if
(!can_direct_reclaim) goto fail;" and let warn_alloc() print it, and then
there would be a BUG_ON(gfp_mask & __GFP_NOFAIL)?
Additionally we could mask out __GFP_NOWARN from gfp_mask before the goto,
as a __GFP_NOWARN would suppress the output in a non-recoverable situation
so it would be wrong.
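
The alternative flow suggested above could be modelled roughly as follows. This is a userspace sketch under stated assumptions, not a proposed kernel change: the M_* bit values, struct model_report and model_fail_path() are hypothetical names, only the failure path is modelled (the allocation is assumed to have already failed), and the rate limiting inside warn_alloc() is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; bit values are illustrative, not the kernel's. */
typedef unsigned int model_gfp_t;

#define M_DIRECT_RECLAIM  0x1u
#define M_NOWARN          0x2u
#define M_NOFAIL          0x4u

static_assert((M_DIRECT_RECLAIM & M_NOWARN) == 0 &&
	      (M_DIRECT_RECLAIM & M_NOFAIL) == 0 &&
	      (M_NOWARN & M_NOFAIL) == 0, "model bits must be distinct");

struct model_report {
	bool retried;			/* blockable __GFP_NOFAIL: goto retry */
	bool warn_alloc_printed;	/* allocator state dumped by warn_alloc() */
	bool bug_triggered;		/* BUG_ON(gfp_mask & __GFP_NOFAIL) fired */
};

struct model_report model_fail_path(model_gfp_t gfp)
{
	struct model_report r = { false, false, false };

	if (gfp & M_NOFAIL) {
		if (gfp & M_DIRECT_RECLAIM) {
			/* Supported case: keep looping, never reach "fail". */
			r.retried = true;
			return r;
		}
		/* Unrecoverable: do not let __GFP_NOWARN hide the report. */
		gfp &= ~M_NOWARN;
	}

	/* fail: warn_alloc() stays silent only if __GFP_NOWARN is set. */
	r.warn_alloc_printed = !(gfp & M_NOWARN);
	/* The BUG_ON() would sit after warn_alloc() in this variant. */
	r.bug_triggered = (gfp & M_NOFAIL) != 0;
	return r;
}
```

With this ordering, a non-blockable M_NOFAIL | M_NOWARN request still gets the warn_alloc() report before the BUG_ON() fires, whereas an ordinary M_NOWARN request fails quietly as before.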

>  
>  		/*
>  		 * PF_MEMALLOC request from this context is rather bizarre
> @@ -4474,7 +4474,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  		cond_resched();
>  		goto retry;
>  	}
> -fail:
> +
>  	warn_alloc(gfp_mask, ac->nodemask,
>  			"page allocation failure: order:%u", order);
>  got_pg:
Barry Song July 31, 2024, 11:08 a.m. UTC | #3
On Wed, Jul 31, 2024 at 6:55 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 7/31/24 2:01 AM, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > When users allocate memory with the __GFP_NOFAIL flag, they might
> > incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind
> > of non-blockable __GFP_NOFAIL is not supported and is pointless. If
> > we attempt and still fail to allocate memory for these users, we have
> > two choices:
> >
> >     1. We could busy-loop and hope that some other direct reclamation or
> >     kswapd rescues the current process. However, this is unreliable
> >     and could ultimately lead to hard or soft lockups, which might not
> >     be well supported by some architectures.
> >
> >     2. We could use BUG_ON to trigger a reliable system crash, avoiding
> >     exposing NULL dereference.
> >
> > This patch chooses the second option because the first is unreliable. Even
> > if the process incorrectly using __GFP_NOFAIL is sometimes rescued, the
> > long latency might be unacceptable, especially considering that misusing
> > GFP_ATOMIC and __GFP_NOFAIL is likely to occur in atomic contexts with
> > strict timing requirements.
> >
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Kees Cook <kees@kernel.org>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> >  mm/page_alloc.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index cc179c3e68df..ed1bd8f595bd 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4439,11 +4439,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >        */
> >       if (gfp_mask & __GFP_NOFAIL) {
> >               /*
> > -              * All existing users of the __GFP_NOFAIL are blockable, so warn
> > -              * of any new users that actually require GFP_NOWAIT
> > +              * All existing users of the __GFP_NOFAIL are blockable
> > +              * otherwise we introduce a busy loop with inside the page
> > +              * allocator from non-sleepable contexts
> >                */
> > -             if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> > -                     goto fail;
> > +             BUG_ON(!can_direct_reclaim);
>
> We might get more useful output if here we did just "if
> (!can_direct_reclaim) goto fail;" and let warn_alloc() print it, and then
> there would be a BUG_ON(gfp_mask & __GFP_NOFAIL)?
> Additionally we could mask out __GFP_NOWARN from gfp_mask before the goto,
> as a __GFP_NOWARN would suppress the output in a non-recoverable situation
> so it would be wrong.

If we use BUG_ON, it seems like we don't need to do anything else, as the
BUG_ON report gives developers all the information they need. If we go with
approach 1, doing a busy loop until rescued or a lockup occurs, I agree it
might be better to add more warnings.

>
> >
> >               /*
> >                * PF_MEMALLOC request from this context is rather bizarre
> > @@ -4474,7 +4474,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >               cond_resched();
> >               goto retry;
> >       }
> > -fail:
> > +
> >       warn_alloc(gfp_mask, ac->nodemask,
> >                       "page allocation failure: order:%u", order);
> >  got_pg:
>
Michal Hocko July 31, 2024, 11:31 a.m. UTC | #4
On Wed 31-07-24 19:08:44, Barry Song wrote:
> On Wed, Jul 31, 2024 at 6:55 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 7/31/24 2:01 AM, Barry Song wrote:
> > > From: Barry Song <v-songbaohua@oppo.com>
> > >
> > > When users allocate memory with the __GFP_NOFAIL flag, they might
> > > incorrectly use it alongside GFP_ATOMIC, GFP_NOWAIT, etc. This kind
> > > of non-blockable __GFP_NOFAIL is not supported and is pointless. If
> > > we attempt and still fail to allocate memory for these users, we have
> > > two choices:
> > >
> > >     1. We could busy-loop and hope that some other direct reclamation or
> > >     kswapd rescues the current process. However, this is unreliable
> > >     and could ultimately lead to hard or soft lockups, which might not
> > >     be well supported by some architectures.
> > >
> > >     2. We could use BUG_ON to trigger a reliable system crash, avoiding
> > >     exposing NULL dereference.
> > >
> > > This patch chooses the second option because the first is unreliable. Even
> > > if the process incorrectly using __GFP_NOFAIL is sometimes rescued, the
> > > long latency might be unacceptable, especially considering that misusing
> > > GFP_ATOMIC and __GFP_NOFAIL is likely to occur in atomic contexts with
> > > strict timing requirements.
> > >
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > > Cc: Christoph Lameter <cl@linux.com>
> > > Cc: Pekka Enberg <penberg@kernel.org>
> > > Cc: David Rientjes <rientjes@google.com>
> > > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > Cc: Vlastimil Babka <vbabka@suse.cz>
> > > Cc: Roman Gushchin <roman.gushchin@linux.dev>
> > > Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > > Cc: Kees Cook <kees@kernel.org>
> > > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > > ---
> > >  mm/page_alloc.c | 10 +++++-----
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index cc179c3e68df..ed1bd8f595bd 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -4439,11 +4439,11 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > >        */
> > >       if (gfp_mask & __GFP_NOFAIL) {
> > >               /*
> > > -              * All existing users of the __GFP_NOFAIL are blockable, so warn
> > > -              * of any new users that actually require GFP_NOWAIT
> > > +              * All existing users of the __GFP_NOFAIL are blockable
> > > +              * otherwise we introduce a busy loop with inside the page
> > > +              * allocator from non-sleepable contexts
> > >                */
> > > -             if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
> > > -                     goto fail;
> > > +             BUG_ON(!can_direct_reclaim);
> >
> > We might get more useful output if here we did just "if
> > (!can_direct_reclaim) goto fail;" and let warn_alloc() print it, and then
> > there would be a BUG_ON(gfp_mask & __GFP_NOFAIL)?
> > Additionally we could mask out __GFP_NOWARN from gfp_mask before the goto,
> > as a __GFP_NOWARN would suppress the output in a non-recoverable situation
> > so it would be wrong.
> 
> If we use BUG_ON, it seems like we don't need to do anything else, as the BUG_ON
> report gives developers all the information they need.

It will not give the warn_alloc output, i.e. the state of the page
allocator at the time of failure. Is this really necessary? I don't know,
because this is a "shouldn't ever happen" case rather than a "how come
this allocation has failed" case.

So IMHO a simple BUG_ON should be sufficient to scream out loud that
the impossible has happened and needs fixing.

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cc179c3e68df..ed1bd8f595bd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4439,11 +4439,11 @@  __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 */
 	if (gfp_mask & __GFP_NOFAIL) {
 		/*
-		 * All existing users of the __GFP_NOFAIL are blockable, so warn
-		 * of any new users that actually require GFP_NOWAIT
+		 * All existing users of the __GFP_NOFAIL are blockable
+		 * otherwise we introduce a busy loop with inside the page
+		 * allocator from non-sleepable contexts
 		 */
-		if (WARN_ON_ONCE_GFP(!can_direct_reclaim, gfp_mask))
-			goto fail;
+		BUG_ON(!can_direct_reclaim);
 
 		/*
 		 * PF_MEMALLOC request from this context is rather bizarre
@@ -4474,7 +4474,7 @@  __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		cond_resched();
 		goto retry;
 	}
-fail:
+
 	warn_alloc(gfp_mask, ac->nodemask,
 			"page allocation failure: order:%u", order);
 got_pg: