mm,page_owner: don't remove GFP flags in add_stack_record_to_list

Message ID 20240429054706.1543980-1-hch@lst.de (mailing list archive)
State New

Commit Message

Christoph Hellwig April 29, 2024, 5:47 a.m. UTC
This loses flags like GFP_NOFS and GFP_NOIO that are important to avoid
deadlocks as well as GFP_NOLOCKDEP that otherwise generates lockdep false
positives.

Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
Reported-by: syzbot+b7e8d799f0ab724876f9@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 mm/page_owner.c | 2 --
 1 file changed, 2 deletions(-)

Comments

Vlastimil Babka April 29, 2024, 7:59 a.m. UTC | #1
On 4/29/24 7:47 AM, Christoph Hellwig wrote:
> This loses flags like GFP_NOFS and GFP_NOIO that are important to avoid
> deadlocks as well as GFP_NOLOCKDEP that otherwise generates lockdep false
> positives.

GFP_NOFS and GFP_NOIO translate to GFP_KERNEL without __GFP_FS/__GFP_IO so I
don't see how this patch would have helped with those.
__GFP_NOLOCKDEP is likely the actual issue and stackdepot solved it like this:

https://lore.kernel.org/linux-xfs/20240418141133.22950-1-ryabinin.a.a@gmail.com/

So we could just do the same here.
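
To make the mask argument concrete, here is a minimal user-space sketch. The flag compositions follow the kernel's definitions (GFP_NOFS is GFP_KERNEL without __GFP_FS, GFP_NOIO additionally drops __GFP_IO), but the bit positions are made up for illustration, and the helper names old_filter() and filter_keep_nolockdep() are hypothetical:

```c
#include <assert.h>

/* Illustrative bit positions only. The kernel's real values live in
 * include/linux/gfp_types.h and depend on version and config. */
typedef unsigned int gfp_t;

#define __GFP_HIGH            ((gfp_t)1u << 0)
#define __GFP_IO              ((gfp_t)1u << 1)
#define __GFP_FS              ((gfp_t)1u << 2)
#define __GFP_DIRECT_RECLAIM  ((gfp_t)1u << 3)
#define __GFP_KSWAPD_RECLAIM  ((gfp_t)1u << 4)
#define __GFP_NOLOCKDEP       ((gfp_t)1u << 5)

#define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define GFP_ATOMIC    (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL    (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOFS      (__GFP_RECLAIM | __GFP_IO)
#define GFP_NOIO      (__GFP_RECLAIM)

/* GFP_NOFS and GFP_NOIO are subsets of the old allowlist, so the old
 * filter cannot drop them. __GFP_NOLOCKDEP is outside it, so it is lost. */
static_assert((GFP_NOFS & ~(GFP_ATOMIC | GFP_KERNEL)) == 0, "NOFS kept");
static_assert((GFP_NOIO & ~(GFP_ATOMIC | GFP_KERNEL)) == 0, "NOIO kept");
static_assert((__GFP_NOLOCKDEP & (GFP_ATOMIC | GFP_KERNEL)) == 0,
	      "NOLOCKDEP dropped");

/* The filtering line that the patch removes. */
static inline gfp_t old_filter(gfp_t gfp_mask)
{
	return gfp_mask & (GFP_ATOMIC | GFP_KERNEL);
}

/* Same filter, with __GFP_NOLOCKDEP added to the allowlist. */
static inline gfp_t filter_keep_nolockdep(gfp_t gfp_mask)
{
	return gfp_mask & (GFP_ATOMIC | GFP_KERNEL | __GFP_NOLOCKDEP);
}
```

With these definitions, old_filter(GFP_NOFS | __GFP_NOLOCKDEP) yields plain GFP_NOFS, while filter_keep_nolockdep() returns the input unchanged.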

> Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count")
> Reported-by: Reported-by: syzbot+b7e8d799f0ab724876f9@syzkaller.appspotmail.com
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  mm/page_owner.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index d17d1351ec84af..d214488846fa92 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
> @@ -168,9 +168,7 @@ static void add_stack_record_to_list(struct stack_record *stack_record,
>  	unsigned long flags;
>  	struct stack *stack;
>  
> -	/* Filter gfp_mask the same way stackdepot does, for consistency */
>  	gfp_mask &= ~GFP_ZONEMASK;
> -	gfp_mask &= (GFP_ATOMIC | GFP_KERNEL);
>  	gfp_mask |= __GFP_NOWARN;
>  
>  	set_current_in_page_owner();

Dave Chinner April 29, 2024, 11:49 p.m. UTC | #2
On Mon, Apr 29, 2024 at 09:59:43AM +0200, Vlastimil Babka wrote:
> On 4/29/24 7:47 AM, Christoph Hellwig wrote:
> > This loses flags like GFP_NOFS and GFP_NOIO that are important to avoid
> > deadlocks as well as GFP_NOLOCKDEP that otherwise generates lockdep false
> > positives.
> 
> GFP_NOFS and GFP_NOIO translate to GFP_KERNEL without __GFP_FS/__GFP_IO so I
> don't see how this patch would have helped with those.
> __GFP_NOLOCKDEP is likely the actual issue and stackdepot solved it like this:
> 
> https://lore.kernel.org/linux-xfs/20240418141133.22950-1-ryabinin.a.a@gmail.com/
>
> So we could just do the same here.

Yes, it is __GFP_NOLOCKDEP that is the issue here, but
cargo-cult-copying of that stackdepot fix is just whack-a-mole bug
fixing without addressing the technical debt that got us here in the
first place. Has anyone else bothered to look to see if kmemleak has
the same problem?

If anyone bothered to do an audit, they would see that
gfp_kmemleak_mask() handles the reclaim context masks correctly.
Further, it adds NOWARN, NOMEMALLOC and
NORETRY, which means the debug code is silent when it fails, it
doesn't deplete emergency reserves and doesn't bog down retrying
forever when there are sustained low memory situations.

This also points out that the page-owner/stackdepot code that strips
GFP_ZONEMASK is completely redundant. Doing:

	gfp_flags &= GFP_KERNEL|GFP_ATOMIC|__GFP_NOLOCKDEP;

strips everything but __GFP_RECLAIM, __GFP_FS, __GFP_IO,
__GFP_HIGH and __GFP_NOLOCKDEP. This already strips the zonemask
info, so there's no need to do it explicitly.
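
The redundancy claim can be checked with a small user-space sketch. The bit positions below are invented for illustration (only the flag compositions mirror the kernel's), and the function names are hypothetical. It compares the two-step filter with the allowlist-only filter over every combination of the modelled bits:

```c
#include <assert.h>

/* Illustrative bit positions only, not the kernel's real values. */
typedef unsigned int gfp_t;

#define __GFP_DMA             ((gfp_t)1u << 0)
#define __GFP_HIGHMEM         ((gfp_t)1u << 1)
#define __GFP_DMA32           ((gfp_t)1u << 2)
#define __GFP_MOVABLE         ((gfp_t)1u << 3)
#define __GFP_HIGH            ((gfp_t)1u << 4)
#define __GFP_IO              ((gfp_t)1u << 5)
#define __GFP_FS              ((gfp_t)1u << 6)
#define __GFP_DIRECT_RECLAIM  ((gfp_t)1u << 7)
#define __GFP_KSWAPD_RECLAIM  ((gfp_t)1u << 8)
#define __GFP_NOLOCKDEP       ((gfp_t)1u << 9)
#define NR_MODELLED_BITS      10

#define GFP_ZONEMASK  (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32 | __GFP_MOVABLE)
#define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define GFP_ATOMIC    (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL    (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_ALLOWLIST (GFP_KERNEL | GFP_ATOMIC | __GFP_NOLOCKDEP)

/* No zone bit is part of the allowlist, so masking with it clears them. */
static_assert((GFP_ALLOWLIST & GFP_ZONEMASK) == 0, "allowlist has no zone bits");

/* Explicit zonemask strip followed by the allowlist. */
static inline gfp_t filter_with_zonemask_strip(gfp_t gfp_mask)
{
	gfp_mask &= ~GFP_ZONEMASK;
	gfp_mask &= GFP_ALLOWLIST;
	return gfp_mask;
}

/* Allowlist alone. */
static inline gfp_t filter_allowlist_only(gfp_t gfp_mask)
{
	return gfp_mask & GFP_ALLOWLIST;
}

/* Returns 1 if both filters agree on every modelled flag combination. */
static inline int filters_agree(void)
{
	for (gfp_t m = 0; m < ((gfp_t)1u << NR_MODELLED_BITS); m++)
		if (filter_with_zonemask_strip(m) != filter_allowlist_only(m))
			return 0;
	return 1;
}
```

Note that this only holds while the allowlist mask is applied. If that line is removed, as in the patch under discussion, the explicit GFP_ZONEMASK strip is no longer redundant.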

IOWs, the right way to fix this set of problems is to lift
gfp_kmemleak_mask() to include/linux/gfp.h and then use it across
all these nested allocations that occur behind the public
memory allocation API.
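
A rough sketch of what such a shared helper could look like, modelled on the gfp_kmemleak_mask() behaviour described above. This is not the patchset mentioned below: the name gfp_nested_mask() is a placeholder chosen for this example, and the bit positions are invented so that the snippet compiles in user space:

```c
#include <assert.h>

/* Illustrative bit positions only, not the kernel's real values. */
typedef unsigned int gfp_t;

#define __GFP_HIGHMEM         ((gfp_t)1u << 0)
#define __GFP_HIGH            ((gfp_t)1u << 1)
#define __GFP_IO              ((gfp_t)1u << 2)
#define __GFP_FS              ((gfp_t)1u << 3)
#define __GFP_DIRECT_RECLAIM  ((gfp_t)1u << 4)
#define __GFP_KSWAPD_RECLAIM  ((gfp_t)1u << 5)
#define __GFP_NOLOCKDEP       ((gfp_t)1u << 6)
#define __GFP_NOWARN          ((gfp_t)1u << 7)
#define __GFP_NORETRY         ((gfp_t)1u << 8)
#define __GFP_NOMEMALLOC      ((gfp_t)1u << 9)
#define __GFP_ZERO            ((gfp_t)1u << 10)

#define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
#define GFP_ATOMIC    (__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
#define GFP_KERNEL    (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define GFP_NOFS      (__GFP_RECLAIM | __GFP_IO)

/* Flags forced on for nested debug allocations: fail quietly, do not
 * touch emergency reserves, do not retry hard. */
#define GFP_NESTED_ADD (__GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN)

static_assert((GFP_NESTED_ADD & (GFP_KERNEL | GFP_ATOMIC | __GFP_NOLOCKDEP)) == 0,
	      "forced-on flags are distinct from the allowlist");

/*
 * Hypothetical helper: keep only the reclaim context of the outer
 * allocation (plus __GFP_NOLOCKDEP), then add the quiet-failure flags.
 */
static inline gfp_t gfp_nested_mask(gfp_t flags)
{
	return (flags & (GFP_KERNEL | GFP_ATOMIC | __GFP_NOLOCKDEP)) |
	       GFP_NESTED_ADD;
}
```

A nested allocation in page_owner or stackdepot would then presumably pass gfp_nested_mask(gfp_mask) to its internal allocator call instead of open-coding the filtering.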

I've got a patchset under test at the moment that does this....

-Dave.

Vlastimil Babka April 30, 2024, 5:31 a.m. UTC | #3
On 4/30/24 1:49 AM, Dave Chinner wrote:
> On Mon, Apr 29, 2024 at 09:59:43AM +0200, Vlastimil Babka wrote:
>> On 4/29/24 7:47 AM, Christoph Hellwig wrote:
>> > This loses flags like GFP_NOFS and GFP_NOIO that are important to avoid
>> > deadlocks as well as GFP_NOLOCKDEP that otherwise generates lockdep false
>> > positives.
>> 
>> GFP_NOFS and GFP_NOIO translate to GFP_KERNEL without __GFP_FS/__GFP_IO so I
>> don't see how this patch would have helped with those.
>> __GFP_NOLOCKDEP is likely the actual issue and stackdepot solved it like this:
>> 
>> https://lore.kernel.org/linux-xfs/20240418141133.22950-1-ryabinin.a.a@gmail.com/
>>
>> So we could just do the same here.
> 
> Yes, it is __GFP_NOLOCKDEP that is the issue here, but
> cargo-cult-copying of that stackdepot fix is just whack-a-mole bug
> fixing without addressing the technical debt that got us here in the
> first place. Has anyone else bothered to look to see if kmemleak has
> the same problem?

Looks like you did :)

> If anyone bothered to do an audit, they would see that
> gfp_kmemleak_mask() handles the reclaim context masks correctly.
> Further, it adds NOWARN, NOMEMALLOC and
> NORETRY, which means the debug code is silent when it fails, it
> doesn't deplete emergency reserves and doesn't bog down retrying
> forever when there are sustained low memory situations.

So we do have NOWARN here. __GFP_RETRY_MAYFAIL might have been slightly
better than __GFP_NOWARN wrt "not retrying forever" but also not giving up
too soon. If we want to be really careful about reserves, it's a question
whether to keep the | GFP_ATOMIC which translates to leaving __GFP_HIGH.
OTOH if we don't keep it, these allocations might fail too easily from an
atomic context and we could miss the debugging data.

> This also points out that the page-owner/stackdepot code that strips
> GFP_ZONEMASK is completely redundant. Doing:
> 
> 	gfp_flags &= GFP_KERNEL|GFP_ATOMIC|__GFP_NOLOCKDEP;
> 
> strips everything but __GFP_RECLAIM, __GFP_FS, __GFP_IO,
> __GFP_HIGH and __GFP_NOLOCKDEP. This already strips the zonemask
> info, so there's no need to do it explicitly.

True.

> IOWs, the right way to fix this set of problems is to lift
> gfp_kmemleak_mask() to include/linux/gfp.h and then use it across
> all these nested allocations that occur behind the public
> memory allocation API.

Agree. But arguably these quick fixes adding __GFP_NOLOCKDEP were
appropriate for the late rc phase we're in.

> I've got a patchset under test at the moment that does this....

Great! Thanks.

> -Dave.

Patch

diff --git a/mm/page_owner.c b/mm/page_owner.c
index d17d1351ec84af..d214488846fa92 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -168,9 +168,7 @@  static void add_stack_record_to_list(struct stack_record *stack_record,
 	unsigned long flags;
 	struct stack *stack;
 
-	/* Filter gfp_mask the same way stackdepot does, for consistency */
 	gfp_mask &= ~GFP_ZONEMASK;
-	gfp_mask &= (GFP_ATOMIC | GFP_KERNEL);
 	gfp_mask |= __GFP_NOWARN;
 
 	set_current_in_page_owner();