mbox series

[0/3] mm: fix nested allocation context filtering

Message ID 20240430054604.4169568-1-david@fromorbit.com (mailing list archive)
Headers show
Series mm: fix nested allocation context filtering | expand

Message

Dave Chinner April 30, 2024, 5:28 a.m. UTC
This patchset is the followup to the comment I made earlier today:

https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/

Tl;dr: Memory allocations that are done inside the public memory
allocation API need to obey the reclaim recursion constraints placed
on the allocation by the original caller, including the "don't track
recursion for this allocation" case defined by __GFP_NOLOCKDEP.

These nested allocations are generally in debug code that is
tracking something about the allocation (kmemleak, KASAN, etc) and
so are allocating private kernel objects that only that debug system
will use.

Neither the page-owner code nor the stack depot code get this right.
They also also clear GFP_ZONEMASK as a separate operation, which is
completely redundant because the constraint filter applied
immediately after guarantees that GFP_ZONEMASK bits are cleared.

kmemleak gets this filtering right. It preserves the allocation
constraints for deadlock prevention and clears all other context
flags whilst also ensuring that the nested allocation will fail
quickly, silently and without depleting emergency kernel reserves if
there is no memory available.

This can be made much more robust, immune to whack-a-mole games and
the code greatly simplified by lifting gfp_kmemleak_mask() to
include/linux/gfp.h and using that everywhere. Also document it so
that there is no excuse for not knowing about it when writing new
debug code that nests allocations.

Tested with lockdep, KASAN + page_owner=on and kmemleak=on over
multiple fstests runs with XFS.

Comments

Christoph Hellwig April 30, 2024, 9:35 a.m. UTC | #1
The changes looks fine to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Btw, I wonder why NOLOCKDEP isn't a context flag to start with,
but maybe this isn't the right time to start that discussion..
Vlastimil Babka April 30, 2024, 10:06 a.m. UTC | #2
On 4/30/24 7:28 AM, Dave Chinner wrote:
> This patchset is the followup to the comment I made earlier today:
> 
> https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/
> 
> Tl;dr: Memory allocations that are done inside the public memory
> allocation API need to obey the reclaim recursion constraints placed
> on the allocation by the original caller, including the "don't track
> recursion for this allocation" case defined by __GFP_NOLOCKDEP.
> 
> These nested allocations are generally in debug code that is
> tracking something about the allocation (kmemleak, KASAN, etc) and
> so are allocating private kernel objects that only that debug system
> will use.
> 
> Neither the page-owner code nor the stack depot code get this right.
> They also also clear GFP_ZONEMASK as a separate operation, which is
> completely redundant because the constraint filter applied
> immediately after guarantees that GFP_ZONEMASK bits are cleared.
> 
> kmemleak gets this filtering right. It preserves the allocation
> constraints for deadlock prevention and clears all other context
> flags whilst also ensuring that the nested allocation will fail
> quickly, silently and without depleting emergency kernel reserves if
> there is no memory available.
> 
> This can be made much more robust, immune to whack-a-mole games and
> the code greatly simplified by lifting gfp_kmemleak_mask() to
> include/linux/gfp.h and using that everywhere. Also document it so
> that there is no excuse for not knowing about it when writing new
> debug code that nests allocations.
> 
> Tested with lockdep, KASAN + page_owner=on and kmemleak=on over
> multiple fstests runs with XFS.

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks.
Oscar Salvador May 1, 2024, 8:06 a.m. UTC | #3
On Tue, Apr 30, 2024 at 03:28:22PM +1000, Dave Chinner wrote:
> This patchset is the followup to the comment I made earlier today:
> 
> https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/
> 
> Tl;dr: Memory allocations that are done inside the public memory
> allocation API need to obey the reclaim recursion constraints placed
> on the allocation by the original caller, including the "don't track
> recursion for this allocation" case defined by __GFP_NOLOCKDEP.
> 
> These nested allocations are generally in debug code that is
> tracking something about the allocation (kmemleak, KASAN, etc) and
> so are allocating private kernel objects that only that debug system
> will use.
> 
> Neither the page-owner code nor the stack depot code get this right.
> They also also clear GFP_ZONEMASK as a separate operation, which is
> completely redundant because the constraint filter applied
> immediately after guarantees that GFP_ZONEMASK bits are cleared.
> 
> kmemleak gets this filtering right. It preserves the allocation
> constraints for deadlock prevention and clears all other context
> flags whilst also ensuring that the nested allocation will fail
> quickly, silently and without depleting emergency kernel reserves if
> there is no memory available.
> 
> This can be made much more robust, immune to whack-a-mole games and
> the code greatly simplified by lifting gfp_kmemleak_mask() to
> include/linux/gfp.h and using that everywhere. Also document it so
> that there is no excuse for not knowing about it when writing new
> debug code that nests allocations.
> 
> Tested with lockdep, KASAN + page_owner=on and kmemleak=on over
> multiple fstests runs with XFS.

Reviewed-by: Oscar Salvador <osalvador@suse.de>

Thanks!
Andrew Morton May 2, 2024, 5:05 p.m. UTC | #4
On Tue, 30 Apr 2024 15:28:22 +1000 Dave Chinner <david@fromorbit.com> wrote:

> This patchset is the followup to the comment I made earlier today:
> 
> https://lore.kernel.org/linux-xfs/ZjAyIWUzDipofHFJ@dread.disaster.area/
> 
> Tl;dr: Memory allocations that are done inside the public memory
> allocation API need to obey the reclaim recursion constraints placed
> on the allocation by the original caller, including the "don't track
> recursion for this allocation" case defined by __GFP_NOLOCKDEP.

I resolved a number of conflicts here, mainly due to the addition
of GFP_NOLOCKDEP in stackdepot and page-owner.

https://lore.kernel.org/all/20240418141133.22950-1-ryabinin.a.a@gmail.com/T/#u
https://lkml.kernel.org/r/20240429082828.1615986-1-hch@lst.de

Please check the resulting code?