Message ID | 20200325161249.55095-4-glider@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Add KernelMemorySanitizer infrastructure | expand |
On Wed 25-03-20 17:12:14, glider@google.com wrote: > This flag is to be used by KMSAN runtime to mark that newly created > memory pages don't need KMSAN metadata backing them. I really dislike an idea of the gfp flag. If you need some form of exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore} History tells us that single usecase gfp flags are too tempting to abuse and using incorrectly. > Signed-off-by: Alexander Potapenko <glider@google.com> > To: Alexander Potapenko <glider@google.com> > Cc: Vegard Nossum <vegard.nossum@oracle.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Dmitry Vyukov <dvyukov@google.com> > Cc: Marco Elver <elver@google.com> > Cc: Andrey Konovalov <andreyknvl@google.com> > Cc: linux-mm@kvack.org > > --- > We can't decide what to do here: > - do we need to conditionally define ___GFP_NO_KMSAN_SHADOW depending on > CONFIG_KMSAN like LOCKDEP does? > - if KMSAN is defined, and LOCKDEP is not, do we want to "compactify" the GFP > bits? > > Change-Id: If5d0352fd5711ad103328e2c185eb885e826423a > --- > include/linux/gfp.h | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index be2754841369e..e1ab42b5e9ce2 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -44,6 +44,7 @@ struct vm_area_struct; > #else > #define ___GFP_NOLOCKDEP 0 > #endif > +#define ___GFP_NO_KMSAN_SHADOW 0x1000000u > /* If the above are modified, __GFP_BITS_SHIFT may need updating */ > > /* > @@ -212,12 +213,13 @@ struct vm_area_struct; > #define __GFP_NOWARN ((__force gfp_t)___GFP_NOWARN) > #define __GFP_COMP ((__force gfp_t)___GFP_COMP) > #define __GFP_ZERO ((__force gfp_t)___GFP_ZERO) > +#define __GFP_NO_KMSAN_SHADOW ((__force gfp_t)___GFP_NO_KMSAN_SHADOW) > > /* Disable lockdep for GFP context tracking */ > #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP) > > /* Room for N __GFP_FOO bits */ > -#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP)) > +#define __GFP_BITS_SHIFT (25) > #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) > > /** > -- > 2.25.1.696.g5e7596f4ac-goog >
On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Wed 25-03-20 17:12:14, glider@google.com wrote: > > This flag is to be used by KMSAN runtime to mark that newly created > > memory pages don't need KMSAN metadata backing them. > > I really dislike an idea of the gfp flag. If you need some form of > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore} > History tells us that single usecase gfp flags are too tempting to abuse > and using incorrectly. Great idea, will do! Guess PF_ flags isn't a scarce resource?
On Wed, Mar 25, 2020 at 6:26 PM Alexander Potapenko <glider@google.com> wrote: > > On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Wed 25-03-20 17:12:14, glider@google.com wrote: > > > This flag is to be used by KMSAN runtime to mark that newly created > > > memory pages don't need KMSAN metadata backing them. > > > > I really dislike an idea of the gfp flag. If you need some form of > > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore} > > History tells us that single usecase gfp flags are too tempting to abuse > > and using incorrectly. > > Great idea, will do! > Guess PF_ flags isn't a scarce resource? Actually, no, we are out of bits in current->flags already. I could introduce a separate flag into struct task, but that won't work in interrupt contexts - how do you solve that problem for FS/IO allocations?
On Wed 25-03-20 18:26:34, Alexander Potapenko wrote: > On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > On Wed 25-03-20 17:12:14, glider@google.com wrote: > > > This flag is to be used by KMSAN runtime to mark that newly created > > > memory pages don't need KMSAN metadata backing them. > > > > I really dislike an idea of the gfp flag. If you need some form of > > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore} > > History tells us that single usecase gfp flags are too tempting to abuse > > and using incorrectly. > > Great idea, will do! > Guess PF_ flags isn't a scarce resource? task_struct is a monster data structure and there are surely holes you can fit your flag in. All you need is to just store it somewhere and then check for it wherever you hook your infrastructure into the page or slab allocator. The primary thing is to avoid a gfp flag. Thanks!
On Wed, Mar 25, 2020 at 06:40:29PM +0100, Alexander Potapenko wrote: > On Wed, Mar 25, 2020 at 6:26 PM Alexander Potapenko <glider@google.com> wrote: > > > > On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > On Wed 25-03-20 17:12:14, glider@google.com wrote: > > > > This flag is to be used by KMSAN runtime to mark that newly created > > > > memory pages don't need KMSAN metadata backing them. > > > > > > I really dislike an idea of the gfp flag. If you need some form of > > > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore} > > > History tells us that single usecase gfp flags are too tempting to abuse > > > and using incorrectly. > > > > Great idea, will do! > > Guess PF_ flags isn't a scarce resource? > > Actually, no, we are out of bits in current->flags already. > I could introduce a separate flag into struct task, but that won't > work in interrupt contexts - how do you solve that problem for FS/IO > allocations? I would suggest using bits in the section labelled: /* Unserialized, strictly 'current' */ since this doesn't need to be accessed from any other task. Michal, can we move PF_MEMALLOC_NOIO and NOFS to that area too?
> I would suggest using bits in the section labelled: > > /* Unserialized, strictly 'current' */ The main problem is that |current| is unavailable in the interrupt context, so we'll also need to: - disable interrupts when preparing for a KMSAN internal memory allocation - sounds costly, huh? - store the context flag in a per-cpu variable in the case |current| is unavailable.
On Wed, Mar 25, 2020 at 07:03:32PM +0100, Alexander Potapenko wrote: > > I would suggest using bits in the section labelled: > > > > /* Unserialized, strictly 'current' */ > > The main problem is that |current| is unavailable in the interrupt > context, so we'll also need to: > - disable interrupts when preparing for a KMSAN internal memory > allocation - sounds costly, huh? > - store the context flag in a per-cpu variable in the case |current| > is unavailable. It's not /unavailable/ ... it's whatever task happens to be running at the time the interrupt is triggered. You can borrow its task_struct. You'll have to save off the current value of the flag before setting it, just like memalloc_nofs_save() does. But this does rather call into question whether Michal's advice to use task_struct is good advice to begin with. For memalloc_nofs/noio, it works well this way because allocations in interrupt context are inherently at a more restrictive context than task level. It's not clear to me what this kmsan GFP flag is being used for, and whether allocations that happen in interrupt context should inherit the kmsan setting. I will have to read these patches more carefully to determine that; I was really just responding to the "where can I find some free bits" part of the question.
On Wed, Mar 25, 2020 at 7:09 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Wed, Mar 25, 2020 at 07:03:32PM +0100, Alexander Potapenko wrote: > > > I would suggest using bits in the section labelled: > > > > > > /* Unserialized, strictly 'current' */ > > > > The main problem is that |current| is unavailable in the interrupt > > context, so we'll also need to: > > - disable interrupts when preparing for a KMSAN internal memory > > allocation - sounds costly, huh? > > - store the context flag in a per-cpu variable in the case |current| > > is unavailable. > > It's not /unavailable/ ... it's whatever task happens to be running at > the time the interrupt is triggered. You can borrow its task_struct. That didn't come to my mind, interesting! > You'll have to save off the current value of the flag before setting it, > just like memalloc_nofs_save() does. > But this does rather call into question whether Michal's advice to use > task_struct is good advice to begin with. For memalloc_nofs/noio, > it works well this way because allocations in interrupt context are > inherently at a more restrictive context than task level. It's not clear > to me what this kmsan GFP flag is being used for, and whether allocations > that happen in interrupt context should inherit the kmsan setting. The idea behind this flag is as follows. KMSAN must allocate metadata pages for every allocation performed by alloc_pages(). Metadata allocations are also done via alloc_pages(), and for those no further metadata needs to be allocated. Thus the GFP flag is used to prevent the recursion in alloc_pages(). It is theoretically possible that a less restrictive allocation (without __GFP_NO_KMSAN_SHADOW) happens in an interrupt on top of a task performing a more restrictive allocation (with __GFP_NO_KMSAN_SHADOW). > I will have to read these patches more carefully to determine that; > I was really just responding to the "where can I find some free bits" > part of the question. Thanks for clarification.
On Wed 25-03-20 18:40:29, Alexander Potapenko wrote: > On Wed, Mar 25, 2020 at 6:26 PM Alexander Potapenko <glider@google.com> wrote: > > > > On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > On Wed 25-03-20 17:12:14, glider@google.com wrote: > > > > This flag is to be used by KMSAN runtime to mark that newly created > > > > memory pages don't need KMSAN metadata backing them. > > > > > > I really dislike an idea of the gfp flag. If you need some form of > > > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore} > > > History tells us that single usecase gfp flags are too tempting to abuse > > > and using incorrectly. > > > > Great idea, will do! > > Guess PF_ flags isn't a scarce resource? > > Actually, no, we are out of bits in current->flags already. > I could introduce a separate flag into struct task, but that won't > work in interrupt contexts - how do you solve that problem for FS/IO > allocations? NOFS/NOIO is not a problem for IRQ context because we never do reclaim from that context. I was also not aware that there are users from the IRQ context. I thought this would be an internal KMSAN stuff. What would be the IRQ context you call this from? Anyway, if you cannot go with the task_struct then a per-cpu value should work, right?
On Wed 25-03-20 10:49:16, Matthew Wilcox wrote: > On Wed, Mar 25, 2020 at 06:40:29PM +0100, Alexander Potapenko wrote: > > On Wed, Mar 25, 2020 at 6:26 PM Alexander Potapenko <glider@google.com> wrote: > > > > > > On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > On Wed 25-03-20 17:12:14, glider@google.com wrote: > > > > > This flag is to be used by KMSAN runtime to mark that newly created > > > > > memory pages don't need KMSAN metadata backing them. > > > > > > > > I really dislike an idea of the gfp flag. If you need some form of > > > > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore} > > > > History tells us that single usecase gfp flags are too tempting to abuse > > > > and using incorrectly. > > > > > > Great idea, will do! > > > Guess PF_ flags isn't a scarce resource? > > > > Actually, no, we are out of bits in current->flags already. > > I could introduce a separate flag into struct task, but that won't > > work in interrupt contexts - how do you solve that problem for FS/IO > > allocations? > > I would suggest using bits in the section labelled: > > /* Unserialized, strictly 'current' */ > > since this doesn't need to be accessed from any other task. Michal, > can we move PF_MEMALLOC_NOIO and NOFS to that area too? I wouldn't object. The only reason for using PF_$FOO is historical. It was XFS which started to use pf flag for NOSF AFAIR. I haven't checked recently but xfs still used to use its own api in the past.
On Wed 25-03-20 19:30:09, Alexander Potapenko wrote: > On Wed, Mar 25, 2020 at 7:09 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Wed, Mar 25, 2020 at 07:03:32PM +0100, Alexander Potapenko wrote: > > > > I would suggest using bits in the section labelled: > > > > > > > > /* Unserialized, strictly 'current' */ > > > > > > The main problem is that |current| is unavailable in the interrupt > > > context, so we'll also need to: > > > - disable interrupts when preparing for a KMSAN internal memory > > > allocation - sounds costly, huh? > > > - store the context flag in a per-cpu variable in the case |current| > > > is unavailable. > > > > It's not /unavailable/ ... it's whatever task happens to be running at > > the time the interrupt is triggered. You can borrow its task_struct. > > That didn't come to my mind, interesting! > > > You'll have to save off the current value of the flag before setting it, > > just like memalloc_nofs_save() does. > > But this does rather call into question whether Michal's advice to use > > task_struct is good advice to begin with. For memalloc_nofs/noio, > > it works well this way because allocations in interrupt context are > > inherently at a more restrictive context than task level. It's not clear > > to me what this kmsan GFP flag is being used for, and whether allocations > > that happen in interrupt context should inherit the kmsan setting. > > The idea behind this flag is as follows. > KMSAN must allocate metadata pages for every allocation performed by > alloc_pages(). > Metadata allocations are also done via alloc_pages(), and for those no > further metadata needs to be allocated. > Thus the GFP flag is used to prevent the recursion in alloc_pages(). Kmemleak used the same approach IIRC. It turned out to be unusable in a presence of any higher GFP_NOWAIT memory pressure. Anyway talk to Catalin Marinas about problems he had to go through to address them.
On Wed, Mar 25, 2020 at 7:38 PM Michal Hocko <mhocko@kernel.org> wrote: > > On Wed 25-03-20 18:40:29, Alexander Potapenko wrote: > > On Wed, Mar 25, 2020 at 6:26 PM Alexander Potapenko <glider@google.com> wrote: > > > > > > On Wed, Mar 25, 2020 at 5:19 PM Michal Hocko <mhocko@kernel.org> wrote: > > > > > > > > On Wed 25-03-20 17:12:14, glider@google.com wrote: > > > > > This flag is to be used by KMSAN runtime to mark that newly created > > > > > memory pages don't need KMSAN metadata backing them. > > > > > > > > I really dislike an idea of the gfp flag. If you need some form of > > > > exclusion for kmsan allocations then follow the pattern of memalloc_no{fs,io}_{save,restore} > > > > History tells us that single usecase gfp flags are too tempting to abuse > > > > and using incorrectly. > > > > > > Great idea, will do! > > > Guess PF_ flags isn't a scarce resource? > > > > Actually, no, we are out of bits in current->flags already. > > I could introduce a separate flag into struct task, but that won't > > work in interrupt contexts - how do you solve that problem for FS/IO > > allocations? > > NOFS/NOIO is not a problem for IRQ context because we never do reclaim > from that context. > > I was also not aware that there are users from the IRQ context. I > thought this would be an internal KMSAN stuff. What would be the IRQ > context you call this from? KMSAN allocates its metadata lazily*, so if any code does alloc_pages() from IRQ context, we need to call alloc_pages two more times for shadow/origin pages. We also unwind the stack on every creation/copy of an uninitialized value. Those stacks are stored in the stack depot, which may also allocate new pages to store stacks. > Anyway, if you cannot go with the task_struct then a per-cpu value > should work, right? Yes, I will try that. * - as mentioned in the cover letter, a lot of problems could've been solved if we pre-allocate the metadata pages at boot time so that for every two contiguous physical pages their metadata pages are also contiguous. I haven't come up with a good idea about how to implement that. Suggestions are very welcome. -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Halimah DeLaine Prado Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg
> * - as mentioned in the cover letter, a lot of problems could've been > solved if we pre-allocate the metadata pages at boot time so that for > every two contiguous physical pages their metadata pages are also > contiguous. > I haven't come up with a good idea about how to implement that. > Suggestions are very welcome. I think I have a working solution now that doesn't require __GFP_NO_KMSAN_SHADOW. I'll drop this patch from the series.
diff --git a/include/linux/gfp.h b/include/linux/gfp.h index be2754841369e..e1ab42b5e9ce2 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -44,6 +44,7 @@ struct vm_area_struct; #else #define ___GFP_NOLOCKDEP 0 #endif +#define ___GFP_NO_KMSAN_SHADOW 0x1000000u /* If the above are modified, __GFP_BITS_SHIFT may need updating */ /* @@ -212,12 +213,13 @@ struct vm_area_struct; #define __GFP_NOWARN ((__force gfp_t)___GFP_NOWARN) #define __GFP_COMP ((__force gfp_t)___GFP_COMP) #define __GFP_ZERO ((__force gfp_t)___GFP_ZERO) +#define __GFP_NO_KMSAN_SHADOW ((__force gfp_t)___GFP_NO_KMSAN_SHADOW) /* Disable lockdep for GFP context tracking */ #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP) /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT (23 + IS_ENABLED(CONFIG_LOCKDEP)) +#define __GFP_BITS_SHIFT (25) #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) /**
This flag is to be used by KMSAN runtime to mark that newly created memory pages don't need KMSAN metadata backing them. Signed-off-by: Alexander Potapenko <glider@google.com> To: Alexander Potapenko <glider@google.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: linux-mm@kvack.org --- We can't decide what to do here: - do we need to conditionally define ___GFP_NO_KMSAN_SHADOW depending on CONFIG_KMSAN like LOCKDEP does? - if KMSAN is defined, and LOCKDEP is not, do we want to "compactify" the GFP bits? Change-Id: If5d0352fd5711ad103328e2c185eb885e826423a --- include/linux/gfp.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)