Message ID | 20230130130739.563628-1-arnd@kernel.org |
---|---|
State | New |
Series | mm: extend max struct page size for kmsan |
On Mon 30-01-23 14:07:26, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> After x86 has enabled support for KMSAN, it has become possible
> to have larger 'struct page' than was expected when commit
> 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> architectures") was merged:
>
> include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
>   switch (sizeof(struct page)) {
>
> Extend the maximum accordingly.
>
> Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Acked-by: Michal Hocko <mhocko@suse.com>

I haven't really followed KMSAN development but I would have expected
that it would, like other debugging tools, add its metadata to page_ext
rather than page directly.

> ---
> This seems to show up extremely rarely in randconfig builds, but
> enough to trigger my build machine.
>
> I saw a related discussion at [1] about raising MAX_STRUCT_PAGE_SIZE,
> but as I understand it, that needs to be addressed separately.
>
> [1] https://lore.kernel.org/lkml/20220701142310.2188015-11-glider@google.com/
> ---
>  include/linux/mm.h | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b73ba2e5cfd2..aa39d5ddace1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -137,7 +137,7 @@ extern int mmap_rnd_compat_bits __read_mostly;
>   * define their own version of this macro in <asm/pgtable.h>
>   */
>  #if BITS_PER_LONG == 64
> -/* This function must be updated when the size of struct page grows above 80
> +/* This function must be updated when the size of struct page grows above 96
>   * or reduces below 56. The idea that compiler optimizes out switch()
>   * statement, and only leaves move/store instructions. Also the compiler can
>   * combine write statements if they are both assignments and can be reordered,
> @@ -148,12 +148,18 @@ static inline void __mm_zero_struct_page(struct page *page)
>  {
>  	unsigned long *_pp = (void *)page;
>
> -	/* Check that struct page is either 56, 64, 72, or 80 bytes */
> +	/* Check that struct page is either 56, 64, 72, 80, 88 or 96 bytes */
>  	BUILD_BUG_ON(sizeof(struct page) & 7);
>  	BUILD_BUG_ON(sizeof(struct page) < 56);
> -	BUILD_BUG_ON(sizeof(struct page) > 80);
> +	BUILD_BUG_ON(sizeof(struct page) > 96);
>
>  	switch (sizeof(struct page)) {
> +	case 96:
> +		_pp[11] = 0;
> +		fallthrough;
> +	case 88:
> +		_pp[10] = 0;
> +		fallthrough;
>  	case 80:
>  		_pp[9] = 0;
>  		fallthrough;
> --
> 2.39.0
On Mon, Jan 30, 2023 at 5:07 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> After x86 has enabled support for KMSAN, it has become possible
> to have larger 'struct page' than was expected when commit
> 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> architectures") was merged:
>
> include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
>   switch (sizeof(struct page)) {
>
> Extend the maximum accordingly.
>
> Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")

Rather than saying this fixes the code that enables the config flags, I
might be more comfortable with listing the commit that added the two
pointers to the struct:

Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core")

It will make it easier to identify where the lines were added that
actually increased the size of the page struct.

> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> This seems to show up extremely rarely in randconfig builds, but
> enough to trigger my build machine.
>
> I saw a related discussion at [1] about raising MAX_STRUCT_PAGE_SIZE,
> but as I understand it, that needs to be addressed separately.
>
> [1] https://lore.kernel.org/lkml/20220701142310.2188015-11-glider@google.com/
> ---
>  include/linux/mm.h | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b73ba2e5cfd2..aa39d5ddace1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -137,7 +137,7 @@ extern int mmap_rnd_compat_bits __read_mostly;
>   * define their own version of this macro in <asm/pgtable.h>
>   */
>  #if BITS_PER_LONG == 64
> -/* This function must be updated when the size of struct page grows above 80
> +/* This function must be updated when the size of struct page grows above 96
>   * or reduces below 56. The idea that compiler optimizes out switch()
>   * statement, and only leaves move/store instructions. Also the compiler can
>   * combine write statements if they are both assignments and can be reordered,
> @@ -148,12 +148,18 @@ static inline void __mm_zero_struct_page(struct page *page)
>  {
>  	unsigned long *_pp = (void *)page;
>
> -	/* Check that struct page is either 56, 64, 72, or 80 bytes */
> +	/* Check that struct page is either 56, 64, 72, 80, 88 or 96 bytes */
>  	BUILD_BUG_ON(sizeof(struct page) & 7);
>  	BUILD_BUG_ON(sizeof(struct page) < 56);
> -	BUILD_BUG_ON(sizeof(struct page) > 80);
> +	BUILD_BUG_ON(sizeof(struct page) > 96);
>
>  	switch (sizeof(struct page)) {
> +	case 96:
> +		_pp[11] = 0;
> +		fallthrough;
> +	case 88:
> +		_pp[10] = 0;
> +		fallthrough;
>  	case 80:
>  		_pp[9] = 0;
>  		fallthrough;

Otherwise the code itself looks good to me.

Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
On Mon, Jan 30, 2023 at 02:38:22PM +0100, Michal Hocko wrote:
> On Mon 30-01-23 14:07:26, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > After x86 has enabled support for KMSAN, it has become possible
> > to have larger 'struct page' than was expected when commit
> > 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> > architectures") was merged:
> >
> > include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> >   switch (sizeof(struct page)) {
> >
> > Extend the maximum accordingly.
> >
> > Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> > Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> I haven't really followed KMSAN development but I would have expected
> that it would, like other debugging tools, add its metadata to page_ext
> rather than page directly.

Yes, that would have been preferable. Also, I don't understand why we
need an entire page to store whether each "bit" of a page is initialised.
There are no CPUs which have bit-granularity stores; either you initialise
an entire byte or not. So that metadata can shrink from 4096 bytes to 512.
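For reference, the arithmetic behind the 4096-versus-512 figure, spelled out as a minimal sketch (assumes 4 KiB pages; the constants are illustrative and not taken from the KMSAN sources):

  /* Shadow-size arithmetic sketch, not actual KMSAN code. */
  #define DATA_PAGE_SIZE	4096	/* bytes per data page */

  /* One shadow bit per data bit: the shadow is as large as the page. */
  #define SHADOW_BIT_PER_BIT	DATA_PAGE_SIZE		/* 4096 bytes */

  /* One shadow bit per data byte: the shadow shrinks by a factor of 8. */
  #define SHADOW_BIT_PER_BYTE	(DATA_PAGE_SIZE / 8)	/* 512 bytes */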
On Mon, Jan 30, 2023 at 8:07 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> After x86 has enabled support for KMSAN, it has become possible
> to have larger 'struct page' than was expected when commit
> 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> architectures") was merged:
>
> include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
>   switch (sizeof(struct page)) {
>
> Extend the maximum accordingly.
>
> Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")

No need to add the above "Fixes:". The above patch works as expected
everywhere the struct page is 80 bytes or smaller (as specified by the
comment).

I also agree with others that the KMSAN metadata should be part of
page_ext instead of increasing the complexity of "struct page".

Otherwise,
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Thanks,
Pasha
On Mon, Jan 30, 2023 at 2:38 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 30-01-23 14:07:26, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > After x86 has enabled support for KMSAN, it has become possible
> > to have larger 'struct page' than was expected when commit
> > 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> > architectures") was merged:
> >
> > include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> >   switch (sizeof(struct page)) {
> >
> > Extend the maximum accordingly.
> >
> > Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> > Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> I haven't really followed KMSAN development but I would have expected
> that it would, like other debugging tools, add its metadata to page_ext
> rather than page directly.

Thanks for the comment!
I was considering page_ext at some point, but managed to convince
myself it didn't suit the purpose well enough.

Right now KMSAN allocates its metadata at boot time, when tearing down
memblock. At that point only a handful of memory ranges exist, and it is
pretty easy to carve out some unused pages for the metadata for those
ranges, then divide the rest evenly and return 1/3 to the system,
spending 2/3 to keep the metadata for the returned pages.
I tried allocating the memory lazily (at page_alloc(), for example),
and it turned out to be very tricky because of fragmentation: for an
allocation of a given order, one needs shadow and origin allocations
of the same order [1], and alloc_pages() simply started with ripping
apart the biggest chunk of memory available.

IIRC if we choose to allocate metadata via page_ext, the memory will
be already too fragmented to easily handle it, because it will only
happen once alloc_pages() is available.
We also can't get rid of the shadow/origin pointers in struct page_ext
(storing two 4K-sized arrays in that struct would defeat all the
possible alignments), so we won't save any memory by switching to
page_ext.

[1] - I can go into more details, but the TLDR is that contiguous
pages within the same allocations better have contiguous shadow/origin
pages, otherwise unaligned accesses will corrupt other pages.
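To make the 1/3 vs. 2/3 split concrete: every data page handed back to the buddy allocator needs one shadow page and one origin page of its own, so a contiguous boot-time range can only return roughly a third of its pages. A minimal sketch of that arithmetic (the helper name is hypothetical; this is not KMSAN's boot code):

  /* Illustrative only: a made-up helper, not KMSAN's implementation. */
  static void kmsan_split_range_sketch(unsigned long nr_pages,
  				     unsigned long *nr_usable,
  				     unsigned long *nr_shadow,
  				     unsigned long *nr_origin)
  {
  	/* Each returned page is paired with one shadow and one origin page. */
  	*nr_usable = nr_pages / 3;
  	*nr_shadow = *nr_usable;
  	*nr_origin = *nr_usable;
  	/* Any remainder (at most two pages) simply stays reserved. */
  }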
> > I haven't really followed KMSAN development but I would have expected
> > that it would, like other debugging tools, add its metadata to page_ext
> > rather than page directly.
>
> Yes, that would have been preferable. Also, I don't understand why we
> need an entire page to store whether each "bit" of a page is initialised.
> There are no CPUs which have bit-granularity stores; either you initialise
> an entire byte or not. So that metadata can shrink from 4096 bytes to 512.

It's not about bit-granularity stores, it's about bits being
uninitialized or not. Consider the following struct:

  struct foo {
    char a:4;
    char b:4;
  } f;

If the user initializes f.a and then tries to use f.b, this is still
undefined behavior that KMSAN is able to catch thanks to bit-to-bit
shadow, but it would not have been able to detect it if we only stored
one bit per byte.
Another example is bit flags or bit masks, where you can set a single
bit in an int32, but that wouldn't necessarily mean the rest of that
variable is initialized.

It's worth mentioning that even if we choose to shrink the shadow from
4096 to 512 bytes, there'd still be four-byte origin IDs, which are
allocated for every four bytes of program memory. So a whole page of
origins will still be required in addition to those 512 bytes of shadow.
(Origins are handy when debugging KMSAN reports, because a single uninit
value can be copied or modified multiple times before it is used in a
branch or passed to the userspace. Shrinking origins further would render
them useless for e.g. 32-bit local variables, which is a quite common
use case).
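The bitfield scenario above can be reproduced in userspace with MemorySanitizer, the userspace counterpart of KMSAN, which also keeps bit-precise shadow. A small self-contained example (assumes Clang; the file name is only illustrative):

  /* Build and run with: clang -fsanitize=memory -g bitfield.c && ./a.out */
  #include <stdio.h>

  struct foo {
  	char a:4;
  	char b:4;
  };

  int main(void)
  {
  	struct foo f;

  	f.a = 1;	/* only the low nibble is initialized */
  	if (f.b)	/* MSan reports a use-of-uninitialized-value here */
  		printf("b is set\n");
  	return 0;
  }

With one shadow bit per byte, the partial write to f.a would mark the whole byte as initialized and the report above would be lost, which is exactly the point being made.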
On Mon 30-01-23 18:59:45, Alexander Potapenko wrote:
> On Mon, Jan 30, 2023 at 2:38 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 30-01-23 14:07:26, Arnd Bergmann wrote:
> > > From: Arnd Bergmann <arnd@arndb.de>
> > >
> > > After x86 has enabled support for KMSAN, it has become possible
> > > to have larger 'struct page' than was expected when commit
> > > 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> > > architectures") was merged:
> > >
> > > include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> > >   switch (sizeof(struct page)) {
> > >
> > > Extend the maximum accordingly.
> > >
> > > Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> > > Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
> > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> >
> > Acked-by: Michal Hocko <mhocko@suse.com>
> >
> > I haven't really followed KMSAN development but I would have expected
> > that it would, like other debugging tools, add its metadata to page_ext
> > rather than page directly.
>
> Thanks for the comment!
> I was considering page_ext at some point, but managed to convince
> myself it didn't suit the purpose well enough.
>
> Right now KMSAN allocates its metadata at boot time, when tearing down
> memblock. At that point only a handful of memory ranges exist, and it is
> pretty easy to carve out some unused pages for the metadata for those
> ranges, then divide the rest evenly and return 1/3 to the system,
> spending 2/3 to keep the metadata for the returned pages.
> I tried allocating the memory lazily (at page_alloc(), for example),
> and it turned out to be very tricky because of fragmentation: for an
> allocation of a given order, one needs shadow and origin allocations
> of the same order [1], and alloc_pages() simply started with ripping
> apart the biggest chunk of memory available.

page_ext allocation happens quite early as well. There shouldn't be any
real fragmentation that early during the boot.

> IIRC if we choose to allocate metadata via page_ext, the memory will
> be already too fragmented to easily handle it, because it will only
> happen once alloc_pages() is available.
> We also can't get rid of the shadow/origin pointers in struct page_ext
> (storing two 4K-sized arrays in that struct would defeat all the
> possible alignments), so we won't save any memory by switching to
> page_ext.

With page_ext you would be able to compile the feature in disabled by
default and enable it at boot time.

> [1] - I can go into more details, but the TLDR is that contiguous
> pages within the same allocations better have contiguous shadow/origin
> pages, otherwise unaligned accesses will corrupt other pages.
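For context on the suggestion: page_ext clients such as page_owner attach their per-page data by registering a struct page_ext_operations, which reserves extra space next to each page's page_ext entry and lets the feature opt out at boot via the .need() hook. A rough sketch of how KMSAN's per-page pointers might be attached that way (hypothetical code loosely modelled on page_owner, not an actual patch):

  #include <linux/page_ext.h>

  /* Hypothetical per-page blob; this is not an existing KMSAN structure. */
  struct kmsan_page_ext {
  	struct page *shadow;
  	struct page *origin;
  };

  static bool need_kmsan_ext(void)
  {
  	/* Could consult a boot parameter instead of being unconditional. */
  	return IS_ENABLED(CONFIG_KMSAN);
  }

  struct page_ext_operations kmsan_page_ext_ops = {
  	.size = sizeof(struct kmsan_page_ext),
  	.need = need_kmsan_ext,
  };

  /* page_ext core fills in .offset when it lays out the per-page data. */
  static struct kmsan_page_ext *kmsan_from_page_ext(struct page_ext *page_ext)
  {
  	return (void *)page_ext + kmsan_page_ext_ops.offset;
  }

Wiring this up would also require adding the ops to the page_ext core's client list, which is part of why it stays a sketch here.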
On Tue, Jan 31, 2023 at 4:14 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 30-01-23 18:59:45, Alexander Potapenko wrote:
> > On Mon, Jan 30, 2023 at 2:38 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 30-01-23 14:07:26, Arnd Bergmann wrote:
> > > > From: Arnd Bergmann <arnd@arndb.de>
> > > >
> > > > After x86 has enabled support for KMSAN, it has become possible
> > > > to have larger 'struct page' than was expected when commit
> > > > 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> > > > architectures") was merged:
> > > >
> > > > include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> > > >   switch (sizeof(struct page)) {
> > > >
> > > > Extend the maximum accordingly.
> > > >
> > > > Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> > > > Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
> > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > >
> > > Acked-by: Michal Hocko <mhocko@suse.com>
> > >
> > > I haven't really followed KMSAN development but I would have expected
> > > that it would, like other debugging tools, add its metadata to page_ext
> > > rather than page directly.
> >
> > Thanks for the comment!
> > I was considering page_ext at some point, but managed to convince
> > myself it didn't suit the purpose well enough.
> >
> > Right now KMSAN allocates its metadata at boot time, when tearing down
> > memblock. At that point only a handful of memory ranges exist, and it is
> > pretty easy to carve out some unused pages for the metadata for those
> > ranges, then divide the rest evenly and return 1/3 to the system,
> > spending 2/3 to keep the metadata for the returned pages.
> > I tried allocating the memory lazily (at page_alloc(), for example),
> > and it turned out to be very tricky because of fragmentation: for an
> > allocation of a given order, one needs shadow and origin allocations
> > of the same order [1], and alloc_pages() simply started with ripping
> > apart the biggest chunk of memory available.
>
> page_ext allocation happens quite early as well. There shouldn't be any
> real fragmentation that early during the boot.
>
> > IIRC if we choose to allocate metadata via page_ext, the memory will
> > be already too fragmented to easily handle it, because it will only
> > happen once alloc_pages() is available.
> > We also can't get rid of the shadow/origin pointers in struct page_ext
> > (storing two 4K-sized arrays in that struct would defeat all the
> > possible alignments), so we won't save any memory by switching to
> > page_ext.
>
> With page_ext you would be able to compile the feature in disabled by
> default and enable it at boot time.

This makes little sense to do, because KMSAN requires heavy compile-time
instrumentation to work. One cannot simply enable/disable it at boot
time anyway.
> > Right now KMSAN allocates its metadata at boot time, when tearing down
> > memblock. At that point only a handful of memory ranges exist, and it is
> > pretty easy to carve out some unused pages for the metadata for those
> > ranges, then divide the rest evenly and return 1/3 to the system,
> > spending 2/3 to keep the metadata for the returned pages.
> > I tried allocating the memory lazily (at page_alloc(), for example),
> > and it turned out to be very tricky because of fragmentation: for an
> > allocation of a given order, one needs shadow and origin allocations
> > of the same order [1], and alloc_pages() simply started with ripping
> > apart the biggest chunk of memory available.
>
> page_ext allocation happens quite early as well. There shouldn't be any
> real fragmentation that early during the boot.

Assuming we are talking about the early_page_ext_enabled() case, here
are the init functions that are executed between kmsan_init_shadow()
and page_ext_init():

  stack_depot_early_init();
  mem_init();
  mem_init_print_info();
  kmem_cache_init();
  /*
   * page_owner must be initialized after buddy is ready, and also after
   * slab is ready so that stack_depot_init() works properly
   */
  page_ext_init_flatmem_late();
  kmemleak_init();
  pgtable_init();
  debug_objects_mem_init();
  vmalloc_init();

There's yet another problem besides fragmentation: we need to allocate
shadow for every page that was allocated by these functions. Right now
this is done by kmsan_init_shadow, which walks all the existing memblock
ranges, plus the _data segment and the node data for each node, and
grabs memory from the buddy allocator.
If we delay the metadata allocation to the point where memory caches
exist, we'll have to somehow walk every allocated struct page and
allocate the metadata for each of those. Is there an easy way to do so?

I am unsure if vmalloc_init() creates any virtual mappings (probably
not?), but if it does, we'd also need to call
kmsan_vmap_pages_range_noflush() for them once we set up the metadata.
With the current metadata allocation scheme it's not needed, because the
buddy allocator is torn down before the virtual mappings are created.

In the ideal world, we'd better place KMSAN shadow/origin pages at fixed
addresses, like this is done for KASAN - that would not require storing
pointers in struct page. But reserving big chunks of the address space
is even harder than what's currently being done.
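The KASAN scheme referred to at the end avoids per-page pointers entirely: any kernel address is translated to its shadow byte with fixed arithmetic. A condensed version of that translation (generic KASAN, one shadow byte per eight bytes of memory; the offset constant is architecture- and config-specific and shown here only as a placeholder):

  /*
   * Condensed sketch of KASAN's address-to-shadow mapping; the offset value
   * below is a placeholder, the real one comes from the kernel config.
   */
  #define KASAN_SHADOW_SCALE_SHIFT	3
  #define KASAN_SHADOW_OFFSET		0xdffffc0000000000UL	/* placeholder */

  static inline void *kasan_mem_to_shadow_sketch(const void *addr)
  {
  	return (void *)(((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
  			+ KASAN_SHADOW_OFFSET);
  }

Doing the same for KMSAN would mean reserving two such regions (shadow and origin) covering all of kernel memory up front, which is the address-space cost alluded to above.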
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b73ba2e5cfd2..aa39d5ddace1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -137,7 +137,7 @@ extern int mmap_rnd_compat_bits __read_mostly;
  * define their own version of this macro in <asm/pgtable.h>
  */
 #if BITS_PER_LONG == 64
-/* This function must be updated when the size of struct page grows above 80
+/* This function must be updated when the size of struct page grows above 96
  * or reduces below 56. The idea that compiler optimizes out switch()
  * statement, and only leaves move/store instructions. Also the compiler can
  * combine write statements if they are both assignments and can be reordered,
@@ -148,12 +148,18 @@ static inline void __mm_zero_struct_page(struct page *page)
 {
 	unsigned long *_pp = (void *)page;
 
-	/* Check that struct page is either 56, 64, 72, or 80 bytes */
+	/* Check that struct page is either 56, 64, 72, 80, 88 or 96 bytes */
 	BUILD_BUG_ON(sizeof(struct page) & 7);
 	BUILD_BUG_ON(sizeof(struct page) < 56);
-	BUILD_BUG_ON(sizeof(struct page) > 80);
+	BUILD_BUG_ON(sizeof(struct page) > 96);
 
 	switch (sizeof(struct page)) {
+	case 96:
+		_pp[11] = 0;
+		fallthrough;
+	case 88:
+		_pp[10] = 0;
+		fallthrough;
 	case 80:
 		_pp[9] = 0;
 		fallthrough;
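For experimentation outside the kernel tree, here is a minimal userspace rendition of the same zeroing idiom, extended to 96 bytes as in the patch. The struct, names and test are invented for illustration; the point, as the in-kernel comment explains, is that the switch on a constant sizeof collapses at compile time into straight stores the compiler can combine.

  /* Illustrative userspace sketch; assumes 8-byte unsigned long (64-bit). */
  #include <assert.h>
  #include <string.h>

  struct fake_page { unsigned long w[12]; };	/* 96 bytes on 64-bit */

  static inline void zero_fake_page(struct fake_page *page)
  {
  	unsigned long *_pp = (void *)page;

  	/* The switch is resolved at compile time; only the stores remain. */
  	switch (sizeof(struct fake_page)) {
  	case 96: _pp[11] = 0;	/* fall through */
  	case 88: _pp[10] = 0;	/* fall through */
  	case 80: _pp[9] = 0;	/* fall through */
  	case 72: _pp[8] = 0;	/* fall through */
  	case 64: _pp[7] = 0;	/* fall through */
  	case 56:
  		_pp[6] = 0; _pp[5] = 0; _pp[4] = 0; _pp[3] = 0;
  		_pp[2] = 0; _pp[1] = 0; _pp[0] = 0;
  	}
  }

  int main(void)
  {
  	struct fake_page p;

  	memset(&p, 0xff, sizeof(p));
  	zero_fake_page(&p);
  	for (unsigned int i = 0; i < 12; i++)
  		assert(p.w[i] == 0);
  	return 0;
  }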