Message ID | 20211013073005.11351-1-vbabka@suse.cz (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() | expand |
On 10/13/21 09:30, Vlastimil Babka wrote: > Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated > from memblock, even if stack depot ends up not actually used. The default size > of stack_table is 4MB on 32-bit, 8MB on 64-bit. > > This is fine for use-cases such as KASAN which is also a config option and > has overhead on its own. But it's an issue for functionality that has to be > actually enabled on boot (page_owner) or depends on hardware (GPU drivers) > and thus the memory might be wasted. This was raised as an issue [1] when > attempting to add stackdepot support for SLUB's debug object tracking > functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable > slub_debug on boot only when needed, or create only specific kmem caches with > debugging for testing purposes. > > It would thus be more efficient if stackdepot's table was allocated only when > actually going to be used. This patch thus makes the allocation (and whole > stack_depot_init() call) optional: > > - Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current > well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN > select this flag. > - Other users have to call stack_depot_init() as part of their own init when > it's determined that stack depot will actually be used. This may depend on > both config and runtime conditions. Convert current users which are > page_owner and several in the DRM subsystem. Same will be done for SLUB > later. > - Because the init might now be called after the boot-time memblock allocation > has given all memory to the buddy allocator, change stack_depot_init() to > allocate stack_table with kvmalloc() when memblock is no longer available. > Also handle allocation failure by disabling stackdepot (could have > theoretically happened even with memblock allocation previously), and don't > unnecessarily align the memblock allocation to its own size anymore. > > [1] https://lore.kernel.org/all/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/ ... > --- > Changes in v3: > - stack_depot_init_mutex made static and moved inside stack_depot_init() > Reported-by: kernel test robot <lkp@intel.com> > - use !stack_table condition instead of stack_table == NULL > reported by checkpatch on freedesktop.org patchwork The last change above was missing because I forgot git commit --amend before git format-patch. More importantly there was a bot report for FLATMEM. Please add this fixup. Thanks. ----8<---- From a971a1670491f8fbbaab579eef3c756a5263af95 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka <vbabka@suse.cz> Date: Thu, 7 Oct 2021 10:49:09 +0200 Subject: [PATCH] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup On FLATMEM, we call page_ext_init_flatmem_late() just before kmem_cache_init() which means stack_depot_init() (called by page owner init) will not recognize properly it should use kvmalloc() and not memblock_alloc(). memblock_alloc() will also not issue a warning and return a block memory that can be invalid and cause kernel page fault when saving stacks, as reported by the kernel test robot [1]. Fix this by moving page_ext_init_flatmem_late() below kmem_cache_init() so that slab_is_available() is true during stack_depot_init(). SPARSEMEM doesn't have this issue, as it doesn't do page_ext_init_flatmem_late(), but a different page_ext_init() even later in the boot process. Thanks to Mike Rapoport for pointing out the FLATMEM init ordering issue. While at it, also actually resolve a checkpatch warning in stack_depot_init() from DRM CI, which was supposed to be in the original patch already. [1] https://lore.kernel.org/all/20211014085450.GC18719@xsang-OptiPlex-9020/ Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: kernel test robot <oliver.sang@intel.com> --- init/main.c | 7 +++++-- lib/stackdepot.c | 2 +- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/init/main.c b/init/main.c index ca2765c8e45c..0ab632f681c5 100644 --- a/init/main.c +++ b/init/main.c @@ -845,9 +845,12 @@ static void __init mm_init(void) stack_depot_early_init(); mem_init(); mem_init_print_info(); - /* page_owner must be initialized after buddy is ready */ - page_ext_init_flatmem_late(); kmem_cache_init(); + /* + * page_owner must be initialized after buddy is ready, and also after + * slab is ready so that stack_depot_init() works properly + */ + page_ext_init_flatmem_late(); kmemleak_init(); pgtable_init(); debug_objects_mem_init(); diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 049d7d025d78..1f8ea6d0899b 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -172,7 +172,7 @@ __ref int stack_depot_init(void) static DEFINE_MUTEX(stack_depot_init_mutex); mutex_lock(&stack_depot_init_mutex); - if (!stack_depot_disable && stack_table == NULL) { + if (!stack_depot_disable && !stack_table) { size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); int i;
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c index 86d13d6bc463..b0ebdc843a00 100644 --- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -5493,6 +5493,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr, mutex_init(&mgr->probe_lock); #if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS) mutex_init(&mgr->topology_ref_history_lock); + stack_depot_init(); #endif INIT_LIST_HEAD(&mgr->tx_msg_downq); INIT_LIST_HEAD(&mgr->destroy_port_list); diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c index 93d48a6f04ab..5916228ea0c9 100644 --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -983,6 +983,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size) add_hole(&mm->head_node); mm->scan_active = 0; + +#ifdef CONFIG_DRM_DEBUG_MM + stack_depot_init(); +#endif } EXPORT_SYMBOL(drm_mm_init); diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c index eaf7688f517d..d083506986e1 100644 --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -78,6 +78,9 @@ static void __print_depot_stack(depot_stack_handle_t stack, static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm) { spin_lock_init(&rpm->debug.lock); + + if (rpm->available) + stack_depot_init(); } static noinline depot_stack_handle_t diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index 6bb4bc1a5f54..40fc5e92194f 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -13,6 +13,22 @@ typedef u32 depot_stack_handle_t; +/* + * Every user of stack depot has to call this during its own init when it's + * decided that it will be calling stack_depot_save() later. + * + * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot + * enabled as part of mm_init(), for subsystems where it's known at compile time + * that stack depot will be used. + */ +int stack_depot_init(void); + +#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT +static inline int stack_depot_early_init(void) { return stack_depot_init(); } +#else +static inline int stack_depot_early_init(void) { return 0; } +#endif + depot_stack_handle_t stack_depot_save(unsigned long *entries, unsigned int nr_entries, gfp_t gfp_flags); @@ -21,13 +37,4 @@ unsigned int stack_depot_fetch(depot_stack_handle_t handle, unsigned int filter_irq_stacks(unsigned long *entries, unsigned int nr_entries); -#ifdef CONFIG_STACKDEPOT -int stack_depot_init(void); -#else -static inline int stack_depot_init(void) -{ - return 0; -} -#endif /* CONFIG_STACKDEPOT */ - #endif diff --git a/init/main.c b/init/main.c index 81a79a77db46..ca2765c8e45c 100644 --- a/init/main.c +++ b/init/main.c @@ -842,7 +842,7 @@ static void __init mm_init(void) init_mem_debugging_and_hardening(); kfence_alloc_pool(); report_meminit(); - stack_depot_init(); + stack_depot_early_init(); mem_init(); mem_init_print_info(); /* page_owner must be initialized after buddy is ready */ diff --git a/lib/Kconfig b/lib/Kconfig index 5e7165e6a346..9d0569084152 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -671,6 +671,10 @@ config STACKDEPOT bool select STACKTRACE +config STACKDEPOT_ALWAYS_INIT + bool + select STACKDEPOT + config STACK_HASH_ORDER int "stack depot hash size (12 => 4KB, 20 => 1024KB)" range 12 20 diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan index cdc842d090db..879757b6dd14 100644 --- a/lib/Kconfig.kasan +++ b/lib/Kconfig.kasan @@ -38,7 +38,7 @@ menuconfig KASAN CC_HAS_WORKING_NOSANITIZE_ADDRESS) || \ HAVE_ARCH_KASAN_HW_TAGS depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB) - select STACKDEPOT + select STACKDEPOT_ALWAYS_INIT help Enables KASAN (KernelAddressSANitizer) - runtime memory debugger, designed to find out-of-bounds accesses and use-after-free bugs. diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 0a2e417f83cb..049d7d025d78 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -24,6 +24,7 @@ #include <linux/jhash.h> #include <linux/kernel.h> #include <linux/mm.h> +#include <linux/mutex.h> #include <linux/percpu.h> #include <linux/printk.h> #include <linux/slab.h> @@ -162,18 +163,40 @@ static int __init is_stack_depot_disabled(char *str) } early_param("stack_depot_disable", is_stack_depot_disabled); -int __init stack_depot_init(void) +/* + * __ref because of memblock_alloc(), which will not be actually called after + * the __init code is gone, because at that point slab_is_available() is true + */ +__ref int stack_depot_init(void) { - if (!stack_depot_disable) { + static DEFINE_MUTEX(stack_depot_init_mutex); + + mutex_lock(&stack_depot_init_mutex); + if (!stack_depot_disable && stack_table == NULL) { size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); int i; - stack_table = memblock_alloc(size, size); - for (i = 0; i < STACK_HASH_SIZE; i++) - stack_table[i] = NULL; + if (slab_is_available()) { + pr_info("Stack Depot allocating hash table with kvmalloc\n"); + stack_table = kvmalloc(size, GFP_KERNEL); + } else { + pr_info("Stack Depot allocating hash table with memblock_alloc\n"); + stack_table = memblock_alloc(size, SMP_CACHE_BYTES); + } + if (stack_table) { + for (i = 0; i < STACK_HASH_SIZE; i++) + stack_table[i] = NULL; + } else { + pr_err("Stack Depot failed hash table allocationg, disabling\n"); + stack_depot_disable = true; + mutex_unlock(&stack_depot_init_mutex); + return -ENOMEM; + } } + mutex_unlock(&stack_depot_init_mutex); return 0; } +EXPORT_SYMBOL_GPL(stack_depot_init); /* Calculate hash for a stack */ static inline u32 hash_stack(unsigned long *entries, unsigned int size) diff --git a/mm/page_owner.c b/mm/page_owner.c index 62402d22539b..16a0ef903384 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -80,6 +80,8 @@ static void init_page_owner(void) if (!page_owner_enabled) return; + stack_depot_init(); + register_dummy_stack(); register_failure_stack(); register_early_stack();