| Message ID | d94caa60d28349ca5a3c709fdb67545d9374e0dc.1694625260.git.andreyknvl@google.com (mailing list archive) |
|---|---|
| State | New |
| Series | stackdepot: allow evicting stack traces |
On 2023-09-13 19:14, andrey.konovalov@linux.dev wrote:
> From: Andrey Konovalov <andreyknvl@google.com>
>
> Switch stack_record to use list_head for links in the hash table
> and in the freelist.
>
> This will allow removing entries from the hash table buckets.
>
> This is a preparatory patch for implementing the eviction of stack records
> from the stack depot.
>
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>

Building an arm64 kernel from linux-next tag next-20230915 and booting it in QEMU, I see the following kernel panic.

```
[ 67.398850][ T1] Unable to handle kernel read from unreadable memory at virtual address 0000000000000010
[ 67.407996][ T1] Mem abort info:
[ 67.411023][ T1] ESR = 0x0000000096000004
[ 67.414757][ T1] EC = 0x25: DABT (current EL), IL = 32 bits
[ 67.419945][ T1] SET = 0, FnV = 0
[ 67.423172][ T1] EA = 0, S1PTW = 0
[ 67.426669][ T1] FSC = 0x04: level 0 translation fault
[ 67.431357][ T1] Data abort info:
[ 67.434593][ T1] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 67.439801][ T1] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 67.444948][ T1] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 67.449910][ T1] [0000000000000010] user address but active_mm is swapper
[ 67.456236][ T1] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 67.462181][ T1] Modules linked in:
[ 67.465435][ T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G T 6.6.0-rc1-next-20230915 #2 e95cf19845fbc1e6a5f0694214d59e527e463469
[ 67.477126][ T1] Hardware name: linux,dummy-virt (DT)
[ 67.481994][ T1] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 67.488454][ T1] pc : stack_depot_save_flags+0x2a8/0x780
[ 67.493348][ T1] lr : stack_depot_save_flags+0x2a8/0x780
[ 67.498339][ T1] sp : ffff80008000b870
[ 67.501670][ T1] x29: ffff80008000b870 x28: 00000000650dddc5 x27: 0000000000000000
[ 67.508658][ T1] x26: ffff80008470a000 x25: ffff80008000b9e8 x24: 0000000000000001
[ 67.515564][ T1] x23: 000000000000000e x22: ffff80008000b988 x21: 0000000000000001
[ 67.522430][ T1] x20: ffff00007b40d070 x19: 000000006ee80007 x18: ffff80008000d080
[ 67.529365][ T1] x17: 0000000000000000 x16: 0000000000000000 x15: 2030303178302f30
[ 67.536101][ T1] x14: 0000000000000000 x13: 205d315420202020 x12: 0000000000000000
[ 67.542985][ T1] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 67.549863][ T1] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 67.556764][ T1] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 67.563687][ T1] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 67.570500][ T1] Call trace:
[ 67.573275][ T1] stack_depot_save_flags+0x2a8/0x780
[ 67.577794][ T1] stack_depot_save+0x4c/0xc0
[ 67.582062][ T1] ref_tracker_alloc+0x354/0x480
[ 67.586273][ T1] sk_alloc+0x280/0x5f8
[ 67.590064][ T1] __netlink_create+0x84/0x200
[ 67.594009][ T1] __netlink_kernel_create+0x11c/0x500
[ 67.598816][ T1] rtnetlink_net_init+0xc4/0x180
[ 67.603052][ T1] ops_init+0x100/0x2c0
[ 67.606827][ T1] register_pernet_operations+0x228/0x480
[ 67.611568][ T1] register_pernet_subsys+0x5c/0xc0
[ 67.616282][ T1] rtnetlink_init+0x60/0xb00
[ 67.620086][ T1] netlink_proto_init+0x374/0x400
[ 67.624465][ T1] do_one_initcall+0x2c8/0x840
[ 67.628518][ T1] do_initcalls+0x21c/0x340
[ 67.632527][ T1] kernel_init_freeable+0x3b0/0x480
[ 67.636905][ T1] kernel_init+0x58/0x380
[ 67.640768][ T1] ret_from_fork+0x10/0x40
[ 67.644606][ T1] Code: eb1b029f 540008c0 91004360 97caa437 (b9401360)
[ 67.650293][ T1] ---[ end trace 0000000000000000 ]---
[ 67.654948][ T1] Kernel panic - not syncing: Oops: Fatal exception
[ 67.660229][ T1] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
```

The full log can be found at [1] and the .config file at [2]. I bisected down to this commit; see the bisect log [3].

After reverting these two commits, the kernel built and booted:

47590ecf1166 ("lib/stackdepot: use list_head for stack record links")
8729f3c26fc2 ("lib/stackdepot: allow users to evict stack traces")

Cheers,
Anders

[1] http://ix.io/4GyE
[2] https://people.linaro.org/~anders.roxell/next-20230915.config
[3] http://ix.io/4GyG
On Sat, 16 Sep 2023 19:43:35 +0200 Anders Roxell <anders.roxell@linaro.org> wrote:

> On 2023-09-13 19:14, andrey.konovalov@linux.dev wrote:
> > From: Andrey Konovalov <andreyknvl@google.com>
> >
> > Switch stack_record to use list_head for links in the hash table
> > and in the freelist.
> >
> > This will allow removing entries from the hash table buckets.
> >
> > This is a preparatory patch for implementing the eviction of stack records
> > from the stack depot.
> >
> > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>
> Building an arm64 kernel from linux-next tag next-20230915 and booting it
> in QEMU, I see the following kernel panic.
>
> ...
>
> The full log can be found at [1] and the .config file at [2]. I bisected
> down to this commit; see the bisect log [3].
>
> After reverting these two commits, the kernel built and booted:
>
> 47590ecf1166 ("lib/stackdepot: use list_head for stack record links")
> 8729f3c26fc2 ("lib/stackdepot: allow users to evict stack traces")

Thanks, I have dropped this v2 series.
On Sat, Sep 16, 2023 at 10:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Sat, 16 Sep 2023 19:43:35 +0200 Anders Roxell <anders.roxell@linaro.org> wrote:
>
> > On 2023-09-13 19:14, andrey.konovalov@linux.dev wrote:
> > > From: Andrey Konovalov <andreyknvl@google.com>
> > >
> > > Switch stack_record to use list_head for links in the hash table
> > > and in the freelist.
> > >
> > > This will allow removing entries from the hash table buckets.
> > >
> > > This is a preparatory patch for implementing the eviction of stack records
> > > from the stack depot.
> > >
> > > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> >
> > Building an arm64 kernel from linux-next tag next-20230915 and booting it
> > in QEMU, I see the following kernel panic.
> >
> > ...
> >
> > The full log can be found at [1] and the .config file at [2]. I bisected
> > down to this commit; see the bisect log [3].

I am also seeing similar crashes on an x86 KMSAN build.

They are happening in the following code:

```c
	list_for_each(pos, bucket) {
		found = list_entry(pos, struct stack_record, list);
		if (found->hash == hash &&
		    found->size == size &&
		    !stackdepot_memcmp(entries, found->entries, size))
			return found;
	}
```

`found` is NULL.

@Andrey, could you please take a look?
On Mon, Oct 9, 2023 at 2:16 PM Alexander Potapenko <glider@google.com> wrote:
>
> On Sat, Sep 16, 2023 at 10:04 PM Andrew Morton
> <akpm@linux-foundation.org> wrote:
> >
> > On Sat, 16 Sep 2023 19:43:35 +0200 Anders Roxell <anders.roxell@linaro.org> wrote:
> >
> > > On 2023-09-13 19:14, andrey.konovalov@linux.dev wrote:
> > > > From: Andrey Konovalov <andreyknvl@google.com>
> > > >
> > > > Switch stack_record to use list_head for links in the hash table
> > > > and in the freelist.
> > > >
> > > > This will allow removing entries from the hash table buckets.
> > > >
> > > > This is a preparatory patch for implementing the eviction of stack records
> > > > from the stack depot.
> > > >
> > > > Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
> > >
> > > Building an arm64 kernel from linux-next tag next-20230915 and booting it
> > > in QEMU, I see the following kernel panic.
> > >
> > > ...
> > >
> > > The full log can be found at [1] and the .config file at [2]. I bisected
> > > down to this commit; see the bisect log [3].
>
> I am also seeing similar crashes on an x86 KMSAN build.
>
> They are happening in the following code:
>
>     list_for_each(pos, bucket) {
>         found = list_entry(pos, struct stack_record, list);
>         if (found->hash == hash &&
>             found->size == size &&
>             !stackdepot_memcmp(entries, found->entries, size))
>             return found;
>     }
>
> `found` is NULL.
>
> @Andrey, could you please take a look?

Found a bug, will fix in v3. Thank you!
```diff
diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 0b4591475d4f..1b08897ebd2b 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -18,6 +18,7 @@
 #include <linux/jhash.h>
 #include <linux/kernel.h>
 #include <linux/kmsan.h>
+#include <linux/list.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
 #include <linux/percpu.h>
@@ -55,7 +56,7 @@ union handle_parts {
 };
 
 struct stack_record {
-	struct stack_record *next;	/* Link in hash table or freelist */
+	struct list_head list;		/* Links in hash table or freelist */
 	u32 hash;			/* Hash in hash table */
 	u32 size;			/* Number of stored frames */
 	union handle_parts handle;
@@ -77,21 +78,21 @@ static bool __stack_depot_early_init_passed __initdata;
 /* Initial seed for jhash2. */
 #define STACK_HASH_SEED 0x9747b28c
 
-/* Hash table of pointers to stored stack traces. */
-static struct stack_record **stack_table;
+/* Hash table of stored stack records. */
+static struct list_head *stack_table;
 /* Fixed order of the number of table buckets. Used when KASAN is enabled. */
 static unsigned int stack_bucket_number_order;
 /* Hash mask for indexing the table. */
 static unsigned int stack_hash_mask;
 
-/* Array of memory regions that store stack traces. */
+/* Array of memory regions that store stack records. */
 static void *stack_pools[DEPOT_MAX_POOLS];
 /* Newly allocated pool that is not yet added to stack_pools. */
 static void *new_pool;
 /* Number of pools in stack_pools. */
 static int pools_num;
-/* Next stack in the freelist of stack records within stack_pools. */
-static struct stack_record *next_stack;
+/* Freelist of stack records within stack_pools. */
+static LIST_HEAD(free_stacks);
 /*
  * Stack depot tries to keep an extra pool allocated even before it runs out
  * of space in the currently used pool. This flag marks whether this extra pool
@@ -123,6 +124,15 @@ void __init stack_depot_request_early_init(void)
 	__stack_depot_early_init_requested = true;
 }
 
+/* Initialize list_head's within the hash table. */
+static void init_stack_table(unsigned long entries)
+{
+	unsigned long i;
+
+	for (i = 0; i < entries; i++)
+		INIT_LIST_HEAD(&stack_table[i]);
+}
+
 /* Allocates a hash table via memblock. Can only be used during early boot. */
 int __init stack_depot_early_init(void)
 {
@@ -152,10 +162,10 @@ int __init stack_depot_early_init(void)
 		entries = 1UL << stack_bucket_number_order;
 	pr_info("allocating hash table via alloc_large_system_hash\n");
 	stack_table = alloc_large_system_hash("stackdepot",
-						sizeof(struct stack_record *),
+						sizeof(struct list_head),
 						entries,
 						STACK_HASH_TABLE_SCALE,
-						HASH_EARLY | HASH_ZERO,
+						HASH_EARLY,
 						NULL,
 						&stack_hash_mask,
 						1UL << STACK_BUCKET_NUMBER_ORDER_MIN,
@@ -165,6 +175,7 @@ int __init stack_depot_early_init(void)
 		stack_depot_disabled = true;
 		return -ENOMEM;
 	}
+	init_stack_table(entries);
 
 	return 0;
 }
@@ -205,7 +216,7 @@ int stack_depot_init(void)
 		entries = 1UL << STACK_BUCKET_NUMBER_ORDER_MAX;
 
 	pr_info("allocating hash table of %lu entries via kvcalloc\n", entries);
-	stack_table = kvcalloc(entries, sizeof(struct stack_record *), GFP_KERNEL);
+	stack_table = kvcalloc(entries, sizeof(struct list_head), GFP_KERNEL);
 	if (!stack_table) {
 		pr_err("hash table allocation failed, disabling\n");
 		stack_depot_disabled = true;
@@ -213,6 +224,7 @@ int stack_depot_init(void)
 		goto out_unlock;
 	}
 	stack_hash_mask = entries - 1;
+	init_stack_table(entries);
 
 out_unlock:
 	mutex_unlock(&stack_depot_init_mutex);
@@ -224,30 +236,24 @@ EXPORT_SYMBOL_GPL(stack_depot_init);
 /* Initializes a stack depot pool. */
 static void depot_init_pool(void *pool)
 {
-	const int records_in_pool = DEPOT_POOL_SIZE / DEPOT_STACK_RECORD_SIZE;
-	int i, offset;
+	int offset;
 
 	lockdep_assert_held_write(&pool_rwlock);
 
-	/* Initialize handles and link stack records to each other. */
-	for (i = 0, offset = 0;
-	     offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
-	     i++, offset += DEPOT_STACK_RECORD_SIZE) {
+	WARN_ON(!list_empty(&free_stacks));
+
+	/* Initialize handles and link stack records into the freelist. */
+	for (offset = 0; offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
+	     offset += DEPOT_STACK_RECORD_SIZE) {
 		struct stack_record *stack = pool + offset;
 
 		stack->handle.pool_index = pools_num;
 		stack->handle.offset = offset >> DEPOT_STACK_ALIGN;
 		stack->handle.extra = 0;
 
-		if (i < records_in_pool - 1)
-			stack->next = (void *)stack + DEPOT_STACK_RECORD_SIZE;
-		else
-			stack->next = NULL;
+		list_add(&stack->list, &free_stacks);
 	}
 
-	/* Link stack records into the freelist. */
-	WARN_ON(next_stack);
-	next_stack = pool;
-
 	/* Save reference to the pool to be used by depot_fetch_stack. */
 	stack_pools[pools_num] = pool;
@@ -289,7 +295,7 @@ static bool depot_update_pools(void **prealloc)
 	lockdep_assert_held_write(&pool_rwlock);
 
 	/* Check if we still have objects in the freelist. */
-	if (next_stack)
+	if (!list_empty(&free_stacks))
 		goto out_keep_prealloc;
 
 	/* Check if we have a new pool saved and use it. */
@@ -340,19 +346,18 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
 		return NULL;
 
 	/* Check if we have a stack record to save the stack trace. */
-	stack = next_stack;
-	if (!stack)
+	if (list_empty(&free_stacks))
 		return NULL;
 
-	/* Advance the freelist. */
-	next_stack = stack->next;
+	/* Get and unlink the first entry from the freelist. */
+	stack = list_first_entry(&free_stacks, struct stack_record, list);
+	list_del(&stack->list);
 
 	/* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
 	if (size > CONFIG_STACKDEPOT_MAX_FRAMES)
 		size = CONFIG_STACKDEPOT_MAX_FRAMES;
 
 	/* Save the stack trace. */
-	stack->next = NULL;
 	stack->hash = hash;
 	stack->size = size;
 	/* stack->handle is already filled in by depot_init_pool. */
@@ -414,15 +419,17 @@ int stackdepot_memcmp(const unsigned long *u1, const unsigned long *u2,
 }
 
 /* Finds a stack in a bucket of the hash table. */
-static inline struct stack_record *find_stack(struct stack_record *bucket,
+static inline struct stack_record *find_stack(struct list_head *bucket,
					      unsigned long *entries, int size,
					      u32 hash)
 {
+	struct list_head *pos;
 	struct stack_record *found;
 
 	lockdep_assert_held(&pool_rwlock);
 
-	for (found = bucket; found; found = found->next) {
+	list_for_each(pos, bucket) {
+		found = list_entry(pos, struct stack_record, list);
 		if (found->hash == hash &&
 		    found->size == size &&
 		    !stackdepot_memcmp(entries, found->entries, size))
@@ -435,7 +442,8 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
					unsigned int nr_entries,
					gfp_t alloc_flags, bool can_alloc)
 {
-	struct stack_record *found = NULL, **bucket;
+	struct list_head *bucket;
+	struct stack_record *found = NULL;
 	depot_stack_handle_t handle = 0;
 	struct page *page = NULL;
 	void *prealloc = NULL;
@@ -462,7 +470,7 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 	read_lock_irqsave(&pool_rwlock, flags);
 
 	/* Fast path: look the stack trace up without full locking. */
-	found = find_stack(*bucket, entries, nr_entries, hash);
+	found = find_stack(bucket, entries, nr_entries, hash);
 	if (found) {
 		read_unlock_irqrestore(&pool_rwlock, flags);
 		goto exit;
@@ -494,14 +502,13 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
 
 	write_lock_irqsave(&pool_rwlock, flags);
 
-	found = find_stack(*bucket, entries, nr_entries, hash);
+	found = find_stack(bucket, entries, nr_entries, hash);
 	if (!found) {
 		struct stack_record *new =
			depot_alloc_stack(entries, nr_entries, hash, &prealloc);
 
 		if (new) {
-			new->next = *bucket;
-			*bucket = new;
+			list_add(&new->list, bucket);
 			found = new;
 		}
 	} else if (prealloc) {
```