From patchwork Mon Jan 22 17:11:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marco Elver
X-Patchwork-Id: 13525840
Date: Mon, 22 Jan 2024 18:11:30 +0100
Mime-Version: 1.0
X-Mailer: git-send-email 2.43.0.429.g432eaa2c6b-goog
Message-ID: <20240122171215.319440-2-elver@google.com>
Subject: [RFC PATCH] stackdepot: use variable size records for non-evictable entries
From: Marco Elver
To: elver@google.com, Andrew Morton
Cc: Andrey Konovalov, Alexander Potapenko, Dmitry Vyukov, Vlastimil Babka,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, kasan-dev@googlegroups.com
With the introduction of stack depot evictions, each stack record is
now fixed size, so that future reuse after an eviction can safely store
differently sized stack traces. In all cases that do not make use of
evictions, this wastes lots of space.

Fix it by re-introducing variable size stack records (up to the max
allowed size) for entries that will never be evicted. We know that an
entry will never be evicted if the flag STACK_DEPOT_FLAG_GET is not
provided, since a later stack_depot_put() attempt is undefined behavior.

With my current kernel config that enables KASAN and also SLUB owner
tracking, I observe (after a kernel boot) a whopping reduction of 296
stack depot pools, which translates into 4736 KiB saved. The savings
here are from SLUB owner tracking only, because KASAN generic mode
still uses refcounting.

Before:

 pools: 893
 allocations: 29841
 frees: 6524
 in_use: 23317
 freelist_size: 3454

After:

 pools: 597
 allocations: 29657
 frees: 6425
 in_use: 23232
 freelist_size: 3493

Fixes: 108be8def46e ("lib/stackdepot: allow users to evict stack traces")
Signed-off-by: Marco Elver
Cc: Alexander Potapenko
Cc: Andrey Konovalov
Cc: Dmitry Vyukov
---
Sending this out as an early RFC. We're still mulling over what to do
with generic KASAN, because stack depot eviction support was only added
due to concern of too much memory usage. If this general approach makes
sense, then I'd be in favour of just reverting all the KASAN-generic
eviction patches and leaving KASAN-tag as the only user of evictions.

Thoughts?
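
For readers less familiar with the depot API, below is a minimal usage
sketch of the distinction this patch relies on (illustrative only: the
save_trace() helper and the 16-entry buffer are made up; the flags and
functions are the existing <linux/stackdepot.h> interface):

  #include <linux/stackdepot.h>
  #include <linux/stacktrace.h>

  static depot_stack_handle_t save_trace(gfp_t gfp, bool evictable)
  {
  	unsigned long entries[16];	/* illustrative depth only */
  	depot_flags_t depot_flags = STACK_DEPOT_FLAG_CAN_ALLOC;
  	unsigned int nr;

  	/* Capture the current stack, skipping this helper's frame. */
  	nr = stack_trace_save(entries, ARRAY_SIZE(entries), 1);

  	/*
  	 * STACK_DEPOT_FLAG_GET makes the record refcounted, so it may
  	 * later be released with stack_depot_put(); such records must
  	 * still be allocated at the maximum size so they can be reused
  	 * for differently sized traces. Without the flag, the record
  	 * may never be put, and can be stored at its exact size.
  	 */
  	if (evictable)
  		depot_flags |= STACK_DEPOT_FLAG_GET;

  	return stack_depot_save_flags(entries, nr, gfp, depot_flags);
  }

Only handles saved with STACK_DEPOT_FLAG_GET may later be passed to
stack_depot_put(); all other records are permanent, which is what lets
the depot hand out variable size records for them.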
---
 lib/stackdepot.c | 163 +++++++++++++++++++++++++----------------------
 1 file changed, 88 insertions(+), 75 deletions(-)

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index 5caa1f566553..726002d2ac09 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -93,9 +93,6 @@ struct stack_record {
 	};
 };
 
-#define DEPOT_STACK_RECORD_SIZE \
-	ALIGN(sizeof(struct stack_record), 1 << DEPOT_STACK_ALIGN)
-
 static bool stack_depot_disabled;
 static bool __stack_depot_early_init_requested __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
 static bool __stack_depot_early_init_passed __initdata;
@@ -121,6 +118,8 @@ static void *stack_pools[DEPOT_MAX_POOLS];
 static void *new_pool;
 /* Number of pools in stack_pools. */
 static int pools_num;
+/* Offset to the unused space in the currently used pool. */
+static size_t pool_offset = DEPOT_POOL_SIZE;
 /* Freelist of stack records within stack_pools. */
 static LIST_HEAD(free_stacks);
 /*
@@ -294,48 +293,44 @@ int stack_depot_init(void)
 EXPORT_SYMBOL_GPL(stack_depot_init);
 
 /*
- * Initializes new stack depot @pool, release all its entries to the freelist,
- * and update the list of pools.
+ * Initializes new stack pool, and update the list of pools.
  */
-static void depot_init_pool(void *pool)
+static bool depot_init_pool(void **prealloc)
 {
-	int offset;
+	void *pool = NULL;
 
 	lockdep_assert_held(&pool_lock);
 
-	/* Initialize handles and link stack records into the freelist. */
-	for (offset = 0; offset <= DEPOT_POOL_SIZE - DEPOT_STACK_RECORD_SIZE;
-	     offset += DEPOT_STACK_RECORD_SIZE) {
-		struct stack_record *stack = pool + offset;
-
-		stack->handle.pool_index = pools_num;
-		stack->handle.offset = offset >> DEPOT_STACK_ALIGN;
-		stack->handle.extra = 0;
-
-		/*
-		 * Stack traces of size 0 are never saved, and we can simply use
-		 * the size field as an indicator if this is a new unused stack
-		 * record in the freelist.
-		 */
-		stack->size = 0;
+	if (new_pool) {
+		/* We have a new pool saved, use it. */
+		pool = new_pool;
+		new_pool = NULL;
 
-		INIT_LIST_HEAD(&stack->hash_list);
-		/*
-		 * Add to the freelist front to prioritize never-used entries:
-		 * required in case there are entries in the freelist, but their
-		 * RCU cookie still belongs to the current RCU grace period
-		 * (there can still be concurrent readers).
-		 */
-		list_add(&stack->free_list, &free_stacks);
-		counters[DEPOT_COUNTER_FREELIST_SIZE]++;
+		/* Take note that we might need a new new_pool. */
+		if (pools_num < DEPOT_MAX_POOLS)
+			WRITE_ONCE(new_pool_required, true);
+	} else if (unlikely(pools_num >= DEPOT_MAX_POOLS)) {
+		/* Bail out if we reached the pool limit. */
+		WARN_ONCE(1, "Stack depot reached limit capacity");
+	} else if (*prealloc) {
+		/* We have preallocated memory, use it. */
+		pool = *prealloc;
+		*prealloc = NULL;
 	}
 
+	if (!pool)
+		return false;
+
 	/* Save reference to the pool to be used by depot_fetch_stack(). */
 	stack_pools[pools_num] = pool;
 
 	/* Pairs with concurrent READ_ONCE() in depot_fetch_stack(). */
 	WRITE_ONCE(pools_num, pools_num + 1);
 	ASSERT_EXCLUSIVE_WRITER(pools_num);
+
+	pool_offset = 0;
+
+	return true;
 }
 
 /* Keeps the preallocated memory to be used for a new stack depot pool. */
@@ -368,39 +363,40 @@ static void depot_keep_new_pool(void **prealloc)
 }
 
 /*
- * Try to initialize a new stack depot pool from either a previous or the
- * current pre-allocation, and release all its entries to the freelist.
+ * Try to initialize a new stack record from the current pool, a cached pool, or
+ * the current pre-allocation.
  */
-static bool depot_try_init_pool(void **prealloc)
+static struct stack_record *depot_pop_free_pool(void **prealloc, size_t size)
 {
+	struct stack_record *stack;
+	void *current_pool;
+	u32 pool_index;
+
 	lockdep_assert_held(&pool_lock);
 
-	/* Check if we have a new pool saved and use it. */
-	if (new_pool) {
-		depot_init_pool(new_pool);
-		new_pool = NULL;
+	if (pool_offset + size > DEPOT_POOL_SIZE) {
+		if (!depot_init_pool(prealloc))
+			return NULL;
+	}
 
-		/* Take note that we might need a new new_pool. */
-		if (pools_num < DEPOT_MAX_POOLS)
-			WRITE_ONCE(new_pool_required, true);
+	if (WARN_ON_ONCE(pools_num < 1))
+		return NULL;
+	pool_index = pools_num - 1;
+	current_pool = stack_pools[pool_index];
+	if (WARN_ON_ONCE(!current_pool))
+		return NULL;
 
-		return true;
-	}
+	stack = current_pool + pool_offset;
 
-	/* Bail out if we reached the pool limit. */
-	if (unlikely(pools_num >= DEPOT_MAX_POOLS)) {
-		WARN_ONCE(1, "Stack depot reached limit capacity");
-		return false;
-	}
+	/* Pre-initialize handle once. */
+	stack->handle.pool_index = pool_index;
+	stack->handle.offset = pool_offset >> DEPOT_STACK_ALIGN;
+	stack->handle.extra = 0;
+	INIT_LIST_HEAD(&stack->hash_list);
 
-	/* Check if we have preallocated memory and use it. */
-	if (*prealloc) {
-		depot_init_pool(*prealloc);
-		*prealloc = NULL;
-		return true;
-	}
+	pool_offset += size;
 
-	return false;
+	return stack;
 }
 
 /* Try to find next free usable entry. */
@@ -420,7 +416,7 @@ static struct stack_record *depot_pop_free(void)
 	 * check the first entry.
 	 */
 	stack = list_first_entry(&free_stacks, struct stack_record, free_list);
-	if (stack->size && !poll_state_synchronize_rcu(stack->rcu_state))
+	if (!poll_state_synchronize_rcu(stack->rcu_state))
 		return NULL;
 
 	list_del(&stack->free_list);
@@ -429,45 +425,62 @@ static struct stack_record *depot_pop_free(void)
 	return stack;
 }
 
+static inline size_t depot_stack_record_size(struct stack_record *s, size_t nr_entries)
+{
+	const size_t used = flex_array_size(s, entries, nr_entries);
+	const size_t unused = sizeof(s->entries) - used;
+
+	WARN_ON_ONCE(sizeof(s->entries) < used);
+
+	return ALIGN(sizeof(struct stack_record) - unused, 1 << DEPOT_STACK_ALIGN);
+}
+
 /* Allocates a new stack in a stack depot pool. */
 static struct stack_record *
-depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
+depot_alloc_stack(unsigned long *entries, int nr_entries, u32 hash, depot_flags_t flags, void **prealloc)
 {
-	struct stack_record *stack;
+	struct stack_record *stack = NULL;
+	size_t record_size;
 
 	lockdep_assert_held(&pool_lock);
 
 	/* This should already be checked by public API entry points. */
-	if (WARN_ON_ONCE(!size))
+	if (WARN_ON_ONCE(!nr_entries))
 		return NULL;
 
-	/* Check if we have a stack record to save the stack trace. */
-	stack = depot_pop_free();
-	if (!stack) {
-		/* No usable entries on the freelist - try to refill the freelist. */
-		if (!depot_try_init_pool(prealloc))
-			return NULL;
+	/* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
+	if (nr_entries > CONFIG_STACKDEPOT_MAX_FRAMES)
+		nr_entries = CONFIG_STACKDEPOT_MAX_FRAMES;
+
+	if (flags & STACK_DEPOT_FLAG_GET) {
+		/*
+		 * Evictable entries have to allocate the max. size so they may
+		 * safely be re-used by differently sized allocations.
+		 */
+		record_size = depot_stack_record_size(stack, CONFIG_STACKDEPOT_MAX_FRAMES);
 		stack = depot_pop_free();
-		if (WARN_ON(!stack))
-			return NULL;
+	} else {
+		record_size = depot_stack_record_size(stack, nr_entries);
 	}
 
-	/* Limit number of saved frames to CONFIG_STACKDEPOT_MAX_FRAMES. */
-	if (size > CONFIG_STACKDEPOT_MAX_FRAMES)
-		size = CONFIG_STACKDEPOT_MAX_FRAMES;
+	if (!stack) {
+		stack = depot_pop_free_pool(prealloc, record_size);
+		if (!stack)
+			return NULL;
+	}
 
 	/* Save the stack trace. */
 	stack->hash = hash;
-	stack->size = size;
-	/* stack->handle is already filled in by depot_init_pool(). */
+	stack->size = nr_entries;
+	/* stack->handle is already filled in by depot_pop_free_pool(). */
 	refcount_set(&stack->count, 1);
-	memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
+	memcpy(stack->entries, entries, flex_array_size(stack, entries, nr_entries));
 
 	/*
 	 * Let KMSAN know the stored stack record is initialized. This shall
 	 * prevent false positive reports if instrumented code accesses it.
 	 */
-	kmsan_unpoison_memory(stack, DEPOT_STACK_RECORD_SIZE);
+	kmsan_unpoison_memory(stack, record_size);
 
 	counters[DEPOT_COUNTER_ALLOCS]++;
 	counters[DEPOT_COUNTER_INUSE]++;
@@ -681,7 +694,7 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 
 	found = find_stack(bucket, entries, nr_entries, hash, depot_flags);
 	if (!found) {
 		struct stack_record *new =
-			depot_alloc_stack(entries, nr_entries, hash, &prealloc);
+			depot_alloc_stack(entries, nr_entries, hash, depot_flags, &prealloc);
 
 		if (new) {
 			/*
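
To make the per-record saving concrete, a rough worked example of the
depot_stack_record_size() arithmetic (the 64-frame maximum and 8-byte
entries are assumptions matching the CONFIG_STACKDEPOT_MAX_FRAMES
default on a 64-bit kernel, not figures taken from the patch itself):

  /*
   * Fixed-size layout: every record reserves room for the maximum
   * trace, i.e. 64 * 8 = 512 bytes of entries, however short the
   * actually saved trace is.
   *
   * Variable-size layout, for a 4-frame trace saved without
   * STACK_DEPOT_FLAG_GET:
   *
   *	used   = 4 * 8    =  32 bytes
   *	unused = 512 - 32 = 480 bytes
   *	size   = ALIGN(sizeof(struct stack_record) - unused,
   *		       1 << DEPOT_STACK_ALIGN)
   *
   * Each such record thus shrinks by roughly 480 bytes, which is where
   * the reported saving of 4736 KiB (296 pools of 16 KiB) comes from.
   */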