From patchwork Mon Aug 29 07:56:15 2022
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 12957487
From: Feng Tang <feng.tang@intel.com>
To: Andrew Morton, Vlastimil Babka, Christoph Lameter, Pekka Enberg,
 David Rientjes, Joonsoo Kim, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, Dmitry Vyukov
Cc: Dave Hansen, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Robin Murphy, John Garry, Kefeng Wang
Subject: [PATCH v4 1/4] mm/slub: enable debugging memory wasting of kmalloc
Date: Mon, 29 Aug 2022 15:56:15 +0800
Message-Id: <20220829075618.69069-2-feng.tang@intel.com>
In-Reply-To: <20220829075618.69069-1-feng.tang@intel.com>
References: <20220829075618.69069-1-feng.tang@intel.com>
DKIM record") header.d=intel.com header.s=Intel header.b=l9+W1x26; spf=pass (imf20.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661759750; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=cMBhc7d00ZJSeSqvjfXHVWVy+k4+adYKYruvlUqbWL4=; b=uvouhJ2rAdC+v7X9/FQGta8NJ7AZfiVUiBeSy5AHmctIIe+5oNh+Tn8nxoWG+MY++TimxC C+Qpj0UCfDVV+tAp/XSi2ch4fgQgmRcqk4/mIU3OapDxlL55nOvdqytHqSSlfnjQ9aMpsZ /m0oU1OkMFVxdwYhnHTLxUB4HUXDWaY= X-Rspam-User: X-Rspamd-Queue-Id: 29D0F1C0016 Authentication-Results: imf20.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=l9+W1x26; spf=pass (imf20.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Stat-Signature: nd49t3qyqzsr3f3d1tyypts4jq6iiyay X-Rspamd-Server: rspam07 X-HE-Tag: 1661759749-667441 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: kmalloc's API family is critical for mm, with one nature that it will round up the request size to a fixed one (mostly power of 2). Say when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes could be allocated, so in worst case, there is around 50% memory space waste. The wastage is not a big issue for requests that get allocated/freed quickly, but may cause problems with objects that have longer life time. We've met a kernel boot OOM panic (v5.10), and from the dumped slab info: [ 26.062145] kmalloc-2k 814056KB 814056KB From debug we found there are huge number of 'struct iova_magazine', whose size is 1032 bytes (1024 + 8), so each allocation will waste 1016 bytes. Though the issue was solved by giving the right (bigger) size of RAM, it is still nice to optimize the size (either use a kmalloc friendly size or create a dedicated slab for it). And from lkml archive, there was another crash kernel OOM case [1] back in 2019, which seems to be related with the similar slab waste situation, as the log is similar: [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16 [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 ... [ 4.857565] kmalloc-2048 59164KB 59164KB The crash kernel only has 256M memory, and 59M is pretty big here. (Note: the related code has been changed and optimised in recent kernel [2], these logs are just picked to demo the problem, also a patch changing its size to 1024 bytes has been merged) So add an way to track each kmalloc's memory waste info, and leverage the existing SLUB debug framework (specifically SLUB_STORE_USER) to show its call stack of original allocation, so that user can evaluate the waste situation, identify some hot spots and optimize accordingly, for a better utilization of memory. 
The waste info is integrated into the existing interface
'/sys/kernel/debug/slab/kmalloc-xx/alloc_traces'. One example from
'kmalloc-4k' after boot is:

 126 ixgbe_alloc_q_vector+0xa5/0x4a0 [ixgbe] waste=233856/1856 age=1493302/1493830/1494358 pid=1284 cpus=32 nodes=1
	__slab_alloc.isra.86+0x52/0x80
	__kmalloc_node+0x143/0x350
	ixgbe_alloc_q_vector+0xa5/0x4a0 [ixgbe]
	ixgbe_init_interrupt_scheme+0x1a6/0x730 [ixgbe]
	ixgbe_probe+0xc8e/0x10d0 [ixgbe]
	local_pci_probe+0x42/0x80
	work_for_cpu_fn+0x13/0x20
	process_one_work+0x1c5/0x390

which means that in the 'kmalloc-4k' slab there are 126 requests of
2240 bytes that each got a 4KB slot, wasting 1856 bytes per object
and 233856 bytes in total. And when the system starts a real workload
like multiple docker instances, the waste is more severe.

[1]. https://lkml.org/lkml/2019/8/12/266
[2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/

[Thanks to Hyeonggon for pointing out several bugs in the sorting/format]
[Thanks to Vlastimil for suggesting a way to reduce the memory usage of
orig_size and keep it only for kmalloc objects]

Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Robin Murphy
Cc: John Garry
Cc: Kefeng Wang
---
 include/linux/slab.h |  2 +
 mm/slub.c            | 94 +++++++++++++++++++++++++++++++++++++-------
 2 files changed, 81 insertions(+), 15 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9b592e611cb1..6dc495f76644 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -29,6 +29,8 @@
 #define SLAB_RED_ZONE		((slab_flags_t __force)0x00000400U)
 /* DEBUG: Poison objects */
 #define SLAB_POISON		((slab_flags_t __force)0x00000800U)
+/* Indicate a kmalloc slab */
+#define SLAB_KMALLOC		((slab_flags_t __force)0x00001000U)
 /* Align objs on cache lines */
 #define SLAB_HWCACHE_ALIGN	((slab_flags_t __force)0x00002000U)
 /* Use GFP_DMA memory */
diff --git a/mm/slub.c b/mm/slub.c
index 5df44e00b1aa..d8bab650ed99 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -199,6 +199,12 @@ static inline bool kmem_cache_debug(struct kmem_cache *s)
 	return kmem_cache_debug_flags(s, SLAB_DEBUG_FLAGS);
 }
 
+static inline bool slub_debug_orig_size(struct kmem_cache *s)
+{
+	return (kmem_cache_debug_flags(s, SLAB_STORE_USER) &&
+			(s->flags & SLAB_KMALLOC));
+}
+
 void *fixup_red_left(struct kmem_cache *s, void *p)
 {
 	if (kmem_cache_debug_flags(s, SLAB_RED_ZONE))
@@ -785,6 +791,33 @@ static void print_slab_info(const struct slab *slab)
 	       folio_flags(folio, 0));
 }
 
+static inline void set_orig_size(struct kmem_cache *s,
+				void *object, unsigned int orig_size)
+{
+	void *p = kasan_reset_tag(object);
+
+	if (!slub_debug_orig_size(s))
+		return;
+
+	p += get_info_end(s);
+	p += sizeof(struct track) * 2;
+
+	*(unsigned int *)p = orig_size;
+}
+
+static unsigned int get_orig_size(struct kmem_cache *s, void *object)
+{
+	void *p = kasan_reset_tag(object);
+
+	if (!slub_debug_orig_size(s))
+		return s->object_size;
+
+	p += get_info_end(s);
+	p += sizeof(struct track) * 2;
+
+	return *(unsigned int *)p;
+}
+
 static void slab_bug(struct kmem_cache *s, char *fmt, ...)
 {
 	struct va_format vaf;
@@ -844,6 +877,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
 	if (s->flags & SLAB_STORE_USER)
 		off += 2 * sizeof(struct track);
 
+	if (slub_debug_orig_size(s))
+		off += sizeof(unsigned int);
+
 	off += kasan_metadata_size(s);
 
 	if (off != size_from_object(s))
@@ -995,10 +1031,14 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
 {
 	unsigned long off = get_info_end(s);	/* The end of info */
 
-	if (s->flags & SLAB_STORE_USER)
+	if (s->flags & SLAB_STORE_USER) {
 		/* We also have user information there */
 		off += 2 * sizeof(struct track);
 
+		if (s->flags & SLAB_KMALLOC)
+			off += sizeof(unsigned int);
+	}
+
 	off += kasan_metadata_size(s);
 
 	if (size_from_object(s) == off)
@@ -1572,6 +1612,9 @@ void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
 static inline int alloc_debug_processing(struct kmem_cache *s,
 	struct slab *slab, void *object) { return 0; }
 
+static inline void set_orig_size(struct kmem_cache *s,
+	void *object, unsigned int orig_size) {}
+
 static inline void free_debug_processing(
 	struct kmem_cache *s, struct slab *slab,
 	void *head, void *tail, int bulk_cnt,
@@ -2974,7 +3017,7 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
  * already disabled (which is the case for bulk allocation).
  */
 static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
-			  unsigned long addr, struct kmem_cache_cpu *c)
+			  unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
 {
 	void *freelist;
 	struct slab *slab;
@@ -3115,6 +3158,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 
 		if (s->flags & SLAB_STORE_USER)
 			set_track(s, freelist, TRACK_ALLOC, addr);
+		set_orig_size(s, freelist, orig_size);
 
 		return freelist;
 	}
@@ -3140,6 +3184,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	 */
 	if (s->flags & SLAB_STORE_USER)
 		set_track(s, freelist, TRACK_ALLOC, addr);
+	set_orig_size(s, freelist, orig_size);
+
 	return freelist;
 }
@@ -3182,7 +3228,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
  * pointer.
  */
 static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
-			  unsigned long addr, struct kmem_cache_cpu *c)
+			  unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
 {
 	void *p;
@@ -3195,7 +3241,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	c = slub_get_cpu_ptr(s->cpu_slab);
 #endif
 
-	p = ___slab_alloc(s, gfpflags, node, addr, c);
+	p = ___slab_alloc(s, gfpflags, node, addr, c, orig_size);
 #ifdef CONFIG_PREEMPT_COUNT
 	slub_put_cpu_ptr(s->cpu_slab);
 #endif
@@ -3280,7 +3326,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
 
 	if (!USE_LOCKLESS_FAST_PATH() ||
 	    unlikely(!object || !slab || !node_match(slab, node))) {
-		object = __slab_alloc(s, gfpflags, node, addr, c);
+		object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
 	} else {
 		void *next_object = get_freepointer_safe(s, object);
@@ -3747,7 +3793,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			 * of re-populating per CPU c->freelist
 			 */
 			p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE,
-					    _RET_IP_, c);
+					    _RET_IP_, c, s->object_size);
 			if (unlikely(!p[i]))
 				goto error;
@@ -4150,12 +4196,17 @@ static int calculate_sizes(struct kmem_cache *s)
 	}
 
 #ifdef CONFIG_SLUB_DEBUG
-	if (flags & SLAB_STORE_USER)
+	if (flags & SLAB_STORE_USER) {
 		/*
 		 * Need to store information about allocs and frees after
 		 * the object.
 		 */
 		size += 2 * sizeof(struct track);
+
+		/* Save the original kmalloc request size */
+		if (flags & SLAB_KMALLOC)
+			size += sizeof(unsigned int);
+	}
 #endif
 
 	kasan_cache_create(s, &size, &s->flags);
@@ -4770,7 +4821,7 @@ void __init kmem_cache_init(void)
 
 	/* Now we can use the kmem_cache to allocate kmalloc slabs */
 	setup_kmalloc_cache_index_table();
-	create_kmalloc_caches(0);
+	create_kmalloc_caches(SLAB_KMALLOC);
 
 	/* Setup random freelists for each cache */
 	init_freelist_randomization();
@@ -4937,6 +4988,7 @@ struct location {
 	depot_stack_handle_t handle;
 	unsigned long count;
 	unsigned long addr;
+	unsigned long waste;
 	long long sum_time;
 	long min_time;
 	long max_time;
@@ -4983,13 +5035,15 @@ static int alloc_loc_track(struct loc_track *t, unsigned long max, gfp_t flags)
 }
 
 static int add_location(struct loc_track *t, struct kmem_cache *s,
-				const struct track *track)
+				const struct track *track,
+				unsigned int orig_size)
 {
 	long start, end, pos;
 	struct location *l;
-	unsigned long caddr, chandle;
+	unsigned long caddr, chandle, cwaste;
 	unsigned long age = jiffies - track->when;
 	depot_stack_handle_t handle = 0;
+	unsigned int waste = s->object_size - orig_size;
 
 #ifdef CONFIG_STACKDEPOT
 	handle = READ_ONCE(track->handle);
@@ -5007,11 +5061,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 		if (pos == end)
 			break;
 
-		caddr = t->loc[pos].addr;
-		chandle = t->loc[pos].handle;
-		if ((track->addr == caddr) && (handle == chandle)) {
+		l = &t->loc[pos];
+		caddr = l->addr;
+		chandle = l->handle;
+		cwaste = l->waste;
+		if ((track->addr == caddr) && (handle == chandle) &&
+			(waste == cwaste)) {
 
-			l = &t->loc[pos];
 			l->count++;
 			if (track->when) {
 				l->sum_time += age;
@@ -5036,6 +5092,9 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 			end = pos;
 		else if (track->addr == caddr && handle < chandle)
 			end = pos;
+		else if (track->addr == caddr && handle == chandle &&
+				waste < cwaste)
+			end = pos;
 		else
 			start = pos;
 	}
@@ -5059,6 +5118,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
 	l->min_pid = track->pid;
 	l->max_pid = track->pid;
 	l->handle = handle;
+	l->waste = waste;
 	cpumask_clear(to_cpumask(l->cpus));
 	cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
 	nodes_clear(l->nodes);
@@ -5077,7 +5137,7 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s,
 
 	for_each_object(p, s, addr, slab->objects)
 		if (!test_bit(__obj_to_index(s, addr, p), obj_map))
-			add_location(t, s, get_track(s, p, alloc));
+			add_location(t, s, get_track(s, p, alloc), get_orig_size(s, p));
 }
 #endif  /* CONFIG_DEBUG_FS   */
 #endif	/* CONFIG_SLUB_DEBUG */
@@ -5942,6 +6002,10 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
 		else
 			seq_puts(seq, "<not-available>");
 
+		if (l->waste)
+			seq_printf(seq, " waste=%lu/%lu",
+				l->count * l->waste, l->waste);
+
 		if (l->sum_time != l->min_time) {
 			seq_printf(seq, " age=%ld/%llu/%ld",
 				l->min_time, div_u64(l->sum_time, l->count),
From patchwork Mon Aug 29 07:56:16 2022
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 12957488
From: Feng Tang <feng.tang@intel.com>
To: Andrew Morton, Vlastimil Babka, Christoph Lameter, Pekka Enberg,
 David Rientjes, Joonsoo Kim, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, Dmitry Vyukov
Cc: Dave Hansen, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 2/4] mm/slub: only zero the requested size of buffer for
 kzalloc
Date: Mon, 29 Aug 2022 15:56:16 +0800
Message-Id: <20220829075618.69069-3-feng.tang@intel.com>
In-Reply-To: <20220829075618.69069-1-feng.tang@intel.com>
References: <20220829075618.69069-1-feng.tang@intel.com>
kzalloc/kmalloc will round up the request size to a fixed size
(mostly a power of 2), so the allocated memory can be larger than
what was requested. Currently the kzalloc family of APIs zeroes all
of the allocated memory.

To detect out-of-bounds usage of the extra allocated memory, zero
only the requested part, so that a sanity check can be added for the
extra space later.

kzalloc users who call ksize() later and utilize this extra space
should be aware that the space is not zeroed any more.
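As a rough userspace model of the change (toy_kzalloc and REDZONE are
illustrative only, not kernel APIs; the kernel's actual slack handling
is added in patch 4/4):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REDZONE 0xbb	/* made-up marker for the unzeroed slack */

/*
 * Zero only the requested bytes of a power-of-2 bucket, and fill the
 * slack with a marker so later out-of-bounds writes are detectable.
 */
static void *toy_kzalloc(size_t request, size_t bucket)
{
	unsigned char *p = malloc(bucket);

	if (!p)
		return NULL;
	memset(p, 0, request);
	memset(p + request, REDZONE, bucket - request);
	return p;
}

int main(void)
{
	unsigned char *p = toy_kzalloc(1032, 2048);

	if (p) {
		/* prints 0 for byte 1031, 0xbb for byte 1032 */
		printf("last requested byte: %#x, first slack byte: %#x\n",
		       p[1031], p[1032]);
		free(p);
	}
	return 0;
}

The caller-visible kzalloc contract (all requested bytes are zero)
still holds; only the rounding slack changes behavior.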
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 mm/slab.c | 6 +++---
 mm/slab.h | 9 +++++++--
 mm/slub.c | 6 +++---
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index a5486ff8362a..73ecaa7066e1 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3253,7 +3253,7 @@ slab_alloc_node(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
 	init = slab_want_init_on_alloc(flags, cachep);
 
 out:
-	slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init);
+	slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init, 0);
 	return objp;
 }
@@ -3506,13 +3506,13 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	 * Done outside of the IRQ disabled section.
 	 */
 	slab_post_alloc_hook(s, objcg, flags, size, p,
-				slab_want_init_on_alloc(flags, s));
+				slab_want_init_on_alloc(flags, s), 0);
 	/* FIXME: Trace call missing. Christoph would like a bulk variant */
 	return size;
 error:
 	local_irq_enable();
 	cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_);
-	slab_post_alloc_hook(s, objcg, flags, i, p, false);
+	slab_post_alloc_hook(s, objcg, flags, i, p, false, 0);
 	kmem_cache_free_bulk(s, i, p);
 	return 0;
 }
diff --git a/mm/slab.h b/mm/slab.h
index 65023f000d42..1c773195cfcd 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -720,12 +720,17 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
 
 static inline void slab_post_alloc_hook(struct kmem_cache *s,
 					struct obj_cgroup *objcg, gfp_t flags,
-					size_t size, void **p, bool init)
+					size_t size, void **p, bool init,
+					unsigned int orig_size)
 {
 	size_t i;
 
 	flags &= gfp_allowed_mask;
 
+	/* If original request size(kmalloc) is not set, use object_size */
+	if (!orig_size)
+		orig_size = s->object_size;
+
 	/*
 	 * As memory initialization might be integrated into KASAN,
 	 * kasan_slab_alloc and initialization memset must be
@@ -736,7 +741,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
 	for (i = 0; i < size; i++) {
 		p[i] = kasan_slab_alloc(s, p[i], flags, init);
 		if (p[i] && init && !kasan_has_integrated_init())
-			memset(p[i], 0, s->object_size);
+			memset(p[i], 0, orig_size);
 		kmemleak_alloc_recursive(p[i], s->object_size, 1,
 					 s->flags, flags);
 	}
diff --git a/mm/slub.c b/mm/slub.c
index d8bab650ed99..936b7be0642a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3360,7 +3360,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
 	init = slab_want_init_on_alloc(gfpflags, s);
 
 out:
-	slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init);
+	slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init, orig_size);
 
 	return object;
 }
@@ -3817,11 +3817,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	 * Done outside of the IRQ disabled fastpath loop.
 	 */
 	slab_post_alloc_hook(s, objcg, flags, size, p,
-			slab_want_init_on_alloc(flags, s));
+			slab_want_init_on_alloc(flags, s), 0);
 	return i;
 error:
 	slub_put_cpu_ptr(s->cpu_slab);
-	slab_post_alloc_hook(s, objcg, flags, i, p, false);
+	slab_post_alloc_hook(s, objcg, flags, i, p, false, 0);
 	kmem_cache_free_bulk(s, i, p);
 	return 0;
 }
From patchwork Mon Aug 29 07:56:17 2022
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 12957489
From: Feng Tang <feng.tang@intel.com>
To: Andrew Morton, Vlastimil Babka, Christoph Lameter, Pekka Enberg,
 David Rientjes, Joonsoo Kim, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, Dmitry Vyukov
Cc: Dave Hansen, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel test robot
Subject: [PATCH v4 3/4] mm: kasan: Add free_meta size info in struct
 kasan_cache
Date: Mon, 29 Aug 2022 15:56:17 +0800
Message-Id: <20220829075618.69069-4-feng.tang@intel.com>
In-Reply-To: <20220829075618.69069-1-feng.tang@intel.com>
References: <20220829075618.69069-1-feng.tang@intel.com>
When KASAN is enabled for slab/slub, it may save its free_meta data
in the first part of the slab object's data area on the object free
path, which works fine.

There is an ongoing effort to extend slub's debug functionality to
redzone the latter part of a kmalloc object's area. When both debug
features are enabled, they may conflict, especially when the kmalloc
object is small, as caught by the 0Day bot [1].

To better inform slab/slub, add the size of the free_meta data to
'struct kasan_cache', so that its users can take the right action to
avoid the data conflict.

[1]. https://lore.kernel.org/lkml/YuYm3dWwpZwH58Hu@xsang-OptiPlex-9020/
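To make the intended use concrete, here is a sketch of the kind of
guard a consumer can build on the new field (the struct mirrors this
patch; redzone_safe is a hypothetical helper applying the same
'meta * 2 >= object_size' heuristic that patch 4/4 uses in
set_orig_size()):

#include <stdbool.h>

/* Mirrors include/linux/kasan.h after this patch. */
struct kasan_cache {
	int alloc_meta_offset;
	int free_meta_offset;
	/* size of free_meta data saved in object's data area */
	int free_meta_size_in_object;
	bool is_kmalloc;
};

/*
 * Sketch: only redzone the kmalloc slack when KASAN's free meta,
 * stored at the start of the object, cannot plausibly overlap it.
 */
static bool redzone_safe(const struct kasan_cache *info, int object_size)
{
	return info->free_meta_size_in_object * 2 < object_size;
}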
Reported-by: kernel test robot
Signed-off-by: Feng Tang <feng.tang@intel.com>
Acked-by: Dmitry Vyukov
---
 include/linux/kasan.h | 2 ++
 mm/kasan/common.c     | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index b092277bf48d..293bdaa0ba09 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -100,6 +100,8 @@ static inline bool kasan_has_integrated_init(void)
 struct kasan_cache {
 	int alloc_meta_offset;
 	int free_meta_offset;
+	/* size of free_meta data saved in object's data area */
+	int free_meta_size_in_object;
 	bool is_kmalloc;
 };
 
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 69f583855c8b..762ae7a7793e 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -201,6 +201,8 @@ void __kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
 			cache->kasan_info.free_meta_offset = KASAN_NO_FREE_META;
 			*size = ok_size;
 		}
+	} else {
+		cache->kasan_info.free_meta_size_in_object = sizeof(struct kasan_free_meta);
 	}
 
 	/* Calculate size with optimal redzone. */

From patchwork Mon Aug 29 07:56:18 2022
X-Patchwork-Submitter: Feng Tang
X-Patchwork-Id: 12957490
From: Feng Tang <feng.tang@intel.com>
To: Andrew Morton, Vlastimil Babka, Christoph Lameter, Pekka Enberg,
 David Rientjes, Joonsoo Kim, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, Dmitry Vyukov
Cc: Dave Hansen, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 4/4] mm/slub: extend redzone check to cover extra
 allocated kmalloc space than requested
Date: Mon, 29 Aug 2022 15:56:18 +0800
Message-Id: <20220829075618.69069-5-feng.tang@intel.com>
In-Reply-To: <20220829075618.69069-1-feng.tang@intel.com>
References: <20220829075618.69069-1-feng.tang@intel.com>
record") header.d=intel.com header.s=Intel header.b=HEzIsxpk; spf=pass (imf20.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661759758; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=H4xMRzzDuskG8mu3uFt10Ji9PbVFNfZ/CMwEHxqiuoc=; b=fhW5b8hmLVWdslgr/k4uX89JFIKVHMF5C3rFTUl43zZghPa8OZnlrZW6YJFtpQzYXdl4ug yvv39tNfRXJNwx9JkmEMZ7L2w31Hj2wT2jGMmwoM/jRIszyqw4sHWFG8M0PD+t8YU7d9wf LolzhJwcWdN1/hXV0IQthy89MWnWLbI= X-Rspam-User: X-Rspamd-Queue-Id: AA3901C0008 Authentication-Results: imf20.hostedemail.com; dkim=none ("invalid DKIM record") header.d=intel.com header.s=Intel header.b=HEzIsxpk; spf=pass (imf20.hostedemail.com: domain of feng.tang@intel.com designates 134.134.136.65 as permitted sender) smtp.mailfrom=feng.tang@intel.com; dmarc=pass (policy=none) header.from=intel.com X-Stat-Signature: gopgqsccwb1yxyqphgr9x84nickfo589 X-Rspamd-Server: rspam07 X-HE-Tag: 1661759758-640888 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: kmalloc will round up the request size to a fixed size (mostly power of 2), so there could be a extra space than what is requested, whose size is the actual buffer size minus original request size. To better detect out of bound access or abuse of this space, add redzone sanity check for it. And in current kernel, some kmalloc user already knows the existence of the space and utilizes it after calling 'ksize()' to know the real size of the allocated buffer. So we skip the redzone sanity check and kmalloc wastage debug for objects which have been called with ksize(), as treating them as legitimate users of the extra space. Suggested-by: Vlastimil Babka Signed-off-by: Feng Tang --- mm/slab.h | 4 ++ mm/slab_common.c | 4 ++ mm/slub.c | 95 +++++++++++++++++++++++++++++++++++++++++++----- 3 files changed, 93 insertions(+), 10 deletions(-) diff --git a/mm/slab.h b/mm/slab.h index 1c773195cfcd..e296191b9453 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -875,4 +875,8 @@ void __check_heap_object(const void *ptr, unsigned long n, } #endif +#ifdef CONFIG_SLUB_DEBUG +extern void skip_orig_size_check(struct kmem_cache *s, const void *object); +#endif + #endif /* MM_SLAB_H */ diff --git a/mm/slab_common.c b/mm/slab_common.c index 8e13e3aac53f..5106667d6adb 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -1001,6 +1001,10 @@ size_t __ksize(const void *object) return folio_size(folio); } +#ifdef CONFIG_SLUB_DEBUG + skip_orig_size_check(folio_slab(folio)->slab_cache, object); +#endif + return slab_ksize(folio_slab(folio)->slab_cache); } diff --git a/mm/slub.c b/mm/slub.c index 936b7be0642a..4348e0dbf8ee 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -791,18 +791,37 @@ static void print_slab_info(const struct slab *slab) folio_flags(folio, 0)); } -static inline void set_orig_size(struct kmem_cache *s, + +/* + * Return true if the saved orig_size is different from object size, + * and the return value can be used to judge whether redzone is needed. 
Suggested-by: Vlastimil Babka
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 mm/slab.h        |  4 ++
 mm/slab_common.c |  4 ++
 mm/slub.c        | 95 +++++++++++++++++++++++++++++++++++++-----
 3 files changed, 93 insertions(+), 10 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 1c773195cfcd..e296191b9453 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -875,4 +875,8 @@ void __check_heap_object(const void *ptr, unsigned long n,
 }
 #endif
 
+#ifdef CONFIG_SLUB_DEBUG
+extern void skip_orig_size_check(struct kmem_cache *s, const void *object);
+#endif
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 8e13e3aac53f..5106667d6adb 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1001,6 +1001,10 @@ size_t __ksize(const void *object)
 		return folio_size(folio);
 	}
 
+#ifdef CONFIG_SLUB_DEBUG
+	skip_orig_size_check(folio_slab(folio)->slab_cache, object);
+#endif
+
 	return slab_ksize(folio_slab(folio)->slab_cache);
 }
diff --git a/mm/slub.c b/mm/slub.c
index 936b7be0642a..4348e0dbf8ee 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -791,18 +791,37 @@ static void print_slab_info(const struct slab *slab)
 	       folio_flags(folio, 0));
 }
 
-static inline void set_orig_size(struct kmem_cache *s,
+
+/*
+ * Return true if the saved orig_size is different from object size,
+ * and the return value can be used to judge whether redzone is needed.
+ */
+static inline bool set_orig_size(struct kmem_cache *s,
 				void *object, unsigned int orig_size)
 {
 	void *p = kasan_reset_tag(object);
 
-	if (!slub_debug_orig_size(s))
-		return;
+#ifdef CONFIG_KASAN_GENERIC
+	/*
+	 * KASAN could save its free meta data in the start part of the
+	 * object area, so skip the redzone check if KASAN's meta data
+	 * size is big enough to possibly overlap the kmalloc redzone.
+	 */
+	if (s->kasan_info.free_meta_size_in_object * 2 >= s->object_size)
+		orig_size = s->object_size;
+#endif
 
 	p += get_info_end(s);
 	p += sizeof(struct track) * 2;
 
 	*(unsigned int *)p = orig_size;
+
+	return (orig_size != s->object_size);
+}
+
+void skip_orig_size_check(struct kmem_cache *s, const void *object)
+{
+	if (slub_debug_orig_size(s))
+		set_orig_size(s, (void *)object, s->object_size);
 }
 
 static unsigned int get_orig_size(struct kmem_cache *s, void *object)
@@ -949,6 +968,45 @@ static void init_object(struct kmem_cache *s, void *object, u8 val)
 	memset(p + s->object_size, val, s->inuse - s->object_size);
 }
 
+/*
+ * For kmalloced objects, the allocated area could be larger than
+ * what was requested; save the original request size and extend
+ * the redzone check into this extra area, if related debug flags
+ * are enabled
+ */
+static void init_kmalloc_object(struct kmem_cache *s, void *object, int orig_size)
+{
+	unsigned int redzone_start;
+	u8 *p = kasan_reset_tag(object);
+
+	if (!slub_debug_orig_size(s) || !set_orig_size(s, object, orig_size))
+		return;
+
+	/* Skip the redzone part if the flag is not enabled */
+	if (!(s->flags & SLAB_RED_ZONE))
+		return;
+
+	/*
+	 * init_object() has been called earlier in alloc_debug_processing(),
+	 * here only the object's data area is touched.
+	 */
+	redzone_start = orig_size;
+
+	if (!freeptr_outside_object(s))
+		redzone_start = max_t(unsigned int, orig_size,
+					s->offset + sizeof(void *));
+
+	if (redzone_start >= s->object_size)
+		return;
+
+	memset(p + redzone_start, SLUB_RED_ACTIVE,
+		s->object_size - redzone_start);
+
+	/* Poison area also needs to be shrunk */
+	if (s->flags & __OBJECT_POISON)
+		p[orig_size - 1] = POISON_END;
+}
+
 static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
 						void *from, void *to)
@@ -1089,6 +1147,7 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
 {
 	u8 *p = object;
 	u8 *endobject = object + s->object_size;
+	unsigned int orig_size;
 
 	if (s->flags & SLAB_RED_ZONE) {
 		if (!check_bytes_and_report(s, slab, object, "Left Redzone",
@@ -1098,6 +1157,20 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
 		if (!check_bytes_and_report(s, slab, object, "Right Redzone",
 			endobject, val, s->inuse - s->object_size))
 			return 0;
+
+		if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) {
+			orig_size = get_orig_size(s, object);
+
+			if (!freeptr_outside_object(s))
+				orig_size = max_t(unsigned int, orig_size,
+						s->offset + sizeof(void *));
+			if (s->object_size > orig_size &&
+				!check_bytes_and_report(s, slab, object,
+					"kmalloc Redzone", p + orig_size,
+					val, s->object_size - orig_size)) {
+				return 0;
+			}
+		}
 	} else {
 		if ((s->flags & SLAB_POISON) && s->object_size < s->inuse) {
 			check_bytes_and_report(s, slab, p, "Alignment padding",
@@ -1612,9 +1685,11 @@ void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
 static inline int alloc_debug_processing(struct kmem_cache *s,
 	struct slab *slab, void *object) { return 0; }
 
-static inline void set_orig_size(struct kmem_cache *s,
-	void *object, unsigned int orig_size) {}
+static inline bool set_orig_size(struct kmem_cache *s,
+	void *object, unsigned int orig_size) { return false; }
 
+static inline void init_kmalloc_object(struct kmem_cache *s,
+	void *object, int orig_size) {}
 static inline void free_debug_processing(
 	struct kmem_cache *s, struct slab *slab,
 	void *head, void *tail, int bulk_cnt,
@@ -2071,7 +2146,7 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
  * and put the slab to the partial (or full) list.
  */
 static void *alloc_single_from_new_slab(struct kmem_cache *s,
-					struct slab *slab)
+					struct slab *slab, int orig_size)
 {
 	int nid = slab_nid(slab);
 	struct kmem_cache_node *n = get_node(s, nid);
@@ -3151,14 +3226,14 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	stat(s, ALLOC_SLAB);
 
 	if (kmem_cache_debug(s)) {
-		freelist = alloc_single_from_new_slab(s, slab);
+		freelist = alloc_single_from_new_slab(s, slab, orig_size);
 
 		if (unlikely(!freelist))
 			goto new_objects;
 
 		if (s->flags & SLAB_STORE_USER)
 			set_track(s, freelist, TRACK_ALLOC, addr);
-		set_orig_size(s, freelist, orig_size);
+		init_kmalloc_object(s, freelist, orig_size);
 
 		return freelist;
 	}
@@ -3184,7 +3259,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	 */
 	if (s->flags & SLAB_STORE_USER)
 		set_track(s, freelist, TRACK_ALLOC, addr);
-	set_orig_size(s, freelist, orig_size);
+	init_kmalloc_object(s, freelist, orig_size);
 
 	return freelist;
 }