From patchwork Thu Mar 6 22:44:50 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luiz Capitulino
X-Patchwork-Id: 14005453
From: Luiz Capitulino
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, david@redhat.com,
 yuzhao@google.com, pasha.tatashin@soleen.com
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, muchun.song@linux.dev,
 luizcap@redhat.com
Subject: [PATCH v3 1/3] mm: page_ext: add an iteration API for page extensions
Date: Thu, 6 Mar 2025 17:44:50 -0500

The page extension implementation assumes that all page extensions of
a given page order are stored in the same memory section. The function
page_ext_next() relies on this assumption by adding an offset to the
current object to return the next adjacent page extension. This behavior
works as expected for flatmem but fails for sparsemem when using 1G
pages.

Commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios")
exposes this issue, making a crash possible when using the page_owner or
page_table_check page extensions. The problem is that for 1G pages, the
page extensions may span memory section boundaries and be stored in
different memory sections. This issue was not visible before that commit
because alloc_contig_pages() never passed more than MAX_PAGE_ORDER to
post_alloc_hook(). However, the series that introduced the commit changed
this behavior, allowing the full 1G page order to be passed.

Reproducer:

 1. Build the kernel with CONFIG_SPARSEMEM=y and table extensions support
 2. Pass 'default_hugepagesz=1G page_owner=on' in the kernel command-line
 3. Reserve one 1G page at run-time, this should crash (backtrace below)

To address this issue, this commit introduces a new API for iterating
through page extensions. The main iteration macro is for_each_page_ext()
and it must be called with the RCU read lock taken. Here's a usage
example:

"""
struct page_ext_iter iter;
struct page_ext *page_ext;

...

rcu_read_lock();
for_each_page_ext(page, 1 << order, page_ext, iter) {
	struct my_page_ext *obj = get_my_page_ext_obj(page_ext);
	...
}
rcu_read_unlock();
"""

The loop construct uses page_ext_iter_next(), which checks whether the
iteration has crossed a section boundary. If it has, page_ext_iter_next()
retrieves the next page_ext object from the new section.

Thanks to David Hildenbrand for helping identify the root cause and
providing suggestions on how to fix and optimize the solution (final
implementation and bugs are all mine though).
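
The section-crossing behavior described above can be sketched in
userspace. This is only a toy model under stated assumptions, not the
kernel code: every toy_* name, the tiny TOY_PAGES_PER_SECTION value and
the three static arrays standing in for per-section page_ext storage are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy constants; real sections are far larger. */
#define TOY_PAGES_PER_SECTION 4UL
#define TOY_NR_SECTIONS 3UL

struct toy_page_ext {
	unsigned long pfn;	/* PFN this extension object describes */
};

/* One separate array per section, so sections need not be adjacent. */
static struct toy_page_ext toy_sec0[TOY_PAGES_PER_SECTION];
static struct toy_page_ext toy_sec1[TOY_PAGES_PER_SECTION];
static struct toy_page_ext toy_sec2[TOY_PAGES_PER_SECTION];
static struct toy_page_ext *toy_sections[TOY_NR_SECTIONS] = {
	toy_sec0, toy_sec1, toy_sec2
};

static void toy_init(void)
{
	for (unsigned long s = 0; s < TOY_NR_SECTIONS; s++)
		for (unsigned long i = 0; i < TOY_PAGES_PER_SECTION; i++)
			toy_sections[s][i].pfn = s * TOY_PAGES_PER_SECTION + i;
}

/* Full lookup: find the section first, then index into its array. */
static struct toy_page_ext *toy_lookup(unsigned long pfn)
{
	unsigned long s = pfn / TOY_PAGES_PER_SECTION;

	if (s >= TOY_NR_SECTIONS)
		return NULL;
	return &toy_sections[s][pfn % TOY_PAGES_PER_SECTION];
}

struct toy_iter {
	unsigned long index;
	unsigned long start_pfn;
	struct toy_page_ext *ext;
};

static struct toy_page_ext *toy_iter_begin(struct toy_iter *it,
					   unsigned long pfn)
{
	it->index = 0;
	it->start_pfn = pfn;
	it->ext = toy_lookup(pfn);
	return it->ext;
}

static struct toy_page_ext *toy_iter_next(struct toy_iter *it)
{
	unsigned long pfn;

	if (!it->ext)
		return NULL;

	it->index++;
	pfn = it->start_pfn + it->index;

	if (pfn % TOY_PAGES_PER_SECTION)
		it->ext = it->ext + 1;		/* same section: step the pointer */
	else
		it->ext = toy_lookup(pfn);	/* new section: look it up again */

	return it->ext;
}

/* Walk up to count PFNs; return how many objects matched their PFN. */
static unsigned long toy_walk(unsigned long start, unsigned long count)
{
	struct toy_iter it;
	struct toy_page_ext *ext;
	unsigned long matched = 0;

	toy_init();
	for (ext = toy_iter_begin(&it, start); ext && it.index < count;
	     ext = toy_iter_next(&it)) {
		assert(ext->pfn == start + it.index);
		matched++;
	}
	return matched;
}
```

Calling toy_walk(2, 8) visits PFNs 2 through 9 and crosses two section
boundaries. Each crossing goes through toy_lookup() instead of pointer
arithmetic, which is the distinction the new iterator draws.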

Lastly, here's the backtrace. Without KASAN you can get random crashes:

[   76.052526] BUG: KASAN: slab-out-of-bounds in __update_page_owner_handle+0x238/0x298
[   76.060283] Write of size 4 at addr ffff07ff96240038 by task tee/3598
[   76.066714]
[   76.068203] CPU: 88 UID: 0 PID: 3598 Comm: tee Kdump: loaded Not tainted 6.13.0-rep1 #3
[   76.076202] Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 2.10.20220810 (SCP: 2.10.20220810) 2022/08/10
[   76.088972] Call trace:
[   76.091411]  show_stack+0x20/0x38 (C)
[   76.095073]  dump_stack_lvl+0x80/0xf8
[   76.098733]  print_address_description.constprop.0+0x88/0x398
[   76.104476]  print_report+0xa8/0x278
[   76.108041]  kasan_report+0xa8/0xf8
[   76.111520]  __asan_report_store4_noabort+0x20/0x30
[   76.116391]  __update_page_owner_handle+0x238/0x298
[   76.121259]  __set_page_owner+0xdc/0x140
[   76.125173]  post_alloc_hook+0x190/0x1d8
[   76.129090]  alloc_contig_range_noprof+0x54c/0x890
[   76.133874]  alloc_contig_pages_noprof+0x35c/0x4a8
[   76.138656]  alloc_gigantic_folio.isra.0+0x2c0/0x368
[   76.143616]  only_alloc_fresh_hugetlb_folio.isra.0+0x24/0x150
[   76.149353]  alloc_pool_huge_folio+0x11c/0x1f8
[   76.153787]  set_max_huge_pages+0x364/0xca8
[   76.157961]  __nr_hugepages_store_common+0xb0/0x1a0
[   76.162829]  nr_hugepages_store+0x108/0x118
[   76.167003]  kobj_attr_store+0x3c/0x70
[   76.170745]  sysfs_kf_write+0xfc/0x188
[   76.174492]  kernfs_fop_write_iter+0x274/0x3e0
[   76.178927]  vfs_write+0x64c/0x8e0
[   76.182323]  ksys_write+0xf8/0x1f0
[   76.185716]  __arm64_sys_write+0x74/0xb0
[   76.189630]  invoke_syscall.constprop.0+0xd8/0x1e0
[   76.194412]  do_el0_svc+0x164/0x1e0
[   76.197891]  el0_svc+0x40/0xe0
[   76.200939]  el0t_64_sync_handler+0x144/0x168
[   76.205287]  el0t_64_sync+0x1ac/0x1b0

Fixes: cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for gigantic folios")
Signed-off-by: Luiz Capitulino
Acked-by: David Hildenbrand
---
 include/linux/page_ext.h | 93 ++++++++++++++++++++++++++++++++++++++++
 mm/page_ext.c            | 13 ++++++
 2 files changed, 106 insertions(+)

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index e4b48a0dda244..76c817162d2fb 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -3,6 +3,7 @@
 #define __LINUX_PAGE_EXT_H
 
 #include
+#include
 #include
 
 struct pglist_data;
@@ -69,16 +70,31 @@ extern void page_ext_init(void);
 static inline void page_ext_init_flatmem_late(void)
 {
 }
+
+static inline bool page_ext_iter_next_fast_possible(unsigned long next_pfn)
+{
+	/*
+	 * page_ext is allocated per memory section. Once we cross a
+	 * memory section, we have to fetch the new pointer.
+	 */
+	return next_pfn % PAGES_PER_SECTION;
+}
 #else
 extern void page_ext_init_flatmem(void);
 extern void page_ext_init_flatmem_late(void);
 static inline void page_ext_init(void)
 {
 }
+
+static inline bool page_ext_iter_next_fast_possible(unsigned long next_pfn)
+{
+	return true;
+}
 #endif
 
 extern struct page_ext *page_ext_get(const struct page *page);
 extern void page_ext_put(struct page_ext *page_ext);
+extern struct page_ext *page_ext_lookup(unsigned long pfn);
 
 static inline void *page_ext_data(struct page_ext *page_ext,
 				  struct page_ext_operations *ops)
@@ -93,6 +109,83 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr)
 	return next;
 }
 
+struct page_ext_iter {
+	unsigned long index;
+	unsigned long start_pfn;
+	struct page_ext *page_ext;
+};
+
+/**
+ * page_ext_iter_begin() - Prepare for iterating through page extensions.
+ * @iter: page extension iterator.
+ * @pfn: PFN of the page we're interested in.
+ *
+ * Must be called with RCU read lock taken.
+ *
+ * Return: NULL if no page_ext exists for this page.
+ */
+static inline struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter,
+						unsigned long pfn)
+{
+	iter->index = 0;
+	iter->start_pfn = pfn;
+	iter->page_ext = page_ext_lookup(pfn);
+
+	return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_next() - Get next page extension
+ * @iter: page extension iterator.
+ *
+ * Must be called with RCU read lock taken.
+ *
+ * Return: NULL if no next page_ext exists.
+ */
+static inline struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
+{
+	unsigned long pfn;
+
+	if (WARN_ON_ONCE(!iter->page_ext))
+		return NULL;
+
+	iter->index++;
+	pfn = iter->start_pfn + iter->index;
+
+	if (page_ext_iter_next_fast_possible(pfn))
+		iter->page_ext = page_ext_next(iter->page_ext);
+	else
+		iter->page_ext = page_ext_lookup(pfn);
+
+	return iter->page_ext;
+}
+
+/**
+ * page_ext_iter_get() - Get current page extension
+ * @iter: page extension iterator.
+ *
+ * Return: NULL if no page_ext exists for this iterator.
+ */
+static inline struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
+{
+	return iter->page_ext;
+}
+
+/**
+ * for_each_page_ext(): iterate through page_ext objects.
+ * @__page: the page we're interested in
+ * @__pgcount: how many pages to iterate through
+ * @__page_ext: struct page_ext pointer where the current page_ext
+ *              object is returned
+ * @__iter: struct page_ext_iter object (defined in the stack)
+ *
+ * IMPORTANT: must be called with RCU read lock taken.
+ */
+#define for_each_page_ext(__page, __pgcount, __page_ext, __iter)         \
+	for (__page_ext = page_ext_iter_begin(&__iter, page_to_pfn(__page));\
+	     __page_ext && __iter.index < __pgcount;                        \
+	     __page_ext = page_ext_iter_next(&__iter))
+
 #else /* !CONFIG_PAGE_EXTENSION */
 struct page_ext;
 
diff --git a/mm/page_ext.c b/mm/page_ext.c
index 641d93f6af4c1..c351fdfe9e9a5 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -507,6 +507,19 @@ void __meminit pgdat_page_ext_init(struct pglist_data *pgdat)
 
 #endif
 
+/**
+ * page_ext_lookup() - Lookup a page extension for a PFN.
+ * @pfn: PFN of the page we're interested in.
+ *
+ * Must be called with RCU read lock taken and @pfn must be valid.
+ *
+ * Return: NULL if no page_ext exists for this page.
+ */
+struct page_ext *page_ext_lookup(unsigned long pfn)
+{
+	return lookup_page_ext(pfn_to_page(pfn));
+}
+
 /**
  * page_ext_get() - Get the extended information for a page.
  * @page: The page we're interested in.