From patchwork Wed Jan 15 02:17:41 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13939759
From: Alexei Starovoitov
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
 peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
 rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
 shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
 tglx@linutronix.de, jannh@google.com, tj@kernel.org, linux-mm@kvack.org,
 kernel-team@fb.com
Subject: [PATCH bpf-next v5 2/7] mm, bpf: Introduce free_pages_nolock()
Date: Tue, 14 Jan 2025 18:17:41 -0800
Message-Id: <20250115021746.34691-3-alexei.starovoitov@gmail.com>
X-Mailer: git-send-email 2.39.5 (Apple Git-154)
In-Reply-To: <20250115021746.34691-1-alexei.starovoitov@gmail.com>
References: <20250115021746.34691-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Introduce free_pages_nolock(), which frees pages without blocking on
locks. It relies on trylock only and can be called from any context.
Since spin_trylock() cannot be used on PREEMPT_RT from hard IRQ or NMI
context, it stashes the pages on a lockless linked list (llist); they
are freed by a subsequent free_pages() call from a safe context.

Do not use the llist unconditionally. BPF maps continuously
allocate/free, so we cannot unconditionally delay the freeing to the
llist. When memory becomes free, make it available to the kernel and
BPF users right away if possible, and fall back to the llist as a last
resort.

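For readers who want to experiment with the idea outside the kernel, the
scheme described above can be sketched in userspace. This is only an
illustration under stated assumptions, not the code in this patch: a
pthread mutex stands in for zone->lock, a C11 atomic stack stands in for
the llist, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only what the sketch needs. */
struct demo_page {
	struct demo_page *next;	/* link in the deferred list */
	unsigned int order;	/* remembered so the drain frees 2^order */
};

/* Hypothetical stand-in for struct zone. */
struct demo_zone {
	pthread_mutex_t lock;			/* role of zone->lock */
	_Atomic(struct demo_page *) deferred;	/* role of trylock_free_pages */
	unsigned long nr_freed;			/* base pages released, under lock */
};

#define DEMO_ZONE_INIT { PTHREAD_MUTEX_INITIALIZER, NULL, 0 }

/* Lock-free push onto the deferred list; never blocks. */
static void demo_defer(struct demo_zone *z, struct demo_page *p,
		       unsigned int order)
{
	p->order = order;
	p->next = atomic_load_explicit(&z->deferred, memory_order_relaxed);
	/* On failure the CAS reloads the current head into p->next. */
	while (!atomic_compare_exchange_weak_explicit(&z->deferred, &p->next, p,
						      memory_order_release,
						      memory_order_relaxed))
		;
}

/* Account one freed block. The caller must hold z->lock. */
static void demo_release(struct demo_zone *z, unsigned int order)
{
	z->nr_freed += 1UL << order;
}

/*
 * Free one block. With nolock == true this never blocks: if the lock is
 * contended the block is parked on the deferred list and false is
 * returned. With nolock == false it may block, and it also drains
 * whatever earlier nolock callers left behind.
 */
static bool demo_free(struct demo_zone *z, struct demo_page *p,
		      unsigned int order, bool nolock)
{
	assert(z && p);

	if (pthread_mutex_trylock(&z->lock) != 0) {
		if (nolock) {
			demo_defer(z, p, order);
			return false;
		}
		pthread_mutex_lock(&z->lock);
	}

	if (!nolock) {
		/* Detach the whole deferred list in one atomic step. */
		struct demo_page *d =
			atomic_exchange_explicit(&z->deferred,
						 (struct demo_page *)NULL,
						 memory_order_acquire);
		while (d) {
			struct demo_page *next = d->next;

			demo_release(z, d->order);
			d = next;
		}
	}

	demo_release(z, order);
	pthread_mutex_unlock(&z->lock);
	return true;
}
```

In this sketch a nolock caller that wins the trylock frees its own block
immediately and leaves the deferred list alone; only a caller that is
allowed to block drains it. That is one way to keep the nolock path
short, chosen here for illustration.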
Signed-off-by: Alexei Starovoitov
Acked-by: Vlastimil Babka
Acked-by: Sebastian Andrzej Siewior
---
 include/linux/gfp.h      |  1 +
 include/linux/mm_types.h |  4 ++
 include/linux/mmzone.h   |  3 ++
 mm/page_alloc.c          | 79 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b41bb6e01781..6eba2d80feb8 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -391,6 +391,7 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas
 	__get_free_pages((gfp_mask) | GFP_DMA, (order))
 
 extern void __free_pages(struct page *page, unsigned int order);
+extern void free_pages_nolock(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 
 #define __free_page(page) __free_pages((page), 0)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7361a8f3ab68..52547b3e5fd8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -99,6 +99,10 @@ struct page {
 				/* Or, free page */
 				struct list_head buddy_list;
 				struct list_head pcp_list;
+				struct {
+					struct llist_node pcp_llist;
+					unsigned int order;
+				};
 			};
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
 			struct address_space *mapping;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b36124145a16..1a854e0a9e3b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -953,6 +953,9 @@ struct zone {
 	/* Primarily protects free_area */
 	spinlock_t		lock;
 
+	/* Pages to be freed when next trylock succeeds */
+	struct llist_head	trylock_free_pages;
+
 	/* Write-intensive fields used by compaction and vmstats. */
 	CACHELINE_PADDING(_pad2_);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 74c2a7af1a77..a9c639e3db91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -88,6 +88,9 @@ typedef int __bitwise fpi_t;
  */
 #define FPI_TO_TAIL		((__force fpi_t)BIT(1))
 
+/* Free the page without taking locks. Rely on trylock only. */
+#define FPI_TRYLOCK		((__force fpi_t)BIT(2))
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
@@ -1247,13 +1250,44 @@ static void split_large_buddy(struct zone *zone, struct page *page,
 	}
 }
 
+static void add_page_to_zone_llist(struct zone *zone, struct page *page,
+				   unsigned int order)
+{
+	/* Remember the order */
+	page->order = order;
+	/* Add the page to the free list */
+	llist_add(&page->pcp_llist, &zone->trylock_free_pages);
+}
+
 static void free_one_page(struct zone *zone, struct page *page,
 			  unsigned long pfn, unsigned int order,
 			  fpi_t fpi_flags)
 {
+	struct llist_head *llhead;
 	unsigned long flags;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	if (!spin_trylock_irqsave(&zone->lock, flags)) {
+		if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+			add_page_to_zone_llist(zone, page, order);
+			return;
+		}
+		spin_lock_irqsave(&zone->lock, flags);
+	}
+
+	/* The lock succeeded. Process deferred pages. */
+	llhead = &zone->trylock_free_pages;
+	if (unlikely(!llist_empty(llhead) && !(fpi_flags & FPI_TRYLOCK))) {
+		struct llist_node *llnode;
+		struct page *p, *tmp;
+
+		llnode = llist_del_all(llhead);
+		llist_for_each_entry_safe(p, tmp, llnode, pcp_llist) {
+			unsigned int p_order = p->order;
+
+			split_large_buddy(zone, p, page_to_pfn(p), p_order, fpi_flags);
+			__count_vm_events(PGFREE, 1 << p_order);
+		}
+	}
 	split_large_buddy(zone, page, pfn, order, fpi_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
@@ -2596,7 +2630,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 
 static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 				   struct page *page, int migratetype,
-				   unsigned int order)
+				   unsigned int order, fpi_t fpi_flags)
 {
 	int high, batch;
 	int pindex;
@@ -2631,6 +2665,14 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	}
 	if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
 		pcp->free_count += (1 << order);
+
+	if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+		/*
+		 * Do not attempt to take a zone lock. Let pcp->count get
+		 * over high mark temporarily.
+		 */
+		return;
+	}
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
@@ -2645,7 +2687,8 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 /*
  * Free a pcp page
  */
-void free_unref_page(struct page *page, unsigned int order)
+static void __free_unref_page(struct page *page, unsigned int order,
+			      fpi_t fpi_flags)
 {
 	unsigned long __maybe_unused UP_flags;
 	struct per_cpu_pages *pcp;
@@ -2654,7 +2697,7 @@ void free_unref_page(struct page *page, unsigned int order)
 	int migratetype;
 
 	if (!pcp_allowed_order(order)) {
-		__free_pages_ok(page, order, FPI_NONE);
+		__free_pages_ok(page, order, fpi_flags);
 		return;
 	}
 
@@ -2671,24 +2714,33 @@ void free_unref_page(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(page_zone(page), page, pfn, order, FPI_NONE);
+			free_one_page(page_zone(page), page, pfn, order, fpi_flags);
 			return;
 		}
 		migratetype = MIGRATE_MOVABLE;
 	}
 
 	zone = page_zone(page);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) {
+		add_page_to_zone_llist(zone, page, order);
+		return;
+	}
 	pcp_trylock_prepare(UP_flags);
 	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 	if (pcp) {
-		free_unref_page_commit(zone, pcp, page, migratetype, order);
+		free_unref_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
 		pcp_spin_unlock(pcp);
 	} else {
-		free_one_page(zone, page, pfn, order, FPI_NONE);
+		free_one_page(zone, page, pfn, order, fpi_flags);
 	}
 	pcp_trylock_finish(UP_flags);
 }
 
+void free_unref_page(struct page *page, unsigned int order)
+{
+	__free_unref_page(page, order, FPI_NONE);
+}
+
 /*
  * Free a batch of folios
  */
@@ -2777,7 +2829,7 @@ void free_unref_folios(struct folio_batch *folios)
 
 		trace_mm_page_free_batched(&folio->page);
 		free_unref_page_commit(zone, pcp, &folio->page, migratetype,
-				       order);
+				       order, FPI_NONE);
 	}
 
 	if (pcp) {
@@ -4853,6 +4905,17 @@ void __free_pages(struct page *page, unsigned int order)
 }
 EXPORT_SYMBOL(__free_pages);
 
+/*
+ * Can be called while holding raw_spin_lock or from IRQ and NMI,
+ * but only for pages that came from try_alloc_pages():
+ * order <= 3, !folio, etc
+ */
+void free_pages_nolock(struct page *page, unsigned int order)
+{
+	if (put_page_testzero(page))
+		__free_unref_page(page, order, FPI_TRYLOCK);
+}
+
 void free_pages(unsigned long addr, unsigned int order)
 {
 	if (addr != 0) {