From patchwork Wed Dec 18 03:07:15 2024
X-Patchwork-Submitter: Alexei Starovoitov
X-Patchwork-Id: 13912975
From: alexei.starovoitov@gmail.com
To: bpf@vger.kernel.org
Cc: andrii@kernel.org, memxor@gmail.com, akpm@linux-foundation.org,
	peterz@infradead.org, vbabka@suse.cz, bigeasy@linutronix.de,
	rostedt@goodmis.org, houtao1@huawei.com, hannes@cmpxchg.org,
	shakeel.butt@linux.dev, mhocko@suse.com, willy@infradead.org,
	tglx@linutronix.de, jannh@google.com, tj@kernel.org,
	linux-mm@kvack.org, kernel-team@fb.com
Subject: [PATCH bpf-next v3 2/6] mm, bpf: Introduce free_pages_nolock()
Date: Tue, 17 Dec 2024 19:07:15 -0800
Message-ID: <20241218030720.1602449-3-alexei.starovoitov@gmail.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20241218030720.1602449-1-alexei.starovoitov@gmail.com>
References: <20241218030720.1602449-1-alexei.starovoitov@gmail.com>

From: Alexei Starovoitov

Introduce free_pages_nolock() that can free pages without taking locks.
It relies on trylock only and can be called from any context.
Since spin_trylock() cannot be used on PREEMPT_RT from hard IRQ or NMI
context, it uses a lockless linked list to stash the pages, which will
be freed by a subsequent free_pages() call from a context where taking
the lock is allowed.
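To make the mechanism concrete, here is a minimal standalone sketch of the
trylock-or-defer pattern described above, reduced to a generic object pool.
All names in it (struct pool, release_obj_nolock(), etc.) are invented for
illustration; only the llist/spinlock primitives are real kernel APIs, and
the patch below is the authoritative implementation.

#include <linux/llist.h>
#include <linux/spinlock.h>

struct pool {
	spinlock_t lock;		/* the lock we are only allowed to trylock */
	unsigned long nr_free;		/* stand-in for a real free list */
	struct llist_head deferred;	/* objects stashed when trylock failed */
};

struct obj {
	struct llist_node llnode;
};

static void pool_init(struct pool *p)
{
	spin_lock_init(&p->lock);
	p->nr_free = 0;
	init_llist_head(&p->deferred);
}

/* Must be called with pool->lock held. */
static void __release_obj(struct pool *p, struct obj *o)
{
	p->nr_free++;			/* real code would put 'o' back on a free list */
}

/* Trylock-only path: never spins on pool->lock. */
static void release_obj_nolock(struct pool *p, struct obj *o)
{
	unsigned long flags;

	if (!spin_trylock_irqsave(&p->lock, flags)) {
		/* Contended: defer to whoever takes the lock next. */
		llist_add(&o->llnode, &p->deferred);
		return;
	}
	__release_obj(p, o);
	spin_unlock_irqrestore(&p->lock, flags);
}

/* Normal path: takes the lock and also drains the deferred stash. */
static void release_obj(struct pool *p, struct obj *o)
{
	struct llist_node *first;
	struct obj *it, *tmp;
	unsigned long flags;

	spin_lock_irqsave(&p->lock, flags);
	first = llist_del_all(&p->deferred);
	llist_for_each_entry_safe(it, tmp, first, llnode)
		__release_obj(p, it);
	__release_obj(p, o);
	spin_unlock_irqrestore(&p->lock, flags);
}

The patch follows the same shape in free_one_page(): the FPI_TRYLOCK caller
defers on contention, while callers that do take zone->lock (the non-trylock
path) drain zone->trylock_free_pages first. On PREEMPT_RT even
spin_trylock_irqsave() is not usable from NMI or hard IRQ context, which is
why __free_unref_page() defers unconditionally there.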
Signed-off-by: Alexei Starovoitov
---
 include/linux/gfp.h      |  1 +
 include/linux/mm_types.h |  4 ++
 include/linux/mmzone.h   |  3 ++
 mm/page_alloc.c          | 79 ++++++++++++++++++++++++++++++++++++----
 4 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 65b8df1db26a..ff9060af6295 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -372,6 +372,7 @@ __meminit void *alloc_pages_exact_nid_noprof(int nid, size_t size, gfp_t gfp_mas
 		__get_free_pages((gfp_mask) | GFP_DMA, (order))
 
 extern void __free_pages(struct page *page, unsigned int order);
+extern void free_pages_nolock(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 
 #define __free_page(page) __free_pages((page), 0)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7361a8f3ab68..52547b3e5fd8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -99,6 +99,10 @@ struct page {
 				/* Or, free page */
 				struct list_head buddy_list;
 				struct list_head pcp_list;
+				struct {
+					struct llist_node pcp_llist;
+					unsigned int order;
+				};
 			};
 			/* See page-flags.h for PAGE_MAPPING_FLAGS */
 			struct address_space *mapping;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b36124145a16..1a854e0a9e3b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -953,6 +953,9 @@ struct zone {
 	/* Primarily protects free_area */
 	spinlock_t		lock;
 
+	/* Pages to be freed when next trylock succeeds */
+	struct llist_head	trylock_free_pages;
+
 	/* Write-intensive fields used by compaction and vmstats. */
 	CACHELINE_PADDING(_pad2_);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d23545057b6e..10918bfc6734 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -88,6 +88,9 @@ typedef int __bitwise fpi_t;
  */
 #define FPI_TO_TAIL		((__force fpi_t)BIT(1))
 
+/* Free the page without taking locks. Rely on trylock only. */
+#define FPI_TRYLOCK		((__force fpi_t)BIT(2))
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
@@ -1247,13 +1250,44 @@ static void split_large_buddy(struct zone *zone, struct page *page,
 	}
 }
 
+static void add_page_to_zone_llist(struct zone *zone, struct page *page,
+				   unsigned int order)
+{
+	/* Remember the order */
+	page->order = order;
+	/* Add the page to the free list */
+	llist_add(&page->pcp_llist, &zone->trylock_free_pages);
+}
+
 static void free_one_page(struct zone *zone, struct page *page,
 			  unsigned long pfn, unsigned int order,
 			  fpi_t fpi_flags)
 {
+	struct llist_head *llhead;
 	unsigned long flags;
 
-	spin_lock_irqsave(&zone->lock, flags);
+	if (!spin_trylock_irqsave(&zone->lock, flags)) {
+		if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+			add_page_to_zone_llist(zone, page, order);
+			return;
+		}
+		spin_lock_irqsave(&zone->lock, flags);
+	}
+
+	/* The lock succeeded. Process deferred pages. */
+	llhead = &zone->trylock_free_pages;
+	if (unlikely(!llist_empty(llhead) && !(fpi_flags & FPI_TRYLOCK))) {
+		struct llist_node *llnode;
+		struct page *p, *tmp;
+
+		llnode = llist_del_all(llhead);
+		llist_for_each_entry_safe(p, tmp, llnode, pcp_llist) {
+			unsigned int p_order = p->order;
+
+			split_large_buddy(zone, p, page_to_pfn(p), p_order, fpi_flags);
+			__count_vm_events(PGFREE, 1 << p_order);
+		}
+	}
 	split_large_buddy(zone, page, pfn, order, fpi_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
@@ -2596,7 +2630,7 @@ static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
 
 static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 				   struct page *page, int migratetype,
-				   unsigned int order)
+				   unsigned int order, fpi_t fpi_flags)
 {
 	int high, batch;
 	int pindex;
@@ -2631,6 +2665,14 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	}
 	if (pcp->free_count < (batch << CONFIG_PCP_BATCH_SCALE_MAX))
 		pcp->free_count += (1 << order);
+
+	if (unlikely(fpi_flags & FPI_TRYLOCK)) {
+		/*
+		 * Do not attempt to take a zone lock. Let pcp->count get
+		 * over high mark temporarily.
+		 */
+		return;
+	}
 	high = nr_pcp_high(pcp, zone, batch, free_high);
 	if (pcp->count >= high) {
 		free_pcppages_bulk(zone, nr_pcp_free(pcp, batch, high, free_high),
@@ -2645,7 +2687,8 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 /*
  * Free a pcp page
  */
-void free_unref_page(struct page *page, unsigned int order)
+static void __free_unref_page(struct page *page, unsigned int order,
+			      fpi_t fpi_flags)
 {
 	unsigned long __maybe_unused UP_flags;
 	struct per_cpu_pages *pcp;
@@ -2654,7 +2697,7 @@ void free_unref_page(struct page *page, unsigned int order)
 	int migratetype;
 
 	if (!pcp_allowed_order(order)) {
-		__free_pages_ok(page, order, FPI_NONE);
+		__free_pages_ok(page, order, fpi_flags);
 		return;
 	}
 
@@ -2671,24 +2714,33 @@ void free_unref_page(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	if (unlikely(migratetype >= MIGRATE_PCPTYPES)) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(page_zone(page), page, pfn, order, FPI_NONE);
+			free_one_page(page_zone(page), page, pfn, order, fpi_flags);
 			return;
 		}
 		migratetype = MIGRATE_MOVABLE;
 	}
 
 	zone = page_zone(page);
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq())) {
+		add_page_to_zone_llist(zone, page, order);
+		return;
+	}
 	pcp_trylock_prepare(UP_flags);
 	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 	if (pcp) {
-		free_unref_page_commit(zone, pcp, page, migratetype, order);
+		free_unref_page_commit(zone, pcp, page, migratetype, order, fpi_flags);
 		pcp_spin_unlock(pcp);
 	} else {
-		free_one_page(zone, page, pfn, order, FPI_NONE);
+		free_one_page(zone, page, pfn, order, fpi_flags);
 	}
 	pcp_trylock_finish(UP_flags);
 }
 
+void free_unref_page(struct page *page, unsigned int order)
+{
+	__free_unref_page(page, order, FPI_NONE);
+}
+
 /*
  * Free a batch of folios
  */
@@ -2777,7 +2829,7 @@ void free_unref_folios(struct folio_batch *folios)
 
 		trace_mm_page_free_batched(&folio->page);
 		free_unref_page_commit(zone, pcp, &folio->page, migratetype,
-				order);
+				order, FPI_NONE);
 	}
 
 	if (pcp) {
@@ -4854,6 +4906,17 @@ void __free_pages(struct page *page, unsigned int order)
 }
 EXPORT_SYMBOL(__free_pages);
 
+/*
+ * Can be called while holding raw_spin_lock or from IRQ and NMI,
+ * but only for pages that came from try_alloc_pages():
+ * order <= 3, !folio, etc
+ */
+void free_pages_nolock(struct page *page, unsigned int order)
+{
+	if (put_page_testzero(page))
+		__free_unref_page(page, order, FPI_TRYLOCK);
+}
+
 void free_pages(unsigned long addr, unsigned int order)
 {
 	if (addr != 0) {
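
For context, a sketch of how a hypothetical caller might pair the two entry
points. try_alloc_pages() is assumed from an earlier patch in this series
(its exact signature is not shown here), and the helper names below are
invented; the only point being illustrated is that the free side relies on
trylock only and therefore never spins on zone->lock or the pcp lock.

#include <linux/gfp.h>
#include <linux/mm.h>

/* Hypothetical: grab a page in a context where spinning on locks is not allowed. */
static void *grab_scratch_page(int nid)
{
	/* try_alloc_pages() is assumed from the earlier patch in this series. */
	struct page *page = try_alloc_pages(nid, 0);

	return page ? page_address(page) : NULL;
}

/* Hypothetical: release it from any context. */
static void drop_scratch_page(void *addr)
{
	if (!addr)
		return;
	/* Only valid for pages that came from try_alloc_pages(). */
	free_pages_nolock(virt_to_page(addr), 0);
}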