From patchwork Wed Mar 19 17:28:18 2025
Date: Wed, 19 Mar 2025 13:28:18 -0400
From: Rik van Riel
To: Vinay Banakar
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 kernel-team@meta.com, Bharata B Rao, Dave Hansen, Peter Zijlstra,
 Borislav Petkov, SeongJae Park, Matthew Wilcox, Byungchul Park,
 Brendan Jackman
Subject: [PATCH] mm/vmscan: batch TLB flush during memory reclaim
Message-ID: <20250319132818.1003878b@fangorn>
From: Vinay Banakar

The current implementation in shrink_folio_list() performs a full TLB
flush for every individual folio reclaimed. This causes unnecessary
overhead during memory reclaim.

The current code:
1. Clears PTEs and unmaps each page individually
2. Performs a full TLB flush on every CPU the mm is running on

The new code:
1. Clears PTEs and unmaps each page individually
2. Adds each unmapped page to pageout_folios
3. Flushes the TLB once before processing pageout_folios

This reduces the number of TLB flushes issued by the memory reclaim
code to 1/N, where N is the number of mapped folios encountered in
the batch processed by shrink_folio_list().

[riel: forward port to 6.14, adjust code and naming to match
 surrounding code]

Signed-off-by: Vinay Banakar
Signed-off-by: Rik van Riel
---
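Not part of the patch itself: the sketch below is a minimal,
self-contained userspace illustration of the batching pattern the
changelog describes. flush() and writeout() are hypothetical stand-ins
for the IPI-based TLB flush done by try_to_unmap_flush_dirty() and for
pageout(); the point is only how deferring dirty items to a batch list
turns one flush per item into one flush per batch.

/* build with: cc -std=c99 -Wall batch_flush_sketch.c */
#include <stdio.h>

#define NITEMS 8

struct item {
	int dirty;		/* analogous to a dirty, mapped folio */
	struct item *next;	/* analogous to the lru list linkage */
};

static int nr_flushes;

/* Hypothetical stand-in for the expensive, all-CPU TLB flush. */
static void flush(void)
{
	nr_flushes++;
}

/* Hypothetical stand-in for pageout(): start writeback on one item. */
static void writeout(struct item *it)
{
	it->dirty = 0;
}

/* Old behavior: one flush per dirty item. */
static void reclaim_unbatched(struct item *items, int n)
{
	for (int i = 0; i < n; i++) {
		if (items[i].dirty) {
			flush();
			writeout(&items[i]);
		}
	}
}

/*
 * New behavior: defer dirty items to a batch list (pageout_folios in
 * the patch), flush once, then write the whole batch out.
 */
static void reclaim_batched(struct item *items, int n)
{
	struct item *batch = NULL;

	for (int i = 0; i < n; i++) {
		if (items[i].dirty) {
			items[i].next = batch;
			batch = &items[i];
		}
	}

	if (batch) {
		flush();	/* one flush covers the whole batch */
		for (struct item *it = batch; it; it = it->next)
			writeout(it);
	}
}

int main(void)
{
	struct item items[NITEMS];

	for (int i = 0; i < NITEMS; i++)
		items[i] = (struct item){ .dirty = 1 };
	nr_flushes = 0;
	reclaim_unbatched(items, NITEMS);
	printf("unbatched: %d flushes for %d items\n", nr_flushes, NITEMS);

	for (int i = 0; i < NITEMS; i++)
		items[i] = (struct item){ .dirty = 1 };
	nr_flushes = 0;
	reclaim_batched(items, NITEMS);
	printf("batched:   %d flushes for %d items\n", nr_flushes, NITEMS);

	return 0;
}

With N dirty items the unbatched loop issues N flushes while the
batched loop issues exactly one, which is the 1/N reduction described
above.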
 mm/vmscan.c | 113 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 69 insertions(+), 44 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c767d71c43d7..ed2761610620 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1086,6 +1086,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	struct folio_batch free_folios;
 	LIST_HEAD(ret_folios);
 	LIST_HEAD(demote_folios);
+	LIST_HEAD(pageout_folios);
 	unsigned int nr_reclaimed = 0, nr_demoted = 0;
 	unsigned int pgactivate = 0;
 	bool do_demote_pass;
@@ -1394,51 +1395,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 				goto keep_locked;
 
 			/*
-			 * Folio is dirty. Flush the TLB if a writable entry
-			 * potentially exists to avoid CPU writes after I/O
-			 * starts and then write it out here.
+			 * Add to pageout list for batched TLB flushing and IO submission.
 			 */
-			try_to_unmap_flush_dirty();
-			switch (pageout(folio, mapping, &plug, folio_list)) {
-			case PAGE_KEEP:
-				goto keep_locked;
-			case PAGE_ACTIVATE:
-				/*
-				 * If shmem folio is split when writeback to swap,
-				 * the tail pages will make their own pass through
-				 * this function and be accounted then.
-				 */
-				if (nr_pages > 1 && !folio_test_large(folio)) {
-					sc->nr_scanned -= (nr_pages - 1);
-					nr_pages = 1;
-				}
-				goto activate_locked;
-			case PAGE_SUCCESS:
-				if (nr_pages > 1 && !folio_test_large(folio)) {
-					sc->nr_scanned -= (nr_pages - 1);
-					nr_pages = 1;
-				}
-				stat->nr_pageout += nr_pages;
-
-				if (folio_test_writeback(folio))
-					goto keep;
-				if (folio_test_dirty(folio))
-					goto keep;
-
-				/*
-				 * A synchronous write - probably a ramdisk. Go
-				 * ahead and try to reclaim the folio.
-				 */
-				if (!folio_trylock(folio))
-					goto keep;
-				if (folio_test_dirty(folio) ||
-				    folio_test_writeback(folio))
-					goto keep_locked;
-				mapping = folio_mapping(folio);
-				fallthrough;
-			case PAGE_CLEAN:
-				; /* try to free the folio below */
-			}
+			list_add(&folio->lru, &pageout_folios);
+			continue;
 		}
 
 		/*
@@ -1549,6 +1509,71 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	}
 	/* 'folio_list' is always empty here */
 
+	if (!list_empty(&pageout_folios)) {
+		/*
+		 * The loop above unmapped the folios from the page tables.
+		 * One TLB flush takes care of the whole batch.
+		 */
+		try_to_unmap_flush_dirty();
+
+		while (!list_empty(&pageout_folios)) {
+			struct folio *folio = lru_to_folio(&pageout_folios);
+			struct address_space *mapping;
+			list_del(&folio->lru);
+
+			/* Recheck if the page got reactivated */
+			if (folio_test_active(folio) ||
+			    (folio_mapped(folio) && folio_test_young(folio)))
+				goto skip_pageout_locked;
+
+			mapping = folio_mapping(folio);
+			switch (pageout(folio, mapping, &plug, &pageout_folios)) {
+			case PAGE_KEEP:
+			case PAGE_ACTIVATE:
+				goto skip_pageout_locked;
+			case PAGE_SUCCESS:
+				/*
+				 * If shmem folio is split when writeback to swap,
+				 * the tail pages will make their own pass through
+				 * this loop and be accounted then.
+				 */
+				stat->nr_pageout += folio_nr_pages(folio);
+
+				if (folio_test_writeback(folio))
+					goto skip_pageout;
+				if (folio_test_dirty(folio))
+					goto skip_pageout;
+
+				/*
+				 * A synchronous write - probably a ramdisk. Go
+				 * ahead and try to reclaim the folio.
+				 */
+				if (!folio_trylock(folio))
+					goto skip_pageout;
+				if (folio_test_dirty(folio) ||
+				    folio_test_writeback(folio))
+					goto skip_pageout_locked;
+				mapping = folio_mapping(folio);
+				/* try to free the folio below */
+				fallthrough;
+			case PAGE_CLEAN:
+				/* try to free the folio */
+				if (!mapping ||
+				    !remove_mapping(mapping, folio))
+					goto skip_pageout_locked;
+
+				nr_reclaimed += folio_nr_pages(folio);
+				folio_unlock(folio);
+				continue;
+			}
+
+skip_pageout_locked:
+			folio_unlock(folio);
+skip_pageout:
+			list_add(&folio->lru, &ret_folios);
+		}
+	}
+
 	/* Migrate folios selected for demotion */
 	nr_demoted = demote_folio_list(&demote_folios, pgdat);
 	nr_reclaimed += nr_demoted;