From patchwork Fri Mar 28 18:20:55 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Rik van Riel
X-Patchwork-Id: 14032362
Date: Fri, 28 Mar 2025 14:20:55 -0400
From: Rik van Riel
To: Andrew Morton
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com,
    Vinay Banakar, liuye, Hugh Dickins, Mel Gorman, Yu Zhao, Shakeel Butt
Subject: [PATCH v2] mm/vmscan: batch TLB flush during memory reclaim
Message-ID: <20250328142055.313916d1@fangorn>
From: Vinay Banakar

The current implementation in shrink_folio_list() performs a full TLB
flush for every individual folio reclaimed. This causes unnecessary
overhead during memory reclaim.

The current code:
1. Clears PTEs and unmaps each page individually
2. Performs a full TLB flush on every CPU the mm is running on

The new code:
1. Clears PTEs and unmaps each page individually
2. Adds each unmapped page to pageout_folios
3. Flushes the TLB once before processing pageout_folios

This reduces the number of TLB flushes issued by the memory reclaim
code to roughly 1/N of what it was before, where N is the number of
mapped folios encountered in the batch processed by shrink_folio_list().
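The change in control flow, as a minimal sketch (unmap_folio() here is a
hypothetical stand-in for the try_to_unmap() step that shrink_folio_list()
already performs; locking, reference checks, and error handling are
omitted):

	/* Old flow: one full TLB flush for every dirty folio. */
	list_for_each_entry_safe(folio, next, folio_list, lru) {
		unmap_folio(folio);		/* clear the PTEs (stand-in) */
		try_to_unmap_flush_dirty();	/* flush on every CPU the mm ran on */
		pageout(folio, mapping, &plug, folio_list);
	}

	/* New flow: unmap the whole batch, flush once, then submit the I/O. */
	list_for_each_entry_safe(folio, next, folio_list, lru) {
		unmap_folio(folio);
		list_add(&folio->lru, &pageout_folios);
	}
	try_to_unmap_flush_dirty();		/* one flush covers the batch */
	while (!list_empty(&pageout_folios)) {
		folio = lru_to_folio(&pageout_folios);
		list_del(&folio->lru);
		pageout(folio, folio_mapping(folio), &plug, &pageout_folios);
	}

Note that the actual patch rechecks folio_test_active() and folio_mapped()
after the flush, since a folio can be faulted back in before its I/O is
submitted.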
[riel: forward port to 6.14, adjust code and naming to match surrounding code]

Signed-off-by: Vinay Banakar
Signed-off-by: Rik van Riel
---
v2: remove folio_test_young that broke some 32 bit builds, since pages
should be unmapped when they get to this point anyway, and if somebody
mapped them again they are by definition (very) recently accessed

 mm/vmscan.c | 112 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 68 insertions(+), 44 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index c767d71c43d7..286ff627d337 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1086,6 +1086,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	struct folio_batch free_folios;
 	LIST_HEAD(ret_folios);
 	LIST_HEAD(demote_folios);
+	LIST_HEAD(pageout_folios);
 	unsigned int nr_reclaimed = 0, nr_demoted = 0;
 	unsigned int pgactivate = 0;
 	bool do_demote_pass;
@@ -1394,51 +1395,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 				goto keep_locked;
 
 			/*
-			 * Folio is dirty. Flush the TLB if a writable entry
-			 * potentially exists to avoid CPU writes after I/O
-			 * starts and then write it out here.
+			 * Add to pageout list for batched TLB flushing and IO submission.
 			 */
-			try_to_unmap_flush_dirty();
-			switch (pageout(folio, mapping, &plug, folio_list)) {
-			case PAGE_KEEP:
-				goto keep_locked;
-			case PAGE_ACTIVATE:
-				/*
-				 * If shmem folio is split when writeback to swap,
-				 * the tail pages will make their own pass through
-				 * this function and be accounted then.
-				 */
-				if (nr_pages > 1 && !folio_test_large(folio)) {
-					sc->nr_scanned -= (nr_pages - 1);
-					nr_pages = 1;
-				}
-				goto activate_locked;
-			case PAGE_SUCCESS:
-				if (nr_pages > 1 && !folio_test_large(folio)) {
-					sc->nr_scanned -= (nr_pages - 1);
-					nr_pages = 1;
-				}
-				stat->nr_pageout += nr_pages;
-
-				if (folio_test_writeback(folio))
-					goto keep;
-				if (folio_test_dirty(folio))
-					goto keep;
-
-				/*
-				 * A synchronous write - probably a ramdisk. Go
-				 * ahead and try to reclaim the folio.
-				 */
-				if (!folio_trylock(folio))
-					goto keep;
-				if (folio_test_dirty(folio) ||
-				    folio_test_writeback(folio))
-					goto keep_locked;
-				mapping = folio_mapping(folio);
-				fallthrough;
-			case PAGE_CLEAN:
-				; /* try to free the folio below */
-			}
+			list_add(&folio->lru, &pageout_folios);
+			continue;
 		}
 
 		/*
@@ -1549,6 +1509,70 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	}
 	/* 'folio_list' is always empty here */
 
+	if (!list_empty(&pageout_folios)) {
+		/*
+		 * The loop above unmapped the folios from the page tables.
+		 * One TLB flush takes care of the whole batch.
+		 */
+		try_to_unmap_flush_dirty();
+
+		while (!list_empty(&pageout_folios)) {
+			struct folio *folio = lru_to_folio(&pageout_folios);
+			struct address_space *mapping;
+			list_del(&folio->lru);
+
+			/* Recheck if the page got reactivated */
+			if (folio_test_active(folio) || folio_mapped(folio))
+				goto skip_pageout_locked;
+
+			mapping = folio_mapping(folio);
+			switch (pageout(folio, mapping, &plug, &pageout_folios)) {
+			case PAGE_KEEP:
+			case PAGE_ACTIVATE:
+				goto skip_pageout_locked;
+			case PAGE_SUCCESS:
+				/*
+				 * If shmem folio is split when writeback to swap,
+				 * the tail pages will make their own pass through
+				 * this loop and be accounted then.
+				 */
+				stat->nr_pageout += folio_nr_pages(folio);
+
+				if (folio_test_writeback(folio))
+					goto skip_pageout;
+				if (folio_test_dirty(folio))
+					goto skip_pageout;
+
+				/*
+				 * A synchronous write - probably a ramdisk. Go
+				 * ahead and try to reclaim the folio.
+				 */
+				if (!folio_trylock(folio))
+					goto skip_pageout;
+				if (folio_test_dirty(folio) ||
+				    folio_test_writeback(folio))
+					goto skip_pageout_locked;
+				mapping = folio_mapping(folio);
+				/* try to free the folio below */
+				fallthrough;
+			case PAGE_CLEAN:
+				/* try to free the folio */
+				if (!mapping ||
+				    !remove_mapping(mapping, folio))
+					goto skip_pageout_locked;
+
+				nr_reclaimed += folio_nr_pages(folio);
+				folio_unlock(folio);
+				continue;
+			}
+
+skip_pageout_locked:
+			folio_unlock(folio);
+skip_pageout:
+			list_add(&folio->lru, &ret_folios);
+		}
+	}
+
 	/* Migrate folios selected for demotion */
 	nr_demoted = demote_folio_list(&demote_folios, pgdat);
 	nr_reclaimed += nr_demoted;