From patchwork Wed Jan 15 03:38:05 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13939805
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: 21cnbao@gmail.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
	david@redhat.com, ioworker0@gmail.com, kasong@tencent.com,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
	ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
	ying.huang@intel.com, zhengtangquan@oppo.com,
	Mauricio Faria de Oliveira
Subject: [PATCH v3 1/4] mm: Set folio swapbacked iff folios are dirty in try_to_unmap_one
Date: Wed, 15 Jan 2025 16:38:05 +1300
Message-Id: <20250115033808.40641-2-21cnbao@gmail.com>
In-Reply-To: <20250115033808.40641-1-21cnbao@gmail.com>
References: <20250115033808.40641-1-21cnbao@gmail.com>

From: Barry Song

The refcount may be temporarily or long-term increased, but this does not
change the fundamental nature of the folio already being lazy-freed.
Therefore, we only reset 'swapbacked' when we are certain the folio is
dirty and not droppable.
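[Editor's aside, not part of the patch: a minimal user-space sketch of the
lazy-free life cycle this check has to distinguish, assuming a kernel with
MADV_FREE support. A folio freed with MADV_FREE stays droppable until it is
written again; once redirtied, reclaim must treat it as swap-backed.]

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64 * 1024;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	memset(p, 1, len);		/* folio becomes dirty */
	madvise(p, len, MADV_FREE);	/* now lazy-free: reclaim may drop it */
	p[0] = 2;			/* redirtied: reclaim must keep it and
					 * mark it swap-backed again */
	munmap(p, len);
	return 0;
}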
Fixes: 6c8e2a256915 ("mm: fix race between MADV_FREE reclaim and blkdev direct IO read")
Suggested-by: David Hildenbrand
Signed-off-by: Barry Song
Cc: Mauricio Faria de Oliveira
Acked-by: David Hildenbrand
Reviewed-by: Baolin Wang
Reviewed-by: Lance Yang
---
 mm/rmap.c | 49 ++++++++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..de6b8c34e98c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,34 +1868,29 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			smp_rmb();
 
-			/*
-			 * The only page refs must be one from isolation
-			 * plus the rmap(s) (dropped by discard:).
-			 */
-			if (ref_count == 1 + map_count &&
-			    (!folio_test_dirty(folio) ||
-			     /*
-			      * Unlike MADV_FREE mappings, VM_DROPPABLE
-			      * ones can be dropped even if they've
-			      * been dirtied.
-			      */
-			     (vma->vm_flags & VM_DROPPABLE))) {
-				dec_mm_counter(mm, MM_ANONPAGES);
-				goto discard;
-			}
-
-			/*
-			 * If the folio was redirtied, it cannot be
-			 * discarded. Remap the page to page table.
-			 */
-			set_pte_at(mm, address, pvmw.pte, pteval);
-			/*
-			 * Unlike MADV_FREE mappings, VM_DROPPABLE ones
-			 * never get swap backed on failure to drop.
-			 */
-			if (!(vma->vm_flags & VM_DROPPABLE))
+			if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+				/*
+				 * redirtied either using the page table or a previously
+				 * obtained GUP reference.
+				 */
+				set_pte_at(mm, address, pvmw.pte, pteval);
 				folio_set_swapbacked(folio);
-			goto walk_abort;
+				goto walk_abort;
+			} else if (ref_count != 1 + map_count) {
+				/*
+				 * Additional reference. Could be a GUP reference or any
+				 * speculative reference. GUP users must mark the folio
+				 * dirty if there was a modification. This folio cannot be
+				 * reclaimed right now either way, so act just like nothing
+				 * happened.
+				 * We'll come back here later and detect if the folio was
+				 * dirtied when the additional reference is gone.
+				 */
+				set_pte_at(mm, address, pvmw.pte, pteval);
+				goto walk_abort;
+			}
+			dec_mm_counter(mm, MM_ANONPAGES);
+			goto discard;
 		}
 
 		if (swap_duplicate(entry) < 0) {

From patchwork Wed Jan 15 03:38:06 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13939806
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: 21cnbao@gmail.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
	david@redhat.com, ioworker0@gmail.com, kasong@tencent.com,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
	ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
	ying.huang@intel.com, zhengtangquan@oppo.com, Catalin Marinas,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	"H. Peter Anvin", Anshuman Khandual, Shaoqin Huang, Gavin Shan,
	Kefeng Wang, Mark Rutland, "Kirill A. Shutemov", Yosry Ahmed,
	Paul Walmsley, Palmer Dabbelt, Albert Ou, Yicong Yang, Will Deacon
Subject: [PATCH v3 2/4] mm: Support tlbbatch flush for a range of PTEs
Date: Wed, 15 Jan 2025 16:38:06 +1300
Message-Id: <20250115033808.40641-3-21cnbao@gmail.com>
In-Reply-To: <20250115033808.40641-1-21cnbao@gmail.com>
References: <20250115033808.40641-1-21cnbao@gmail.com>

From: Barry Song

This patch lays the groundwork for supporting batch PTE unmapping in
try_to_unmap_one(). It introduces range handling for TLB batch flushing,
with the range currently limited to a single PAGE_SIZE.

The function __flush_tlb_range_nosync() is architecture-specific and is
only used within arch/arm64. It only requires the mm structure, not the
vma structure. To allow its reuse by arch_tlbbatch_add_pending(), which
operates on mm but has no vma, this patch changes the first argument of
__flush_tlb_range_nosync() from vma to mm.
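[Editor's sketch, not kernel code: a standalone model of the interface
change only. Before this patch the batch API took a single user address;
afterwards it takes a [start, end) range, so one call can cover a whole
large folio. Function names below are illustrative stand-ins.]

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* old style: one base page per call */
static void add_pending_addr(unsigned long uaddr)
{
	printf("queue flush for [%#lx, %#lx)\n", uaddr, uaddr + PAGE_SIZE);
}

/* new style: an arbitrary range, e.g. a 64KiB folio in one shot */
static void add_pending_range(unsigned long start, unsigned long end)
{
	printf("queue flush for [%#lx, %#lx)\n", start, end);
}

int main(void)
{
	unsigned long addr = 0x100000UL, folio_size = 16 * PAGE_SIZE;

	add_pending_addr(addr);				/* previous behaviour */
	add_pending_range(addr, addr + folio_size);	/* range-based behaviour */
	return 0;
}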
Cc: Catalin Marinas
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Cc: Anshuman Khandual
Cc: Ryan Roberts
Cc: Shaoqin Huang
Cc: Gavin Shan
Cc: Kefeng Wang
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: Lance Yang
Cc: "Kirill A. Shutemov"
Cc: Yosry Ahmed
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Cc: Albert Ou
Cc: Yicong Yang
Signed-off-by: Barry Song
Acked-by: Will Deacon
---
 arch/arm64/include/asm/tlbflush.h | 25 +++++++++++++------------
 arch/arm64/mm/contpte.c           |  2 +-
 arch/riscv/include/asm/tlbflush.h |  5 +++--
 arch/riscv/mm/tlbflush.c          |  5 +++--
 arch/x86/include/asm/tlbflush.h   |  5 +++--
 mm/rmap.c                         | 12 +++++++-----
 6 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index bc94e036a26b..98fbc8df7cf3 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -322,13 +322,6 @@ static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
 	return true;
 }
 
-static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
-					     struct mm_struct *mm,
-					     unsigned long uaddr)
-{
-	__flush_tlb_page_nosync(mm, uaddr);
-}
-
 /*
  * If mprotect/munmap/etc occurs during TLB batched flushing, we need to
  * synchronise all the TLBI issued with a DSB to avoid the race mentioned in
@@ -448,7 +441,7 @@ static inline bool __flush_tlb_range_limit_excess(unsigned long start,
 	return false;
 }
 
-static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
+static inline void __flush_tlb_range_nosync(struct mm_struct *mm,
 					     unsigned long start, unsigned long end,
 					     unsigned long stride, bool last_level,
 					     int tlb_level)
@@ -460,12 +453,12 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
 	pages = (end - start) >> PAGE_SHIFT;
 
 	if (__flush_tlb_range_limit_excess(start, end, pages, stride)) {
-		flush_tlb_mm(vma->vm_mm);
+		flush_tlb_mm(mm);
 		return;
 	}
 
 	dsb(ishst);
-	asid = ASID(vma->vm_mm);
+	asid = ASID(mm);
 
 	if (last_level)
 		__flush_tlb_range_op(vale1is, start, pages, stride, asid,
@@ -474,7 +467,7 @@ static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma,
 		__flush_tlb_range_op(vae1is, start, pages, stride, asid,
 				     tlb_level, true, lpa2_is_enabled());
 
-	mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end);
+	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
 
 static inline void __flush_tlb_range(struct vm_area_struct *vma,
@@ -482,7 +475,7 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 			     unsigned long stride, bool last_level,
 			     int tlb_level)
 {
-	__flush_tlb_range_nosync(vma, start, end, stride,
+	__flush_tlb_range_nosync(vma->vm_mm, start, end, stride,
 				 last_level, tlb_level);
 	dsb(ish);
 }
@@ -533,6 +526,14 @@ static inline void __flush_tlb_kernel_pgtable(unsigned long kaddr)
 	dsb(ish);
 	isb();
 }
+
+static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
+					     struct mm_struct *mm,
+					     unsigned long start,
+					     unsigned long end)
+{
+	__flush_tlb_range_nosync(mm, start, end, PAGE_SIZE, true, 3);
+}
 #endif
 
 #endif
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index 55107d27d3f8..bcac4f55f9c1 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -335,7 +335,7 @@ int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
 		 * eliding the trailing DSB applies here.
 		 */
 		addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
-		__flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE,
+		__flush_tlb_range_nosync(vma->vm_mm, addr, addr + CONT_PTE_SIZE,
 					 PAGE_SIZE, true, 3);
 	}
 
diff --git a/arch/riscv/include/asm/tlbflush.h b/arch/riscv/include/asm/tlbflush.h
index 72e559934952..e4c533691a7d 100644
--- a/arch/riscv/include/asm/tlbflush.h
+++ b/arch/riscv/include/asm/tlbflush.h
@@ -60,8 +60,9 @@ void flush_pmd_tlb_range(struct vm_area_struct *vma, unsigned long start,
 
 bool arch_tlbbatch_should_defer(struct mm_struct *mm);
 void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
-			       struct mm_struct *mm,
-			       unsigned long uaddr);
+			       struct mm_struct *mm,
+			       unsigned long start,
+			       unsigned long end);
 void arch_flush_tlb_batched_pending(struct mm_struct *mm);
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 9b6e86ce3867..6d6e8e7cc576 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -186,8 +186,9 @@ bool arch_tlbbatch_should_defer(struct mm_struct *mm)
 }
 
 void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
-			       struct mm_struct *mm,
-			       unsigned long uaddr)
+			       struct mm_struct *mm,
+			       unsigned long start,
+			       unsigned long end)
 {
 	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
 }
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 69e79fff41b8..2b511972d008 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -278,8 +278,9 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
 }
 
 static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
-					     struct mm_struct *mm,
-					     unsigned long uaddr)
+					     struct mm_struct *mm,
+					     unsigned long start,
+					     unsigned long end)
 {
 	inc_mm_tlb_gen(mm);
 	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
diff --git a/mm/rmap.c b/mm/rmap.c
index de6b8c34e98c..abeb9fcec384 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -672,7 +672,8 @@ void try_to_unmap_flush_dirty(void)
 	(TLB_FLUSH_BATCH_PENDING_MASK / 2)
 
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
-				      unsigned long uaddr)
+				      unsigned long start,
+				      unsigned long end)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
 	int batch;
@@ -681,7 +682,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 	if (!pte_accessible(mm, pteval))
 		return;
 
-	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
+	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, start, end);
 	tlb_ubc->flush_required = true;
 
 	/*
@@ -757,7 +758,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
 }
 #else
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
-				      unsigned long uaddr)
+				      unsigned long start,
+				      unsigned long end)
 {
 }
 
@@ -1792,7 +1794,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pteval, address);
+			set_tlb_ubc_flush_pending(mm, pteval, address, address + PAGE_SIZE);
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
@@ -2164,7 +2166,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-			set_tlb_ubc_flush_pending(mm, pteval, address);
+			set_tlb_ubc_flush_pending(mm, pteval, address, address + PAGE_SIZE);
 		} else {
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}

From patchwork Wed Jan 15 03:38:07 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13939809
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: 21cnbao@gmail.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
	david@redhat.com, ioworker0@gmail.com, kasong@tencent.com,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
	ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
	ying.huang@intel.com, zhengtangquan@oppo.com
Subject: [PATCH v3 3/4] mm: Support batched unmap for lazyfree large folios during reclamation
Date: Wed, 15 Jan 2025 16:38:07 +1300
Message-Id: <20250115033808.40641-4-21cnbao@gmail.com>
In-Reply-To: <20250115033808.40641-1-21cnbao@gmail.com>
References: <20250115033808.40641-1-21cnbao@gmail.com>

From: Barry Song

Currently, the PTEs and rmap of a large folio are removed one at a time.
This is not only slow but also causes the large folio to be unnecessarily
added to deferred_split, which can lead to races between the
deferred_split shrinker callback and memory reclamation.

This patch releases all PTEs and rmap entries in a batch. Currently, it
only handles lazyfree large folios.

The microbenchmark below tries to reclaim 128MB of lazyfree large folios
whose size is 64KiB:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define SIZE 128*1024*1024 // 128 MB

unsigned long read_split_deferred()
{
	FILE *file = fopen("/sys/kernel/mm/transparent_hugepage"
			"/hugepages-64kB/stats/split_deferred", "r");
	if (!file) {
		perror("Error opening file");
		return 0;
	}

	unsigned long value;
	if (fscanf(file, "%lu", &value) != 1) {
		perror("Error reading value");
		fclose(file);
		return 0;
	}

	fclose(file);
	return value;
}

int main(int argc, char *argv[])
{
	while(1) {
		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		memset((void *)p, 1, SIZE);
		madvise((void *)p, SIZE, MADV_FREE);

		clock_t start_time = clock();
		unsigned long start_split = read_split_deferred();
		madvise((void *)p, SIZE, MADV_PAGEOUT);
		clock_t end_time = clock();
		unsigned long end_split = read_split_deferred();

		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
		printf("Time taken by reclamation: %f seconds, split_deferred: %ld\n",
			elapsed_time, end_split - start_split);

		munmap((void *)p, SIZE);
	}
	return 0;
}

w/o patch:

~ # ./a.out
Time taken by reclamation: 0.177418 seconds, split_deferred: 2048
Time taken by reclamation: 0.178348 seconds, split_deferred: 2048
Time taken by reclamation: 0.174525 seconds, split_deferred: 2048
Time taken by reclamation: 0.171620 seconds, split_deferred: 2048
Time taken by reclamation: 0.172241 seconds, split_deferred: 2048
Time taken by reclamation: 0.174003 seconds, split_deferred: 2048
Time taken by reclamation: 0.171058 seconds, split_deferred: 2048
Time taken by reclamation: 0.171993 seconds, split_deferred: 2048
Time taken by reclamation: 0.169829 seconds, split_deferred: 2048
Time taken by reclamation: 0.172895 seconds, split_deferred: 2048
Time taken by reclamation: 0.176063 seconds, split_deferred: 2048
Time taken by reclamation: 0.172568 seconds, split_deferred: 2048
Time taken by reclamation: 0.171185 seconds, split_deferred: 2048
Time taken by reclamation: 0.170632 seconds, split_deferred: 2048
Time taken by reclamation: 0.170208 seconds, split_deferred: 2048
Time taken by reclamation: 0.174192 seconds, split_deferred: 2048
...
w/ patch:

~ # ./a.out
Time taken by reclamation: 0.074231 seconds, split_deferred: 0
Time taken by reclamation: 0.071026 seconds, split_deferred: 0
Time taken by reclamation: 0.072029 seconds, split_deferred: 0
Time taken by reclamation: 0.071873 seconds, split_deferred: 0
Time taken by reclamation: 0.073573 seconds, split_deferred: 0
Time taken by reclamation: 0.071906 seconds, split_deferred: 0
Time taken by reclamation: 0.073604 seconds, split_deferred: 0
Time taken by reclamation: 0.075903 seconds, split_deferred: 0
Time taken by reclamation: 0.073191 seconds, split_deferred: 0
Time taken by reclamation: 0.071228 seconds, split_deferred: 0
Time taken by reclamation: 0.071391 seconds, split_deferred: 0
Time taken by reclamation: 0.071468 seconds, split_deferred: 0
Time taken by reclamation: 0.071896 seconds, split_deferred: 0
Time taken by reclamation: 0.072508 seconds, split_deferred: 0
Time taken by reclamation: 0.071884 seconds, split_deferred: 0
Time taken by reclamation: 0.072433 seconds, split_deferred: 0
Time taken by reclamation: 0.071939 seconds, split_deferred: 0
...
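[Editor's sketch, not kernel code: a standalone model of the conditions
that gate the batched path (can_batch_unmap_folio_ptes() in the diff
below) — the folio must be anonymous, lazyfree (not swap-backed), and
mapped contiguously in full. Names here are illustrative stand-ins.]

#include <stdbool.h>
#include <stdio.h>

struct folio_model {
	bool anon;		/* anonymous memory */
	bool swapbacked;	/* false after MADV_FREE (lazyfree) */
	int  nr_pages;		/* folio size in base pages */
	int  nr_contig_ptes;	/* PTEs mapping it contiguously here */
};

static bool can_batch_unmap(const struct folio_model *f)
{
	if (!f->anon || f->swapbacked)
		return false;				/* only lazyfree anon folios */
	return f->nr_contig_ptes == f->nr_pages;	/* whole folio mapped */
}

int main(void)
{
	struct folio_model lazyfree = { true, false, 16, 16 };
	struct folio_model partial  = { true, false, 16,  8 };

	printf("lazyfree 64KiB folio: %s\n",
	       can_batch_unmap(&lazyfree) ? "batched" : "per-PTE");
	printf("partially mapped folio: %s\n",
	       can_batch_unmap(&partial) ? "batched" : "per-PTE");
	return 0;
}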
Signed-off-by: Barry Song
---
 mm/rmap.c | 47 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index abeb9fcec384..be1978d2712d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1642,6 +1642,25 @@ void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
 #endif
 }
 
+/* We support batch unmapping of PTEs for lazyfree large folios */
+static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
+			struct folio *folio, pte_t *ptep)
+{
+	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+	int max_nr = folio_nr_pages(folio);
+	pte_t pte = ptep_get(ptep);
+
+	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
+		return false;
+	if (pte_none(pte) || pte_unused(pte) || !pte_present(pte))
+		return false;
+	if (pte_pfn(pte) != folio_pfn(folio))
+		return false;
+
+	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
+			       NULL, NULL) == max_nr;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1655,6 +1674,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
+	int nr_pages = 1;
 	unsigned long pfn;
 	unsigned long hsz = 0;
 
@@ -1780,6 +1800,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				hugetlb_vma_unlock_write(vma);
 			}
 			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+		} else if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
+			   can_batch_unmap_folio_ptes(address, folio, pvmw.pte)) {
+			nr_pages = folio_nr_pages(folio);
+			flush_cache_range(vma, range.start, range.end);
+			pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
+			if (should_defer_flush(mm, flags))
+				set_tlb_ubc_flush_pending(mm, pteval, address,
+						address + folio_size(folio));
+			else
+				flush_tlb_range(vma, range.start, range.end);
 		} else {
 			flush_cache_page(vma, address, pfn);
 			/* Nuke the page table entry. */
@@ -1875,7 +1905,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * redirtied either using the page table or a previously
 				 * obtained GUP reference.
 				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
+				set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 				folio_set_swapbacked(folio);
 				goto walk_abort;
 			} else if (ref_count != 1 + map_count) {
@@ -1888,10 +1918,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * We'll come back here later and detect if the folio was
 				 * dirtied when the additional reference is gone.
 				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
+				set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 				goto walk_abort;
 			}
-			dec_mm_counter(mm, MM_ANONPAGES);
+			add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
 			goto discard;
 		}
 
@@ -1943,13 +1973,18 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			dec_mm_counter(mm, mm_counter_file(folio));
 		}
 discard:
-		if (unlikely(folio_test_hugetlb(folio)))
+		if (unlikely(folio_test_hugetlb(folio))) {
 			hugetlb_remove_rmap(folio);
-		else
-			folio_remove_rmap_pte(folio, subpage, vma);
+		} else {
+			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
+			folio_ref_sub(folio, nr_pages - 1);
+		}
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_drain_local();
 		folio_put(folio);
+		/* We have already batched the entire folio */
+		if (nr_pages > 1)
+			goto walk_done;
 		continue;
 walk_abort:
 		ret = false;

From patchwork Wed Jan 15 03:38:08 2025
X-Patchwork-Submitter: Barry Song <21cnbao@gmail.com>
X-Patchwork-Id: 13939810
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: 21cnbao@gmail.com, baolin.wang@linux.alibaba.com, chrisl@kernel.org,
	david@redhat.com, ioworker0@gmail.com, kasong@tencent.com,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
	ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
	ying.huang@intel.com, zhengtangquan@oppo.com
Subject: [PATCH v3 4/4] mm: Avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap
Date: Wed, 15 Jan 2025 16:38:08 +1300
Message-Id: <20250115033808.40641-5-21cnbao@gmail.com>
In-Reply-To: <20250115033808.40641-1-21cnbao@gmail.com>
References: <20250115033808.40641-1-21cnbao@gmail.com>

From: Barry Song

The try_to_unmap_one() function currently handles PMD-mapped THPs
inefficiently. It first splits the PMD into PTEs, copies the dirty state
from the PMD to the PTEs, iterates over the PTEs to locate the dirty
state, and then marks the THP as swap-backed. This process involves
unnecessary PMD splitting and redundant iteration. Instead, this
functionality can be handled directly in __discard_anon_folio_pmd_locked(),
avoiding the extra steps and improving performance.

The following microbenchmark redirties folios after invoking MADV_FREE,
then measures the time taken to perform memory reclamation (which
actually marks those folios swap-backed again) on the redirtied folios.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

#define SIZE 128*1024*1024 // 128 MB

int main(int argc, char *argv[])
{
	while(1) {
		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		memset((void *)p, 1, SIZE);
		madvise((void *)p, SIZE, MADV_FREE);

		/* redirty after MADV_FREE */
		memset((void *)p, 1, SIZE);

		clock_t start_time = clock();
		madvise((void *)p, SIZE, MADV_PAGEOUT);
		clock_t end_time = clock();

		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
		printf("Time taken by reclamation: %f seconds\n", elapsed_time);

		munmap((void *)p, SIZE);
	}
	return 0;
}

Testing results are as below,

w/o patch:

~ # ./a.out
Time taken by reclamation: 0.007300 seconds
Time taken by reclamation: 0.007226 seconds
Time taken by reclamation: 0.007295 seconds
Time taken by reclamation: 0.007731 seconds
Time taken by reclamation: 0.007134 seconds
Time taken by reclamation: 0.007285 seconds
Time taken by reclamation: 0.007720 seconds
Time taken by reclamation: 0.007128 seconds
Time taken by reclamation: 0.007710 seconds
Time taken by reclamation: 0.007712 seconds
Time taken by reclamation: 0.007236 seconds
Time taken by reclamation: 0.007690 seconds
Time taken by reclamation: 0.007174 seconds
Time taken by reclamation: 0.007670 seconds
Time taken by reclamation: 0.007169 seconds
Time taken by reclamation: 0.007305 seconds
Time taken by reclamation: 0.007432 seconds
Time taken by reclamation: 0.007158 seconds
Time taken by reclamation: 0.007133 seconds
…

w/ patch:

~ # ./a.out
Time taken by reclamation: 0.002124 seconds
Time taken by reclamation: 0.002116 seconds
Time taken by reclamation: 0.002150 seconds
Time taken by reclamation: 0.002261 seconds
Time taken by reclamation: 0.002137 seconds
Time taken by reclamation: 0.002173 seconds
Time taken by reclamation: 0.002063 seconds
Time taken by reclamation: 0.002088 seconds
Time taken by reclamation: 0.002169 seconds
Time taken by reclamation: 0.002124 seconds
Time taken by reclamation: 0.002111 seconds
Time taken by reclamation: 0.002224 seconds
Time taken by reclamation: 0.002297 seconds
Time taken by reclamation: 0.002260 seconds
Time taken by reclamation: 0.002246 seconds
Time taken by reclamation: 0.002272 seconds
Time taken by reclamation: 0.002277 seconds
Time taken by reclamation: 0.002462 seconds
…

This patch significantly speeds up try_to_unmap_one() by allowing it to
skip redirtied THPs without splitting the PMD.
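[Editor's sketch, not kernel code: a standalone model of the decision
order the patch moves into __discard_anon_folio_pmd_locked(). Names are
illustrative stand-ins for the kernel helpers shown in the diff below.]

#include <stdbool.h>
#include <stdio.h>

enum outcome { DISCARD, KEEP_SWAPBACKED, KEEP_EXTRA_REF };

static enum outcome discard_decision(bool pmd_dirty, bool folio_dirty,
				     bool vm_droppable, int ref_count, int map_count)
{
	if (pmd_dirty)
		folio_dirty = true;		/* propagate dirty bit to the folio */
	if (folio_dirty && !vm_droppable)
		return KEEP_SWAPBACKED;		/* redirtied: must become swap-backed */
	if (ref_count != map_count + 1)
		return KEEP_EXTRA_REF;		/* GUP/speculative reference: retry later */
	return DISCARD;				/* still clean lazyfree: drop it */
}

int main(void)
{
	printf("%d\n", discard_decision(true, false, false, 2, 1));	/* KEEP_SWAPBACKED */
	printf("%d\n", discard_decision(false, false, false, 2, 1));	/* DISCARD */
	printf("%d\n", discard_decision(false, false, false, 3, 1));	/* KEEP_EXTRA_REF */
	return 0;
}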
Suggested-by: Baolin Wang
Suggested-by: Lance Yang
Signed-off-by: Barry Song
---
 mm/huge_memory.c | 24 +++++++++++++++++-------
 mm/rmap.c        | 13 ++++++++++---
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3d3ebdc002d5..47cc8c3f8f80 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3070,8 +3070,12 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
 	int ref_count, map_count;
 	pmd_t orig_pmd = *pmdp;
 
-	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
+	if (pmd_dirty(orig_pmd))
+		folio_set_dirty(folio);
+	if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+		folio_set_swapbacked(folio);
 		return false;
+	}
 
 	orig_pmd = pmdp_huge_clear_flush(vma, addr, pmdp);
 
@@ -3098,8 +3102,15 @@ static bool __discard_anon_folio_pmd_locked(struct vm_area_struct *vma,
 	 *
 	 * The only folio refs must be one from isolation plus the rmap(s).
 	 */
-	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd) ||
-	    ref_count != map_count + 1) {
+	if (pmd_dirty(orig_pmd))
+		folio_set_dirty(folio);
+	if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+		folio_set_swapbacked(folio);
+		set_pmd_at(mm, addr, pmdp, orig_pmd);
+		return false;
+	}
+
+	if (ref_count != map_count + 1) {
 		set_pmd_at(mm, addr, pmdp, orig_pmd);
 		return false;
 	}
@@ -3119,12 +3130,11 @@ bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
 {
 	VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
 	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+	VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
 	VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE));
 
-	if (folio_test_anon(folio) && !folio_test_swapbacked(folio))
-		return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
-
-	return false;
+	return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
 }
 
 static void remap_page(struct folio *folio, unsigned long nr, int flags)
diff --git a/mm/rmap.c b/mm/rmap.c
index be1978d2712d..a859c399ec7c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1724,9 +1724,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		}
 
 		if (!pvmw.pte) {
-			if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
-						  folio))
-				goto walk_done;
+			if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
+				if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, folio))
+					goto walk_done;
+				/*
+				 * unmap_huge_pmd_locked has either already marked
+				 * the folio as swap-backed or decided to retain it
+				 * due to GUP or speculative references.
+				 */
+				goto walk_abort;
+			}
 
 			if (flags & TTU_SPLIT_HUGE_PMD) {
				/*