From patchwork Thu Apr 18 10:57:46 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lance Yang
X-Patchwork-Id: 13634512
From: Lance Yang
To: akpm@linux-foundation.org
Cc: ryan.roberts@arm.com, david@redhat.com, 21cnbao@gmail.com,
 mhocko@suse.com, fengwei.yin@intel.com, zokeefe@google.com,
 shy828301@gmail.com, xiehuan09@gmail.com, wangkefeng.wang@huawei.com,
 songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Lance Yang
Subject: [PATCH v9 0/4] mm/madvise: enhance lazyfreeing with mTHP in madvise_free
Date: Thu, 18 Apr 2024 18:57:46 +0800
Message-Id: <20240418105750.98866-1-ioworker0@gmail.com>
X-Mailer: git-send-email 2.33.1
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Hi All,

This patchset adds support for lazyfreeing multi-size THP (mTHP) without
needing to first split the large folio via split_folio(). However, we
still need to split a large folio that is not fully mapped within the
target range.

If a large folio is locked or shared, or if we fail to split it, we just
leave it in place and advance to the next PTE in the range. Note that this
is a change in behavior: previously, any failure of this sort would cause
the entire operation to give up. As large folios become more common,
sticking to the old way could result in wasted opportunities.

Performance Testing
===================

On an Intel i5 CPU, lazyfreeing a 1GiB VMA backed by PTE-mapped folios of
the same size results in the following runtimes for madvise(MADV_FREE) in
seconds (shorter is better):

Folio Size |   Old    |   New    | Change
------------------------------------------
      4KiB | 0.590251 | 0.590259 |     0%
     16KiB | 2.990447 | 0.185655 |   -94%
     32KiB | 2.547831 | 0.104870 |   -95%
     64KiB | 2.457796 | 0.052812 |   -97%
    128KiB | 2.281034 | 0.032777 |   -99%
    256KiB | 2.230387 | 0.017496 |   -99%
    512KiB | 2.189106 | 0.010781 |   -99%
   1024KiB | 2.183949 | 0.007753 |   -99%
   2048KiB | 0.002799 | 0.002804 |     0%

---
This patchset applies against mm-unstable (6723e3b1a668).

The performance numbers are from v2. I did a quick benchmark run of v9 and
nothing significantly changed.

Changes since v8 [8]
====================
 - mm/madvise: optimize lazyfreeing with mTHP in madvise_free
    - Leave the split folio code as is in the caller (per David Hildenbrand)
    - Use cydp_flags here to make this easier to read (per David Hildenbrand)
 - Pick up RB's. Thanks to Ryan!

Changes since v7 [7]
====================
 - mm/madvise: optimize lazyfreeing with mTHP in madvise_free
    - Remove the duplicated check for the mapcount (per Ryan Roberts,
      David Hildenbrand)
 - Pick up AB's and RB's. Thanks to Ryan and David!

Changes since v6 [6]
====================
 - Fix a bug with incorrect bitwise operations (Thanks to Ryan Roberts)
 - Use a cmpxchg loop to only clear one of the flags to prevent race with
   the HW (per Ryan Roberts)

Changes since v5 [5]
====================
 - Convert mkold_ptes() to clear_young_dirty_ptes() (per Ryan Roberts)
 - Use the __bitwise flags as the input for clear_young_dirty_ptes()
   (per David Hildenbrand)
 - Follow the pattern already established by the original code
   (per Ryan Roberts)

Changes since v4 [4]
====================
 - The first patch implements the MADV_FREE change and introduces
   mkold_clean_ptes() with a generic implementation. The second patch
   specializes mkold_clean_ptes() for arm64, providing a performance boost
   specific to arm64 (per Ryan Roberts)
 - Drop the full parameter and call ptep_get_and_clear() in
   mkold_clean_ptes() (per Ryan Roberts)
 - Keep the previous behavior that avoids locking the folio if it wasn't in
   the swapcache or if it wasn't dirty (per Ryan Roberts)

Changes since v3 [3]
====================
 - Rename refresh_full_ptes -> mkold_clean_ptes (per Ryan Roberts)
 - Override mkold_clean_ptes() for arm64 to make it faster
   (per Ryan Roberts)
 - Update the changelog

Changes since v2 [2]
====================
 - Only skip all the PTEs for nr_pages when the number of batched PTEs
   matches nr_pages (per Barry Song)
 - Change folio_pte_batch() to take optional *any_dirty and *any_young
   pointers (per David Hildenbrand)
 - Move the ptep_get_and_clear_full() loop into refresh_full_ptes()
   (per David Hildenbrand)
 - Follow a similar pattern for madvise_free_pte_range()
   (per Ryan Roberts)

Changes since v1 [1]
====================
 - Update the performance numbers
 - Update the changelog (per Ryan Roberts)
 - Check the COW folio (per Yin Fengwei)
 - Check if we are mapping all subpages (per Barry Song,
   David Hildenbrand, Ryan Roberts)

[1] https://lore.kernel.org/linux-mm/20240225123215.86503-1-ioworker0@gmail.com
[2] https://lore.kernel.org/linux-mm/20240307061425.21013-1-ioworker0@gmail.com
[3] https://lore.kernel.org/linux-mm/20240316102952.39233-1-ioworker0@gmail.com
[4] https://lore.kernel.org/linux-mm/20240402124029.47846-1-ioworker0@gmail.com
[5] https://lore.kernel.org/linux-mm/20240408042437.10951-1-ioworker0@gmail.com
[6] https://lore.kernel.org/linux-mm/20240413002219.71246-1-ioworker0@gmail.com
[7] https://lore.kernel.org/linux-mm/20240416033457.32154-1-ioworker0@gmail.com
[8] https://lore.kernel.org/linux-mm/20240417141436.77963-1-ioworker0@gmail.com

Thanks,
Lance

Lance Yang (4):
  mm/madvise: introduce clear_young_dirty_ptes() batch helper
  mm/arm64: override clear_young_dirty_ptes() batch helper
  mm/memory: add any_dirty optional pointer to folio_pte_batch()
  mm/madvise: optimize lazyfreeing with mTHP in madvise_free

 arch/arm64/include/asm/pgtable.h |  55 +++++++++++++++++++++++++++++++++++++++
 arch/arm64/mm/contpte.c          |  29 +++++++++++++++++++++++
 include/linux/mm_types.h         |   9 ++++++++
 include/linux/pgtable.h          |  74 +++++++++++++++++++++------------------
 mm/internal.h                    |  12 ++++++++--
 mm/madvise.c                     | 107 +++++++++++++++++++--------------------
 mm/memory.c                      |   4 ++--
 7 files changed, 209 insertions(+), 81 deletions(-)
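
For readers who want a rough feel for the madvise(MADV_FREE) measurement
described under Performance Testing, here is a minimal userspace sketch.
It is not the benchmark used for the numbers in this cover letter, and the
helper name time_madv_free() is made up for illustration. It maps private
anonymous memory, writes to it so the range is populated, and times one
madvise(MADV_FREE) call over the whole range. It does not control the
folio size backing the mapping; that is assumed to be set up separately
(for example through the per-size mTHP sysfs knobs), and the mapping is not
aligned to any particular folio size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>

/*
 * Hypothetical helper (not part of the patchset): map `len` bytes of
 * private anonymous memory, fault it in by writing to it, then time a
 * single madvise(MADV_FREE) call covering the whole range.
 *
 * Returns 0 and stores the elapsed time in *seconds on success,
 * or -1 if mmap(), clock_gettime() or madvise() fails.
 */
static int time_madv_free(size_t len, double *seconds)
{
	struct timespec start, end;
	int ret = -1;
	char *buf;

	assert(seconds != NULL);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Write to every byte so the pages are present and dirty. */
	memset(buf, 1, len);

	if (clock_gettime(CLOCK_MONOTONIC, &start))
		goto out;
	if (madvise(buf, len, MADV_FREE))
		goto out;
	if (clock_gettime(CLOCK_MONOTONIC, &end))
		goto out;

	*seconds = (double)(end.tv_sec - start.tv_sec) +
		   (double)(end.tv_nsec - start.tv_nsec) / 1e9;
	ret = 0;
out:
	munmap(buf, len);
	return ret;
}
```

Calling time_madv_free(1UL << 30, &secs) would cover a 1GiB range like the
one in the table. Only the madvise() call sits between the two timestamps,
so the cost of faulting the memory in is excluded from the result.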