From patchwork Tue Apr 30 13:31:38 2024
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13649021
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Mark Rutland, Anshuman Khandual,
 Andrew Morton, Zi Yan, "Aneesh Kumar K.V"
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: [PATCH v2] arm64/mm: pmd_mkinvalid() must handle swap pmds
Date: Tue, 30 Apr 2024 14:31:38 +0100
Message-Id: <20240430133138.732088-1-ryan.roberts@arm.com>

__split_huge_pmd_locked() can be called for a present THP, devmap or
(non-present) migration entry. It calls pmdp_invalidate()
unconditionally on the pmdp and only determines if it is present or not
based on the returned old pmd.

But arm64's pmd_mkinvalid(), called by pmdp_invalidate(),
unconditionally sets the PMD_PRESENT_INVALID flag, which causes future
pmd_present() calls to return true - even for a swap pmd. Therefore any
lockless pgtable walker could see the migration entry pmd in this state
and start interpreting the fields (e.g. pmd_pfn()) as if it were
present, leading to BadThings (TM). GUP-fast appears to be one such
lockless pgtable walker.

While the obvious fix is for core-mm to avoid such calls for
non-present pmds (pmdp_invalidate() also issues a TLBI, which is not
necessary for this case either), all other arches that implement
pmd_mkinvalid() do it in such a way that it is robust to being called
with a non-present pmd. So it is simpler and safer to make arm64 robust
too. This approach also means we can add tests to debug_vm_pgtable.c to
validate the required behaviour.

This is a theoretical bug found during code review. I don't have any
test case to trigger it in practice.

Cc: stable@vger.kernel.org
Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration")
Signed-off-by: Ryan Roberts
Acked-by: Will Deacon
---

Hi all,

v1 of this fix [1] took the approach of fixing core-mm to never call
pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that
only arm64 suffers this problem; all other arches are robust. So his
suggestion was to instead make arm64 robust in the same way and add
tests to validate it. Despite my stated reservations in the context of
the v1 discussion, having thought on it for a bit, I now agree with Zi
Yan. Hence this post.

Andrew has v1 in mm-unstable at the moment, so probably the best thing
to do is to remove it from there and have this go in through the arm64
tree? Assuming there is agreement that this approach is the right one.

This applies on top of v6.9-rc5. Passes all the mm selftests on arm64.

[1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
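To make the failure mode concrete, here is a minimal user-space sketch
of the before/after behaviour. The bit positions, the example "swap"
value and the _old/_new helper names are illustrative stand-ins, not
the real arm64 definitions:

#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-ins; not the real arm64 bit assignments. */
#define PMD_SECT_VALID		(1ULL << 0)	/* hardware valid bit */
#define PMD_PRESENT_INVALID	(1ULL << 59)	/* sw "present but invalid" */

typedef uint64_t pmd_t;

/* Modelled on arm64: present == valid, or carrying the sw marker. */
static int pmd_present(pmd_t pmd)
{
	return !!(pmd & (PMD_SECT_VALID | PMD_PRESENT_INVALID));
}

static int pmd_valid(pmd_t pmd)
{
	return !!(pmd & PMD_SECT_VALID);
}

/* Old behaviour: set the marker unconditionally. */
static pmd_t pmd_mkinvalid_old(pmd_t pmd)
{
	return (pmd | PMD_PRESENT_INVALID) & ~PMD_SECT_VALID;
}

/* Fixed behaviour: only a valid pmd may become present-invalid;
 * swap/migration entries pass through unchanged. */
static pmd_t pmd_mkinvalid_new(pmd_t pmd)
{
	return pmd_valid(pmd) ? pmd_mkinvalid_old(pmd) : pmd;
}

int main(void)
{
	pmd_t swp = 0xabcd0;	/* valid bit clear: a "swap" pmd */

	/* Bug: a non-present swap pmd suddenly reports present. */
	printf("old: pmd_present=%d\n", pmd_present(pmd_mkinvalid_old(swp)));
	/* Fix: the swap pmd is left alone and stays non-present. */
	printf("new: pmd_present=%d\n", pmd_present(pmd_mkinvalid_new(swp)));
	return 0;
}

In the "old" case, a lockless walker that checks pmd_present() before
pmd_pfn() would go on to decode swap-entry bits as if they were a pfn.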
Thanks,
Ryan

 arch/arm64/include/asm/pgtable.h | 12 +++++--
 mm/debug_vm_pgtable.c            | 61 ++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+), 2 deletions(-)

--
2.25.1

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..7d580271a46d 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd)

 static inline pmd_t pmd_mkinvalid(pmd_t pmd)
 {
-	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
-	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
+	/*
+	 * If not valid then either we are already present-invalid or we are
+	 * not-present (i.e. none or swap entry). We must not convert
+	 * not-present to present-invalid. Unbelievably, the core-mm may call
+	 * pmd_mkinvalid() for a swap entry and all other arches can handle it.
+	 */
+	if (pmd_valid(pmd)) {
+		pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
+		pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
+	}

 	return pmd;
 }

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 65c19025da3d..7e9c387d06b0 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { }
 #endif /* CONFIG_HUGETLB_PAGE */

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION)
+static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args)
+{
+	unsigned long max_swap_offset;
+	swp_entry_t swp_set, swp_clear, swp_convert;
+	pmd_t pmd_set, pmd_clear;
+
+	/*
+	 * See generic_max_swapfile_size(): probe the maximum offset, then
+	 * create a swap entry with all possible bits set and a swap entry
+	 * with all bits clear.
+	 */
+	max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL))));
+	swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset);
+	swp_clear = swp_entry(0, 0);
+
+	/* Convert to pmd. */
+	pmd_set = swp_entry_to_pmd(swp_set);
+	pmd_clear = swp_entry_to_pmd(swp_clear);
+
+	/*
+	 * Sanity check that the pmds are not-present, not-huge and the swap
+	 * entry is recoverable without corruption.
+	 */
+	WARN_ON(pmd_present(pmd_set));
+	WARN_ON(pmd_trans_huge(pmd_set));
+	swp_convert = pmd_to_swp_entry(pmd_set);
+	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
+	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
+	WARN_ON(pmd_present(pmd_clear));
+	WARN_ON(pmd_trans_huge(pmd_clear));
+	swp_convert = pmd_to_swp_entry(pmd_clear);
+	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
+	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
+
+	/* Now invalidate the pmd. */
+	pmd_set = pmd_mkinvalid(pmd_set);
+	pmd_clear = pmd_mkinvalid(pmd_clear);
+
+	/*
+	 * Since it's a swap pmd, invalidation should effectively be a noop
+	 * and the checks we already did should give the same answer. Check
+	 * the invalidation didn't corrupt any fields.
+	 */
+	WARN_ON(pmd_present(pmd_set));
+	WARN_ON(pmd_trans_huge(pmd_set));
+	swp_convert = pmd_to_swp_entry(pmd_set);
+	WARN_ON(swp_type(swp_set) != swp_type(swp_convert));
+	WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert));
+	WARN_ON(pmd_present(pmd_clear));
+	WARN_ON(pmd_trans_huge(pmd_clear));
+	swp_convert = pmd_to_swp_entry(pmd_clear);
+	WARN_ON(swp_type(swp_clear) != swp_type(swp_convert));
+	WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert));
+}
+#else
+static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { }
+#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */
+
 static void __init pmd_thp_tests(struct pgtable_debug_args *args)
 {
 	pmd_t pmd;
@@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args)
 	WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd))));
 	WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd))));
 #endif /* __HAVE_ARCH_PMDP_INVALIDATE */
+
+	swp_pmd_mkinvalid_tests(args);
 }

 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
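
A note on the max_swap_offset line in swp_pmd_mkinvalid_tests() above:
it borrows the probing trick from generic_max_swapfile_size(). Below is
a stand-alone model of the idea; the 40-bit field width and the helper
names are arbitrary assumptions for illustration, not the kernel's
definitions:

#include <stdint.h>
#include <stdio.h>

/* Pretend the pmd swap format preserves 40 offset bits (illustrative). */
#define OFFSET_BITS	40
#define OFFSET_MASK	((1ULL << OFFSET_BITS) - 1)

/* Stand-ins for swp_entry_to_pmd() / pmd_to_swp_entry(): the encode
 * step silently truncates whatever the format cannot store. */
static uint64_t offset_to_pmd(uint64_t offset) { return offset & OFFSET_MASK; }
static uint64_t pmd_to_offset(uint64_t pmd)    { return pmd & OFFSET_MASK; }

int main(void)
{
	/* Round-trip an all-ones offset: only the bits the format can
	 * represent survive, so the result is the maximum encodable
	 * offset. */
	uint64_t max_offset = pmd_to_offset(offset_to_pmd(~0ULL));

	printf("max offset = %#llx\n", (unsigned long long)max_offset);
	return 0;
}

The test then uses that probed maximum to build the worst-case
all-bits-set swap entry whose fields must survive pmd_mkinvalid()
unchanged.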