From patchwork Mon Dec 12 13:02:13 2022
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13071097
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, sparclinux@vger.kernel.org, David Hildenbrand,
    Andrew Morton, "David S. Miller", Peter Xu, Hev, Anatoly Pugachev,
    Raghavendra K T, Thorsten Leemhuis, Mike Kravetz,
    "Kirill A. Shutemov", Juergen Gross
Subject: [PATCH v1] sparc/mm: don't unconditionally set HW writable bit when setting PTE dirty on 64bit
Date: Mon, 12 Dec 2022 14:02:13 +0100
Message-Id: <20221212130213.136267-1-david@redhat.com>

On sparc64, there is no HW modified bit; instead, SW tracks whether a PTE
is dirty via a SW bit that pte_mkdirty() sets. However, pte_mkdirty()
currently also unconditionally sets the HW writable bit, which is wrong:
pte_mkdirty() is not supposed to make a PTE actually writable unless the
SW writable bit (pte_write()) indicates that the PTE is not
write-protected. Fortunately, sparc64 also defines a SW writable bit.

For example, this already turned into a problem in the context of THP
splitting, as documented in commit 624a2c94f5b7 ("Partly revert "mm/thp:
carry over dirty bit when thp splits on pmd""), and might be an issue
during page migration in mm/migrate.c:remove_migration_pte() as well,
where we do:

	if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
		pte = pte_mkdirty(pte);

But more generally, any code like:

	maybe_mkwrite(pte_mkdirty(pte), vma)

is broken on sparc64, because it will unconditionally set the HW writable
bit even if the SW writable bit is not set.
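To make that concrete, here is a minimal C sketch of the broken pattern.
The bit masks and helper names below are placeholders used only for
illustration; they are not the real sparc64 _PAGE_* constants, and plain
unsigned longs stand in for pte_t:

	/* Placeholder bits, for illustration only. */
	#define SW_DIRTY	(1UL << 0)	/* SW dirty-tracking bit */
	#define SW_WRITE	(1UL << 1)	/* SW "may write" bit, i.e. pte_write() */
	#define HW_WRITE	(1UL << 2)	/* bit the MMU checks on stores */

	/* Pre-fix behavior: dirtying a PTE also makes it HW writable. */
	static unsigned long broken_mkdirty(unsigned long pte)
	{
		return pte | SW_DIRTY | HW_WRITE;
	}

	/*
	 * Consequence: a write-protected PTE (SW_WRITE clear) that gets
	 * dirtied, e.g. via the migration path quoted above, ends up with
	 * HW_WRITE set, so the MMU accepts stores that should have faulted.
	 */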
Simple reproducer that will result in a writable PTE after ptrace access,
to highlight the problem and as an easy way to verify whether it has been
fixed:

--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <signal.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/mman.h>

static void signal_handler(int sig)
{
	if (sig == SIGSEGV)
		printf("[PASS] SIGSEGV generated\n");
	else
		printf("[FAIL] wrong signal generated\n");
	exit(0);
}

int main(void)
{
	size_t pagesize = getpagesize();
	char data = 1;
	off_t offs;
	int mem_fd;
	char *map;
	int ret;

	mem_fd = open("/proc/self/mem", O_RDWR);
	if (mem_fd < 0) {
		fprintf(stderr, "open(/proc/self/mem) failed: %d\n", errno);
		return 1;
	}

	map = mmap(NULL, pagesize, PROT_READ, MAP_PRIVATE|MAP_ANON, -1, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed: %d\n", errno);
		return 1;
	}
	printf("original: %x\n", *map);

	/* debug access */
	offs = lseek(mem_fd, (uintptr_t) map, SEEK_SET);
	ret = write(mem_fd, &data, 1);
	if (ret != 1) {
		fprintf(stderr, "pwrite(/proc/self/mem) failed with %d: %d\n",
			ret, errno);
		return 1;
	}
	if (*map != data) {
		fprintf(stderr, "pwrite(/proc/self/mem) not visible\n");
		return 1;
	}
	printf("ptrace: %x\n", *map);

	/* Install signal handler. */
	if (signal(SIGSEGV, signal_handler) == SIG_ERR) {
		fprintf(stderr, "signal() failed\n");
		return 1;
	}

	/* Ordinary access. */
	*map = 2;
	printf("access: %x\n", *map);
	printf("[FAIL] SIGSEGV not generated\n");
	return 0;
}
--------------------------------------------------------------------------

Without this commit (sun4u in QEMU):

# ./reproducer
original: 0
ptrace: 1
access: 2
[FAIL] SIGSEGV not generated

Let's fix this by setting the HW writable bit only if both the SW dirty
bit and the SW writable bit are set. This matches, for example, how s390x
handles pte_mkwrite() and pte_mkdirty() -- except that s390x additionally
has to clear the _PAGE_PROTECT bit.

We have to move pte_dirty() and pte_write() up. The code patching
mechanism and the handling of constants > 22 bit is a bit special on
sparc64.

With this commit (sun4u in QEMU):

# ./reproducer
original: 0
ptrace: 1
[PASS] SIGSEGV generated

This handling seems to have been in place forever.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: Andrew Morton
Cc: "David S. Miller"
Cc: Peter Xu
Cc: Hev
Cc: Anatoly Pugachev
Cc: Raghavendra K T
Cc: Thorsten Leemhuis
Cc: Mike Kravetz
Cc: "Kirill A. Shutemov"
Cc: Juergen Gross
Signed-off-by: David Hildenbrand
---

Only tested under QEMU with sun4u, as I cannot seem to get sun4v running
in QEMU. Survives a simple Debian 10 boot.

This also tackles what's documented in:
    https://lkml.kernel.org/r/20221125185857.3110155-1-peterx@redhat.com
and once loongarch has also been fixed, we might be able to remove all of
that special-casing.
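Reduced to plain C, the fixed logic described above looks roughly like
this sketch (again with placeholder bit names rather than the real
_PAGE_* constants; the actual implementation in the diff below selects
between the 4U and 4V bit layouts via runtime code patching):

	/* Placeholder bits, for illustration only. */
	#define SW_DIRTY	(1UL << 0)
	#define SW_WRITE	(1UL << 1)
	#define HW_WRITE	(1UL << 2)

	/* Set the HW writable bit; callers check the SW bits first. */
	static unsigned long sketch_mkhwwrite(unsigned long pte)
	{
		return pte | HW_WRITE;
	}

	static unsigned long sketch_mkdirty(unsigned long pte)
	{
		pte |= SW_DIRTY;
		return (pte & SW_WRITE) ? sketch_mkhwwrite(pte) : pte;
	}

	static unsigned long sketch_mkwrite(unsigned long pte)
	{
		pte |= SW_WRITE;
		return (pte & SW_DIRTY) ? sketch_mkhwwrite(pte) : pte;
	}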
---
 arch/sparc/include/asm/pgtable_64.h | 117 ++++++++++++++++------------
 1 file changed, 67 insertions(+), 50 deletions(-)

base-commit: 830b3c68c1fb1e9176028d02ef86f3cf76aa2476

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 3bc9736bddb1..7f2e57747563 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -354,6 +354,42 @@ static inline pgprot_t pgprot_noncached(pgprot_t prot)
  */
 #define pgprot_noncached pgprot_noncached
 
+static inline unsigned long pte_dirty(pte_t pte)
+{
+	unsigned long mask;
+
+	__asm__ __volatile__(
+	"\n661:	mov		%1, %0\n"
+	"	nop\n"
+	"	.section	.sun4v_2insn_patch, \"ax\"\n"
+	"	.word		661b\n"
+	"	sethi		%%uhi(%2), %0\n"
+	"	sllx		%0, 32, %0\n"
+	"	.previous\n"
+	: "=r" (mask)
+	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
+
+	return (pte_val(pte) & mask);
+}
+
+static inline unsigned long pte_write(pte_t pte)
+{
+	unsigned long mask;
+
+	__asm__ __volatile__(
+	"\n661:	mov		%1, %0\n"
+	"	nop\n"
+	"	.section	.sun4v_2insn_patch, \"ax\"\n"
+	"	.word		661b\n"
+	"	sethi		%%uhi(%2), %0\n"
+	"	sllx		%0, 32, %0\n"
+	"	.previous\n"
+	: "=r" (mask)
+	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
+
+	return (pte_val(pte) & mask);
+}
+
 #if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
 pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags);
 #define arch_make_huge_pte arch_make_huge_pte
@@ -415,28 +451,44 @@ static inline bool is_hugetlb_pte(pte_t pte)
 }
 #endif
 
-static inline pte_t pte_mkdirty(pte_t pte)
+static inline pte_t __pte_mkhwwrite(pte_t pte)
 {
-	unsigned long val = pte_val(pte), tmp;
+	unsigned long val = pte_val(pte);
 
+	/*
+	 * Note: we only want to set the HW writable bit if the SW writable bit
+	 * and the SW dirty bit are set.
+	 */
 	__asm__ __volatile__(
-	"\n661:	or		%0, %3, %0\n"
+	"\n661:	or		%0, %2, %0\n"
 	"	nop\n"
-	"\n662:	nop\n"
+	"	.section	.sun4v_1insn_patch, \"ax\"\n"
+	"	.word		661b\n"
+	"	or		%0, %3, %0\n"
+	"	.previous\n"
+	: "=r" (val)
+	: "0" (val), "i" (_PAGE_W_4U), "i" (_PAGE_W_4V));
+
+	return __pte(val);
+}
+
+static inline pte_t pte_mkdirty(pte_t pte)
+{
+	unsigned long val = pte_val(pte), mask;
+
+	__asm__ __volatile__(
+	"\n661:	mov		%1, %0\n"
 	"	nop\n"
 	"	.section	.sun4v_2insn_patch, \"ax\"\n"
 	"	.word		661b\n"
-	"	sethi		%%uhi(%4), %1\n"
-	"	sllx		%1, 32, %1\n"
-	"	.word		662b\n"
-	"	or		%1, %%lo(%4), %1\n"
-	"	or		%0, %1, %0\n"
+	"	sethi		%%uhi(%2), %0\n"
+	"	sllx		%0, 32, %0\n"
 	"	.previous\n"
-	: "=r" (val), "=r" (tmp)
-	: "0" (val), "i" (_PAGE_MODIFIED_4U | _PAGE_W_4U),
-	  "i" (_PAGE_MODIFIED_4V | _PAGE_W_4V));
+	: "=r" (mask)
+	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
 
-	return __pte(val);
+	pte = __pte(val | mask);
+	return pte_write(pte) ? __pte_mkhwwrite(pte) : pte;
 }
 
 static inline pte_t pte_mkclean(pte_t pte)
@@ -478,7 +530,8 @@ static inline pte_t pte_mkwrite(pte_t pte)
 	: "=r" (mask)
 	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
 
-	return __pte(val | mask);
+	pte = __pte(val | mask);
+	return pte_dirty(pte) ? __pte_mkhwwrite(pte) : pte;
 }
 
 static inline pte_t pte_wrprotect(pte_t pte)
@@ -581,42 +634,6 @@ static inline unsigned long pte_young(pte_t pte)
 	return (pte_val(pte) & mask);
 }
 
-static inline unsigned long pte_dirty(pte_t pte)
-{
-	unsigned long mask;
-
-	__asm__ __volatile__(
-	"\n661:	mov		%1, %0\n"
-	"	nop\n"
-	"	.section	.sun4v_2insn_patch, \"ax\"\n"
-	"	.word		661b\n"
-	"	sethi		%%uhi(%2), %0\n"
-	"	sllx		%0, 32, %0\n"
-	"	.previous\n"
-	: "=r" (mask)
-	: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
-
-	return (pte_val(pte) & mask);
-}
-
-static inline unsigned long pte_write(pte_t pte)
-{
-	unsigned long mask;
-
-	__asm__ __volatile__(
-	"\n661:	mov		%1, %0\n"
-	"	nop\n"
-	"	.section	.sun4v_2insn_patch, \"ax\"\n"
-	"	.word		661b\n"
-	"	sethi		%%uhi(%2), %0\n"
-	"	sllx		%0, 32, %0\n"
-	"	.previous\n"
-	: "=r" (mask)
-	: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
-
-	return (pte_val(pte) & mask);
-}
-
 static inline unsigned long pte_exec(pte_t pte)
 {
 	unsigned long mask;