From patchwork Tue Apr 18 14:21:13 2023
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 13215768
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, Mel Gorman,
    Peter Xu
Subject: [PATCH mm-unstable v1] mm: don't check VMA write permissions if the
    PTE/PMD indicates write permissions
Date: Tue, 18 Apr 2023 16:21:13 +0200
Message-Id: <20230418142113.439494-1-david@redhat.com>
MIME-Version: 1.0
Staring at the comment "Recheck VMA as permissions can change since
migration started" in remove_migration_pte() can result in confusion,
because if the source PTE/PMD indicates write permissions, then there
should be no need to check VMA write permissions when restoring migration
entries or PTE-mapping a PMD.

Commit d3cb8bf6081b ("mm: migrate: Close race between migration completion
and mprotect") introduced the maybe_mkwrite() handling in
remove_migration_pte() in 2014, stating that a race between mprotect() and
migration finishing would be possible, and that we could end up with a
writable PTE that should be readable.

However, mprotect() code first updates vma->vm_flags / vma->vm_page_prot
and then walks the page tables to (a) set all present writable PTEs to
read-only and (b) convert all writable migration entries to readable
migration entries. While walking the page tables and modifying the
entries, migration code has to grab the PT locks to synchronize against
concurrent page table modifications.

Assuming migration would find a writable migration entry (while holding
the PT lock) and replace it with a writable present PTE, surely mprotect()
code didn't stumble over the writable migration entry yet (converting it
into a readable migration entry) and would instead wait for the PT lock to
convert the now present writable PTE into a read-only PTE.
As mprotect() didn't finish yet, the behavior is just like migration
didn't happen: a writable PTE will be converted to a read-only PTE.

So it's fine to rely on the writability information in the source PTE/PMD
and not recheck against the VMA as long as we're holding the PT lock to
synchronize with anyone who concurrently wants to downgrade write
permissions (like mprotect()) by first adjusting vma->vm_flags /
vma->vm_page_prot to then walk over the page tables to adjust the page
table entries.

Running test cases that should reveal such races -- mprotect(PROT_READ)
racing with page migration or THP splitting -- for multiple hours did
not reveal an issue with this cleanup.

Cc: Andrew Morton
Cc: Mel Gorman
Cc: Peter Xu
Signed-off-by: David Hildenbrand
Acked-by: Kirill A. Shutemov
Reviewed-by: Alistair Popple
---

This is a follow-up cleanup to [1]:
	[PATCH v1 RESEND 0/6] mm: (pte|pmd)_mkdirty() should not
	unconditionally allow for write access

I wanted to be a bit careful and write some test cases to convince
myself that I am not missing something important.
Of course, there is still the possibility that my test cases are buggy ;)

Test cases I'm running:

https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/test_mprotect_migration.c
https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/test_mprotect_thp_split.c

[1] https://lkml.kernel.org/r/20230411142512.438404-1-david@redhat.com

---
 mm/huge_memory.c | 4 ++--
 mm/migrate.c     | 5 +----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c23fa39dec92..624671aaa60d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2234,7 +2234,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		} else {
 			entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
 			if (write)
-				entry = maybe_mkwrite(entry, vma);
+				entry = pte_mkwrite(entry);
 			if (anon_exclusive)
 				SetPageAnonExclusive(page + i);
 			if (!young)
@@ -3271,7 +3271,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 	if (pmd_swp_soft_dirty(*pvmw->pmd))
 		pmde = pmd_mksoft_dirty(pmde);
 	if (is_writable_migration_entry(entry))
-		pmde = maybe_pmd_mkwrite(pmde, vma);
+		pmde = pmd_mkwrite(pmde);
 	if (pmd_swp_uffd_wp(*pvmw->pmd))
 		pmde = pmd_mkuffd_wp(pmde);
 	if (!is_migration_entry_young(entry))
diff --git a/mm/migrate.c b/mm/migrate.c
index 5d95e09b1618..02cace7955d4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -213,16 +213,13 @@ static bool remove_migration_pte(struct folio *folio,
 		if (pte_swp_soft_dirty(*pvmw.pte))
 			pte = pte_mksoft_dirty(pte);
-		/*
-		 * Recheck VMA as permissions can change since migration started
-		 */
 		entry = pte_to_swp_entry(*pvmw.pte);
 		if (!is_migration_entry_young(entry))
 			pte = pte_mkold(pte);
 		if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
 			pte = pte_mkdirty(pte);
 		if (is_writable_migration_entry(entry))
-			pte = maybe_mkwrite(pte, vma);
+			pte = pte_mkwrite(pte);
 		else if (pte_swp_uffd_wp(*pvmw.pte))
 			pte = pte_mkuffd_wp(pte);