From patchwork Mon Nov 15 07:55:00 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12618843
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Axel Rasmussen, Nadav Amit, Mike Rapoport, Hugh Dickins, Mike Kravetz,
    "Kirill A . Shutemov", Alistair Popple, Jerome Glisse, Matthew Wilcox,
    Andrew Morton, peterx@redhat.com, David Hildenbrand, Andrea Arcangeli
Subject: [PATCH v6 01/23] mm: Introduce PTE_MARKER swap entry
Date: Mon, 15 Nov 2021 15:55:00 +0800
Message-Id: <20211115075522.73795-2-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

This patch introduces a new swap entry type called PTE_MARKER.  It can be
installed for any pte that maps file-backed memory when the pte is
temporarily zapped, so as to maintain per-pte information.  The information
kept in the pte is called a "marker".
Here we define the marker as "unsigned long" just to match pgoff_t;
however, it will only work if it still fits in swp_offset(), which is
currently 58 bits on x86_64, for example.

A new config option, CONFIG_PTE_MARKER, is introduced too; it is off by
default.  A set of helpers is defined alongside to serve the rest of the
pte marker code.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/asm-generic/hugetlb.h |  9 ++++
 include/linux/swap.h          | 15 ++++++-
 include/linux/swapops.h       | 78 +++++++++++++++++++++++++++++++++++
 mm/Kconfig                    |  7 ++++
 4 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 8e1e6244a89d..f39cad20ffc6 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -2,6 +2,9 @@
 #ifndef _ASM_GENERIC_HUGETLB_H
 #define _ASM_GENERIC_HUGETLB_H
 
+#include <linux/swap.h>
+#include <linux/swapops.h>
+
 static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
 {
 	return mk_pte(page, pgprot);
@@ -80,6 +83,12 @@ static inline int huge_pte_none(pte_t pte)
 }
 #endif
 
+/* Please refer to comments above pte_none_mostly() for the usage */
+static inline int huge_pte_none_mostly(pte_t pte)
+{
+	return huge_pte_none(pte) || is_pte_marker(pte);
+}
+
 #ifndef __HAVE_ARCH_HUGE_PTE_WRPROTECT
 static inline pte_t huge_pte_wrprotect(pte_t pte)
 {
diff --git a/include/linux/swap.h b/include/linux/swap.h
index d1ea44b31f19..cc9adcfd666f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -55,6 +55,19 @@ static inline int current_is_kswapd(void)
  * actions on faults.
  */
 
+/*
+ * PTE markers are used to persist information onto PTEs that are mapped with
+ * file-backed memories.  As its name "PTE" hints, it should only be applied to
+ * the leaves of pgtables.
+ */
+#ifdef CONFIG_PTE_MARKER
+#define SWP_PTE_MARKER_NUM 1
+#define SWP_PTE_MARKER     (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
+			    SWP_MIGRATION_NUM + SWP_DEVICE_NUM)
+#else
+#define SWP_PTE_MARKER_NUM 0
+#endif
+
 /*
  * Unaddressable device memory support. See include/linux/hmm.h and
  * Documentation/vm/hmm.rst. Short description is we need struct pages for
@@ -100,7 +113,7 @@ static inline int current_is_kswapd(void)
 
 #define MAX_SWAPFILES \
 	((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
-	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)
+	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM)
 
 /*
  * Magic header for a swap area. The first part of the union is
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index d356ab4047f7..5103d2a4ae38 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -247,6 +247,84 @@ static inline int is_writable_migration_entry(swp_entry_t entry)
 
 #endif
 
+typedef unsigned long pte_marker;
+
+#define PTE_MARKER_MASK (0)
+
+#ifdef CONFIG_PTE_MARKER
+
+static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
+{
+	return swp_entry(SWP_PTE_MARKER, marker);
+}
+
+static inline bool is_pte_marker_entry(swp_entry_t entry)
+{
+	return swp_type(entry) == SWP_PTE_MARKER;
+}
+
+static inline pte_marker pte_marker_get(swp_entry_t entry)
+{
+	return swp_offset(entry) & PTE_MARKER_MASK;
+}
+
+static inline bool is_pte_marker(pte_t pte)
+{
+	return is_swap_pte(pte) && is_pte_marker_entry(pte_to_swp_entry(pte));
+}
+
+#else /* CONFIG_PTE_MARKER */
+
+static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
+{
+	/* This should never be called if !CONFIG_PTE_MARKER */
+	WARN_ON_ONCE(1);
+	return swp_entry(0, 0);
+}
+
+static inline bool is_pte_marker_entry(swp_entry_t entry)
+{
+	return false;
+}
+
+static inline pte_marker pte_marker_get(swp_entry_t entry)
+{
+	return 0;
+}
+
+static inline bool is_pte_marker(pte_t pte)
+{
+	return false;
+}
+
+#endif /* CONFIG_PTE_MARKER */
+
+static inline pte_t make_pte_marker(pte_marker marker)
+{
+	return swp_entry_to_pte(make_pte_marker_entry(marker));
+}
+
+/*
+ * This is a special version to check pte_none() just to cover the case when
+ * the pte is a pte marker.  It existed because in many cases the pte marker
+ * should be seen as a none pte; it's just that we have stored some information
+ * onto the none pte so it becomes not-none any more.
+ *
+ * It should be used when the pte is file-backed, ram-based and backing
+ * userspace pages, like shmem.  It is not needed upon pgtables that do not
+ * support pte markers at all.  For example, it's not needed on anonymous
+ * memory, kernel-only memory (including when the system is during-boot),
+ * non-ram based generic file-system.  It's fine to be used even there, but the
+ * extra pte marker check will be pure overhead.
+ *
+ * For systems configured with !CONFIG_PTE_MARKER this will be automatically
+ * optimized to pte_none().
+ */
+static inline int pte_none_mostly(pte_t pte)
+{
+	return pte_none(pte) || is_pte_marker(pte);
+}
+
 static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
 {
 	struct page *p = pfn_to_page(swp_offset(entry));
diff --git a/mm/Kconfig b/mm/Kconfig
index 068ce591a13a..66f23c6c2032 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -897,6 +897,13 @@ config IO_MAPPING
 config SECRETMEM
 	def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
 
+config PTE_MARKER
+	def_bool n
+	bool "Marker PTEs support"
+
+	help
+	  Allows to create marker PTEs for file-backed memory.
+
 source "mm/damon/Kconfig"
 
 endmenu

From patchwork Mon Nov 15 07:55:01 2021
X-Patchwork-Id: 12618845
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v6 02/23] mm: Teach core mm about pte markers
Date: Mon, 15 Nov 2021 15:55:01 +0800
Message-Id: <20211115075522.73795-3-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

This patch still does not use pte markers in any way, but it teaches the
core mm about the pte marker idea.  For example, handle_pte_marker() is
introduced; it will parse and handle all the pte marker faults.
Many of the changes are simply about adding comments, so that we know
where a pte marker may show up and why no special code is needed for
those cases.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c | 10 ++++++----
 mm/filemap.c     |  5 +++++
 mm/hmm.c         |  2 +-
 mm/memcontrol.c  |  8 ++++++--
 mm/memory.c      | 23 +++++++++++++++++++++++
 mm/mincore.c     |  3 ++-
 mm/mprotect.c    |  3 +++
 7 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 22bf14ab2d16..fa24c72a849e 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -245,9 +245,10 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
-	 * changes under us.
+	 * changes under us.  PTE markers should be handled the same as none
+	 * ptes here.
 	 */
-	if (huge_pte_none(pte))
+	if (huge_pte_none_mostly(pte))
 		ret = true;
 	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
 		ret = true;
@@ -326,9 +327,10 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	pte = pte_offset_map(pmd, address);
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
-	 * changes under us.
+	 * changes under us.  PTE markers should be handled the same as none
+	 * ptes here.
 	 */
-	if (pte_none(*pte))
+	if (pte_none_mostly(*pte))
 		ret = true;
 	if (!pte_write(*pte) && (reason & VM_UFFD_WP))
 		ret = true;
diff --git a/mm/filemap.c b/mm/filemap.c
index daa0e23a6ee6..9a7228b95b30 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3327,6 +3327,11 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 		vmf->pte += xas.xa_index - last_pgoff;
 		last_pgoff = xas.xa_index;
 
+		/*
+		 * NOTE: If there're PTE markers, we'll leave them to be
+		 * handled in the specific fault path, and it'll prohibit the
+		 * fault-around logic.
+		 */
 		if (!pte_none(*vmf->pte))
 			goto unlock;
diff --git a/mm/hmm.c b/mm/hmm.c
index 842e26599238..a0f72a540dc3 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -239,7 +239,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
 	pte_t pte = *ptep;
 	uint64_t pfn_req_flags = *hmm_pfn;
 
-	if (pte_none(pte)) {
+	if (pte_none_mostly(pte)) {
 		required_fault =
 			hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0);
 		if (required_fault)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 781605e92015..eaddbc77aa5a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5692,10 +5692,14 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 
 	if (pte_present(ptent))
 		page = mc_handle_present_pte(vma, addr, ptent);
+	else if (pte_none_mostly(ptent))
+		/*
+		 * PTE markers should be treated as a none pte here, separated
+		 * from other swap handling below.
+		 */
+		page = mc_handle_file_pte(vma, addr, ptent);
 	else if (is_swap_pte(ptent))
 		page = mc_handle_swap_pte(vma, ptent, &ent);
-	else if (pte_none(ptent))
-		page = mc_handle_file_pte(vma, addr, ptent);
 
 	if (!page && !ent.val)
 		return ret;
diff --git a/mm/memory.c b/mm/memory.c
index e5d59a6b6479..04662b010005 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -98,6 +98,8 @@ struct page *mem_map;
 EXPORT_SYMBOL(mem_map);
 #endif
 
+static vm_fault_t do_fault(struct vm_fault *vmf);
+
 /*
  * A number of key systems in x86 including ioremap() rely on the assumption
  * that high_memory defines the upper bound on direct map memory, then end
@@ -1380,6 +1382,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			if (unlikely(zap_skip_check_mapping(details, page)))
 				continue;
 			rss[mm_counter(page)]--;
+		} else if (is_pte_marker_entry(entry)) {
+			/* By default, simply drop all pte markers when zap */
 		} else if (!non_swap_entry(entry)) {
 			rss[MM_SWAPENTS]--;
 			if (unlikely(!free_swap_and_cache(entry)))
@@ -3448,6 +3452,23 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	return 0;
 }
 
+static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
+{
+	swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);
+	unsigned long marker = pte_marker_get(entry);
+
+	/*
+	 * PTE markers should always be with file-backed memories, and the
+	 * marker should never be empty.  If anything weird happened, the best
+	 * thing to do is to kill the process along with its mm.
+	 */
+	if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker))
+		return VM_FAULT_SIGBUS;
+
+	/* TODO: handle pte markers */
+	return 0;
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -3484,6 +3505,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
+		} else if (is_pte_marker_entry(entry)) {
+			ret = handle_pte_marker(vmf);
 		} else {
 			print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
 			ret = VM_FAULT_SIGBUS;
diff --git a/mm/mincore.c b/mm/mincore.c
index 9122676b54d6..736869f4b409 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -121,7 +121,8 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr != end; ptep++, addr += PAGE_SIZE) {
 		pte_t pte = *ptep;
 
-		if (pte_none(pte))
+		/* We need to do cache lookup too for pte markers */
+		if (pte_none_mostly(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
 						 vma, vec);
 		else if (pte_present(pte))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index e552f5e0ccbd..890bc1f9ca24 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -173,6 +173,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					newpte = pte_swp_mksoft_dirty(newpte);
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
+			} else if (is_pte_marker_entry(entry)) {
+				/* Skip it, the same as none pte */
+				continue;
 			} else {
 				newpte = oldpte;
 			}

From patchwork Mon Nov 15 07:55:02 2021
X-Patchwork-Id: 12618847
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v6 03/23] mm: Check against orig_pte for finish_fault()
Date: Mon, 15 Nov 2021 15:55:02 +0800
Message-Id: <20211115075522.73795-4-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

We used to check against a none pte, and in those cases orig_pte should
always be a none pte anyway.  This change prepares us to be able to call
do_fault() on !none ptes.  For example, we should allow that to happen
for a pte marker so that we can restore information out of the pte
markers.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 04662b010005..d5966d9e24c3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4052,7 +4052,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 				      vmf->address, &vmf->ptl);
 	ret = 0;
 	/* Re-check under ptl */
-	if (likely(pte_none(*vmf->pte)))
+	if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
 		do_set_pte(vmf, page, vmf->address);
 	else
 		ret = VM_FAULT_NOPAGE;

From patchwork Mon Nov 15 07:55:03 2021
X-Patchwork-Id: 12618849
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v6 04/23] mm/uffd: PTE_MARKER_UFFD_WP
Date: Mon, 15 Nov 2021 15:55:03 +0800
Message-Id: <20211115075522.73795-5-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>
This patch introduces the first user of pte markers: the uffd-wp marker.
When a pte marker is installed with the uffd-wp bit set, it means this pte
was wr-protected by uffd. We will use this special pte to arm the ptes that
got either unmapped or swapped out in a file-backed region that was
previously wr-protected. This special pte can trigger a page fault just
like a swap entry can. The idea was greatly inspired by Hugh and Andrea in
the discussion referenced in the links below.

Some helpers are introduced to detect whether a swap pte is uffd
wr-protected. After the pte marker is introduced, a swap pte can be
wr-protected in two forms: either it is a normal swap pte with
_PAGE_SWP_UFFD_WP set, or it is a pte marker with PTE_MARKER_UFFD_WP set.

Link: https://lore.kernel.org/lkml/20201126222359.8120-1-peterx@redhat.com/
Link: https://lore.kernel.org/lkml/20201130230603.46187-1-peterx@redhat.com/
Suggested-by: Andrea Arcangeli
Suggested-by: Hugh Dickins
Signed-off-by: Peter Xu
---
 include/linux/swapops.h       |  3 ++-
 include/linux/userfaultfd_k.h | 38 +++++++++++++++++++++++++++++++++++
 mm/Kconfig                    |  9 +++++++++
 3 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 5103d2a4ae38..2cec3ef355a7 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -249,7 +249,8 @@ static inline int is_writable_migration_entry(swp_entry_t entry)
 
 typedef unsigned long pte_marker;
 
-#define  PTE_MARKER_MASK     (0)
+#define  PTE_MARKER_UFFD_WP  BIT(0)
+#define  PTE_MARKER_MASK     (PTE_MARKER_UFFD_WP)
 
 #ifdef CONFIG_PTE_MARKER
 
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 33cea484d1ad..7d7ffec53ddb 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -15,6 +15,8 @@
 
 #include
 #include
+#include
+#include
 #include
 
 /* The set of all possible UFFD-related VM flags.
 */
@@ -236,4 +238,40 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,
 
 #endif /* CONFIG_USERFAULTFD */
 
+static inline bool is_pte_marker_uffd_wp(pte_t pte)
+{
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
+	swp_entry_t entry;
+
+	if (!is_swap_pte(pte))
+		return false;
+
+	entry = pte_to_swp_entry(pte);
+
+	return is_pte_marker_entry(entry) &&
+	    (pte_marker_get(entry) & PTE_MARKER_UFFD_WP);
+#else
+	return false;
+#endif
+}
+
+/*
+ * Returns true if this is a swap pte and was uffd-wp wr-protected in either
+ * forms (pte marker or a normal swap pte), false otherwise.
+ */
+static inline bool pte_swp_uffd_wp_any(pte_t pte)
+{
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
+	if (!is_swap_pte(pte))
+		return false;
+
+	if (pte_swp_uffd_wp(pte))
+		return true;
+
+	if (is_pte_marker_uffd_wp(pte))
+		return true;
+#endif
+	return false;
+}
+
 #endif /* _LINUX_USERFAULTFD_K_H */

diff --git a/mm/Kconfig b/mm/Kconfig
index 66f23c6c2032..f01c8e0afadf 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -904,6 +904,15 @@ config PTE_MARKER
 	help
 	  Allows to create marker PTEs for file-backed memory.
 
+config PTE_MARKER_UFFD_WP
+	bool "Marker PTEs support for userfaultfd write protection"
+	depends on PTE_MARKER && HAVE_ARCH_USERFAULTFD_WP
+
+	help
+	  Allows to create marker PTEs for userfaultfd write protection
+	  purposes.  It is required to enable userfaultfd write protection on
+	  file-backed memory types like shmem and hugetlbfs.
+
 source "mm/damon/Kconfig"
 
 endmenu

From patchwork Mon Nov 15 07:55:04 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12618851
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Axel Rasmussen, Nadav Amit, Mike Rapoport, Hugh Dickins, Mike Kravetz,
    "Kirill A . Shutemov", Alistair Popple, Jerome Glisse, Matthew Wilcox,
    Andrew Morton, peterx@redhat.com, David Hildenbrand, Andrea Arcangeli
Subject: [PATCH v6 05/23] mm/shmem: Take care of UFFDIO_COPY_MODE_WP
Date: Mon, 15 Nov 2021 15:55:04 +0800
Message-Id: <20211115075522.73795-6-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

Pass wp_copy into shmem_mfill_atomic_pte() through the stack, then apply
the UFFD_WP bit properly when an UFFDIO_COPY on shmem is requested with
UFFDIO_COPY_MODE_WP. wp_copy finally lands in mfill_atomic_install_pte().
Note: we must do pte_wrprotect() if !writable in mfill_atomic_install_pte(),
as mk_pte() could return a writable pte (e.g., when VM_SHARED on a shmem
file).

Signed-off-by: Peter Xu
---
 include/linux/shmem_fs.h |  4 ++--
 mm/shmem.c               |  4 ++--
 mm/userfaultfd.c         | 30 +++++++++++++++++++++---------
 3 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 166158b6e917..0ee0f437b14f 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -145,11 +145,11 @@ extern int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 				  pmd_t *dst_pmd,
 				  struct vm_area_struct *dst_vma,
 				  unsigned long dst_addr,
 				  unsigned long src_addr,
-				  bool zeropage,
+				  bool zeropage, bool wp_copy,
 				  struct page **pagep);
 #else /* !CONFIG_SHMEM */
 #define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
-			       src_addr, zeropage, pagep) ({ BUG(); 0; })
+			       src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; })
 #endif /* CONFIG_SHMEM */
 
 #endif /* CONFIG_USERFAULTFD */

diff --git a/mm/shmem.c b/mm/shmem.c
index dc038ce78700..167a46e6a1ff 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2344,7 +2344,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr,
 			   unsigned long src_addr,
-			   bool zeropage,
+			   bool zeropage, bool wp_copy,
 			   struct page **pagep)
 {
 	struct inode *inode = file_inode(dst_vma->vm_file);
@@ -2415,7 +2415,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release;
 
 	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-				       page, true, false);
+				       page, true, wp_copy);
 	if (ret)
 		goto out_delete_from_cache;

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index ac6f036298cd..95e5a9ba3196 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -70,14 +70,22 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 
 	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
 	_dst_pte = pte_mkdirty(_dst_pte);
-	if (page_in_cache && !vm_shared)
+	/* Don't write if uffd-wp wr-protected */
+	if (wp_copy) {
+		_dst_pte = pte_mkuffd_wp(_dst_pte);
 		writable = false;
-	if (writable) {
-		if (wp_copy)
-			_dst_pte = pte_mkuffd_wp(_dst_pte);
-		else
-			_dst_pte = pte_mkwrite(_dst_pte);
 	}
+	/* Don't write if page cache privately mapped */
+	if (page_in_cache && !vm_shared)
+		writable = false;
+	if (writable)
+		_dst_pte = pte_mkwrite(_dst_pte);
+	else
+		/*
+		 * We need this to make sure write bit removed; as mk_pte()
+		 * could return a pte with write bit set.
+		 */
+		_dst_pte = pte_wrprotect(_dst_pte);
 
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
@@ -92,7 +100,12 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	}
 
 	ret = -EEXIST;
-	if (!pte_none(*dst_pte))
+	/*
+	 * We allow to overwrite a pte marker: consider when both MISSING|WP
+	 * registered, we firstly wr-protect a none pte which has no page cache
+	 * page backing it, then access the page.
+	 */
+	if (!pte_none_mostly(*dst_pte))
 		goto out_unlock;
 
 	if (page_in_cache)
@@ -467,11 +480,10 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 			err = mfill_zeropage_pte(dst_mm, dst_pmd,
 						 dst_vma, dst_addr);
 	} else {
-		VM_WARN_ON_ONCE(wp_copy);
 		err = shmem_mfill_atomic_pte(dst_mm, dst_pmd,
 					     dst_vma, dst_addr,
 					     src_addr,
 					     mode != MCOPY_ATOMIC_NORMAL,
-					     page);
+					     wp_copy, page);
 	}
 
 	return err;

From patchwork Mon Nov 15 07:55:05 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12618853
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Axel Rasmussen, Nadav Amit, Mike Rapoport, Hugh Dickins, Mike Kravetz,
    "Kirill A . Shutemov", Alistair Popple, Jerome Glisse, Matthew Wilcox,
    Andrew Morton, peterx@redhat.com, David Hildenbrand, Andrea Arcangeli
Subject: [PATCH v6 06/23] mm/shmem: Handle uffd-wp special pte in page fault handler
Date: Mon, 15 Nov 2021 15:55:05 +0800
Message-Id: <20211115075522.73795-7-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

File-backed memories are prone to unmap/swap, so their ptes are always
unstable: the pages can easily be faulted back in later via the page cache.
This can cause the uffd-wp bit to get lost when such memory is unmapped or
swapped out. One example is shmem. PTE markers are needed to store that
information.

This patch prepares for that by teaching the page fault handler to
recognize uffd-wp pte markers before they are installed anywhere else. The
handling of a uffd-wp pte marker is similar to a missing fault: we handle
this "missing fault" when we see the marker, while making sure the
wr-protect information is kept across the fault processing.

This is a slow path of uffd-wp handling, because zapping of wr-protected
shmem ptes should be rare.
So far it should only trigger in two conditions:

  (1) When trying to punch holes in shmem_fallocate(), there is an
      optimization to zap the pgtables before evicting the page.

  (2) When swapping out shmem pages.

Because of this, the page fault handling is also simplified: instead of
sending the wr-protect message on the first page fault, the page is
installed read-only, so the uffd-wp message will be generated on the next
(write) fault, which triggers the do_wp_page() path of general uffd-wp
handling.

Disable fault-around for all uffd-wp registered ranges for extra safety,
just like uffd-minor faults, and clean the code up.

Signed-off-by: Peter Xu
---
 include/linux/userfaultfd_k.h | 17 +++++++++
 mm/memory.c                   | 71 ++++++++++++++++++++++++++++++-----
 2 files changed, 79 insertions(+), 9 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 7d7ffec53ddb..05cec02140cb 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -96,6 +96,18 @@ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
 	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
 }
 
+/*
+ * Don't do fault around for either WP or MINOR registered uffd range.  For
+ * MINOR registered range, fault around will be a total disaster and ptes can
+ * be installed without notifications; for WP it should mostly be fine as long
+ * as the fault around checks for pte_none() before the installation, however
+ * to be super safe we just forbid it.
+ */
+static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
+}
+
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_UFFD_MISSING;
@@ -236,6 +248,11 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,
 {
 }
 
+static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 #endif /* CONFIG_USERFAULTFD */
 
 static inline bool is_pte_marker_uffd_wp(pte_t pte)

diff --git a/mm/memory.c b/mm/memory.c
index d5966d9e24c3..e8557d43a87d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3452,6 +3452,43 @@ static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf)
 	return 0;
 }
 
+static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
+{
+	vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
+				       vmf->address, &vmf->ptl);
+	/*
+	 * Be careful so that we will only recover a special uffd-wp pte into a
+	 * none pte.  Otherwise it means the pte could have changed, so retry.
+	 */
+	if (is_pte_marker(*vmf->pte))
+		pte_clear(vmf->vma->vm_mm, vmf->address, vmf->pte);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
+	return 0;
+}
+
+/*
+ * This is actually a page-missing access, but with uffd-wp special pte
+ * installed.  It means this pte was wr-protected before being unmapped.
+ */
+static vm_fault_t pte_marker_handle_uffd_wp(struct vm_fault *vmf)
+{
+	/* Careful!  vmf->pte unmapped after return */
+	if (!pte_unmap_same(vmf))
+		return 0;
+
+	/*
+	 * Just in case there're leftover special ptes even after the region
+	 * got unregistered - we can simply clear them.  We can also do that
+	 * proactively when e.g. when we do UFFDIO_UNREGISTER upon some uffd-wp
+	 * ranges, but it should be more efficient to be done lazily here.
+	 */
+	if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma)))
+		return pte_marker_clear(vmf);
+
+	/* do_fault() can handle pte markers too like none pte */
+	return do_fault(vmf);
+}
+
 static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 {
 	swp_entry_t entry = pte_to_swp_entry(vmf->orig_pte);
@@ -3465,8 +3502,11 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
 	if (WARN_ON_ONCE(vma_is_anonymous(vmf->vma) || !marker))
 		return VM_FAULT_SIGBUS;
 
-	/* TODO: handle pte markers */
-	return 0;
+	if (marker & PTE_MARKER_UFFD_WP)
+		return pte_marker_handle_uffd_wp(vmf);
+
+	/* This is an unknown pte marker */
+	return VM_FAULT_SIGBUS;
 }
 
 /*
@@ -3968,6 +4008,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	bool uffd_wp = is_pte_marker_uffd_wp(vmf->orig_pte);
 	bool write = vmf->flags & FAULT_FLAG_WRITE;
 	bool prefault = vmf->address != addr;
 	pte_t entry;
@@ -3982,6 +4023,8 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
 
 	if (write)
 		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+	if (unlikely(uffd_wp))
+		entry = pte_mkuffd_wp(pte_wrprotect(entry));
 	/* copy-on-write page */
 	if (write && !(vma->vm_flags & VM_SHARED)) {
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
@@ -4155,9 +4198,21 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
 }
 
+/* Return true if we should do read fault-around, false otherwise */
+static inline bool should_fault_around(struct vm_fault *vmf)
+{
+	/* No ->map_pages?  No way to fault around... */
+	if (!vmf->vma->vm_ops->map_pages)
+		return false;
+
+	if (uffd_disable_fault_around(vmf->vma))
+		return false;
+
+	return fault_around_bytes >> PAGE_SHIFT > 1;
+}
+
 static vm_fault_t do_read_fault(struct vm_fault *vmf)
 {
-	struct vm_area_struct *vma = vmf->vma;
 	vm_fault_t ret = 0;
 
 	/*
@@ -4165,12 +4220,10 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
-		if (likely(!userfaultfd_minor(vmf->vma))) {
-			ret = do_fault_around(vmf);
-			if (ret)
-				return ret;
-		}
+	if (should_fault_around(vmf)) {
+		ret = do_fault_around(vmf);
+		if (ret)
+			return ret;
 	}
 
 	ret = __do_fault(vmf);

From patchwork Mon Nov 15 07:55:06 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12618855
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Axel Rasmussen, Nadav Amit, Mike Rapoport, Hugh Dickins, Mike Kravetz,
    "Kirill A . Shutemov", Alistair Popple, Jerome Glisse, Matthew Wilcox,
    Andrew Morton, peterx@redhat.com, David Hildenbrand, Andrea Arcangeli
Subject: [PATCH v6 07/23] mm/shmem: Persist uffd-wp bit across zapping for file-backed
Date: Mon, 15 Nov 2021 15:55:06 +0800
Message-Id: <20211115075522.73795-8-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

File-backed memory is prone to being unmapped at any time. That means all
information in the pte will be dropped, including the uffd-wp flag.

To persist the uffd-wp flag, we'll use pte markers. This patch teaches the
zap code to understand uffd-wp and to know when to keep or drop the
uffd-wp bit.

Add a new flag, ZAP_FLAG_DROP_MARKER, and set it in zap_details when we
don't want to persist that information, for example when destroying the
whole vma or punching a hole in a shmem file. For the remaining cases we
should never drop the uffd-wp bit, or the wr-protect information will be
lost.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/mm.h        | 20 +++++++++++++++++
 include/linux/mm_inline.h | 45 +++++++++++++++++++++++++++++++++++++++
 mm/memory.c               | 38 +++++++++++++++++++++++++++++++--
 mm/rmap.c                 |  8 +++++++
 4 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a7e4a9e7d807..015e287063a8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1825,12 +1825,23 @@ static inline bool can_do_mlock(void) { return false; }
 extern int user_shm_lock(size_t, struct ucounts *);
 extern void user_shm_unlock(size_t, struct ucounts *);
 
+typedef unsigned int __bitwise zap_flags_t;
+
+/*
+ * Whether to drop the pte markers, for example, the uffd-wp information for
+ * file-backed memory.  This should only be specified when we will completely
+ * drop the page in the mm, either by truncation or unmapping of the vma.  By
+ * default, the flag is not set.
+ */
+#define ZAP_FLAG_DROP_MARKER  ((__force zap_flags_t) BIT(0))
+
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
 */
 struct zap_details {
 	struct address_space *zap_mapping;	/* Check page->mapping if set */
 	struct page *single_page;		/* Locked page to be unmapped */
+	zap_flags_t zap_flags;			/* Extra flags for zapping */
 };
 
 /*
@@ -1847,6 +1858,15 @@ zap_skip_check_mapping(struct zap_details *details, struct page *page)
 		(details->zap_mapping != page_rmapping(page));
 }
 
+static inline bool
+zap_drop_file_uffd_wp(struct zap_details *details)
+{
+	if (!details)
+		return false;
+
+	return details->zap_flags & ZAP_FLAG_DROP_MARKER;
+}
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			    pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index e2ec68b0515c..ca861e910938 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -4,6 +4,8 @@
 #include
 #include
+#include
+#include
 
 /**
  * folio_is_file_lru - Should the folio be on a file LRU or anon LRU?
@@ -135,4 +137,47 @@ static __always_inline void del_page_from_lru_list(struct page *page,
 {
 	lruvec_del_folio(lruvec, page_folio(page));
 }
+
+/*
+ * If this pte is wr-protected by uffd-wp in any form, arm the special pte to
+ * replace a none pte.  NOTE!  This should only be called when *pte is already
+ * cleared so we will never accidentally replace something valuable.  Meanwhile
+ * none pte also means we are not demoting the pte so tlb flushed is not needed.
+ * E.g., when pte cleared the caller should have taken care of the tlb flush.
+ *
+ * Must be called with pgtable lock held so that no thread will see the none
+ * pte, and if they see it, they'll fault and serialize at the pgtable lock.
+ *
+ * This function is a no-op if PTE_MARKER_UFFD_WP is not enabled.
+ */
+static inline void
+pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
+			      pte_t *pte, pte_t pteval)
+{
+#ifdef CONFIG_PTE_MARKER_UFFD_WP
+	bool arm_uffd_pte = false;
+
+	/* The current status of the pte should be "cleared" before calling */
+	WARN_ON_ONCE(!pte_none(*pte));
+
+	if (vma_is_anonymous(vma))
+		return;
+
+	/* A uffd-wp wr-protected normal pte */
+	if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval)))
+		arm_uffd_pte = true;
+
+	/*
+	 * A uffd-wp wr-protected swap pte.  Note: this should even cover an
+	 * existing pte marker with uffd-wp bit set.
+	 */
+	if (unlikely(pte_swp_uffd_wp_any(pteval)))
+		arm_uffd_pte = true;
+
+	if (unlikely(arm_uffd_pte))
+		set_pte_at(vma->vm_mm, addr, pte,
+			   make_pte_marker(PTE_MARKER_UFFD_WP));
+#endif
+}
+
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index e8557d43a87d..fef6a91c5dfb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -73,6 +73,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1306,6 +1307,21 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 	return ret;
 }
 
+/*
+ * This function makes sure that we'll replace the none pte with an uffd-wp
+ * swap special pte marker when necessary.  Must be with the pgtable lock held.
+ */
+static inline void
+zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
+			      unsigned long addr, pte_t *pte,
+			      struct zap_details *details, pte_t pteval)
+{
+	if (zap_drop_file_uffd_wp(details))
+		return;
+
+	pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
@@ -1343,6 +1359,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			tlb_remove_tlb_entry(tlb, pte, addr);
+			zap_install_uffd_wp_if_needed(vma, addr, pte, details,
+						      ptent);
 			if (unlikely(!page))
 				continue;
@@ -1373,6 +1391,13 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			page = pfn_swap_entry_to_page(entry);
 			if (unlikely(zap_skip_check_mapping(details, page)))
 				continue;
+			/*
+			 * Both device private/exclusive mappings should only
+			 * work with anonymous page so far, so we don't need to
+			 * consider uffd-wp bit when zap.  For more information,
+			 * see zap_install_uffd_wp_if_needed().
+			 */
+			WARN_ON_ONCE(!vma_is_anonymous(vma));
 			rss[mm_counter(page)]--;
 			if (is_device_private_entry(entry))
 				page_remove_rmap(page, false);
@@ -1383,13 +1408,18 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				continue;
 			rss[mm_counter(page)]--;
 		} else if (is_pte_marker_entry(entry)) {
-			/* By default, simply drop all pte markers when zap */
+			/* Currently there's only uffd-wp marker bit */
+			WARN_ON_ONCE(!(pte_marker_get(entry) & PTE_MARKER_UFFD_WP));
+			/* Only drop the uffd-wp marker if explicitly requested */
+			if (!zap_drop_file_uffd_wp(details))
+				continue;
 		} else if (!non_swap_entry(entry)) {
 			rss[MM_SWAPENTS]--;
 			if (unlikely(!free_swap_and_cache(entry)))
 				print_bad_pte(vma, addr, ptent, NULL);
 		}
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+		zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
 	add_mm_rss_vec(mm, rss);
@@ -1600,12 +1630,15 @@ void unmap_vmas(struct mmu_gather *tlb,
 		unsigned long end_addr)
 {
 	struct mmu_notifier_range range;
+	struct zap_details details = {
+		.zap_flags = ZAP_FLAG_DROP_MARKER,
+	};
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
 				start_addr, end_addr);
 	mmu_notifier_invalidate_range_start(&range);
 	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
-		unmap_single_vma(tlb, vma, start_addr, end_addr, NULL);
+		unmap_single_vma(tlb, vma, start_addr, end_addr, &details);
 	mmu_notifier_invalidate_range_end(&range);
 }
 
@@ -3350,6 +3383,7 @@ void unmap_mapping_page(struct page *page)
 
 	details.zap_mapping = mapping;
 	details.single_page = page;
+	details.zap_flags = ZAP_FLAG_DROP_MARKER;
 
 	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
diff --git a/mm/rmap.c b/mm/rmap.c
index 163ac4e6bcee..89068e957486 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -73,6 +73,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1517,6 +1518,13 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
+		/*
+		 * Now the pte is cleared.  If this is uffd-wp armed pte, we
+		 * may want to replace a none pte with a marker pte if it's
+		 * file-backed, so we don't lose the tracking information.
+		 */
+		pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
 		/* Move the dirty bit to the page.  Now the pte is gone. */
 		if (pte_dirty(pteval))
 			set_page_dirty(page);

From patchwork Mon Nov 15 08:00:34 2021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v6 08/23] mm/shmem: Allow uffd wr-protect none pte for file-backed mem
Date: Mon, 15 Nov 2021 16:00:34 +0800
Message-Id: <20211115080034.74526-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>

File-backed memory differs from anonymous memory in that even if the pte is missing, the data could still reside either in the file or in the page/swap cache. So when wr-protecting a pte, we need to consider none ptes too. We do that by installing the uffd-wp pte markers when necessary, so that when there is a future write to the pte, the fault handler will take the special path to first fault in the page read-only, then report to the userfaultfd server with a wr-protect message. On the other hand, when unprotecting a page, it's also possible that the pte got unmapped but replaced by the special uffd-wp marker. Then we'll need to be able to recover a uffd-wp pte marker into a none pte, so that the next access to the page will fault in correctly as usual. Special care needs to be taken throughout the change_protection_range() process.
Since we now allow a user to wr-protect a none pte, we need to be able to pre-populate the page table entries when we see (!anonymous && MM_CP_UFFD_WP) requests; otherwise change_protection_range() will always skip when the pgtable entry does not exist. For example, the pgtable can be missing for a whole 2M pmd chunk while the page cache exists for the whole 2M range. When we want to wr-protect one 4K page within that range, we need to pre-populate the pgtable and install the pte marker showing that we want to get a message and block the thread when the page cache of that 4K page is written to. Without pre-populating the pmd, change_protection() will simply skip that whole pmd. Note that this patch only covers the small pages (pte level), not transparent huge pages yet. That will be done later; this patch is also a preparation for it.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mprotect.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 890bc1f9ca24..be837c4dbc64 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -174,7 +175,16 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				if (pte_swp_uffd_wp(oldpte))
 					newpte = pte_swp_mkuffd_wp(newpte);
 			} else if (is_pte_marker_entry(entry)) {
-				/* Skip it, the same as none pte */
+				/*
+				 * If this is uffd-wp pte marker and we'd like
+				 * to unprotect it, drop it; the next page
+				 * fault will trigger without uffd trapping.
+				 */
+				if (uffd_wp_resolve &&
+				    (pte_marker_get(entry) & PTE_MARKER_UFFD_WP)) {
+					pte_clear(vma->vm_mm, addr, pte);
+					pages++;
+				}
 				continue;
 			} else {
 				newpte = oldpte;
@@ -189,6 +199,20 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				set_pte_at(vma->vm_mm, addr, pte, newpte);
 				pages++;
 			}
+		} else {
+			/* It must be an none page, or what else?..
+			 */
+			WARN_ON_ONCE(!pte_none(oldpte));
+			if (unlikely(uffd_wp && !vma_is_anonymous(vma))) {
+				/*
+				 * For file-backed mem, we need to be able to
+				 * wr-protect a none pte, because even if the
+				 * pte is none, the page/swap cache could
+				 * exist.  Doing that by install a marker.
+				 */
+				set_pte_at(vma->vm_mm, addr, pte,
+					   make_pte_marker(PTE_MARKER_UFFD_WP));
+				pages++;
+			}
 		}
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
@@ -222,6 +246,39 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
 	return 0;
 }
 
+/* Return true if we're uffd wr-protecting file-backed memory, or false */
+static inline bool
+uffd_wp_protect_file(struct vm_area_struct *vma, unsigned long cp_flags)
+{
+	return (cp_flags & MM_CP_UFFD_WP) && !vma_is_anonymous(vma);
+}
+
+/*
+ * If wr-protecting the range for file-backed, populate pgtable for the case
+ * when pgtable is empty but page cache exists.  When {pte|pmd|...}_alloc()
+ * failed it means no memory, we don't have a better option but stop.
+ */
+#define change_pmd_prepare(vma, pmd, cp_flags)				\
+	do {								\
+		if (unlikely(uffd_wp_protect_file(vma, cp_flags))) {	\
+			if (WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd)))	\
+				break;					\
+		}							\
+	} while (0)
+/*
+ * This is the general pud/p4d/pgd version of change_pmd_prepare().  We need to
+ * have separate change_pmd_prepare() because pte_alloc() returns 0 on success,
+ * while {pmd|pud|p4d}_alloc() returns the valid pointer on success.
+ */
+#define change_prepare(vma, high, low, addr, cp_flags)			\
+	do {								\
+		if (unlikely(uffd_wp_protect_file(vma, cp_flags))) {	\
+			low##_t *p = low##_alloc(vma->vm_mm, high, addr); \
+			if (WARN_ON_ONCE(p == NULL))			\
+				break;					\
+		}							\
+	} while (0)
+
 static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		pud_t *pud, unsigned long addr, unsigned long end,
 		pgprot_t newprot, unsigned long cp_flags)
@@ -240,6 +297,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 
 		next = pmd_addr_end(addr, end);
+		change_pmd_prepare(vma, pmd, cp_flags);
 		/*
 		 * Automatic NUMA balancing walks the tables with mmap_lock
 		 * held for read. It's possible a parallel update to occur
@@ -305,6 +363,7 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma,
 	pud = pud_offset(p4d, addr);
 	do {
 		next = pud_addr_end(addr, end);
+		change_prepare(vma, pud, pmd, addr, cp_flags);
 		if (pud_none_or_clear_bad(pud))
 			continue;
 		pages += change_pmd_range(vma, pud, addr, next, newprot,
@@ -325,6 +384,7 @@ static inline unsigned long change_p4d_range(struct vm_area_struct *vma,
 	p4d = p4d_offset(pgd, addr);
 	do {
 		next = p4d_addr_end(addr, end);
+		change_prepare(vma, p4d, pud, addr, cp_flags);
 		if (p4d_none_or_clear_bad(p4d))
 			continue;
 		pages += change_pud_range(vma, p4d, addr, next, newprot,
@@ -350,6 +410,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
 	inc_tlb_flush_pending(mm);
 	do {
 		next = pgd_addr_end(addr, end);
+		change_prepare(vma, pgd, p4d, addr, cp_flags);
 		if (pgd_none_or_clear_bad(pgd))
 			continue;
 		pages += change_p4d_range(vma, pgd, addr, next, newprot,

From patchwork Mon Nov 15 08:00:48 2021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v6 09/23] mm/shmem: Allows file-back mem to be uffd wr-protected on thps
Date: Mon, 15 Nov 2021 16:00:48 +0800
Message-Id: <20211115080048.74584-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>

We don't have a "huge" version of pte markers; instead, when necessary we split the thp. However, splitting the thp is not enough, because a file-backed thp is handled totally differently compared to an anonymous thp: rather than doing a real split, the thp pmd will simply get cleared in __split_huge_pmd_locked(). That is not enough if, e.g., there is a thp covering the range [0, 2M) but we want to wr-protect a small page residing in [4K, 8K), because after __split_huge_pmd() returns there will be a none pmd, and change_pmd_range() will just skip it right after the split. Here we leverage the previously introduced change_pmd_prepare() macro so that we'll populate the pmd with a pgtable page after the pmd split (in which process the pmd will be cleared for cases like shmem).
Then change_pte_range() will do all the rest for us by installing the uffd-wp pte marker at any none pte that we'd like to wr-protect.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/mprotect.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index be837c4dbc64..0d4bf755cee8 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -319,8 +319,15 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		}
 
 		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
-			if (next - addr != HPAGE_PMD_SIZE) {
+			if ((next - addr != HPAGE_PMD_SIZE) ||
+			    uffd_wp_protect_file(vma, cp_flags)) {
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
+				/*
+				 * For file-backed, the pmd could have been
+				 * cleared; make sure pmd populated if
+				 * necessary, then fall-through to pte level.
+				 */
+				change_pmd_prepare(vma, pmd, cp_flags);
 			} else {
 				int nr_ptes = change_huge_pmd(vma, pmd, addr,
 							      newprot, cp_flags);

From patchwork Mon Nov 15 08:01:03 2021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v6 10/23] mm/shmem: Handle uffd-wp during fork()
Date: Mon, 15 Nov 2021 16:01:03 +0800
Message-Id: <20211115080103.74640-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>

Normally we skip copying pages during fork() for VM_SHARED shmem, but we can't skip it anymore if uffd-wp is enabled on the dst vma. This should only happen when the src uffd has UFFD_FEATURE_EVENT_FORK enabled on a uffd-wp shmem vma, so that VM_UFFD_WP will be propagated onto the dst vma too; then we should copy the pgtables with the uffd-wp bit and pte markers, because this information will be lost otherwise. Since the condition checks will become even more complicated when deciding "whether a vma needs to copy the pgtable during fork()", introduce a helper vma_needs_copy() for it, so everything will be clearer.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/memory.c | 49 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 8 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index fef6a91c5dfb..cc625c616645 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -859,6 +859,14 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		if (try_restore_exclusive_pte(src_pte, src_vma, addr))
 			return -EBUSY;
 		return -ENOENT;
+	} else if (is_pte_marker_entry(entry)) {
+		/*
+		 * We're copying the pgtable only because dst_vma has
+		 * uffd-wp enabled; do a sanity check.
+		 */
+		WARN_ON_ONCE(!userfaultfd_wp(dst_vma));
+		set_pte_at(dst_mm, addr, dst_pte, pte);
+		return 0;
 	}
 	if (!userfaultfd_wp(dst_vma))
 		pte = pte_swp_clear_uffd_wp(pte);
@@ -1227,6 +1235,38 @@ copy_p4d_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	return 0;
 }
 
+/*
+ * Return true if the vma needs to copy the pgtable during this fork().  Return
+ * false when we can speed up fork() by allowing lazy page faults later until
+ * when the child accesses the memory range.
+ */
+bool
+vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
+{
+	/*
+	 * Always copy pgtables when dst_vma has uffd-wp enabled even if it's
+	 * file-backed (e.g. shmem), because when uffd-wp is enabled the
+	 * pgtable contains uffd-wp protection information that we can't
+	 * retrieve from the page cache; skipping the copy would lose it.
+	 */
+	if (userfaultfd_wp(dst_vma))
+		return true;
+
+	if (src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP))
+		return true;
+
+	if (src_vma->anon_vma)
+		return true;
+
+	/*
+	 * Don't copy ptes where a page fault will fill them correctly.  Fork
+	 * becomes much lighter when there are big shared or private readonly
+	 * mappings.  The tradeoff is that copy_page_range is more efficient
+	 * than faulting.
+	 */
+	return false;
+}
+
 int
 copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
@@ -1240,14 +1280,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 	bool is_cow;
 	int ret;
 
-	/*
-	 * Don't copy ptes where a page fault will fill them correctly.
-	 * Fork becomes much lighter when there are big shared or private
-	 * readonly mappings. The tradeoff is that copy_page_range is more
-	 * efficient than faulting.
-	 */
-	if (!(src_vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP)) &&
-	    !src_vma->anon_vma)
+	if (!vma_needs_copy(dst_vma, src_vma))
 		return 0;
 
 	if (is_vm_hugetlb_page(src_vma))

From patchwork Mon Nov 15 08:01:17 2021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton, Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse, Axel Rasmussen, Kirill A. Shutemov, David Hildenbrand, Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 11/23] mm/hugetlb: Introduce huge pte version of uffd-wp helpers
Date: Mon, 15 Nov 2021 16:01:17 +0800
Message-Id: <20211115080117.74699-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

These helpers will be used in follow-up patches to check, set, or clear the uffd-wp bit of a huge pte. So far they reuse all the small-pte helpers; archs can override these versions when necessary (with __HAVE_ARCH_HUGE_PTE_UFFD_WP* macros) in the future.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/s390/include/asm/hugetlb.h | 15 +++++++++++++++
 include/asm-generic/hugetlb.h   | 15 +++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index 60f9241e5e4a..19c4b4431d27 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -115,6 +115,21 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
 	return pte_modify(pte, newprot);
 }
 
+static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
+{
+	return pte;
+}
+
+static inline pte_t huge_pte_clear_uffd_wp(pte_t pte)
+{
+	return pte;
+}
+
+static inline int huge_pte_uffd_wp(pte_t pte)
+{
+	return 0;
+}
+
 static inline bool gigantic_page_runtime_supported(void)
 {
 	return true;
diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index f39cad20ffc6..896f341f614d 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -35,6 +35,21 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
 	return pte_modify(pte, newprot);
 }
 
+static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
+{
+	return pte_mkuffd_wp(pte);
+}
+
+static inline pte_t huge_pte_clear_uffd_wp(pte_t pte)
+{
+	return pte_clear_uffd_wp(pte);
+}
+
+static inline int huge_pte_uffd_wp(pte_t pte)
+{
+	return pte_uffd_wp(pte);
+}
+
 #ifndef __HAVE_ARCH_HUGE_PTE_CLEAR
 static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
 		    pte_t *ptep, unsigned long sz)

From patchwork Mon Nov 15 08:01:32 2021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton, Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse, Axel Rasmussen, Kirill A. Shutemov, David Hildenbrand, Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 12/23] mm/hugetlb: Hook page faults for uffd write protection
Date: Mon, 15 Nov 2021 16:01:32 +0800
Message-Id: <20211115080132.74754-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

Hook up hugetlbfs_fault() with the capability to handle userfaultfd-wp faults. We do this slightly earlier than hugetlb_cow() so that we can avoid taking some extra locks that we definitely don't need.
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e09159c957e3..3a10274b2e39 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5658,6 +5658,25 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
 		goto out_ptl;
 
+	/* Handle userfault-wp first, before trying to lock more pages */
+	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
+	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
+		struct vm_fault vmf = {
+			.vma = vma,
+			.address = haddr,
+			.flags = flags,
+		};
+
+		spin_unlock(ptl);
+		if (pagecache_page) {
+			unlock_page(pagecache_page);
+			put_page(pagecache_page);
+		}
+		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+		i_mmap_unlock_read(mapping);
+		return handle_userfault(&vmf, VM_UFFD_WP);
+	}
+
 	/*
 	 * hugetlb_cow() requires page locks of pte_page(entry) and
 	 * pagecache_page, so here we need take the former one

From patchwork Mon Nov 15 08:01:46 2021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton, Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse, Axel Rasmussen, Kirill A. Shutemov, David Hildenbrand, Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 13/23] mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
Date: Mon, 15 Nov 2021 16:01:46 +0800
Message-Id: <20211115080146.74812-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

Pass the wp_copy variable into hugetlb_mcopy_atomic_pte() throughout the stack. Apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is specified with UFFDIO_COPY.

Hugetlb pages are managed only by hugetlbfs, so we're safe even without setting the dirty bit in the huge pte when the page is installed read-only. However, we'd better still keep the dirty bit set for a read-only UFFDIO_COPY pte (when the UFFDIO_COPY_MODE_WP bit is set), not only to match what we do with shmem, but also because the page does contain dirty data that the kernel just copied from userspace.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/hugetlb.h |  6 ++++--
 mm/hugetlb.c            | 29 +++++++++++++++++++++++------
 mm/userfaultfd.c        | 14 +++++++++-----
 3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 00351ccb49a3..4da0c4b4159a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -160,7 +160,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				unsigned long dst_addr,
 				unsigned long src_addr,
 				enum mcopy_atomic_mode mode,
-				struct page **pagep);
+				struct page **pagep,
+				bool wp_copy);
 #endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
@@ -355,7 +356,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 						unsigned long dst_addr,
 						unsigned long src_addr,
 						enum mcopy_atomic_mode mode,
-						struct page **pagep)
+						struct page **pagep,
+						bool wp_copy)
 {
 	BUG();
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3a10274b2e39..8146240eefc6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5740,7 +5740,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
 			    enum mcopy_atomic_mode mode,
-			    struct page **pagep)
+			    struct page **pagep,
+			    bool wp_copy)
 {
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct hstate *h = hstate_vma(dst_vma);
@@ -5868,7 +5869,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release_unlock;
 
 	ret = -EEXIST;
-	if (!huge_pte_none(huge_ptep_get(dst_pte)))
+	/*
+	 * We allow overwriting a pte marker: consider the case when both
+	 * MISSING|WP are registered; we first wr-protect a none pte which
+	 * has no page cache page backing it, then access the page.
+	 */
+	if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
 		goto out_release_unlock;
 
 	if (vm_shared) {
@@ -5878,17 +5884,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
 	}
 
-	/* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
-	if (is_continue && !vm_shared)
+	/*
+	 * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
+	 * with wp flag set, don't set pte write bit.
+	 */
+	if (wp_copy || (is_continue && !vm_shared))
 		writable = 0;
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;
 
 	_dst_pte = make_huge_pte(dst_vma, page, writable);
-	if (writable)
-		_dst_pte = huge_pte_mkdirty(_dst_pte);
+	/*
+	 * Always mark UFFDIO_COPY page dirty; note that this may not be
+	 * extremely important for hugetlbfs for now since swapping is not
+	 * supported, but we should still be clear that this page cannot be
+	 * thrown away at will, even if the write bit is not set.
+	 */
+	_dst_pte = huge_pte_mkdirty(_dst_pte);
 	_dst_pte = pte_mkyoung(_dst_pte);
 
+	if (wp_copy)
+		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
+
 	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
 
 	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 95e5a9ba3196..6174a212c72f 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -291,7 +291,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      enum mcopy_atomic_mode mode)
+					      enum mcopy_atomic_mode mode,
+					      bool wp_copy)
 {
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	ssize_t err;
@@ -379,7 +380,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		}
 
 		if (mode != MCOPY_ATOMIC_CONTINUE &&
-		    !huge_pte_none(huge_ptep_get(dst_pte))) {
+		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
 			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
@@ -387,7 +388,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		}
 
 		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
-					       dst_addr, src_addr, mode, &page);
+					       dst_addr, src_addr, mode, &page,
+					       wp_copy);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -442,7 +444,8 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 				      unsigned long dst_start,
 				      unsigned long src_start,
 				      unsigned long len,
-				      enum mcopy_atomic_mode mode);
+				      enum mcopy_atomic_mode mode,
+				      bool wp_copy);
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -562,7 +565,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	 */
 	if (is_vm_hugetlb_page(dst_vma))
 		return  __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
-					       src_start, len, mcopy_mode);
+					       src_start, len, mcopy_mode,
+					       wp_copy);
 
 	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
 		goto out_unlock;

From patchwork Mon Nov 15 08:02:00 2021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton, Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse, Axel Rasmussen, Kirill A. Shutemov, David Hildenbrand, Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 14/23] mm/hugetlb: Handle UFFDIO_WRITEPROTECT
Date: Mon, 15 Nov 2021 16:02:00 +0800
Message-Id: <20211115080200.74866-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

This starts by passing cp_flags into hugetlb_change_protection() so that hugetlb can handle MM_CP_UFFD_WP[_RESOLVE] requests.

huge_pte_clear_uffd_wp() is introduced to handle the case where UFFDIO_WRITEPROTECT is requested upon migrating huge page entries.
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/hugetlb.h |  6 ++++--
 mm/hugetlb.c            | 13 ++++++++++++-
 mm/mprotect.c           |  3 ++-
 mm/userfaultfd.c        |  8 ++++++++
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 4da0c4b4159a..a46011510e49 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -210,7 +210,8 @@ struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
 int pmd_huge(pmd_t pmd);
 int pud_huge(pud_t pud);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
-		unsigned long address, unsigned long end, pgprot_t newprot);
+		unsigned long address, unsigned long end, pgprot_t newprot,
+		unsigned long cp_flags);
 
 bool is_hugetlb_entry_migration(pte_t pte);
 void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
@@ -391,7 +392,8 @@ static inline void move_hugetlb_state(struct page *oldpage,
 
 static inline unsigned long hugetlb_change_protection(
 			struct vm_area_struct *vma, unsigned long address,
-			unsigned long end, pgprot_t newprot)
+			unsigned long end, pgprot_t newprot,
+			unsigned long cp_flags)
 {
 	return 0;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8146240eefc6..7fc213c0ebf8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6127,7 +6127,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
-		unsigned long address, unsigned long end, pgprot_t newprot)
+		unsigned long address, unsigned long end,
+		pgprot_t newprot, unsigned long cp_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long start = address;
@@ -6137,6 +6138,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	unsigned long pages = 0;
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
+	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
+	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
 
 	/*
 	 * In the case of shared PMDs, the area to flush could be beyond
@@ -6178,6 +6181,10 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				entry = make_readable_migration_entry(
 							swp_offset(entry));
 				newpte = swp_entry_to_pte(entry);
+				if (uffd_wp)
+					newpte = pte_swp_mkuffd_wp(newpte);
+				else if (uffd_wp_resolve)
+					newpte = pte_swp_clear_uffd_wp(newpte);
 				set_huge_swap_pte_at(mm, address, ptep,
 						     newpte, huge_page_size(h));
 				pages++;
@@ -6192,6 +6199,10 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
 		pte = pte_mkhuge(huge_pte_modify(old_pte, newprot));
 		pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
+		if (uffd_wp)
+			pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte));
+		else if (uffd_wp_resolve)
+			pte = huge_pte_clear_uffd_wp(pte);
 		huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 		pages++;
 	}
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 0d4bf755cee8..1cc4a6d1886b 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -441,7 +441,8 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
 	BUG_ON((cp_flags & MM_CP_UFFD_WP_ALL) == MM_CP_UFFD_WP_ALL);
 
 	if (is_vm_hugetlb_page(vma))
-		pages = hugetlb_change_protection(vma, start, end, newprot);
+		pages = hugetlb_change_protection(vma, start, end, newprot,
+						  cp_flags);
 	else
 		pages = change_protection_range(vma, start, end, newprot,
 						cp_flags);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 6174a212c72f..037f82719e64 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -690,6 +690,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 			atomic_t *mmap_changing)
 {
 	struct vm_area_struct *dst_vma;
+	unsigned long page_mask;
 	pgprot_t newprot;
 	int err;
 
@@ -726,6 +727,13 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	if (!vma_is_anonymous(dst_vma))
 		goto out_unlock;
 
+	if (is_vm_hugetlb_page(dst_vma)) {
+		err = -EINVAL;
+		page_mask = vma_kernel_pagesize(dst_vma) - 1;
+		if ((start & page_mask) || (len & page_mask))
+			goto out_unlock;
+	}
+
 	if (enable_wp)
 		newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
 	else
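The hugetlb range check added to mwriteprotect_range() above is plain mask arithmetic: both the start address and the length must be multiples of the huge page size. A standalone sketch (vma_kernel_pagesize() is mocked as a plain parameter here; the 2MB size used below is just an example value):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the alignment check: a UFFDIO_WRITEPROTECT range over a
 * hugetlb VMA is valid only if both start and len are aligned to the
 * VMA's huge page size (which is always a power of two).
 */
static bool hugetlb_range_ok(unsigned long start, unsigned long len,
			     unsigned long pagesize)
{
	unsigned long page_mask = pagesize - 1;

	/* Any low bits set in start or len means misalignment -> -EINVAL. */
	return !((start & page_mask) || (len & page_mask));
}
```

The `pagesize - 1` trick only works because huge page sizes are powers of two, which makes the mask test equivalent to (but cheaper than) a modulo check.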
From patchwork Mon Nov 15 08:02:14 2021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton,
    Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse,
    Axel Rasmussen, "Kirill A. Shutemov", David Hildenbrand,
    Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 15/23] mm/hugetlb: Handle pte markers in page faults
Date: Mon, 15 Nov 2021 16:02:14 +0800
Message-Id: <20211115080214.74926-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

Allow hugetlb code to handle pte markers just like none ptes.

Most of the support is already there; we just need to make sure we don't
assume hugetlb_no_page() only handles none ptes, so when detecting a pte
change we should use pte_same() rather than pte_none().  That requires
passing in the old_pte to do the comparison.

Check the original pte to see whether it's a pte marker; if it is, we
should recover the uffd-wp bit on the new pte to be installed, so that
the next write will be trapped by uffd.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7fc213c0ebf8..e8d01277af0f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5361,7 +5361,8 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
-			unsigned long address, pte_t *ptep, unsigned int flags)
+			unsigned long address, pte_t *ptep,
+			pte_t old_pte, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -5487,7 +5488,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 
 	ptl = huge_pte_lock(h, mm, ptep);
 	ret = 0;
-	if (!huge_pte_none(huge_ptep_get(ptep)))
+	/* If pte changed from under us, retry */
+	if (!pte_same(huge_ptep_get(ptep), old_pte))
 		goto backout;
 
 	if (anon_rmap) {
@@ -5497,6 +5499,12 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		page_dup_rmap(page, true);
 	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
+	/*
+	 * If this pte was previously wr-protected, keep it wr-protected even
+	 * if populated.
+	 */
+	if (unlikely(is_pte_marker_uffd_wp(old_pte)))
+		new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
 	set_huge_pte_at(mm, haddr, ptep, new_pte);
 
 	hugetlb_count_add(pages_per_huge_page(h), mm);
@@ -5614,8 +5622,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 	entry = huge_ptep_get(ptep);
-	if (huge_pte_none(entry)) {
-		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags);
+	/* PTE markers should be handled the same way as none pte */
+	if (huge_pte_none_mostly(entry)) {
+		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
+				      entry, flags);
 		goto out_mutex;
 	}
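Why pte_same() instead of pte_none() matters here can be shown with a toy model (illustration only — mock ptes are plain integers, 0 standing for a none pte and non-zero values for markers or present entries): a pte_none()-style re-check would wrongly treat an unchanged uffd-wp marker as "pte changed" and back out forever, while pte_same() against the old_pte snapshot only backs out on a real concurrent change:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock pte values: 0 = none, 2 = uffd-wp marker, others = present. */
static bool pte_none_mock(uint64_t pte) { return pte == 0; }
static bool pte_same_mock(uint64_t a, uint64_t b) { return a == b; }

/* Old check: back out unless the slot is still a none pte. */
static bool backout_old(uint64_t current_pte)
{
	return !pte_none_mock(current_pte);
}

/* New check: back out only if the pte changed from the snapshot. */
static bool backout_new(uint64_t current_pte, uint64_t old_pte)
{
	return !pte_same_mock(current_pte, old_pte);
}
```

With a marker snapshot (2) still in place, the old check would back out while the new one correctly proceeds; both agree once another thread really did install a present pte.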
From patchwork Mon Nov 15 08:02:28 2021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton,
    Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse,
    Axel Rasmussen, "Kirill A. Shutemov", David Hildenbrand,
    Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 16/23] mm/hugetlb: Allow uffd wr-protect none ptes
Date: Mon, 15 Nov 2021 16:02:28 +0800
Message-Id: <20211115080228.74982-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

Teach hugetlbfs code to wr-protect none ptes in case the page cache
exists for that pte.  Meanwhile we also need to be able to recognize a
uffd-wp marker pte and remove it for uffd_wp_resolve.

While at it, introduce a variable "psize" to replace all references to
the huge page size fetcher.

Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e8d01277af0f..bba2ede5f6dc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6145,7 +6145,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	pte_t *ptep;
 	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
-	unsigned long pages = 0;
+	unsigned long pages = 0, psize = huge_page_size(h);
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
@@ -6165,13 +6165,19 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	mmu_notifier_invalidate_range_start(&range);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
-	for (; address < end; address += huge_page_size(h)) {
+	for (; address < end; address += psize) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address, huge_page_size(h));
+		ptep = huge_pte_offset(mm, address, psize);
 		if (!ptep)
 			continue;
 		ptl = huge_pte_lock(h, mm, ptep);
 		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+			/*
+			 * When uffd-wp is enabled on the vma, unshare
+			 * shouldn't happen at all.  Warn about it if it
+			 * happened due to some reason.
+			 */
+			WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
 			pages++;
 			spin_unlock(ptl);
 			shared_pmd = true;
@@ -6196,12 +6202,20 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				else if (uffd_wp_resolve)
 					newpte = pte_swp_clear_uffd_wp(newpte);
 				set_huge_swap_pte_at(mm, address, ptep,
-						     newpte, huge_page_size(h));
+						     newpte, psize);
 				pages++;
 			}
 			spin_unlock(ptl);
 			continue;
 		}
+		if (unlikely(is_pte_marker_uffd_wp(pte))) {
+			/*
+			 * This is changing a non-present pte into a none pte,
+			 * no need for huge_ptep_modify_prot_start/commit().
+			 */
+			if (uffd_wp_resolve)
+				huge_pte_clear(mm, address, ptep, psize);
+		}
 		if (!huge_pte_none(pte)) {
 			pte_t old_pte;
 			unsigned int shift = huge_page_shift(hstate_vma(vma));
@@ -6215,6 +6229,12 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				pte = huge_pte_clear_uffd_wp(pte);
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
+		} else {
+			/* None pte */
+			if (unlikely(uffd_wp))
+				/* Safe to modify directly (none->non-present). */
+				set_huge_pte_at(mm, address, ptep,
+						make_pte_marker(PTE_MARKER_UFFD_WP));
 		}
 		spin_unlock(ptl);
 	}
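The none-pte and marker-pte handling this patch adds can be summarized as a small decision table (illustration only — the enum and function below are mock names, not kernel APIs): a uffd-wp marker gets cleared on resolve, a true none pte gets a marker installed on wr-protect, and every other combination is left alone:

```c
#include <assert.h>
#include <stdbool.h>

/* Mock action names for the sketch; not kernel identifiers. */
enum marker_action { NO_CHANGE, INSTALL_MARKER, CLEAR_MARKER };

/*
 * Decision for a pte that is "none or a uffd-wp marker" inside
 * hugetlb_change_protection(), mirroring the hunks above:
 *  - marker + UFFD_WP_RESOLVE  -> huge_pte_clear()
 *  - none   + UFFD_WP          -> make_pte_marker(PTE_MARKER_UFFD_WP)
 *  - anything else             -> leave untouched
 */
static enum marker_action none_pte_action(bool is_uffd_wp_marker,
					  bool uffd_wp, bool uffd_wp_resolve)
{
	if (is_uffd_wp_marker && uffd_wp_resolve)
		return CLEAR_MARKER;
	if (!is_uffd_wp_marker && uffd_wp)
		return INSTALL_MARKER;
	return NO_CHANGE;
}
```

A marker pte under a plain wr-protect request is already protected, so it correctly falls through to NO_CHANGE.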
From patchwork Mon Nov 15 08:02:43 2021
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton,
    Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse,
    Axel Rasmussen, "Kirill A. Shutemov", David Hildenbrand,
    Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 17/23] mm/hugetlb: Only drop uffd-wp special pte if required
Date: Mon, 15 Nov 2021 16:02:43 +0800
Message-Id: <20211115080243.75040-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

As with shmem uffd-wp special ptes, only drop the uffd-wp special swap
pte if unmapping an entire vma or synchronized such that faults cannot
race with the unmap operation.  This requires passing zap_flags all the
way to the lowest level hugetlb unmap routine: __unmap_hugepage_range.

In general, unmap calls originating in hugetlbfs code will pass the
ZAP_FLAG_DROP_MARKER flag as synchronization is in place to prevent
faults.  The exception is hole punch, which will first unmap without any
synchronization.  Later, when hole punch actually removes the page from
the file, it will check to see if there was a subsequent fault and if so
take the hugetlb fault mutex while unmapping again.  This second unmap
will pass in ZAP_FLAG_DROP_MARKER.

The justification for "whether to apply the ZAP_FLAG_DROP_MARKER flag
when unmapping a hugetlb range" is (IMHO): we should never reach a state
where a page fault could erroneously fault in a page-cache page that was
wr-protected as writable, even for an extremely short period.  That could
happen if e.g. we passed ZAP_FLAG_DROP_MARKER when hugetlbfs_punch_hole()
calls hugetlb_vmdelete_list(): if a page faults after that call and
before remove_inode_hugepages() is executed, the page cache can be mapped
writable again in that small racy window, which can cause unexpected
data to be overwritten.

Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/hugetlbfs/inode.c    | 15 +++++++++------
 include/linux/hugetlb.h |  8 +++++---
 mm/hugetlb.c            | 33 +++++++++++++++++++++++--------
 mm/memory.c             |  5 ++++-
 4 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 49d2e686be74..92c8d1a47404 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -404,7 +404,8 @@ static void remove_huge_page(struct page *page)
 }
 
 static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
+		      unsigned long zap_flags)
 {
 	struct vm_area_struct *vma;
 
@@ -437,7 +438,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
 		}
 
 		unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
-				     NULL);
+				     NULL, zap_flags);
 	}
 }
 
@@ -515,7 +516,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_vmdelete_list(&mapping->i_mmap,
 				index * pages_per_huge_page(h),
-				(index + 1) * pages_per_huge_page(h));
+				(index + 1) * pages_per_huge_page(h),
+				ZAP_FLAG_DROP_MARKER);
 			i_mmap_unlock_write(mapping);
 		}
 
@@ -581,7 +583,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
-		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
+		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+				      ZAP_FLAG_DROP_MARKER);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
 }
@@ -614,8 +617,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		i_mmap_lock_write(mapping);
 		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 			hugetlb_vmdelete_list(&mapping->i_mmap,
-					      hole_start >> PAGE_SHIFT,
-					      hole_end >> PAGE_SHIFT);
+					      hole_start >> PAGE_SHIFT,
+					      hole_end >> PAGE_SHIFT, 0);
 		i_mmap_unlock_write(mapping);
 		remove_inode_hugepages(inode, hole_start, hole_end);
 		inode_unlock(inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a46011510e49..4c3ea7ee8ce8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -143,11 +143,12 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 unsigned long *, unsigned long *, long, unsigned int,
 			 int *);
 void unmap_hugepage_range(struct vm_area_struct *,
-			  unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *,
+			  unsigned long);
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
-			  struct page *ref_page);
+			  struct page *ref_page, unsigned long zap_flags);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(char *buf, int len, int nid);
 void hugetlb_show_meminfo(void);
@@ -400,7 +401,8 @@ static inline unsigned long hugetlb_change_protection(
 
 static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
+			unsigned long end, struct page *ref_page,
+			unsigned long zap_flags)
 {
 	BUG();
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bba2ede5f6dc..16fb9cd8d9c5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4926,7 +4926,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 
 static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end,
-				   struct page *ref_page)
+				   struct page *ref_page, unsigned long zap_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -4983,7 +4983,18 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 		 * unmapped and its refcount is dropped, so just clear pte here.
 		 */
 		if (unlikely(!pte_present(pte))) {
-			huge_pte_clear(mm, address, ptep, sz);
+			/*
+			 * If the pte was wr-protected by uffd-wp in any of the
+			 * swap forms, meanwhile the caller does not want to
+			 * drop the uffd-wp bit in this zap, then replace the
+			 * pte with a marker.
+			 */
+			if (pte_swp_uffd_wp_any(pte) &&
+			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
+				set_huge_pte_at(mm, address, ptep,
+						make_pte_marker(PTE_MARKER_UFFD_WP));
+			else
+				huge_pte_clear(mm, address, ptep, sz);
 			spin_unlock(ptl);
 			continue;
 		}
@@ -5011,7 +5022,11 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
 		if (huge_pte_dirty(pte))
 			set_page_dirty(page);
-
+		/* Leave a uffd-wp pte marker if needed */
+		if (huge_pte_uffd_wp(pte) &&
+		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
+			set_huge_pte_at(mm, address, ptep,
+					make_pte_marker(PTE_MARKER_UFFD_WP));
 		hugetlb_count_sub(pages_per_huge_page(h), mm);
 		page_remove_rmap(page, true);
 
@@ -5029,9 +5044,10 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
-	__unmap_hugepage_range(tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
 	 * Clear this flag so that x86's huge_pmd_share page_table_shareable
@@ -5047,12 +5063,13 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
 	struct mmu_gather tlb;
 
 	tlb_gather_mmu(&tlb, vma->vm_mm);
-	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);
 	tlb_finish_mmu(&tlb);
 }
 
@@ -5107,7 +5124,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
 			unmap_hugepage_range(iter_vma, address,
-					     address + huge_page_size(h), page);
+					     address + huge_page_size(h), page, 0);
 	}
 	i_mmap_unlock_write(mapping);
 }
diff --git a/mm/memory.c b/mm/memory.c
index cc625c616645..69a73d47513b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1631,8 +1631,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * safe to do nothing in this case.
 			 */
 			if (vma->vm_file) {
+				unsigned long zap_flags = details ?
+				    details->zap_flags : 0;
 				i_mmap_lock_write(vma->vm_file->f_mapping);
-				__unmap_hugepage_range_final(tlb, vma, start, end, NULL);
+				__unmap_hugepage_range_final(tlb, vma, start, end,
+							     NULL, zap_flags);
 				i_mmap_unlock_write(vma->vm_file->f_mapping);
 			}
 		} else
(us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf28.hostedemail.com (Postfix) with ESMTP id 028DB90000A8 for ; Mon, 15 Nov 2021 08:03:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1636963392; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nlsVhTui5YYYuoTB5Im50dnZjt4bHIyLERdEUNhIhqY=; b=PfGEQv8llbvWbcM86SxHhWFgMQeqBruUpJixIhCe3ss2G16dO22XOTA1ocyER0WlU0BO2h rUI+QynwumTsWk3xI+GpItwXaJLdCqc+GmBzDblaJG0yP09IfAnQIXAMA/8bqb6Ztj9O9I TNK+UFaYfiwrSoXFoeDp4gPbfZo+mWM= Received: from mail-pj1-f69.google.com (mail-pj1-f69.google.com [209.85.216.69]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-218-ngeOskSnMRacac74Z7TKGQ-1; Mon, 15 Nov 2021 03:03:11 -0500 X-MC-Unique: ngeOskSnMRacac74Z7TKGQ-1 Received: by mail-pj1-f69.google.com with SMTP id a12-20020a17090aa50cb0290178fef5c227so3666959pjq.1 for ; Mon, 15 Nov 2021 00:03:11 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=nlsVhTui5YYYuoTB5Im50dnZjt4bHIyLERdEUNhIhqY=; b=BfWYeZ8ntKloq0kf14n56zRnsaavy049LxX8rueO8+UXXaQYijRFfJNHMmTyhZyMTS RMHRA+/xy1WIzQxAvsBfuVfW+VabPhuALh+3+kktISsvA4iQrRsRvvTGOB3rZe1qsJlo D0ab+2z/g/72Fr9izbzIRkby60jF2EW0CrgCuth2b9aE+XS8ziBhs3UGTIeipTFxTqx0 0retkQfTSlq3TrV6E1asSRo1A8MAUgCgV+Lpw+fbdwaQKE2Ypnzb0K3GSMv8Gd28MM8l QfFQC2jSATNrIUTGGzOq6eEh289GOArCRbZvtokgVhUalUprhv2e4908mwbP9VUx/9c2 Qp1Q== X-Gm-Message-State: AOAM531uUFfDouxzfY6vUAZuoHc/iv3Nj3fNuu9lRhOJri1KZvUFBjga mp0djx0tPTyFmNy47LhO+SrJxCgx6JtRpozZoqS76y67FrvQZ1ZYbIsxAQ8XV5wND9fStOOp8te 4xhpfR0cGkwtsEYQG/D5C9JeQpdVrCKS3rMaf2wGiug3vgmc9z7AC9X9VFqdX X-Received: by 2002:a17:90b:3849:: with SMTP id 
nl9mr63063534pjb.145.1636963389900; Mon, 15 Nov 2021 00:03:09 -0800 (PST) X-Google-Smtp-Source: ABdhPJzQjMvgZ1MCPpUCZR/YQLBlvTIXbGd+IIXURYXaaXpubOwQw7QvuE9XXYAbw3RiQ73tuH1vFg== X-Received: by 2002:a17:90b:3849:: with SMTP id nl9mr63063457pjb.145.1636963389453; Mon, 15 Nov 2021 00:03:09 -0800 (PST) Received: from localhost.localdomain ([94.177.118.89]) by smtp.gmail.com with ESMTPSA id x20sm11712603pjp.48.2021.11.15.00.03.01 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 15 Nov 2021 00:03:08 -0800 (PST) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Nadav Amit , peterx@redhat.com, Alistair Popple , Andrew Morton , Mike Kravetz , Mike Rapoport , Matthew Wilcox , Jerome Glisse , Axel Rasmussen , "Kirill A . Shutemov" , David Hildenbrand , Andrea Arcangeli , Hugh Dickins Subject: [PATCH v6 18/23] mm/hugetlb: Handle uffd-wp during fork() Date: Mon, 15 Nov 2021 16:02:56 +0800 Message-Id: <20211115080256.75095-1-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20211115075522.73795-1-peterx@redhat.com> References: <20211115075522.73795-1-peterx@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 028DB90000A8 X-Stat-Signature: q1ky66rayudw8gwcdxh4arhtwu8agojq Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=PfGEQv8l; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf28.hostedemail.com: domain of peterx@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=peterx@redhat.com X-HE-Tag: 1636963392-309872 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Firstly, we'll need to pass in dst_vma into copy_hugetlb_page_range() because for uffd-wp it's the dst vma that matters on deciding how we should treat uffd-wp protected ptes. 
We should recognize pte markers during fork and do the pte copy if needed.

Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h |  7 +++++--
 mm/hugetlb.c            | 41 +++++++++++++++++++++++++++--------------
 mm/memory.c             |  2 +-
 3 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 4c3ea7ee8ce8..6935b02f1081 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -137,7 +137,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
			     struct vm_area_struct *new_vma,
			     unsigned long old_addr, unsigned long new_addr,
			     unsigned long len);
-int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
+int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
+			    struct vm_area_struct *, struct vm_area_struct *);
 long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
			 struct page **, struct vm_area_struct **,
			 unsigned long *, unsigned long *, long, unsigned int,
@@ -268,7 +269,9 @@ static inline struct page *follow_huge_addr(struct mm_struct *mm,
 }

 static inline int copy_hugetlb_page_range(struct mm_struct *dst,
-			struct mm_struct *src, struct vm_area_struct *vma)
+					  struct mm_struct *src,
+					  struct vm_area_struct *dst_vma,
+					  struct vm_area_struct *src_vma)
 {
	BUG();
	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 16fb9cd8d9c5..cf9a0e8c32ba 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4690,23 +4690,24 @@ hugetlb_install_page(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr
 }

 int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
-			    struct vm_area_struct *vma)
+			    struct vm_area_struct *dst_vma,
+			    struct vm_area_struct *src_vma)
 {
	pte_t *src_pte, *dst_pte, entry, dst_entry;
	struct page *ptepage;
	unsigned long addr;
-	bool cow = is_cow_mapping(vma->vm_flags);
-	struct hstate *h = hstate_vma(vma);
+	bool cow = is_cow_mapping(src_vma->vm_flags);
+	struct hstate *h = hstate_vma(src_vma);
	unsigned long sz = huge_page_size(h);
	unsigned long npages = pages_per_huge_page(h);
-	struct address_space *mapping = vma->vm_file->f_mapping;
+	struct address_space *mapping = src_vma->vm_file->f_mapping;
	struct mmu_notifier_range range;
	int ret = 0;

	if (cow) {
-		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, src,
-					vma->vm_start,
-					vma->vm_end);
+		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, src_vma, src,
+					src_vma->vm_start,
+					src_vma->vm_end);
		mmu_notifier_invalidate_range_start(&range);
	} else {
		/*
@@ -4718,12 +4719,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
		i_mmap_lock_read(mapping);
	}

-	for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
+	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
		spinlock_t *src_ptl, *dst_ptl;
		src_pte = huge_pte_offset(src, addr, sz);
		if (!src_pte)
			continue;
-		dst_pte = huge_pte_alloc(dst, vma, addr, sz);
+		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
		if (!dst_pte) {
			ret = -ENOMEM;
			break;
@@ -4758,6 +4759,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
		} else if (unlikely(is_hugetlb_entry_migration(entry) ||
				    is_hugetlb_entry_hwpoisoned(entry))) {
			swp_entry_t swp_entry = pte_to_swp_entry(entry);
+			bool uffd_wp = huge_pte_uffd_wp(entry);

			if (is_writable_migration_entry(swp_entry) && cow) {
				/*
@@ -4767,10 +4769,21 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
				swp_entry = make_readable_migration_entry(
							swp_offset(swp_entry));
				entry = swp_entry_to_pte(swp_entry);
+				if (userfaultfd_wp(src_vma) && uffd_wp)
+					entry = huge_pte_mkuffd_wp(entry);
				set_huge_swap_pte_at(src, addr, src_pte,
						     entry, sz);
			}
+			if (!userfaultfd_wp(dst_vma) && uffd_wp)
+				entry = huge_pte_clear_uffd_wp(entry);
			set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz);
+		} else if (unlikely(is_pte_marker(entry))) {
+			/*
+			 * We copy the pte marker only if the dst vma has
+			 * uffd-wp enabled.
+			 */
+			if (userfaultfd_wp(dst_vma))
+				set_huge_pte_at(dst, addr, dst_pte, entry);
		} else {
			entry = huge_ptep_get(src_pte);
			ptepage = pte_page(entry);
@@ -4785,20 +4798,20 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
			 * need to be without the pgtable locks since we could
			 * sleep during the process.
			 */
-			if (unlikely(page_needs_cow_for_dma(vma, ptepage))) {
+			if (unlikely(page_needs_cow_for_dma(src_vma, ptepage))) {
				pte_t src_pte_old = entry;
				struct page *new;

				spin_unlock(src_ptl);
				spin_unlock(dst_ptl);
				/* Do not use reserve as it's private owned */
-				new = alloc_huge_page(vma, addr, 1);
+				new = alloc_huge_page(dst_vma, addr, 1);
				if (IS_ERR(new)) {
					put_page(ptepage);
					ret = PTR_ERR(new);
					break;
				}
-				copy_user_huge_page(new, ptepage, addr, vma,
+				copy_user_huge_page(new, ptepage, addr, dst_vma,
						    npages);
				put_page(ptepage);

@@ -4808,13 +4821,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
				spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
				entry = huge_ptep_get(src_pte);
				if (!pte_same(src_pte_old, entry)) {
-					restore_reserve_on_error(h, vma, addr,
+					restore_reserve_on_error(h, dst_vma, addr,
								 new);
					put_page(new);
					/* dst_entry won't change as in child */
					goto again;
				}
-				hugetlb_install_page(vma, dst_pte, addr, new);
+				hugetlb_install_page(dst_vma, dst_pte, addr, new);
				spin_unlock(src_ptl);
				spin_unlock(dst_ptl);
				continue;
diff --git a/mm/memory.c b/mm/memory.c
index 69a73d47513b..89715d1ec956 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1284,7 +1284,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
		return 0;

	if (is_vm_hugetlb_page(src_vma))
-		return copy_hugetlb_page_range(dst_mm, src_mm, src_vma);
+		return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma);

	if (unlikely(src_vma->vm_flags & VM_PFNMAP)) {
		/*

From patchwork Mon Nov 15 08:03:10 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12618895
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton,
 Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse,
 Axel Rasmussen, "Kirill A . Shutemov", David Hildenbrand,
 Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 19/23] mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered
Date: Mon, 15 Nov 2021 16:03:10 +0800
Message-Id: <20211115080310.75154-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

When we're trying to collapse a 2M huge shmem page, don't retract the
pgtable pmd page if it's registered with uffd-wp, because that pgtable
could have pte markers installed.  Recycling that pgtable would mean
losing the pte markers, which could cause data loss for an uffd-wp
enabled application on shmem.

Instead of disabling khugepaged on these files, simply skip retracting
these special VMAs.  The page cache can still be merged into a huge thp,
and other mms/vmas can still map the range of the file with a huge thp
when appropriate.
Note that checking VM_UFFD_WP needs to be done with mmap_sem held for
write; that avoids races like:

             khugepaged                    user thread
             ==========                    ===========
     check VM_UFFD_WP, not set
                                  UFFDIO_REGISTER with uffd-wp on shmem
                                  wr-protect some pages (install markers)
     take mmap_sem write lock
     erase pmd and free pmd page
     --> pte markers are dropped unnoticed!

Signed-off-by: Peter Xu
---
 mm/khugepaged.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e99101162f1a..9c75153a36de 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1454,6 +1454,10 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
		return;

+	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
+	if (userfaultfd_wp(vma))
+		return;
+
	hpage = find_lock_page(vma->vm_file->f_mapping,
			       linear_page_index(vma, haddr));
	if (!hpage)
@@ -1594,7 +1598,15 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
		 * reverse order. Trylock is a way to avoid deadlock.
		 */
		if (mmap_write_trylock(mm)) {
-			if (!khugepaged_test_exit(mm)) {
+			/*
+			 * When a vma is registered with uffd-wp, we can't
+			 * recycle the pmd pgtable because there can be pte
+			 * markers installed.  Skip it only, so the rest mm/vma
+			 * can still have the same file mapped hugely, however
+			 * it'll always be mapped in small page size for uffd-wp
+			 * registered ranges.
+			 */
+			if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma)) {
				spinlock_t *ptl = pmd_lock(mm, pmd);
				/* assume page table is clear */
				_pmd = pmdp_collapse_flush(vma, addr, pmd);

From patchwork Mon Nov 15 08:03:23 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12618897
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton,
 Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse,
 Axel Rasmussen, "Kirill A . Shutemov", David Hildenbrand,
 Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 20/23] mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs
Date: Mon, 15 Nov 2021 16:03:23 +0800
Message-Id: <20211115080323.75209-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

This requires the pagemap code to be able to recognize the newly
introduced swap special pte for uffd-wp, as well as the general hugetlb
case that we recently started to support.
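As a quick illustration (not part of the patch itself), a userspace checker
could read the new bit from /proc/self/pagemap roughly as in the sketch
below.  It assumes PM_UFFD_WP is reported in bit 57 of each 64-bit pagemap
entry (next to the existing present bit 63 and swap bit 62), which is only
meaningful on a kernel carrying this series; the helper names are mine:

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit positions in a 64-bit pagemap entry (fs/proc/task_mmu.c). */
#define PM_UFFD_WP_BIT  57	/* introduced by this series (assumption) */
#define PM_SWAP_BIT     62
#define PM_PRESENT_BIT  63

static int pm_present(uint64_t ent) { return (int)((ent >> PM_PRESENT_BIT) & 1); }
static int pm_uffd_wp(uint64_t ent) { return (int)((ent >> PM_UFFD_WP_BIT) & 1); }

/* Read the pagemap entry covering @addr in the current process (0 on error). */
static uint64_t pagemap_entry(void *addr)
{
	uint64_t ent = 0;
	long psz = sysconf(_SC_PAGESIZE);
	off_t off = (off_t)((uintptr_t)addr / (uintptr_t)psz) * sizeof(uint64_t);
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd >= 0) {
		pread(fd, &ent, sizeof(ent), off);
		close(fd);
	}
	return ent;
}
```

A monitor could call pagemap_entry() on a wr-protected shmem or hugetlbfs
page and check pm_uffd_wp() to verify that the protection survived a zap
or swap-out.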
It should make pagemap uffd-wp support complete.

Signed-off-by: Peter Xu
---
 fs/proc/task_mmu.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ad667dbc96f5..5d2f73b2e63d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1390,6 +1390,12 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
			flags |= PM_SWAP;
		if (is_pfn_swap_entry(entry))
			page = pfn_swap_entry_to_page(entry);
+		if (is_pte_marker_entry(entry)) {
+			pte_marker marker = pte_marker_get(entry);
+
+			if (marker & PTE_MARKER_UFFD_WP)
+				flags |= PM_UFFD_WP;
+		}
	}

	if (page && !PageAnon(page))
@@ -1523,10 +1529,15 @@ static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask,
		if (page_mapcount(page) == 1)
			flags |= PM_MMAP_EXCLUSIVE;

+		if (huge_pte_uffd_wp(pte))
+			flags |= PM_UFFD_WP;
+
		flags |= PM_PRESENT;
		if (pm->show_pfn)
			frame = pte_pfn(pte) +
				((addr & ~hmask) >> PAGE_SHIFT);
+	} else if (pte_swp_uffd_wp_any(pte)) {
+		flags |= PM_UFFD_WP;
	}

	for (; addr != end; addr += PAGE_SIZE) {

From patchwork Mon Nov 15 08:03:38 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12618899
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton,
 Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse,
 Axel Rasmussen, "Kirill A . Shutemov", David Hildenbrand,
 Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 21/23] mm/uffd: Enable write protection for shmem & hugetlbfs
Date: Mon, 15 Nov 2021 16:03:38 +0800
Message-Id: <20211115080338.75264-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

We've had all the necessary changes ready for both shmem and hugetlbfs.
Turn on all the shmem/hugetlbfs switches for userfaultfd-wp.

We can expand UFFD_API_RANGE_IOCTLS_BASIC with _UFFDIO_WRITEPROTECT too
because all existing types now support write protection mode.

Since vma_can_userfault() will be used elsewhere, move it into
userfaultfd_k.h.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c                 | 21 ++-------------------
 include/linux/userfaultfd_k.h    | 12 ++++++++++++
 include/uapi/linux/userfaultfd.h | 10 ++++++++--
 mm/userfaultfd.c                 |  9 +++------
 4 files changed, 25 insertions(+), 27 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index fa24c72a849e..b74cad206d0a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1253,24 +1253,6 @@ static __always_inline int validate_range(struct mm_struct *mm,
 	return 0;
 }
 
-static inline bool vma_can_userfault(struct vm_area_struct *vma,
-				     unsigned long vm_flags)
-{
-	/* FIXME: add WP support to hugetlbfs and shmem */
-	if (vm_flags & VM_UFFD_WP) {
-		if (is_vm_hugetlb_page(vma) || vma_is_shmem(vma))
-			return false;
-	}
-
-	if (vm_flags & VM_UFFD_MINOR) {
-		if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
-			return false;
-	}
-
-	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-	       vma_is_shmem(vma);
-}
-
 static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 				unsigned long arg)
 {
@@ -1949,7 +1931,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
 #endif
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
-	uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP;
+	uffdio_api.features &=
+		~(UFFD_FEATURE_PAGEFAULT_FLAG_WP | UFFD_FEATURE_WP_HUGETLBFS_SHMEM);
 #endif
 	uffdio_api.ioctls = UFFD_API_IOCTLS;
 	ret = -EFAULT;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 05cec02140cb..ef9b70f6447e 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 
 /* The set of all possible UFFD-related VM flags. */
 #define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
@@ -140,6 +141,17 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 	return vma->vm_flags & __VM_UFFD_FLAGS;
 }
 
+static inline bool vma_can_userfault(struct vm_area_struct *vma,
+				     unsigned long vm_flags)
+{
+	if (vm_flags & VM_UFFD_MINOR)
+		return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);
+
+	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
+	       vma_is_shmem(vma);
+}
+
+
 extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
 extern void dup_userfaultfd_complete(struct list_head *);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 05b31d60acf6..a67b5185a7a9 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -32,7 +32,8 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
-			   UFFD_FEATURE_MINOR_SHMEM)
+			   UFFD_FEATURE_MINOR_SHMEM |		\
+			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -46,7 +47,8 @@
 #define UFFD_API_RANGE_IOCTLS_BASIC		\
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
-	 (__u64)1 << _UFFDIO_CONTINUE)
+	 (__u64)1 << _UFFDIO_CONTINUE |		\
+	 (__u64)1 << _UFFDIO_WRITEPROTECT)
 
 /*
  * Valid ioctl command number range with this API is from 0x00 to
@@ -189,6 +191,9 @@ struct uffdio_api {
  *
  * UFFD_FEATURE_MINOR_SHMEM indicates the same support as
  * UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
+ *
+ * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
+ * write-protection mode is supported on both shmem and hugetlbfs.
  */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -201,6 +206,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
+#define UFFD_FEATURE_WP_HUGETLBFS_SHMEM	(1<<11)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 037f82719e64..6d8cd9f6b8a1 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -716,15 +716,12 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	err = -ENOENT;
 	dst_vma = find_dst_vma(dst_mm, start, len);
-	/*
-	 * Make sure the vma is not shared, that the dst range is
-	 * both valid and fully within a single existing vma.
-	 */
-	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
+
+	if (!dst_vma)
 		goto out_unlock;
 	if (!userfaultfd_wp(dst_vma))
 		goto out_unlock;
-	if (!vma_is_anonymous(dst_vma))
+	if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
 		goto out_unlock;
 
 	if (is_vm_hugetlb_page(dst_vma)) {

From patchwork Mon Nov 15 08:03:53 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12618901
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton,
    Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse,
    Axel Rasmussen, "Kirill A. Shutemov", David Hildenbrand,
    Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 22/23] mm: Enable PTE markers by default
Date: Mon, 15 Nov 2021 16:03:53 +0800
Message-Id: <20211115080353.75322-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

Enable PTE markers by default. On x86_64 this also auto-enables
PTE_MARKER_UFFD_WP.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index f01c8e0afadf..401e4dff5f42 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -898,7 +898,7 @@ config SECRETMEM
 	def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED
 
 config PTE_MARKER
-	def_bool n
+	def_bool y
 	bool "Marker PTEs support"
 	help

From patchwork Mon Nov 15 08:04:06 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12618903
From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton,
    Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse,
    Axel Rasmussen, "Kirill A. Shutemov", David Hildenbrand,
    Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 23/23] selftests/uffd: Enable uffd-wp for shmem/hugetlbfs
Date: Mon, 15 Nov 2021 16:04:06 +0800
Message-Id: <20211115080406.75377-1-peterx@redhat.com>
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>
Now that we have added support for shmem and hugetlbfs, we can turn the
uffd-wp test on unconditionally.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 64845be3971d..232cc6083039 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -81,7 +81,7 @@ static int test_type;
 static volatile bool test_uffdio_copy_eexist = true;
 static volatile bool test_uffdio_zeropage_eexist = true;
 /* Whether to test uffd write-protection */
-static bool test_uffdio_wp = false;
+static bool test_uffdio_wp = true;
 /* Whether to test uffd minor faults */
 static bool test_uffdio_minor = false;
 
@@ -1588,8 +1588,6 @@ static void set_test_type(const char *type)
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
-		/* Only enable write-protect test for anonymous test */
-		test_uffdio_wp = true;
 	} else if (!strcmp(type, "hugetlb")) {
 		test_type = TEST_HUGETLB;
 		uffd_test_ops = &hugetlb_uffd_test_ops;