From patchwork Fri Feb 12 21:53:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12086157
Date: Fri, 12 Feb 2021 13:53:57 -0800
In-Reply-To: <20210212215403.3457686-1-axelrasmussen@google.com>
Message-Id: <20210212215403.3457686-2-axelrasmussen@google.com>
References: <20210212215403.3457686-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.30.0.478.g8a0d178c01-goog
Subject: [PATCH v6 1/7] userfaultfd: introduce a new reason enum instead of
 using VM_* flags
From: Axel Rasmussen
To: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton,
 Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying,
 Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
 "Matthew Wilcox (Oracle)", Michael Ellerman, Michal Koutný,
 Michel Lespinasse, Mike Kravetz, Mike Rapoport, Nicholas Piggin,
 Peter Xu, Shaohua Li, Shawn Anastasio, Steven Rostedt, Steven Price,
 Vlastimil Babka
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, Adam Ruprecht, Axel Rasmussen, Cannon Matthews,
 "Dr . David Alan Gilbert", David Rientjes, Mina Almasry, Oliver Upton

The problem is, VM_* flags are a limited resource. As we add support for
new use cases to userfaultfd, there are new reasons why a userfault might
be triggered, but we can't keep adding new VM_* flags.

So, introduce a new enum, to which we can add arbitrarily many reasons
going forward. The intent is:

1. Page fault handlers will notice a userfaultfd registration
   (VM_UFFD_MISSING or VM_UFFD_WP).
2. They'll call handle_userfault() to resolve it, with the reason: page
   missing, write protect fault, or (in the future) minor fault, etc...

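A minimal, self-contained sketch of what this change means for callers; it
is an illustration, not code from the patch. The VM_UFFD_* values are the
ones defined in include/linux/mm.h, and old_is_wp(), new_is_wp() and
reason_to_vm_flag() are hypothetical helpers invented here. Because the
enum values are ordinals rather than bits (UFFD_REASON_MISSING is 0),
reasons have to be compared with == instead of being masked with &:

```c
#include <assert.h>
#include <stdbool.h>

/* Registration flag values as defined in include/linux/mm.h. */
#define VM_UFFD_MISSING	0x00000200UL
#define VM_UFFD_WP	0x00001000UL

/* Mirrors the enum this patch adds to include/linux/userfaultfd_k.h. */
enum uffd_trigger_reason {
	UFFD_REASON_MISSING,
	UFFD_REASON_WP,
};

/* Hypothetical helper: old style, where the reason was itself a VM_* bit. */
static inline bool old_is_wp(unsigned long reason)
{
	return (reason & VM_UFFD_WP) != 0;
}

/* Hypothetical helper: new style, where the reason is an enum value. */
static inline bool new_is_wp(enum uffd_trigger_reason reason)
{
	return reason == UFFD_REASON_WP;
}

/*
 * Hypothetical helper: map a trigger reason back to the registration flag
 * it belongs to. Nothing forces this mapping to be one-to-one.
 */
static inline unsigned long reason_to_vm_flag(enum uffd_trigger_reason reason)
{
	switch (reason) {
	case UFFD_REASON_MISSING:
		return VM_UFFD_MISSING;
	case UFFD_REASON_WP:
		return VM_UFFD_WP;
	}
	return 0;
}
```

With the enum, a later reason such as the minor fault mentioned above would
only need one more case in reason_to_vm_flag(); no new VM_* bit is consumed.
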
Importantly, the possible reasons for triggering a userfault will no
longer match 1:1 with VM_* flags; there can be > 1 reason to trigger a
fault for a single VM_* flag.

Signed-off-by: Axel Rasmussen
---
 fs/userfaultfd.c              | 21 +++++++++------------
 include/linux/userfaultfd_k.h | 12 ++++++++++--
 mm/huge_memory.c              |  4 ++--
 mm/hugetlb.c                  |  2 +-
 mm/memory.c                   |  8 ++++----
 mm/shmem.c                    |  2 +-
 6 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1f4a34b1a1e7..8d663eae0266 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -190,7 +190,7 @@ static inline void msg_init(struct uffd_msg *msg)
 
 static inline struct uffd_msg userfault_msg(unsigned long address,
 					    unsigned int flags,
-					    unsigned long reason,
+					    enum uffd_trigger_reason reason,
 					    unsigned int features)
 {
 	struct uffd_msg msg;
@@ -206,7 +206,7 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 		 * a write fault.
 		 */
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE;
-	if (reason & VM_UFFD_WP)
+	if (reason == UFFD_REASON_WP)
 		/*
 		 * If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
 		 * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WP was
@@ -229,7 +229,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 					 struct vm_area_struct *vma,
 					 unsigned long address,
 					 unsigned long flags,
-					 unsigned long reason)
+					 enum uffd_trigger_reason reason)
 {
 	struct mm_struct *mm = ctx->mm;
 	pte_t *ptep, pte;
@@ -251,7 +251,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 	 */
 	if (huge_pte_none(pte))
 		ret = true;
-	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
+	if (!huge_pte_write(pte) && (reason == UFFD_REASON_WP))
 		ret = true;
 out:
 	return ret;
@@ -261,7 +261,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 					 struct vm_area_struct *vma,
 					 unsigned long address,
 					 unsigned long flags,
-					 unsigned long reason)
+					 enum uffd_trigger_reason reason)
 {
 	return false;	/* should never get here */
 }
@@ -277,7 +277,7 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 					 unsigned long address,
 					 unsigned long flags,
-					 unsigned long reason)
+					 enum uffd_trigger_reason reason)
 {
 	struct mm_struct *mm = ctx->mm;
 	pgd_t *pgd;
@@ -316,7 +316,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 		goto out;
 
 	if (pmd_trans_huge(_pmd)) {
-		if (!pmd_write(_pmd) && (reason & VM_UFFD_WP))
+		if (!pmd_write(_pmd) && (reason == UFFD_REASON_WP))
 			ret = true;
 		goto out;
 	}
@@ -332,7 +332,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
 	 */
 	if (pte_none(*pte))
 		ret = true;
-	if (!pte_write(*pte) && (reason & VM_UFFD_WP))
+	if (!pte_write(*pte) && (reason == UFFD_REASON_WP))
 		ret = true;
 	pte_unmap(pte);
 
@@ -366,7 +366,7 @@ static inline long userfaultfd_get_blocking_state(unsigned int flags)
  * fatal_signal_pending()s, and the mmap_lock must be released before
  * returning it.
  */
-vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
+vm_fault_t handle_userfault(struct vm_fault *vmf, enum uffd_trigger_reason reason)
 {
 	struct mm_struct *mm = vmf->vma->vm_mm;
 	struct userfaultfd_ctx *ctx;
@@ -401,9 +401,6 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	BUG_ON(ctx->mm != mm);
 
-	VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP));
-	VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP));
-
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
 	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index c63ccdae3eab..cc1554e7162f 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -9,6 +9,14 @@
 #ifndef _LINUX_USERFAULTFD_K_H
 #define _LINUX_USERFAULTFD_K_H
 
+/* Denotes the reason why handle_userfault() is being triggered. */
+enum uffd_trigger_reason {
+	/* A page was missing. */
+	UFFD_REASON_MISSING,
+	/* A write protect fault occurred. */
+	UFFD_REASON_WP,
+};
+
 #ifdef CONFIG_USERFAULTFD
 
 #include <linux/userfaultfd.h> /* linux/include/uapi/linux/userfaultfd.h */
@@ -32,7 +40,7 @@
 
 extern int sysctl_unprivileged_userfaultfd;
 
-extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
+extern vm_fault_t handle_userfault(struct vm_fault *vmf, enum uffd_trigger_reason reason);
 
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 			    unsigned long src_start, unsigned long len,
@@ -111,7 +119,7 @@ extern void userfaultfd_unmap_complete(struct mm_struct *mm,
 
 /* mm helpers */
 static inline vm_fault_t handle_userfault(struct vm_fault *vmf,
-				unsigned long reason)
+				enum uffd_trigger_reason reason)
 {
 	return VM_FAULT_SIGBUS;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 395c75111d33..1d740b43bcc5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -629,7 +629,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 			spin_unlock(vmf->ptl);
 			put_page(page);
 			pte_free(vma->vm_mm, pgtable);
-			ret2 = handle_userfault(vmf, VM_UFFD_MISSING);
+			ret2 = handle_userfault(vmf, UFFD_REASON_MISSING);
 			VM_BUG_ON(ret2 & VM_FAULT_FALLBACK);
 			return ret2;
 		}
@@ -748,7 +748,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 			} else if (userfaultfd_missing(vma)) {
 				spin_unlock(vmf->ptl);
 				pte_free(vma->vm_mm, pgtable);
-				ret = handle_userfault(vmf, VM_UFFD_MISSING);
+				ret = handle_userfault(vmf, UFFD_REASON_MISSING);
 				VM_BUG_ON(ret & VM_FAULT_FALLBACK);
 			} else {
 				set_huge_zero_page(pgtable, vma->vm_mm, vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0d45a01a85f8..2a90e0b4bf47 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4305,7 +4305,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			hash = hugetlb_fault_mutex_hash(mapping, idx);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
-			ret = handle_userfault(&vmf, VM_UFFD_MISSING);
+			ret = handle_userfault(&vmf, UFFD_REASON_MISSING);
 			i_mmap_lock_read(mapping);
 			mutex_lock(&hugetlb_fault_mutex_table[hash]);
 			goto out;
diff --git a/mm/memory.c b/mm/memory.c
index bc4a41ec81aa..995a95826f4d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3100,7 +3100,7 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
 
 	if (userfaultfd_pte_wp(vma, *vmf->pte)) {
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-		return handle_userfault(vmf, VM_UFFD_WP);
+		return handle_userfault(vmf, UFFD_REASON_WP);
 	}
 
 	vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
@@ -3535,7 +3535,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 		/* Deliver the page fault to userland, check inside PT lock */
 		if (userfaultfd_missing(vma)) {
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
-			return handle_userfault(vmf, VM_UFFD_MISSING);
+			return handle_userfault(vmf, UFFD_REASON_MISSING);
 		}
 		goto setpte;
 	}
@@ -3577,7 +3577,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (userfaultfd_missing(vma)) {
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 		put_page(page);
-		return handle_userfault(vmf, VM_UFFD_MISSING);
+		return handle_userfault(vmf, UFFD_REASON_MISSING);
 	}
 
 	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
@@ -4195,7 +4195,7 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf, pmd_t orig_pmd)
 {
 	if (vma_is_anonymous(vmf->vma)) {
 		if (userfaultfd_huge_pmd_wp(vmf->vma, orig_pmd))
-			return handle_userfault(vmf, VM_UFFD_WP);
+			return handle_userfault(vmf, UFFD_REASON_WP);
 		return do_huge_pmd_wp_page(vmf, orig_pmd);
 	}
 	if (vmf->vma->vm_ops->huge_fault) {
diff --git a/mm/shmem.c b/mm/shmem.c
index 06c771d23127..e1e2513b4298 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1849,7 +1849,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	 */
 
 	if (vma && userfaultfd_missing(vma)) {
-		*fault_type = handle_userfault(vmf, VM_UFFD_MISSING);
+		*fault_type = handle_userfault(vmf, UFFD_REASON_MISSING);
 		return 0;
 	}
 

From patchwork Fri Feb 12 21:53:58 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version:
 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12086159
Date: Fri, 12 Feb 2021 13:53:58 -0800
In-Reply-To: <20210212215403.3457686-1-axelrasmussen@google.com>
Message-Id: <20210212215403.3457686-3-axelrasmussen@google.com>
References: <20210212215403.3457686-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.30.0.478.g8a0d178c01-goog
Subject: [PATCH v6 2/7] userfaultfd: add minor fault registration mode
From: Axel Rasmussen

This feature allows userspace to intercept "minor" faults. By "minor"
faults, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s). One of the
mappings is registered with userfaultfd (in minor mode), and the other is
not. Via the non-UFFD mapping, the underlying pages have already been
allocated & filled with some contents. The UFFD mapping has not yet been
faulted in; when it is touched for the first time, this results in what
I'm calling a "minor" fault. As a concrete example, when working with
hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing
page.

This commit adds the new feature flag used to enable this behavior.

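The "two mappings to the same page(s)" situation can be reproduced in
plain userspace. The sketch below is an illustration under stated
assumptions, not part of this series: it uses an ordinary temporary file
rather than hugetlbfs, it registers nothing with userfaultfd, and
read_through_second_mapping() is a hypothetical helper name. It populates
a page through one mapping and then touches it for the first time through
a second one, which is the kind of access that would be reported as a
"minor" fault if the second mapping were UFFD-registered as described
above:

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one page of a temporary file twice, store
 * `value` through the first mapping, and return the byte observed on the
 * first access through the second mapping. Returns -1 if setup fails.
 */
static inline int read_through_second_mapping(unsigned char value)
{
	long page = sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	int seen = -1;

	if (f != NULL && page > 0 && ftruncate(fileno(f), page) == 0) {
		/* The "non-UFFD" mapping: used to allocate and fill the page. */
		unsigned char *writer = mmap(NULL, (size_t)page,
					     PROT_READ | PROT_WRITE,
					     MAP_SHARED, fileno(f), 0);
		/* The second mapping: not yet faulted in when it is created. */
		unsigned char *reader = mmap(NULL, (size_t)page, PROT_READ,
					     MAP_SHARED, fileno(f), 0);

		if (writer != MAP_FAILED && reader != MAP_FAILED) {
			writer[0] = value;
			/* First touch: the backing page already exists. */
			seen = reader[0];
		}
		if (writer != MAP_FAILED)
			munmap(writer, (size_t)page);
		if (reader != MAP_FAILED)
			munmap(reader, (size_t)page);
	}
	if (f != NULL)
		fclose(f);
	return seen;
}
```

Calling read_through_second_mapping(0x5a) is expected to return 0x5a; a
return value of -1 means the file or one of the mappings could not be set
up.
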
In the hugetlb fault path, if we find that we have huge_pte_none(), but
find_lock_page() does indeed find an existing page, then we have a "minor"
fault, and if the VMA is UFFD-registered (with VM_UFFD_MISSING), *and*
this feature is enabled, we call into userfaultfd to handle it.

Why not add a new registration mode instead? After all, this being a
feature flag instead has drawbacks:

- You can't handle *only* minor faults, but *not* missing faults.
- This is a per-FD option, not a per-registration option, so if you want
  minor faults for some VMAs but not others, you need to open a separate
  FD for those two configurations.
- The userfaultfd_minor() check is more expensive, as we have to examine
  the userfaultfd_ctx.
- handle_userfault()'s "reason" argument is no longer 1:1 with VM_* flags,
  which has to be dealt with (complexity).

Basically, it comes down to the fact that we can't really add a new VM_*
flag. There are no unused bits left. :) With the current design of UFFD,
we don't write down the requested registration mode anywhere except this
flag either - there isn't any extended context we can check. So, I think
this is the only way.

Signed-off-by: Axel Rasmussen
---
 fs/userfaultfd.c                 | 37 ++++++++++++++++++++------------
 include/linux/mm.h               |  2 +-
 include/linux/userfaultfd_k.h    |  9 ++++++++
 include/uapi/linux/userfaultfd.h | 15 +++++++++----
 mm/hugetlb.c                     | 32 +++++++++++++++++++++++++++
 5 files changed, 76 insertions(+), 19 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 8d663eae0266..edfdb8f1c740 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -178,6 +178,18 @@ static void userfaultfd_ctx_put(struct userfaultfd_ctx *ctx)
 	}
 }
 
+bool userfaultfd_minor(struct vm_area_struct *vma)
+{
+	struct userfaultfd_ctx *ctx = vma->vm_userfaultfd_ctx.ctx;
+	unsigned int features = ctx ? ctx->features : 0;
+	bool minor_hugetlbfs = (features & UFFD_FEATURE_MINOR_HUGETLBFS);
+
+	if (!userfaultfd_missing(vma))
+		return false;
+
+	return is_vm_hugetlb_page(vma) && minor_hugetlbfs;
+}
+
 static inline void msg_init(struct uffd_msg *msg)
 {
 	BUILD_BUG_ON(sizeof(struct uffd_msg) != 32);
@@ -197,24 +209,21 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 	msg_init(&msg);
 	msg.event = UFFD_EVENT_PAGEFAULT;
 	msg.arg.pagefault.address = address;
+	/*
+	 * These flags indicate why the userfault occurred:
+	 * - UFFD_PAGEFAULT_FLAG_WP indicates a write protect fault.
+	 * - UFFD_PAGEFAULT_FLAG_MINOR indicates a minor fault.
+	 * - Neither of these flags being set indicates a MISSING fault.
+	 *
+	 * Separately, UFFD_PAGEFAULT_FLAG_WRITE indicates it was a write
+	 * fault. Otherwise, it was a read fault.
+	 */
 	if (flags & FAULT_FLAG_WRITE)
-		/*
-		 * If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
-		 * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WRITE
-		 * was not set in a UFFD_EVENT_PAGEFAULT, it means it
-		 * was a read fault, otherwise if set it means it's
-		 * a write fault.
-		 */
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE;
 	if (reason == UFFD_REASON_WP)
-		/*
-		 * If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
-		 * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WP was
-		 * not set in a UFFD_EVENT_PAGEFAULT, it means it was
-		 * a missing fault, otherwise if set it means it's a
-		 * write protect fault.
-		 */
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WP;
+	if (reason == UFFD_REASON_MINOR)
+		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_MINOR;
 	if (features & UFFD_FEATURE_THREAD_ID)
 		msg.arg.pagefault.feat.ptid = task_pid_vnr(current);
 	return msg;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 89fca443e6f1..3ddc465e31b0 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -272,7 +272,7 @@ extern unsigned int kobjsize(const void *objp);
 #define VM_MAYSHARE	0x00000080
 
 #define VM_GROWSDOWN	0x00000100	/* general info on the segment */
-#define VM_UFFD_MISSING	0x00000200	/* missing pages tracking */
+#define VM_UFFD_MISSING	0x00000200	/* missing or minor fault tracking */
 #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
 #define VM_DENYWRITE	0x00000800	/* ETXTBSY on write attempts.. */
 #define VM_UFFD_WP	0x00001000	/* wrprotect pages tracking */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index cc1554e7162f..4e03268c65ec 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -15,6 +15,8 @@ enum uffd_trigger_reason {
 	UFFD_REASON_MISSING,
 	/* A write protect fault occurred. */
 	UFFD_REASON_WP,
+	/* A minor fault occurred. */
+	UFFD_REASON_MINOR,
 };
 
 #ifdef CONFIG_USERFAULTFD
@@ -79,6 +81,8 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_UFFD_WP;
 }
 
+bool userfaultfd_minor(struct vm_area_struct *vma);
+
 static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
 				      pte_t pte)
 {
@@ -140,6 +144,11 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 	return false;
 }
 
+static inline bool userfaultfd_minor(struct vm_area_struct *vma)
+{
+	return false;
+}
+
 static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
 				      pte_t pte)
 {
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 5f2d88212f7c..6b038d56bca7 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -22,12 +22,13 @@
 #define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP |	\
 			   UFFD_FEATURE_EVENT_FORK |		\
 			   UFFD_FEATURE_EVENT_REMAP |		\
-			   UFFD_FEATURE_EVENT_REMOVE |	\
+			   UFFD_FEATURE_EVENT_REMOVE |		\
 			   UFFD_FEATURE_EVENT_UNMAP |		\
 			   UFFD_FEATURE_MISSING_HUGETLBFS |	\
 			   UFFD_FEATURE_MISSING_SHMEM |		\
 			   UFFD_FEATURE_SIGBUS |		\
-			   UFFD_FEATURE_THREAD_ID)
+			   UFFD_FEATURE_THREAD_ID |		\
+			   UFFD_FEATURE_MINOR_HUGETLBFS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -125,8 +126,9 @@ struct uffd_msg {
 #define UFFD_EVENT_UNMAP	0x16
 
 /* flags for UFFD_EVENT_PAGEFAULT */
-#define UFFD_PAGEFAULT_FLAG_WRITE	(1<<0)	/* If this was a write fault */
-#define UFFD_PAGEFAULT_FLAG_WP		(1<<1)	/* If reason is VM_UFFD_WP */
+#define UFFD_PAGEFAULT_FLAG_WRITE	(1<<0)	/* write fault */
+#define UFFD_PAGEFAULT_FLAG_WP		(1<<1)	/* write-protect fault */
+#define UFFD_PAGEFAULT_FLAG_MINOR	(1<<2)	/* minor fault */
 
 struct uffdio_api {
 	/* userland asks for an API number and the features to enable */
@@ -171,6 +173,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_THREAD_ID pid of the page faulted task_struct will
 	 * be returned, if feature is not requested 0 will be returned.
+	 *
+	 * If requested, UFFD_FEATURE_MINOR_HUGETLBFS indicates that hugetlbfs
+	 * memory registered with REGISTER_MODE_MISSING will *also* receive
+	 * events for minor faults, not just missing faults.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -181,6 +187,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_EVENT_UNMAP		(1<<6)
 #define UFFD_FEATURE_SIGBUS			(1<<7)
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
+#define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2a90e0b4bf47..93307fb058b7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4366,6 +4366,38 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 				VM_FAULT_SET_HINDEX(hstate_index(h));
 			goto backout_unlocked;
 		}
+
+		/* Check for page in userfault range. */
+		if (userfaultfd_minor(vma)) {
+			u32 hash;
+			struct vm_fault vmf = {
+				.vma = vma,
+				.address = haddr,
+				.flags = flags,
+				/*
+				 * Hard to debug if it ends up being used by a
+				 * callee that assumes something about the
+				 * other uninitialized fields... same as in
+				 * memory.c
+				 */
+			};
+
+			unlock_page(page);
+
+			/*
+			 * hugetlb_fault_mutex and i_mmap_rwsem must be dropped
+			 * before handling userfault. Reacquire after handling
+			 * fault to make calling code simpler.
+			 */
+
+			hash = hugetlb_fault_mutex_hash(mapping, idx);
+			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+			i_mmap_unlock_read(mapping);
+			ret = handle_userfault(&vmf, UFFD_REASON_MINOR);
+			i_mmap_lock_read(mapping);
+			mutex_lock(&hugetlb_fault_mutex_table[hash]);
+			goto out;
+		}
 	}
 
 	/*

From patchwork Fri Feb 12 21:53:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12086161
Received: from forelay.hostedemail.com (smtprelay0053.hostedemail.com [216.40.44.53])
 by kanga.kvack.org (Postfix) with ESMTP id
 ABD078D0060 for ; Fri, 12 Feb 2021 16:54:24 -0500 (EST)
Date: Fri, 12 Feb 2021 13:53:59 -0800
In-Reply-To: <20210212215403.3457686-1-axelrasmussen@google.com>
Message-Id: <20210212215403.3457686-4-axelrasmussen@google.com>
References: <20210212215403.3457686-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.30.0.478.g8a0d178c01-goog
Subject: [PATCH v6 3/7] userfaultfd: disable huge PMD sharing for minor fault
 registered VMAs
From: Axel Rasmussen

As the comment says: for the minor fault use case, although the page might
be present and populated in the other (non-UFFD-registered) half of the
mapping, it may be out of date, and we explicitly want userspace to get a
minor fault so it can check and potentially update the page's contents.

Huge PMD sharing would prevent these faults from occurring for
suitably aligned areas, so disable it upon UFFD registration.

Signed-off-by: Axel Rasmussen
---
 include/linux/userfaultfd_k.h | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 4e03268c65ec..98cb6260b4b4 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -62,15 +62,6 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
 	return vma->vm_userfaultfd_ctx.ctx == vm_ctx.ctx;
 }
 
-/*
- * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp
- * protect information is per pgtable entry.
- */
-static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
-{
-	return vma->vm_flags & VM_UFFD_WP;
-}
-
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_UFFD_MISSING;
@@ -83,6 +74,23 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma)
 
 bool userfaultfd_minor(struct vm_area_struct *vma);
 
+/*
+ * Never enable huge pmd sharing on some uffd registered vmas:
+ *
+ * - VM_UFFD_WP VMAs, because write protect information is per pgtable entry.
+ *
+ * - VM_UFFD_MISSING VMAs with UFFD_FEATURE_MINOR_HUGETLBFS, because otherwise
+ *   we would never get minor faults for VMAs which share huge pmds. (If you
+ *   have two mappings to the same underlying pages, and fault in the
+ *   non-UFFD-registered one with a write, with huge pmd sharing this would
+ *   *also* setup the second UFFD-registered mapping, and we'd not get minor
+ *   faults.)
+ */
+static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
+{
+	return userfaultfd_wp(vma) || userfaultfd_minor(vma);
+}
+
 static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma,
 				      pte_t pte)
 {

From patchwork Fri Feb 12 21:54:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12086163
Date: Fri, 12 Feb 2021 13:54:00 -0800
In-Reply-To: <20210212215403.3457686-1-axelrasmussen@google.com>
Message-Id: <20210212215403.3457686-5-axelrasmussen@google.com>
Mime-Version: 1.0
References: <20210212215403.3457686-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.30.0.478.g8a0d178c01-goog
Subject: [PATCH v6 4/7] userfaultfd: hugetlbfs: only compile UFFD helpers if config enabled
From: Axel Rasmussen
To: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton,
    Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying,
    Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
    "Matthew Wilcox (Oracle)", Michael Ellerman, Michal Koutný,
    Michel Lespinasse, Mike Kravetz, Mike Rapoport, Nicholas Piggin,
    Peter Xu, Shaohua Li, Shawn Anastasio, Steven Rostedt, Steven Price,
    Vlastimil Babka
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, Adam Ruprecht, Axel Rasmussen, Cannon Matthews,
    "Dr. David Alan Gilbert", David Rientjes, Mina Almasry, Oliver Upton

For background, mm/userfaultfd.c provides a general mcopy_atomic
implementation. But some types of memory (i.e., hugetlb and shmem) need
a slightly different implementation, so they provide their own helpers
for this. In other words, userfaultfd is the only caller of these
functions.

This patch achieves two things:

1. Don't spend time compiling code which will end up never being
   referenced anyway (a small build time optimization).

2. In patches later in this series, we extend the signature of these
   helpers with UFFD-specific state (a mode enumeration). Once this
   happens, we *have to* either not compile the helpers, or
   unconditionally define the UFFD-only state (which seems messier to
   me). This includes the declarations in the headers, as otherwise
   they'd yield warnings about implicitly defining the type of those
   arguments.

Reviewed-by: Mike Kravetz
Reviewed-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 include/linux/hugetlb.h | 4 ++++
 mm/hugetlb.c            | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d740c6fd19ae..aa9e1d6de831 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -134,11 +134,13 @@ void hugetlb_show_meminfo(void);
 unsigned long hugetlb_total_pages(void);
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
+#ifdef CONFIG_USERFAULTFD
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				struct vm_area_struct *dst_vma,
 				unsigned long dst_addr,
 				unsigned long src_addr,
 				struct page **pagep);
+#endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
 						vm_flags_t vm_flags);
@@ -309,6 +311,7 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
 	BUG();
 }
 
+#ifdef CONFIG_USERFAULTFD
 static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 						pte_t *dst_pte,
 						struct vm_area_struct *dst_vma,
@@ -319,6 +322,7 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	BUG();
 	return 0;
 }
+#endif /* CONFIG_USERFAULTFD */
 
 static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
 					unsigned long sz)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 93307fb058b7..37b9ff7c2d04 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4638,6 +4638,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	return ret;
 }
 
+#ifdef CONFIG_USERFAULTFD
 /*
  * Used by userfaultfd UFFDIO_COPY. Based on mcopy_atomic_pte with
  * modifications for huge pages.
@@ -4768,6 +4769,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	put_page(page);
 	goto out;
 }
+#endif /* CONFIG_USERFAULTFD */
 
 static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
 				 int refs, struct page **pages,

From patchwork Fri Feb 12 21:54:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12086165
Date: Fri, 12 Feb 2021 13:54:01 -0800
In-Reply-To: <20210212215403.3457686-1-axelrasmussen@google.com>
Message-Id: <20210212215403.3457686-6-axelrasmussen@google.com>
Mime-Version: 1.0
References: <20210212215403.3457686-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.30.0.478.g8a0d178c01-goog
Subject: [PATCH v6 5/7] userfaultfd: add UFFDIO_CONTINUE ioctl
From: Axel Rasmussen
To: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton,
    Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying,
    Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
    "Matthew Wilcox (Oracle)", Michael Ellerman, Michal Koutný,
    Michel Lespinasse, Mike Kravetz, Mike Rapoport, Nicholas Piggin,
    Peter Xu, Shaohua Li, Shawn Anastasio, Steven Rostedt, Steven Price,
    Vlastimil Babka
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, Adam Ruprecht, Axel Rasmussen, Cannon Matthews,
    "Dr. David Alan Gilbert", David Rientjes, Mina Almasry, Oliver Upton

This ioctl is how userspace ought to resolve "minor" userfaults. The
idea is, userspace is notified that a minor fault has occurred. It might
change the contents of the page using its second non-UFFD mapping, or
not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured
the page contents are correct, carry on setting up the mapping".

Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
minor fault registered VMAs. ZEROPAGE maps the VMA to the zero page; but
in the minor fault case, we already have some pre-existing underlying
page. Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD
mapping. We'd just use memcpy() or similar instead.

It turns out hugetlb_mcopy_atomic_pte() already does very close to what
we want, if an existing page is provided via `struct page **pagep`. We
already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so
just extend that design: add an enum for the three modes of operation,
and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE
case. (Basically, look up the existing page, and avoid adding the
existing page to the page cache or calling set_page_huge_active() on
it.)
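
For readers skimming the series, here is a minimal userspace model of the
UFFDIO_CONTINUE argument and result convention that this patch introduces.
It is a sketch only, not kernel code: the `model_*` names are hypothetical,
the struct mirrors the `struct uffdio_continue` layout from this patch under
the assumption that `struct uffdio_range` is two 64-bit fields, and no ioctl
is actually issued.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct uffdio_range (assumed: two u64 fields). */
struct model_uffdio_range {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical mirror of struct uffdio_continue as added by this patch. */
struct model_uffdio_continue {
	struct model_uffdio_range range;
	uint64_t mode;
	int64_t mapped;	/* output only: written back by the ioctl */
};

/* The output field must stay last so the input copy can stop before it. */
_Static_assert(offsetof(struct model_uffdio_continue, mapped) ==
	       sizeof(struct model_uffdio_continue) - sizeof(int64_t),
	       "mapped must be the trailing field");

/* Bytes copied in from userspace: everything except the trailing output. */
static size_t model_continue_input_size(void)
{
	return sizeof(struct model_uffdio_continue) - sizeof(int64_t);
}

/*
 * Models the tail of userfaultfd_continue(): a negative byte count is
 * passed through as the error, a full mapping returns 0, and a partial
 * mapping returns -EAGAIN so the caller can retry the remainder.
 */
static int64_t model_continue_result(int64_t mapped, uint64_t requested_len)
{
	/* The patch has BUG_ON(!ret) here: zero bytes mapped is not expected. */
	assert(mapped != 0);

	if (mapped < 0)
		return mapped;
	return (uint64_t)mapped == requested_len ? 0 : -EAGAIN;
}
```

In this model a caller that sees -EAGAIN would read `mapped`, advance
`range.start` by that many bytes, shrink `range.len` to match, and issue the
request again.
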

Signed-off-by: Axel Rasmussen
---
 fs/userfaultfd.c                 | 78 ++++++++++++++++++++++++++++++--
 include/linux/hugetlb.h          |  3 ++
 include/linux/userfaultfd_k.h    | 18 ++++++++
 include/uapi/linux/userfaultfd.h | 21 ++++++++-
 mm/hugetlb.c                     | 41 +++++++++++------
 mm/userfaultfd.c                 | 37 +++++++++------
 6 files changed, 164 insertions(+), 34 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index edfdb8f1c740..b0142a9919f7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1327,7 +1327,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	struct uffdio_register __user *user_uffdio_register;
 	unsigned long vm_flags, new_flags;
 	bool found;
-	bool basic_ioctls;
+	bool found_hugetlb;
 	unsigned long start, end, vma_end;
 
 	user_uffdio_register = (struct uffdio_register __user *) arg;
@@ -1386,7 +1386,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	 * Search for not compatible vmas.
 	 */
 	found = false;
-	basic_ioctls = false;
+	found_hugetlb = false;
 	for (cur = vma; cur && cur->vm_start < end; cur = cur->vm_next) {
 		cond_resched();
 
@@ -1441,7 +1441,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		 * Note vmas containing huge pages
 		 */
 		if (is_vm_hugetlb_page(cur))
-			basic_ioctls = true;
+			found_hugetlb = true;
 
 		found = true;
 	}
@@ -1514,7 +1514,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 	if (!ret) {
 		__u64 ioctls_out;
 
-		ioctls_out = basic_ioctls ? UFFD_API_RANGE_IOCTLS_BASIC :
+		ioctls_out = found_hugetlb ? UFFD_API_RANGE_IOCTLS_BASIC :
 		    UFFD_API_RANGE_IOCTLS;
 
 		/*
@@ -1524,6 +1524,13 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_WP))
 			ioctls_out &= ~((__u64)1 << _UFFDIO_WRITEPROTECT);
 
+		/* CONTINUE ioctl is only supported for minor ranges. */
+		if (!(found_hugetlb &&
+		      (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING) &&
+		      (ctx->features & UFFD_FEATURE_MINOR_HUGETLBFS))) {
+			ioctls_out &= ~((__u64)1 << _UFFDIO_CONTINUE);
+		}
+
 		/*
 		 * Now that we scanned all vmas we can already tell
 		 * userland which ioctls methods are guaranteed to
@@ -1877,6 +1884,66 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	return ret;
 }
 
+static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
+{
+	__s64 ret;
+	struct uffdio_continue uffdio_continue;
+	struct uffdio_continue __user *user_uffdio_continue;
+	struct userfaultfd_wake_range range;
+
+	user_uffdio_continue = (struct uffdio_continue __user *)arg;
+
+	ret = -EAGAIN;
+	if (READ_ONCE(ctx->mmap_changing))
+		goto out;
+
+	ret = -EFAULT;
+	if (copy_from_user(&uffdio_continue, user_uffdio_continue,
+			   /* don't copy the output fields */
+			   sizeof(uffdio_continue) - (sizeof(__s64))))
+		goto out;
+
+	ret = validate_range(ctx->mm, &uffdio_continue.range.start,
+			     uffdio_continue.range.len);
+	if (ret)
+		goto out;
+
+	ret = -EINVAL;
+	/* double check for wraparound just in case. */
+	if (uffdio_continue.range.start + uffdio_continue.range.len <=
+	    uffdio_continue.range.start) {
+		goto out;
+	}
+	if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
+		goto out;
+
+	if (mmget_not_zero(ctx->mm)) {
+		ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
+				     uffdio_continue.range.len,
+				     &ctx->mmap_changing);
+		mmput(ctx->mm);
+	} else {
+		return -ESRCH;
+	}
+
+	if (unlikely(put_user(ret, &user_uffdio_continue->mapped)))
+		return -EFAULT;
+	if (ret < 0)
+		goto out;
+
+	/* len == 0 would wake all */
+	BUG_ON(!ret);
+	range.len = ret;
+	if (!(uffdio_continue.mode & UFFDIO_CONTINUE_MODE_DONTWAKE)) {
+		range.start = uffdio_continue.range.start;
+		wake_userfault(ctx, &range);
+	}
+	ret = range.len == uffdio_continue.range.len ? 0 : -EAGAIN;
+
+out:
+	return ret;
+}
+
 static inline unsigned int uffd_ctx_features(__u64 user_features)
 {
 	/*
@@ -1961,6 +2028,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
 	case UFFDIO_WRITEPROTECT:
 		ret = userfaultfd_writeprotect(ctx, arg);
 		break;
+	case UFFDIO_CONTINUE:
+		ret = userfaultfd_continue(ctx, arg);
+		break;
 	}
 	return ret;
 }
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index aa9e1d6de831..3d01d228fc78 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 struct ctl_table;
 struct user_struct;
@@ -139,6 +140,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				struct vm_area_struct *dst_vma,
 				unsigned long dst_addr,
 				unsigned long src_addr,
+				enum mcopy_atomic_mode mode,
 				struct page **pagep);
 #endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
@@ -317,6 +319,7 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 						struct vm_area_struct *dst_vma,
 						unsigned long dst_addr,
 						unsigned long src_addr,
+						enum mcopy_atomic_mode mode,
 						struct page **pagep)
 {
 	BUG();
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 98cb6260b4b4..5d87f2883783 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -44,6 +44,22 @@ extern int sysctl_unprivileged_userfaultfd;
 extern vm_fault_t handle_userfault(struct vm_fault *vmf,
 				   enum uffd_trigger_reason reason);
 
+/*
+ * The mode of operation for __mcopy_atomic and its helpers.
+ *
+ * This is almost an implementation detail (mcopy_atomic below doesn't take this
+ * as a parameter), but it's exposed here because memory-kind-specific
+ * implementations (e.g. hugetlbfs) need to know the mode of operation.
+ */
+enum mcopy_atomic_mode {
+	/* A normal copy_from_user into the destination range. */
+	MCOPY_ATOMIC_NORMAL,
+	/* Don't copy; map the destination range to the zero page. */
+	MCOPY_ATOMIC_ZEROPAGE,
+	/* Just install pte(s) with the existing page(s) in the page cache. */
+	MCOPY_ATOMIC_CONTINUE,
+};
+
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 			    unsigned long src_start, unsigned long len,
 			    bool *mmap_changing, __u64 mode);
@@ -51,6 +67,8 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
 			      unsigned long dst_start,
 			      unsigned long len,
 			      bool *mmap_changing);
+extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start,
+			      unsigned long len, bool *mmap_changing);
 extern int mwriteprotect_range(struct mm_struct *dst_mm,
 			       unsigned long start, unsigned long len,
 			       bool enable_wp, bool *mmap_changing);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 6b038d56bca7..10e98cb29352 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -37,10 +37,12 @@
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
 	 (__u64)1 << _UFFDIO_ZEROPAGE |		\
-	 (__u64)1 << _UFFDIO_WRITEPROTECT)
+	 (__u64)1 << _UFFDIO_WRITEPROTECT |	\
+	 (__u64)1 << _UFFDIO_CONTINUE)
 #define UFFD_API_RANGE_IOCTLS_BASIC		\
 	((__u64)1 << _UFFDIO_WAKE |		\
-	 (__u64)1 << _UFFDIO_COPY)
+	 (__u64)1 << _UFFDIO_COPY |		\
+	 (__u64)1 << _UFFDIO_CONTINUE)
 
 /*
  * Valid ioctl command number range with this API is from 0x00 to
@@ -56,6 +58,7 @@
 #define _UFFDIO_COPY			(0x03)
 #define _UFFDIO_ZEROPAGE		(0x04)
 #define _UFFDIO_WRITEPROTECT		(0x06)
+#define _UFFDIO_CONTINUE		(0x07)
 #define _UFFDIO_API			(0x3F)
 
 /* userfaultfd ioctl ids */
@@ -74,6 +77,8 @@
 				      struct uffdio_zeropage)
 #define UFFDIO_WRITEPROTECT	_IOWR(UFFDIO, _UFFDIO_WRITEPROTECT, \
 				      struct uffdio_writeprotect)
+#define UFFDIO_CONTINUE		_IOR(UFFDIO, _UFFDIO_CONTINUE,	\
+				     struct uffdio_continue)
 
 /* read() structure */
 struct uffd_msg {
@@ -264,6 +269,18 @@ struct uffdio_writeprotect {
 	__u64 mode;
 };
 
+struct uffdio_continue {
+	struct uffdio_range range;
+#define UFFDIO_CONTINUE_MODE_DONTWAKE		((__u64)1<<0)
+	__u64 mode;
+
+	/*
+	 * Fields below here are written by the ioctl and must be at the end:
+	 * the copy_from_user will not read past here.
+	 */
+	__s64 mapped;
+};
+
 /*
  * Flags for the userfaultfd(2) system call itself.
  */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 37b9ff7c2d04..adca0d99b954 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -39,7 +39,6 @@
 #include
 #include
 #include
-#include
 #include
 #include "internal.h"
 
@@ -4648,8 +4647,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    struct vm_area_struct *dst_vma,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
+			    enum mcopy_atomic_mode mode,
 			    struct page **pagep)
 {
+	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct address_space *mapping;
 	pgoff_t idx;
 	unsigned long size;
@@ -4659,8 +4660,18 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	spinlock_t *ptl;
 	int ret;
 	struct page *page;
+	int writable;
 
-	if (!*pagep) {
+	mapping = dst_vma->vm_file->f_mapping;
+	idx = vma_hugecache_offset(h, dst_vma, dst_addr);
+
+	if (is_continue) {
+		ret = -EFAULT;
+		page = find_lock_page(mapping, idx);
+		*pagep = NULL;
+		if (!page)
+			goto out;
+	} else if (!*pagep) {
 		ret = -ENOMEM;
 		page = alloc_huge_page(dst_vma, dst_addr, 0);
 		if (IS_ERR(page))
@@ -4689,13 +4700,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	 */
 	__SetPageUptodate(page);
 
-	mapping = dst_vma->vm_file->f_mapping;
-	idx = vma_hugecache_offset(h, dst_vma, dst_addr);
-
-	/*
-	 * If shared, add to page cache
-	 */
-	if (vm_shared) {
+	/* Add shared, newly allocated pages to the page cache. */
+	if (vm_shared && !is_continue) {
 		size = i_size_read(mapping->host) >> huge_page_shift(h);
 		ret = -EFAULT;
 		if (idx >= size)
@@ -4740,8 +4746,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
 	}
 
-	_dst_pte = make_huge_pte(dst_vma, page, dst_vma->vm_flags & VM_WRITE);
-	if (dst_vma->vm_flags & VM_WRITE)
+	/* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
+	if (is_continue && !vm_shared)
+		writable = 0;
+	else
+		writable = dst_vma->vm_flags & VM_WRITE;
+
+	_dst_pte = make_huge_pte(dst_vma, page, writable);
+	if (writable)
 		_dst_pte = huge_pte_mkdirty(_dst_pte);
 	_dst_pte = pte_mkyoung(_dst_pte);
 
@@ -4755,15 +4767,16 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	update_mmu_cache(dst_vma, dst_addr, dst_pte);
 
 	spin_unlock(ptl);
-	SetHPageMigratable(page);
-	if (vm_shared)
+	if (!is_continue)
+		SetHPageMigratable(page);
+	if (vm_shared || is_continue)
 		unlock_page(page);
 	ret = 0;
 out:
 	return ret;
 out_release_unlock:
 	spin_unlock(ptl);
-	if (vm_shared)
+	if (vm_shared || is_continue)
 		unlock_page(page);
 out_release_nounlock:
 	put_page(page);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index b2ce61c1b50d..ce6cb4760d2c 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -207,7 +207,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      bool zeropage)
+					      enum mcopy_atomic_mode mode)
 {
 	int vm_alloc_shared = dst_vma->vm_flags & VM_SHARED;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
@@ -227,7 +227,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	 * by THP. Since we can not reliably insert a zero page, this
 	 * feature is not supported.
 	 */
-	if (zeropage) {
+	if (mode == MCOPY_ATOMIC_ZEROPAGE) {
 		mmap_read_unlock(dst_mm);
 		return -EINVAL;
 	}
@@ -273,8 +273,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	}
 
 	while (src_addr < src_start + len) {
-		pte_t dst_pteval;
-
 		BUG_ON(dst_addr >= dst_start + len);
 
 		/*
@@ -297,16 +295,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 			goto out_unlock;
 		}
 
-		err = -EEXIST;
-		dst_pteval = huge_ptep_get(dst_pte);
-		if (!huge_pte_none(dst_pteval)) {
+		if (mode != MCOPY_ATOMIC_CONTINUE &&
+		    !huge_pte_none(huge_ptep_get(dst_pte))) {
+			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
 			goto out_unlock;
 		}
 
 		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
-						dst_addr, src_addr, &page);
+					       dst_addr, src_addr, mode, &page);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -408,7 +406,7 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 				      unsigned long dst_start,
 				      unsigned long src_start,
 				      unsigned long len,
-				      bool zeropage);
+				      enum mcopy_atomic_mode mode);
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -458,7 +456,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      bool zeropage,
+					      enum mcopy_atomic_mode mcopy_mode,
 					      bool *mmap_changing,
 					      __u64 mode)
 {
@@ -469,6 +467,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	long copied;
 	struct page *page;
 	bool wp_copy;
+	bool zeropage = (mcopy_mode == MCOPY_ATOMIC_ZEROPAGE);
 
 	/*
 	 * Sanitize the command parameters:
@@ -527,10 +526,12 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	 */
 	if (is_vm_hugetlb_page(dst_vma))
 		return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
-						src_start, len, zeropage);
+						src_start, len, mcopy_mode);
 
 	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
 		goto out_unlock;
+	if (mcopy_mode == MCOPY_ATOMIC_CONTINUE)
+		goto out_unlock;
 
 	/*
 	 * Ensure the dst_vma has a anon_vma or this page
@@ -626,14 +627,22 @@ ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
 		     unsigned long src_start, unsigned long len,
 		     bool *mmap_changing, __u64 mode)
 {
-	return __mcopy_atomic(dst_mm, dst_start, src_start, len, false,
-			      mmap_changing, mode);
+	return __mcopy_atomic(dst_mm, dst_start, src_start, len,
+			      MCOPY_ATOMIC_NORMAL, mmap_changing, mode);
 }
 
 ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
 		       unsigned long len, bool *mmap_changing)
 {
-	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
+	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
+			      mmap_changing, 0);
+}
+
+ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
+		       unsigned long len, bool *mmap_changing)
+{
+	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
+			      mmap_changing, 0);
 }
 
 int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,

From patchwork Fri Feb 12 21:54:02 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12086167
1CRMRMQ/Do3BDpbIoK5/C4FdumyYHZHclNj9Jn7dU6a21II1aUo+5seKKYTtlDqwbkLv pe6u/9N2xLFVZenMgg3V+5mPv8ybp2YVNq6NGjU3tG9kNP2nm3dSzjF+rQRHkpctaNnz xoms8isX2NshX0o10O99DyI4KkzsA+KDkZj4HgX+FLeDbfeK3mxdslR5azG0anjvhkwY aWYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=bDtjA5Mj64PKIXivN7+vBaDB7zA4gfkDWN/bO1d3kzE=; b=CjYKW0RiQOpNPSBIXcLzIuOYX4bbN433CPQSmpswmHjJh0lAgdpM1GDf/Tv523U212 mWN0H3CLRmj5jBuUgd2c1+Aj0IwmJpMEu2lFVSO4OxS0kaXkK5JPebec3yT4+/3mjCDX BzbJaZwYdhEBjcOzPIfuvGZf7l5gezEahipt/FWlZtL142rIvpbawFNY1fPZU6rkOOuc TGlklA6Kimc6mohQlTjRXXZmkLlMEXstPqrI9E8wpK7qKss0elmqMAzIChmX9CCHnC2T 5ox/bzfix2QUxPDc0oHgxOQvJp8+PvqnGM/42oCbOJxZqJB+B++IkqYXrpGGbsXy6kHK 5gKQ== X-Gm-Message-State: AOAM532UrncT6HQ500F02bv531DQI/Ipbe8PNrUheErDXRCbb1ckifzt SFDWmRK4e06Hcj3YQQeFhP0mEJ2CYwCISOKBmI39 X-Google-Smtp-Source: ABdhPJwM54ZLRvUd3VQAdohmT5ljP8vEvqwCvhdBPZQxg11B4p/OTJRCJ7EcRDarTlPbCflo4kN0FWkumWE8VVbCbXgq X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:d2f:99bb:c1e0:34ba]) (user=axelrasmussen job=sendgmr) by 2002:a17:90a:8d83:: with SMTP id d3mr1528pjo.0.1613166868460; Fri, 12 Feb 2021 13:54:28 -0800 (PST) Date: Fri, 12 Feb 2021 13:54:02 -0800 In-Reply-To: <20210212215403.3457686-1-axelrasmussen@google.com> Message-Id: <20210212215403.3457686-7-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210212215403.3457686-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.478.g8a0d178c01-goog Subject: [PATCH v6 6/7] userfaultfd: update documentation to describe minor fault handling From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport 
, Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Reword / reorganize things a little bit into "lists", so new features / modes / ioctls can sort of just be appended. Describe how minor faults can be intercepted and resolved. Make it clear that COPY and ZEROPAGE are for missing faults, whereas CONTINUE is for minor faults. Signed-off-by: Axel Rasmussen --- Documentation/admin-guide/mm/userfaultfd.rst | 109 ++++++++++++------- 1 file changed, 68 insertions(+), 41 deletions(-) diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 65eefa66c0ba..a3434b3f4f2d 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -63,36 +63,37 @@ the generic ioctl available. The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl defines what memory types are supported by the ``userfaultfd`` and what -events, except page fault notifications, may be generated. - -If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs -virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in -``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be -set if the kernel supports registering ``userfaultfd`` ranges on shared -memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, -``MAP_SHARED``, ``memfd_create``, etc). 
-
-The userland application that wants to use ``userfaultfd`` with hugetlbfs
-or shared memory need to set the corresponding flag in
-``uffdio_api.features`` to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that ``uffdio_api.features`` has appropriate
-``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
-detail below in `Non-cooperative userfaultfd`_ section.
-
-Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
-be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
-register a memory range in the ``userfaultfd`` by setting the
+events, beyond page fault notifications, may be generated:
+
+- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
+  other than page faults are supported. These events are described in more
+  detail below in the `Non-cooperative userfaultfd`_ section.
+
+- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+  registrations for hugetlbfs and shared memory (covering all shmem APIs,
+  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+  etc) virtual memory areas, respectively.
+
+- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that hugetlbfs virtual memory
+  areas that are ``UFFDIO_REGISTER_MODE_MINOR`` registered will also
+  receive page fault events for minor faults. This feature is not enabled
+  unless specifically requested.
+
+The userland application should set the feature flags it intends to use
+when invoking the ``UFFDIO_API`` ioctl, to request that those features be
+enabled if supported.
+
+Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+bitmask) to register a memory range in the ``userfaultfd`` by setting the
 uffdio_register structure accordingly. The ``uffdio_register.mode`` bitmask
 will specify to the kernel which kind of faults to track for
-the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
-pages). The ``UFFDIO_REGISTER`` ioctl will return the
+the range. The ``UFFDIO_REGISTER`` ioctl will return the
 ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.

 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
@@ -100,21 +101,47 @@ memory from the ``userfaultfd`` registered range). This means a userfault
 could be triggering just before userland maps in the background the
 user-faulted page.

-The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults
-(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
-Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
-guaranteeing that nothing can see an half copied page since it'll
-keep userfaulting until the copy has finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- ``UFFDIO_COPY`` atomically copies some existing page contents from
+  userspace.
+
+- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
+that waking will be done separately at some later time.
+
+Which ioctl to choose depends on the kind of page fault, and what we'd
+like to do to resolve it:
+
+- For missing faults (neither ``UFFD_PAGEFAULT_FLAG_WP`` nor
+  ``UFFD_PAGEFAULT_FLAG_MINOR`` are set), the fault needs to be resolved
+  by either providing a new page (``UFFDIO_COPY``), or mapping the zero
+  page (``UFFDIO_ZEROPAGE``). By default, the kernel would map the zero
+  page for a missing fault. With userfaultfd, userspace can decide what
+  content to provide before the faulting thread continues.
+
+- For minor faults (``UFFD_PAGEFAULT_FLAG_MINOR`` is set), there is an
+  existing page (in the page cache). Userspace has the option of modifying
+  the page's contents before resolving the fault. Once the contents are
+  correct (modified or not), userspace asks the kernel to map the page and
+  let the faulting thread continue with ``UFFDIO_CONTINUE``.

 Notes:

-- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then
-  you must provide some kind of page in your thread after reading from
-  the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``.
-  The normal behavior of the OS automatically providing a zero page on
-  an anonymous mmaping is not in place.
+- You can tell which kind of fault occurred by examining
+  ``pagefault.flags`` within the ``uffd_msg``, checking for the
+  ``UFFD_PAGEFAULT_FLAG_*`` flags.

 - None of the page-delivering ioctls default to the range that you
   registered with. You must fill in all fields for the appropriate
@@ -122,9 +149,9 @@ Notes:

 - You get the address of the access that triggered the missing page
   event out of a struct uffd_msg that you read in the thread from the
-  uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or
-  ``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then
-  the first of any of those IOCTLs wakes up the faulting thread.
+  uffd. You can supply as many pages as you want with these IOCTLs.
+  Keep in mind that unless you used DONTWAKE then the first of any of
+  those IOCTLs wakes up the faulting thread.

 - Be sure to test for all errors including
   (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges

From patchwork Fri Feb 12 21:54:03 2021
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12086169
Date: Fri, 12 Feb 2021 13:54:03 -0800
In-Reply-To: <20210212215403.3457686-1-axelrasmussen@google.com>
Message-Id: <20210212215403.3457686-8-axelrasmussen@google.com>
References: <20210212215403.3457686-1-axelrasmussen@google.com>
Subject: [PATCH v6 7/7] userfaultfd/selftests: add test exercising minor fault handling
From: Axel Rasmussen
To: Alexander Viro, Alexey Dobriyan, Andrea Arcangeli, Andrew Morton,
 Anshuman Khandual, Catalin Marinas, Chinwen Chang, Huang Ying,
 Ingo Molnar, Jann Horn, Jerome Glisse, Lokesh Gidra,
 "Matthew Wilcox (Oracle)", Michael Ellerman, Michal Koutný,
 Michel Lespinasse, Mike Kravetz, Mike Rapoport, Nicholas Piggin,
 Peter Xu, Shaohua Li, Shawn Anastasio, Steven Rostedt, Steven Price,
 Vlastimil Babka
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, Adam Ruprecht, Axel Rasmussen, Cannon Matthews,
 "Dr. David Alan Gilbert", David Rientjes, Mina Almasry, Oliver Upton

Fix a dormant bug in userfaultfd_events_test(), where we did
`return faulting_process(0)` instead of `exit(faulting_process(0))`.
This caused the forked process to keep running, trying to execute any
further test cases after the events test in parallel with the "real"
process.

Add a simple test case which exercises minor faults. In short, it does
the following:

1. "Sets up" an area (area_dst) and a second shared mapping to the same
   underlying pages (area_dst_alias).

2. Register one of these areas with userfaultfd, in minor fault mode.

3. Start a second thread to handle any minor faults.

4. Populate the underlying pages with the non-UFFD-registered side of
   the mapping. Basically, memset() each page with some arbitrary
   contents.

5. Then, using the UFFD-registered mapping, read all of the page
   contents, asserting that the contents match expectations (we expect
   the minor fault handling thread can modify the page contents before
   resolving the fault).

The minor fault handling thread, upon receiving an event, flips all the
bits (~) in that page, just to prove that it can modify it in some
arbitrary way. Then it issues a UFFDIO_CONTINUE ioctl, to setup the
mapping and resolve the fault. The reading thread should wake up and see
this modification.

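As a rough illustration of the arithmetic behind steps 4 and 5 and the
handler's bit flip, the sketch below fills a page with the per-page
pattern, flips it, and checks it against the byte the reader expects. It
does not touch userfaultfd at all. The helper names (fill_page,
flip_page, expected_byte, page_matches) are hypothetical and are not part
of this patch; only the `p % ((uint8_t)-1)` pattern and the `~` flip are
taken from the test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Step 4: fill one page with a per-page pattern, as the test's memset() does. */
void fill_page(uint8_t *page, size_t page_size, unsigned long p)
{
	assert(page != NULL);
	memset(page, (int)(p % ((uint8_t)-1)), page_size);
}

/* Handler side: flip every bit in the page before the fault is resolved. */
void flip_page(uint8_t *page, size_t page_size)
{
	size_t b;

	assert(page != NULL);
	for (b = 0; b < page_size; ++b)
		page[b] = (uint8_t)~page[b];
}

/* Step 5: the byte the reader expects to find everywhere in page p. */
uint8_t expected_byte(unsigned long p)
{
	return (uint8_t)~((uint8_t)(p % ((uint8_t)-1)));
}

/* Returns 1 if every byte of the page matches what the reader expects. */
int page_matches(const uint8_t *page, size_t page_size, unsigned long p)
{
	size_t b;

	for (b = 0; b < page_size; ++b)
		if (page[b] != expected_byte(p))
			return 0;
	return 1;
}
```

Since `(uint8_t)-1` is 255, the pattern repeats every 255 pages rather
than every 256, so page 0 and page 255 both expect 0xff after the flip.
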
Currently the minor fault test is only enabled in hugetlb_shared mode,
as this is the only configuration the kernel feature supports.

Reviewed-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/userfaultfd.c | 147 ++++++++++++++++++++++-
 1 file changed, 143 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 92b8ec423201..915b34d997ce 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -81,6 +81,8 @@ static volatile bool test_uffdio_copy_eexist = true;
 static volatile bool test_uffdio_zeropage_eexist = true;
 /* Whether to test uffd write-protection */
 static bool test_uffdio_wp = false;
+/* Whether to test uffd minor faults */
+static bool test_uffdio_minor = false;

 static bool map_shared;
 static int huge_fd;
@@ -96,6 +98,7 @@ struct uffd_stats {
 	int cpu;
 	unsigned long missing_faults;
 	unsigned long wp_faults;
+	unsigned long minor_faults;
 };

 /* pthread_mutex_t starts at page offset 0 */
@@ -153,17 +156,19 @@ static void uffd_stats_reset(struct uffd_stats *uffd_stats,
 		uffd_stats[i].cpu = i;
 		uffd_stats[i].missing_faults = 0;
 		uffd_stats[i].wp_faults = 0;
+		uffd_stats[i].minor_faults = 0;
 	}
 }

 static void uffd_stats_report(struct uffd_stats *stats, int n_cpus)
 {
 	int i;
-	unsigned long long miss_total = 0, wp_total = 0;
+	unsigned long long miss_total = 0, wp_total = 0, minor_total = 0;

 	for (i = 0; i < n_cpus; i++) {
 		miss_total += stats[i].missing_faults;
 		wp_total += stats[i].wp_faults;
+		minor_total += stats[i].minor_faults;
 	}

 	printf("userfaults: %llu missing (", miss_total);
@@ -172,6 +177,9 @@ static void uffd_stats_report(struct uffd_stats *stats, int n_cpus)
 	printf("\b), %llu wp (", wp_total);
 	for (i = 0; i < n_cpus; i++)
 		printf("%lu+", stats[i].wp_faults);
+	printf("\b), %llu minor (", minor_total);
+	for (i = 0; i < n_cpus; i++)
+		printf("%lu+", stats[i].minor_faults);
 	printf("\b)\n");
 }

@@ -328,7 +336,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = {
 };

 static struct uffd_test_ops hugetlb_uffd_test_ops = {
-	.expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC,
+	.expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC & ~(1 << _UFFDIO_CONTINUE),
 	.allocate_area = hugetlb_allocate_area,
 	.release_pages = hugetlb_release_pages,
 	.alias_mapping = hugetlb_alias_mapping,
@@ -362,6 +370,22 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
 	}
 }

+static void continue_range(int ufd, __u64 start, __u64 len)
+{
+	struct uffdio_continue req;
+
+	req.range.start = start;
+	req.range.len = len;
+	req.mode = 0;
+
+	if (ioctl(ufd, UFFDIO_CONTINUE, &req)) {
+		fprintf(stderr,
+			"UFFDIO_CONTINUE failed for address 0x%" PRIx64 "\n",
+			(uint64_t)start);
+		exit(1);
+	}
+}
+
 static void *locking_thread(void *arg)
 {
 	unsigned long cpu = (unsigned long) arg;
@@ -569,8 +593,32 @@ static void uffd_handle_page_fault(struct uffd_msg *msg,
 	}

 	if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) {
+		/* Write protect page faults */
 		wp_range(uffd, msg->arg.pagefault.address, page_size, false);
 		stats->wp_faults++;
+	} else if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR) {
+		uint8_t *area;
+		int b;
+
+		/*
+		 * Minor page faults
+		 *
+		 * To prove we can modify the original range for testing
+		 * purposes, we're going to bit flip this range before
+		 * continuing.
+		 *
+		 * Note that this requires all minor page fault tests operate on
+		 * area_dst (non-UFFD-registered) and area_dst_alias
+		 * (UFFD-registered).
+		 */
+
+		area = (uint8_t *)(area_dst +
+				   ((char *)msg->arg.pagefault.address -
+				    area_dst_alias));
+		for (b = 0; b < page_size; ++b)
+			area[b] = ~area[b];
+		continue_range(uffd, msg->arg.pagefault.address, page_size);
+		stats->minor_faults++;
 	} else {
 		/* Missing page faults */
 		if (bounces & BOUNCE_VERIFY &&
@@ -1112,7 +1160,7 @@ static int userfaultfd_events_test(void)
 	}

 	if (!pid)
-		return faulting_process(0);
+		exit(faulting_process(0));

 	waitpid(pid, &err, 0);
 	if (err) {
@@ -1215,6 +1263,95 @@ static int userfaultfd_sig_test(void)
 	return userfaults != 0;
 }

+static int userfaultfd_minor_test(void)
+{
+	struct uffdio_register uffdio_register;
+	unsigned long expected_ioctls;
+	unsigned long p;
+	pthread_t uffd_mon;
+	uint8_t expected_byte;
+	void *expected_page;
+	char c;
+	struct uffd_stats stats = { 0 };
+
+	if (!test_uffdio_minor)
+		return 0;
+
+	printf("testing minor faults: ");
+	fflush(stdout);
+
+	if (uffd_test_ops->release_pages(area_dst))
+		return 1;
+
+	if (userfaultfd_open(UFFD_FEATURE_MINOR_HUGETLBFS))
+		return 1;
+
+	uffdio_register.range.start = (unsigned long)area_dst_alias;
+	uffdio_register.range.len = nr_pages * page_size;
+	uffdio_register.mode = UFFDIO_REGISTER_MODE_MINOR;
+	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) {
+		fprintf(stderr, "register failure\n");
+		exit(1);
+	}
+
+	expected_ioctls = uffd_test_ops->expected_ioctls;
+	expected_ioctls |= (1 << _UFFDIO_CONTINUE);
+	if ((uffdio_register.ioctls & expected_ioctls) != expected_ioctls) {
+		fprintf(stderr, "unexpected missing ioctl(s)\n");
+		exit(1);
+	}
+
+	/*
+	 * After registering with UFFD, populate the non-UFFD-registered side of
+	 * the shared mapping. This should *not* trigger any UFFD minor faults.
+	 */
+	for (p = 0; p < nr_pages; ++p) {
+		memset(area_dst + (p * page_size), p % ((uint8_t)-1),
+		       page_size);
+	}
+
+	if (pthread_create(&uffd_mon, &attr, uffd_poll_thread, &stats)) {
+		perror("uffd_poll_thread create");
+		exit(1);
+	}
+
+	/*
+	 * Read each of the pages back using the UFFD-registered mapping. We
+	 * expect that the first time we touch a page, it will result in a minor
+	 * fault. uffd_poll_thread will resolve the fault by bit-flipping the
+	 * page's contents, and then issuing a CONTINUE ioctl.
+	 */
+
+	if (posix_memalign(&expected_page, page_size, page_size)) {
+		fprintf(stderr, "out of memory\n");
+		return 1;
+	}
+
+	for (p = 0; p < nr_pages; ++p) {
+		expected_byte = ~((uint8_t)(p % ((uint8_t)-1)));
+		memset(expected_page, expected_byte, page_size);
+		if (my_bcmp(expected_page, area_dst_alias + (p * page_size),
+			    page_size)) {
+			fprintf(stderr,
+				"unexpected page contents after minor fault\n");
+			exit(1);
+		}
+	}
+
+	if (write(pipefd[1], &c, sizeof(c)) != sizeof(c)) {
+		perror("pipe write");
+		exit(1);
+	}
+	if (pthread_join(uffd_mon, NULL))
+		return 1;
+
+	close(uffd);
+
+	uffd_stats_report(&stats, 1);
+
+	return stats.missing_faults != 0 || stats.minor_faults != nr_pages;
+}
+
 static int userfaultfd_stress(void)
 {
 	void *area;
@@ -1413,7 +1550,7 @@ static int userfaultfd_stress(void)

 	close(uffd);
 	return userfaultfd_zeropage_test() || userfaultfd_sig_test()
-		|| userfaultfd_events_test();
+		|| userfaultfd_events_test() || userfaultfd_minor_test();
 }

 /*
@@ -1454,6 +1591,8 @@ static void set_test_type(const char *type)
 		map_shared = true;
 		test_type = TEST_HUGETLB;
 		uffd_test_ops = &hugetlb_uffd_test_ops;
+		/* Minor faults require shared hugetlb; only enable here. */
+		test_uffdio_minor = true;
 	} else if (!strcmp(type, "shmem")) {
 		map_shared = true;
 		test_type = TEST_SHMEM;