From patchwork Sun Jun 19 23:34:45 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12887049
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [RFC PATCH v2 1/5] userfaultfd: introduce uffd_flags
Date: Sun, 19 Jun 2022 16:34:45 -0700
Message-Id: <20220619233449.181323-2-namit@vmware.com>
In-Reply-To: <20220619233449.181323-1-namit@vmware.com>
References: <20220619233449.181323-1-namit@vmware.com>

From: Nadav Amit

The following patches are going to introduce more information that needs to be propagated for handled user requests, so introduce uffd_flags, which will be used to carry this information. Remove the unused UFFD_FLAGS_SET to avoid confusion in the constant names. Introducing uffd_flags also allows mm/userfaultfd to avoid using uapi constants directly (e.g., UFFDIO_COPY_MODE_WP).
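For reference, the uapi bit that this patch now carries internally as UFFD_FLAGS_WP is the pre-existing UFFDIO_COPY_MODE_WP. The following is a minimal userspace sketch of the call that lands in userfaultfd_copy() below; the helper name and the error handling are illustrative, not part of this series:

	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>

	/*
	 * Resolve a missing-page fault at 'dst' by copying one page from 'src',
	 * leaving the new mapping write-protected. The kernel translates
	 * UFFDIO_COPY_MODE_WP into the internal UFFD_FLAGS_WP added here.
	 */
	static int copy_page_wp(int uffd, unsigned long dst, unsigned long src,
				unsigned long page_size)
	{
		struct uffdio_copy copy = {
			.dst  = dst,
			.src  = src,
			.len  = page_size,
			.mode = UFFDIO_COPY_MODE_WP,
		};

		if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
			return -1;
		/* 'copy' reports the number of bytes actually copied */
		return copy.copy == (long)page_size ? 0 : -1;
	}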
Cc: Mike Kravetz Cc: Hugh Dickins Cc: Andrew Morton Cc: Axel Rasmussen Cc: Peter Xu Cc: David Hildenbrand Cc: Mike Rapoport Signed-off-by: Nadav Amit Acked-by: David Hildenbrand --- fs/userfaultfd.c | 20 +++++++++---- include/linux/hugetlb.h | 4 +-- include/linux/shmem_fs.h | 8 ++++-- include/linux/userfaultfd_k.h | 23 +++++++++------ mm/hugetlb.c | 3 +- mm/shmem.c | 6 ++-- mm/userfaultfd.c | 53 ++++++++++++++++++----------------- 7 files changed, 68 insertions(+), 49 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index d398f6bf6d74..5daafa54eb3f 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1700,6 +1700,8 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, struct uffdio_copy uffdio_copy; struct uffdio_copy __user *user_uffdio_copy; struct userfaultfd_wake_range range; + bool mode_wp; + uffd_flags_t uffd_flags; user_uffdio_copy = (struct uffdio_copy __user *) arg; @@ -1726,10 +1728,15 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, goto out; if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP)) goto out; + + mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP; + + uffd_flags = mode_wp ? UFFD_FLAGS_WP : 0; + if (mmget_not_zero(ctx->mm)) { ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, uffdio_copy.len, &ctx->mmap_changing, - uffdio_copy.mode); + uffd_flags); mmput(ctx->mm); } else { return -ESRCH; @@ -1781,7 +1788,7 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx, if (mmget_not_zero(ctx->mm)) { ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start, uffdio_zeropage.range.len, - &ctx->mmap_changing); + &ctx->mmap_changing, 0); mmput(ctx->mm); } else { return -ESRCH; @@ -1810,6 +1817,7 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx, struct uffdio_writeprotect __user *user_uffdio_wp; struct userfaultfd_wake_range range; bool mode_wp, mode_dontwake; + uffd_flags_t uffd_flags; if (atomic_read(&ctx->mmap_changing)) return -EAGAIN; @@ -1835,10 +1843,12 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx, if (mode_wp && mode_dontwake) return -EINVAL; + uffd_flags = (mode_wp ? 
UFFD_FLAGS_WP : 0); + if (mmget_not_zero(ctx->mm)) { ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start, - uffdio_wp.range.len, mode_wp, - &ctx->mmap_changing); + uffdio_wp.range.len, + &ctx->mmap_changing, uffd_flags); mmput(ctx->mm); } else { return -ESRCH; @@ -1891,7 +1901,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg) if (mmget_not_zero(ctx->mm)) { ret = mcopy_continue(ctx->mm, uffdio_continue.range.start, uffdio_continue.range.len, - &ctx->mmap_changing); + &ctx->mmap_changing, 0); mmput(ctx->mm); } else { return -ESRCH; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 642a39016f9a..a4f326bc2de6 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -166,7 +166,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, unsigned long src_addr, enum mcopy_atomic_mode mode, struct page **pagep, - bool wp_copy); + uffd_flags_t uffd_flags); #endif /* CONFIG_USERFAULTFD */ bool hugetlb_reserve_pages(struct inode *inode, long from, long to, struct vm_area_struct *vma, @@ -366,7 +366,7 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, unsigned long src_addr, enum mcopy_atomic_mode mode, struct page **pagep, - bool wp_copy) + uffd_flags_t uffd_flags) { BUG(); return 0; diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index a68f982f22d1..f93a3c114002 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -9,6 +9,7 @@ #include #include #include +#include /* inode in-kernel data */ @@ -145,11 +146,12 @@ extern int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, - bool zeropage, bool wp_copy, - struct page **pagep); + bool zeropage, + struct page **pagep, + uffd_flags_t uffd_flags); #else /* !CONFIG_SHMEM */ #define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \ - src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; }) + src_addr, zeropage, pagep, uffd_flags) ({ BUG(); 0; }) #endif /* CONFIG_SHMEM */ #endif /* CONFIG_USERFAULTFD */ diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index eee374c29c85..6331148023c1 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -34,7 +34,6 @@ #define UFFD_NONBLOCK O_NONBLOCK #define UFFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK) -#define UFFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS) extern int sysctl_unprivileged_userfaultfd; @@ -56,23 +55,29 @@ enum mcopy_atomic_mode { MCOPY_ATOMIC_CONTINUE, }; +typedef unsigned int __bitwise uffd_flags_t; + +#define UFFD_FLAGS_WP ((__force uffd_flags_t)BIT(0)) + extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, struct vm_area_struct *dst_vma, unsigned long dst_addr, struct page *page, - bool newly_allocated, bool wp_copy); + bool newly_allocated, + uffd_flags_t uffd_flags); extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - atomic_t *mmap_changing, __u64 mode); -extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, - unsigned long dst_start, - unsigned long len, - atomic_t *mmap_changing); + atomic_t *mmap_changing, uffd_flags_t uffd_flags); +extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long dst_start, + unsigned long len, atomic_t *mmap_changing, + uffd_flags_t uffd_flags); extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start, - unsigned long len, atomic_t *mmap_changing); 
+ unsigned long len, atomic_t *mmap_changing, + uffd_flags_t uffd_flags); extern int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, unsigned long len, - bool enable_wp, atomic_t *mmap_changing); + atomic_t *mmap_changing, + uffd_flags_t uffd_flags); /* mm helpers */ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2bc9d1170e4f..2beff8a4bf7c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5875,9 +5875,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, unsigned long src_addr, enum mcopy_atomic_mode mode, struct page **pagep, - bool wp_copy) + uffd_flags_t uffd_flags) { bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE); + bool wp_copy = uffd_flags & UFFD_FLAGS_WP; struct hstate *h = hstate_vma(dst_vma); struct address_space *mapping = dst_vma->vm_file->f_mapping; pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr); diff --git a/mm/shmem.c b/mm/shmem.c index 12ac67dc831f..89c775275bae 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2343,8 +2343,8 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, - bool zeropage, bool wp_copy, - struct page **pagep) + bool zeropage, struct page **pagep, + uffd_flags_t uffd_flags) { struct inode *inode = file_inode(dst_vma->vm_file); struct shmem_inode_info *info = SHMEM_I(inode); @@ -2418,7 +2418,7 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, goto out_release; ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr, - page, true, wp_copy); + page, true, uffd_flags); if (ret) goto out_delete_from_cache; diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 07d3befc80e4..734de6aa0b8e 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -58,7 +58,7 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm, int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, struct vm_area_struct *dst_vma, unsigned long dst_addr, struct page *page, - bool newly_allocated, bool wp_copy) + bool newly_allocated, uffd_flags_t uffd_flags) { int ret; pte_t _dst_pte, *dst_pte; @@ -78,7 +78,7 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, * Always mark a PTE as write-protected when needed, regardless of * VM_WRITE, which the user might change. 
*/ - if (wp_copy) { + if (uffd_flags & UFFD_FLAGS_WP) { _dst_pte = pte_mkuffd_wp(_dst_pte); writable = false; } @@ -145,7 +145,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm, unsigned long dst_addr, unsigned long src_addr, struct page **pagep, - bool wp_copy) + uffd_flags_t uffd_flags) { void *page_kaddr; int ret; @@ -189,7 +189,7 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm, goto out_release; ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr, - page, true, wp_copy); + page, true, uffd_flags); if (ret) goto out_release; out: @@ -239,7 +239,7 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, struct vm_area_struct *dst_vma, unsigned long dst_addr, - bool wp_copy) + uffd_flags_t uffd_flags) { struct inode *inode = file_inode(dst_vma->vm_file); pgoff_t pgoff = linear_page_index(dst_vma, dst_addr); @@ -263,7 +263,7 @@ static int mcontinue_atomic_pte(struct mm_struct *dst_mm, } ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr, - page, false, wp_copy); + page, false, uffd_flags); if (ret) goto out_release; @@ -309,7 +309,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, unsigned long src_start, unsigned long len, enum mcopy_atomic_mode mode, - bool wp_copy) + uffd_flags_t uffd_flags) { int vm_shared = dst_vma->vm_flags & VM_SHARED; ssize_t err; @@ -406,7 +406,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, src_addr, mode, &page, - wp_copy); + uffd_flags); mutex_unlock(&hugetlb_fault_mutex_table[hash]); i_mmap_unlock_read(mapping); @@ -462,7 +462,7 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, unsigned long src_start, unsigned long len, enum mcopy_atomic_mode mode, - bool wp_copy); + uffd_flags_t uffd_flags); #endif /* CONFIG_HUGETLB_PAGE */ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, @@ -472,13 +472,13 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, unsigned long src_addr, struct page **page, enum mcopy_atomic_mode mode, - bool wp_copy) + uffd_flags_t uffd_flags) { ssize_t err; if (mode == MCOPY_ATOMIC_CONTINUE) { return mcontinue_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, - wp_copy); + uffd_flags); } /* @@ -495,7 +495,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, if (mode == MCOPY_ATOMIC_NORMAL) err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, src_addr, page, - wp_copy); + uffd_flags); else err = mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma, dst_addr); @@ -503,7 +503,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, src_addr, mode != MCOPY_ATOMIC_NORMAL, - wp_copy, page); + page, uffd_flags); } return err; @@ -515,7 +515,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, unsigned long len, enum mcopy_atomic_mode mcopy_mode, atomic_t *mmap_changing, - __u64 mode) + uffd_flags_t uffd_flags) { struct vm_area_struct *dst_vma; ssize_t err; @@ -523,7 +523,6 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, unsigned long src_addr, dst_addr; long copied; struct page *page; - bool wp_copy; /* * Sanitize the command parameters: @@ -570,11 +569,10 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, goto out_unlock; /* - * validate 'mode' now that we know the dst_vma: don't allow + * validate 'flags' 
now that we know the dst_vma: don't allow * a wrprotect copy if the userfaultfd didn't register as WP. */ - wp_copy = mode & UFFDIO_COPY_MODE_WP; - if (wp_copy && !(dst_vma->vm_flags & VM_UFFD_WP)) + if ((uffd_flags & UFFD_FLAGS_WP) && !(dst_vma->vm_flags & VM_UFFD_WP)) goto out_unlock; /* @@ -583,7 +581,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, if (is_vm_hugetlb_page(dst_vma)) return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start, src_start, len, mcopy_mode, - wp_copy); + uffd_flags); if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) goto out_unlock; @@ -635,7 +633,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, BUG_ON(pmd_trans_huge(*dst_pmd)); err = mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, - src_addr, &page, mcopy_mode, wp_copy); + src_addr, &page, mcopy_mode, uffd_flags); cond_resched(); if (unlikely(err == -ENOENT)) { @@ -683,30 +681,33 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - atomic_t *mmap_changing, __u64 mode) + atomic_t *mmap_changing, uffd_flags_t uffd_flags) { return __mcopy_atomic(dst_mm, dst_start, src_start, len, - MCOPY_ATOMIC_NORMAL, mmap_changing, mode); + MCOPY_ATOMIC_NORMAL, mmap_changing, uffd_flags); } ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, - unsigned long len, atomic_t *mmap_changing) + unsigned long len, atomic_t *mmap_changing, + uffd_flags_t uffd_flags) { return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE, mmap_changing, 0); } ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start, - unsigned long len, atomic_t *mmap_changing) + unsigned long len, atomic_t *mmap_changing, + uffd_flags_t uffd_flags) { return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE, mmap_changing, 0); } int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, - unsigned long len, bool enable_wp, - atomic_t *mmap_changing) + unsigned long len, + atomic_t *mmap_changing, uffd_flags_t uffd_flags) { + bool enable_wp = uffd_flags & UFFD_FLAGS_WP; struct vm_area_struct *dst_vma; unsigned long page_mask; struct mmu_gather tlb; From patchwork Sun Jun 19 23:34:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nadav Amit X-Patchwork-Id: 12887050 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2D4C8C43334 for ; Mon, 20 Jun 2022 07:09:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 223DD8E0002; Mon, 20 Jun 2022 03:09:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0C1B46B0074; Mon, 20 Jun 2022 03:09:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E05A98E0002; Mon, 20 Jun 2022 03:09:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id CDCD76B0073 for ; Mon, 20 Jun 2022 03:09:05 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id A7288459 for ; Mon, 20 Jun 2022 07:09:05 +0000 (UTC) X-FDA: 79597737450.28.76E601A Received: 
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [RFC PATCH v2 2/5] userfaultfd: introduce access-likely mode for copy/wp operations
Date: Sun, 19 Jun 2022 16:34:46 -0700
Message-Id: <20220619233449.181323-3-namit@vmware.com>
In-Reply-To: <20220619233449.181323-1-namit@vmware.com>
References: <20220619233449.181323-1-namit@vmware.com>
From: Nadav Amit

Using a PTE on x86 whose access-bit (a.k.a. young-bit) is clear takes ~600 more cycles than using a PTE whose access-bit is set. At the same time, setting the access-bit for memory that is not actually used (e.g., prefetched) can introduce greater overheads, since the prefetched memory is then reclaimed later than it should be.

Userfaultfd currently does not set the access-bit (excluding the huge-pages case). Arguably, it is best to let the user control whether the access-bit should be set. The expected use is to request that userfaultfd set the access-bit when the copy/wp operation is done to resolve a page-fault, and not to set it when the memory is merely prefetched.

Introduce UFFDIO_COPY_MODE_ACCESS_LIKELY and UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY to enable userspace to request that the young bit be set. For UFFDIO_CONTINUE and UFFDIO_ZEROPAGE, set the bit unconditionally, since the former is only used to resolve page-faults and the latter would not benefit from leaving the access-bit clear.
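A usage sketch, assuming uapi headers and a kernel with this series applied and UFFD_FEATURE_ACCESS_HINTS negotiated through UFFDIO_API (the helper and its parameters are illustrative): set the hint when the copy resolves a pending fault, and omit it when the copy is an opportunistic prefetch.

	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>

	/*
	 * Copy one page into the registered range. 'resolving_fault' is non-zero
	 * when the copy answers a pending page-fault, zero for prefetch.
	 */
	static int uffd_copy_page(int uffd, unsigned long dst, unsigned long src,
				  unsigned long page_size, int resolving_fault)
	{
		struct uffdio_copy copy = {
			.dst  = dst,
			.src  = src,
			.len  = page_size,
			/*
			 * Hint from this patch: mark the PTE young only when the
			 * page is about to be touched, so prefetched pages stay
			 * cold and are reclaimed on time.
			 */
			.mode = resolving_fault ? UFFDIO_COPY_MODE_ACCESS_LIKELY : 0,
		};

		return ioctl(uffd, UFFDIO_COPY, &copy);
	}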
Cc: Mike Kravetz Cc: Hugh Dickins Cc: Andrew Morton Cc: Axel Rasmussen Cc: Peter Xu Cc: David Hildenbrand Cc: Mike Rapoport Signed-off-by: Nadav Amit --- fs/userfaultfd.c | 23 ++++++++++++++++------- include/linux/userfaultfd_k.h | 1 + include/uapi/linux/userfaultfd.h | 20 +++++++++++++++++++- mm/userfaultfd.c | 18 ++++++++++++++++-- 4 files changed, 52 insertions(+), 10 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 5daafa54eb3f..35a8c4347c54 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1700,7 +1700,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, struct uffdio_copy uffdio_copy; struct uffdio_copy __user *user_uffdio_copy; struct userfaultfd_wake_range range; - bool mode_wp; + bool mode_wp, mode_access_likely; uffd_flags_t uffd_flags; user_uffdio_copy = (struct uffdio_copy __user *) arg; @@ -1726,12 +1726,15 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, ret = -EINVAL; if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src) goto out; - if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP)) + if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP| + UFFDIO_COPY_MODE_ACCESS_LIKELY)) goto out; mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP; + mode_access_likely = uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY; - uffd_flags = mode_wp ? UFFD_FLAGS_WP : 0; + uffd_flags = (mode_wp ? UFFD_FLAGS_WP : 0) | + (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0); if (mmget_not_zero(ctx->mm)) { ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, @@ -1816,7 +1819,7 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx, struct uffdio_writeprotect uffdio_wp; struct uffdio_writeprotect __user *user_uffdio_wp; struct userfaultfd_wake_range range; - bool mode_wp, mode_dontwake; + bool mode_wp, mode_dontwake, mode_access_likely; uffd_flags_t uffd_flags; if (atomic_read(&ctx->mmap_changing)) @@ -1834,16 +1837,19 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx, return ret; if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE | - UFFDIO_WRITEPROTECT_MODE_WP)) + UFFDIO_WRITEPROTECT_MODE_WP | + UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY)) return -EINVAL; mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP; mode_dontwake = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE; + mode_access_likely = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY; if (mode_wp && mode_dontwake) return -EINVAL; - uffd_flags = (mode_wp ? UFFD_FLAGS_WP : 0); + uffd_flags = (mode_wp ? UFFD_FLAGS_WP : 0) | + (mode_access_likely ? 
UFFD_FLAGS_ACCESS_LIKELY : 0); if (mmget_not_zero(ctx->mm)) { ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start, @@ -1871,6 +1877,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg) struct uffdio_continue uffdio_continue; struct uffdio_continue __user *user_uffdio_continue; struct userfaultfd_wake_range range; + uffd_flags_t uffd_flags; user_uffdio_continue = (struct uffdio_continue __user *)arg; @@ -1898,10 +1905,12 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg) if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE) goto out; + uffd_flags = UFFD_FLAGS_ACCESS_LIKELY; + if (mmget_not_zero(ctx->mm)) { ret = mcopy_continue(ctx->mm, uffdio_continue.range.start, uffdio_continue.range.len, - &ctx->mmap_changing, 0); + &ctx->mmap_changing, uffd_flags); mmput(ctx->mm); } else { return -ESRCH; diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 6331148023c1..e6ac165ec044 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -58,6 +58,7 @@ enum mcopy_atomic_mode { typedef unsigned int __bitwise uffd_flags_t; #define UFFD_FLAGS_WP ((__force uffd_flags_t)BIT(0)) +#define UFFD_FLAGS_ACCESS_LIKELY ((__force uffd_flags_t)BIT(1)) extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, struct vm_area_struct *dst_vma, diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index 005e5e306266..d9c8ce9ba777 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -38,7 +38,8 @@ UFFD_FEATURE_MINOR_HUGETLBFS | \ UFFD_FEATURE_MINOR_SHMEM | \ UFFD_FEATURE_EXACT_ADDRESS | \ - UFFD_FEATURE_WP_HUGETLBFS_SHMEM) + UFFD_FEATURE_WP_HUGETLBFS_SHMEM | \ + UFFD_FEATURE_ACCESS_HINTS) #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -203,6 +204,10 @@ struct uffdio_api { * * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd * write-protection mode is supported on both shmem and hugetlbfs. + * + * UFFD_FEATURE_ACCESS_HINTS indicates that the copy supports + * UFFDIO_COPY_MODE_ACCESS_LIKELY supports + * UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY. */ #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0) #define UFFD_FEATURE_EVENT_FORK (1<<1) @@ -217,6 +222,7 @@ struct uffdio_api { #define UFFD_FEATURE_MINOR_SHMEM (1<<10) #define UFFD_FEATURE_EXACT_ADDRESS (1<<11) #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM (1<<12) +#define UFFD_FEATURE_ACCESS_HINTS (1<<13) __u64 features; __u64 ioctls; @@ -260,6 +266,13 @@ struct uffdio_copy { * copy_from_user will not read the last 8 bytes. */ __s64 copy; + /* + * UFFDIO_COPY_MODE_ACCESS_LIKELY will set the mapped page as young. + * This can reduce the time that the first access to the page takes. + * Yet, if set opportunistically to memory that is not used, it might + * extend the time before the unused memory pages are reclaimed. + */ +#define UFFDIO_COPY_MODE_ACCESS_LIKELY ((__u64)1<<3) }; struct uffdio_zeropage { @@ -284,6 +297,10 @@ struct uffdio_writeprotect { * UFFDIO_WRITEPROTECT_MODE_DONTWAKE: set the flag to avoid waking up * any wait thread after the operation succeeds. * + * UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY: set the flag to mark the modified + * memory as young, which can reduce the time that the first access + * to the page takes. + * * NOTE: Write protecting a region (WP=1) is unrelated to page faults, * therefore DONTWAKE flag is meaningless with WP=1. 
Removing write * protection (WP=0) in response to a page fault wakes the faulting @@ -291,6 +308,7 @@ struct uffdio_writeprotect { */ #define UFFDIO_WRITEPROTECT_MODE_WP ((__u64)1<<0) #define UFFDIO_WRITEPROTECT_MODE_DONTWAKE ((__u64)1<<1) +#define UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY ((__u64)1<<2) __u64 mode; }; diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 734de6aa0b8e..140c8d3e946e 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -92,6 +92,9 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, */ _dst_pte = pte_wrprotect(_dst_pte); + if (uffd_flags & UFFD_FLAGS_ACCESS_LIKELY) + _dst_pte = pte_mkyoung(_dst_pte); + dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl); if (vma_is_shmem(dst_vma)) { @@ -202,7 +205,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm, static int mfill_zeropage_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, struct vm_area_struct *dst_vma, - unsigned long dst_addr) + unsigned long dst_addr, + uffd_flags_t uffd_flags) { pte_t _dst_pte, *dst_pte; spinlock_t *ptl; @@ -225,6 +229,10 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm, ret = -EEXIST; if (!pte_none(*dst_pte)) goto out_unlock; + + if (uffd_flags & UFFD_FLAGS_ACCESS_LIKELY) + _dst_pte = pte_mkyoung(_dst_pte); + set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); /* No need to invalidate - it was non-present before */ update_mmu_cache(dst_vma, dst_addr, dst_pte); @@ -498,7 +506,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, uffd_flags); else err = mfill_zeropage_pte(dst_mm, dst_pmd, - dst_vma, dst_addr); + dst_vma, dst_addr, uffd_flags); } else { err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, src_addr, @@ -691,6 +699,9 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, unsigned long len, atomic_t *mmap_changing, uffd_flags_t uffd_flags) { + /* There is no cost for setting the access bit of a zeropage */ + uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY; + return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE, mmap_changing, 0); } @@ -699,6 +710,9 @@ ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start, unsigned long len, atomic_t *mmap_changing, uffd_flags_t uffd_flags) { + /* The page is likely to be accessed */ + uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY; + return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE, mmap_changing, 0); } From patchwork Sun Jun 19 23:34:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nadav Amit X-Patchwork-Id: 12887051 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F060BCCA479 for ; Mon, 20 Jun 2022 07:09:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 46B928E0003; Mon, 20 Jun 2022 03:09:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3CE716B0074; Mon, 20 Jun 2022 03:09:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 135068E0003; Mon, 20 Jun 2022 03:09:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 0098A6B0073 for ; Mon, 20 Jun 2022 03:09:06 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by 
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [RFC PATCH v2 3/5] userfaultfd: introduce write-likely mode for copy/wp operations
Date: Sun, 19 Jun 2022 16:34:47 -0700
Message-Id: <20220619233449.181323-4-namit@vmware.com>
In-Reply-To: <20220619233449.181323-1-namit@vmware.com>
References: <20220619233449.181323-1-namit@vmware.com>
From: Nadav Amit

Commit 9ae0f87d009ca ("mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte") set PTEs as dirty, as its title indicates. However, setting read-only PTEs as dirty can have several undesired implications.

First, setting read-only PTEs as dirty can cause these PTEs to become writable during an mprotect() syscall. See change_pte_range():

	/* Avoid taking write faults for known dirty pages */
	if (dirty_accountable && pte_dirty(ptent) &&
			(pte_soft_dirty(ptent) ||
			 !(vma->vm_flags & VM_SOFTDIRTY))) {
		ptent = pte_mkwrite(ptent);
	}

Second, unmapping read-only dirty PTEs often prevents TLB flush batching. See try_to_unmap_one():

	/*
	 * Page is dirty. Flush the TLB if a writable entry
	 * potentially exists to avoid CPU writes after IO
	 * starts and then write it out here.
	 */
	try_to_unmap_flush_dirty();

Similarly, TLB flush batching might be prevented in zap_pte_range():

	if (!PageAnon(page)) {
		if (pte_dirty(ptent)) {
			force_flush = 1;
			set_page_dirty(page);
		}
	...

In general, setting a PTE as dirty for read-only entries might be dangerous. As a reminder, the dirty-COW vulnerability mitigation also relies on the dirty bit being set only after COW (although it does not appear to apply to userfaultfd).

To summarize, setting the dirty bit for read-only PTEs is dangerous. But even if we only consider writable pages, always setting the dirty bit or always leaving it clear does not seem to be the best policy. Leaving the bit clear introduces overhead on the first write access, which must set the bit. Setting the bit for pages that are eventually not written to can require more TLB flushes.

Let userfaultfd users control whether PTEs are marked as dirty or clean. Introduce UFFDIO_COPY_MODE_WRITE_LIKELY and UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY to enable userspace to indicate whether pages are likely to be written to, and set the dirty-bit only for those that are.
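A sketch of the intended use, again assuming this series' uapi (the helper is illustrative): when a uffd-wp write fault is resolved, the page is about to be written, so the monitor removes write protection with both hints set, letting the kernel pre-set the young bit and, on the now-writable PTE, the dirty bit.

	#include <linux/userfaultfd.h>
	#include <sys/ioctl.h>

	/*
	 * Resolve a write-protect fault: clear uffd-wp over one page and hint
	 * that the page will be accessed and written right away.
	 */
	static int uffd_unprotect_for_write(int uffd, unsigned long addr,
					    unsigned long page_size)
	{
		struct uffdio_writeprotect wp = {
			.range = { .start = addr, .len = page_size },
			/* UFFDIO_WRITEPROTECT_MODE_WP left clear => unprotect */
			.mode  = UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY |
				 UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY,
		};

		return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
	}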
Cc: Mike Kravetz Cc: Hugh Dickins Cc: Andrew Morton Cc: Axel Rasmussen Cc: Peter Xu Cc: David Hildenbrand Cc: Mike Rapoport Signed-off-by: Nadav Amit --- fs/userfaultfd.c | 22 ++++++++++++++-------- include/linux/userfaultfd_k.h | 1 + include/uapi/linux/userfaultfd.h | 27 +++++++++++++++++++-------- mm/hugetlb.c | 3 +++ mm/shmem.c | 3 +++ mm/userfaultfd.c | 11 +++++++++-- 6 files changed, 49 insertions(+), 18 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 35a8c4347c54..a56983b594d5 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1700,7 +1700,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, struct uffdio_copy uffdio_copy; struct uffdio_copy __user *user_uffdio_copy; struct userfaultfd_wake_range range; - bool mode_wp, mode_access_likely; + bool mode_wp, mode_access_likely, mode_write_likely; uffd_flags_t uffd_flags; user_uffdio_copy = (struct uffdio_copy __user *) arg; @@ -1727,14 +1727,17 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx, if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src) goto out; if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP| - UFFDIO_COPY_MODE_ACCESS_LIKELY)) + UFFDIO_COPY_MODE_ACCESS_LIKELY| + UFFDIO_COPY_MODE_WRITE_LIKELY)) goto out; mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP; mode_access_likely = uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY; + mode_write_likely = uffdio_copy.mode & UFFDIO_COPY_MODE_WRITE_LIKELY; uffd_flags = (mode_wp ? UFFD_FLAGS_WP : 0) | - (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0); + (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0) | + (mode_write_likely ? UFFD_FLAGS_WRITE_LIKELY : 0); if (mmget_not_zero(ctx->mm)) { ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src, @@ -1819,7 +1822,7 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx, struct uffdio_writeprotect uffdio_wp; struct uffdio_writeprotect __user *user_uffdio_wp; struct userfaultfd_wake_range range; - bool mode_wp, mode_dontwake, mode_access_likely; + bool mode_wp, mode_dontwake, mode_access_likely, mode_write_likely; uffd_flags_t uffd_flags; if (atomic_read(&ctx->mmap_changing)) @@ -1838,18 +1841,21 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx, if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE | UFFDIO_WRITEPROTECT_MODE_WP | - UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY)) + UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY | + UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY)) return -EINVAL; mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP; mode_dontwake = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE; mode_access_likely = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY; + mode_write_likely = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY; if (mode_wp && mode_dontwake) return -EINVAL; uffd_flags = (mode_wp ? UFFD_FLAGS_WP : 0) | - (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0); + (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0) | + (mode_write_likely ? 
UFFD_FLAGS_WRITE_LIKELY : 0); if (mmget_not_zero(ctx->mm)) { ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start, @@ -1902,10 +1908,10 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg) uffdio_continue.range.start) { goto out; } - if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE) + if (uffdio_continue.mode & ~(UFFDIO_CONTINUE_MODE_DONTWAKE)) goto out; - uffd_flags = UFFD_FLAGS_ACCESS_LIKELY; + uffd_flags = UFFD_FLAGS_ACCESS_LIKELY | UFFD_FLAGS_WRITE_LIKELY; if (mmget_not_zero(ctx->mm)) { ret = mcopy_continue(ctx->mm, uffdio_continue.range.start, diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index e6ac165ec044..261a3fa750d0 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -59,6 +59,7 @@ typedef unsigned int __bitwise uffd_flags_t; #define UFFD_FLAGS_WP ((__force uffd_flags_t)BIT(0)) #define UFFD_FLAGS_ACCESS_LIKELY ((__force uffd_flags_t)BIT(1)) +#define UFFD_FLAGS_WRITE_LIKELY ((__force uffd_flags_t)BIT(2)) extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, struct vm_area_struct *dst_vma, diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index d9c8ce9ba777..6ad93a13282e 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -267,12 +267,20 @@ struct uffdio_copy { */ __s64 copy; /* - * UFFDIO_COPY_MODE_ACCESS_LIKELY will set the mapped page as young. - * This can reduce the time that the first access to the page takes. - * Yet, if set opportunistically to memory that is not used, it might - * extend the time before the unused memory pages are reclaimed. + * UFFDIO_COPY_MODE_ACCESS_LIKELY indicates that the memory is likely to + * be accessed in the near future, in contrast to memory that is + * opportunistically copied and might not be accessed. The kernel will + * act accordingly, for instance by setting the access-bit in the PTE to + * reduce the access time to the page. + * + * UFFDIO_COPY_MODE_WRITE_LIKELY indicates that the memory is likely to + * be written to. The kernel will act accordingly, for instance by + * setting the dirty-bit in the PTE to reduce the write time to the + * page. This flag will be silently ignored if UFFDIO_COPY_MODE_WP is + * set. */ -#define UFFDIO_COPY_MODE_ACCESS_LIKELY ((__u64)1<<3) +#define UFFDIO_COPY_MODE_ACCESS_LIKELY ((__u64)1<<2) +#define UFFDIO_COPY_MODE_WRITE_LIKELY ((__u64)1<<3) }; struct uffdio_zeropage { @@ -297,9 +305,11 @@ struct uffdio_writeprotect { * UFFDIO_WRITEPROTECT_MODE_DONTWAKE: set the flag to avoid waking up * any wait thread after the operation succeeds. * - * UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY: set the flag to mark the modified - * memory as young, which can reduce the time that the first access - * to the page takes. + * UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY: set the flag to indicate the memory + * is likely to be accessed in the near future. + * + * UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY: set the flag to indicate that the + * memory is likely to be written to in the near future. * * NOTE: Write protecting a region (WP=1) is unrelated to page faults, * therefore DONTWAKE flag is meaningless with WP=1. 
Removing write @@ -309,6 +319,7 @@ struct uffdio_writeprotect { #define UFFDIO_WRITEPROTECT_MODE_WP ((__u64)1<<0) #define UFFDIO_WRITEPROTECT_MODE_DONTWAKE ((__u64)1<<1) #define UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY ((__u64)1<<2) +#define UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY ((__u64)1<<3) __u64 mode; }; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2beff8a4bf7c..46814fc7762f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5962,6 +5962,9 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, *pagep = NULL; } + /* The PTE is not marked as dirty unconditionally */ + SetPageDirty(page); + /* * The memory barrier inside __SetPageUptodate makes sure that * preceding stores to the page contents become visible before diff --git a/mm/shmem.c b/mm/shmem.c index 89c775275bae..7488cd186c32 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2404,6 +2404,9 @@ int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, VM_BUG_ON(PageSwapBacked(page)); __SetPageLocked(page); __SetPageSwapBacked(page); + + /* The PTE is not marked as dirty unconditionally */ + SetPageDirty(page); __SetPageUptodate(page); ret = -EFAULT; diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 140c8d3e946e..3172158d8faa 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -70,7 +70,6 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, pgoff_t offset, max_off; _dst_pte = mk_pte(page, dst_vma->vm_page_prot); - _dst_pte = pte_mkdirty(_dst_pte); if (page_in_cache && !vm_shared) writable = false; @@ -85,13 +84,18 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, if (writable) _dst_pte = pte_mkwrite(_dst_pte); - else + else { /* * We need this to make sure write bit removed; as mk_pte() * could return a pte with write bit set. */ _dst_pte = pte_wrprotect(_dst_pte); + /* Marking RO entries as dirty can mess with other code */ + if (uffd_flags & UFFD_FLAGS_WRITE_LIKELY) + _dst_pte = pte_mkdirty(_dst_pte); + } + if (uffd_flags & UFFD_FLAGS_ACCESS_LIKELY) _dst_pte = pte_mkyoung(_dst_pte); @@ -180,6 +184,9 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm, *pagep = NULL; } + /* The PTE is not marked as dirty unconditionally */ + SetPageDirty(page); + /* * The memory barrier inside __SetPageUptodate makes sure that * preceding stores to the page contents become visible before From patchwork Sun Jun 19 23:34:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nadav Amit X-Patchwork-Id: 12887052 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 517A0C433EF for ; Mon, 20 Jun 2022 07:09:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B22036B0073; Mon, 20 Jun 2022 03:09:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A83D88E0006; Mon, 20 Jun 2022 03:09:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7EC358E0005; Mon, 20 Jun 2022 03:09:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 5ED6F6B0073 for ; Mon, 20 Jun 2022 03:09:08 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 38C9F345B9 for ; Mon, 20 Jun 2022 07:09:08 +0000 (UTC) 
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, David Hildenbrand, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen, Peter Xu, Mike Rapoport
Subject: [RFC PATCH v2 4/5] userfaultfd: zero access/write hints
Date: Sun, 19 Jun 2022 16:34:48 -0700
Message-Id: <20220619233449.181323-5-namit@vmware.com>
In-Reply-To: <20220619233449.181323-1-namit@vmware.com>
References: <20220619233449.181323-1-namit@vmware.com>

From: Nadav Amit

When userfaultfd resolves a fault through the UFFDIO_ZEROPAGE ioctl, it
installs a read-only alias of the zero page. If the page is later
written, which is the likely scenario, a write fault occurs, the fault
handler allocates a new page and rewires the page tables. This is an
expensive flow for pages that are likely to be written to. Users can
instead use the copy ioctl and copy in zeros, but that is wasteful as
well.

Allow userfaultfd users to efficiently map initialized, writable zero
pages. Introduce UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY which, when
provided, maps a freshly cleared page instead of an alias of the zero
page. For consistency, also introduce UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY.

Suggested-by: David Hildenbrand
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
 fs/userfaultfd.c                 | 14 +++++++++++--
 include/uapi/linux/userfaultfd.h |  2 ++
 mm/userfaultfd.c                 | 36 ++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index a56983b594d5..ff073de78ea8 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1770,6 +1770,8 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	struct uffdio_zeropage uffdio_zeropage;
 	struct uffdio_zeropage __user *user_uffdio_zeropage;
 	struct userfaultfd_wake_range range;
+	bool mode_dontwake, mode_access_likely, mode_write_likely;
+	uffd_flags_t uffd_flags;

 	user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg;

@@ -1788,8 +1790,16 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	if (ret)
 		goto out;
 	ret = -EINVAL;
-	if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
-		goto out;
+
+	mode_dontwake = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_DONTWAKE;
+	mode_access_likely = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY;
+	mode_write_likely = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY;
+
+	if (mode_dontwake)
+		return -EINVAL;
+
+	uffd_flags = (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0) |
+		     (mode_write_likely ? UFFD_FLAGS_WRITE_LIKELY : 0);

 	if (mmget_not_zero(ctx->mm)) {
 		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 6ad93a13282e..b586b7c1e265 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -286,6 +286,8 @@ struct uffdio_copy {
 struct uffdio_zeropage {
 	struct uffdio_range range;
 #define UFFDIO_ZEROPAGE_MODE_DONTWAKE		((__u64)1<<0)
+#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((__u64)1<<2)
+#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY	((__u64)1<<3)
 	__u64 mode;

 	/*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 3172158d8faa..5dfbb1e80369 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -249,6 +249,38 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	return ret;
 }

+static int mfill_clearpage_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+			       struct vm_area_struct *dst_vma,
+			       unsigned long dst_addr,
+			       uffd_flags_t uffd_flags)
+{
+	struct page *page;
+	int ret;
+
+	ret = -ENOMEM;
+	page = alloc_zeroed_user_highpage_movable(dst_vma, dst_addr);
+	if (!page)
+		goto out;
+
+	/* The PTE is not marked as dirty unconditionally */
+	SetPageDirty(page);
+	__SetPageUptodate(page);
+
+	ret = -ENOMEM;
+	if (mem_cgroup_charge(page_folio(page), dst_vma->vm_mm, GFP_KERNEL))
+		goto out_release;
+
+	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+				       page, true, uffd_flags);
+	if (ret)
+		goto out_release;
+out:
+	return ret;
+out_release:
+	put_page(page);
+	goto out;
+}
+
 /* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
 static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
 				pmd_t *dst_pmd,
@@ -511,6 +543,10 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 		err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
 				       src_addr, page, uffd_flags);
+	else if (!(uffd_flags & UFFD_FLAGS_WP) &&
+		 (uffd_flags & UFFD_FLAGS_WRITE_LIKELY))
+		err = mfill_clearpage_pte(dst_mm, dst_pmd, dst_vma,
+					  dst_addr, uffd_flags);
 	else
 		err = mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma,
 					 dst_addr, uffd_flags);
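[ Usage sketch, not part of the patch: a monitor that expects the faulting
  page to be written right away could resolve the fault with UFFDIO_ZEROPAGE
  plus the new WRITE_LIKELY bit, so a freshly cleared, writable page is
  installed instead of a read-only alias of the zero page. The helper name is
  made up and the fallback defines mirror the uapi hunk above; they remain
  hypothetical until the series lands. ]

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Proposed in this series; values mirror the uapi change above. */
#ifndef UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY
#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((__u64)1<<2)
#endif
#ifndef UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY
#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY	((__u64)1<<3)
#endif

/*
 * Resolve a missing-page fault at addr with a zeroed page that is expected
 * to be written soon; returns 0 on success, -1 with errno set on failure.
 */
static int uffd_zeropage_write_likely(int uffd, uint64_t addr, uint64_t len)
{
	struct uffdio_zeropage zp = {
		.range = { .start = addr, .len = len },
		.mode  = UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY,
	};

	if (ioctl(uffd, UFFDIO_ZEROPAGE, &zp) == -1)
		return -1;	/* byte count (or -errno) is in zp.zeropage */
	return 0;
}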
From patchwork Sun Jun 19 23:34:49 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12887053
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [RFC PATCH v2 5/5] selftest/userfaultfd: test read/write hints
Date: Sun, 19 Jun 2022 16:34:49 -0700
Message-Id: <20220619233449.181323-6-namit@vmware.com>
In-Reply-To: <20220619233449.181323-1-namit@vmware.com>
References: <20220619233449.181323-1-namit@vmware.com>

From: Nadav Amit

Test the new UFFDIO_*_MODE_ACCESS_LIKELY and UFFDIO_*_MODE_WRITE_LIKELY
hints. Introduce "access_likely" and "write_likely" test modifiers that
make the selftest set the corresponding mode bits in its ioctl calls.

Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
 tools/testing/selftests/vm/userfaultfd.c | 26 ++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 28b881523d15..01680a7d1cdd 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -88,6 +88,8 @@ static volatile bool test_uffdio_zeropage_eexist = true;
 static bool test_uffdio_wp = true;
 /* Whether to test uffd minor faults */
 static bool test_uffdio_minor = false;
+static bool test_access_likely;
+static bool test_write_likely;
 static bool map_shared;
 static int shm_fd;
@@ -550,6 +552,12 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
 	/* Undo write-protect, do wakeup after that */
 	prms.mode = wp ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
+	if (test_access_likely)
+		prms.mode |= UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		prms.mode |= UFFDIO_WRITEPROTECT_MODE_WRITE_LIKELY;
+
 	if (ioctl(ufd, UFFDIO_WRITEPROTECT, &prms))
 		err("clear WP failed: address=0x%"PRIx64, (uint64_t)start);
 }
@@ -653,6 +661,13 @@ static int __copy_page(int ufd, unsigned long offset, bool retry)
 		uffdio_copy.mode = UFFDIO_COPY_MODE_WP;
 	else
 		uffdio_copy.mode = 0;
+
+	if (test_access_likely)
+		uffdio_copy.mode |= UFFDIO_COPY_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		uffdio_copy.mode |= UFFDIO_COPY_MODE_WRITE_LIKELY;
+
 	uffdio_copy.copy = 0;
 	if (ioctl(ufd, UFFDIO_COPY, &uffdio_copy)) {
 		/* real retval in ufdio_copy.copy */
@@ -1080,6 +1095,13 @@ static int __uffdio_zeropage(int ufd, unsigned long offset, bool retry)
 	uffdio_zeropage.range.start = (unsigned long) area_dst + offset;
 	uffdio_zeropage.range.len = page_size;
 	uffdio_zeropage.mode = 0;
+
+	if (test_access_likely)
+		uffdio_zeropage.mode |= UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY;
+
+	if (test_write_likely)
+		uffdio_zeropage.mode |= UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY;
+
 	ret = ioctl(ufd, UFFDIO_ZEROPAGE, &uffdio_zeropage);
 	res = uffdio_zeropage.zeropage;
 	if (ret) {
@@ -1648,6 +1670,10 @@ static void parse_test_type_arg(const char *raw_type)
 		set_test_type(token);
 	else if (!strcmp(token, "dev"))
 		test_dev_userfaultfd = true;
+	else if (!strcmp(token, "access_likely"))
+		test_access_likely = true;
+	else if (!strcmp(token, "write_likely"))
+		test_write_likely = true;
 	else
 		err("unrecognized test mod '%s'", token);
 }
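[ For reference, the selftest's __copy_page() above boils down to the
  following pattern; this is a minimal sketch of a UFFDIO_COPY that passes
  both hints. The helper name is made up, and the *_ACCESS_LIKELY /
  *_WRITE_LIKELY values follow the bit layout used throughout this series
  rather than any released header. ]

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Proposed in this series; follows the 1<<2 / 1<<3 pattern used above. */
#ifndef UFFDIO_COPY_MODE_ACCESS_LIKELY
#define UFFDIO_COPY_MODE_ACCESS_LIKELY	((__u64)1<<2)
#endif
#ifndef UFFDIO_COPY_MODE_WRITE_LIKELY
#define UFFDIO_COPY_MODE_WRITE_LIKELY	((__u64)1<<3)
#endif

/*
 * Copy len bytes from src into the faulting range at dst, hinting that the
 * destination is likely to be both accessed and written shortly.
 */
static int uffd_copy_with_hints(int uffd, uint64_t dst, uint64_t src,
				uint64_t len)
{
	struct uffdio_copy copy = {
		.dst  = dst,
		.src  = src,
		.len  = len,
		.mode = UFFDIO_COPY_MODE_ACCESS_LIKELY |
			UFFDIO_COPY_MODE_WRITE_LIKELY,
	};

	if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
		return -1;	/* real retval is in copy.copy */
	return 0;
}

[ With the selftest change above, the same bits can be exercised by appending
  the "access_likely" and "write_likely" modifiers to the test type argument,
  e.g. something along the lines of "./userfaultfd anon:access_likely:write_likely 20 16". ]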