From patchwork Wed Jun 22 18:50:35 2022
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12891716
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Mike Kravetz, Hugh Dickins, Andrew Morton,
 Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations
Date: Wed, 22 Jun 2022 11:50:35 -0700
Message-Id: <20220622185038.71740-3-namit@vmware.com>
In-Reply-To: <20220622185038.71740-1-namit@vmware.com>
References: <20220622185038.71740-1-namit@vmware.com>
MIME-Version: 1.0

From: Nadav Amit

On x86, accessing memory through a PTE whose access-bit (aka young-bit)
is clear takes ~600 more cycles than when the bit is set. At the same
time, setting the access-bit on memory that is not actually used (e.g.,
prefetched memory) can introduce greater overheads, as the prefetched
memory is then reclaimed later than it should be.

Userfaultfd currently does not set the access-bit (excluding the
huge-pages case). Arguably, it is best to let the user control whether
the access-bit should be set or not. The expected use is to request
userfaultfd to set the access-bit when the copy/wp operation is done to
resolve a page-fault, and not to set the access-bit when the memory is
merely prefetched.

Introduce UFFDIO_[op]_ACCESS_LIKELY to enable userspace to request that
the young-bit be set.
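For illustration only (this snippet is not part of the patch), userspace
would opt in at UFFDIO_API time and then pass the new mode flag when
resolving a fault. This minimal sketch assumes the
UFFD_FEATURE_ACCESS_HINTS and UFFDIO_COPY_MODE_ACCESS_LIKELY values
introduced by this series; everything else is the existing userfaultfd
API, and registration plus the fault-polling loop are elided:

	#include <fcntl.h>
	#include <linux/userfaultfd.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	/*
	 * Resolve a fault by copying, hinting that the page is about to
	 * be accessed (so the kernel installs a young PTE).
	 */
	static int uffd_copy_hot(int uffd, void *dst, void *src,
				 unsigned long len)
	{
		struct uffdio_copy copy = {
			.dst = (unsigned long)dst,
			.src = (unsigned long)src,
			.len = len,
			.mode = UFFDIO_COPY_MODE_ACCESS_LIKELY,
		};

		return ioctl(uffd, UFFDIO_COPY, &copy);
	}

	int main(void)
	{
		struct uffdio_api api = {
			.api = UFFD_API,
			/* Ask for the access hints proposed here. */
			.features = UFFD_FEATURE_ACCESS_HINTS,
		};
		int uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

		if (uffd < 0 || ioctl(uffd, UFFDIO_API, &api) < 0) {
			perror("userfaultfd");
			return 1;
		}
		/* ... UFFDIO_REGISTER a range, read fault events, and
		 * call uffd_copy_hot() from the handler ... */
		close(uffd);
		return 0;
	}

A prefetching path would issue the same UFFDIO_COPY without the flag,
leaving the access-bit clear so reclaim can treat the page as cold.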
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
 fs/userfaultfd.c                 | 25 ++++++++++++++++++++-----
 include/linux/userfaultfd_k.h    |  1 +
 include/uapi/linux/userfaultfd.h | 20 +++++++++++++++++++-
 mm/userfaultfd.c                 | 16 ++++++++++++----
 4 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index a44e46f8249f..abf176bd0349 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1726,12 +1726,15 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	ret = -EINVAL;
 	if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src)
 		goto out;
-	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
+	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP|
+				 UFFDIO_COPY_MODE_ACCESS_LIKELY))
 		goto out;
 
 	mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;
 
 	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
 
+	if (uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY)
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
@@ -1783,9 +1786,13 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	if (ret)
 		goto out;
 	ret = -EINVAL;
-	if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
+	if (uffdio_zeropage.mode & ~(UFFDIO_ZEROPAGE_MODE_DONTWAKE|
+				     UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY))
 		goto out;
 
+	if (uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY)
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
 				     uffdio_zeropage.range.len,
@@ -1835,7 +1842,8 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 		return ret;
 
 	if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE |
-			       UFFDIO_WRITEPROTECT_MODE_WP))
+			       UFFDIO_WRITEPROTECT_MODE_WP |
+			       UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY))
 		return -EINVAL;
 
 	mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP;
@@ -1845,6 +1853,8 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 		return -EINVAL;
 
 	uffd_flags = mode_wp ? UFFD_FLAGS_WP : UFFD_FLAGS_NONE;
+	if (uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY)
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
@@ -1872,6 +1882,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 	struct uffdio_continue uffdio_continue;
 	struct uffdio_continue __user *user_uffdio_continue;
 	struct userfaultfd_wake_range range;
+	uffd_flags_t uffd_flags = UFFD_FLAGS_NONE;
 
 	user_uffdio_continue = (struct uffdio_continue __user *)arg;
@@ -1896,13 +1907,17 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 	    uffdio_continue.range.start) {
 		goto out;
 	}
-	if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
+	if (uffdio_continue.mode & ~(UFFDIO_CONTINUE_MODE_DONTWAKE|
+				     UFFDIO_CONTINUE_MODE_ACCESS_LIKELY))
 		goto out;
 
+	if (uffdio_continue.mode & UFFDIO_CONTINUE_MODE_ACCESS_LIKELY)
+		uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
 				     uffdio_continue.range.len,
-				     &ctx->mmap_changing, 0);
+				     &ctx->mmap_changing, uffd_flags);
 		mmput(ctx->mm);
 	} else {
 		return -ESRCH;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index d5b3dff48a87..af268b2c2b27 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -59,6 +59,7 @@ typedef unsigned int __bitwise uffd_flags_t;
 
 #define UFFD_FLAGS_NONE		((__force uffd_flags_t)0)
 #define UFFD_FLAGS_WP		((__force uffd_flags_t)BIT(0))
+#define UFFD_FLAGS_ACCESS_LIKELY	((__force uffd_flags_t)BIT(1))
 
 extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				    struct vm_area_struct *dst_vma,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 005e5e306266..ff7150c878bb 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -38,7 +38,8 @@
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
 			   UFFD_FEATURE_MINOR_SHMEM |		\
 			   UFFD_FEATURE_EXACT_ADDRESS |		\
-			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
+			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM |	\
+			   UFFD_FEATURE_ACCESS_HINTS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -203,6 +204,9 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
 	 * write-protection mode is supported on both shmem and hugetlbfs.
+	 *
+	 * UFFD_FEATURE_ACCESS_HINTS indicates that the ioctl operations
+	 * support the UFFDIO_*_MODE_ACCESS_LIKELY hints.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -217,6 +221,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 #define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<12)
+#define UFFD_FEATURE_ACCESS_HINTS		(1<<13)
 	__u64 features;
 
 	__u64 ioctls;
@@ -251,8 +256,14 @@ struct uffdio_copy {
 	 * the fly.  UFFDIO_COPY_MODE_WP is available only if the
 	 * write protected ioctl is implemented for the range
 	 * according to the uffdio_register.ioctls.
+	 *
+	 * UFFDIO_COPY_MODE_ACCESS_LIKELY provides a hint to the kernel that
+	 * the page is likely to be accessed in the near future. Providing
+	 * the hint properly can improve performance.
+	 *
 	 */
 #define UFFDIO_COPY_MODE_WP		((__u64)1<<1)
+#define UFFDIO_COPY_MODE_ACCESS_LIKELY	((__u64)1<<2)
 	__u64 mode;
 
 	/*
@@ -265,6 +276,7 @@ struct uffdio_copy {
 
 struct uffdio_zeropage {
 	struct uffdio_range range;
 #define UFFDIO_ZEROPAGE_MODE_DONTWAKE	((__u64)1<<0)
+#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((__u64)1<<1)
 	__u64 mode;
 
 	/*
@@ -284,6 +296,10 @@ struct uffdio_writeprotect {
 	 * UFFDIO_WRITEPROTECT_MODE_DONTWAKE: set the flag to avoid waking up
 	 * any wait thread after the operation succeeds.
 	 *
+	 * UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY provides a hint to the
+	 * kernel that the page is likely to be accessed in the near future.
+	 * Providing the hint properly can improve performance.
+	 *
 	 * NOTE: Write protecting a region (WP=1) is unrelated to page faults,
 	 * therefore DONTWAKE flag is meaningless with WP=1.  Removing write
 	 * protection (WP=0) in response to a page fault wakes the faulting
@@ -291,12 +307,14 @@ struct uffdio_writeprotect {
 	 */
 #define UFFDIO_WRITEPROTECT_MODE_WP		((__u64)1<<0)
 #define UFFDIO_WRITEPROTECT_MODE_DONTWAKE	((__u64)1<<1)
+#define UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY	((__u64)1<<2)
 	__u64 mode;
 };
 
 struct uffdio_continue {
 	struct uffdio_range range;
 #define UFFDIO_CONTINUE_MODE_DONTWAKE	((__u64)1<<0)
+#define UFFDIO_CONTINUE_MODE_ACCESS_LIKELY	((__u64)1<<1)
 	__u64 mode;
 
 	/*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 734de6aa0b8e..5051b9028722 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -92,6 +92,9 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	 */
 	_dst_pte = pte_wrprotect(_dst_pte);
 
+	if (uffd_flags & UFFD_FLAGS_ACCESS_LIKELY)
+		_dst_pte = pte_mkyoung(_dst_pte);
+
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
 
 	if (vma_is_shmem(dst_vma)) {
@@ -202,7 +205,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 			      pmd_t *dst_pmd,
 			      struct vm_area_struct *dst_vma,
-			      unsigned long dst_addr)
+			      unsigned long dst_addr,
+			      uffd_flags_t uffd_flags)
 {
 	pte_t _dst_pte, *dst_pte;
 	spinlock_t *ptl;
@@ -225,6 +229,10 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	ret = -EEXIST;
 	if (!pte_none(*dst_pte))
 		goto out_unlock;
+
+	if (uffd_flags & UFFD_FLAGS_ACCESS_LIKELY)
+		_dst_pte = pte_mkyoung(_dst_pte);
+
 	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(dst_vma, dst_addr, dst_pte);
@@ -498,7 +506,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 						uffd_flags);
 		else
 			err = mfill_zeropage_pte(dst_mm, dst_pmd,
-						 dst_vma, dst_addr);
+						 dst_vma, dst_addr, uffd_flags);
 	} else {
 		err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					     dst_addr, src_addr,
@@ -692,7 +700,7 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
 		       uffd_flags_t uffd_flags)
 {
 	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
-			      mmap_changing, 0);
+			      mmap_changing, uffd_flags);
 }
 
 ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
@@ -700,7 +708,7 @@ ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
 		       uffd_flags_t uffd_flags)
 {
 	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
-			      mmap_changing, 0);
+			      mmap_changing, uffd_flags);
 }
 
 int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,