From patchwork Sun Jun 19 23:34:46 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12887050
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, Mike Kravetz, Hugh Dickins, Andrew Morton, Axel Rasmussen, Peter Xu, David Hildenbrand, Mike Rapoport
Subject: [RFC PATCH v2 2/5] userfaultfd: introduce access-likely mode for copy/wp operations
Date: Sun, 19 Jun 2022 16:34:46 -0700
Message-Id: <20220619233449.181323-3-namit@vmware.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220619233449.181323-1-namit@vmware.com>
References: <20220619233449.181323-1-namit@vmware.com>

From: Nadav Amit

On x86, using a PTE whose access-bit (aka young-bit) is clear takes ~600 more cycles than using one whose access-bit is set. At the same time, setting the access-bit on memory that is not actually used (e.g., prefetched memory) can introduce even greater overheads, as that memory is then reclaimed later than it should be.

Userfaultfd currently does not set the access-bit (excluding the huge-pages case). Arguably, it is best to let the user control whether the access-bit is set. The expected use is to ask userfaultfd to set the access-bit when a copy/wp operation resolves a page-fault, and to leave it clear when the memory is merely prefetched.

Introduce UFFDIO_COPY_MODE_ACCESS_LIKELY and UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY to let userspace request that the young-bit be set.
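A minimal sketch of the intended userspace usage (uffd_copy_page and resolving_fault are illustrative names, not part of the patch; the uffd descriptor and the surrounding registration and fault-handling logic are assumed):

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/userfaultfd.h>

	/*
	 * Copy a page into a registered range. 'resolving_fault'
	 * distinguishes a real page-fault from an opportunistic prefetch,
	 * so the access-bit is only requested when the page is about to
	 * be used.
	 */
	static long uffd_copy_page(int uffd, unsigned long dst,
				   unsigned long src, unsigned long len,
				   int resolving_fault)
	{
		struct uffdio_copy copy;

		memset(&copy, 0, sizeof(copy));
		copy.dst = dst;
		copy.src = src;
		copy.len = len;
		copy.mode = resolving_fault ?
			    UFFDIO_COPY_MODE_ACCESS_LIKELY : 0;

		return ioctl(uffd, UFFDIO_COPY, &copy);
	}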
For UFFDIO_CONTINUE and UFFDIO_ZEROPAGE, set the bit unconditionally: the former is only used to resolve page-faults, and the latter gains nothing from leaving the access-bit clear.

Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
 fs/userfaultfd.c                 | 23 ++++++++++++++++-------
 include/linux/userfaultfd_k.h    |  1 +
 include/uapi/linux/userfaultfd.h | 20 +++++++++++++++++++-
 mm/userfaultfd.c                 | 22 ++++++++++++++++++----
 4 files changed, 54 insertions(+), 12 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 5daafa54eb3f..35a8c4347c54 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1700,7 +1700,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	struct uffdio_copy uffdio_copy;
 	struct uffdio_copy __user *user_uffdio_copy;
 	struct userfaultfd_wake_range range;
-	bool mode_wp;
+	bool mode_wp, mode_access_likely;
 	uffd_flags_t uffd_flags;
 
 	user_uffdio_copy = (struct uffdio_copy __user *) arg;
@@ -1726,12 +1726,15 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
 	ret = -EINVAL;
 	if (uffdio_copy.src + uffdio_copy.len <= uffdio_copy.src)
 		goto out;
-	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP))
+	if (uffdio_copy.mode & ~(UFFDIO_COPY_MODE_DONTWAKE|UFFDIO_COPY_MODE_WP|
+				 UFFDIO_COPY_MODE_ACCESS_LIKELY))
 		goto out;
 
 	mode_wp = uffdio_copy.mode & UFFDIO_COPY_MODE_WP;
+	mode_access_likely = uffdio_copy.mode & UFFDIO_COPY_MODE_ACCESS_LIKELY;
 
-	uffd_flags = mode_wp ? UFFD_FLAGS_WP : 0;
+	uffd_flags = (mode_wp ? UFFD_FLAGS_WP : 0) |
+		     (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0);
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
@@ -1816,7 +1819,7 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 	struct uffdio_writeprotect uffdio_wp;
 	struct uffdio_writeprotect __user *user_uffdio_wp;
 	struct userfaultfd_wake_range range;
-	bool mode_wp, mode_dontwake;
+	bool mode_wp, mode_dontwake, mode_access_likely;
 	uffd_flags_t uffd_flags;
 
 	if (atomic_read(&ctx->mmap_changing))
@@ -1834,16 +1837,19 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
 		return ret;
 
 	if (uffdio_wp.mode & ~(UFFDIO_WRITEPROTECT_MODE_DONTWAKE |
-			       UFFDIO_WRITEPROTECT_MODE_WP))
+			       UFFDIO_WRITEPROTECT_MODE_WP |
+			       UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY))
 		return -EINVAL;
 
 	mode_wp = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_WP;
 	mode_dontwake = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_DONTWAKE;
+	mode_access_likely = uffdio_wp.mode & UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY;
 
 	if (mode_wp && mode_dontwake)
 		return -EINVAL;
 
-	uffd_flags = (mode_wp ? UFFD_FLAGS_WP : 0);
+	uffd_flags = (mode_wp ? UFFD_FLAGS_WP : 0) |
+		     (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0);
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mwriteprotect_range(ctx->mm, uffdio_wp.range.start,
@@ -1871,6 +1877,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 	struct uffdio_continue uffdio_continue;
 	struct uffdio_continue __user *user_uffdio_continue;
 	struct userfaultfd_wake_range range;
+	uffd_flags_t uffd_flags;
 
 	user_uffdio_continue = (struct uffdio_continue __user *)arg;
@@ -1898,10 +1905,12 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
 	if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
 		goto out;
 
+	uffd_flags = UFFD_FLAGS_ACCESS_LIKELY;
+
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
 				     uffdio_continue.range.len,
-				     &ctx->mmap_changing, 0);
+				     &ctx->mmap_changing, uffd_flags);
 		mmput(ctx->mm);
 	} else {
 		return -ESRCH;

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 6331148023c1..e6ac165ec044 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -58,6 +58,7 @@ enum mcopy_atomic_mode {
 typedef unsigned int __bitwise uffd_flags_t;
 
 #define UFFD_FLAGS_WP			((__force uffd_flags_t)BIT(0))
+#define UFFD_FLAGS_ACCESS_LIKELY	((__force uffd_flags_t)BIT(1))
 
 extern int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 				    struct vm_area_struct *dst_vma,

diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 005e5e306266..d9c8ce9ba777 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -38,7 +38,8 @@
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
 			   UFFD_FEATURE_MINOR_SHMEM |		\
 			   UFFD_FEATURE_EXACT_ADDRESS |		\
-			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
+			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM |	\
+			   UFFD_FEATURE_ACCESS_HINTS)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -203,6 +204,10 @@ struct uffdio_api {
 	 *
 	 * UFFD_FEATURE_WP_HUGETLBFS_SHMEM indicates that userfaultfd
	 * write-protection mode is supported on both shmem and hugetlbfs.
+	 *
+	 * UFFD_FEATURE_ACCESS_HINTS indicates that UFFDIO_COPY supports
+	 * UFFDIO_COPY_MODE_ACCESS_LIKELY and UFFDIO_WRITEPROTECT supports
+	 * UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -217,6 +222,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
 #define UFFD_FEATURE_EXACT_ADDRESS		(1<<11)
 #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<12)
+#define UFFD_FEATURE_ACCESS_HINTS		(1<<13)
 	__u64 features;
 
 	__u64 ioctls;
@@ -260,6 +266,13 @@ struct uffdio_copy {
 	 * copy_from_user will not read the last 8 bytes.
 	 */
 	__s64 copy;
+	/*
+	 * UFFDIO_COPY_MODE_ACCESS_LIKELY will set the mapped page as young.
+	 * This can reduce the time that the first access to the page takes.
+	 * Yet, if set opportunistically on memory that is not used, it might
+	 * extend the time before the unused memory pages are reclaimed.
+	 */
+#define UFFDIO_COPY_MODE_ACCESS_LIKELY	((__u64)1<<3)
 };
 
 struct uffdio_zeropage {
@@ -284,6 +297,10 @@ struct uffdio_writeprotect {
 	 * UFFDIO_WRITEPROTECT_MODE_DONTWAKE: set the flag to avoid waking up
 	 * any wait thread after the operation succeeds.
 	 *
+	 * UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY: set the flag to mark the
+	 * modified memory as young, which can reduce the time that the first
+	 * access to the page takes.
+	 *
 	 * NOTE: Write protecting a region (WP=1) is unrelated to page faults,
 	 * therefore DONTWAKE flag is meaningless with WP=1.  Removing write
 	 * protection (WP=0) in response to a page fault wakes the faulting
@@ -291,6 +308,7 @@ struct uffdio_writeprotect {
 	 */
 #define UFFDIO_WRITEPROTECT_MODE_WP		((__u64)1<<0)
 #define UFFDIO_WRITEPROTECT_MODE_DONTWAKE	((__u64)1<<1)
+#define UFFDIO_WRITEPROTECT_MODE_ACCESS_LIKELY	((__u64)1<<2)
 	__u64 mode;
 };

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 734de6aa0b8e..140c8d3e946e 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -92,6 +92,9 @@ int mfill_atomic_install_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	 */
 		_dst_pte = pte_wrprotect(_dst_pte);
 
+	if (uffd_flags & UFFD_FLAGS_ACCESS_LIKELY)
+		_dst_pte = pte_mkyoung(_dst_pte);
+
 	dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
 
 	if (vma_is_shmem(dst_vma)) {
@@ -202,7 +205,8 @@ static int mcopy_atomic_pte(struct mm_struct *dst_mm,
 static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 			      pmd_t *dst_pmd,
 			      struct vm_area_struct *dst_vma,
-			      unsigned long dst_addr)
+			      unsigned long dst_addr,
+			      uffd_flags_t uffd_flags)
 {
 	pte_t _dst_pte, *dst_pte;
 	spinlock_t *ptl;
@@ -225,6 +229,10 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	ret = -EEXIST;
 	if (!pte_none(*dst_pte))
 		goto out_unlock;
+
+	if (uffd_flags & UFFD_FLAGS_ACCESS_LIKELY)
+		_dst_pte = pte_mkyoung(_dst_pte);
+
 	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache(dst_vma, dst_addr, dst_pte);
@@ -498,7 +506,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 						 uffd_flags);
 		else
 			err = mfill_zeropage_pte(dst_mm, dst_pmd,
-						 dst_vma, dst_addr);
+						 dst_vma, dst_addr, uffd_flags);
 	} else {
 		err = shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					     dst_addr, src_addr,
@@ -691,6 +699,9 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
 		       unsigned long len, atomic_t *mmap_changing,
 		       uffd_flags_t uffd_flags)
 {
+	/* There is no cost for setting the access bit of a zeropage */
+	uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+
 	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE,
-			      mmap_changing, 0);
+			      mmap_changing, uffd_flags);
 }
@@ -699,6 +710,9 @@ ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start,
 		      unsigned long len, atomic_t *mmap_changing,
 		      uffd_flags_t uffd_flags)
 {
+	/* The page is likely to be accessed */
+	uffd_flags |= UFFD_FLAGS_ACCESS_LIKELY;
+
 	return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE,
-			      mmap_changing, 0);
+			      mmap_changing, uffd_flags);
 }
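Since the capability is advertised through UFFD_FEATURE_ACCESS_HINTS, userspace can probe for it with the usual UFFDIO_API handshake before passing the new mode bits. A minimal sketch, assuming uffd comes from a userfaultfd(2) call and no other features need to be requested:

	#include <stdbool.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/userfaultfd.h>

	static bool uffd_supports_access_hints(int uffd)
	{
		struct uffdio_api api;

		memset(&api, 0, sizeof(api));
		api.api = UFFD_API;

		/* The handshake reports the supported features back */
		if (ioctl(uffd, UFFDIO_API, &api) == -1)
			return false;

		return api.features & UFFD_FEATURE_ACCESS_HINTS;
	}

On kernels without the feature, the new mode bits must not be set: both UFFDIO_COPY and UFFDIO_WRITEPROTECT reject unknown mode flags with -EINVAL.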