From patchwork Sun Jun 19 23:34:48 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nadav Amit
X-Patchwork-Id: 12887052
From: Nadav Amit
To: linux-mm@kvack.org
Cc: Nadav Amit, David Hildenbrand, Mike Kravetz, Hugh Dickins,
 Andrew Morton, Axel Rasmussen, Peter Xu, Mike Rapoport
Subject: [RFC PATCH v2 4/5] userfaultfd: zero access/write hints
Date: Sun, 19 Jun 2022 16:34:48 -0700
Message-Id: <20220619233449.181323-5-namit@vmware.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220619233449.181323-1-namit@vmware.com>
References: <20220619233449.181323-1-namit@vmware.com>

From: Nadav Amit

When userfaultfd provides a zero page in response to the UFFDIO_ZEROPAGE
ioctl, it maps a read-only alias of the zero page. If the page is later
written (which is the likely scenario), a page fault occurs, and the
page-fault handler allocates a page and rewires the page tables.

This flow is expensive when the page is likely to be written to. Users
can use the copy ioctl to initialize a zeroed page (by copying zeros),
but this is also wasteful.

Allow userfaultfd users to efficiently map zero-initialized pages that
are writable. Introduce UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY, which, when
provided, maps a cleared page instead of an alias of the zero page.

For consistency, also introduce UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY.

Suggested-by: David Hildenbrand
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Andrew Morton
Cc: Axel Rasmussen
Cc: Peter Xu
Cc: Mike Rapoport
Signed-off-by: Nadav Amit
---
 fs/userfaultfd.c                 | 14 +++++++++++--
 include/uapi/linux/userfaultfd.h |  2 ++
 mm/userfaultfd.c                 | 36 ++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index a56983b594d5..ff073de78ea8 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1770,6 +1770,8 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	struct uffdio_zeropage uffdio_zeropage;
 	struct uffdio_zeropage __user *user_uffdio_zeropage;
 	struct userfaultfd_wake_range range;
+	bool mode_dontwake, mode_access_likely, mode_write_likely;
+	uffd_flags_t uffd_flags;
 
 	user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg;
 
@@ -1788,8 +1790,16 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
 	if (ret)
 		goto out;
 	ret = -EINVAL;
-	if (uffdio_zeropage.mode & ~UFFDIO_ZEROPAGE_MODE_DONTWAKE)
-		goto out;
+
+	mode_dontwake = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_DONTWAKE;
+	mode_access_likely = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY;
+	mode_write_likely = uffdio_zeropage.mode & UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY;
+
+	if (mode_dontwake)
+		return -EINVAL;
+
+	uffd_flags = (mode_access_likely ? UFFD_FLAGS_ACCESS_LIKELY : 0) |
+		     (mode_write_likely ? UFFD_FLAGS_WRITE_LIKELY : 0);
 
 	if (mmget_not_zero(ctx->mm)) {
 		ret = mfill_zeropage(ctx->mm, uffdio_zeropage.range.start,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 6ad93a13282e..b586b7c1e265 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -286,6 +286,8 @@ struct uffdio_copy {
 struct uffdio_zeropage {
 	struct uffdio_range range;
 #define UFFDIO_ZEROPAGE_MODE_DONTWAKE		((__u64)1<<0)
+#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((__u64)1<<2)
+#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY	((__u64)1<<3)
 	__u64 mode;
 
 	/*
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 3172158d8faa..5dfbb1e80369 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -249,6 +249,38 @@ static int mfill_zeropage_pte(struct mm_struct *dst_mm,
 	return ret;
 }
 
+static int mfill_clearpage_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+			       struct vm_area_struct *dst_vma,
+			       unsigned long dst_addr,
+			       uffd_flags_t uffd_flags)
+{
+	struct page *page;
+	int ret;
+
+	ret = -ENOMEM;
+	page = alloc_zeroed_user_highpage_movable(dst_vma, dst_addr);
+	if (!page)
+		goto out;
+
+	/* The PTE is not marked as dirty unconditionally */
+	SetPageDirty(page);
+	__SetPageUptodate(page);
+
+	ret = -ENOMEM;
+	if (mem_cgroup_charge(page_folio(page), dst_vma->vm_mm, GFP_KERNEL))
+		goto out_release;
+
+	ret = mfill_atomic_install_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
+				       page, true, uffd_flags);
+	if (ret)
+		goto out_release;
+out:
+	return ret;
+out_release:
+	put_page(page);
+	goto out;
+}
+
 /* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
 static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
 				pmd_t *dst_pmd,
@@ -511,6 +543,10 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 			err = mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					       dst_addr, src_addr, page,
 					       uffd_flags);
+		else if (!(uffd_flags & UFFD_FLAGS_WP) &&
+			 (uffd_flags & UFFD_FLAGS_WRITE_LIKELY))
+			err = mfill_clearpage_pte(dst_mm, dst_pmd, dst_vma,
+						  dst_addr, uffd_flags);
 		else
 			err = mfill_zeropage_pte(dst_mm, dst_pmd,
 						 dst_vma, dst_addr, uffd_flags);
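
For reviewers who want to reason about the new dispatch without building a kernel, the hint handling can be approximated by a small userspace model. This is only a sketch: the UFFDIO_ZEROPAGE_MODE_* values are copied from the uapi hunk of this patch, while the MODEL_* names, their numeric values, and both helper functions are hypothetical stand-ins for the kernel-internal uffd_flags_t bits and for the branch added to mfill_atomic_pte(). It does not issue any ioctl.

```c
#include <assert.h>
#include <stdint.h>

/* Mode bits as defined in the uapi hunk of this patch. */
#ifndef UFFDIO_ZEROPAGE_MODE_DONTWAKE
#define UFFDIO_ZEROPAGE_MODE_DONTWAKE		((uint64_t)1 << 0)
#endif
#ifndef UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY
#define UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY	((uint64_t)1 << 2)
#endif
#ifndef UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY
#define UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY	((uint64_t)1 << 3)
#endif

/*
 * Hypothetical stand-ins for the kernel-internal uffd_flags_t bits.
 * The numeric values are assumptions made for this sketch only.
 */
#define MODEL_FLAGS_WP			(1u << 0)
#define MODEL_FLAGS_ACCESS_LIKELY	(1u << 1)
#define MODEL_FLAGS_WRITE_LIKELY	(1u << 2)
#define MODEL_FLAGS_ALL			(MODEL_FLAGS_WP | \
					 MODEL_FLAGS_ACCESS_LIKELY | \
					 MODEL_FLAGS_WRITE_LIKELY)

/* Which kind of page the model would install for a zeropage request. */
enum model_fill {
	MODEL_FILL_ZEROPAGE_ALIAS,	/* read-only alias of the zero page */
	MODEL_FILL_CLEAR_PAGE,		/* freshly allocated, cleared page */
};

/* Translate the user-visible mode into hint flags; DONTWAKE is not a hint. */
static unsigned int model_mode_to_flags(uint64_t mode)
{
	unsigned int flags = 0;

	if (mode & UFFDIO_ZEROPAGE_MODE_ACCESS_LIKELY)
		flags |= MODEL_FLAGS_ACCESS_LIKELY;
	if (mode & UFFDIO_ZEROPAGE_MODE_WRITE_LIKELY)
		flags |= MODEL_FLAGS_WRITE_LIKELY;
	return flags;
}

/* Mirror the condition added to mfill_atomic_pte() for the zeropage case. */
static enum model_fill model_pick_fill(unsigned int flags)
{
	/* Reject bits the model does not know about. */
	assert((flags & ~MODEL_FLAGS_ALL) == 0);

	if (!(flags & MODEL_FLAGS_WP) && (flags & MODEL_FLAGS_WRITE_LIKELY))
		return MODEL_FILL_CLEAR_PAGE;
	return MODEL_FILL_ZEROPAGE_ALIAS;
}
```

Calling model_pick_fill(model_mode_to_flags(mode)) from a test harness shows that, in this model, only a write-likely request without write protection selects the cleared page, and every other combination falls back to the zero-page alias.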