From patchwork Mon Aug 8 17:56:10 2022
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12938953
Date: Mon, 8 Aug 2022 10:56:10 -0700
In-Reply-To: <20220808175614.3885028-1-axelrasmussen@google.com>
Message-Id: <20220808175614.3885028-2-axelrasmussen@google.com>
References: <20220808175614.3885028-1-axelrasmussen@google.com>
Subject: [PATCH v5 1/5] selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu,
 Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, Shuah Khan

This not being included was just a simple oversight. There are certain
features (like minor fault support) which are only enabled on shared
mappings, so without including hugetlb_shared we actually lose a
significant amount of test coverage.

Reviewed-by: Shuah Khan
Reviewed-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/run_vmtests.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index de86983b8a0f..b8e7f6f38d64 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -121,9 +121,11 @@ run_test ./gup_test -a
 run_test ./gup_test -ct -F 0x1 0 19 0x1000
 
 run_test ./userfaultfd anon 20 16
-# Test requires source and destination huge pages. Size of source
-# (half_ufd_size_MB) is passed as argument to test.
+# Hugetlb tests require source and destination huge pages. Pass in half the
+# size ($half_ufd_size_MB), which is used for *each*.
 run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
+run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+rm -f "$mnt"/uffd-test
 run_test ./userfaultfd shmem 20 16
 
 #cleanup

From patchwork Mon Aug 8 17:56:11 2022
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12938954
Date: Mon, 8 Aug 2022 10:56:11 -0700
In-Reply-To: <20220808175614.3885028-1-axelrasmussen@google.com>
Message-Id: <20220808175614.3885028-3-axelrasmussen@google.com>
References: <20220808175614.3885028-1-axelrasmussen@google.com>
Subject: [PATCH v5 2/5] userfaultfd: add /dev/userfaultfd for fine grained access control
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu,
 Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org

Historically, it has been shown that intercepting kernel faults with
userfaultfd (thereby forcing the kernel to wait for an arbitrary amount
of time) can be exploited, or at least can make some kinds of exploits
easier. So, in 37cd0575b8 "userfaultfd: add UFFD_USER_MODE_ONLY" we
changed things so, in order for kernel faults to be handled by
userfaultfd, either the process needs CAP_SYS_PTRACE, or the
vm.unprivileged_userfaultfd sysctl must be configured so that any
unprivileged user can do it.

In a typical implementation of a hypervisor with live migration (take
QEMU/KVM as one such example), we do indeed need to be able to handle
kernel faults. But, both options above are less than ideal:

- Toggling the sysctl increases attack surface by allowing any
  unprivileged user to do it.

- Granting the live migration process CAP_SYS_PTRACE gives it this
  ability, but *also* the ability to "observe and control the
  execution of another process [...], and examine and change [its]
  memory and registers" (from ptrace(2)). This isn't something we need
  or want to be able to do, so granting this permission violates the
  "principle of least privilege".

This is all a long-winded way to say: we want a more fine-grained way to
grant access to userfaultfd, without granting other additional
permissions at the same time.

To achieve this, add a /dev/userfaultfd misc device. This device
provides an alternative to the userfaultfd(2) syscall for the creation
of new userfaultfds.
The idea is, any userfaultfds created this way will be able to handle
kernel faults, without the caller having any special capabilities.
Access to this mechanism is instead restricted using e.g. standard
filesystem permissions.

Acked-by: Nadav Amit
Acked-by: Peter Xu
Signed-off-by: Axel Rasmussen
Acked-by: Mike Rapoport
---
 fs/userfaultfd.c                 | 73 +++++++++++++++++++++++++-------
 include/uapi/linux/userfaultfd.h |  4 ++
 2 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1c44bf75f916..698e768d5c3d 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 
 int sysctl_unprivileged_userfaultfd __read_mostly;
 
@@ -415,13 +416,8 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
-	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
-	    ctx->flags & UFFD_USER_MODE_ONLY) {
-		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
-			"sysctl knob to 1 if kernel faults must be handled "
-			"without obtaining CAP_SYS_PTRACE capability\n");
+	if (!(vmf->flags & FAULT_FLAG_USER) && (ctx->flags & UFFD_USER_MODE_ONLY))
 		goto out;
-	}
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -2052,20 +2048,11 @@ static void init_once_userfaultfd_ctx(void *mem)
 	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
-SYSCALL_DEFINE1(userfaultfd, int, flags)
+static int new_userfaultfd(int flags)
 {
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd &&
-	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
-	    !capable(CAP_SYS_PTRACE)) {
-		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
-			"sysctl knob to 1 if kernel faults must be handled "
-			"without obtaining CAP_SYS_PTRACE capability\n");
-		return -EPERM;
-	}
-
 	BUG_ON(!current->mm);
 
 	/* Check the UFFD_* constants for consistency. */
@@ -2098,8 +2085,62 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	return fd;
 }
 
+static inline bool userfaultfd_syscall_allowed(int flags)
+{
+	/* Userspace-only page faults are always allowed */
+	if (flags & UFFD_USER_MODE_ONLY)
+		return true;
+
+	/*
+	 * The user is requesting a userfaultfd which can handle kernel faults.
+	 * Privileged users are always allowed to do this.
+	 */
+	if (capable(CAP_SYS_PTRACE))
+		return true;
+
+	/* Otherwise, access to kernel fault handling is sysctl controlled. */
+	return sysctl_unprivileged_userfaultfd;
+}
+
+SYSCALL_DEFINE1(userfaultfd, int, flags)
+{
+	if (!userfaultfd_syscall_allowed(flags))
+		return -EPERM;
+
+	return new_userfaultfd(flags);
+}
+
+static int userfaultfd_dev_open(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+static long userfaultfd_dev_ioctl(struct file *file, unsigned int cmd, unsigned long flags)
+{
+	if (cmd != USERFAULTFD_IOC_NEW)
+		return -EINVAL;
+
+	return new_userfaultfd(flags);
+}
+
+static const struct file_operations userfaultfd_dev_fops = {
+	.open = userfaultfd_dev_open,
+	.unlocked_ioctl = userfaultfd_dev_ioctl,
+	.compat_ioctl = userfaultfd_dev_ioctl,
+	.owner = THIS_MODULE,
+	.llseek = noop_llseek,
+};
+
+static struct miscdevice userfaultfd_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "userfaultfd",
+	.fops = &userfaultfd_dev_fops
+};
+
 static int __init userfaultfd_init(void)
 {
+	WARN_ON(misc_register(&userfaultfd_misc));
+
 	userfaultfd_ctx_cachep = kmem_cache_create("userfaultfd_ctx_cache",
 						sizeof(struct userfaultfd_ctx),
 						0,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 7d32b1e797fb..005e5e306266 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -12,6 +12,10 @@
 
 #include
 
+/* ioctls for /dev/userfaultfd */
+#define USERFAULTFD_IOC 0xAA
+#define USERFAULTFD_IOC_NEW _IO(USERFAULTFD_IOC, 0x00)
+
 /*
  * If the UFFDIO_API is upgraded someday, the UFFDIO_UNREGISTER and
  * UFFDIO_WAKE ioctls should be defined as _IOW and not as _IOR. In

From patchwork Mon Aug 8 17:56:12 2022
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12938955
Date: Mon, 8 Aug 2022 10:56:12 -0700
In-Reply-To: <20220808175614.3885028-1-axelrasmussen@google.com>
Message-Id: <20220808175614.3885028-4-axelrasmussen@google.com>
References: <20220808175614.3885028-1-axelrasmussen@google.com>
Subject: [PATCH v5 3/5] userfaultfd: selftests: modify selftest to use /dev/userfaultfd
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu,
 Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org

We clearly want to ensure both userfaultfd(2) and /dev/userfaultfd keep
working into the future, so just run the test twice, using each
interface.

Instead of always testing both userfaultfd(2) and /dev/userfaultfd, let
the user choose which to test.

As with other test features, change the behavior based on a new
command line flag. Introduce the idea of "test mods", which are generic
(not specific to a test type) modifications to the behavior of the test.
This is sort of borrowed from this RFC patch series [1], but simplified
a bit.

The benefit is, in "typical" configurations this test is somewhat slow
(say, 30sec or something). Testing both clearly doubles it, so it may
not always be desirable, as users are likely to use one or the other,
but never both, in the "real world".
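
As an illustration only, the "test mods" syntax can be sketched outside
the selftest. The first ':'-separated token names the test type, and
each later token is a mod. The names struct test_spec and parse_spec()
are hypothetical and do not appear in the patch; only the mod names
"dev" and "syscall" come from it. The selftest itself splits a
strdup()'d copy with strsep(); this sketch walks the string with
strcspn() instead, to avoid allocating.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type; not part of the selftest. */
struct test_spec {
	char type[32];	/* first token, e.g. "anon" */
	bool use_dev;	/* true if the last interface mod given was "dev" */
};

/*
 * Parse "<type>[:mod[:mod...]]".
 * Returns 0 on success, -1 on an empty or overlong type, or an unknown mod.
 */
static int parse_spec(const char *raw, struct test_spec *out)
{
	size_t len = strcspn(raw, ":");

	if (len == 0 || len >= sizeof(out->type))
		return -1;
	memcpy(out->type, raw, len);
	out->type[len] = '\0';
	out->use_dev = false;	/* the syscall interface is the default */
	raw += len;

	/* Each remaining ':'-prefixed token is a mod; later mods override. */
	while (*raw == ':') {
		raw++;
		len = strcspn(raw, ":");
		if (len == 3 && !strncmp(raw, "dev", 3))
			out->use_dev = true;
		else if (len == 7 && !strncmp(raw, "syscall", 7))
			out->use_dev = false;
		else
			return -1;
		raw += len;
	}
	return 0;
}
```

With this sketch, "anon:dev" yields type "anon" with use_dev set, and a
bare "anon" keeps the syscall default, which mirrors the
"./userfaultfd anon:dev 100 99999" invocation added to the usage text.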
[1]: https://patchwork.kernel.org/project/linux-mm/patch/20201129004548.1619714-14-namit@vmware.com/ Acked-by: Peter Xu Signed-off-by: Axel Rasmussen Acked-by: Mike Rapoport --- tools/testing/selftests/vm/userfaultfd.c | 69 ++++++++++++++++++++---- 1 file changed, 60 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 7c3f1b0ab468..cae72867c173 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -77,6 +77,11 @@ static int bounces; #define TEST_SHMEM 3 static int test_type; +#define UFFD_FLAGS (O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY) + +/* test using /dev/userfaultfd, instead of userfaultfd(2) */ +static bool test_dev_userfaultfd; + /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */ #define ALARM_INTERVAL_SECS 10 static volatile bool test_uffdio_copy_eexist = true; @@ -125,6 +130,8 @@ struct uffd_stats { const char *examples = "# Run anonymous memory test on 100MiB region with 99999 bounces:\n" "./userfaultfd anon 100 99999\n\n" + "# Run the same anonymous memory test, but using /dev/userfaultfd:\n" + "./userfaultfd anon:dev 100 99999\n\n" "# Run share memory test on 1GiB region with 99 bounces:\n" "./userfaultfd shmem 1000 99\n\n" "# Run hugetlb memory test on 256MiB region with 50 bounces:\n" @@ -141,6 +148,14 @@ static void usage(void) "[hugetlbfs_file]\n\n"); fprintf(stderr, "Supported : anon, hugetlb, " "hugetlb_shared, shmem\n\n"); + fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. 
" + "Supported mods:\n"); + fprintf(stderr, "\tsyscall - Use userfaultfd(2) (default)\n"); + fprintf(stderr, "\tdev - Use /dev/userfaultfd instead of userfaultfd(2)\n"); + fprintf(stderr, "\nExample test mod usage:\n"); + fprintf(stderr, "# Run anonymous memory test with /dev/userfaultfd:\n"); + fprintf(stderr, "./userfaultfd anon:dev 100 99999\n\n"); + fprintf(stderr, "Examples:\n\n"); fprintf(stderr, "%s", examples); exit(1); @@ -154,12 +169,14 @@ static void usage(void) ret, __LINE__); \ } while (0) -#define err(fmt, ...) \ +#define errexit(exitcode, fmt, ...) \ do { \ _err(fmt, ##__VA_ARGS__); \ - exit(1); \ + exit(exitcode); \ } while (0) +#define err(fmt, ...) errexit(1, fmt, ##__VA_ARGS__) + static void uffd_stats_reset(struct uffd_stats *uffd_stats, unsigned long n_cpus) { @@ -383,13 +400,29 @@ static void assert_expected_ioctls_present(uint64_t mode, uint64_t ioctls) } } +static int __userfaultfd_open_dev(void) +{ + int fd, _uffd = -1; + + fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC); + if (fd < 0) + return -1; + + _uffd = ioctl(fd, USERFAULTFD_IOC_NEW, UFFD_FLAGS); + close(fd); + return _uffd; +} + static void userfaultfd_open(uint64_t *features) { struct uffdio_api uffdio_api; - uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY); + if (test_dev_userfaultfd) + uffd = __userfaultfd_open_dev(); + else + uffd = syscall(__NR_userfaultfd, UFFD_FLAGS); if (uffd < 0) - err("userfaultfd syscall not available in this kernel"); + errexit(KSFT_SKIP, "creating userfaultfd failed"); uffd_flags = fcntl(uffd, F_GETFD, NULL); uffdio_api.api = UFFD_API; @@ -1584,8 +1617,6 @@ unsigned long default_huge_page_size(void) static void set_test_type(const char *type) { - uint64_t features = UFFD_API_FEATURES; - if (!strcmp(type, "anon")) { test_type = TEST_ANON; uffd_test_ops = &anon_uffd_test_ops; @@ -1603,9 +1634,29 @@ static void set_test_type(const char *type) test_type = TEST_SHMEM; uffd_test_ops = &shmem_uffd_test_ops; test_uffdio_minor = 
true; - } else { - err("Unknown test type: %s", type); } +} + +static void parse_test_type_arg(const char *raw_type) +{ + char *buf = strdup(raw_type); + uint64_t features = UFFD_API_FEATURES; + + while (buf) { + const char *token = strsep(&buf, ":"); + + if (!test_type) + set_test_type(token); + else if (!strcmp(token, "dev")) + test_dev_userfaultfd = true; + else if (!strcmp(token, "syscall")) + test_dev_userfaultfd = false; + else + err("unrecognized test mod '%s'", token); + } + + if (!test_type) + err("failed to parse test type argument: '%s'", raw_type); if (test_type == TEST_HUGETLB) page_size = default_huge_page_size(); @@ -1653,7 +1704,7 @@ int main(int argc, char **argv) err("failed to arm SIGALRM"); alarm(ALARM_INTERVAL_SECS); - set_test_type(argv[1]); + parse_test_type_arg(argv[1]); nr_cpus = sysconf(_SC_NPROCESSORS_ONLN); nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size / From patchwork Mon Aug 8 17:56:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12938956 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id AF3B6C25B08 for ; Mon, 8 Aug 2022 17:56:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4ACBD8E0003; Mon, 8 Aug 2022 13:56:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 45C7E6B0075; Mon, 8 Aug 2022 13:56:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3240F8E0003; Mon, 8 Aug 2022 13:56:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 23F806B0074 for ; Mon, 8 Aug 2022 13:56:32 -0400 (EDT) Received: from smtpin04.hostedemail.com 
(a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id DB28A1A0759 for ; Mon, 8 Aug 2022 17:56:31 +0000 (UTC) X-FDA: 79777180182.04.1593C0C Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf18.hostedemail.com (Postfix) with ESMTP id 596EE1C015D for ; Mon, 8 Aug 2022 17:56:31 +0000 (UTC) Received: by mail-yb1-f202.google.com with SMTP id k16-20020a252410000000b006718984ef63so7937632ybk.3 for ; Mon, 08 Aug 2022 10:56:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:from:to:cc; bh=yr7nZVaG6KnPFzU0GkO8LDPXim3Y8RycylsE6e+KYS8=; b=XTaOh2Emx3RzjrAPUqBKn7IkMuMVZE1sNV0UiYrdi4ST+a2W2wVmMRWlnckX47kh8Z 3pwWPZdwO8GQdIoktR8SRLEK3dSAejavjjUTWBsMyR+Lp5qe4E/lj1k5E4pudhXvTlU1 vWnmtqMlJJxhIo/oHFLmarz9DbuZqb7BY/Hq5+m69WpRoZFozOBzbmm16L/QQMNwNYDw /zjQVMRUVK3zklQ5u/E8z49CYBhRyjzy5hwdW98+CYPELiPv45DHhqY4YhFMqJv/CvHY fo3mzTiobWlSpgKtmrAIQxHOldOvv7ws6xTPxVwYC1JWlOe8DXo3TZGLvcyRrvwKcC6H hdPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:references:mime-version:message-id:in-reply-to :date:x-gm-message-state:from:to:cc; bh=yr7nZVaG6KnPFzU0GkO8LDPXim3Y8RycylsE6e+KYS8=; b=nhXz9wddAmIjPSzluU/gLXC1pITwI3sd2Ne3bXkKYK9E/GttgYKSpwWnWPqwLlYBIk Ev1jhgmgyqVr9GJNyyHNUlxBUQlDuzO5wx7w6dpOCMcHco7wNOS0dY8ifokQVJ6NOrcR /jSPBTUmZ1OX5JK0o63WKCppW5gRNb+DUjAf281vaJU4LG2qZTYW0eh1I2D/MwauZqcK 4Mh1KjphYADaW/GgDTqy3OHqeBEUjp/ZYPlg8zlQvPj8JGexJ6GIB5bUrba65V/gOZOh XsjFRxKd8XMkXRBFuXOJJZnZbB6pRtxMEub/fq7bORQ1DesQ6TGed/6JKbmBzYU0P6AP Ygig== X-Gm-Message-State: ACgBeo2LuNCV33YfuMeE3JQgbj70YNmnzY8mt6ybpKQh/ZxQIkB9RXSv 0B62CqcgReMptoU5hTMTovsduW84FA7TH42msj7n X-Google-Smtp-Source: AA6agR5OFChi2bWVcsCQV2S7ZdR7lgDMmRJT1bj1DNJoylej7bIRyUqOMr7NzN7+osLysSkQ7f1ZEMtCfU38FTFoNzeM X-Received: from ajr0.svl.corp.google.com ([2620:15c:2d4:203:7a2a:3bb5:f3a0:3bbc]) 
Date: Mon, 8 Aug 2022 10:56:13 -0700
In-Reply-To: <20220808175614.3885028-1-axelrasmussen@google.com>
Message-Id: <20220808175614.3885028-5-axelrasmussen@google.com>
References: <20220808175614.3885028-1-axelrasmussen@google.com>
Subject: [PATCH v5 4/5] userfaultfd: update documentation to describe /dev/userfaultfd
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V. Levin", Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet, Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, linux-security-module@vger.kernel.org

Explain the different ways to create a new userfaultfd, and how access
control works for each way.

Acked-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 Documentation/admin-guide/mm/userfaultfd.rst | 41 ++++++++++++++++++--
 Documentation/admin-guide/sysctl/vm.rst      |  3 ++
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 6528036093e1..a76c9dc1865b 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -17,7 +17,10 @@ of the ``PROT_NONE+SIGSEGV`` trick.
 Design
 ======
 
-Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
+Userspace creates a new userfaultfd, initializes it, and registers one or more
+regions of virtual memory with it. Then, any page faults which occur within the
+region(s) result in a message being delivered to the userfaultfd, notifying
+userspace of the fault.
 
 The ``userfaultfd`` (aside from registering and unregistering virtual
 memory ranges) provides two primary functionalities:
@@ -34,12 +37,11 @@ The real advantage of userfaults if compared to regular virtual memory
 management of mremap/mprotect is that the userfaults in all their
 operations never involve heavyweight structures like vmas (in fact the
 ``userfaultfd`` runtime load never takes the mmap_lock for writing).
-
 Vmas are not suitable for page- (or hugepage) granular fault tracking
 when dealing with virtual address spaces that could span
 Terabytes. Too many vmas would be needed for that.
 
-The ``userfaultfd`` once opened by invoking the syscall, can also be
+The ``userfaultfd``, once created, can also be
 passed using unix domain sockets to a manager process, so the same
 manager process could handle the userfaults of a multitude of
 different processes without them being aware about what is going on
@@ -50,6 +52,39 @@ is a corner case that would currently return ``-EBUSY``).
 API
 ===
 
+Creating a userfaultfd
+----------------------
+
+There are two ways to create a new userfaultfd, each of which provide ways to
+restrict access to this functionality (since historically userfaultfds which
+handle kernel page faults have been a useful tool for exploiting the kernel).
+
+The first way, supported since userfaultfd was introduced, is the
+userfaultfd(2) syscall. Access to this is controlled in several ways:
+
+- Any user can always create a userfaultfd which traps userspace page faults
+  only. Such a userfaultfd can be created using the userfaultfd(2) syscall
+  with the flag UFFD_USER_MODE_ONLY.
+
+- In order to also trap kernel page faults for the address space, then either
+  the process needs the CAP_SYS_PTRACE capability, or the system must have
+  vm.unprivileged_userfaultfd set to 1. By default, vm.unprivileged_userfaultfd
+  is set to 0.
+
+The second way, added to the kernel more recently, is by opening and issuing a
+USERFAULTFD_IOC_NEW ioctl to /dev/userfaultfd. This method yields equivalent
+userfaultfds to the userfaultfd(2) syscall.
+
+Unlike userfaultfd(2), access to /dev/userfaultfd is controlled via normal
+filesystem permissions (user/group/mode), which gives fine grained access to
+userfaultfd specifically, without also granting other unrelated privileges at
+the same time (as e.g. granting CAP_SYS_PTRACE would do). Users who have access
+to /dev/userfaultfd can always create userfaultfds that trap kernel page faults;
+vm.unprivileged_userfaultfd is not considered.
+
+Initializing a userfaultfd
+--------------------------
+
 When first opened the ``userfaultfd`` must be enabled invoking the
 ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
 a later API version) which will specify the ``read/POLLIN`` protocol
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index f74f722ad702..b3e40b42e1b3 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -927,6 +927,9 @@ calls without any restrictions.
 
 The default value is 0.
 
+Another way to control permissions for userfaultfd is to use
+/dev/userfaultfd instead of userfaultfd(2). See
+Documentation/admin-guide/mm/userfaultfd.rst.
 user_reserve_kbytes
 ===================

From patchwork Mon Aug 8 17:56:14 2022
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12938957
Date: Mon, 8 Aug 2022 10:56:14 -0700
In-Reply-To: <20220808175614.3885028-1-axelrasmussen@google.com>
Message-Id: <20220808175614.3885028-6-axelrasmussen@google.com>
References: <20220808175614.3885028-1-axelrasmussen@google.com>
Subject: [PATCH v5 5/5] selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V. Levin", Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet, Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu, Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-mm@kvack.org, linux-security-module@vger.kernel.org, Shuah Khan

This new mode was recently added to the userfaultfd selftest. We want to
exercise both userfaultfd(2) as well as /dev/userfaultfd, so add both test
cases to the script.

Reviewed-by: Shuah Khan
Acked-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/run_vmtests.sh | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index b8e7f6f38d64..e780e76c26b8 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -120,13 +120,16 @@ run_test ./gup_test -a
 # Dump pages 0, 19, and 4096, using pin_user_pages:
 run_test ./gup_test -ct -F 0x1 0 19 0x1000
 
-run_test ./userfaultfd anon 20 16
-# Hugetlb tests require source and destination huge pages. Pass in half the
-# size ($half_ufd_size_MB), which is used for *each*.
-run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
-run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
-rm -f "$mnt"/uffd-test
-run_test ./userfaultfd shmem 20 16
+uffd_mods=("" ":dev")
+for mod in "${uffd_mods[@]}"; do
+	run_test ./userfaultfd anon${mod} 20 16
+	# Hugetlb tests require source and destination huge pages. Pass in half
+	# the size ($half_ufd_size_MB), which is used for *each*.
+	run_test ./userfaultfd hugetlb${mod} "$half_ufd_size_MB" 32
+	run_test ./userfaultfd hugetlb_shared${mod} "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+	rm -f "$mnt"/uffd-test
+	run_test ./userfaultfd shmem${mod} 20 16
+done
 
 #cleanup
 umount "$mnt"
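
For readers following the series, the `type[:mod]` argument that the loop above passes to the selftest (for example `hugetlb_shared:dev`) can be illustrated with a small standalone sketch. This is not the selftest code: `struct uffd_test_arg` and `parse_uffd_test_arg()` are hypothetical names introduced here, the result is returned in a struct rather than stored in the selftest's globals, and the test type is not checked against the list of known types. Only the `dev` and `syscall` modifier names and the use of `strsep()` on `:` are taken from the series.

```c
#define _DEFAULT_SOURCE /* for strsep() and strdup() on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; the selftest itself uses globals instead. */
struct uffd_test_arg {
	char type[32];	/* e.g. "anon", "hugetlb", "hugetlb_shared", "shmem" */
	bool use_dev;	/* true: /dev/userfaultfd, false: userfaultfd(2) */
};

/*
 * Split "type[:mod[:mod...]]" on ':'. The first token is the test type,
 * every later token is a modifier ("dev" or "syscall").
 * Returns 0 on success, -1 on malformed input or allocation failure.
 */
static int parse_uffd_test_arg(const char *raw, struct uffd_test_arg *out)
{
	char *copy, *cursor;
	const char *token;
	bool have_type = false;
	int ret = 0;

	assert(raw != NULL && out != NULL);

	copy = strdup(raw);
	if (!copy)
		return -1;
	cursor = copy;	/* strsep() advances cursor; copy is kept for free() */

	out->type[0] = '\0';
	out->use_dev = false;

	while ((token = strsep(&cursor, ":")) != NULL) {
		if (!have_type) {
			/* Reject an empty or over-long type name. */
			if (token[0] == '\0' || strlen(token) >= sizeof(out->type)) {
				ret = -1;
				break;
			}
			strcpy(out->type, token);
			have_type = true;
		} else if (!strcmp(token, "dev")) {
			out->use_dev = true;
		} else if (!strcmp(token, "syscall")) {
			out->use_dev = false;
		} else {
			ret = -1;	/* unrecognized modifier */
			break;
		}
	}

	free(copy);
	return ret;
}
```

With this sketch, `parse_uffd_test_arg("hugetlb_shared:dev", &arg)` fills `arg.type` with `hugetlb_shared` and sets `arg.use_dev`, while a bare `anon` leaves `use_dev` false, which corresponds to the two entries (`""` and `":dev"`) in `uffd_mods`. Keeping a separate `cursor` lets the duplicated string be freed, since `strsep()` leaves its argument NULL once the input is exhausted.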