From patchwork Wed Aug 17 21:47:24 2022
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12946521
Date: Wed, 17 Aug 2022 14:47:24 -0700
In-Reply-To: <20220817214728.489904-1-axelrasmussen@google.com>
Message-Id: <20220817214728.489904-2-axelrasmussen@google.com>
References: <20220817214728.489904-1-axelrasmussen@google.com>
Subject: [PATCH v6 1/5] selftests: vm: add hugetlb_shared userfaultfd test to run_vmtests.sh
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu,
 Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, Shuah Khan

This not being included was just a simple oversight. There are certain
features (like minor fault support) which are only enabled on shared
mappings, so without including hugetlb_shared we actually lose a
significant amount of test coverage.

Reviewed-by: Shuah Khan
Reviewed-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/run_vmtests.sh | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index de86983b8a0f..b8e7f6f38d64 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -121,9 +121,11 @@ run_test ./gup_test -a
 run_test ./gup_test -ct -F 0x1 0 19 0x1000
 
 run_test ./userfaultfd anon 20 16
-# Test requires source and destination huge pages. Size of source
-# (half_ufd_size_MB) is passed as argument to test.
+# Hugetlb tests require source and destination huge pages. Pass in half the
+# size ($half_ufd_size_MB), which is used for *each*.
 run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
+run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+rm -f "$mnt"/uffd-test
 run_test ./userfaultfd shmem 20 16
 
 #cleanup

From patchwork Wed Aug 17 21:47:25 2022
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12946522
Date: Wed, 17 Aug 2022 14:47:25 -0700
In-Reply-To: <20220817214728.489904-1-axelrasmussen@google.com>
Message-Id: <20220817214728.489904-3-axelrasmussen@google.com>
References: <20220817214728.489904-1-axelrasmussen@google.com>
Subject: [PATCH v6 2/5] userfaultfd: add /dev/userfaultfd for fine grained access control
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu,
 Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, Mike Rapoport

Historically, it has been shown that intercepting kernel faults with
userfaultfd (thereby forcing the kernel to wait for an arbitrary amount
of time) can be exploited, or at least can make some kinds of exploits
easier. So, in 37cd0575b8 "userfaultfd: add UFFD_USER_MODE_ONLY" we
changed things so, in order for kernel faults to be handled by
userfaultfd, either the process needs CAP_SYS_PTRACE, or this sysctl
must be configured so that any unprivileged user can do it.

In a typical implementation of a hypervisor with live migration (take
QEMU/KVM as one such example), we do indeed need to be able to handle
kernel faults. But, both options above are less than ideal:

- Toggling the sysctl increases attack surface by allowing any
  unprivileged user to do it.

- Granting the live migration process CAP_SYS_PTRACE gives it this
  ability, but *also* the ability to "observe and control the
  execution of another process [...], and examine and change [its]
  memory and registers" (from ptrace(2)). This isn't something we need
  or want to be able to do, so granting this permission violates the
  "principle of least privilege".

This is all a long winded way to say: we want a more fine-grained way to
grant access to userfaultfd, without granting other additional
permissions at the same time.

To achieve this, add a /dev/userfaultfd misc device. This device
provides an alternative to the userfaultfd(2) syscall for the creation
of new userfaultfds.
The idea is, any userfaultfds created this way will be able to
handle kernel faults, without the caller having any special
capabilities. Access to this mechanism is instead restricted using e.g.
standard filesystem permissions.

Acked-by: Mike Rapoport
Acked-by: Nadav Amit
Acked-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 fs/userfaultfd.c                 | 73 +++++++++++++++++++++++++-------
 include/uapi/linux/userfaultfd.h |  4 ++
 2 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1c44bf75f916..698e768d5c3d 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 
 int sysctl_unprivileged_userfaultfd __read_mostly;
 
@@ -415,13 +416,8 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
-	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
-	    ctx->flags & UFFD_USER_MODE_ONLY) {
-		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
-			"sysctl knob to 1 if kernel faults must be handled "
-			"without obtaining CAP_SYS_PTRACE capability\n");
+	if (!(vmf->flags & FAULT_FLAG_USER) && (ctx->flags & UFFD_USER_MODE_ONLY))
 		goto out;
-	}
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -2052,20 +2048,11 @@ static void init_once_userfaultfd_ctx(void *mem)
 	seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
-SYSCALL_DEFINE1(userfaultfd, int, flags)
+static int new_userfaultfd(int flags)
 {
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd &&
-	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
-	    !capable(CAP_SYS_PTRACE)) {
-		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
-			"sysctl knob to 1 if kernel faults must be handled "
-			"without obtaining CAP_SYS_PTRACE capability\n");
-		return -EPERM;
-	}
-
 	BUG_ON(!current->mm);
 
 	/* Check the UFFD_* constants for consistency. */
@@ -2098,8 +2085,62 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	return fd;
 }
 
+static inline bool userfaultfd_syscall_allowed(int flags)
+{
+	/* Userspace-only page faults are always allowed */
+	if (flags & UFFD_USER_MODE_ONLY)
+		return true;
+
+	/*
+	 * The user is requesting a userfaultfd which can handle kernel faults.
+	 * Privileged users are always allowed to do this.
+	 */
+	if (capable(CAP_SYS_PTRACE))
+		return true;
+
+	/* Otherwise, access to kernel fault handling is sysctl controlled. */
+	return sysctl_unprivileged_userfaultfd;
+}
+
+SYSCALL_DEFINE1(userfaultfd, int, flags)
+{
+	if (!userfaultfd_syscall_allowed(flags))
+		return -EPERM;
+
+	return new_userfaultfd(flags);
+}
+
+static int userfaultfd_dev_open(struct inode *inode, struct file *file)
+{
+	return 0;
+}
+
+static long userfaultfd_dev_ioctl(struct file *file, unsigned int cmd, unsigned long flags)
+{
+	if (cmd != USERFAULTFD_IOC_NEW)
+		return -EINVAL;
+
+	return new_userfaultfd(flags);
+}
+
+static const struct file_operations userfaultfd_dev_fops = {
+	.open = userfaultfd_dev_open,
+	.unlocked_ioctl = userfaultfd_dev_ioctl,
+	.compat_ioctl = userfaultfd_dev_ioctl,
+	.owner = THIS_MODULE,
+	.llseek = noop_llseek,
+};
+
+static struct miscdevice userfaultfd_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "userfaultfd",
+	.fops = &userfaultfd_dev_fops
+};
+
 static int __init userfaultfd_init(void)
 {
+	WARN_ON(misc_register(&userfaultfd_misc));
+
 	userfaultfd_ctx_cachep = kmem_cache_create("userfaultfd_ctx_cache",
 						sizeof(struct userfaultfd_ctx),
 						0,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 7d32b1e797fb..005e5e306266 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -12,6 +12,10 @@
 
 #include
 
+/* ioctls for /dev/userfaultfd */
+#define USERFAULTFD_IOC 0xAA
+#define USERFAULTFD_IOC_NEW _IO(USERFAULTFD_IOC, 0x00)
+
 /*
  * If the UFFDIO_API is upgraded someday, the UFFDIO_UNREGISTER and
  * UFFDIO_WAKE ioctls should be defined as _IOW and not as _IOR. In

From patchwork Wed Aug 17 21:47:26 2022
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12946523
Date: Wed, 17 Aug 2022 14:47:26 -0700
In-Reply-To: <20220817214728.489904-1-axelrasmussen@google.com>
Message-Id: <20220817214728.489904-4-axelrasmussen@google.com>
References: <20220817214728.489904-1-axelrasmussen@google.com>
Subject: [PATCH v6 3/5] userfaultfd: selftests: modify selftest to use /dev/userfaultfd
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu,
 Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, Mike Rapoport

We clearly want to ensure both userfaultfd(2) and /dev/userfaultfd keep
working into the future, so just run the test twice, using each
interface.

Instead of always testing both userfaultfd(2) and /dev/userfaultfd,
let the user choose which to test.

As with other test features, change the behavior based on a new
command line flag. Introduce the idea of "test mods", which are
generic (not specific to a test type) modifications to the behavior of
the test. This is sort of borrowed from this RFC patch series [1], but
simplified a bit.

The benefit is, in "typical" configurations this test is somewhat slow
(say, 30sec or something). Testing both clearly doubles it, so it may
not always be desirable, as users are likely to use one or the other,
but never both, in the "real world".
[1]: https://patchwork.kernel.org/project/linux-mm/patch/20201129004548.1619714-14-namit@vmware.com/ Acked-by: Mike Rapoport Acked-by: Peter Xu Signed-off-by: Axel Rasmussen --- tools/testing/selftests/vm/userfaultfd.c | 76 ++++++++++++++++++++---- 1 file changed, 66 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 7c3f1b0ab468..7be709d9eed0 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -77,6 +77,11 @@ static int bounces; #define TEST_SHMEM 3 static int test_type; +#define UFFD_FLAGS (O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY) + +/* test using /dev/userfaultfd, instead of userfaultfd(2) */ +static bool test_dev_userfaultfd; + /* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */ #define ALARM_INTERVAL_SECS 10 static volatile bool test_uffdio_copy_eexist = true; @@ -125,6 +130,8 @@ struct uffd_stats { const char *examples = "# Run anonymous memory test on 100MiB region with 99999 bounces:\n" "./userfaultfd anon 100 99999\n\n" + "# Run the same anonymous memory test, but using /dev/userfaultfd:\n" + "./userfaultfd anon:dev 100 99999\n\n" "# Run share memory test on 1GiB region with 99 bounces:\n" "./userfaultfd shmem 1000 99\n\n" "# Run hugetlb memory test on 256MiB region with 50 bounces:\n" @@ -141,6 +148,14 @@ static void usage(void) "[hugetlbfs_file]\n\n"); fprintf(stderr, "Supported : anon, hugetlb, " "hugetlb_shared, shmem\n\n"); + fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. 
" + "Supported mods:\n"); + fprintf(stderr, "\tsyscall - Use userfaultfd(2) (default)\n"); + fprintf(stderr, "\tdev - Use /dev/userfaultfd instead of userfaultfd(2)\n"); + fprintf(stderr, "\nExample test mod usage:\n"); + fprintf(stderr, "# Run anonymous memory test with /dev/userfaultfd:\n"); + fprintf(stderr, "./userfaultfd anon:dev 100 99999\n\n"); + fprintf(stderr, "Examples:\n\n"); fprintf(stderr, "%s", examples); exit(1); @@ -154,12 +169,14 @@ static void usage(void) ret, __LINE__); \ } while (0) -#define err(fmt, ...) \ +#define errexit(exitcode, fmt, ...) \ do { \ _err(fmt, ##__VA_ARGS__); \ - exit(1); \ + exit(exitcode); \ } while (0) +#define err(fmt, ...) errexit(1, fmt, ##__VA_ARGS__) + static void uffd_stats_reset(struct uffd_stats *uffd_stats, unsigned long n_cpus) { @@ -383,13 +400,34 @@ static void assert_expected_ioctls_present(uint64_t mode, uint64_t ioctls) } } +static int __userfaultfd_open_dev(void) +{ + int fd, _uffd; + + fd = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC); + if (fd < 0) + errexit(KSFT_SKIP, "opening /dev/userfaultfd failed"); + + _uffd = ioctl(fd, USERFAULTFD_IOC_NEW, UFFD_FLAGS); + if (_uffd < 0) + errexit(errno == ENOTTY ? KSFT_SKIP : 1, + "creating userfaultfd failed"); + close(fd); + return _uffd; +} + static void userfaultfd_open(uint64_t *features) { struct uffdio_api uffdio_api; - uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY); - if (uffd < 0) - err("userfaultfd syscall not available in this kernel"); + if (test_dev_userfaultfd) + uffd = __userfaultfd_open_dev(); + else { + uffd = syscall(__NR_userfaultfd, UFFD_FLAGS); + if (uffd < 0) + errexit(errno == ENOSYS ? 
KSFT_SKIP : 1, + "creating userfaultfd failed"); + } uffd_flags = fcntl(uffd, F_GETFD, NULL); uffdio_api.api = UFFD_API; @@ -1584,8 +1622,6 @@ unsigned long default_huge_page_size(void) static void set_test_type(const char *type) { - uint64_t features = UFFD_API_FEATURES; - if (!strcmp(type, "anon")) { test_type = TEST_ANON; uffd_test_ops = &anon_uffd_test_ops; @@ -1603,9 +1639,29 @@ static void set_test_type(const char *type) test_type = TEST_SHMEM; uffd_test_ops = &shmem_uffd_test_ops; test_uffdio_minor = true; - } else { - err("Unknown test type: %s", type); } +} + +static void parse_test_type_arg(const char *raw_type) +{ + char *buf = strdup(raw_type); + uint64_t features = UFFD_API_FEATURES; + + while (buf) { + const char *token = strsep(&buf, ":"); + + if (!test_type) + set_test_type(token); + else if (!strcmp(token, "dev")) + test_dev_userfaultfd = true; + else if (!strcmp(token, "syscall")) + test_dev_userfaultfd = false; + else + err("unrecognized test mod '%s'", token); + } + + if (!test_type) + err("failed to parse test type argument: '%s'", raw_type); if (test_type == TEST_HUGETLB) page_size = default_huge_page_size(); @@ -1653,7 +1709,7 @@ int main(int argc, char **argv) err("failed to arm SIGALRM"); alarm(ALARM_INTERVAL_SECS); - set_test_type(argv[1]); + parse_test_type_arg(argv[1]); nr_cpus = sysconf(_SC_NPROCESSORS_ONLN); nr_pages_per_cpu = atol(argv[2]) * 1024*1024 / page_size / From patchwork Wed Aug 17 21:47:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12946524 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CBC48C25B08 for ; Wed, 17 Aug 2022 21:47:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 623128D0006; Wed, 17 Aug 2022 17:47:43 -0400 (EDT) 
Date: Wed, 17 Aug 2022 14:47:27 -0700
In-Reply-To: <20220817214728.489904-1-axelrasmussen@google.com>
Message-Id: <20220817214728.489904-5-axelrasmussen@google.com>
Mime-Version: 1.0
References: <20220817214728.489904-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.37.1.595.g718a3a8f04-goog
Subject: [PATCH v6 4/5] userfaultfd: update documentation to describe /dev/userfaultfd
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu,
 Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org

Explain the different ways to create a new userfaultfd, and how access
control works for each way.

Acked-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 Documentation/admin-guide/mm/userfaultfd.rst | 41 ++++++++++++++++++--
 Documentation/admin-guide/sysctl/vm.rst      |  3 ++
 2 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 6528036093e1..83f31919ebb3 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -17,7 +17,10 @@ of the ``PROT_NONE+SIGSEGV`` trick.
 Design
 ======
 
-Userfaults are delivered and resolved through the ``userfaultfd`` syscall.
+Userspace creates a new userfaultfd, initializes it, and registers one or more
+regions of virtual memory with it. Then, any page faults which occur within the
+region(s) result in a message being delivered to the userfaultfd, notifying
+userspace of the fault.
 
 The ``userfaultfd`` (aside from registering and unregistering virtual
 memory ranges) provides two primary functionalities:
@@ -34,12 +37,11 @@ The real advantage of userfaults if compared to regular virtual memory
 management of mremap/mprotect is that the userfaults in all their
 operations never involve heavyweight structures like vmas (in fact the
 ``userfaultfd`` runtime load never takes the mmap_lock for writing).
-
 Vmas are not suitable for page- (or hugepage) granular fault tracking
 when dealing with virtual address spaces that could span
 Terabytes. Too many vmas would be needed for that.
 
-The ``userfaultfd`` once opened by invoking the syscall, can also be
+The ``userfaultfd``, once created, can also be
 passed using unix domain sockets to a manager process, so the same
 manager process could handle the userfaults of a multitude of
 different processes without them being aware about what is going on
@@ -50,6 +52,39 @@ is a corner case that would currently return ``-EBUSY``).
 API
 ===
 
+Creating a userfaultfd
+----------------------
+
+There are two ways to create a new userfaultfd, each of which provide ways to
+restrict access to this functionality (since historically userfaultfds which
+handle kernel page faults have been a useful tool for exploiting the kernel).
+
+The first way, supported since userfaultfd was introduced, is the
+userfaultfd(2) syscall. Access to this is controlled in several ways:
+
+- Any user can always create a userfaultfd which traps userspace page faults
+  only. Such a userfaultfd can be created using the userfaultfd(2) syscall
+  with the flag UFFD_USER_MODE_ONLY.
+
+- In order to also trap kernel page faults for the address space, either the
+  process needs the CAP_SYS_PTRACE capability, or the system must have
+  vm.unprivileged_userfaultfd set to 1. By default, vm.unprivileged_userfaultfd
+  is set to 0.
+
+The second way, added to the kernel more recently, is by opening
+/dev/userfaultfd and issuing a USERFAULTFD_IOC_NEW ioctl to it. This method
+yields equivalent userfaultfds to the userfaultfd(2) syscall.
+
+Unlike userfaultfd(2), access to /dev/userfaultfd is controlled via normal
+filesystem permissions (user/group/mode), which gives fine grained access to
+userfaultfd specifically, without also granting other unrelated privileges at
+the same time (as e.g. granting CAP_SYS_PTRACE would do). Users who have access
+to /dev/userfaultfd can always create userfaultfds that trap kernel page faults;
+vm.unprivileged_userfaultfd is not considered.
+
+Initializing a userfaultfd
+--------------------------
+
 When first opened the ``userfaultfd`` must be enabled invoking the
 ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
 a later API version) which will specify the ``read/POLLIN`` protocol
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 9b833e439f09..988f6a4c8084 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -926,6 +926,9 @@ calls without any restrictions.
 
 The default value is 0.
 
+Another way to control permissions for userfaultfd is to use
+/dev/userfaultfd instead of userfaultfd(2). See
+Documentation/admin-guide/mm/userfaultfd.rst.
 
 user_reserve_kbytes
 ===================

From patchwork Wed Aug 17 21:47:28 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Axel Rasmussen
X-Patchwork-Id: 12946525
Date: Wed, 17 Aug 2022 14:47:28 -0700
In-Reply-To: <20220817214728.489904-1-axelrasmussen@google.com>
Message-Id: <20220817214728.489904-6-axelrasmussen@google.com>
Mime-Version: 1.0
References: <20220817214728.489904-1-axelrasmussen@google.com>
X-Mailer: git-send-email 2.37.1.595.g718a3a8f04-goog
Subject: [PATCH v6 5/5] selftests: vm: add /dev/userfaultfd test cases to run_vmtests.sh
From: Axel Rasmussen
To: Alexander Viro, Andrew Morton, Dave Hansen, "Dmitry V . Levin",
 Gleb Fotengauer-Malinovskiy, Hugh Dickins, Jan Kara, Jonathan Corbet,
 Mel Gorman, Mike Kravetz, Mike Rapoport, Nadav Amit, Peter Xu,
 Shuah Khan, Suren Baghdasaryan, Vlastimil Babka, zhangyi
Cc: Axel Rasmussen, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, Shuah Khan

This new mode was recently added to the userfaultfd selftest. We want to
exercise both userfaultfd(2) as well as /dev/userfaultfd, so add both
test cases to the script.

Reviewed-by: Shuah Khan
Acked-by: Peter Xu
Signed-off-by: Axel Rasmussen
---
 tools/testing/selftests/vm/run_vmtests.sh | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index b8e7f6f38d64..e780e76c26b8 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -120,13 +120,16 @@ run_test ./gup_test -a
 # Dump pages 0, 19, and 4096, using pin_user_pages:
 run_test ./gup_test -ct -F 0x1 0 19 0x1000
 
-run_test ./userfaultfd anon 20 16
-# Hugetlb tests require source and destination huge pages. Pass in half the
-# size ($half_ufd_size_MB), which is used for *each*.
-run_test ./userfaultfd hugetlb "$half_ufd_size_MB" 32
-run_test ./userfaultfd hugetlb_shared "$half_ufd_size_MB" 32 "$mnt"/uffd-test
-rm -f "$mnt"/uffd-test
-run_test ./userfaultfd shmem 20 16
+uffd_mods=("" ":dev")
+for mod in "${uffd_mods[@]}"; do
+	run_test ./userfaultfd anon${mod} 20 16
+	# Hugetlb tests require source and destination huge pages. Pass in half
+	# the size ($half_ufd_size_MB), which is used for *each*.
+	run_test ./userfaultfd hugetlb${mod} "$half_ufd_size_MB" 32
+	run_test ./userfaultfd hugetlb_shared${mod} "$half_ufd_size_MB" 32 "$mnt"/uffd-test
+	rm -f "$mnt"/uffd-test
+	run_test ./userfaultfd shmem${mod} 20 16
+done
 
 #cleanup
 umount "$mnt"
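
For readers following the `type[:mod]` arguments that the loop above passes to the selftest (for example `anon` or `hugetlb_shared:dev`), here is a minimal standalone sketch of that split, modeled loosely on `parse_test_type_arg()` from the selftest patch earlier in this series. `struct uffd_arg` and `parse_uffd_arg()` are hypothetical names used only for illustration. Unlike the selftest, this sketch does not check the type against the list of known test types, and it returns an error code instead of calling `err()`.

```c
#define _DEFAULT_SOURCE /* strsep() and strdup() on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; the selftest itself stores this in globals. */
struct uffd_arg {
	char type[32];	/* e.g. "anon", "hugetlb", "hugetlb_shared", "shmem" */
	bool use_dev;	/* true: /dev/userfaultfd, false: userfaultfd(2) */
};

/*
 * Split "type[:mod]..." into its parts. The last "dev"/"syscall" modifier
 * wins. Returns 0 on success, or -1 for an empty or over-long type, an
 * unknown modifier, or an allocation failure.
 */
static int parse_uffd_arg(const char *raw, struct uffd_arg *out)
{
	char *copy, *cursor;
	const char *token;
	int ret = 0;

	assert(raw != NULL && out != NULL);

	copy = strdup(raw);
	if (!copy)
		return -1;
	cursor = copy;	/* strsep() advances cursor; copy is kept for free() */

	out->type[0] = '\0';
	out->use_dev = false;

	/* The first token is always the test type. */
	token = strsep(&cursor, ":");
	if (token[0] == '\0' || strlen(token) >= sizeof(out->type)) {
		ret = -1;
		goto done;
	}
	strcpy(out->type, token);

	/* Any remaining tokens are modifiers. */
	while (cursor) {
		token = strsep(&cursor, ":");
		if (!strcmp(token, "dev")) {
			out->use_dev = true;
		} else if (!strcmp(token, "syscall")) {
			out->use_dev = false;
		} else {
			ret = -1;
			break;
		}
	}
done:
	free(copy);
	return ret;
}
```

With this sketch, `parse_uffd_arg("shmem:dev", &arg)` leaves `arg.type` as `"shmem"` and `arg.use_dev` set to true. A separate cursor is used because `strsep()` overwrites the pointer it is given, so the pointer returned by `strdup()` has to be kept around for `free()`.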