From patchwork Fri Apr 14 00:11:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13210788
Date: Fri, 14 Apr 2023 00:11:54 +0000
X-Mailer: git-send-email 2.40.0.634.g4ca3ef3211-goog
Subject: [RFC PATCH 5/6] mm: restrictedmem: Add memfd_restricted_bind() syscall
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-api@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, qemu-devel@nongnu.org
Cc: aarcange@redhat.com, ak@linux.intel.com, akpm@linux-foundation.org,
 arnd@arndb.de, bfields@fieldses.org, bp@alien8.de,
 chao.p.peng@linux.intel.com, corbet@lwn.net, dave.hansen@intel.com,
 david@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, hpa@zytor.com,
 hughd@google.com, jlayton@kernel.org, jmattson@google.com, joro@8bytes.org,
 jun.nakajima@intel.com, kirill.shutemov@linux.intel.com,
 linmiaohe@huawei.com, luto@kernel.org, mail@maciej.szmigiero.name,
 mhocko@suse.com, michael.roth@amd.com, mingo@redhat.com,
 naoya.horiguchi@nec.com, pbonzini@redhat.com, qperret@google.com,
 rppt@kernel.org, seanjc@google.com, shuah@kernel.org, steven.price@arm.com,
 tabba@google.com, tglx@linutronix.de, vannapurve@google.com, vbabka@suse.cz,
 vkuznets@redhat.com, wanpengli@tencent.com, wei.w.wang@intel.com,
 x86@kernel.org, yu.c.zhang@linux.intel.com, muchun.song@linux.dev,
 feng.tang@intel.com, brgerst@gmail.com, rdunlap@infradead.org,
 masahiroy@kernel.org, mailhol.vincent@wanadoo.fr, Ackerley Tng
Sender: owner-linux-mm@kvack.org
Precedence: bulk

memfd_restricted_bind() sets the NUMA memory policy, which consists of
a policy mode and zero or more nodes, for the range of a restrictedmem
file (identified by file descriptor fd) that starts at offset and
continues for len bytes.

This is intended to be like mbind(), but specifically for restrictedmem
files, which cannot be mmap()ed into userspace and hence have no memory
addresses that can be used with mbind().

Unlike mbind(), memfd_restricted_bind() will override any existing
memory policy if a new memory policy is defined for the same ranges.

For now, memfd_restricted_bind() does not perform migrations and no
flags are supported.

This syscall is specialised just for restrictedmem files because this
functionality is not required by other files.

Signed-off-by: Ackerley Tng
---
 arch/x86/entry/syscalls/syscall_32.tbl |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 include/linux/mempolicy.h              |  2 +-
 include/linux/syscalls.h               |  5 ++
 include/uapi/asm-generic/unistd.h      |  5 +-
 include/uapi/linux/mempolicy.h         |  7 ++-
 kernel/sys_ni.c                        |  1 +
 mm/restrictedmem.c                     | 75 ++++++++++++++++++++++++++
 scripts/checksyscalls.sh               |  1 +
 9 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index dc70ba90247e..c94e9ce46cc3 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -456,3 +456,4 @@
 449	i386	futex_waitv		sys_futex_waitv
 450	i386	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	i386	memfd_restricted	sys_memfd_restricted
+452	i386	memfd_restricted_bind	sys_memfd_restricted_bind
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 06516abc8318..6bd86b45d63a 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -373,6 +373,7 @@
 449	common	futex_waitv		sys_futex_waitv
 450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node
 451	common	memfd_restricted	sys_memfd_restricted
+452	common	memfd_restricted_bind	sys_memfd_restricted_bind
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 15facd9de087..af62233df0c0 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -126,7 +126,7 @@ struct shared_policy {
 
 int vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst);
 struct mempolicy *mpol_create(
-	unsigned long mode, const unsigned long __user *nmask, unsigned long maxnode)
+	unsigned long mode, const unsigned long __user *nmask, unsigned long maxnode);
 void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol);
 int __mpol_set_shared_policy(struct shared_policy *info, struct mempolicy *mpol,
 			     unsigned long pgoff_start, unsigned long npages);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 660be0bf89d5..852b202d3837 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -1059,6 +1059,11 @@ asmlinkage long sys_set_mempolicy_home_node(unsigned long start, unsigned long l
 					    unsigned long home_node,
 					    unsigned long flags);
 asmlinkage long sys_memfd_restricted(unsigned int flags);
+asmlinkage long sys_memfd_restricted_bind(int fd, struct file_range __user *range,
+					  unsigned long mode,
+					  const unsigned long __user *nmask,
+					  unsigned long maxnode,
+					  unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index e2ea7cd964f8..b5a1385bb4a7 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -889,10 +889,13 @@ __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
 #ifdef __ARCH_WANT_MEMFD_RESTRICTED
 #define __NR_memfd_restricted 451
 __SYSCALL(__NR_memfd_restricted, sys_memfd_restricted)
+
+#define __NR_memfd_restricted_bind 452
+__SYSCALL(__NR_memfd_restricted_bind, sys_memfd_restricted_bind)
 #endif
 
 #undef __NR_syscalls
-#define __NR_syscalls 452
+#define __NR_syscalls 453
 
 /*
  * 32 bit systems traditionally used different
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 046d0ccba4cd..979499abd253 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -6,9 +6,9 @@
 #ifndef _UAPI_LINUX_MEMPOLICY_H
 #define _UAPI_LINUX_MEMPOLICY_H
 
+#include
 #include
 
-
 /*
  * Both the MPOL_* mempolicy mode and the MPOL_F_* optional mode flags are
  * passed by the user to either set_mempolicy() or mbind() in an 'int' actual.
@@ -72,4 +72,9 @@ enum {
 #define RECLAIM_WRITE	(1<<1)	/* Writeout pages during reclaim */
 #define RECLAIM_UNMAP	(1<<2)	/* Unmap pages during reclaim */
 
+struct file_range {
+	__kernel_loff_t offset;
+	__kernel_size_t len;
+};
+
 #endif /* _UAPI_LINUX_MEMPOLICY_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 7c4a32cbd2e7..db24d3fe6dc5 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -362,6 +362,7 @@ COND_SYSCALL(memfd_secret);
 
 /* memfd_restricted */
 COND_SYSCALL(memfd_restricted);
+COND_SYSCALL(memfd_restricted_bind);
 
 /*
  * Architecture specific weak syscall entries.
diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
index 55e99e6c09a1..9c249722c61b 100644
--- a/mm/restrictedmem.c
+++ b/mm/restrictedmem.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+#include
 #include "linux/sbitmap.h"
 #include
 #include
@@ -359,3 +360,77 @@ int restrictedmem_get_page(struct file *file, pgoff_t offset,
 	return 0;
 }
 EXPORT_SYMBOL_GPL(restrictedmem_get_page);
+
+static int restrictedmem_set_shared_policy(
+	struct file *file, loff_t start, size_t len, struct mempolicy *mpol)
+{
+	struct restrictedmem *rm;
+	unsigned long end;
+
+	if (!PAGE_ALIGNED(start))
+		return -EINVAL;
+
+	len = PAGE_ALIGN(len);
+	end = start + len;
+
+	if (end < start)
+		return -EINVAL;
+	if (end == start)
+		return 0;
+
+	rm = file->f_mapping->private_data;
+	return __mpol_set_shared_policy(shmem_shared_policy(rm->memfd), mpol,
+					start >> PAGE_SHIFT, len >> PAGE_SHIFT);
+}
+
+static long do_memfd_restricted_bind(
+	int fd, loff_t offset, size_t len,
+	unsigned long mode, const unsigned long __user *nmask,
+	unsigned long maxnode, unsigned int flags)
+{
+	long ret;
+	struct fd f;
+	struct mempolicy *mpol;
+
+	/* None of the flags are supported */
+	if (flags)
+		return -EINVAL;
+
+	f = fdget_raw(fd);
+	if (!f.file)
+		return -EBADF;
+
+	if (!file_is_restrictedmem(f.file))
+		return -EINVAL;
+
+	mpol = mpol_create(mode, nmask, maxnode);
+	if (IS_ERR(mpol)) {
+		ret = PTR_ERR(mpol);
+		goto out;
+	}
+
+	ret = restrictedmem_set_shared_policy(f.file, offset, len, mpol);
+
+	mpol_put(mpol);
+
+out:
+	fdput(f);
+
+	return ret;
+}
+
+SYSCALL_DEFINE6(memfd_restricted_bind, int, fd, struct file_range __user *, range,
+		unsigned long, mode, const unsigned long __user *, nmask,
+		unsigned long, maxnode, unsigned int, flags)
+{
+	loff_t offset;
+	size_t len;
+
+	if (unlikely(get_user(offset, &range->offset)))
+		return -EFAULT;
+	if (unlikely(get_user(len, &range->len)))
+		return -EFAULT;
+
+	return do_memfd_restricted_bind(fd, offset, len, mode, nmask,
+					maxnode, flags);
+}
diff --git a/scripts/checksyscalls.sh b/scripts/checksyscalls.sh
index 3c4d2508226a..e253529cf1ec 100755
--- a/scripts/checksyscalls.sh
+++ b/scripts/checksyscalls.sh
@@ -46,6 +46,7 @@ cat << EOF
 
 #ifndef __ARCH_WANT_MEMFD_RESTRICTED
 #define __IGNORE_memfd_restricted
+#define __IGNORE_memfd_restricted_bind
 #endif
 
 /* Missing flags argument */
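
The range handling in restrictedmem_set_shared_policy() can be hard to follow from the diff alone, so here is a rough userspace model of just that arithmetic. It is a sketch, not part of the patch: it assumes 4 KiB pages and 64-bit unsigned arithmetic, and every name prefixed with model_ or MODEL_ is hypothetical. It deliberately does not invoke the syscall, since the number 452 proposed in this RFC is not guaranteed to refer to memfd_restricted_bind() on any given kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses the real PAGE_SHIFT. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (UINT64_C(1) << MODEL_PAGE_SHIFT)

/* Hypothetical result type: the page range handed to the shared policy. */
struct model_page_range {
	uint64_t pgoff_start;	/* first page index covered by the policy */
	uint64_t npages;	/* number of pages covered by the policy */
};

/*
 * Userspace model of the checks in restrictedmem_set_shared_policy():
 * the offset must be page aligned, the length is rounded up to a whole
 * number of pages, a wrapping range is rejected, and an empty range
 * succeeds without setting any policy (npages stays 0).
 */
static int model_bind_range(uint64_t start, uint64_t len,
			    struct model_page_range *out)
{
	uint64_t end;

	assert(out != NULL);
	out->pgoff_start = 0;
	out->npages = 0;

	/* Mirrors !PAGE_ALIGNED(start) */
	if (start & (MODEL_PAGE_SIZE - 1))
		return -EINVAL;

	/* Mirrors len = PAGE_ALIGN(len) */
	len = (len + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
	end = start + len;

	if (end < start)	/* range wrapped around */
		return -EINVAL;
	if (end == start)	/* nothing to do */
		return 0;

	out->pgoff_start = start >> MODEL_PAGE_SHIFT;
	out->npages = len >> MODEL_PAGE_SHIFT;
	return 0;
}
```

For example, model_bind_range(4096, 4097, &r) succeeds with pgoff_start 1 and npages 2, because the length is rounded up to two pages, while an offset of 100 is rejected with -EINVAL.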