From patchwork Fri Apr 14 00:11:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Ackerley Tng
X-Patchwork-Id: 13210783
Date: Fri, 14 Apr 2023 00:11:49 +0000
X-Mailer: git-send-email 2.40.0.634.g4ca3ef3211-goog
Subject: [RFC PATCH 0/6] Setting memory policy for restrictedmem file
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-api@vger.kernel.org,
 linux-arch@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, qemu-devel@nongnu.org
Cc: aarcange@redhat.com, ak@linux.intel.com, akpm@linux-foundation.org,
 arnd@arndb.de, bfields@fieldses.org, bp@alien8.de,
 chao.p.peng@linux.intel.com, corbet@lwn.net, dave.hansen@intel.com,
 david@redhat.com, ddutile@redhat.com, dhildenb@redhat.com, hpa@zytor.com,
 hughd@google.com, jlayton@kernel.org, jmattson@google.com, joro@8bytes.org,
 jun.nakajima@intel.com, kirill.shutemov@linux.intel.com,
 linmiaohe@huawei.com, luto@kernel.org, mail@maciej.szmigiero.name,
 mhocko@suse.com, michael.roth@amd.com, mingo@redhat.com,
 naoya.horiguchi@nec.com, pbonzini@redhat.com, qperret@google.com,
 rppt@kernel.org, seanjc@google.com, shuah@kernel.org, steven.price@arm.com,
 tabba@google.com, tglx@linutronix.de, vannapurve@google.com,
 vbabka@suse.cz, vkuznets@redhat.com, wanpengli@tencent.com,
 wei.w.wang@intel.com, x86@kernel.org, yu.c.zhang@linux.intel.com,
 muchun.song@linux.dev, feng.tang@intel.com, brgerst@gmail.com,
 rdunlap@infradead.org, masahiroy@kernel.org, mailhol.vincent@wanadoo.fr,
 Ackerley Tng

Hello,

This patchset builds upon the memfd_restricted() system call that was
discussed in the 'KVM: mm: fd-based approach for supporting KVM' patch
series [1].

The tree can be found at:
https://github.com/googleprodkernel/linux-cc/tree/restrictedmem-set-memory-policy

In this patchset, a new syscall is introduced, which allows userspace to
set the memory policy (e.g. NUMA bindings) for a restrictedmem file, at
the granularity of offsets within the file.

The offset/length tuple is termed a file_range, which is passed to the
kernel via a pointer to get around the limit of 6 arguments for a
syscall.

The following other approaches were also considered:

1. Pre-configuring a mount with a memory policy and providing that mount
   to memfd_restricted() as proposed at [2].
    + Pro: It allows choice of a specific backing mount with custom
      memory policy configurations.
    + Con: Will need to create an entire new mount just to set memory
      policy for a restrictedmem file; files on the same mount cannot
      have different memory policies.

2. Passing memory policy to the memfd_restricted() syscall at creation
   time.
    + Pro: Only need to make a single syscall to create a file with a
      given memory policy.
    + Con: At creation time, the kernel doesn’t know the size of the
      restrictedmem file. Given that memory policy is stored in the
      inode based on ranges (start, end), it is awkward for the kernel
      to store the memory policy and then add hooks to set the memory
      policy when allocation is done.

3. A more generic fbind(): it seems like this new functionality is
   really only needed for restrictedmem files, hence a separate,
   specific syscall was proposed to avoid complexities with handling
   conflicting policies that may be specified via other syscalls like
   mbind().

TODOs

+ Return -EINVAL if file_range is not within the size of the file, and
  add tests for this

Dependencies:

+ Chao’s work on UPM [3]

[1] https://lore.kernel.org/lkml/20221202061347.1070246-1-chao.p.peng@linux.intel.com/T/
[2] https://lore.kernel.org/lkml/cover.1681176340.git.ackerleytng@google.com/T/
[3] https://github.com/chao-p/linux/commits/privmem-v11.5

---

Ackerley Tng (6):
  mm: shmem: Refactor out shmem_shared_policy() function
  mm: mempolicy: Refactor out mpol_init_from_nodemask
  mm: mempolicy: Refactor out __mpol_set_shared_policy()
  mm: mempolicy: Add and expose mpol_create
  mm: restrictedmem: Add memfd_restricted_bind() syscall
  selftests: mm: Add selftest for memfd_restricted_bind()

 arch/x86/entry/syscalls/syscall_32.tbl        |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl        |   1 +
 include/linux/mempolicy.h                     |   4 +
 include/linux/shmem_fs.h                      |   7 +
 include/linux/syscalls.h                      |   5 +
 include/uapi/asm-generic/unistd.h             |   5 +-
 include/uapi/linux/mempolicy.h                |   7 +-
 kernel/sys_ni.c                               |   1 +
 mm/mempolicy.c                                | 100 ++++++++++---
 mm/restrictedmem.c                            |  75 ++++++++++
 mm/shmem.c                                    |  10 +-
 scripts/checksyscalls.sh                      |   1 +
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   8 +
 .../selftests/mm/memfd_restricted_bind.c      | 139 ++++++++++++++++++
 .../mm/restrictedmem_testmod/Makefile         |  21 +++
 .../restrictedmem_testmod.c                   |  89 +++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   6 +
 18 files changed, 454 insertions(+), 27 deletions(-)
 create mode 100644 tools/testing/selftests/mm/memfd_restricted_bind.c
 create mode 100644 tools/testing/selftests/mm/restrictedmem_testmod/Makefile
 create mode 100644 tools/testing/selftests/mm/restrictedmem_testmod/restrictedmem_testmod.c

-- 
2.40.0.634.g4ca3ef3211-goog