From: Patrick Roy <roypat@amazon.co.uk>
Date: Fri, 21 Feb 2025 16:07:13 +0000
Subject: [PATCH v4 00/12] Direct Map Removal for guest_memfd
Message-ID: <20250221160728.1584559-1-roypat@amazon.co.uk>

Unmapping virtual machine guest memory from the host kernel's direct map is an effective mitigation against Spectre-style transient execution issues: if the kernel page tables do not contain entries pointing to guest memory, then any attempted speculative read through the direct map will necessarily be blocked by the MMU before any observable microarchitectural side effects occur. This means that Spectre gadgets and similar primitives cannot be used to target virtual machine memory. Roughly 60% of speculative execution issues fall into this category [1, Table 1].

This patch series extends guest_memfd with the ability to remove its memory from the host kernel's direct map, so that KVM guests whose memory is backed by guest_memfd can attain the above protection.

=== Changes to RFC v3 ===

- Settle the relationship between direct map removal and shared/private memory in guest_memfd (David H.)
- Omit TLB flushes upon direct map removal again
- Settle the uABI for how KVM accesses guest memory in non-CoCo guest_memfd VMs (upstream guest_memfd calls)
- Add selftests that exercise the codepaths of non-CoCo guest_memfd VMs

Lastly, this series is rebased on top of Fuad's v4 for shared mapping of guest_memfd [2]. The KVM parts should also apply on top of 0ad2507d5d93 ("Linux 6.14-rc3"), but the selftest patches need Fuad's series as a base.

=== Overview ===

guest_memfd should be usable for "non-CoCo" VMs: virtual machines where host userspace is trusted (e.g. it may access all of guest memory), but which should still be hardened against speculative execution attacks (Spectre, etc.) staged through gadgets that may exist in the host kernel.

To attain this hardening, unmap guest memory from the host kernel's address space (i.e. zap its direct map entries), while allowing KVM to continue accessing guest memory through userspace mappings. This works because KVM already almost always uses userspace mappings whenever it needs to access guest memory; the only parts that require direct map entries (because they use GUP) are KVM's MMU and kvm-clock on x86.
Building on top of guest_memfd sidesteps the MMU problem because, for memslots with KVM_MEM_GUEST_MEMFD set, the MMU consumes fd + offset directly without going through any VMAs. kvm-clock, on the other hand, is not strictly needed (guests boot fine without it), so ignore it for now.

=== Implementation ===

Make KVM_CREATE_GUEST_MEMFD accept a flag (KVM_GMEM_NO_DIRECT_MAP) that instructs it to remove newly allocated folios from the host kernel's direct map immediately after preparation.

Nothing further is needed to make non-CoCo VMs work; in particular, KVM does not need to be taught any special ways of accessing guest memory if it is in guest_memfd. Userspace can simply mmap guest_memfd (via KVM_GMEM_SHARED_MEM, added in Fuad's series) and set the memslot's userspace_addr to this userspace mapping of guest_memfd.

=== Open Questions ===

In this patch series, stale TLB entries do not get flushed after direct map entries are marked as not present. This is fine from a functional point of view (the mapping is still valid, it is just temporarily not supposed to be used), but it pokes a theoretical hole in the speculation protection: an attacker could try to keep stale TLB entries for specific pages alive until the guest starts using them for sensitive information, and then stage a Spectre attack on that memory, despite it being unmapped. In practice, this would require knowing in advance, at gmem fault time, which pages will eventually contain information of interest, and then preventing these specific TLB entries from being naturally evicted (where the number of pages that can be targeted like this is limited by the size of the TLB). These seem to be fairly difficult prerequisites to fulfill, but we were wondering what the community thinks.
=== Summary ===

Patch 1 adds a struct address_space flag that indicates that folios in a mapping are direct map removed, and threads it through mm code to ensure direct map removed folios don't end up in places where they can cause mayhem (in particular, we reject them in get_user_pages). Since these checks end up duplicating already existing checks for secretmem folios, patch 2 unifies the two by using the new address_space flag for secretmem mappings.

Patches 3 through 5 add support for direct map removal in guest_memfd, while patches 6 through 12 are about testing the non-CoCo setup in KVM selftests, with patches 6 through 9 being preparatory and patches 10 through 12 adding the actual test cases.

[1]: https://download.vusec.net/papers/quarantine_raid23.pdf
[2]: https://lore.kernel.org/kvm/20250218172500.807733-1-tabba@google.com/
[RFC v1]: https://lore.kernel.org/kvm/20240709132041.3625501-1-roypat@amazon.co.uk/
[RFC v2]: https://lore.kernel.org/kvm/20240910163038.1298452-1-roypat@amazon.co.uk/
[RFC v3]: https://lore.kernel.org/kvm/20241030134912.515725-1-roypat@amazon.co.uk/

Patrick Roy (12):
  mm: introduce AS_NO_DIRECT_MAP
  mm/secretmem: set AS_NO_DIRECT_MAP instead of special-casing
  KVM: guest_memfd: Add flag to remove from direct map
  KVM: Add capability to discover KVM_GMEM_NO_DIRECT_MAP support
  KVM: Documentation: document KVM_GMEM_NO_DIRECT_MAP flag
  KVM: selftests: load elf via bounce buffer
  KVM: selftests: set KVM_MEM_GUEST_MEMFD in vm_mem_add() if guest_memfd != -1
  KVM: selftests: Add guest_memfd based vm_mem_backing_src_types
  KVM: selftests: stuff vm_mem_backing_src_type into vm_shape
  KVM: selftests: adjust test_create_guest_memfd_invalid
  KVM: selftests: set KVM_GMEM_NO_DIRECT_MAP in mem conversion tests
  KVM: selftests: Test guest execution from direct map removed gmem

 Documentation/virt/kvm/api.rst                | 13 ++++
 include/linux/pagemap.h                       | 16 +++++
 include/linux/secretmem.h                     | 18 ------
 include/uapi/linux/kvm.h                      |  3 +
 lib/buildid.c                                 |  4 +-
 mm/gup.c                                      | 14 +---
 mm/mlock.c                                    |  2 +-
 mm/secretmem.c                                |  6 +-
 .../testing/selftests/kvm/guest_memfd_test.c  |  2 +-
 .../testing/selftests/kvm/include/kvm_util.h  | 29 ++++++---
 .../testing/selftests/kvm/include/test_util.h |  8 +++
 tools/testing/selftests/kvm/lib/elf.c         |  8 +--
 tools/testing/selftests/kvm/lib/io.c          | 23 +++++++
 tools/testing/selftests/kvm/lib/kvm_util.c    | 64 +++++++++++--------
 tools/testing/selftests/kvm/lib/test_util.c   |  8 +++
 tools/testing/selftests/kvm/lib/x86/sev.c     |  1 +
 .../selftests/kvm/pre_fault_memory_test.c     |  1 +
 .../selftests/kvm/set_memory_region_test.c    | 40 ++++++++++++
 .../kvm/x86/private_mem_conversions_test.c    |  7 +-
 virt/kvm/guest_memfd.c                        | 24 ++++++-
 virt/kvm/kvm_main.c                           |  5 ++
 21 files changed, 214 insertions(+), 82 deletions(-)

base-commit: da40655874b54a2b563f8ceb3ed839c6cd38e0b4