From: Patrick Roy
Date: Fri, 21 Feb 2025 16:07:16 +0000
Subject: [PATCH v4 03/12] KVM: guest_memfd: Add flag to remove from direct map
Message-ID: <20250221160728.1584559-4-roypat@amazon.co.uk>
In-Reply-To: <20250221160728.1584559-1-roypat@amazon.co.uk>
X-Patchwork-Id: 13985869

Add KVM_GMEM_NO_DIRECT_MAP flag for KVM_CREATE_GUEST_MEMFD() ioctl. When set, guest_memfd folios will be removed from the direct map after preparation, with direct map entries only restored when the folios are freed.
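The folio lifecycle described above can be modeled in plain userspace C. This is a simulation only, not kernel code: `struct model_folio`, `model_set_direct_map()` (a stand-in for `set_direct_map_valid_noflush()`), `model_mark_prepared()`, `model_free_folio()` and the `fail_next_removal` knob are hypothetical names. The model follows the pattern visible in this patch's diff: the folio's private flag records whether removal from the direct map succeeded, so the free path only restores entries that were actually removed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a guest_memfd folio; not a kernel type. */
struct model_folio {
	bool in_direct_map; /* are the folio's pages mapped in the direct map? */
	bool private_flag;  /* stands in for folio_set_private()/folio_test_private() */
	bool uptodate;      /* stands in for folio_mark_uptodate() */
};

/* Hypothetical knob: make the next removal fail, like an arch error would. */
static bool fail_next_removal;

/* Stand-in for set_direct_map_valid_noflush(): 0 on success, nonzero on error. */
static int model_set_direct_map(struct model_folio *f, bool valid)
{
	if (!valid && fail_next_removal) {
		fail_next_removal = false;
		return -1;
	}
	f->in_direct_map = valid;
	return 0;
}

/* Models kvm_gmem_mark_prepared(): drop the folio from the direct map if requested. */
static void model_mark_prepared(struct model_folio *f, bool no_direct_map)
{
	if (no_direct_map) {
		int r = model_set_direct_map(f, false);

		/* Only remember the removal if it actually happened. */
		if (!r)
			f->private_flag = true;
	}
	f->uptodate = true;
}

/* Models kvm_gmem_free_folio(): restore entries only if they were removed. */
static void model_free_folio(struct model_folio *f)
{
	if (f->private_flag) {
		int r = model_set_direct_map(f, true);

		/* The kernel wraps this call in WARN_ON_ONCE(); the model asserts instead. */
		assert(r == 0);
		(void)r; /* avoid an unused-variable warning when NDEBUG is defined */
	}
}
```

For example, if removal fails during preparation, the folio stays in the direct map with its private flag clear, so freeing it performs no restore at all.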
To ensure these folios do not end up in places where the kernel cannot deal with them, set AS_NO_DIRECT_MAP on the guest_memfd's struct address_space if KVM_GMEM_NO_DIRECT_MAP is requested.

Note that this flag causes removal of direct map entries for all guest_memfd folios, independent of whether they are "shared" or "private" (although current guest_memfd only supports either all folios in the "shared" state, or all folios in the "private" state if !IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM)). The use case for also removing the direct map entries of the shared parts of guest_memfd is a special type of non-CoCo VM where host userspace is trusted to have access to all of guest memory, but where Spectre-style transient execution attacks through the host kernel's direct map should still be mitigated.

Note that KVM retains access to guest memory via userspace mappings of guest_memfd, which are reflected back into KVM's memslots via userspace_addr. This is needed for things like MMIO emulation on x86_64 to work. Previous iterations instead attempted to have KVM temporarily restore direct map entries whenever such an access to guest memory was needed, but this turned out to have a significant performance impact. It also added complexity, because direct map reinsertion operations had to be refcounted and made to play nicely with gmem truncations.

This iteration also doesn't have KVM perform TLB flushes after direct map manipulations. This is because TLB flushes made page faults in guest_memfd take up to 40x longer (scaling with the number of CPU cores), or made memory population take 5x longer. On the one hand, TLB flushes are not needed for functional correctness (the virt->phys mapping technically stays "correct"; the kernel simply should not use it for a while), so this is a correct optimization to make.
On the other hand, it means that the desired protection from Spectre-style attacks is not perfect, as an attacker could try to prevent a stale TLB entry from getting evicted, keeping it alive until the page it refers to is used by the guest for some sensitive data, and then targeting it using a spectre-gadget.

Signed-off-by: Patrick Roy
---
 include/uapi/linux/kvm.h |  2 ++
 virt/kvm/guest_memfd.c   | 28 +++++++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 117937a895da..4654c01a0a01 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1573,6 +1573,8 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+#define KVM_GMEM_NO_DIRECT_MAP			(1ULL << 0)
+
 #define KVM_PRE_FAULT_MEMORY	_IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
 
 struct kvm_pre_fault_memory {
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 30b47ff0e6d2..bd7d361c9bb7 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include
 
 #include "kvm_mm.h"
 
@@ -42,8 +43,23 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
 	return 0;
 }
 
+static bool kvm_gmem_test_no_direct_map(struct inode *inode)
+{
+	return ((unsigned long) inode->i_private) & KVM_GMEM_NO_DIRECT_MAP;
+}
+
 static inline void kvm_gmem_mark_prepared(struct folio *folio)
 {
+	struct inode *inode = folio_inode(folio);
+
+	if (kvm_gmem_test_no_direct_map(inode)) {
+		int r = set_direct_map_valid_noflush(folio_page(folio, 0), folio_nr_pages(folio),
+						     false);
+
+		if (!r)
+			folio_set_private(folio);
+	}
+
 	folio_mark_uptodate(folio);
 }
 
@@ -479,6 +495,10 @@ static void kvm_gmem_free_folio(struct folio *folio)
 	kvm_pfn_t pfn = page_to_pfn(page);
 	int order = folio_order(folio);
 
+	if (folio_test_private(folio))
+		WARN_ON_ONCE(set_direct_map_valid_noflush(folio_page(folio, 0),
+							  folio_nr_pages(folio), true));
+
 	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
 }
 #endif
@@ -552,6 +572,9 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	/* Unmovable mappings are supposed to be marked unevictable as well. */
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
 
+	if (flags & KVM_GMEM_NO_DIRECT_MAP)
+		mapping_set_no_direct_map(inode->i_mapping);
+
 	kvm_get_kvm(kvm);
 	gmem->kvm = kvm;
 	xa_init(&gmem->bindings);
@@ -571,7 +594,10 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 {
 	loff_t size = args->size;
 	u64 flags = args->flags;
-	u64 valid_flags = 0;
+	u64 valid_flags = KVM_GMEM_NO_DIRECT_MAP;
+
+	if (!can_set_direct_map())
+		valid_flags &= ~KVM_GMEM_NO_DIRECT_MAP;
 
 	if (flags & ~valid_flags)
 		return -EINVAL;
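The flag handling in this diff can be exercised outside the kernel with a small userspace model. This is a sketch under stated assumptions: `model_validate_flags()`, `model_encode_flags()`, `model_test_no_direct_map()` and the `model_can_set_direct_map` variable are hypothetical stand-ins for the check in `kvm_gmem_create()`, for the store of the creation flags into `inode->i_private` (assumed here, since it is not part of this diff), for `kvm_gmem_test_no_direct_map()`, and for `can_set_direct_map()` respectively. Only the value of `KVM_GMEM_NO_DIRECT_MAP` is taken directly from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Value from the patch's uapi header change. */
#define KVM_GMEM_NO_DIRECT_MAP (1ULL << 0)

/* Hypothetical stand-in for can_set_direct_map(), which is arch-dependent. */
static bool model_can_set_direct_map = true;

/* Models the flag check in kvm_gmem_create(): returns 0 or -EINVAL. */
static int model_validate_flags(uint64_t flags)
{
	uint64_t valid_flags = KVM_GMEM_NO_DIRECT_MAP;

	/* Architectures that cannot edit the direct map do not offer the flag. */
	if (!model_can_set_direct_map)
		valid_flags &= ~KVM_GMEM_NO_DIRECT_MAP;

	if (flags & ~valid_flags)
		return -EINVAL;
	return 0;
}

/* Assumed encoding: creation flags stored as a pointer-sized integer. */
static void *model_encode_flags(uint64_t flags)
{
	/* On 32-bit hosts, bits above UINTPTR_MAX would be silently lost. */
	assert(flags <= UINTPTR_MAX);
	return (void *)(uintptr_t)flags;
}

/* Models kvm_gmem_test_no_direct_map(): read the flag back from i_private. */
static bool model_test_no_direct_map(void *i_private)
{
	return ((uintptr_t)i_private) & KVM_GMEM_NO_DIRECT_MAP;
}
```

A consequence of this design is that, on an architecture where the direct map cannot be manipulated, requesting KVM_GMEM_NO_DIRECT_MAP fails with -EINVAL at creation time instead of silently producing a guest_memfd whose folios stay in the direct map.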