From patchwork Wed Sep 11 14:34:00 2024
X-Patchwork-Submitter: Fares Mehanna
X-Patchwork-Id: 13800773
From: Fares Mehanna
Subject: [RFC PATCH 1/7] mseal: expose interface to seal / unseal user memory ranges
Date: Wed, 11 Sep 2024 14:34:00 +0000
Message-ID: <20240911143421.85612-2-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

To make sure the kernel mm-local mapping is untouched by the user, we will seal the VMA before changing the protection to be used by the kernel. This guarantees that userspace can't unmap or alter this VMA while it is being used by the kernel. After the kernel is done with the secret memory, it will unseal the VMA to be able to unmap and free it. The unseal operation is not exposed to userspace.

Signed-off-by: Fares Mehanna Signed-off-by: Roman Kagan --- mm/internal.h | 7 +++++ mm/mseal.c | 81 ++++++++++++++++++++++++++++++++------------------- 2 files changed, 58 insertions(+), 30 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index b4d86436565b..cf7280d101e9 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1501,6 +1501,8 @@ bool can_modify_mm(struct mm_struct *mm, unsigned long start, unsigned long end); bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long end, int behavior); +/* mm's mmap write lock must be taken before seal/unseal operation */ +int do_mseal(unsigned long start, unsigned long end, bool seal); #else static inline int can_do_mseal(unsigned long flags) { @@ -1518,6 +1520,11 @@ static inline bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, { return true; } + +static inline int do_mseal(unsigned long start, unsigned long end, bool seal) +{ + return -EINVAL; +} #endif #ifdef CONFIG_SHRINKER_DEBUG diff --git a/mm/mseal.c b/mm/mseal.c index 15bba28acc00..aac9399ffd5d 100644 --- a/mm/mseal.c +++ b/mm/mseal.c @@ -26,6 +26,11 @@ static inline void set_vma_sealed(struct vm_area_struct *vma) vm_flags_set(vma, VM_SEALED); } +static inline void clear_vma_sealed(struct vm_area_struct *vma) +{ + vm_flags_clear(vma, VM_SEALED); +} + /* * check if a vma is sealed for modification. * return true, if modification is allowed.
@@ -117,7 +122,7 @@ bool can_modify_mm_madv(struct mm_struct *mm, unsigned long start, unsigned long static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, struct vm_area_struct **prev, unsigned long start, - unsigned long end, vm_flags_t newflags) + unsigned long end, vm_flags_t newflags, bool seal) { int ret = 0; vm_flags_t oldflags = vma->vm_flags; @@ -131,7 +136,10 @@ static int mseal_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma, goto out; } - set_vma_sealed(vma); + if (seal) + set_vma_sealed(vma); + else + clear_vma_sealed(vma); out: *prev = vma; return ret; @@ -167,9 +175,9 @@ static int check_mm_seal(unsigned long start, unsigned long end) } /* - * Apply sealing. + * Apply sealing / unsealing. */ -static int apply_mm_seal(unsigned long start, unsigned long end) +static int apply_mm_seal(unsigned long start, unsigned long end, bool seal) { unsigned long nstart; struct vm_area_struct *vma, *prev; @@ -191,11 +199,14 @@ static int apply_mm_seal(unsigned long start, unsigned long end) unsigned long tmp; vm_flags_t newflags; - newflags = vma->vm_flags | VM_SEALED; + if (seal) + newflags = vma->vm_flags | VM_SEALED; + else + newflags = vma->vm_flags & ~(VM_SEALED); tmp = vma->vm_end; if (tmp > end) tmp = end; - error = mseal_fixup(&vmi, vma, &prev, nstart, tmp, newflags); + error = mseal_fixup(&vmi, vma, &prev, nstart, tmp, newflags, seal); if (error) return error; nstart = vma_iter_end(&vmi); @@ -204,6 +215,37 @@ static int apply_mm_seal(unsigned long start, unsigned long end) return 0; } +int do_mseal(unsigned long start, unsigned long end, bool seal) +{ + int ret; + + if (end < start) + return -EINVAL; + + if (end == start) + return 0; + + /* + * First pass, this helps to avoid + * partial sealing in case of error in input address range, + * e.g. ENOMEM error. + */ + ret = check_mm_seal(start, end); + if (ret) + goto out; + + /* + * Second pass, this should success, unless there are errors + * from vma_modify_flags, e.g. merge/split error, or process + * reaching the max supported VMAs, however, those cases shall + * be rare. + */ + ret = apply_mm_seal(start, end, seal); + +out: + return ret; +} + /* * mseal(2) seals the VM's meta data from * selected syscalls. @@ -256,7 +298,7 @@ static int apply_mm_seal(unsigned long start, unsigned long end) * * unseal() is not supported. */ -static int do_mseal(unsigned long start, size_t len_in, unsigned long flags) +static int __do_mseal(unsigned long start, size_t len_in, unsigned long flags) { size_t len; int ret = 0; @@ -277,33 +319,12 @@ static int do_mseal(unsigned long start, size_t len_in, unsigned long flags) return -EINVAL; end = start + len; - if (end < start) - return -EINVAL; - - if (end == start) - return 0; if (mmap_write_lock_killable(mm)) return -EINTR; - /* - * First pass, this helps to avoid - * partial sealing in case of error in input address range, - * e.g. ENOMEM error. - */ - ret = check_mm_seal(start, end); - if (ret) - goto out; - - /* - * Second pass, this should success, unless there are errors - * from vma_modify_flags, e.g. merge/split error, or process - * reaching the max supported VMAs, however, those cases shall - * be rare. 
- */ - ret = apply_mm_seal(start, end); + ret = do_mseal(start, end, true); -out: mmap_write_unlock(current->mm); return ret; } @@ -311,5 +332,5 @@ static int do_mseal(unsigned long start, size_t len_in, unsigned long flags) SYSCALL_DEFINE3(mseal, unsigned long, start, size_t, len, unsigned long, flags) { - return do_mseal(start, len, flags); + return __do_mseal(start, len, flags); }
From patchwork Wed Sep 11 14:34:01 2024
X-Patchwork-Submitter: Fares Mehanna
X-Patchwork-Id: 13800774
From: Fares Mehanna
Subject: [RFC PATCH 2/7] mm/secretmem: implement mm-local kernel allocations
Date: Wed, 11 Sep 2024 14:34:01 +0000
Message-ID: <20240911143421.85612-3-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

In order to be resilient against cross-process speculation-based attacks, it makes sense to store certain (secret) items in kernel memory local to the mm. Implement such allocations on top of the secretmem infrastructure. Specifically, on allocation:
1. Create a secretmem file.
2. To distinguish it from the conventional memfd_secret()-created one and to maintain the associated mm-local allocation context, put the latter on ->private_data of the file.
3. Create a virtual mapping in the user virtual address space using mmap().
4. Seal the virtual mapping to disallow the user from affecting it in any way.
5. Fault the pages in, effectively calling the secretmem fault handler to remove the pages from the kernel linear mapping and make them local to the process mm.
6. Change the PTEs from user mode to kernel mode; any access from userspace will then result in a segmentation fault, while the kernel can access this virtual address.
7. Return the secure area as a struct containing the pointer to the actual memory and providing the context for the release function later.
On release:
- if called while the mm is still in use, remove the mapping;
- otherwise, if performed at mm teardown, no unmapping is necessary.
The rest is taken care of by the secretmem file cleanup, including returning the pages to the kernel direct map.
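For illustration only, a minimal kernel-side usage sketch of the interface added by this patch (the caller, its sizing and error handling are hypothetical; it assumes CONFIG_KERNEL_SECRETMEM=y and a task context with a valid current->mm):

#include <linux/mm.h>
#include <linux/secretmem.h>
#include <linux/string.h>

/* Hypothetical consumer: keep a per-mm secret blob out of the shared
 * kernel page tables. */
static struct secretmem_area *secret;

static int example_store_secret(const void *src, size_t len)
{
	if (len > PAGE_SIZE)
		return -EINVAL;

	secret = secretmem_allocate_pages(0);	/* order 0: one mm-local page */
	if (!secret)
		return -ENOMEM;

	memcpy(secret->ptr, src, len);		/* kernel access, local to this mm */
	return 0;
}

static void example_drop_secret(void)
{
	secretmem_release_pages(secret);	/* unseals, unmaps and unpins the pages */
	secret = NULL;
}

Since the allocation is tied to the mm of the allocating task, the release must run either from that process context or after the mm has been torn down, as described above.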
Signed-off-by: Fares Mehanna Signed-off-by: Roman Kagan --- include/linux/secretmem.h | 29 ++++++ mm/Kconfig | 10 ++ mm/gup.c | 4 +- mm/secretmem.c | 213 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 254 insertions(+), 2 deletions(-) diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h index e918f96881f5..39cc73a0e4bd 100644 --- a/include/linux/secretmem.h +++ b/include/linux/secretmem.h @@ -2,6 +2,10 @@ #ifndef _LINUX_SECRETMEM_H #define _LINUX_SECRETMEM_H +struct secretmem_area { + void *ptr; +}; + #ifdef CONFIG_SECRETMEM extern const struct address_space_operations secretmem_aops; @@ -33,4 +37,29 @@ static inline bool secretmem_active(void) #endif /* CONFIG_SECRETMEM */ +#ifdef CONFIG_KERNEL_SECRETMEM + +bool can_access_secretmem_vma(struct vm_area_struct *vma); +struct secretmem_area *secretmem_allocate_pages(unsigned int order); +void secretmem_release_pages(struct secretmem_area *data); + +#else + +static inline bool can_access_secretmem_vma(struct vm_area_struct *vma) +{ + return true; +} + +static inline struct secretmem_area *secretmem_allocate_pages(unsigned int order) +{ + return NULL; +} + +static inline void secretmem_release_pages(struct secretmem_area *data) +{ + WARN_ONCE(1, "Called secret memory release page without support\n"); +} + +#endif /* CONFIG_KERNEL_SECRETMEM */ + #endif /* _LINUX_SECRETMEM_H */ diff --git a/mm/Kconfig b/mm/Kconfig index b72e7d040f78..a327d8def179 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1168,6 +1168,16 @@ config SECRETMEM memory areas visible only in the context of the owning process and not mapped to other processes and other kernel page tables. +config KERNEL_SECRETMEM + default y + bool "Enable kernel usage of memfd_secret()" if EXPERT + depends on SECRETMEM + depends on MMU + help + Enable the kernel usage of memfd_secret() for kernel memory allocations, + The allocated memory is visible only to the kernel in the context of + the owning process. 
+ config ANON_VMA_NAME bool "Anonymous VMA name support" depends on PROC_FS && ADVISE_SYSCALLS && MMU diff --git a/mm/gup.c b/mm/gup.c index 54d0dc3831fb..6c2c6a0cbe2a 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1076,7 +1076,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, struct follow_page_context ctx = { NULL }; struct page *page; - if (vma_is_secretmem(vma)) + if (!can_access_secretmem_vma(vma)) return NULL; if (WARN_ON_ONCE(foll_flags & FOLL_PIN)) @@ -1281,7 +1281,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags) if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma)) return -EOPNOTSUPP; - if (vma_is_secretmem(vma)) + if (!can_access_secretmem_vma(vma)) return -EFAULT; if (write) { diff --git a/mm/secretmem.c b/mm/secretmem.c index 3afb5ad701e1..86afedc65889 100644 --- a/mm/secretmem.c +++ b/mm/secretmem.c @@ -13,13 +13,17 @@ #include #include #include +#include #include #include #include #include #include +#include +#include #include +#include #include @@ -42,6 +46,16 @@ MODULE_PARM_DESC(secretmem_enable, static atomic_t secretmem_users; +/* secretmem file private context */ +struct secretmem_ctx { + struct secretmem_area _area; + struct page **_pages; + unsigned long _nr_pages; + struct file *_file; + struct mm_struct *_mm; +}; + + bool secretmem_active(void) { return !!atomic_read(&secretmem_users); @@ -116,6 +130,7 @@ static const struct vm_operations_struct secretmem_vm_ops = { static int secretmem_release(struct inode *inode, struct file *file) { + kfree(file->private_data); atomic_dec(&secretmem_users); return 0; } @@ -123,13 +138,23 @@ static int secretmem_release(struct inode *inode, struct file *file) static int secretmem_mmap(struct file *file, struct vm_area_struct *vma) { unsigned long len = vma->vm_end - vma->vm_start; + struct secretmem_ctx *ctx = file->private_data; + unsigned long kernel_no_permissions; + + kernel_no_permissions = (VM_READ | VM_WRITE | VM_EXEC | VM_MAYEXEC); if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0) return -EINVAL; + if (ctx && (vma->vm_flags & kernel_no_permissions)) + return -EINVAL; + if (!mlock_future_ok(vma->vm_mm, vma->vm_flags | VM_LOCKED, len)) return -EAGAIN; + if (ctx) + vm_flags_set(vma, VM_MIXEDMAP); + vm_flags_set(vma, VM_LOCKED | VM_DONTDUMP); vma->vm_ops = &secretmem_vm_ops; @@ -230,6 +255,194 @@ static struct file *secretmem_file_create(unsigned long flags) return file; } +#ifdef CONFIG_KERNEL_SECRETMEM + +struct secretmem_area *secretmem_allocate_pages(unsigned int order) +{ + unsigned long uvaddr, uvaddr_inc, unused, nr_pages, bytes_length; + struct file *kernel_secfile; + struct vm_area_struct *vma; + struct secretmem_ctx *ctx; + struct page **sec_pages; + struct mm_struct *mm; + long nr_pinned_pages; + pte_t pte, old_pte; + spinlock_t *ptl; + pte_t *upte; + int rc; + + nr_pages = (1 << order); + bytes_length = nr_pages * PAGE_SIZE; + mm = current->mm; + + if (!mm || !mmget_not_zero(mm)) + return NULL; + + /* Create secret memory file / truncate it */ + kernel_secfile = secretmem_file_create(0); + if (IS_ERR(kernel_secfile)) + goto put_mm; + + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + if (IS_ERR(ctx)) + goto close_secfile; + kernel_secfile->private_data = ctx; + + rc = do_truncate(file_mnt_idmap(kernel_secfile), + file_dentry(kernel_secfile), bytes_length, 0, NULL); + if (rc) + goto close_secfile; + + if (mmap_write_lock_killable(mm)) + goto close_secfile; + + /* Map pages to the secretmem file */ + uvaddr = do_mmap(kernel_secfile, 0, bytes_length, PROT_NONE, 
+ MAP_SHARED, 0, 0, &unused, NULL); + if (IS_ERR_VALUE(uvaddr)) + goto unlock_mmap; + + /* mseal() the VMA to make sure it won't change */ + rc = do_mseal(uvaddr, uvaddr + bytes_length, true); + if (rc) + goto unmap_pages; + + /* Make sure VMA is there, and is kernel-secure */ + vma = find_vma(current->mm, uvaddr); + if (!vma) + goto unseal_vma; + + if (!vma_is_secretmem(vma) || + !can_access_secretmem_vma(vma)) + goto unseal_vma; + + /* Pin user pages; fault them in */ + sec_pages = kzalloc(sizeof(struct page *) * nr_pages, GFP_KERNEL); + if (!sec_pages) + goto unseal_vma; + + nr_pinned_pages = pin_user_pages(uvaddr, nr_pages, FOLL_FORCE | FOLL_LONGTERM, sec_pages); + if (nr_pinned_pages < 0) + goto free_sec_pages; + if (nr_pinned_pages != nr_pages) + goto unpin_pages; + + /* Modify the existing mapping to be kernel accessible, local to this process mm */ + uvaddr_inc = uvaddr; + while (uvaddr_inc < uvaddr + bytes_length) { + upte = get_locked_pte(mm, uvaddr_inc, &ptl); + if (!upte) + goto unpin_pages; + old_pte = ptep_modify_prot_start(vma, uvaddr_inc, upte); + pte = pte_modify(old_pte, PAGE_KERNEL); + ptep_modify_prot_commit(vma, uvaddr_inc, upte, old_pte, pte); + pte_unmap_unlock(upte, ptl); + uvaddr_inc += PAGE_SIZE; + } + flush_tlb_range(vma, uvaddr, uvaddr + bytes_length); + + /* Return data */ + mmgrab(mm); + ctx->_area.ptr = (void *) uvaddr; + ctx->_pages = sec_pages; + ctx->_nr_pages = nr_pages; + ctx->_mm = mm; + ctx->_file = kernel_secfile; + + mmap_write_unlock(mm); + mmput(mm); + + return &ctx->_area; + +unpin_pages: + unpin_user_pages(sec_pages, nr_pinned_pages); +free_sec_pages: + kfree(sec_pages); +unseal_vma: + rc = do_mseal(uvaddr, uvaddr + bytes_length, false); + if (rc) + BUG(); +unmap_pages: + rc = do_munmap(mm, uvaddr, bytes_length, NULL); + if (rc) + BUG(); +unlock_mmap: + mmap_write_unlock(mm); +close_secfile: + fput(kernel_secfile); +put_mm: + mmput(mm); + return NULL; +} + +void secretmem_release_pages(struct secretmem_area *data) +{ + unsigned long uvaddr, bytes_length; + struct secretmem_ctx *ctx; + int rc; + + if (!data || !data->ptr) + BUG(); + + ctx = container_of(data, struct secretmem_ctx, _area); + if (!ctx || !ctx->_file || !ctx->_pages || !ctx->_mm) + BUG(); + + bytes_length = ctx->_nr_pages * PAGE_SIZE; + uvaddr = (unsigned long) data->ptr; + + /* + * Remove the mapping if mm is still in use. + * Not secure to continue if unmapping failed. + */ + if (mmget_not_zero(ctx->_mm)) { + mmap_write_lock(ctx->_mm); + rc = do_mseal(uvaddr, uvaddr + bytes_length, false); + if (rc) { + mmap_write_unlock(ctx->_mm); + BUG(); + } + rc = do_munmap(ctx->_mm, uvaddr, bytes_length, NULL); + if (rc) { + mmap_write_unlock(ctx->_mm); + BUG(); + } + mmap_write_unlock(ctx->_mm); + mmput(ctx->_mm); + } + + mmdrop(ctx->_mm); + unpin_user_pages(ctx->_pages, ctx->_nr_pages); + fput(ctx->_file); + kfree(ctx->_pages); + + ctx->_nr_pages = 0; + ctx->_pages = NULL; + ctx->_file = NULL; + ctx->_mm = NULL; + ctx->_area.ptr = NULL; +} + +bool can_access_secretmem_vma(struct vm_area_struct *vma) +{ + struct secretmem_ctx *ctx; + + if (!vma_is_secretmem(vma)) + return true; + + /* + * If VMA is owned by running process, and marked for kernel + * usage, then allow access. 
+ */ + ctx = vma->vm_file->private_data; + if (ctx && current->mm == vma->vm_mm) + return true; + + return false; +} + +#endif /* CONFIG_KERNEL_SECRETMEM */ + SYSCALL_DEFINE1(memfd_secret, unsigned int, flags) { struct file *file;
From patchwork Wed Sep 11 14:34:02 2024
X-Patchwork-Submitter: Fares Mehanna
X-Patchwork-Id: 13800775
From: Fares Mehanna
Subject: [RFC PATCH 3/7] arm64: KVM: Refactor C-code to access vCPU gp-registers through macros
Date: Wed, 11 Sep 2024 14:34:02 +0000
Message-ID: <20240911143421.85612-4-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

Unify how KVM accesses vCPU gp-regs by using two macros, vcpu_gp_regs() and ctxt_gp_regs(). This is a prerequisite for later moving the gp-regs to dynamically allocated storage for vCPUs.
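For illustration, the indirection introduced here amounts to the following (the macro bodies are exactly the ones added in the kvm_host.h hunk below; the before/after caller lines are just an example of the mechanical conversion):

/* arch/arm64/include/asm/kvm_host.h (this patch) */
#define ctxt_gp_regs(ctxt)	(&(ctxt)->regs)
#define vcpu_gp_regs(v)		(ctxt_gp_regs(&(v)->arch.ctxt))

/* caller before:  vcpu->arch.ctxt.regs.pc     (storage layout baked into the caller) */
/* caller after:   vcpu_gp_regs(vcpu)->pc      (storage layout known only to the macro) */

Once every access goes through the macro, a later patch can change it to follow a pointer instead of taking the address of an embedded struct, without touching any caller.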
Signed-off-by: Fares Mehanna --- arch/arm64/include/asm/kvm_emulate.h | 2 +- arch/arm64/include/asm/kvm_host.h | 3 ++- arch/arm64/kvm/guest.c | 8 ++++---- arch/arm64/kvm/hyp/include/hyp/switch.h | 2 +- arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 10 +++++----- arch/arm64/kvm/hyp/include/nvhe/trap_handler.h | 2 +- 6 files changed, 14 insertions(+), 13 deletions(-) diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h index a601a9305b10..cabfb76ca514 100644 --- a/arch/arm64/include/asm/kvm_emulate.h +++ b/arch/arm64/include/asm/kvm_emulate.h @@ -170,7 +170,7 @@ static __always_inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num, static inline bool vcpu_is_el2_ctxt(const struct kvm_cpu_context *ctxt) { - switch (ctxt->regs.pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) { + switch (ctxt_gp_regs(ctxt)->pstate & (PSR_MODE32_BIT | PSR_MODE_MASK)) { case PSR_MODE_EL2h: case PSR_MODE_EL2t: return true; diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index a33f5996ca9f..31cbd62a5d06 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -946,7 +946,8 @@ struct kvm_vcpu_arch { #define vcpu_clear_on_unsupported_cpu(vcpu) \ vcpu_clear_flag(vcpu, ON_UNSUPPORTED_CPU) -#define vcpu_gp_regs(v) (&(v)->arch.ctxt.regs) +#define ctxt_gp_regs(ctxt) (&(ctxt)->regs) +#define vcpu_gp_regs(v) (ctxt_gp_regs(&(v)->arch.ctxt)) /* * Only use __vcpu_sys_reg/ctxt_sys_reg if you know you want the diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index 11098eb7eb44..821a2b7de388 100644 --- a/arch/arm64/kvm/guest.c +++ b/arch/arm64/kvm/guest.c @@ -134,16 +134,16 @@ static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg) KVM_REG_ARM_CORE_REG(regs.regs[30]): off -= KVM_REG_ARM_CORE_REG(regs.regs[0]); off /= 2; - return &vcpu->arch.ctxt.regs.regs[off]; + return &vcpu_gp_regs(vcpu)->regs[off]; case KVM_REG_ARM_CORE_REG(regs.sp): - return &vcpu->arch.ctxt.regs.sp; + return &vcpu_gp_regs(vcpu)->sp; case KVM_REG_ARM_CORE_REG(regs.pc): - return &vcpu->arch.ctxt.regs.pc; + return &vcpu_gp_regs(vcpu)->pc; case KVM_REG_ARM_CORE_REG(regs.pstate): - return &vcpu->arch.ctxt.regs.pstate; + return &vcpu_gp_regs(vcpu)->pstate; case KVM_REG_ARM_CORE_REG(sp_el1): return __ctxt_sys_reg(&vcpu->arch.ctxt, SP_EL1); diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 37ff87d782b6..d2ed0938fc90 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -649,7 +649,7 @@ static inline void synchronize_vcpu_pstate(struct kvm_vcpu *vcpu, u64 *exit_code ESR_ELx_EC(read_sysreg_el2(SYS_ESR)) == ESR_ELx_EC_PAC) write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR); - vcpu->arch.ctxt.regs.pstate = read_sysreg_el2(SYS_SPSR); + vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR); } /* diff --git a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h index 4c0fdabaf8ae..d17033766010 100644 --- a/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h +++ b/arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h @@ -105,13 +105,13 @@ static inline void __sysreg_save_el1_state(struct kvm_cpu_context *ctxt) static inline void __sysreg_save_el2_return_state(struct kvm_cpu_context *ctxt) { - ctxt->regs.pc = read_sysreg_el2(SYS_ELR); + ctxt_gp_regs(ctxt)->pc = read_sysreg_el2(SYS_ELR); /* * Guest PSTATE gets saved at guest fixup time in all * cases. We still need to handle the nVHE host side here. 
*/ if (!has_vhe() && ctxt->__hyp_running_vcpu) - ctxt->regs.pstate = read_sysreg_el2(SYS_SPSR); + ctxt_gp_regs(ctxt)->pstate = read_sysreg_el2(SYS_SPSR); if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN)) ctxt_sys_reg(ctxt, DISR_EL1) = read_sysreg_s(SYS_VDISR_EL2); @@ -202,7 +202,7 @@ static inline void __sysreg_restore_el1_state(struct kvm_cpu_context *ctxt) /* Read the VCPU state's PSTATE, but translate (v)EL2 to EL1. */ static inline u64 to_hw_pstate(const struct kvm_cpu_context *ctxt) { - u64 mode = ctxt->regs.pstate & (PSR_MODE_MASK | PSR_MODE32_BIT); + u64 mode = ctxt_gp_regs(ctxt)->pstate & (PSR_MODE_MASK | PSR_MODE32_BIT); switch (mode) { case PSR_MODE_EL2t: @@ -213,7 +213,7 @@ static inline u64 to_hw_pstate(const struct kvm_cpu_context *ctxt) break; } - return (ctxt->regs.pstate & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode; + return (ctxt_gp_regs(ctxt)->pstate & ~(PSR_MODE_MASK | PSR_MODE32_BIT)) | mode; } static inline void __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctxt) @@ -235,7 +235,7 @@ static inline void __sysreg_restore_el2_return_state(struct kvm_cpu_context *ctx if (!(mode & PSR_MODE32_BIT) && mode >= PSR_MODE_EL2t) pstate = PSR_MODE_EL2h | PSR_IL_BIT; - write_sysreg_el2(ctxt->regs.pc, SYS_ELR); + write_sysreg_el2(ctxt_gp_regs(ctxt)->pc, SYS_ELR); write_sysreg_el2(pstate, SYS_SPSR); if (cpus_have_final_cap(ARM64_HAS_RAS_EXTN)) diff --git a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h index 45a84f0ade04..dfe5be0d70ef 100644 --- a/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h +++ b/arch/arm64/kvm/hyp/include/nvhe/trap_handler.h @@ -11,7 +11,7 @@ #include -#define cpu_reg(ctxt, r) (ctxt)->regs.regs[r] +#define cpu_reg(ctxt, r) (ctxt_gp_regs((ctxt))->regs[r]) #define DECLARE_REG(type, name, ctxt, reg) \ type name = (type)cpu_reg(ctxt, (reg))
From patchwork Wed Sep 11 14:34:03 2024
X-Patchwork-Submitter: Fares Mehanna
X-Patchwork-Id: 13800776
From: Fares Mehanna
Subject: [RFC PATCH 4/7] KVM: Refactor Assembly-code to access vCPU gp-registers through a macro
Date: Wed, 11 Sep 2024 14:34:03 +0000
Message-ID: <20240911143421.85612-5-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

Right now assembly code accesses vCPU gp-regs directly from the context
struct "struct kvm_cpu_context" using "CPU_XREG_OFFSET()". Since we want to move gp-regs to dynamic memory, we can no longer assume that gp-regs will be embedded in the context struct, thus split the access to two steps. The first is to get the gp-regs from the context using the assembly macro "get_ctxt_gp_regs". And the second is to access the gp-registers directly from within the "struct user_pt_regs" by removing the offset "CPU_USER_PT_REGS" from the access macro "CPU_XREG_OFFSET()". I also changed variable naming and comments where appropriate. Signed-off-by: Fares Mehanna --- arch/arm64/include/asm/kvm_asm.h | 48 +++++++++++++++++--------------- arch/arm64/kvm/hyp/entry.S | 15 ++++++++++ arch/arm64/kvm/hyp/nvhe/host.S | 20 ++++++++++--- 3 files changed, 57 insertions(+), 26 deletions(-) diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 2181a11b9d92..fa4fb642a5f5 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -313,6 +313,10 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt, str \vcpu, [\ctxt, #HOST_CONTEXT_VCPU] .endm +.macro get_ctxt_gp_regs ctxt, regs + add \regs, \ctxt, #CPU_USER_PT_REGS +.endm + /* * KVM extable for unexpected exceptions. * Create a struct kvm_exception_table_entry output to a section that can be @@ -329,7 +333,7 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt, .popsection .endm -#define CPU_XREG_OFFSET(x) (CPU_USER_PT_REGS + 8*x) +#define CPU_XREG_OFFSET(x) (8 * (x)) #define CPU_LR_OFFSET CPU_XREG_OFFSET(30) #define CPU_SP_EL0_OFFSET (CPU_LR_OFFSET + 8) @@ -337,34 +341,34 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt, * We treat x18 as callee-saved as the host may use it as a platform * register (e.g. for shadow call stack). 
*/ -.macro save_callee_saved_regs ctxt - str x18, [\ctxt, #CPU_XREG_OFFSET(18)] - stp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)] - stp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)] - stp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)] - stp x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)] - stp x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)] - stp x29, lr, [\ctxt, #CPU_XREG_OFFSET(29)] +.macro save_callee_saved_regs regs + str x18, [\regs, #CPU_XREG_OFFSET(18)] + stp x19, x20, [\regs, #CPU_XREG_OFFSET(19)] + stp x21, x22, [\regs, #CPU_XREG_OFFSET(21)] + stp x23, x24, [\regs, #CPU_XREG_OFFSET(23)] + stp x25, x26, [\regs, #CPU_XREG_OFFSET(25)] + stp x27, x28, [\regs, #CPU_XREG_OFFSET(27)] + stp x29, lr, [\regs, #CPU_XREG_OFFSET(29)] .endm -.macro restore_callee_saved_regs ctxt - // We require \ctxt is not x18-x28 - ldr x18, [\ctxt, #CPU_XREG_OFFSET(18)] - ldp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)] - ldp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)] - ldp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)] - ldp x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)] - ldp x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)] - ldp x29, lr, [\ctxt, #CPU_XREG_OFFSET(29)] +.macro restore_callee_saved_regs regs + // We require \regs is not x18-x28 + ldr x18, [\regs, #CPU_XREG_OFFSET(18)] + ldp x19, x20, [\regs, #CPU_XREG_OFFSET(19)] + ldp x21, x22, [\regs, #CPU_XREG_OFFSET(21)] + ldp x23, x24, [\regs, #CPU_XREG_OFFSET(23)] + ldp x25, x26, [\regs, #CPU_XREG_OFFSET(25)] + ldp x27, x28, [\regs, #CPU_XREG_OFFSET(27)] + ldp x29, lr, [\regs, #CPU_XREG_OFFSET(29)] .endm -.macro save_sp_el0 ctxt, tmp +.macro save_sp_el0 regs, tmp mrs \tmp, sp_el0 - str \tmp, [\ctxt, #CPU_SP_EL0_OFFSET] + str \tmp, [\regs, #CPU_SP_EL0_OFFSET] .endm -.macro restore_sp_el0 ctxt, tmp - ldr \tmp, [\ctxt, #CPU_SP_EL0_OFFSET] +.macro restore_sp_el0 regs, tmp + ldr \tmp, [\regs, #CPU_SP_EL0_OFFSET] msr sp_el0, \tmp .endm diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S index 4433a234aa9b..628a123bcdc1 100644 --- a/arch/arm64/kvm/hyp/entry.S +++ b/arch/arm64/kvm/hyp/entry.S @@ -28,6 +28,9 @@ SYM_FUNC_START(__guest_enter) adr_this_cpu x1, kvm_hyp_ctxt, x2 + // Get gp-regs pointer from the context + get_ctxt_gp_regs x1, x1 + // Store the hyp regs save_callee_saved_regs x1 @@ -62,6 +65,9 @@ alternative_else_nop_endif // when this feature is enabled for kernel code. ptrauth_switch_to_guest x29, x0, x1, x2 + // Get gp-regs pointer from the context + get_ctxt_gp_regs x29, x29 + // Restore the guest's sp_el0 restore_sp_el0 x29, x0 @@ -108,6 +114,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL) // current state is saved to the guest context but it will only be // accurate if the guest had been completely restored. 
adr_this_cpu x0, kvm_hyp_ctxt, x1 + get_ctxt_gp_regs x0, x0 adr_l x1, hyp_panic str x1, [x0, #CPU_XREG_OFFSET(30)] @@ -120,6 +127,7 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL) // vcpu x0-x1 on the stack add x1, x1, #VCPU_CONTEXT + get_ctxt_gp_regs x1, x1 ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) @@ -145,6 +153,10 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL) // Store the guest's sp_el0 save_sp_el0 x1, x2 + // Recover vCPU context to x1 + get_vcpu_ptr x1, x2 + add x1, x1, #VCPU_CONTEXT + adr_this_cpu x2, kvm_hyp_ctxt, x3 // Macro ptrauth_switch_to_hyp format: @@ -157,6 +169,9 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL) // mte_switch_to_hyp(g_ctxt, h_ctxt, reg1) mte_switch_to_hyp x1, x2, x3 + // Get gp-regs pointer from the context + get_ctxt_gp_regs x2, x2 + // Restore hyp's sp_el0 restore_sp_el0 x2, x3 diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S index 3d610fc51f4d..31afa7396294 100644 --- a/arch/arm64/kvm/hyp/nvhe/host.S +++ b/arch/arm64/kvm/hyp/nvhe/host.S @@ -17,6 +17,12 @@ SYM_FUNC_START(__host_exit) get_host_ctxt x0, x1 + /* Keep host context in x1 */ + mov x1, x0 + + /* Get gp-regs pointer from the context */ + get_ctxt_gp_regs x0, x0 + /* Store the host regs x2 and x3 */ stp x2, x3, [x0, #CPU_XREG_OFFSET(2)] @@ -36,7 +42,10 @@ SYM_FUNC_START(__host_exit) /* Store the host regs x18-x29, lr */ save_callee_saved_regs x0 - /* Save the host context pointer in x29 across the function call */ + /* Save the host context pointer in x28 across the function call */ + mov x28, x1 + + /* Save the host gp-regs pointer in x29 across the function call */ mov x29, x0 #ifdef CONFIG_ARM64_PTR_AUTH_KERNEL @@ -46,7 +55,7 @@ alternative_else_nop_endif alternative_if ARM64_KVM_PROTECTED_MODE /* Save kernel ptrauth keys. */ - add x18, x29, #CPU_APIAKEYLO_EL1 + add x18, x28, #CPU_APIAKEYLO_EL1 ptrauth_save_state x18, x19, x20 /* Use hyp keys. */ @@ -58,6 +67,7 @@ alternative_else_nop_endif __skip_pauth_save: #endif /* CONFIG_ARM64_PTR_AUTH_KERNEL */ + mov x0, x28 bl handle_trap __host_enter_restore_full: @@ -68,7 +78,7 @@ b __skip_pauth_restore alternative_else_nop_endif alternative_if ARM64_KVM_PROTECTED_MODE - add x18, x29, #CPU_APIAKEYLO_EL1 + add x18, x28, #CPU_APIAKEYLO_EL1 ptrauth_restore_state x18, x19, x20 alternative_else_nop_endif __skip_pauth_restore: @@ -101,7 +111,8 @@ SYM_FUNC_END(__host_exit) * void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt); */ SYM_FUNC_START(__host_enter) - mov x29, x0 + mov x28, x0 + get_ctxt_gp_regs x0, x29 b __host_enter_restore_full SYM_FUNC_END(__host_enter) @@ -141,6 +152,7 @@ SYM_FUNC_START(__hyp_do_panic) /* Enter the host, conditionally restoring the host context. 
*/ cbz x29, __host_enter_without_restoring + get_ctxt_gp_regs x29, x29 b __host_enter_for_panic SYM_FUNC_END(__hyp_do_panic)
From patchwork Wed Sep 11 14:34:04 2024
X-Patchwork-Submitter: Fares Mehanna
X-Patchwork-Id: 13800780
From: Fares Mehanna
Subject: [RFC PATCH 5/7] arm64: KVM: Allocate vCPU gp-regs dynamically on VHE and KERNEL_SECRETMEM enabled systems
Date: Wed, 11 Sep 2024 14:34:04 +0000
Message-ID: <20240911143421.85612-6-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

To allocate the vCPU gp-regs from secret memory, we first need to allocate them dynamically. This is tricky with nVHE (non-VHE) since it would require adjusting the virtual address on every access. With a large shared codebase between the OS and the hypervisor, it would be cumbersome to duplicate the code with one version using `kern_hyp_va()`. To avoid this issue, and since the secret memory feature will not be enabled on nVHE systems, we introduce the following changes:
1. Maintain a `struct user_pt_regs regs_storage` in the vCPU context struct as fallback storage for the vCPU gp-regs.
2. Introduce a pointer `struct user_pt_regs *regs` in the vCPU context struct to hold the dynamically allocated vCPU gp-regs.
If we are on an nVHE system, or on a VHE (Virtualization Host Extensions) system that doesn't support `KERNEL_SECRETMEM`, we use `regs_storage`. Accessing the context in this case does not require a de-reference operation. If we are on a VHE system with support for `KERNEL_SECRETMEM`, we use the `regs` pointer, which adds one de-reference operation every time a vCPU gp-reg is accessed.
Accessing the gp-regs embedded in the vCPU context without a de-reference is done as:
  add \regs, \ctxt, #CPU_USER_PT_REGS_STRG
Accessing the dynamically allocated gp-regs with a de-reference is done as:
  ldr \regs, [\ctxt, #CPU_USER_PT_REGS]
By default, we use the first version. If we are booting on a system that supports VHE and `KERNEL_SECRETMEM`, we switch to the second version, as sketched below. We also allocate the needed gp-regs backing memory for the vCPU, kvm_hyp_ctxt and kvm_host_data structs when needed.
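For illustration, a C-level sketch of the selection described above (an assumption-laden sketch, not the code added by this patch; the real accessors are patched once at boot via an asm alternative keyed on ARM64_HAS_VIRT_HOST_EXTN, so no runtime branch is taken):

static struct user_pt_regs *ctxt_gp_regs_sketch(struct kvm_cpu_context *ctxt)
{
	if (kvm_use_dynamic_regs())		/* VHE and KERNEL_SECRETMEM available */
		return ctxt->regs;		/* secretmem-backed, one extra dereference */

	return &ctxt->regs_storage;		/* embedded fallback storage */
}

Keeping the fallback path free of the extra dereference is what lets the nVHE hypervisor keep using the embedded storage without any kern_hyp_va() translation of the regs pointer.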
Signed-off-by: Fares Mehanna --- arch/arm64/include/asm/kvm_asm.h | 4 +- arch/arm64/include/asm/kvm_host.h | 24 +++++++++++- arch/arm64/kernel/asm-offsets.c | 1 + arch/arm64/kernel/image-vars.h | 1 + arch/arm64/kvm/arm.c | 63 ++++++++++++++++++++++++++++++- arch/arm64/kvm/va_layout.c | 23 +++++++++++ 6 files changed, 112 insertions(+), 4 deletions(-) diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index fa4fb642a5f5..1d6de0806dbd 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -314,7 +314,9 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr, u64 elr_virt, .endm .macro get_ctxt_gp_regs ctxt, regs - add \regs, \ctxt, #CPU_USER_PT_REGS +alternative_cb ARM64_HAS_VIRT_HOST_EXTN, kvm_update_ctxt_gp_regs + add \regs, \ctxt, #CPU_USER_PT_REGS_STRG +alternative_cb_end .endm /* diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 31cbd62a5d06..23a10178d1b0 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -541,7 +541,9 @@ struct kvm_sysreg_masks { }; struct kvm_cpu_context { - struct user_pt_regs regs; /* sp = sp_el0 */ + struct user_pt_regs *regs; /* sp = sp_el0 */ + struct user_pt_regs regs_storage; + struct secretmem_area *regs_area; u64 spsr_abt; u64 spsr_und; @@ -946,7 +948,25 @@ struct kvm_vcpu_arch { #define vcpu_clear_on_unsupported_cpu(vcpu) \ vcpu_clear_flag(vcpu, ON_UNSUPPORTED_CPU) -#define ctxt_gp_regs(ctxt) (&(ctxt)->regs) +/* Static allocation is used if NVHE-host or if KERNEL_SECRETMEM is not enabled */ +static __inline bool kvm_use_dynamic_regs(void) +{ +#ifndef CONFIG_KERNEL_SECRETMEM + return false; +#endif + return cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN); +} + +static __always_inline struct user_pt_regs *ctxt_gp_regs(const struct kvm_cpu_context *ctxt) +{ + struct user_pt_regs *regs = (void *) ctxt; + asm volatile(ALTERNATIVE_CB("add %0, %0, %1\n", + ARM64_HAS_VIRT_HOST_EXTN, + kvm_update_ctxt_gp_regs) + : "+r" (regs) + : "I" (offsetof(struct kvm_cpu_context, regs_storage))); + return regs; +} #define vcpu_gp_regs(v) (ctxt_gp_regs(&(v)->arch.ctxt)) /* diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 27de1dddb0ab..275d480f5e65 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -128,6 +128,7 @@ int main(void) DEFINE(VCPU_FAULT_DISR, offsetof(struct kvm_vcpu, arch.fault.disr_el1)); DEFINE(VCPU_HCR_EL2, offsetof(struct kvm_vcpu, arch.hcr_el2)); DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_cpu_context, regs)); + DEFINE(CPU_USER_PT_REGS_STRG, offsetof(struct kvm_cpu_context, regs_storage)); DEFINE(CPU_ELR_EL2, offsetof(struct kvm_cpu_context, sys_regs[ELR_EL2])); DEFINE(CPU_RGSR_EL1, offsetof(struct kvm_cpu_context, sys_regs[RGSR_EL1])); DEFINE(CPU_GCR_EL1, offsetof(struct kvm_cpu_context, sys_regs[GCR_EL1])); diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h index 8f5422ed1b75..e3bb626e299c 100644 --- a/arch/arm64/kernel/image-vars.h +++ b/arch/arm64/kernel/image-vars.h @@ -86,6 +86,7 @@ KVM_NVHE_ALIAS(kvm_patch_vector_branch); KVM_NVHE_ALIAS(kvm_update_va_mask); KVM_NVHE_ALIAS(kvm_get_kimage_voffset); KVM_NVHE_ALIAS(kvm_compute_final_ctr_el0); +KVM_NVHE_ALIAS(kvm_update_ctxt_gp_regs); KVM_NVHE_ALIAS(spectre_bhb_patch_loop_iter); KVM_NVHE_ALIAS(spectre_bhb_patch_loop_mitigation_enable); KVM_NVHE_ALIAS(spectre_bhb_patch_wa3); diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 9bef7638342e..78c562a060de 100644 --- 
a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -452,6 +453,7 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id) int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) { + unsigned long pages_needed; int err; spin_lock_init(&vcpu->arch.mp_state_lock); @@ -469,6 +471,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO; + if (kvm_use_dynamic_regs()) { + pages_needed = (sizeof(*vcpu_gp_regs(vcpu)) + PAGE_SIZE - 1) / PAGE_SIZE; + vcpu->arch.ctxt.regs_area = secretmem_allocate_pages(fls(pages_needed - 1)); + if (!vcpu->arch.ctxt.regs_area) + return -ENOMEM; + vcpu->arch.ctxt.regs = vcpu->arch.ctxt.regs_area->ptr; + } + /* Set up the timer */ kvm_timer_vcpu_init(vcpu); @@ -489,9 +499,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) err = kvm_vgic_vcpu_init(vcpu); if (err) - return err; + goto free_vcpu_ctxt; return kvm_share_hyp(vcpu, vcpu + 1); + +free_vcpu_ctxt: + if (kvm_use_dynamic_regs()) + secretmem_release_pages(vcpu->arch.ctxt.regs_area); + return err; } void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu) @@ -508,6 +523,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) kvm_pmu_vcpu_destroy(vcpu); kvm_vgic_vcpu_destroy(vcpu); kvm_arm_vcpu_destroy(vcpu); + + if (kvm_use_dynamic_regs()) + secretmem_release_pages(vcpu->arch.ctxt.regs_area); } void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) @@ -2683,6 +2701,45 @@ static int __init init_hyp_mode(void) return err; } +static int init_hyp_hve_mode(void) +{ + int cpu; + int err = 0; + + if (!kvm_use_dynamic_regs()) + return 0; + + /* Allocate gp-regs */ + for_each_possible_cpu(cpu) { + void *hyp_ctxt_regs; + void *kvm_host_data_regs; + + hyp_ctxt_regs = kzalloc(sizeof(struct user_pt_regs), GFP_KERNEL); + if (!hyp_ctxt_regs) { + err = -ENOMEM; + goto free_regs; + } + per_cpu(kvm_hyp_ctxt, cpu).regs = hyp_ctxt_regs; + + kvm_host_data_regs = kzalloc(sizeof(struct user_pt_regs), GFP_KERNEL); + if (!kvm_host_data_regs) { + err = -ENOMEM; + goto free_regs; + } + per_cpu(kvm_host_data, cpu).host_ctxt.regs = kvm_host_data_regs; + } + + return 0; + +free_regs: + for_each_possible_cpu(cpu) { + kfree(per_cpu(kvm_hyp_ctxt, cpu).regs); + kfree(per_cpu(kvm_host_data, cpu).host_ctxt.regs); + } + + return err; +} + struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr) { struct kvm_vcpu *vcpu = NULL; @@ -2806,6 +2863,10 @@ static __init int kvm_arm_init(void) err = init_hyp_mode(); if (err) goto out_err; + } else { + err = init_hyp_hve_mode(); + if (err) + goto out_err; } err = kvm_init_vector_slots(); diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c index 91b22a014610..fcef7e89d042 100644 --- a/arch/arm64/kvm/va_layout.c +++ b/arch/arm64/kvm/va_layout.c @@ -185,6 +185,29 @@ void __init kvm_update_va_mask(struct alt_instr *alt, } } +void __init kvm_update_ctxt_gp_regs(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, int nr_inst) +{ + u32 rd, rn, imm, insn, oinsn; + + BUG_ON(nr_inst != 1); + + if (!kvm_use_dynamic_regs()) + return; + + oinsn = le32_to_cpu(origptr[0]); + rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD, oinsn); + rn = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RN, oinsn); + imm = offsetof(struct kvm_cpu_context, regs); + + insn = aarch64_insn_gen_load_store_imm(rd, rn, imm, + AARCH64_INSN_SIZE_64, + AARCH64_INSN_LDST_LOAD_IMM_OFFSET); + BUG_ON(insn == AARCH64_BREAK_FAULT); + + updptr[0] = cpu_to_le32(insn); +} + void 
kvm_patch_vector_branch(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) {

From patchwork Wed Sep 11 14:34:05 2024
X-Patchwork-Submitter: Fares Mehanna
X-Patchwork-Id: 13800781
From: Fares Mehanna
CC: Fares Mehanna, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Andrew Morton, Kemeng Shi, Pierre-Clément Tosi, Ard Biesheuvel, Mark Rutland, Javier Martinez Canillas, Arnd Bergmann, Fuad Tabba, Mark Brown, Joey Gouly, Kristina Martsenko, Randy Dunlap, Bjorn Helgaas, Jean-Philippe Brucker, Mike Rapoport (IBM), David Hildenbrand, Roman Kagan, moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64), open list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64), open list, open list:MEMORY MANAGEMENT
Subject: [RFC PATCH 6/7] arm64: KVM: Refactor C-code to access vCPU fp-registers through macros
Date: Wed, 11 Sep 2024 14:34:05 +0000
Message-ID: <20240911143421.85612-7-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

Unify how KVM accesses the vCPU fp-regs by going through vcpu_fp_regs(). This is a prerequisite for moving the fp-regs to dynamically allocated memory in a later patch.
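The refactor is mechanical; for example (as in the fpsimd.c hunk below), a direct field access becomes a call through the accessor, so later patches can change the underlying storage without touching callers:

    /* before */
    fp_state.st = &vcpu->arch.ctxt.fp_regs;

    /* after */
    fp_state.st = vcpu_fp_regs(vcpu);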
Signed-off-by: Fares Mehanna --- arch/arm64/include/asm/kvm_host.h | 2 ++ arch/arm64/kvm/arm.c | 2 +- arch/arm64/kvm/fpsimd.c | 2 +- arch/arm64/kvm/guest.c | 6 +++--- arch/arm64/kvm/hyp/include/hyp/switch.h | 4 ++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 4 ++-- arch/arm64/kvm/reset.c | 2 +- 7 files changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 23a10178d1b0..e8ed2c12479f 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -968,6 +968,8 @@ static __always_inline struct user_pt_regs *ctxt_gp_regs(const struct kvm_cpu_co return regs; } #define vcpu_gp_regs(v) (ctxt_gp_regs(&(v)->arch.ctxt)) +#define ctxt_fp_regs(ctxt) (&(ctxt).fp_regs) +#define vcpu_fp_regs(v) (ctxt_fp_regs(&(v)->arch.ctxt)) /* * Only use __vcpu_sys_reg/ctxt_sys_reg if you know you want the diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 78c562a060de..7542af3f766a 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -2507,7 +2507,7 @@ static void finalize_init_hyp_mode(void) for_each_possible_cpu(cpu) { struct user_fpsimd_state *fpsimd_state; - fpsimd_state = &per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->host_ctxt.fp_regs; + fpsimd_state = ctxt_fp_regs(&per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->host_ctxt); per_cpu_ptr_nvhe_sym(kvm_host_data, cpu)->fpsimd_state = kern_hyp_va(fpsimd_state); } diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c index c53e5b14038d..c27c96ae22e1 100644 --- a/arch/arm64/kvm/fpsimd.c +++ b/arch/arm64/kvm/fpsimd.c @@ -130,7 +130,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) * Currently we do not support SME guests so SVCR is * always 0 and we just need a variable to point to. */ - fp_state.st = &vcpu->arch.ctxt.fp_regs; + fp_state.st = vcpu_fp_regs(vcpu); fp_state.sve_state = vcpu->arch.sve_state; fp_state.sve_vl = vcpu->arch.sve_max_vl; fp_state.sme_state = NULL; diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index 821a2b7de388..3474874a00a7 100644 --- a/arch/arm64/kvm/guest.c +++ b/arch/arm64/kvm/guest.c @@ -170,13 +170,13 @@ static void *core_reg_addr(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg) KVM_REG_ARM_CORE_REG(fp_regs.vregs[31]): off -= KVM_REG_ARM_CORE_REG(fp_regs.vregs[0]); off /= 4; - return &vcpu->arch.ctxt.fp_regs.vregs[off]; + return &vcpu_fp_regs(vcpu)->vregs[off]; case KVM_REG_ARM_CORE_REG(fp_regs.fpsr): - return &vcpu->arch.ctxt.fp_regs.fpsr; + return &vcpu_fp_regs(vcpu)->fpsr; case KVM_REG_ARM_CORE_REG(fp_regs.fpcr): - return &vcpu->arch.ctxt.fp_regs.fpcr; + return &vcpu_fp_regs(vcpu)->fpcr; default: return NULL; diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index d2ed0938fc90..1444bad519db 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -319,7 +319,7 @@ static inline void __hyp_sve_restore_guest(struct kvm_vcpu *vcpu) */ sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2); __sve_restore_state(vcpu_sve_pffr(vcpu), - &vcpu->arch.ctxt.fp_regs.fpsr, + &vcpu_fp_regs(vcpu)->fpsr, true); /* @@ -401,7 +401,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) if (sve_guest) __hyp_sve_restore_guest(vcpu); else - __fpsimd_restore_state(&vcpu->arch.ctxt.fp_regs); + __fpsimd_restore_state(vcpu_fp_regs(vcpu)); /* Skip restoring fpexc32 for AArch64 guests */ if (!(read_sysreg(hcr_el2) & HCR_RW)) diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c index f43d845f3c4e..feb1dd37f2a5 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c @@ -32,7 +32,7 @@ static void __hyp_sve_save_guest(struct kvm_vcpu *vcpu) * on the VL, so use a consistent (i.e., the maximum) guest VL. */ sve_cond_update_zcr_vq(vcpu_sve_max_vq(vcpu) - 1, SYS_ZCR_EL2); - __sve_save_state(vcpu_sve_pffr(vcpu), &vcpu->arch.ctxt.fp_regs.fpsr, true); + __sve_save_state(vcpu_sve_pffr(vcpu), &vcpu_fp_regs(vcpu)->fpsr, true); write_sysreg_s(ZCR_ELx_LEN_MASK, SYS_ZCR_EL2); } @@ -71,7 +71,7 @@ static void fpsimd_sve_sync(struct kvm_vcpu *vcpu) if (vcpu_has_sve(vcpu)) __hyp_sve_save_guest(vcpu); else - __fpsimd_save_state(&vcpu->arch.ctxt.fp_regs); + __fpsimd_save_state(vcpu_fp_regs(vcpu)); if (system_supports_sve()) __hyp_sve_restore_host(); diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 0b0ae5ae7bc2..5f38acf5d156 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -229,7 +229,7 @@ void kvm_reset_vcpu(struct kvm_vcpu *vcpu) /* Reset core registers */ memset(vcpu_gp_regs(vcpu), 0, sizeof(*vcpu_gp_regs(vcpu))); - memset(&vcpu->arch.ctxt.fp_regs, 0, sizeof(vcpu->arch.ctxt.fp_regs)); + memset(vcpu_fp_regs(vcpu), 0, sizeof(*vcpu_fp_regs(vcpu))); vcpu->arch.ctxt.spsr_abt = 0; vcpu->arch.ctxt.spsr_und = 0; vcpu->arch.ctxt.spsr_irq = 0;

From patchwork Wed Sep 11 14:34:06 2024
X-Patchwork-Submitter: Fares Mehanna
X-Patchwork-Id: 13800783
From: Fares Mehanna
CC: Fares Mehanna, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon, Andrew Morton, Kemeng Shi, Pierre-Clément Tosi, Ard Biesheuvel, Mark Rutland, Javier Martinez Canillas, Arnd Bergmann, Fuad Tabba, Mark Brown, Joey Gouly, Kristina Martsenko, Randy Dunlap, Bjorn Helgaas, Jean-Philippe Brucker, Mike Rapoport (IBM), David Hildenbrand, Roman Kagan, moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64), open list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64), open list, open list:MEMORY MANAGEMENT
Subject: [RFC PATCH 7/7] arm64: KVM: Allocate vCPU fp-regs dynamically on VHE and KERNEL_SECRETMEM enabled systems
Date: Wed, 11 Sep 2024 14:34:06 +0000
Message-ID: <20240911143421.85612-8-faresx@amazon.de>
In-Reply-To: <20240911143421.85612-1-faresx@amazon.de>
References: <20240911143421.85612-1-faresx@amazon.de>

Similar to the earlier commit "arm64: KVM: Allocate vCPU gp-regs dynamically on VHE and KERNEL_SECRETMEM enabled systems", we move the fp-regs to dynamically allocated memory on systems that support VHE and are compiled with KERNEL_SECRETMEM. Otherwise, we use the "fp_regs_storage" struct embedded in the vCPU context.
Accessing fp-regs embedded in the vCPU context without de-reference is done as: add \regs, \ctxt, #offsetof(struct kvm_cpu_context, fp_regs_storage) Accessing the dynamically allocated fp-regs with de-reference is done as: ldr \regs, [\ctxt, #offsetof(struct kvm_cpu_context, fp_regs)] Signed-off-by: Fares Mehanna --- arch/arm64/include/asm/kvm_host.h | 16 ++++++++++++++-- arch/arm64/kernel/image-vars.h | 1 + arch/arm64/kvm/arm.c | 29 +++++++++++++++++++++++++++-- arch/arm64/kvm/va_layout.c | 23 +++++++++++++++++++---- 4 files changed, 61 insertions(+), 8 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index e8ed2c12479f..4132c57d7e69 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -550,7 +550,9 @@ struct kvm_cpu_context { u64 spsr_irq; u64 spsr_fiq; - struct user_fpsimd_state fp_regs; + struct user_fpsimd_state *fp_regs; + struct user_fpsimd_state fp_regs_storage; + struct secretmem_area *fp_regs_area; u64 sys_regs[NR_SYS_REGS]; @@ -968,7 +970,17 @@ static __always_inline struct user_pt_regs *ctxt_gp_regs(const struct kvm_cpu_co return regs; } #define vcpu_gp_regs(v) (ctxt_gp_regs(&(v)->arch.ctxt)) -#define ctxt_fp_regs(ctxt) (&(ctxt).fp_regs) + +static __always_inline struct user_fpsimd_state *ctxt_fp_regs(const struct kvm_cpu_context *ctxt) +{ + struct user_fpsimd_state *fp_regs = (void *) ctxt; + asm volatile(ALTERNATIVE_CB("add %0, %0, %1\n", + ARM64_HAS_VIRT_HOST_EXTN, + kvm_update_ctxt_fp_regs) + : "+r" (fp_regs) + : "I" (offsetof(struct kvm_cpu_context, fp_regs_storage))); + return fp_regs; +} #define vcpu_fp_regs(v) (ctxt_fp_regs(&(v)->arch.ctxt)) /* diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h index e3bb626e299c..904573598e0f 100644 --- a/arch/arm64/kernel/image-vars.h +++ b/arch/arm64/kernel/image-vars.h @@ -87,6 +87,7 @@ KVM_NVHE_ALIAS(kvm_update_va_mask); KVM_NVHE_ALIAS(kvm_get_kimage_voffset); KVM_NVHE_ALIAS(kvm_compute_final_ctr_el0); KVM_NVHE_ALIAS(kvm_update_ctxt_gp_regs); +KVM_NVHE_ALIAS(kvm_update_ctxt_fp_regs); KVM_NVHE_ALIAS(spectre_bhb_patch_loop_iter); KVM_NVHE_ALIAS(spectre_bhb_patch_loop_mitigation_enable); KVM_NVHE_ALIAS(spectre_bhb_patch_wa3); diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 7542af3f766a..17b42e9099c3 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -477,6 +477,14 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) if (!vcpu->arch.ctxt.regs_area) return -ENOMEM; vcpu->arch.ctxt.regs = vcpu->arch.ctxt.regs_area->ptr; + + pages_needed = (sizeof(*vcpu_fp_regs(vcpu)) + PAGE_SIZE - 1) / PAGE_SIZE; + vcpu->arch.ctxt.fp_regs_area = secretmem_allocate_pages(fls(pages_needed - 1)); + if (!vcpu->arch.ctxt.fp_regs_area) { + err = -ENOMEM; + goto free_vcpu_ctxt; + } + vcpu->arch.ctxt.fp_regs = vcpu->arch.ctxt.fp_regs_area->ptr; } /* Set up the timer */ @@ -504,8 +512,10 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) return kvm_share_hyp(vcpu, vcpu + 1); free_vcpu_ctxt: - if (kvm_use_dynamic_regs()) + if (kvm_use_dynamic_regs()) { secretmem_release_pages(vcpu->arch.ctxt.regs_area); + secretmem_release_pages(vcpu->arch.ctxt.fp_regs_area); + } return err; } @@ -524,8 +534,10 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) kvm_vgic_vcpu_destroy(vcpu); kvm_arm_vcpu_destroy(vcpu); - if (kvm_use_dynamic_regs()) + if (kvm_use_dynamic_regs()) { secretmem_release_pages(vcpu->arch.ctxt.regs_area); + secretmem_release_pages(vcpu->arch.ctxt.fp_regs_area); + } } void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) @@ 
-2729,12 +2741,25 @@ static int init_hyp_hve_mode(void) per_cpu(kvm_host_data, cpu).host_ctxt.regs = kvm_host_data_regs; } + /* Allocate fp-regs */ + for_each_possible_cpu(cpu) { + void *kvm_host_data_regs; + + kvm_host_data_regs = kzalloc(sizeof(struct user_fpsimd_state), GFP_KERNEL); + if (!kvm_host_data_regs) { + err = -ENOMEM; + goto free_regs; + } + per_cpu(kvm_host_data, cpu).host_ctxt.fp_regs = kvm_host_data_regs; + } + return 0; free_regs: for_each_possible_cpu(cpu) { kfree(per_cpu(kvm_hyp_ctxt, cpu).regs); kfree(per_cpu(kvm_host_data, cpu).host_ctxt.regs); + kfree(per_cpu(kvm_host_data, cpu).host_ctxt.fp_regs); } return err; diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c index fcef7e89d042..ba1030fa5b08 100644 --- a/arch/arm64/kvm/va_layout.c +++ b/arch/arm64/kvm/va_layout.c @@ -185,10 +185,12 @@ void __init kvm_update_va_mask(struct alt_instr *alt, } } -void __init kvm_update_ctxt_gp_regs(struct alt_instr *alt, - __le32 *origptr, __le32 *updptr, int nr_inst) +static __always_inline void __init kvm_update_ctxt_regs(struct alt_instr *alt, + __le32 *origptr, + __le32 *updptr, + int nr_inst, u32 imm) { - u32 rd, rn, imm, insn, oinsn; + u32 rd, rn, insn, oinsn; BUG_ON(nr_inst != 1); @@ -198,7 +200,6 @@ void __init kvm_update_ctxt_gp_regs(struct alt_instr *alt, oinsn = le32_to_cpu(origptr[0]); rd = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RD, oinsn); rn = aarch64_insn_decode_register(AARCH64_INSN_REGTYPE_RN, oinsn); - imm = offsetof(struct kvm_cpu_context, regs); insn = aarch64_insn_gen_load_store_imm(rd, rn, imm, AARCH64_INSN_SIZE_64, @@ -208,6 +209,20 @@ void __init kvm_update_ctxt_gp_regs(struct alt_instr *alt, updptr[0] = cpu_to_le32(insn); } +void __init kvm_update_ctxt_gp_regs(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, int nr_inst) +{ + u32 offset = offsetof(struct kvm_cpu_context, regs); + kvm_update_ctxt_regs(alt, origptr, updptr, nr_inst, offset); +} + +void __init kvm_update_ctxt_fp_regs(struct alt_instr *alt, + __le32 *origptr, __le32 *updptr, int nr_inst) +{ + u32 offset = offsetof(struct kvm_cpu_context, fp_regs); + kvm_update_ctxt_regs(alt, origptr, updptr, nr_inst, offset); +} + void kvm_patch_vector_branch(struct alt_instr *alt, __le32 *origptr, __le32 *updptr, int nr_inst) {
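A closing note on the allocation sizes used in kvm_arch_vcpu_create() for the gp-regs and fp-regs areas: both structs are far smaller than a page, so the computation normally yields a single page per area. The sketch below is illustrative only, assuming fls() behaves like the kernel helper (1-based index of the most significant set bit, with fls(0) == 0):

    /* Round the struct size up to whole pages, then up to a power-of-two order. */
    pages_needed = (sizeof(struct user_pt_regs) + PAGE_SIZE - 1) / PAGE_SIZE;
    order = fls(pages_needed - 1);  /* 1 page -> order 0, 2 -> 1, 3..4 -> 2, ... */
    area = secretmem_allocate_pages(order);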