Message ID | 1678329614-3482-7-git-send-email-mikelley@microsoft.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | Add PCI pass-thru support to Hyper-V Confidential VMs | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Wed, Mar 08, 2023 at 06:40:07PM -0800, Michael Kelley wrote: > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c > index 49b44f8..d1c3306 100644 > --- a/arch/x86/coco/core.c > +++ b/arch/x86/coco/core.c > @@ -88,8 +106,6 @@ bool cc_platform_has(enum cc_attr attr) > return amd_cc_platform_has(attr); > case CC_VENDOR_INTEL: > return intel_cc_platform_has(attr); > - case CC_VENDOR_HYPERV: > - return hyperv_cc_platform_has(attr); > default: > return false; > } > @@ -103,11 +119,14 @@ u64 cc_mkenc(u64 val) > * encryption status of the page. > * > * - for AMD, bit *set* means the page is encrypted > - * - for Intel *clear* means encrypted. > + * - for AMD with vTOM and for Intel, *clear* means encrypted > */ > switch (vendor) { > case CC_VENDOR_AMD: > - return val | cc_mask; > + if (sev_status & MSR_AMD64_SNP_VTOM) > + return val & ~cc_mask; This is silly. It should simply be: if (sev_status & MSR_AMD64_SNP_VTOM) return val; > + else > + return val | cc_mask; > case CC_VENDOR_INTEL: > return val & ~cc_mask; > default: > @@ -120,7 +139,10 @@ u64 cc_mkdec(u64 val) > /* See comment in cc_mkenc() */ > switch (vendor) { > case CC_VENDOR_AMD: > - return val & ~cc_mask; > + if (sev_status & MSR_AMD64_SNP_VTOM) > + return val | cc_mask; So if you set the C-bit, that doesn't make it decrypted on AMD. cc_mask on VTOM is 0 so why even bother? Same as the above. > + else > + return val & ~cc_mask; > case CC_VENDOR_INTEL: > return val | cc_mask; > default: ... > +void __init hv_vtom_init(void) > +{ > + /* > + * By design, a VM using vTOM doesn't see the SEV setting, > + * so SEV initialization is bypassed and sev_status isn't set. > + * Set it here to indicate a vTOM VM. > + */ This looks like a hack. The SEV status MSR cannot be intercepted so the guest should see vTOM. How are you running vTOM without setting it even up?! > + sev_status = MSR_AMD64_SNP_VTOM; > + cc_set_vendor(CC_VENDOR_AMD); > + cc_set_mask(ms_hyperv.shared_gpa_boundary); > + physical_mask &= ms_hyperv.shared_gpa_boundary - 1; > + > + x86_platform.hyper.is_private_mmio = hv_is_private_mmio; > + x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required; > + x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required; > + x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility; > +} > + > +#endif /* CONFIG_AMD_MEM_ENCRYPT */ > + > /* > * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM. > */
From: Borislav Petkov <bp@alien8.de> Sent: Monday, March 20, 2023 4:23 AM > > On Wed, Mar 08, 2023 at 06:40:07PM -0800, Michael Kelley wrote: > > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c > > index 49b44f8..d1c3306 100644 > > --- a/arch/x86/coco/core.c > > +++ b/arch/x86/coco/core.c > > @@ -88,8 +106,6 @@ bool cc_platform_has(enum cc_attr attr) > > return amd_cc_platform_has(attr); > > case CC_VENDOR_INTEL: > > return intel_cc_platform_has(attr); > > - case CC_VENDOR_HYPERV: > > - return hyperv_cc_platform_has(attr); > > default: > > return false; > > } > > @@ -103,11 +119,14 @@ u64 cc_mkenc(u64 val) > > * encryption status of the page. > > * > > * - for AMD, bit *set* means the page is encrypted > > - * - for Intel *clear* means encrypted. > > + * - for AMD with vTOM and for Intel, *clear* means encrypted > > */ > > switch (vendor) { > > case CC_VENDOR_AMD: > > - return val | cc_mask; > > + if (sev_status & MSR_AMD64_SNP_VTOM) > > + return val & ~cc_mask; > > This is silly. It should simply be: > > if (sev_status & MSR_AMD64_SNP_VTOM) > return val; > To be clear, cc_mask contains the vTOM bit. It's not zero. See the call to cc_set_mask() further down below. My code makes sure the vTOM bit is *not* set for the encrypted case, just like the CC_VENDOR_INTEL code below does for the TDX SHARED bit. > > > + else > > + return val | cc_mask; > > case CC_VENDOR_INTEL: > > return val & ~cc_mask; > > default: > > @@ -120,7 +139,10 @@ u64 cc_mkdec(u64 val) > > /* See comment in cc_mkenc() */ > > switch (vendor) { > > case CC_VENDOR_AMD: > > - return val & ~cc_mask; > > + if (sev_status & MSR_AMD64_SNP_VTOM) > > + return val | cc_mask; > > So if you set the C-bit, that doesn't make it decrypted on AMD. cc_mask > on VTOM is 0 so why even bother? cc_mask is *not* zero in the vTOM case. It contains the vTOM bit. The C-bit is not used or set in the vTOM case. > > Same as the above. > > > + else > > + return val & ~cc_mask; > > case CC_VENDOR_INTEL: > > return val | cc_mask; > > default: > > ... > > > +void __init hv_vtom_init(void) > > +{ > > + /* > > + * By design, a VM using vTOM doesn't see the SEV setting, > > + * so SEV initialization is bypassed and sev_status isn't set. > > + * Set it here to indicate a vTOM VM. > > + */ > > This looks like a hack. The SEV status MSR cannot be intercepted so the > guest should see vTOM. How are you running vTOM without setting it even up?! > In a vTOM VM, CPUID leaf 0x8000001f is filtered so it does *not* return Bit 1 (SEV) as set. Consequently, sme_enable() does not read MSR_AMD64_SEV and does not populate sev_status. The Linux boot sequence proceeds as if SEV-SNP (and any other memory encryption) is *not* enabled, which is the whole point of vTOM mode. The tricky SME/SEV code for setting the C-bit, getting the kernel encrypted, etc. is not needed or wanted because the hardware already encrypts all memory by default. The bootloader runs with memory encrypted, it loads the kernel into encrypted memory, and so forth. sev_status is used here only to communicate to cc_platform_has() and cc_mkenc() and cc_mkdec() that we're in vTOM mode. If using sev_status to communicate is confusing, this could just as easily be some new global variable, and sev_status could be left as all zeros. Michael > > + sev_status = MSR_AMD64_SNP_VTOM; > > + cc_set_vendor(CC_VENDOR_AMD); > > + cc_set_mask(ms_hyperv.shared_gpa_boundary); > > + physical_mask &= ms_hyperv.shared_gpa_boundary - 1; > > + > > + x86_platform.hyper.is_private_mmio = hv_is_private_mmio; > > + x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required; > > + x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required; > > + x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility; > > +} > > + > > +#endif /* CONFIG_AMD_MEM_ENCRYPT */
On Mon, Mar 20, 2023 at 01:30:54PM +0000, Michael Kelley (LINUX) wrote: > In a vTOM VM, CPUID leaf 0x8000001f is filtered so it does *not* return > Bit 1 (SEV) as set. Consequently, sme_enable() does not read MSR_AMD64_SEV > and does not populate sev_status. So how much of the hardware side of vTOM are you actually using besides the actual encryption? Virtual TOM MSR (C001_0135)? Anything else? AFAICT, you're passing the vTOM value from CPUID from the hypervisor so I'm guessing that happens underneath in the hypervisor? I'd like to make sure there are no more "surprises" down the road... Thx.
From: Borislav Petkov <bp@alien8.de> Sent: Monday, March 20, 2023 11:17 AM > > On Mon, Mar 20, 2023 at 01:30:54PM +0000, Michael Kelley (LINUX) wrote: > > In a vTOM VM, CPUID leaf 0x8000001f is filtered so it does *not* return > > Bit 1 (SEV) as set. Consequently, sme_enable() does not read MSR_AMD64_SEV > > and does not populate sev_status. > > So how much of the hardware side of vTOM are you actually using besides > the actual encryption? vTOM mode in Linux is just turning on/off the vTOM bit in the PTE to create unencrypted or encrypted mappings, with encrypted being the default. There's no other hardware dependency except CPUID leaf 0x8000001f reporting that SEV is not enabled, and the GHCB protocol (if you want to call that "hardware") as mentioned below. > > Virtual TOM MSR (C001_0135)? Anything else? > > AFAICT, you're passing the vTOM value from CPUID from the hypervisor so > I'm guessing that happens underneath in the hypervisor? Correct. Linux in vTOM mode is not reading MSR 0xC0010135. The PTE bit position of the vTOM bit is coming from Hyper-V (or the paravisor) via a synthetic MSR. Presumably Hyper-V or the paravisor is reading the vTOM MSR, but I haven't reviewed that code. > > I'd like to make sure there are no more "surprises" down the road... > The only other vTOM changes are for software protocols for communication between the guest and Hyper-V (or the paravisor). Some hypercalls and synthetic MSR accesses need to bypass the paravisor and are handled with the GHCB protocol. The Hyper-V and VMbus specific code in Linux handles those idiosyncrasies. That code went into the 5.15 kernel and isn't modified by this patch set. The vTOM case is down to the bare minimum in the use of the hardware functionality, so it's unlikely anything else would turn up as being different. Michael
On Mon, Mar 20, 2023 at 06:50:05PM +0000, Michael Kelley (LINUX) wrote: > The vTOM case is down to the bare minimum in the use of the hardware > functionality, so it's unlikely anything else would turn up as being > different. Ok, lemme queue 1-2,4-6 as previously mentioned. Thx.
On Thu, Mar 23, 2023 at 02:43:06PM +0100, Borislav Petkov wrote:
> Ok, lemme queue 1-2,4-6 as previously mentioned.
With first six applied:
arch/x86/coco/core.c:123:7: error: use of undeclared identifier 'sev_status'
if (sev_status & MSR_AMD64_SNP_VTOM)
^
arch/x86/coco/core.c:139:7: error: use of undeclared identifier 'sev_status'
if (sev_status & MSR_AMD64_SNP_VTOM)
^
2 errors generated.
make[3]: *** [scripts/Makefile.build:252: arch/x86/coco/core.o] Error 1
make[2]: *** [scripts/Makefile.build:494: arch/x86/coco] Error 2
make[1]: *** [scripts/Makefile.build:494: arch/x86] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:2025: .] Error 2
compiler is:
Debian clang version 14.0.6-2
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
.config is attached.
> From: Borislav Petkov <bp@alien8.de> > Sent: Friday, March 24, 2023 8:49 AM > ... > With first six applied: > > arch/x86/coco/core.c:123:7: error: use of undeclared identifier 'sev_status' > if (sev_status & MSR_AMD64_SNP_VTOM) > ^ Your config doesn't define CONFIG_AMD_MEM_ENCRYPT: # CONFIG_AMD_MEM_ENCRYPT is not set
Hi, On 3/24/23 10:10 AM, Dexuan Cui wrote: >> From: Borislav Petkov <bp@alien8.de> >> Sent: Friday, March 24, 2023 8:49 AM >> ... >> With first six applied: >> >> arch/x86/coco/core.c:123:7: error: use of undeclared identifier 'sev_status' >> if (sev_status & MSR_AMD64_SNP_VTOM) >> ^ > > Your config doesn't define CONFIG_AMD_MEM_ENCRYPT: > # CONFIG_AMD_MEM_ENCRYPT is not set If you have config dependency, I think you should fix it in the code or add Kconfig dependency.
On Fri, Mar 24, 2023 at 05:10:26PM +0000, Dexuan Cui wrote: > Your config doesn't define CONFIG_AMD_MEM_ENCRYPT: > # CONFIG_AMD_MEM_ENCRYPT is not set That's why it is called randconfig builds. That doesn't mean that they should not build properly.
From: Borislav Petkov <bp@alien8.de> Sent: Friday, March 24, 2023 8:49 AM > > On Thu, Mar 23, 2023 at 02:43:06PM +0100, Borislav Petkov wrote: > > Ok, lemme queue 1-2,4-6 as previously mentioned. > > With first six applied: > > arch/x86/coco/core.c:123:7: error: use of undeclared identifier 'sev_status' > if (sev_status & MSR_AMD64_SNP_VTOM) > ^ > arch/x86/coco/core.c:139:7: error: use of undeclared identifier 'sev_status' > if (sev_status & MSR_AMD64_SNP_VTOM) > ^ > 2 errors generated. > make[3]: *** [scripts/Makefile.build:252: arch/x86/coco/core.o] Error 1 > make[2]: *** [scripts/Makefile.build:494: arch/x86/coco] Error 2 > make[1]: *** [scripts/Makefile.build:494: arch/x86] Error 2 > make[1]: *** Waiting for unfinished jobs.... > make: *** [Makefile:2025: .] Error 2 > > compiler is: > > Debian clang version 14.0.6-2 > Target: x86_64-pc-linux-gnu > Thread model: posix > InstalledDir: /usr/bin > > .config is attached. > OK, I see what went wrong. I had tested with CONFIG_AMD_MEM_ENCRYPT=n and didn't see any compile problems. It turns out in my test, arch/x86/coco/core.c wasn't built at all because I did not also have TDX configured, so I didn't see any errors. But with CONFIG_INTEL_TDX_GUEST=y, coco/core.c gets built, and the error with undefined sev_status pops out. The straightforward fix is somewhat ugly. That's to put #ifdef CONFIG_AMD_MEM_ENCRYPT around the entire CC_VENDOR_AMD case in cc_mkenc() and in cc_mkdec(). Or put it just around the test of sev_status. Perhaps a cleaner way would be to have a "vendor_subtype" variable declared in arch/x86/coco/core.c and tested instead of sev_status. That subtype variable would be set from hv_vtom_init(), maybe via a separate accessor function. But didn't I recently see a patch that makes the existing "vendor" variable no longer static? In that case just setting vendor_subtype without the accessor function may be OK. What's your preference Boris? I can spin a v7 of the patch series that fixes this, and that squashes the last two patches of the series per Lorenz Pieralisi's comments. Michael
From: Michael Kelley (LINUX) <mikelley@microsoft.com> > > From: Borislav Petkov <bp@alien8.de> Sent: Friday, March 24, 2023 8:49 AM > > > > On Thu, Mar 23, 2023 at 02:43:06PM +0100, Borislav Petkov wrote: > > > Ok, lemme queue 1-2,4-6 as previously mentioned. > > > > With first six applied: > > > > arch/x86/coco/core.c:123:7: error: use of undeclared identifier 'sev_status' > > if (sev_status & MSR_AMD64_SNP_VTOM) > > ^ > > arch/x86/coco/core.c:139:7: error: use of undeclared identifier 'sev_status' > > if (sev_status & MSR_AMD64_SNP_VTOM) > > ^ > > 2 errors generated. > > make[3]: *** [scripts/Makefile.build:252: arch/x86/coco/core.o] Error 1 > > make[2]: *** [scripts/Makefile.build:494: arch/x86/coco] Error 2 > > make[1]: *** [scripts/Makefile.build:494: arch/x86] Error 2 > > make[1]: *** Waiting for unfinished jobs.... > > make: *** [Makefile:2025: .] Error 2 > > > > compiler is: > > > > Debian clang version 14.0.6-2 > > Target: x86_64-pc-linux-gnu > > Thread model: posix > > InstalledDir: /usr/bin > > > > .config is attached. > > > > OK, I see what went wrong. I had tested with CONFIG_AMD_MEM_ENCRYPT=n > and didn't see any compile problems. It turns out in my test, arch/x86/coco/core.c > wasn't built at all because I did not also have TDX configured, so I didn't see > any errors. But with CONFIG_INTEL_TDX_GUEST=y, coco/core.c gets built, and > the error with undefined sev_status pops out. > > The straightforward fix is somewhat ugly. That's to put #ifdef > CONFIG_AMD_MEM_ENCRYPT around the entire CC_VENDOR_AMD > case in cc_mkenc() and in cc_mkdec(). Or put it just around the test of > sev_status. > > Perhaps a cleaner way would be to have a "vendor_subtype" variable > declared in arch/x86/coco/core.c and tested instead of sev_status. > That subtype variable would be set from hv_vtom_init(), maybe via > a separate accessor function. But didn't I recently see a patch that > makes the existing "vendor" variable no longer static? In that case > just setting vendor_subtype without the accessor function may be > OK. > > What's your preference Boris? I can spin a v7 of the patch series > that fixes this, and that squashes the last two patches of the series > per Lorenz Pieralisi's comments. > Actually, a pretty clean approach is to #define sev_status 0ULL in the #else /* !CONFIG_AMD_MEM_ENCRYPT */ half of arch/x86/include/asm/mem_encrypt.h. That's where the existing extern statement is, and sme_me_mask is already handled that way. I'll respin the patch set with that approach. Michael
diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c index 49b44f8..d1c3306 100644 --- a/arch/x86/coco/core.c +++ b/arch/x86/coco/core.c @@ -29,6 +29,18 @@ static bool intel_cc_platform_has(enum cc_attr attr) } } +/* Helper function for AMD SEV-SNP vTOM case */ +static __maybe_unused bool amd_cc_platform_vtom(enum cc_attr attr) +{ + switch (attr) { + case CC_ATTR_GUEST_MEM_ENCRYPT: + case CC_ATTR_MEM_ENCRYPT: + return true; + default: + return false; + } +} + /* * SME and SEV are very similar but they are not the same, so there are * times that the kernel will need to distinguish between SME and SEV. The @@ -41,9 +53,20 @@ static bool intel_cc_platform_has(enum cc_attr attr) * up under SME the trampoline area cannot be encrypted, whereas under SEV * the trampoline area must be encrypted. */ + static bool amd_cc_platform_has(enum cc_attr attr) { #ifdef CONFIG_AMD_MEM_ENCRYPT + + /* + * Handle the SEV-SNP vTOM case where sme_me_mask is zero, and + * the other levels of SME/SEV functionality, including C-bit + * based SEV-SNP, are not enabled. + */ + if (sev_status & MSR_AMD64_SNP_VTOM) + return amd_cc_platform_vtom(attr); + + /* Handle the C-bit case */ switch (attr) { case CC_ATTR_MEM_ENCRYPT: return sme_me_mask; @@ -76,11 +99,6 @@ static bool amd_cc_platform_has(enum cc_attr attr) #endif } -static bool hyperv_cc_platform_has(enum cc_attr attr) -{ - return attr == CC_ATTR_GUEST_MEM_ENCRYPT; -} - bool cc_platform_has(enum cc_attr attr) { switch (vendor) { @@ -88,8 +106,6 @@ bool cc_platform_has(enum cc_attr attr) return amd_cc_platform_has(attr); case CC_VENDOR_INTEL: return intel_cc_platform_has(attr); - case CC_VENDOR_HYPERV: - return hyperv_cc_platform_has(attr); default: return false; } @@ -103,11 +119,14 @@ u64 cc_mkenc(u64 val) * encryption status of the page. * * - for AMD, bit *set* means the page is encrypted - * - for Intel *clear* means encrypted. + * - for AMD with vTOM and for Intel, *clear* means encrypted */ switch (vendor) { case CC_VENDOR_AMD: - return val | cc_mask; + if (sev_status & MSR_AMD64_SNP_VTOM) + return val & ~cc_mask; + else + return val | cc_mask; case CC_VENDOR_INTEL: return val & ~cc_mask; default: @@ -120,7 +139,10 @@ u64 cc_mkdec(u64 val) /* See comment in cc_mkenc() */ switch (vendor) { case CC_VENDOR_AMD: - return val & ~cc_mask; + if (sev_status & MSR_AMD64_SNP_VTOM) + return val | cc_mask; + else + return val & ~cc_mask; case CC_VENDOR_INTEL: return val | cc_mask; default: diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index 41ef036..edbc67e 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -29,7 +29,6 @@ #include <linux/syscore_ops.h> #include <clocksource/hyperv_timer.h> #include <linux/highmem.h> -#include <linux/swiotlb.h> int hyperv_init_cpuhp; u64 hv_current_partition_id = ~0ull; @@ -504,16 +503,6 @@ void __init hyperv_init(void) /* Query the VMs extended capability once, so that it can be cached. */ hv_query_ext_cap(0); -#ifdef CONFIG_SWIOTLB - /* - * Swiotlb bounce buffer needs to be mapped in extra address - * space. Map function doesn't work in the early place and so - * call swiotlb_update_mem_attributes() here. - */ - if (hv_is_isolation_supported()) - swiotlb_update_mem_attributes(); -#endif - return; clean_guest_os_id: diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c index 5648efb..f6a020c 100644 --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -13,6 +13,8 @@ #include <asm/svm.h> #include <asm/sev.h> #include <asm/io.h> +#include <asm/coco.h> +#include <asm/mem_encrypt.h> #include <asm/mshyperv.h> #include <asm/hypervisor.h> @@ -233,7 +235,6 @@ void hv_ghcb_msr_read(u64 msr, u64 *value) local_irq_restore(flags); } EXPORT_SYMBOL_GPL(hv_ghcb_msr_read); -#endif /* * hv_mark_gpa_visibility - Set pages visible to host via hvcall. @@ -286,27 +287,25 @@ static int hv_mark_gpa_visibility(u16 count, const u64 pfn[], } /* - * hv_set_mem_host_visibility - Set specified memory visible to host. + * hv_vtom_set_host_visibility - Set specified memory visible to host. * * In Isolation VM, all guest memory is encrypted from host and guest * needs to set memory visible to host via hvcall before sharing memory * with host. This function works as wrap of hv_mark_gpa_visibility() * with memory base and size. */ -int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visible) +static bool hv_vtom_set_host_visibility(unsigned long kbuffer, int pagecount, bool enc) { - enum hv_mem_host_visibility visibility = visible ? - VMBUS_PAGE_VISIBLE_READ_WRITE : VMBUS_PAGE_NOT_VISIBLE; + enum hv_mem_host_visibility visibility = enc ? + VMBUS_PAGE_NOT_VISIBLE : VMBUS_PAGE_VISIBLE_READ_WRITE; u64 *pfn_array; int ret = 0; + bool result = true; int i, pfn; - if (!hv_is_isolation_supported() || !hv_hypercall_pg) - return 0; - pfn_array = kmalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL); if (!pfn_array) - return -ENOMEM; + return false; for (i = 0, pfn = 0; i < pagecount; i++) { pfn_array[pfn] = virt_to_hvpfn((void *)kbuffer + i * HV_HYP_PAGE_SIZE); @@ -315,17 +314,68 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) { ret = hv_mark_gpa_visibility(pfn, pfn_array, visibility); - if (ret) + if (ret) { + result = false; goto err_free_pfn_array; + } pfn = 0; } } err_free_pfn_array: kfree(pfn_array); - return ret; + return result; } +static bool hv_vtom_tlb_flush_required(bool private) +{ + return true; +} + +static bool hv_vtom_cache_flush_required(void) +{ + return false; +} + +static bool hv_is_private_mmio(u64 addr) +{ + /* + * Hyper-V always provides a single IO-APIC in a guest VM. + * When a paravisor is used, it is emulated by the paravisor + * in the guest context and must be mapped private. + */ + if (addr >= HV_IOAPIC_BASE_ADDRESS && + addr < (HV_IOAPIC_BASE_ADDRESS + PAGE_SIZE)) + return true; + + /* Same with a vTPM */ + if (addr >= VTPM_BASE_ADDRESS && + addr < (VTPM_BASE_ADDRESS + PAGE_SIZE)) + return true; + + return false; +} + +void __init hv_vtom_init(void) +{ + /* + * By design, a VM using vTOM doesn't see the SEV setting, + * so SEV initialization is bypassed and sev_status isn't set. + * Set it here to indicate a vTOM VM. + */ + sev_status = MSR_AMD64_SNP_VTOM; + cc_set_vendor(CC_VENDOR_AMD); + cc_set_mask(ms_hyperv.shared_gpa_boundary); + physical_mask &= ms_hyperv.shared_gpa_boundary - 1; + + x86_platform.hyper.is_private_mmio = hv_is_private_mmio; + x86_platform.guest.enc_cache_flush_required = hv_vtom_cache_flush_required; + x86_platform.guest.enc_tlb_flush_required = hv_vtom_tlb_flush_required; + x86_platform.guest.enc_status_change_finish = hv_vtom_set_host_visibility; +} + +#endif /* CONFIG_AMD_MEM_ENCRYPT */ + /* * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM. */ diff --git a/arch/x86/include/asm/coco.h b/arch/x86/include/asm/coco.h index 3d98c3a..d2c6a2e 100644 --- a/arch/x86/include/asm/coco.h +++ b/arch/x86/include/asm/coco.h @@ -7,7 +7,6 @@ enum cc_vendor { CC_VENDOR_NONE, CC_VENDOR_AMD, - CC_VENDOR_HYPERV, CC_VENDOR_INTEL, }; diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index 4c4c0ec..e3cef98 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -11,6 +11,14 @@ #include <asm/paravirt.h> #include <asm/mshyperv.h> +/* + * Hyper-V always provides a single IO-APIC at this MMIO address. + * Ideally, the value should be looked up in ACPI tables, but it + * is needed for mapping the IO-APIC early in boot on Confidential + * VMs, before ACPI functions can be used. + */ +#define HV_IOAPIC_BASE_ADDRESS 0xfec00000 + union hv_ghcb; DECLARE_STATIC_KEY_FALSE(isolation_type_snp); @@ -206,18 +214,19 @@ static inline void hv_apic_init(void) {} int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector, struct hv_interrupt_entry *entry); int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry); -int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible); #ifdef CONFIG_AMD_MEM_ENCRYPT void hv_ghcb_msr_write(u64 msr, u64 value); void hv_ghcb_msr_read(u64 msr, u64 *value); bool hv_ghcb_negotiate_protocol(void); void hv_ghcb_terminate(unsigned int set, unsigned int reason); +void hv_vtom_init(void); #else static inline void hv_ghcb_msr_write(u64 msr, u64 value) {} static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {} static inline bool hv_ghcb_negotiate_protocol(void) { return false; } static inline void hv_ghcb_terminate(unsigned int set, unsigned int reason) {} +static inline void hv_vtom_init(void) {} #endif extern bool hv_isolation_type_snp(void); @@ -259,11 +268,6 @@ static inline void hv_set_register(unsigned int reg, u64 value) { } static inline u64 hv_get_register(unsigned int reg) { return 0; } static inline void hv_set_non_nested_register(unsigned int reg, u64 value) { } static inline u64 hv_get_non_nested_register(unsigned int reg) { return 0; } -static inline int hv_set_mem_host_visibility(unsigned long addr, int numpages, - bool visible) -{ - return -1; -} #endif /* CONFIG_HYPERV */ diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index f36dc2f..ded7506 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -33,7 +33,6 @@ #include <asm/nmi.h> #include <clocksource/hyperv_timer.h> #include <asm/numa.h> -#include <asm/coco.h> /* Is Linux running as the root partition? */ bool hv_root_partition; @@ -397,8 +396,10 @@ static void __init ms_hyperv_init_platform(void) if (ms_hyperv.priv_high & HV_ISOLATION) { ms_hyperv.isolation_config_a = cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG); ms_hyperv.isolation_config_b = cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG); - ms_hyperv.shared_gpa_boundary = - BIT_ULL(ms_hyperv.shared_gpa_boundary_bits); + + if (ms_hyperv.shared_gpa_boundary_active) + ms_hyperv.shared_gpa_boundary = + BIT_ULL(ms_hyperv.shared_gpa_boundary_bits); pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n", ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b); @@ -409,11 +410,6 @@ static void __init ms_hyperv_init_platform(void) swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary; #endif } - /* Isolation VMs are unenlightened SEV-based VMs, thus this check: */ - if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) { - if (hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE) - cc_set_vendor(CC_VENDOR_HYPERV); - } } if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) { @@ -482,6 +478,9 @@ static void __init ms_hyperv_init_platform(void) i8253_clear_counter_on_shutdown = false; #if IS_ENABLED(CONFIG_HYPERV) + if ((hv_get_isolation_type() == HV_ISOLATION_TYPE_VBS) || + (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)) + hv_vtom_init(); /* * Setup the hook to get control post apic initialization. */ diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 356758b..b037954 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -2175,9 +2175,6 @@ static int __set_memory_enc_pgtable(unsigned long addr, int numpages, bool enc) static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc) { - if (hv_is_isolation_supported()) - return hv_set_mem_host_visibility(addr, numpages, !enc); - if (cc_platform_has(CC_ATTR_MEM_ENCRYPT)) return __set_memory_enc_pgtable(addr, numpages, enc); diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index d24dd65..e9e1c41 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -2156,7 +2156,6 @@ void vmbus_device_unregister(struct hv_device *device_obj) * VMBUS is an acpi enumerated device. Get the information we * need from DSDT. */ -#define VTPM_BASE_ADDRESS 0xfed40000 static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx) { resource_size_t start = 0; diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h index 8845a2e..90d7f68 100644 --- a/include/asm-generic/mshyperv.h +++ b/include/asm-generic/mshyperv.h @@ -26,6 +26,8 @@ #include <asm/ptrace.h> #include <asm/hyperv-tlfs.h> +#define VTPM_BASE_ADDRESS 0xfed40000 + struct ms_hyperv_info { u32 features; u32 priv_high;
Hyper-V guests on AMD SEV-SNP hardware have the option of using the "virtual Top Of Memory" (vTOM) feature specified by the SEV-SNP architecture. With vTOM, shared vs. private memory accesses are controlled by splitting the guest physical address space into two halves. vTOM is the dividing line where the uppermost bit of the physical address space is set; e.g., with 47 bits of guest physical address space, vTOM is 0x400000000000 (bit 46 is set). Guest physical memory is accessible at two parallel physical addresses -- one below vTOM and one above vTOM. Accesses below vTOM are private (encrypted) while accesses above vTOM are shared (decrypted). In this sense, vTOM is like the GPA.SHARED bit in Intel TDX. Support for Hyper-V guests using vTOM was added to the Linux kernel in two patch sets[1][2]. This support treats the vTOM bit as part of the physical address. For accessing shared (decrypted) memory, these patch sets create a second kernel virtual mapping that maps to physical addresses above vTOM. A better approach is to treat the vTOM bit as a protection flag, not as part of the physical address. This new approach is like the approach for the GPA.SHARED bit in Intel TDX. Rather than creating a second kernel virtual mapping, the existing mapping is updated using recently added coco mechanisms. When memory is changed between private and shared using set_memory_decrypted() and set_memory_encrypted(), the PTEs for the existing kernel mapping are changed to add or remove the vTOM bit in the guest physical address, just as with TDX. The hypercalls to change the memory status on the host side are made using the existing callback mechanism. Everything just works, with a minor tweak to map the IO-APIC to use private accesses. To accomplish the switch in approach, the following must be done: * Update Hyper-V initialization to set the cc_mask based on vTOM and do other coco initialization. * Update physical_mask so the vTOM bit is no longer treated as part of the physical address * Remove CC_VENDOR_HYPERV and merge the associated vTOM functionality under CC_VENDOR_AMD. Update cc_mkenc() and cc_mkdec() to set/clear the vTOM bit as a protection flag. * Code already exists to make hypercalls to inform Hyper-V about pages changing between shared and private. Update this code to run as a callback from __set_memory_enc_pgtable(). * Remove the Hyper-V special case from __set_memory_enc_dec() * Remove the Hyper-V specific call to swiotlb_update_mem_attributes() since mem_encrypt_init() will now do it. * Add a Hyper-V specific implementation of the is_private_mmio() callback that returns true for the IO-APIC and vTPM MMIO addresses [1] https://lore.kernel.org/all/20211025122116.264793-1-ltykernel@gmail.com/ [2] https://lore.kernel.org/all/20211213071407.314309-1-ltykernel@gmail.com/ Signed-off-by: Michael Kelley <mikelley@microsoft.com> --- arch/x86/coco/core.c | 42 ++++++++++++++++++------ arch/x86/hyperv/hv_init.c | 11 ------- arch/x86/hyperv/ivm.c | 72 ++++++++++++++++++++++++++++++++++------- arch/x86/include/asm/coco.h | 1 - arch/x86/include/asm/mshyperv.h | 16 +++++---- arch/x86/kernel/cpu/mshyperv.c | 15 ++++----- arch/x86/mm/pat/set_memory.c | 3 -- drivers/hv/vmbus_drv.c | 1 - include/asm-generic/mshyperv.h | 2 ++ 9 files changed, 112 insertions(+), 51 deletions(-)