Message ID | 20211213071407.314309-2-ltykernel@gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Series | x86/Hyper-V: Add Hyper-V Isolation VM support(Second part) |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On 12/12/21 11:14 PM, Tianyu Lan wrote:
> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
> extra address space which is above shared_gpa_boundary (E.G 39 bit
> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
> physical address will be original physical address + shared_gpa_boundary.
> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
> memory(vTOM). Memory addresses below vTOM are automatically treated as
> private while memory above vTOM is treated as shared.

This seems to be independently reintroducing some of the SEV
infrastructure. Is it really OK that this doesn't interact at all with
any existing SEV code?

For instance, do we need a new 'swiotlb_unencrypted_base', or should
this just be using sme_me_mask somehow?
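
The address arithmetic in the quoted patch description can be illustrated with a small userspace sketch. It assumes the 39-bit boundary given as an example in the description; `shared_alias` is a hypothetical helper written for this illustration and is not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): given a guest physical address
 * in the private range and the shared_gpa_boundary reported by the
 * hypervisor, return the address through which the same memory would be
 * accessed as shared, i.e. original address + shared_gpa_boundary.
 */
static uint64_t shared_alias(uint64_t paddr, uint64_t shared_gpa_boundary)
{
	/* In this model the original address must lie below the boundary. */
	assert(paddr < shared_gpa_boundary);
	return paddr + shared_gpa_boundary;
}
```

With a boundary of `1ULL << 39`, a buffer at physical address 0x1000 would be reached through 0x8000001000 under this model.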
On 12/14/2021 12:45 AM, Dave Hansen wrote:
> On 12/12/21 11:14 PM, Tianyu Lan wrote:
>> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
>> extra address space which is above shared_gpa_boundary (E.G 39 bit
>> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
>> physical address will be original physical address + shared_gpa_boundary.
>> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
>> memory(vTOM). Memory addresses below vTOM are automatically treated as
>> private while memory above vTOM is treated as shared.
>
> This seems to be independently reintroducing some of the SEV
> infrastructure. Is it really OK that this doesn't interact at all with
> any existing SEV code?
>
> For instance, do we need a new 'swiotlb_unencrypted_base', or should
> this just be using sme_me_mask somehow?

Hi Dave:
Thanks for your review. Hyper-V provides a para-virtualized
confidential computing solution based on the AMD SEV function and does
not expose SEV and SME capabilities to the guest, so sme_me_mask is
unset in the Hyper-V Isolation VM. swiotlb_unencrypted_base is a more
general solution for handling the case of different address spaces for
encrypted and decrypted memory, and other platforms may also reuse it.
On 12/13/21 8:36 PM, Tianyu Lan wrote:
> On 12/14/2021 12:45 AM, Dave Hansen wrote:
>> On 12/12/21 11:14 PM, Tianyu Lan wrote:
>>> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
>>> extra address space which is above shared_gpa_boundary (E.G 39 bit
>>> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
>>> physical address will be original physical address +
>>> shared_gpa_boundary.
>>> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
>>> memory(vTOM). Memory addresses below vTOM are automatically treated as
>>> private while memory above vTOM is treated as shared.
>>
>> This seems to be independently reintroducing some of the SEV
>> infrastructure. Is it really OK that this doesn't interact at all with
>> any existing SEV code?
>>
>> For instance, do we need a new 'swiotlb_unencrypted_base', or should
>> this just be using sme_me_mask somehow?
>
> Thanks for your review. Hyper-V provides a para-virtualized
> confidential computing solution based on the AMD SEV function and does
> not expose SEV and SME capabilities to the guest, so sme_me_mask is
> unset in the Hyper-V Isolation VM. swiotlb_unencrypted_base is a more
> general solution for handling the case of different address spaces for
> encrypted and decrypted memory, and other platforms may also reuse it.

I don't really understand how this can be more general and *not* get
utilized by the existing SEV support.
On 12/14/21 12:40 PM, Dave Hansen wrote:
> On 12/13/21 8:36 PM, Tianyu Lan wrote:
>> On 12/14/2021 12:45 AM, Dave Hansen wrote:
>>> On 12/12/21 11:14 PM, Tianyu Lan wrote:
>>>> In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
>>>> extra address space which is above shared_gpa_boundary (E.G 39 bit
>>>> address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
>>>> physical address will be original physical address +
>>>> shared_gpa_boundary.
>>>> The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
>>>> memory(vTOM). Memory addresses below vTOM are automatically treated as
>>>> private while memory above vTOM is treated as shared.
>>>
>>> This seems to be independently reintroducing some of the SEV
>>> infrastructure. Is it really OK that this doesn't interact at all with
>>> any existing SEV code?
>>>
>>> For instance, do we need a new 'swiotlb_unencrypted_base', or should
>>> this just be using sme_me_mask somehow?
>>
>> Thanks for your review. Hyper-V provides a para-virtualized
>> confidential computing solution based on the AMD SEV function and does
>> not expose SEV and SME capabilities to the guest, so sme_me_mask is
>> unset in the Hyper-V Isolation VM. swiotlb_unencrypted_base is a more
>> general solution for handling the case of different address spaces for
>> encrypted and decrypted memory, and other platforms may also reuse it.
>
> I don't really understand how this can be more general and *not* get
> utilized by the existing SEV support.

The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is
meant to be used with a (relatively) un-enlightened guest. The idea is
that the C-bit in the guest page tables must be 0 for all accesses. It
is only the physical address relative to VTOM that determines if the
access is encrypted or not. So setting sme_me_mask will actually cause
issues when running with this feature. Since all DMA for an SEV-SNP
guest must still be to shared (unencrypted) memory, some enlightenment
is needed. In this case, memory mapped above VTOM will provide that via
the SWIOTLB update. For SEV-SNP guests running with VTOM, they are
likely to also be running with the Reflect #VC feature, allowing a
"paravisor" to handle any #VCs generated by the guest.

See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC"
in volume 2 of the AMD APM [1].

I'm not sure if that will answer your question or generate more :)

Thanks,
Tom

[1] https://www.amd.com/system/files/TechDocs/24593.pdf
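
As a rough illustration of the point above, the following userspace sketch models how an access would be classified when only the guest physical address relative to vTOM matters and the page-table C-bit plays no part. `gpa_is_shared` is a hypothetical name used only here, and treating an address exactly equal to vTOM as shared is an assumption of this sketch; the AMD APM sections cited above are the authority on the boundary case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model (not kernel code): with vTOM enabled, the page-table
 * C-bit stays 0 and only the guest physical address decides whether an
 * access is private (encrypted) or shared (unencrypted).
 */
static bool gpa_is_shared(uint64_t gpa, uint64_t vtom)
{
	/* Model assumption: a nonzero boundary has been configured. */
	assert(vtom != 0);
	/* Assumption: the boundary address itself counts as shared. */
	return gpa >= vtom;
}
```

Under this model a DMA bounce buffer has to be reached through an address for which `gpa_is_shared` returns true, which is what mapping the SWIOTLB pool above vTOM would provide.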
On 12/14/21 2:23 PM, Tom Lendacky wrote:
>> I don't really understand how this can be more general and *not* get
>> utilized by the existing SEV support.
>
> The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is
> meant to be used with a (relatively) un-enlightened guest. The idea is
> that the C-bit in the guest page tables must be 0 for all accesses. It
> is only the physical address relative to VTOM that determines if the
> access is encrypted or not. So setting sme_me_mask will actually cause
> issues when running with this feature. Since all DMA for an SEV-SNP
> guest must still be to shared (unencrypted) memory, some enlightenment
> is needed. In this case, memory mapped above VTOM will provide that via
> the SWIOTLB update. For SEV-SNP guests running with VTOM, they are
> likely to also be running with the Reflect #VC feature, allowing a
> "paravisor" to handle any #VCs generated by the guest.
>
> See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC"
> in volume 2 of the AMD APM [1].

Thanks, Tom, that's pretty much what I was looking for.

The C-bit normally comes from the page tables. But, the hardware also
provides an alternative way to effectively get C-bit behavior without
actually setting the bit in the page tables: Virtual Top-of-Memory
(VTOM). Right?

It sounds like Hyper-V has chosen to use VTOM instead of requiring the
guest to do the C-bit in its page tables.

But, the thing that confuses me is when you said: "it (VTOM) is meant to
be used with a (relatively) un-enlightened guest". We don't have an
unenlightened guest here. We have Linux, which is quite enlightened.

Is VTOM being used because there's something that completely rules out
using the C-bit in the page tables? What's that "something"?
On 12/15/2021 6:40 AM, Dave Hansen wrote:
> On 12/14/21 2:23 PM, Tom Lendacky wrote:
>>> I don't really understand how this can be more general and *not* get
>>> utilized by the existing SEV support.
>>
>> The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is
>> meant to be used with a (relatively) un-enlightened guest. The idea is
>> that the C-bit in the guest page tables must be 0 for all accesses. It
>> is only the physical address relative to VTOM that determines if the
>> access is encrypted or not. So setting sme_me_mask will actually cause
>> issues when running with this feature. Since all DMA for an SEV-SNP
>> guest must still be to shared (unencrypted) memory, some enlightenment
>> is needed. In this case, memory mapped above VTOM will provide that via
>> the SWIOTLB update. For SEV-SNP guests running with VTOM, they are
>> likely to also be running with the Reflect #VC feature, allowing a
>> "paravisor" to handle any #VCs generated by the guest.
>>
>> See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC"
>> in volume 2 of the AMD APM [1].
>
> Thanks, Tom, that's pretty much what I was looking for.
>
> The C-bit normally comes from the page tables. But, the hardware also
> provides an alternative way to effectively get C-bit behavior without
> actually setting the bit in the page tables: Virtual Top-of-Memory
> (VTOM). Right?
>
> It sounds like Hyper-V has chosen to use VTOM instead of requiring the
> guest to do the C-bit in its page tables.
>
> But, the thing that confuses me is when you said: "it (VTOM) is meant to
> be used with a (relatively) un-enlightened guest". We don't have an
> unenlightened guest here. We have Linux, which is quite enlightened.
>
> Is VTOM being used because there's something that completely rules out
> using the C-bit in the page tables? What's that "something"?

For an "un-enlightened" guest, there is another system running inside
the VM to emulate some functions (TSC, timer, interrupts and so on),
which avoids having to modify the OS (Linux/Windows) heavily. In a
Hyper-V Isolation VM, we call this system the HCL/paravisor. The HCL
runs in VMPL0 and Linux runs in VMPL2. This is similar to nested
virtualization: the HCL plays a role similar to an L1 hypervisor,
emulating general functions (e.g. rdmsr/wrmsr accesses and interrupt
injection) that would otherwise need to be enlightened in an
enlightened guest. In an enlightened guest, the Linux kernel needs to
handle #VC/#VE exceptions directly. In an un-enlightened guest, the HCL
handles such exceptions and emulates interrupt injection, which avoids
modifying core OS code. Using vTOM serves the same purpose: Hyper-V
uses vTOM to avoid changing page-table-related code in the OS (both
Windows and Linux), so that only driver code needs to map memory into
the decrypted address space above vTOM.

Linux has a generic swiotlb bounce buffer implementation, so this patch
introduces swiotlb_unencrypted_base to set the shared memory boundary,
or vTOM. The Hyper-V Isolation VM is an un-enlightened guest. Hyper-V
doesn't expose the SEV/SME capability to the guest, so the SEV code
doesn't actually work there. We therefore can't interact with the
existing SEV code, which is for enlightened guest support without an
HCL/paravisor. If other platforms or SEV want to use a similar vTOM
feature, swiotlb_unencrypted_base can be reused. So
swiotlb_unencrypted_base is a general solution for all platforms, not
only SEV and Hyper-V.
On Wed, Dec 15, 2021 at 01:00:38PM +0800, Tianyu Lan wrote:
> On 12/15/2021 6:40 AM, Dave Hansen wrote:
> > On 12/14/21 2:23 PM, Tom Lendacky wrote:
> > > > I don't really understand how this can be more general and *not* get
> > > > utilized by the existing SEV support.
> > >
> > > The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is
> > > meant to be used with a (relatively) un-enlightened guest. The idea is
> > > that the C-bit in the guest page tables must be 0 for all accesses. It
> > > is only the physical address relative to VTOM that determines if the
> > > access is encrypted or not. So setting sme_me_mask will actually cause
> > > issues when running with this feature. Since all DMA for an SEV-SNP
> > > guest must still be to shared (unencrypted) memory, some enlightenment
> > > is needed. In this case, memory mapped above VTOM will provide that via
> > > the SWIOTLB update. For SEV-SNP guests running with VTOM, they are
> > > likely to also be running with the Reflect #VC feature, allowing a
> > > "paravisor" to handle any #VCs generated by the guest.
> > >
> > > See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC"
> > > in volume 2 of the AMD APM [1].
> >
> > Thanks, Tom, that's pretty much what I was looking for.
> >
> > The C-bit normally comes from the page tables. But, the hardware also
> > provides an alternative way to effectively get C-bit behavior without
> > actually setting the bit in the page tables: Virtual Top-of-Memory
> > (VTOM). Right?
> >
> > It sounds like Hyper-V has chosen to use VTOM instead of requiring the
> > guest to do the C-bit in its page tables.
> >
> > But, the thing that confuses me is when you said: "it (VTOM) is meant to
> > be used with a (relatively) un-enlightened guest". We don't have an
> > unenlightened guest here. We have Linux, which is quite enlightened.
> >
> > Is VTOM being used because there's something that completely rules out
> > using the C-bit in the page tables? What's that "something"?
>
> For an "un-enlightened" guest, there is another system running inside
> the VM to emulate some functions (TSC, timer, interrupts and so on),
> which avoids having to modify the OS (Linux/Windows) heavily. In a
> Hyper-V Isolation VM, we call this system the HCL/paravisor. The HCL
> runs in VMPL0 and Linux runs in VMPL2. This is similar to nested
> virtualization: the HCL plays a role similar to an L1 hypervisor,
> emulating general functions (e.g. rdmsr/wrmsr accesses and interrupt
> injection) that would otherwise need to be enlightened in an
> enlightened guest. In an enlightened guest, the Linux kernel needs to
> handle #VC/#VE exceptions directly. In an un-enlightened guest, the HCL
> handles such exceptions and emulates interrupt injection, which avoids
> modifying core OS code. Using vTOM serves the same purpose: Hyper-V
> uses vTOM to avoid changing page-table-related code in the OS (both
> Windows and Linux), so that only driver code needs to map memory into
> the decrypted address space above vTOM.
>
> Linux has a generic swiotlb bounce buffer implementation, so this patch
> introduces swiotlb_unencrypted_base to set the shared memory boundary,
> or vTOM. The Hyper-V Isolation VM is an un-enlightened guest. Hyper-V
> doesn't expose the SEV/SME capability to the guest, so the SEV code
> doesn't actually work there. We therefore can't interact with the
> existing SEV code, which is for enlightened guest support without an
> HCL/paravisor. If other platforms or SEV want to use a similar vTOM
> feature, swiotlb_unencrypted_base can be reused. So
> swiotlb_unencrypted_base is a general solution for all platforms, not
> only SEV and Hyper-V.

Thanks for the detailed explanation.

Dave, are you happy with this?

The code looks pretty solid to my untrained eyes. And the series has
collected the necessary acks from stakeholders. If I don't hear any
objection by EOD Friday I will apply this series to hyperv-next.

Wei.
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:	The end address of the swiotlb memory pool. Used to do a quick
  *		range check to see if the memory was in fact allocated by this
  *		API.
+ * @vaddr:	The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ *		may be remapped in the memory encrypted case and store virtual
+ *		address for bounce buffer operation.
  * @nslabs:	The number of IO TLB blocks (in groups of 64) between @start and
  *		@end. For default swiotlb, this is command line adjustable via
  *		setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
 	phys_addr_t start;
 	phys_addr_t end;
+	void *vaddr;
 	unsigned long nslabs;
 	unsigned long used;
 	unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include <asm/io.h>
 #include <asm/dma.h>
 
+#include <linux/io.h>
 #include <linux/init.h>
 #include <linux/memblock.h>
 #include <linux/iommu-helper.h>
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
 	return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swioltb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+	void *vaddr = NULL;
+
+	if (swiotlb_unencrypted_base) {
+		phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+		vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+		if (!vaddr)
+			pr_err("Failed to map the unencrypted memory %llx size %lx.\n",
+			       paddr, bytes);
+	}
+
+	return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations. This function allows the architecture to
@@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void)
 	vaddr = phys_to_virt(mem->start);
 	bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
 	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-	memset(vaddr, 0, bytes);
+
+	mem->vaddr = swiotlb_mem_remap(mem, bytes);
+	if (!mem->vaddr)
+		mem->vaddr = vaddr;
+
+	memset(mem->vaddr, 0, bytes);
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
 		mem->slots[i].alloc_size = 0;
 	}
+
+	/*
+	 * If swiotlb_unencrypted_base is set, the bounce buffer memory will
+	 * be remapped and cleared in swiotlb_update_mem_attributes.
+	 */
+	if (swiotlb_unencrypted_base)
+		return;
+
 	memset(vaddr, 0, bytes);
+	mem->vaddr = vaddr;
+	return;
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
@@ -371,7 +410,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
 	size_t alloc_size = mem->slots[index].alloc_size;
 	unsigned long pfn = PFN_DOWN(orig_addr);
-	unsigned char *vaddr = phys_to_virt(tlb_addr);
+	unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
 	unsigned int tlb_offset, orig_addr_offset;
 
 	if (orig_addr == INVALID_PHYS_ADDR)
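
The two behavioural changes in the patch (falling back to the direct mapping when the remap yields nothing, and computing a slot's CPU address as an offset from the pool's stored mapping) can be modelled in userspace roughly as follows. This is only a sketch: `struct pool_model`, `pick_mapping` and `slot_vaddr` are hypothetical names invented here, plain arrays stand in for mapped memory, and none of this is the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the two struct io_tlb_mem fields used here. */
struct pool_model {
	uint64_t start;		/* physical start address of the pool */
	unsigned char *vaddr;	/* CPU mapping used for bounce copies */
};

/*
 * Mirrors the fallback in swiotlb_update_mem_attributes(): prefer the
 * remapped (unencrypted) mapping, otherwise keep the direct mapping.
 */
static unsigned char *pick_mapping(unsigned char *remapped,
				   unsigned char *direct)
{
	return remapped ? remapped : direct;
}

/*
 * Mirrors the new expression in swiotlb_bounce(): the slot's CPU address
 * is the pool mapping plus the slot's physical offset into the pool.
 */
static unsigned char *slot_vaddr(const struct pool_model *mem,
				 uint64_t tlb_addr)
{
	assert(tlb_addr >= mem->start);	/* slot must belong to the pool */
	return mem->vaddr + (size_t)(tlb_addr - mem->start);
}
```

Because the offset is taken relative to `mem->start`, the same expression works whether `vaddr` holds the direct mapping or the remapped one, which appears to be why the patch stores the mapping in the pool instead of calling `phys_to_virt()` on each bounce.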