Message ID | 20210414144945.3460554-11-ltykernel@gmail.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | x86/Hyper-V: Add Hyper-V Isolation VM support | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page, > + unsigned long offset, size_t size, > + enum dma_data_direction dir, > + unsigned long attrs) > +{ > + phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset; > + > + if (!hv_is_isolation_supported()) > + return phys; > + > + map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir, > + attrs); > + if (map == (phys_addr_t)DMA_MAPPING_ERROR) > + return DMA_MAPPING_ERROR; > + > + return map; > +} This largerly duplicates what dma-direct + swiotlb does. Please use force_dma_unencrypted to force bounce buffering and just use the generic code. > + if (hv_isolation_type_snp()) { > + ret = hv_set_mem_host_visibility( > + phys_to_virt(hyperv_io_tlb_start), > + hyperv_io_tlb_size, > + VMBUS_PAGE_VISIBLE_READ_WRITE); > + if (ret) > + panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n", > + __func__, ret); > + > + hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start > + + ms_hyperv.shared_gpa_boundary, > + hyperv_io_tlb_size); > + if (!hyperv_io_tlb_remap) > + panic("%s: Fail to remap io tlb.\n", __func__); > + > + memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size); > + swiotlb_set_bounce_remap(hyperv_io_tlb_remap); And this really needs to go into a common hook where we currently just call set_memory_decrypted so that all the different schemes for these trusted VMs (we have about half a dozen now) can share code rather than reinventing it.
Hi Christoph and Konrad: Current Swiotlb bounce buffer uses a pool for all devices. There is a high overhead to get or free bounce buffer during performance test. Swiotlb code now use a global spin lock to protect bounce buffer data. Several device queues try to acquire the spin lock and this introduce additional overhead. For performance and security perspective, each devices should have a separate swiotlb bounce buffer pool and so this part needs to rework. I want to check this is right way to resolve performance issues with swiotlb bounce buffer. If you have some other suggestions,welcome. Thanks. On 4/14/2021 11:47 PM, Christoph Hellwig wrote: >> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page, >> + unsigned long offset, size_t size, >> + enum dma_data_direction dir, >> + unsigned long attrs) >> +{ >> + phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset; >> + >> + if (!hv_is_isolation_supported()) >> + return phys; >> + >> + map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir, >> + attrs); >> + if (map == (phys_addr_t)DMA_MAPPING_ERROR) >> + return DMA_MAPPING_ERROR; >> + >> + return map; >> +} > > This largerly duplicates what dma-direct + swiotlb does. Please use > force_dma_unencrypted to force bounce buffering and just use the generic > code. > >> + if (hv_isolation_type_snp()) { >> + ret = hv_set_mem_host_visibility( >> + phys_to_virt(hyperv_io_tlb_start), >> + hyperv_io_tlb_size, >> + VMBUS_PAGE_VISIBLE_READ_WRITE); >> + if (ret) >> + panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n", >> + __func__, ret); >> + >> + hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start >> + + ms_hyperv.shared_gpa_boundary, >> + hyperv_io_tlb_size); >> + if (!hyperv_io_tlb_remap) >> + panic("%s: Fail to remap io tlb.\n", __func__); >> + >> + memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size); >> + swiotlb_set_bounce_remap(hyperv_io_tlb_remap); > > And this really needs to go into a common hook where we currently just > call set_memory_decrypted so that all the different schemes for these > trusted VMs (we have about half a dozen now) can share code rather than > reinventing it. >
On 2021-05-12 17:01, Tianyu Lan wrote: > Hi Christoph and Konrad: > Current Swiotlb bounce buffer uses a pool for all devices. There > is a high overhead to get or free bounce buffer during performance test. > Swiotlb code now use a global spin lock to protect bounce buffer data. > Several device queues try to acquire the spin lock and this introduce > additional overhead. > > For performance and security perspective, each devices should have a > separate swiotlb bounce buffer pool and so this part needs to rework. > I want to check this is right way to resolve performance issues with > swiotlb bounce buffer. If you have some other suggestions,welcome. We're already well on the way to factoring out SWIOTLB to allow for just this sort of more flexible usage like per-device bounce pools - see here: https://lore.kernel.org/linux-iommu/20210510095026.3477496-1-tientzu@chromium.org/T/#t FWIW this looks to have an awful lot in common with what's going to be needed for Android's protected KVM and Arm's Confidential Compute Architecture, so we'll all be better off by doing it right. I'm getting the feeling that set_memory_decrypted() wants to grow into a more general abstraction that can return an alias at a different GPA if necessary. Robin. > > Thanks. > > On 4/14/2021 11:47 PM, Christoph Hellwig wrote: >>> +static dma_addr_t hyperv_map_page(struct device *dev, struct page >>> *page, >>> + unsigned long offset, size_t size, >>> + enum dma_data_direction dir, >>> + unsigned long attrs) >>> +{ >>> + phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset; >>> + >>> + if (!hv_is_isolation_supported()) >>> + return phys; >>> + >>> + map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, >>> dir, >>> + attrs); >>> + if (map == (phys_addr_t)DMA_MAPPING_ERROR) >>> + return DMA_MAPPING_ERROR; >>> + >>> + return map; >>> +} >> >> This largerly duplicates what dma-direct + swiotlb does. Please use >> force_dma_unencrypted to force bounce buffering and just use the generic >> code. >> >>> + if (hv_isolation_type_snp()) { >>> + ret = hv_set_mem_host_visibility( >>> + phys_to_virt(hyperv_io_tlb_start), >>> + hyperv_io_tlb_size, >>> + VMBUS_PAGE_VISIBLE_READ_WRITE); >>> + if (ret) >>> + panic("%s: Fail to mark Hyper-v swiotlb buffer visible >>> to host. err=%d\n", >>> + __func__, ret); >>> + >>> + hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start >>> + + ms_hyperv.shared_gpa_boundary, >>> + hyperv_io_tlb_size); >>> + if (!hyperv_io_tlb_remap) >>> + panic("%s: Fail to remap io tlb.\n", __func__); >>> + >>> + memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size); >>> + swiotlb_set_bounce_remap(hyperv_io_tlb_remap); >> >> And this really needs to go into a common hook where we currently just >> call set_memory_decrypted so that all the different schemes for these >> trusted VMs (we have about half a dozen now) can share code rather than >> reinventing it. >>
On 5/13/21 12:01 AM, Tianyu Lan wrote: > Hi Christoph and Konrad: > Current Swiotlb bounce buffer uses a pool for all devices. There > is a high overhead to get or free bounce buffer during performance test. > Swiotlb code now use a global spin lock to protect bounce buffer data. > Several device queues try to acquire the spin lock and this introduce > additional overhead. > > For performance and security perspective, each devices should have a > separate swiotlb bounce buffer pool and so this part needs to rework. > I want to check this is right way to resolve performance issues with > swiotlb bounce buffer. If you have some other suggestions,welcome. Is this what you want? https://lore.kernel.org/linux-iommu/20210510095026.3477496-1-tientzu@chromium.org/ Best regards, baolu
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c index c2cfa5e7c152..caaf68c06f24 100644 --- a/arch/x86/kernel/pci-swiotlb.c +++ b/arch/x86/kernel/pci-swiotlb.c @@ -15,6 +15,7 @@ #include <asm/iommu_table.h> int swiotlb __read_mostly; +extern int hyperv_swiotlb; /* * pci_swiotlb_detect_override - set swiotlb to 1 if necessary @@ -68,7 +69,7 @@ void __init pci_swiotlb_init(void) void __init pci_swiotlb_late_init(void) { /* An IOMMU turned us off. */ - if (!swiotlb) + if (!swiotlb && !hyperv_swiotlb) swiotlb_exit(); else { printk(KERN_INFO "PCI-DMA: " diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 10dce9f91216..0ee6ec3a5de6 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -23,6 +23,7 @@ #include <linux/cpu.h> #include <linux/sched/task_stack.h> +#include <linux/dma-map-ops.h> #include <linux/delay.h> #include <linux/notifier.h> #include <linux/ptrace.h> @@ -2030,6 +2031,7 @@ struct hv_device *vmbus_device_create(const guid_t *type, return child_device_obj; } +static u64 vmbus_dma_mask = DMA_BIT_MASK(64); /* * vmbus_device_register - Register the child device */ @@ -2070,6 +2072,7 @@ int vmbus_device_register(struct hv_device *child_device_obj) } hv_debug_add_dev_dir(child_device_obj); + child_device_obj->device.dma_mask = &vmbus_dma_mask; return 0; err_kset_unregister: diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c index e285a220c913..588ba847f0cc 100644 --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -13,19 +13,28 @@ #include <linux/irq.h> #include <linux/iommu.h> #include <linux/module.h> +#include <linux/hyperv.h> +#include <asm/io.h> #include <asm/apic.h> #include <asm/cpu.h> #include <asm/hw_irq.h> #include <asm/io_apic.h> +#include <asm/iommu.h> +#include <asm/iommu_table.h> #include <asm/irq_remapping.h> #include <asm/hypervisor.h> #include <asm/mshyperv.h> +#include <asm/swiotlb.h> +#include <linux/dma-map-ops.h> +#include <linux/dma-direct.h> #include "irq_remapping.h" #ifdef CONFIG_IRQ_REMAP +int hyperv_swiotlb __read_mostly; + /* * According 82093AA IO-APIC spec , IO APIC has a 24-entry Interrupt * Redirection Table. Hyper-V exposes one single IO-APIC and so define @@ -36,6 +45,10 @@ static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE }; static struct irq_domain *ioapic_ir_domain; +static unsigned long hyperv_io_tlb_start, *hyperv_io_tlb_end; +static unsigned long hyperv_io_tlb_nslabs, hyperv_io_tlb_size; +static void *hyperv_io_tlb_remap; + static int hyperv_ir_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force) { @@ -337,4 +350,118 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = { .free = hyperv_root_irq_remapping_free, }; +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + unsigned long attrs) +{ + phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset; + + if (!hv_is_isolation_supported()) + return phys; + + map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir, + attrs); + if (map == (phys_addr_t)DMA_MAPPING_ERROR) + return DMA_MAPPING_ERROR; + + return map; +} + +static void hyperv_unmap_page(struct device *dev, dma_addr_t dev_addr, + size_t size, enum dma_data_direction dir, unsigned long attrs) +{ + if (!hv_is_isolation_supported()) + return; + + swiotlb_tbl_unmap_single(dev, dev_addr, size, HV_HYP_PAGE_SIZE, dir, + attrs); +} + +int __init hyperv_swiotlb_init(void) +{ + unsigned long bytes; + void *vstart = 0; + + bytes = 200 * 1024 * 1024; + vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE); + hyperv_io_tlb_nslabs = bytes >> IO_TLB_SHIFT; + hyperv_io_tlb_size = bytes; + + if (!vstart) { + pr_warn("Fail to allocate swiotlb.\n"); + return -ENOMEM; + } + + hyperv_io_tlb_start = virt_to_phys(vstart); + if (!hyperv_io_tlb_start) + panic("%s: Failed to allocate %lu bytes align=0x%lx.\n", + __func__, PAGE_ALIGN(bytes), PAGE_SIZE); + + if (swiotlb_init_with_tbl(vstart, hyperv_io_tlb_nslabs, 1)) + panic("%s: Cannot allocate SWIOTLB buffer.\n", __func__); + + swiotlb_set_max_segment(PAGE_SIZE); + hyperv_io_tlb_end = hyperv_io_tlb_start + bytes; + return 0; +} + +const struct dma_map_ops hyperv_dma_ops = { + .map_page = hyperv_map_page, + .unmap_page = hyperv_unmap_page, +}; + +int __init hyperv_swiotlb_detect(void) +{ + dma_ops = &hyperv_dma_ops; + + if (hypervisor_is_type(X86_HYPER_MS_HYPERV) + && hv_is_isolation_supported()) { + /* + * Disable generic swiotlb and allocate Hyper-v swiotlb + * in the hyperv_iommu_swiotlb_init(). + */ + swiotlb = 0; + hyperv_swiotlb = 1; + + return 1; + } + + return 0; +} + +void __init hyperv_iommu_swiotlb_init(void) +{ + hyperv_swiotlb_init(); +} + +void __init hyperv_iommu_swiotlb_later_init(void) +{ + int ret; + + /* Mask bounce buffer visible to host and remap extra address. */ + if (hv_isolation_type_snp()) { + ret = hv_set_mem_host_visibility( + phys_to_virt(hyperv_io_tlb_start), + hyperv_io_tlb_size, + VMBUS_PAGE_VISIBLE_READ_WRITE); + if (ret) + panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n", + __func__, ret); + + hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start + + ms_hyperv.shared_gpa_boundary, + hyperv_io_tlb_size); + if (!hyperv_io_tlb_remap) + panic("%s: Fail to remap io tlb.\n", __func__); + + memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size); + swiotlb_set_bounce_remap(hyperv_io_tlb_remap); + } +} + +IOMMU_INIT_FINISH(hyperv_swiotlb_detect, + NULL, hyperv_iommu_swiotlb_init, + hyperv_iommu_swiotlb_later_init); + #endif