[Resend,RFC,V2,10/12] HV/IOMMU: Add Hyper-V dma ops support

Message ID 20210414144945.3460554-11-ltykernel@gmail.com (mailing list archive)
State RFC
Series x86/Hyper-V: Add Hyper-V Isolation VM support

Commit Message

Tianyu Lan April 14, 2021, 2:49 p.m. UTC
From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V Isolation VMs require bounce buffer support. To use the swiotlb
bounce buffer, add Hyper-V dma ops and use the swiotlb functions in the
map and unmap callbacks.

Allocate the bounce buffer in the Hyper-V code because the bounce buffer
needs to be accessed via an extra address space (e.g. addresses above the
39-bit boundary) in AMD SEV-SNP based Isolation VMs.

ioremap_cache() can't be used in hyperv_iommu_swiotlb_init(), which runs
too early, so remap the bounce buffer in hyperv_iommu_swiotlb_later_init()
instead.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/kernel/pci-swiotlb.c |   3 +-
 drivers/hv/vmbus_drv.c        |   3 +
 drivers/iommu/hyperv-iommu.c  | 127 ++++++++++++++++++++++++++++++++++
 3 files changed, 132 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig April 14, 2021, 3:47 p.m. UTC | #1
> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
> +				  unsigned long offset, size_t size,
> +				  enum dma_data_direction dir,
> +				  unsigned long attrs)
> +{
> +	phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
> +
> +	if (!hv_is_isolation_supported())
> +		return phys;
> +
> +	map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
> +				     attrs);
> +	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
> +		return DMA_MAPPING_ERROR;
> +
> +	return map;
> +}

This largely duplicates what dma-direct + swiotlb does.  Please use
force_dma_unencrypted to force bounce buffering and just use the generic
code.
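
For readers less familiar with the generic path, here is a purely
illustrative sketch of the direction being suggested (the helper name is
invented; this is not the in-tree dma-direct code, which at the time keyed
off swiotlb_force and dma_capable()): let dma-direct decide when to bounce,
driven by force_dma_unencrypted(), so no Hyper-V-private map_page callback
is needed.

	/*
	 * Sketch only: example_direct_map_page() is a made-up name.
	 * swiotlb_map(), force_dma_unencrypted(), phys_to_dma() and
	 * page_to_phys() are existing interfaces of this kernel era.
	 */
	static dma_addr_t example_direct_map_page(struct device *dev,
			struct page *page, unsigned long offset, size_t size,
			enum dma_data_direction dir, unsigned long attrs)
	{
		phys_addr_t phys = page_to_phys(page) + offset;

		/* Bounce whenever guest memory cannot be exposed as-is. */
		if (unlikely(swiotlb_force == SWIOTLB_FORCE) ||
		    force_dma_unencrypted(dev))
			return swiotlb_map(dev, phys, size, dir, attrs);

		return phys_to_dma(dev, phys);
	}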

> +	if (hv_isolation_type_snp()) {
> +		ret = hv_set_mem_host_visibility(
> +				phys_to_virt(hyperv_io_tlb_start),
> +				hyperv_io_tlb_size,
> +				VMBUS_PAGE_VISIBLE_READ_WRITE);
> +		if (ret)
> +			panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
> +			      __func__, ret);
> +
> +		hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
> +					    + ms_hyperv.shared_gpa_boundary,
> +						    hyperv_io_tlb_size);
> +		if (!hyperv_io_tlb_remap)
> +			panic("%s: Fail to remap io tlb.\n", __func__);
> +
> +		memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
> +		swiotlb_set_bounce_remap(hyperv_io_tlb_remap);

And this really needs to go into a common hook where we currently just
call set_memory_decrypted so that all the different schemes for these
trusted VMs (we have about half a dozen now) can share code rather than
reinventing it.
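
As an illustration of what such a common hook buys (a sketch assuming
set_memory_decrypted(), or a successor of it, grows to cover both the
visibility hypercall and any alias remapping; the function name below is
invented), the Hyper-V swiotlb setup would then reduce to the same call
every other scheme already makes:

	/* Sketch only; hyperv_io_tlb_start/_size are the variables from the patch below. */
	static int __init example_swiotlb_make_shared(void)
	{
		return set_memory_decrypted(
				(unsigned long)phys_to_virt(hyperv_io_tlb_start),
				hyperv_io_tlb_size >> PAGE_SHIFT);
	}
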
Tianyu Lan May 12, 2021, 4:01 p.m. UTC | #2
Hi Christoph and Konrad:
      The current swiotlb bounce buffer implementation uses a single pool
for all devices. There is high overhead in getting or freeing a bounce
buffer during performance tests. The swiotlb code uses a global spin lock
to protect the bounce buffer data, and several device queues contending
for that spin lock introduce additional overhead.

From both a performance and a security perspective, each device should
have a separate swiotlb bounce buffer pool, so this part needs rework.
I want to check whether this is the right way to resolve the performance
issues with the swiotlb bounce buffer. If you have other suggestions,
they are welcome.
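
A rough data-structure sketch of the per-device pool idea (all names
below are illustrative, not proposed code): each device carries its own
slot state and lock, so queues on different devices stop contending on
the single global swiotlb spin lock.

	struct example_io_tlb_pool {
		phys_addr_t	start;		/* bounce memory backing this device */
		phys_addr_t	end;
		unsigned long	nslabs;
		unsigned long	*bitmap;	/* free/used slot tracking */
		spinlock_t	lock;		/* per device, replaces the global lock */
	};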

Thanks.

On 4/14/2021 11:47 PM, Christoph Hellwig wrote:
>> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
>> +				  unsigned long offset, size_t size,
>> +				  enum dma_data_direction dir,
>> +				  unsigned long attrs)
>> +{
>> +	phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
>> +
>> +	if (!hv_is_isolation_supported())
>> +		return phys;
>> +
>> +	map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
>> +				     attrs);
>> +	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
>> +		return DMA_MAPPING_ERROR;
>> +
>> +	return map;
>> +}
> 
> This largely duplicates what dma-direct + swiotlb does.  Please use
> force_dma_unencrypted to force bounce buffering and just use the generic
> code.
> 
>> +	if (hv_isolation_type_snp()) {
>> +		ret = hv_set_mem_host_visibility(
>> +				phys_to_virt(hyperv_io_tlb_start),
>> +				hyperv_io_tlb_size,
>> +				VMBUS_PAGE_VISIBLE_READ_WRITE);
>> +		if (ret)
>> +			panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
>> +			      __func__, ret);
>> +
>> +		hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
>> +					    + ms_hyperv.shared_gpa_boundary,
>> +						    hyperv_io_tlb_size);
>> +		if (!hyperv_io_tlb_remap)
>> +			panic("%s: Fail to remap io tlb.\n", __func__);
>> +
>> +		memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
>> +		swiotlb_set_bounce_remap(hyperv_io_tlb_remap);
> 
> And this really needs to go into a common hook where we currently just
> call set_memory_decrypted so that all the different schemes for these
> trusted VMs (we have about half a dozen now) can share code rather than
> reinventing it.
>
Robin Murphy May 12, 2021, 5:29 p.m. UTC | #3
On 2021-05-12 17:01, Tianyu Lan wrote:
> Hi Christoph and Konrad:
>       The current swiotlb bounce buffer implementation uses a single pool
> for all devices. There is high overhead in getting or freeing a bounce
> buffer during performance tests. The swiotlb code uses a global spin lock
> to protect the bounce buffer data, and several device queues contending
> for that spin lock introduce additional overhead.
> 
> From both a performance and a security perspective, each device should
> have a separate swiotlb bounce buffer pool, so this part needs rework.
> I want to check whether this is the right way to resolve the performance
> issues with the swiotlb bounce buffer. If you have other suggestions,
> they are welcome.

We're already well on the way to factoring out SWIOTLB to allow for just 
this sort of more flexible usage like per-device bounce pools - see here:

https://lore.kernel.org/linux-iommu/20210510095026.3477496-1-tientzu@chromium.org/T/#t

FWIW this looks to have an awful lot in common with what's going to be 
needed for Android's protected KVM and Arm's Confidential Compute 
Architecture, so we'll all be better off by doing it right. I'm getting 
the feeling that set_memory_decrypted() wants to grow into a more 
general abstraction that can return an alias at a different GPA if 
necessary.
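
Purely as a thought experiment (names invented, not a proposed
interface), such an abstraction might look like a decrypt-and-share
helper that can hand back an alias mapping when the shared view lives at
a different GPA, e.g. above ms_hyperv.shared_gpa_boundary on Hyper-V:

	/* Sketch only: example_set_memory_shared() is a made-up name;
	 * __iomem annotations elided for brevity. */
	static void *example_set_memory_shared(void *vaddr, unsigned long size)
	{
		if (set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT))
			return NULL;

		if (hv_isolation_type_snp())
			return ioremap_cache(virt_to_phys(vaddr) +
					     ms_hyperv.shared_gpa_boundary, size);

		return vaddr;	/* shared view is the direct mapping itself */
	}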

Robin.

> 
> Thanks.
> 
> On 4/14/2021 11:47 PM, Christoph Hellwig wrote:
>>> +static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
>>> +                  unsigned long offset, size_t size,
>>> +                  enum dma_data_direction dir,
>>> +                  unsigned long attrs)
>>> +{
>>> +    phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
>>> +
>>> +    if (!hv_is_isolation_supported())
>>> +        return phys;
>>> +
>>> +    map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
>>> +                     attrs);
>>> +    if (map == (phys_addr_t)DMA_MAPPING_ERROR)
>>> +        return DMA_MAPPING_ERROR;
>>> +
>>> +    return map;
>>> +}
>>
>> This largely duplicates what dma-direct + swiotlb does.  Please use
>> force_dma_unencrypted to force bounce buffering and just use the generic
>> code.
>>
>>> +    if (hv_isolation_type_snp()) {
>>> +        ret = hv_set_mem_host_visibility(
>>> +                phys_to_virt(hyperv_io_tlb_start),
>>> +                hyperv_io_tlb_size,
>>> +                VMBUS_PAGE_VISIBLE_READ_WRITE);
>>> +        if (ret)
>>> +            panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
>>> +                  __func__, ret);
>>> +
>>> +        hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
>>> +                        + ms_hyperv.shared_gpa_boundary,
>>> +                            hyperv_io_tlb_size);
>>> +        if (!hyperv_io_tlb_remap)
>>> +            panic("%s: Fail to remap io tlb.\n", __func__);
>>> +
>>> +        memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
>>> +        swiotlb_set_bounce_remap(hyperv_io_tlb_remap);
>>
>> And this really needs to go into a common hook where we currently just
>> call set_memory_decrypted so that all the different schemes for these
>> trusted VMs (we have about half a dozen now) can share code rather than
>> reinventing it.
>>
Baolu Lu May 13, 2021, 3:19 a.m. UTC | #4
On 5/13/21 12:01 AM, Tianyu Lan wrote:
> Hi Christoph and Konrad:
>       The current swiotlb bounce buffer implementation uses a single pool
> for all devices. There is high overhead in getting or freeing a bounce
> buffer during performance tests. The swiotlb code uses a global spin lock
> to protect the bounce buffer data, and several device queues contending
> for that spin lock introduce additional overhead.
> 
> From both a performance and a security perspective, each device should
> have a separate swiotlb bounce buffer pool, so this part needs rework.
> I want to check whether this is the right way to resolve the performance
> issues with the swiotlb bounce buffer. If you have other suggestions,
> they are welcome.

Is this what you want?

https://lore.kernel.org/linux-iommu/20210510095026.3477496-1-tientzu@chromium.org/

Best regards,
baolu

Patch

diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index c2cfa5e7c152..caaf68c06f24 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -15,6 +15,7 @@ 
 #include <asm/iommu_table.h>
 
 int swiotlb __read_mostly;
+extern int hyperv_swiotlb;
 
 /*
  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
@@ -68,7 +69,7 @@  void __init pci_swiotlb_init(void)
 void __init pci_swiotlb_late_init(void)
 {
 	/* An IOMMU turned us off. */
-	if (!swiotlb)
+	if (!swiotlb && !hyperv_swiotlb)
 		swiotlb_exit();
 	else {
 		printk(KERN_INFO "PCI-DMA: "
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 10dce9f91216..0ee6ec3a5de6 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@ 
 #include <linux/cpu.h>
 #include <linux/sched/task_stack.h>
 
+#include <linux/dma-map-ops.h>
 #include <linux/delay.h>
 #include <linux/notifier.h>
 #include <linux/ptrace.h>
@@ -2030,6 +2031,7 @@  struct hv_device *vmbus_device_create(const guid_t *type,
 	return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2070,6 +2072,7 @@  int vmbus_device_register(struct hv_device *child_device_obj)
 	}
 	hv_debug_add_dev_dir(child_device_obj);
 
+	child_device_obj->device.dma_mask = &vmbus_dma_mask;
 	return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..588ba847f0cc 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,19 +13,28 @@ 
 #include <linux/irq.h>
 #include <linux/iommu.h>
 #include <linux/module.h>
+#include <linux/hyperv.h>
 
+#include <asm/io.h>
 #include <asm/apic.h>
 #include <asm/cpu.h>
 #include <asm/hw_irq.h>
 #include <asm/io_apic.h>
+#include <asm/iommu.h>
+#include <asm/iommu_table.h>
 #include <asm/irq_remapping.h>
 #include <asm/hypervisor.h>
 #include <asm/mshyperv.h>
+#include <asm/swiotlb.h>
+#include <linux/dma-map-ops.h>
+#include <linux/dma-direct.h>
 
 #include "irq_remapping.h"
 
 #ifdef CONFIG_IRQ_REMAP
 
+int hyperv_swiotlb __read_mostly;
+
 /*
  * According 82093AA IO-APIC spec , IO APIC has a 24-entry Interrupt
  * Redirection Table. Hyper-V exposes one single IO-APIC and so define
@@ -36,6 +45,10 @@ 
 static cpumask_t ioapic_max_cpumask = { CPU_BITS_NONE };
 static struct irq_domain *ioapic_ir_domain;
 
+static unsigned long hyperv_io_tlb_start, *hyperv_io_tlb_end; 
+static unsigned long hyperv_io_tlb_nslabs, hyperv_io_tlb_size;
+static void *hyperv_io_tlb_remap;
+
 static int hyperv_ir_set_affinity(struct irq_data *data,
 		const struct cpumask *mask, bool force)
 {
@@ -337,4 +350,118 @@  static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
 	.free = hyperv_root_irq_remapping_free,
 };
 
+static dma_addr_t hyperv_map_page(struct device *dev, struct page *page,
+				  unsigned long offset, size_t size,
+				  enum dma_data_direction dir,
+				  unsigned long attrs)
+{
+	phys_addr_t map, phys = (page_to_pfn(page) << PAGE_SHIFT) + offset;
+
+	if (!hv_is_isolation_supported())
+		return phys;
+
+	map = swiotlb_tbl_map_single(dev, phys, size, HV_HYP_PAGE_SIZE, dir,
+				     attrs);
+	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
+		return DMA_MAPPING_ERROR;
+
+	return map;
+}
+
+static void hyperv_unmap_page(struct device *dev, dma_addr_t dev_addr,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	if (!hv_is_isolation_supported())
+		return;
+
+	swiotlb_tbl_unmap_single(dev, dev_addr, size, HV_HYP_PAGE_SIZE, dir,
+				attrs);
+}
+
+int __init hyperv_swiotlb_init(void)
+{
+	unsigned long bytes;
+	void *vstart = 0;
+
+	bytes = 200 * 1024 * 1024;
+	vstart = memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
+	hyperv_io_tlb_nslabs = bytes >> IO_TLB_SHIFT;
+	hyperv_io_tlb_size = bytes;
+
+	if (!vstart) {
+		pr_warn("Fail to allocate swiotlb.\n");
+		return -ENOMEM;
+	}
+
+	hyperv_io_tlb_start = virt_to_phys(vstart);
+	if (!hyperv_io_tlb_start)
+		panic("%s: Failed to allocate %lu bytes align=0x%lx.\n",
+		      __func__, PAGE_ALIGN(bytes), PAGE_SIZE);
+
+	if (swiotlb_init_with_tbl(vstart, hyperv_io_tlb_nslabs, 1))
+		panic("%s: Cannot allocate SWIOTLB buffer.\n", __func__);
+
+	swiotlb_set_max_segment(PAGE_SIZE);
+	hyperv_io_tlb_end = hyperv_io_tlb_start + bytes;
+	return 0;
+}
+
+const struct dma_map_ops hyperv_dma_ops = {
+	.map_page = hyperv_map_page,
+	.unmap_page = hyperv_unmap_page,
+};
+
+int __init hyperv_swiotlb_detect(void)
+{
+	dma_ops = &hyperv_dma_ops;
+
+	if (hypervisor_is_type(X86_HYPER_MS_HYPERV)
+	    && hv_is_isolation_supported()) {
+		/*
+		 * Disable generic swiotlb and allocate Hyper-v swiotlb
+		 * in the hyperv_iommu_swiotlb_init().
+		 */
+		swiotlb = 0;
+		hyperv_swiotlb = 1;
+
+		return 1;
+	}
+
+	return 0;
+}
+
+void __init hyperv_iommu_swiotlb_init(void)
+{
+	hyperv_swiotlb_init();
+}
+
+void __init hyperv_iommu_swiotlb_later_init(void)
+{
+	int ret;
+
+	/* Mask bounce buffer visible to host and remap extra address. */
+	if (hv_isolation_type_snp()) {
+		ret = hv_set_mem_host_visibility(
+				phys_to_virt(hyperv_io_tlb_start),
+				hyperv_io_tlb_size,
+				VMBUS_PAGE_VISIBLE_READ_WRITE);
+		if (ret)
+			panic("%s: Fail to mark Hyper-v swiotlb buffer visible to host. err=%d\n",
+			      __func__, ret);
+
+		hyperv_io_tlb_remap = ioremap_cache(hyperv_io_tlb_start
+					    + ms_hyperv.shared_gpa_boundary,
+						    hyperv_io_tlb_size);
+		if (!hyperv_io_tlb_remap)
+			panic("%s: Fail to remap io tlb.\n", __func__);
+
+		memset(hyperv_io_tlb_remap, 0x00, hyperv_io_tlb_size);
+		swiotlb_set_bounce_remap(hyperv_io_tlb_remap);
+	}
+}
+
+IOMMU_INIT_FINISH(hyperv_swiotlb_detect,
+		  NULL, hyperv_iommu_swiotlb_init,
+		  hyperv_iommu_swiotlb_later_init);
+
 #endif