Message ID | 1647534311-2349-5-git-send-email-mikelley@microsoft.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Fix coherence for VMbus and PCI pass-thru devices in Hyper-V VM | expand |
On 2022-03-17 16:25, Michael Kelley via iommu wrote: > PCI pass-thru devices in a Hyper-V VM are represented as a VMBus > device and as a PCI device. The coherence of the VMbus device is > set based on the VMbus node in ACPI, but the PCI device has no > ACPI node and defaults to not hardware coherent. This results > in extra software coherence management overhead on ARM64 when > devices are hardware coherent. > > Fix this by propagating the coherence of the VMbus device to the > PCI device. There's no effect on x86/x64 where devices are > always hardware coherent. > > Signed-off-by: Michael Kelley <mikelley@microsoft.com> > --- > drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++---- > 1 file changed, 13 insertions(+), 4 deletions(-) > > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c > index ae0bc2f..14276f5 100644 > --- a/drivers/pci/controller/pci-hyperv.c > +++ b/drivers/pci/controller/pci-hyperv.c > @@ -49,6 +49,7 @@ > #include <linux/refcount.h> > #include <linux/irqdomain.h> > #include <linux/acpi.h> > +#include <linux/dma-map-ops.h> > #include <asm/mshyperv.h> > > /* > @@ -2142,9 +2143,9 @@ static void hv_pci_remove_slots(struct hv_pcibus_device *hbus) > } > > /* > - * Set NUMA node for the devices on the bus > + * Set NUMA node and DMA coherence for the devices on the bus > */ > -static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus) > +static void hv_pci_assign_properties(struct hv_pcibus_device *hbus) > { > struct pci_dev *dev; > struct pci_bus *bus = hbus->bridge->bus; > @@ -2167,6 +2168,14 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus) > numa_map_to_online_node( > hv_dev->desc.virtual_numa_node)); > > + /* > + * On ARM64, propagate the DMA coherence from the VMbus device > + * to the corresponding PCI device. On x86/x64, these calls > + * have no effect because DMA is always hardware coherent. > + */ > + dev_set_dma_coherent(&dev->dev, > + dev_is_dma_coherent(&hbus->hdev->device)); Eww... if you really have to do this, I'd prefer to see a proper hv_dma_configure() helper implemented and wired up to pci_dma_configure(). Although since it's a generic property I guess at worst pci_dma_configure could perhaps propagate coherency from the host bridge to its children by itself in the absence of any other firmware info. And it's built-in so could use arch_setup_dma_ops() like everyone else. Robin. > + > put_pcichild(hv_dev); > } > } > @@ -2191,7 +2200,7 @@ static int create_root_hv_pci_bus(struct hv_pcibus_device *hbus) > return error; > > pci_lock_rescan_remove(); > - hv_pci_assign_numa_node(hbus); > + hv_pci_assign_properties(hbus); > pci_bus_assign_resources(bridge->bus); > hv_pci_assign_slots(hbus); > pci_bus_add_devices(bridge->bus); > @@ -2458,7 +2467,7 @@ static void pci_devices_present_work(struct work_struct *work) > */ > pci_lock_rescan_remove(); > pci_scan_child_bus(hbus->bridge->bus); > - hv_pci_assign_numa_node(hbus); > + hv_pci_assign_properties(hbus); > hv_pci_assign_slots(hbus); > pci_unlock_rescan_remove(); > break;
From: Robin Murphy <robin.murphy@arm.com> Sent: Thursday, March 17, 2022 10:15 AM > > On 2022-03-17 16:25, Michael Kelley via iommu wrote: > > PCI pass-thru devices in a Hyper-V VM are represented as a VMBus > > device and as a PCI device. The coherence of the VMbus device is > > set based on the VMbus node in ACPI, but the PCI device has no > > ACPI node and defaults to not hardware coherent. This results > > in extra software coherence management overhead on ARM64 when > > devices are hardware coherent. > > > > Fix this by propagating the coherence of the VMbus device to the > > PCI device. There's no effect on x86/x64 where devices are > > always hardware coherent. > > > > Signed-off-by: Michael Kelley <mikelley@microsoft.com> > > --- > > drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++---- > > 1 file changed, 13 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c > > index ae0bc2f..14276f5 100644 > > --- a/drivers/pci/controller/pci-hyperv.c > > +++ b/drivers/pci/controller/pci-hyperv.c > > @@ -49,6 +49,7 @@ > > #include <linux/refcount.h> > > #include <linux/irqdomain.h> > > #include <linux/acpi.h> > > +#include <linux/dma-map-ops.h> > > #include <asm/mshyperv.h> > > > > /* > > @@ -2142,9 +2143,9 @@ static void hv_pci_remove_slots(struct hv_pcibus_device > *hbus) > > } > > > > /* > > - * Set NUMA node for the devices on the bus > > + * Set NUMA node and DMA coherence for the devices on the bus > > */ > > -static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus) > > +static void hv_pci_assign_properties(struct hv_pcibus_device *hbus) > > { > > struct pci_dev *dev; > > struct pci_bus *bus = hbus->bridge->bus; > > @@ -2167,6 +2168,14 @@ static void hv_pci_assign_numa_node(struct > hv_pcibus_device *hbus) > > numa_map_to_online_node( > > hv_dev->desc.virtual_numa_node)); > > > > + /* > > + * On ARM64, propagate the DMA coherence from the VMbus device > > + * to the corresponding PCI device. On x86/x64, these calls > > + * have no effect because DMA is always hardware coherent. > > + */ > > + dev_set_dma_coherent(&dev->dev, > > + dev_is_dma_coherent(&hbus->hdev->device)); > > Eww... if you really have to do this, I'd prefer to see a proper > hv_dma_configure() helper implemented and wired up to > pci_dma_configure(). Although since it's a generic property I guess at > worst pci_dma_configure could perhaps propagate coherency from the host > bridge to its children by itself in the absence of any other firmware > info. And it's built-in so could use arch_setup_dma_ops() like everyone > else. > I'm not seeing an existing mechanism to provide a "helper" or override of pci_dma_configure(). Could you elaborate? Or is this something that needs to be created? Michael > Robin. > > > + > > put_pcichild(hv_dev); > > } > > } > > @@ -2191,7 +2200,7 @@ static int create_root_hv_pci_bus(struct hv_pcibus_device > *hbus) > > return error; > > > > pci_lock_rescan_remove(); > > - hv_pci_assign_numa_node(hbus); > > + hv_pci_assign_properties(hbus); > > pci_bus_assign_resources(bridge->bus); > > hv_pci_assign_slots(hbus); > > pci_bus_add_devices(bridge->bus); > > @@ -2458,7 +2467,7 @@ static void pci_devices_present_work(struct work_struct > *work) > > */ > > pci_lock_rescan_remove(); > > pci_scan_child_bus(hbus->bridge->bus); > > - hv_pci_assign_numa_node(hbus); > > + hv_pci_assign_properties(hbus); > > hv_pci_assign_slots(hbus); > > pci_unlock_rescan_remove(); > > break;
On 2022-03-18 05:12, Michael Kelley (LINUX) wrote: > From: Robin Murphy <robin.murphy@arm.com> Sent: Thursday, March 17, 2022 10:15 AM >> >> On 2022-03-17 16:25, Michael Kelley via iommu wrote: >>> PCI pass-thru devices in a Hyper-V VM are represented as a VMBus >>> device and as a PCI device. The coherence of the VMbus device is >>> set based on the VMbus node in ACPI, but the PCI device has no >>> ACPI node and defaults to not hardware coherent. This results >>> in extra software coherence management overhead on ARM64 when >>> devices are hardware coherent. >>> >>> Fix this by propagating the coherence of the VMbus device to the >>> PCI device. There's no effect on x86/x64 where devices are >>> always hardware coherent. >>> >>> Signed-off-by: Michael Kelley <mikelley@microsoft.com> >>> --- >>> drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++---- >>> 1 file changed, 13 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c >>> index ae0bc2f..14276f5 100644 >>> --- a/drivers/pci/controller/pci-hyperv.c >>> +++ b/drivers/pci/controller/pci-hyperv.c >>> @@ -49,6 +49,7 @@ >>> #include <linux/refcount.h> >>> #include <linux/irqdomain.h> >>> #include <linux/acpi.h> >>> +#include <linux/dma-map-ops.h> >>> #include <asm/mshyperv.h> >>> >>> /* >>> @@ -2142,9 +2143,9 @@ static void hv_pci_remove_slots(struct hv_pcibus_device >> *hbus) >>> } >>> >>> /* >>> - * Set NUMA node for the devices on the bus >>> + * Set NUMA node and DMA coherence for the devices on the bus >>> */ >>> -static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus) >>> +static void hv_pci_assign_properties(struct hv_pcibus_device *hbus) >>> { >>> struct pci_dev *dev; >>> struct pci_bus *bus = hbus->bridge->bus; >>> @@ -2167,6 +2168,14 @@ static void hv_pci_assign_numa_node(struct >> hv_pcibus_device *hbus) >>> numa_map_to_online_node( >>> hv_dev->desc.virtual_numa_node)); >>> >>> + /* >>> + * On ARM64, propagate the DMA coherence from the VMbus device >>> + * to the corresponding PCI device. On x86/x64, these calls >>> + * have no effect because DMA is always hardware coherent. >>> + */ >>> + dev_set_dma_coherent(&dev->dev, >>> + dev_is_dma_coherent(&hbus->hdev->device)); >> >> Eww... if you really have to do this, I'd prefer to see a proper >> hv_dma_configure() helper implemented and wired up to >> pci_dma_configure(). Although since it's a generic property I guess at >> worst pci_dma_configure could perhaps propagate coherency from the host >> bridge to its children by itself in the absence of any other firmware >> info. And it's built-in so could use arch_setup_dma_ops() like everyone >> else. >> > > I'm not seeing an existing mechanism to provide a "helper" or override > of pci_dma_configure(). Could you elaborate? Or is this something > that needs to be created? I mean something like the diff below (other #includes omitted for clarity). Essentially if VMBus has its own way of describing parts of the system, then for those parts it's nice if it could fit into the same abstractions we use for firmware-based system description. Cheers, Robin. ----->8----- diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 588588cfda48..7d92ccad1569 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -20,6 +20,7 @@ #include <linux/of_device.h> #include <linux/acpi.h> #include <linux/dma-map-ops.h> +#include <linux/hyperv.h> #include "pci.h" #include "pcie/portdrv.h" @@ -1602,6 +1603,8 @@ static int pci_dma_configure(struct device *dev) struct acpi_device *adev = to_acpi_device_node(bridge->fwnode); ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev)); + } else if (is_vmbus_dev(bridge)) { + ret = hv_dma_configure(dev, device_get_dma_attr(bridge)); } pci_put_host_bridge_device(bridge); diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index f565a8938836..d1d4dd3d5a3a 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1764,4 +1764,19 @@ static inline unsigned long virt_to_hvpfn(void *addr) #define HVPFN_DOWN(x) ((x) >> HV_HYP_PAGE_SHIFT) #define page_to_hvpfn(page) (page_to_pfn(page) * NR_HV_HYP_PAGES_IN_PAGE) +static inline bool is_vmbus_dev(struct device *dev) +{ + /* + * dev->bus == &hv_bus would break when the caller is built-in + * and CONFIG_HYPERV=m, so look for it by name instead. + */ + return !strcmp(dev->bus->name, "vmbus"); +} + +static inline int hv_dma_configure(struct device *dev, enum dev_dma_attr attr) +{ + arch_setup_dma_ops(dev, 0, 0, NULL, attr == DEV_DMA_COHERENT); + return 0; +} + #endif /* _HYPERV_H */
From: Robin Murphy <robin.murphy@arm.com> Sent: Friday, March 18, 2022 3:58 AM > > On 2022-03-18 05:12, Michael Kelley (LINUX) wrote: > > From: Robin Murphy <robin.murphy@arm.com> Sent: Thursday, March 17, 2022 > 10:15 AM > >> > >> On 2022-03-17 16:25, Michael Kelley via iommu wrote: > >>> PCI pass-thru devices in a Hyper-V VM are represented as a VMBus > >>> device and as a PCI device. The coherence of the VMbus device is > >>> set based on the VMbus node in ACPI, but the PCI device has no > >>> ACPI node and defaults to not hardware coherent. This results > >>> in extra software coherence management overhead on ARM64 when > >>> devices are hardware coherent. > >>> > >>> Fix this by propagating the coherence of the VMbus device to the > >>> PCI device. There's no effect on x86/x64 where devices are > >>> always hardware coherent. > >>> > >>> Signed-off-by: Michael Kelley <mikelley@microsoft.com> > >>> --- > >>> drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++---- > >>> 1 file changed, 13 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c > >>> index ae0bc2f..14276f5 100644 > >>> --- a/drivers/pci/controller/pci-hyperv.c > >>> +++ b/drivers/pci/controller/pci-hyperv.c > >>> @@ -49,6 +49,7 @@ > >>> #include <linux/refcount.h> > >>> #include <linux/irqdomain.h> > >>> #include <linux/acpi.h> > >>> +#include <linux/dma-map-ops.h> > >>> #include <asm/mshyperv.h> > >>> > >>> /* > >>> @@ -2142,9 +2143,9 @@ static void hv_pci_remove_slots(struct hv_pcibus_device > >> *hbus) > >>> } > >>> > >>> /* > >>> - * Set NUMA node for the devices on the bus > >>> + * Set NUMA node and DMA coherence for the devices on the bus > >>> */ > >>> -static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus) > >>> +static void hv_pci_assign_properties(struct hv_pcibus_device *hbus) > >>> { > >>> struct pci_dev *dev; > >>> struct pci_bus *bus = hbus->bridge->bus; > >>> @@ -2167,6 +2168,14 @@ static void hv_pci_assign_numa_node(struct > >> hv_pcibus_device *hbus) > >>> numa_map_to_online_node( > >>> hv_dev->desc.virtual_numa_node)); > >>> > >>> + /* > >>> + * On ARM64, propagate the DMA coherence from the VMbus device > >>> + * to the corresponding PCI device. On x86/x64, these calls > >>> + * have no effect because DMA is always hardware coherent. > >>> + */ > >>> + dev_set_dma_coherent(&dev->dev, > >>> + dev_is_dma_coherent(&hbus->hdev->device)); > >> > >> Eww... if you really have to do this, I'd prefer to see a proper > >> hv_dma_configure() helper implemented and wired up to > >> pci_dma_configure(). Although since it's a generic property I guess at > >> worst pci_dma_configure could perhaps propagate coherency from the host > >> bridge to its children by itself in the absence of any other firmware > >> info. And it's built-in so could use arch_setup_dma_ops() like everyone > >> else. > >> > > > > I'm not seeing an existing mechanism to provide a "helper" or override > > of pci_dma_configure(). Could you elaborate? Or is this something > > that needs to be created? > > I mean something like the diff below (other #includes omitted for > clarity). Essentially if VMBus has its own way of describing parts of > the system, then for those parts it's nice if it could fit into the same > abstractions we use for firmware-based system description. OK, got it. Adding the VMbus special case into pci_dma_configure() would work. But after investigating further, let me throw out two other possible solutions as well. 1) It's easy to make the top-level VMbus node in the DSDT be the ACPI companion for the pci_host bridge. The function pcibios_root_bridge_prepare() will do this if the pci-hyperv.c driver sets sysdata->parent to that top-level VMbus node. Then pci_dma_configure()works as is. Also, commit 7d40c0f70 that special cases pcibios_root_bridge_prepare() for Hyper-V becomes unnecessary. I've coded this approach and it seems to work, but I don't know all the implications of making that VMbus node be the ACPI companion, potentially for multiple pci_host bridges. 2) Since there's no vIOMMU and we don't know how it will be described if there should be one in the future, it's a bit premature to try to set things up "correctly" now to handle it. A simple approach is to set dma_default_coherent to true if the top-level VMbus node in the DSDT indicates coherent. No other changes are needed. If there's a vIOMMU at some later time, then we add the use of arch_setup_dma_ops() for each device. Any thoughts on these two approaches vs. adding the VMbus special case into pci_dma_configure()? My preference would be to avoid adding that special case if one of the other two approaches is reasonable. Michael > > Cheers, > Robin. > > ----->8----- > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c > index 588588cfda48..7d92ccad1569 100644 > --- a/drivers/pci/pci-driver.c > +++ b/drivers/pci/pci-driver.c > @@ -20,6 +20,7 @@ > #include <linux/of_device.h> > #include <linux/acpi.h> > #include <linux/dma-map-ops.h> > +#include <linux/hyperv.h> > #include "pci.h" > #include "pcie/portdrv.h" > > @@ -1602,6 +1603,8 @@ static int pci_dma_configure(struct device *dev) > struct acpi_device *adev = to_acpi_device_node(bridge->fwnode); > > ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev)); > + } else if (is_vmbus_dev(bridge)) { > + ret = hv_dma_configure(dev, device_get_dma_attr(bridge)); > } > > pci_put_host_bridge_device(bridge); > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h > index f565a8938836..d1d4dd3d5a3a 100644 > --- a/include/linux/hyperv.h > +++ b/include/linux/hyperv.h > @@ -1764,4 +1764,19 @@ static inline unsigned long virt_to_hvpfn(void *addr) > #define HVPFN_DOWN(x) ((x) >> HV_HYP_PAGE_SHIFT) > #define page_to_hvpfn(page) (page_to_pfn(page) * > NR_HV_HYP_PAGES_IN_PAGE) > > +static inline bool is_vmbus_dev(struct device *dev) > +{ > + /* > + * dev->bus == &hv_bus would break when the caller is built-in > + * and CONFIG_HYPERV=m, so look for it by name instead. > + */ > + return !strcmp(dev->bus->name, "vmbus"); > +} > + > +static inline int hv_dma_configure(struct device *dev, enum > dev_dma_attr attr) > +{ > + arch_setup_dma_ops(dev, 0, 0, NULL, attr == DEV_DMA_COHERENT); > + return 0; > +} > + > #endif /* _HYPERV_H */
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index ae0bc2f..14276f5 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -49,6 +49,7 @@ #include <linux/refcount.h> #include <linux/irqdomain.h> #include <linux/acpi.h> +#include <linux/dma-map-ops.h> #include <asm/mshyperv.h> /* @@ -2142,9 +2143,9 @@ static void hv_pci_remove_slots(struct hv_pcibus_device *hbus) } /* - * Set NUMA node for the devices on the bus + * Set NUMA node and DMA coherence for the devices on the bus */ -static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus) +static void hv_pci_assign_properties(struct hv_pcibus_device *hbus) { struct pci_dev *dev; struct pci_bus *bus = hbus->bridge->bus; @@ -2167,6 +2168,14 @@ static void hv_pci_assign_numa_node(struct hv_pcibus_device *hbus) numa_map_to_online_node( hv_dev->desc.virtual_numa_node)); + /* + * On ARM64, propagate the DMA coherence from the VMbus device + * to the corresponding PCI device. On x86/x64, these calls + * have no effect because DMA is always hardware coherent. + */ + dev_set_dma_coherent(&dev->dev, + dev_is_dma_coherent(&hbus->hdev->device)); + put_pcichild(hv_dev); } } @@ -2191,7 +2200,7 @@ static int create_root_hv_pci_bus(struct hv_pcibus_device *hbus) return error; pci_lock_rescan_remove(); - hv_pci_assign_numa_node(hbus); + hv_pci_assign_properties(hbus); pci_bus_assign_resources(bridge->bus); hv_pci_assign_slots(hbus); pci_bus_add_devices(bridge->bus); @@ -2458,7 +2467,7 @@ static void pci_devices_present_work(struct work_struct *work) */ pci_lock_rescan_remove(); pci_scan_child_bus(hbus->bridge->bus); - hv_pci_assign_numa_node(hbus); + hv_pci_assign_properties(hbus); hv_pci_assign_slots(hbus); pci_unlock_rescan_remove(); break;
PCI pass-thru devices in a Hyper-V VM are represented as a VMBus device and as a PCI device. The coherence of the VMbus device is set based on the VMbus node in ACPI, but the PCI device has no ACPI node and defaults to not hardware coherent. This results in extra software coherence management overhead on ARM64 when devices are hardware coherent. Fix this by propagating the coherence of the VMbus device to the PCI device. There's no effect on x86/x64 where devices are always hardware coherent. Signed-off-by: Michael Kelley <mikelley@microsoft.com> --- drivers/pci/controller/pci-hyperv.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)