Message ID | 1612385805-3412-12-git-send-email-megha.dey@intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Introduce dev-msi and interrupt message store | expand |
On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote: > +bool arch_support_pci_device_ims(struct pci_dev *pdev) > +{ Consistent language please, we are not using IMS elsewhere, this feature is called device_msi in Linux. Jason
Hi Jason, On 2021/2/4 20:14, Jason Gunthorpe wrote: > On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote: >> +bool arch_support_pci_device_ims(struct pci_dev *pdev) >> +{ > > Consistent language please, we are not using IMS elsewhere, this > feature is called device_msi in Linux. > Thanks for pointing this out. I will correct it. > Jason > Best regards, baolu
On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote: > From: Lu Baolu <baolu.lu@linux.intel.com> > > The pci_subdevice_msi_create_irq_domain() should fail if the underlying > platform is not able to support IMS (Interrupt Message Storage). Otherwise, > the isolation of interrupt is not guaranteed. > > For x86, IMS is only supported on bare metal for now. We could enable it > in the virtualization environments in the future if interrupt HYPERCALL > domain is supported or the hardware has the capability of interrupt > isolation for subdevices. > > Cc: David Woodhouse <dwmw@amazon.co.uk> > Cc: Leon Romanovsky <leon@kernel.org> > Cc: Kevin Tian <kevin.tian@intel.com> > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Link: https://lore.kernel.org/linux-pci/87pn4nk7nn.fsf@nanos.tec.linutronix.de/ > Link: https://lore.kernel.org/linux-pci/877dqrnzr3.fsf@nanos.tec.linutronix.de/ > Link: https://lore.kernel.org/linux-pci/877dqqmc2h.fsf@nanos.tec.linutronix.de/ > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > Signed-off-by: Megha Dey <megha.dey@intel.com> > --- > arch/x86/pci/common.c | 74 +++++++++++++++++++++++++++++++++++++++++++++ > drivers/base/platform-msi.c | 8 +++++ > include/linux/msi.h | 1 + > 3 files changed, 83 insertions(+) > > diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c > index 3507f45..263ccf6 100644 > --- a/arch/x86/pci/common.c > +++ b/arch/x86/pci/common.c > @@ -12,6 +12,8 @@ > #include <linux/init.h> > #include <linux/dmi.h> > #include <linux/slab.h> > +#include <linux/iommu.h> > +#include <linux/msi.h> > > #include <asm/acpi.h> > #include <asm/segment.h> > @@ -724,3 +726,75 @@ struct pci_dev *pci_real_dma_dev(struct pci_dev *dev) > return dev; > } > #endif > + > +#ifdef CONFIG_DEVICE_MSI Sorry for my naive question, but I see it in all your patches in this series and wonder why did you wrap everything with ifdefs?. All *.c code is wrapped with those ifdefs, which is hard to navigate and unlikely to give any code/size optimization benefit if kernel is compiled without CONFIG_DEVICE_MSI. The more common approach is to put those ifdef in the public header files and leave to the compiler to drop not called functions. Thanks
Hi Leon, On 2/8/21 4:21 PM, Leon Romanovsky wrote: > On Wed, Feb 03, 2021 at 12:56:44PM -0800, Megha Dey wrote: >> From: Lu Baolu <baolu.lu@linux.intel.com> >> >> The pci_subdevice_msi_create_irq_domain() should fail if the underlying >> platform is not able to support IMS (Interrupt Message Storage). Otherwise, >> the isolation of interrupt is not guaranteed. >> >> For x86, IMS is only supported on bare metal for now. We could enable it >> in the virtualization environments in the future if interrupt HYPERCALL >> domain is supported or the hardware has the capability of interrupt >> isolation for subdevices. >> >> Cc: David Woodhouse <dwmw@amazon.co.uk> >> Cc: Leon Romanovsky <leon@kernel.org> >> Cc: Kevin Tian <kevin.tian@intel.com> >> Suggested-by: Thomas Gleixner <tglx@linutronix.de> >> Link: https://lore.kernel.org/linux-pci/87pn4nk7nn.fsf@nanos.tec.linutronix.de/ >> Link: https://lore.kernel.org/linux-pci/877dqrnzr3.fsf@nanos.tec.linutronix.de/ >> Link: https://lore.kernel.org/linux-pci/877dqqmc2h.fsf@nanos.tec.linutronix.de/ >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> Signed-off-by: Megha Dey <megha.dey@intel.com> >> --- >> arch/x86/pci/common.c | 74 +++++++++++++++++++++++++++++++++++++++++++++ >> drivers/base/platform-msi.c | 8 +++++ >> include/linux/msi.h | 1 + >> 3 files changed, 83 insertions(+) >> >> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c >> index 3507f45..263ccf6 100644 >> --- a/arch/x86/pci/common.c >> +++ b/arch/x86/pci/common.c >> @@ -12,6 +12,8 @@ >> #include <linux/init.h> >> #include <linux/dmi.h> >> #include <linux/slab.h> >> +#include <linux/iommu.h> >> +#include <linux/msi.h> >> >> #include <asm/acpi.h> >> #include <asm/segment.h> >> @@ -724,3 +726,75 @@ struct pci_dev *pci_real_dma_dev(struct pci_dev *dev) >> return dev; >> } >> #endif >> + >> +#ifdef CONFIG_DEVICE_MSI > > Sorry for my naive question, but I see it in all your patches in this series > and wonder why did you wrap everything with ifdefs?. The added code is only called when DEVICE_MSI is configured. > > All *.c code is wrapped with those ifdefs, which is hard to navigate and > unlikely to give any code/size optimization benefit if kernel is compiled > without CONFIG_DEVICE_MSI. The more common approach is to put those > ifdef in the public header files and leave to the compiler to drop not > called functions. Yes. This looks better. > > Thanks > Best regards, baolu
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c index 3507f45..263ccf6 100644 --- a/arch/x86/pci/common.c +++ b/arch/x86/pci/common.c @@ -12,6 +12,8 @@ #include <linux/init.h> #include <linux/dmi.h> #include <linux/slab.h> +#include <linux/iommu.h> +#include <linux/msi.h> #include <asm/acpi.h> #include <asm/segment.h> @@ -724,3 +726,75 @@ struct pci_dev *pci_real_dma_dev(struct pci_dev *dev) return dev; } #endif + +#ifdef CONFIG_DEVICE_MSI +/* + * We want to figure out which context we are running in. But the hardware + * does not introduce a reliable way (instruction, CPUID leaf, MSR, whatever) + * which can be manipulated by the VMM to let the OS figure out where it runs. + * So we go with the below probably on_bare_metal() function as a replacement + * for definitely on_bare_metal() to go forward only for the very simple reason + * that this is the only option we have. + */ +static const char * const vmm_vendor_name[] = { + "QEMU", "Bochs", "KVM", "Xen", "VMware", "VMW", "VMware Inc.", + "innotek GmbH", "Oracle Corporation", "Parallels", "BHYVE" +}; + +static void read_type0_virtual_machine(const struct dmi_header *dm, void *p) +{ + u8 *data = (u8 *)dm + 0x13; + + /* BIOS Information (Type 0) */ + if (dm->type != 0 || dm->length < 0x14) + return; + + /* Bit 4 of BIOS Characteristics Extension Byte 2*/ + if (*data & BIT(4)) + *((bool *)p) = true; +} + +static bool smbios_virtual_machine(void) +{ + bool bit_present = false; + + dmi_walk(read_type0_virtual_machine, &bit_present); + + return bit_present; +} + +static bool on_bare_metal(struct device *dev) +{ + int i; + + if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) + return false; + + if (smbios_virtual_machine()) + return false; + + if (iommu_capable(dev->bus, IOMMU_CAP_VIOMMU_HINT)) + return false; + + for (i = 0; i < ARRAY_SIZE(vmm_vendor_name); i++) + if (dmi_match(DMI_SYS_VENDOR, vmm_vendor_name[i])) + return false; + + pr_info("System running on bare metal, report to bugzilla.kernel.org if not the case."); + + return true; +} + +bool arch_support_pci_device_ims(struct pci_dev *pdev) +{ + /* + * When we are running in a VMM context, the device IMS could only be + * enabled when the underlying hardware supports interrupt isolation + * of the subdevice, or any mechanism (trap, hypercall) is added so + * that changes in the interrupt message store could be managed by the + * VMM. For now, we only support the device IMS when we are running on + * the bare metal. + */ + return on_bare_metal(&pdev->dev); +} +#endif diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c index 6127b3b..d5ae26f 100644 --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -519,6 +519,11 @@ struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn, #ifdef CONFIG_PCI #include <linux/pci.h> +bool __weak arch_support_pci_device_ims(struct pci_dev *pdev) +{ + return false; +} + /** * pci_subdevice_msi_create_irq_domain - Create an irq domain for subdevices * @pdev: Pointer to PCI device for which the subdevice domain is created @@ -530,6 +535,9 @@ struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev, struct irq_domain *domain, *pdev_msi; struct fwnode_handle *fn; + if (!arch_support_pci_device_ims(pdev)) + return NULL; + /* * Retrieve the MSI domain of the underlying PCI device's MSI * domain. The PCI device domain's parent domain is also the parent diff --git a/include/linux/msi.h b/include/linux/msi.h index a6b419d..fa02542 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -478,6 +478,7 @@ struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn, struct irq_domain *parent); # ifdef CONFIG_PCI +bool arch_support_pci_device_ims(struct pci_dev *pdev); struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev, struct msi_domain_info *info); # endif