Message ID | 20220104015644.2294354-10-baolu.lu@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix BUG_ON in vfio_iommu_group_notifier() | expand |
On Tue, Jan 04, 2022 at 09:56:39AM +0800, Lu Baolu wrote: > If a switch lacks ACS P2P Request Redirect, a device below the switch can > bypass the IOMMU and DMA directly to other devices below the switch, so > all the downstream devices must be in the same IOMMU group as the switch > itself. Help me think through what's going on here. IIUC, we put devices in the same IOMMU group when they can interfere with each other in any way (DMA, config access, etc). (We said "DMA" above, but I guess this would also apply to config requests, right?) *This* patch doesn't check for any ACS features. Can you connect the dots for me? I guess the presence or absence of P2P Request Redirect determines the size of the IOMMU group. And the following says something about what is allowed in the group? And .no_kernel_api_dma allows an exception to the general rule? > The existing vfio framework allows the portdrv driver to be bound > to the bridge while its downstream devices are assigned to user space. I.e., the existing VFIO framework allows a switch to be in the same IOMMU group as the devices below it, even though the switch has a kernel driver and the other devices may have userspace drivers? Is this a function of VFIO design or of the IOMMU driver? > The pci_dma_configure() marks the iommu_group as containing only devices > with kernel drivers that manage DMA. Avoid this default behavior for the > portdrv driver in order for compatibility with the current vfio policy. I assume "IOMMU group" means the same as "iommu_group"; maybe we can use one of them consistently? > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Suggested-by: Kevin Tian <kevin.tian@intel.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > --- > drivers/pci/pcie/portdrv_pci.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c > index 35eca6277a96..2116f821c005 100644 > --- a/drivers/pci/pcie/portdrv_pci.c > +++ b/drivers/pci/pcie/portdrv_pci.c > @@ -202,6 +202,8 @@ static struct pci_driver pcie_portdriver = { > > .err_handler = &pcie_portdrv_err_handler, > > + .no_kernel_api_dma = true, > + > .driver.pm = PCIE_PORTDRV_PM_OPS, > }; > > -- > 2.25.1 >
On Tue, Jan 04, 2022 at 11:06:31AM -0600, Bjorn Helgaas wrote: > > The existing vfio framework allows the portdrv driver to be bound > > to the bridge while its downstream devices are assigned to user space. > > I.e., the existing VFIO framework allows a switch to be in the same > IOMMU group as the devices below it, even though the switch has a > kernel driver and the other devices may have userspace drivers? Yes, this patch exists to maintain current VFIO behavior which has this same check. I belive the basis for VFIO doing this is that the these devices cannot do DMA, so don't care about the DMA API or the group->domain, and do not expose MMIO memory so do not care about the P2P attack. A comment in the code to this effect would be good, IMHO. Jason
On Tue, Jan 04, 2022 at 03:26:14PM -0400, Jason Gunthorpe wrote: > On Tue, Jan 04, 2022 at 11:06:31AM -0600, Bjorn Helgaas wrote: > > > > The existing vfio framework allows the portdrv driver to be bound > > > to the bridge while its downstream devices are assigned to user space. > > > > I.e., the existing VFIO framework allows a switch to be in the same > > IOMMU group as the devices below it, even though the switch has a > > kernel driver and the other devices may have userspace drivers? > > Yes, this patch exists to maintain current VFIO behavior which has this > same check. > > I belive the basis for VFIO doing this is that the these devices > cannot do DMA, so don't care about the DMA API or the group->domain, > and do not expose MMIO memory so do not care about the P2P attack. "These devices" means bridges, right? Not sure why we wouldn't care about the P2P attack. PCIe switches use MSI or MSI-X for hotplug, PME, etc, so they do DMA for that. Is that not relevant here? Is there something that *prohibits* a bridge from having device-specific functionality including DMA? I know some bridges have device-specific BARs for performance counters and the like.
On Tue, Jan 04, 2022 at 01:51:30PM -0600, Bjorn Helgaas wrote: > On Tue, Jan 04, 2022 at 03:26:14PM -0400, Jason Gunthorpe wrote: > > On Tue, Jan 04, 2022 at 11:06:31AM -0600, Bjorn Helgaas wrote: > > > > > > The existing vfio framework allows the portdrv driver to be bound > > > > to the bridge while its downstream devices are assigned to user space. > > > > > > I.e., the existing VFIO framework allows a switch to be in the same > > > IOMMU group as the devices below it, even though the switch has a > > > kernel driver and the other devices may have userspace drivers? > > > > Yes, this patch exists to maintain current VFIO behavior which has this > > same check. > > > > I belive the basis for VFIO doing this is that the these devices > > cannot do DMA, so don't care about the DMA API or the group->domain, > > and do not expose MMIO memory so do not care about the P2P attack. > > "These devices" means bridges, right? Not sure why we wouldn't care > about the P2P attack. Yes bridges. I said "I belive" because VFIO was changed to ignore bridges a long time ago and the security rational was a bit unclear in the commit. See 5f096b14d421 ("vfio: Whitelist PCI bridges") > PCIe switches use MSI or MSI-X for hotplug, PME, etc, so they do DMA > for that. Is that not relevant here? Alex's comment in the above commit notes about interrupts, but I think the answer today is that it does not matter. For platforms like x86 that keep interrupts and DMA seperate it works fine. For platforms that comingle the iommu_domain and interrupts (do some exist?) we expect the platform to do whatever is necessary at iommu_domain attach time to ensure interrupts continue to work. (AFAICT at least) In other words we don't have an API restriction to use iommu_device_use_dma_api() to use request_irq(). So the main concern should be P2P attacks on bridge MMIO registers. > Is there something that *prohibits* a bridge from having > device-specific functionality including DMA? I'm not sure, I think not, but the 'Bus Master Enable' language does have a different definition for Type 1.. However, it doesn't matter - the question here is not about what the device HW is capable of, but what does the kernel driver do. The portdrv does not use the DMA API, so that alone is half the requirement to skip the iommu_device_use_dma_api() call. Some future device-specific bridge driver that is able to issue the device-specific MMIO's and operate the DMA API should be coded to use iommu_device_use_dma_api(), probably by using a different device_driver for the bridge. > I know some bridges have device-specific BARs for performance counters > and the like. Yep. IMHO it is probably not so great as-is.. However, this patch is just porting the status-quo of commit 5f09 into this new framework. What this new framework does allow is for portdrv to police itself. eg it could check if there is a MMIO BAR and call iommu_device_use_dma_api() out of caution. It could also have an allowance list of devices where we know hostile writes to the MMIO are known harmless. Without knowing more about what motivated 5f09 it is hard to say, other than it has been 7 years like this and nobody has complained yet :) Jason
Hi Bjorn, On 1/5/22 1:06 AM, Bjorn Helgaas wrote: > On Tue, Jan 04, 2022 at 09:56:39AM +0800, Lu Baolu wrote: >> If a switch lacks ACS P2P Request Redirect, a device below the switch can >> bypass the IOMMU and DMA directly to other devices below the switch, so >> all the downstream devices must be in the same IOMMU group as the switch >> itself. > Help me think through what's going on here. IIUC, we put devices in > the same IOMMU group when they can interfere with each other in any > way (DMA, config access, etc). > > (We said "DMA" above, but I guess this would also apply to config > requests, right?) I am not sure whether devices could interfere each other through config space access. The IOMMU hardware only protects and isolates DMA accesses, so that userspace could control DMA directly. The config accesses will always be intercepted by VFIO. Hence, I don't see a problem. > > *This* patch doesn't check for any ACS features. Can you connect the > dots for me? I guess the presence or absence of P2P Request Redirect > determines the size of the IOMMU group. And the following says It's done in iommu core (drivers/iommu/iommu.c): /* * Use standard PCI bus topology, isolation features, and DMA alias quirks * to find or create an IOMMU group for a device. */ struct iommu_group *pci_device_group(struct device *dev) > something about what is allowed in the group? And .no_kernel_api_dma > allows an exception to the general rule? > Yes. >> The pci_dma_configure() marks the iommu_group as containing only devices >> with kernel drivers that manage DMA. Avoid this default behavior for the >> portdrv driver in order for compatibility with the current vfio policy. > I assume "IOMMU group" means the same as "iommu_group"; maybe we can > use one of them consistently? Sure. Best regards, baolu
On Thu, Jan 06, 2022 at 12:12:35PM +0800, Lu Baolu wrote: > On 1/5/22 1:06 AM, Bjorn Helgaas wrote: > > On Tue, Jan 04, 2022 at 09:56:39AM +0800, Lu Baolu wrote: > > > If a switch lacks ACS P2P Request Redirect, a device below the switch can > > > bypass the IOMMU and DMA directly to other devices below the switch, so > > > all the downstream devices must be in the same IOMMU group as the switch > > > itself. > > Help me think through what's going on here. IIUC, we put devices in > > the same IOMMU group when they can interfere with each other in any > > way (DMA, config access, etc). > > > > (We said "DMA" above, but I guess this would also apply to config > > requests, right?) > > I am not sure whether devices could interfere each other through config > space access. The IOMMU hardware only protects and isolates DMA > accesses, so that userspace could control DMA directly. The config > accesses will always be intercepted by VFIO. Hence, I don't see a > problem. I was wondering about config accesses generated by an endpoint, e.g., an endpoint doing config writes to a peer or the upstream bridge. But I think that is prohibited by spec - PCIe r5.0, sec 7.3.3, says "Propagation of Configuration Requests from Downstream to Upstream as well as peer-to-peer are not supported" and "Configuration Requests are initiated only by the Host Bridge, including those passed through the SFI CAM mechanism." Bjorn
On 1/7/22 2:32 AM, Bjorn Helgaas wrote: > On Thu, Jan 06, 2022 at 12:12:35PM +0800, Lu Baolu wrote: >> On 1/5/22 1:06 AM, Bjorn Helgaas wrote: >>> On Tue, Jan 04, 2022 at 09:56:39AM +0800, Lu Baolu wrote: >>>> If a switch lacks ACS P2P Request Redirect, a device below the switch can >>>> bypass the IOMMU and DMA directly to other devices below the switch, so >>>> all the downstream devices must be in the same IOMMU group as the switch >>>> itself. >>> Help me think through what's going on here. IIUC, we put devices in >>> the same IOMMU group when they can interfere with each other in any >>> way (DMA, config access, etc). >>> >>> (We said "DMA" above, but I guess this would also apply to config >>> requests, right?) >> >> I am not sure whether devices could interfere each other through config >> space access. The IOMMU hardware only protects and isolates DMA >> accesses, so that userspace could control DMA directly. The config >> accesses will always be intercepted by VFIO. Hence, I don't see a >> problem. > > I was wondering about config accesses generated by an endpoint, e.g., > an endpoint doing config writes to a peer or the upstream bridge. > > But I think that is prohibited by spec - PCIe r5.0, sec 7.3.3, says > "Propagation of Configuration Requests from Downstream to Upstream as > well as peer-to-peer are not supported" and "Configuration Requests > are initiated only by the Host Bridge, including those passed through > the SFI CAM mechanism." That's clear. Thank you for the clarification. > > Bjorn > Best regards, baolu
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c index 35eca6277a96..2116f821c005 100644 --- a/drivers/pci/pcie/portdrv_pci.c +++ b/drivers/pci/pcie/portdrv_pci.c @@ -202,6 +202,8 @@ static struct pci_driver pcie_portdriver = { .err_handler = &pcie_portdrv_err_handler, + .no_kernel_api_dma = true, + .driver.pm = PCIE_PORTDRV_PM_OPS, };
If a switch lacks ACS P2P Request Redirect, a device below the switch can bypass the IOMMU and DMA directly to other devices below the switch, so all the downstream devices must be in the same IOMMU group as the switch itself. The existing vfio framework allows the portdrv driver to be bound to the bridge while its downstream devices are assigned to user space. The pci_dma_configure() marks the iommu_group as containing only devices with kernel drivers that manage DMA. Avoid this default behavior for the portdrv driver in order for compatibility with the current vfio policy. Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> --- drivers/pci/pcie/portdrv_pci.c | 2 ++ 1 file changed, 2 insertions(+)