Message ID | 20211217063708.1740334-5-baolu.lu@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Fix BUG_ON in vfio_iommu_group_notifier() |
On Fri, Dec 17, 2021 at 02:36:59PM +0800, Lu Baolu wrote:
> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> then all of the downstream devices will be part of the same IOMMU group
> as the bridge. The existing vfio framework allows the portdrv driver to
> be bound to the bridge while its downstream devices are assigned to user
> space. The pci_dma_configure() marks the iommu_group as containing only
> devices with kernel drivers that manage DMA. Avoid this default behavior
> for the portdrv driver in order for compatibility with the current vfio
> policy.

A word about the isolation would be useful. I think you're referring
to some specific ACS controls, probably P2P Request Redirect?

I guess this is just a wording issue, but I think it's actually the
*lack* of some ACS controls that forces us to put several devices in
the same IOMMU group, isn't it? It's not that we start with "IOMMU
grouping" and that necessitates something else.

Maybe something like this?

  If a switch lacks ACS P2P Request Redirect (and possibly other
  controls?), a device below the switch can bypass the IOMMU and DMA
  directly to other devices below the switch, so all the downstream
  devices must be in the same IOMMU group as the switch itself.

> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") extended above
> policy to all kernel drivers of bridge class. This is not always safe.
> For example, The shpchp_core driver relies on the PCI MMIO access for the
> controller functionality. With its downstream devices assigned to the
> userspace, the MMIO might be changed through user initiated P2P accesses
> without any notification. This might break the kernel driver integrity
> and lead to some unpredictable consequences.
>
> For any bridge driver, in order to avoiding default kernel DMA ownership
> claiming, we should consider:
>
>  1) Does the bridge driver use DMA? Calling pci_set_master() or
>     a dma_map_* API is a sure indicate the driver is doing DMA
>
>  2) If the bridge driver uses MMIO, is it tolerant to hostile
>     userspace also touching the same MMIO registers via P2P DMA
>     attacks?
>
> Conservatively if the driver maps an MMIO region at all, we can say that
> it fails the test.

I'm not sure what all this explanation is telling me. It says
something done by 5f096b14d421 is not always safe, but this patch
doesn't fix any of those unsafe things.

If it doesn't explain why we need this patch or how this patch works,
I don't think we need it in the commit log.

Maybe this is an explanation for why you didn't set
.suppress_auto_claim_dma_owner for shpc_driver?

Minor typos above:
  s/in order to avoiding default/before avoiding default/
  s/relies on the PCI MMIO access/relies on PCI MMIO access/
  s/For example, The/For example, the/
  s/is a sure indicate the/is a sure indication the/

> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/pci/pcie/portdrv_pci.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 35eca6277a96..c48a8734f9c4 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -202,7 +202,10 @@ static struct pci_driver pcie_portdriver = {
>
>  	.err_handler = &pcie_portdrv_err_handler,
>
> -	.driver.pm = PCIE_PORTDRV_PM_OPS,
> +	.driver = {
> +		.pm = PCIE_PORTDRV_PM_OPS,
> +		.suppress_auto_claim_dma_owner = true,
> +	},
>  };
>
>  static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
> --
> 2.25.1
>
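The suggested wording about ACS can be illustrated with a small userspace model. This is only a sketch of that one sentence, not the kernel's grouping code: `struct model_dev`, `model_assign_groups()` and the `acs_p2p_rr` flag are hypothetical names made up for illustration, and the real logic also looks further upstream and at DMA aliases and multifunction devices, none of which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
	int parent;		/* index of the upstream bridge/switch, -1 for the top */
	bool acs_p2p_rr;	/* true if this bridge enforces ACS P2P Request Redirect */
};

/*
 * Assign a group id to every device. devs[] must list a parent before its
 * children. A device joins its parent's group when the parent lacks
 * P2P Request Redirect; otherwise it starts a group of its own.
 */
void model_assign_groups(const struct model_dev *devs, size_t n, int *group)
{
	int next_group = 0;

	for (size_t i = 0; i < n; i++) {
		int p = devs[i].parent;

		assert(p < (int)i);	/* parents must come first */
		if (p >= 0 && !devs[p].acs_p2p_rr)
			group[i] = group[p];		/* not isolated: share the bridge's group */
		else
			group[i] = next_group++;	/* isolated (or top device): own group */
	}
}
```

With a switch at index 0 that lacks the control and two devices below it, all three entries end up with the same group id. Setting `acs_p2p_rr` on the switch gives each device its own group in this model.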
Hi Bjorn,

On 12/30/21 5:16 AM, Bjorn Helgaas wrote:
> On Fri, Dec 17, 2021 at 02:36:59PM +0800, Lu Baolu wrote:
>> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
>> then all of the downstream devices will be part of the same IOMMU group
>> as the bridge. The existing vfio framework allows the portdrv driver to
>> be bound to the bridge while its downstream devices are assigned to user
>> space. The pci_dma_configure() marks the iommu_group as containing only
>> devices with kernel drivers that manage DMA. Avoid this default behavior
>> for the portdrv driver in order for compatibility with the current vfio
>> policy.
>
> A word about the isolation would be useful. I think you're referring
> to some specific ACS controls, probably P2P Request Redirect?
>
> I guess this is just a wording issue, but I think it's actually the
> *lack* of some ACS controls that forces us to put several devices in
> the same IOMMU group, isn't it? It's not that we start with "IOMMU
> grouping" and that necessitates something else.
>
> Maybe something like this?
>
>   If a switch lacks ACS P2P Request Redirect (and possibly other
>   controls?), a device below the switch can bypass the IOMMU and DMA
>   directly to other devices below the switch, so all the downstream
>   devices must be in the same IOMMU group as the switch itself.

Yes. That's what it means from the perspective of PCI/PCIe. I will use
this in the next version. Thanks!

>
>> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") extended above
>> policy to all kernel drivers of bridge class. This is not always safe.
>> For example, The shpchp_core driver relies on the PCI MMIO access for the
>> controller functionality. With its downstream devices assigned to the
>> userspace, the MMIO might be changed through user initiated P2P accesses
>> without any notification. This might break the kernel driver integrity
>> and lead to some unpredictable consequences.
>>
>> For any bridge driver, in order to avoiding default kernel DMA ownership
>> claiming, we should consider:
>>
>>  1) Does the bridge driver use DMA? Calling pci_set_master() or
>>     a dma_map_* API is a sure indicate the driver is doing DMA
>>
>>  2) If the bridge driver uses MMIO, is it tolerant to hostile
>>     userspace also touching the same MMIO registers via P2P DMA
>>     attacks?
>>
>> Conservatively if the driver maps an MMIO region at all, we can say that
>> it fails the test.
>
> I'm not sure what all this explanation is telling me. It says
> something done by 5f096b14d421 is not always safe, but this patch
> doesn't fix any of those unsafe things.
>
> If it doesn't explain why we need this patch or how this patch works,
> I don't think we need it in the commit log.
>
> Maybe this is an explanation for why you didn't set
> .suppress_auto_claim_dma_owner for shpc_driver?

You are right. This doesn't explain why this is needed and how it works.
It only explains why we don't do the same thing to other pci port
drivers. I will move this out of the commit message. Perhaps put it in
the cover letter or some patches for vfio.

>
> Minor typos above:
>   s/in order to avoiding default/before avoiding default/
>   s/relies on the PCI MMIO access/relies on PCI MMIO access/
>   s/For example, The/For example, the/
>   s/is a sure indicate the/is a sure indication the/

Thank you! I will correct these.

>
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Suggested-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>  drivers/pci/pcie/portdrv_pci.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
>> index 35eca6277a96..c48a8734f9c4 100644
>> --- a/drivers/pci/pcie/portdrv_pci.c
>> +++ b/drivers/pci/pcie/portdrv_pci.c
>> @@ -202,7 +202,10 @@ static struct pci_driver pcie_portdriver = {
>>
>>  	.err_handler = &pcie_portdrv_err_handler,
>>
>> -	.driver.pm = PCIE_PORTDRV_PM_OPS,
>> +	.driver = {
>> +		.pm = PCIE_PORTDRV_PM_OPS,
>> +		.suppress_auto_claim_dma_owner = true,
>> +	},
>>  };
>>
>>  static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
>> --
>> 2.25.1
>>

Best regards,
baolu
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 35eca6277a96..c48a8734f9c4 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -202,7 +202,10 @@ static struct pci_driver pcie_portdriver = {
 
 	.err_handler = &pcie_portdrv_err_handler,
 
-	.driver.pm = PCIE_PORTDRV_PM_OPS,
+	.driver = {
+		.pm = PCIE_PORTDRV_PM_OPS,
+		.suppress_auto_claim_dma_owner = true,
+	},
 };
 
 static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
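To make the effect of the new flag concrete, here is a rough userspace model of the behaviour the commit message describes. Only the field name `suppress_auto_claim_dma_owner` comes from the diff. The `model_*` types and functions are hypothetical, and the rule that a kernel-claimed group cannot be handed to userspace is an assumption read from the commit message, not the actual code path of this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ownership states for one IOMMU group (illustration only). */
enum model_owner { MODEL_OWNER_NONE, MODEL_OWNER_KERNEL_DMA, MODEL_OWNER_USER };

struct model_group {
	enum model_owner owner;
};

struct model_driver {
	const char *name;
	bool suppress_auto_claim_dma_owner;	/* same field name as in the diff */
};

/* Model of driver bind: claim kernel DMA ownership unless the driver opts out. */
int model_bind_driver(struct model_group *grp, const struct model_driver *drv)
{
	assert(grp && drv);
	if (drv->suppress_auto_claim_dma_owner)
		return 0;		/* group ownership is left untouched */
	if (grp->owner == MODEL_OWNER_USER)
		return -1;		/* assumed conflict: group already user owned */
	grp->owner = MODEL_OWNER_KERNEL_DMA;
	return 0;
}

/* Model of assigning the group to userspace (what vfio would want to do). */
int model_assign_to_user(struct model_group *grp)
{
	assert(grp);
	if (grp->owner == MODEL_OWNER_KERNEL_DMA)
		return -1;		/* assumed conflict: a kernel driver claimed DMA */
	grp->owner = MODEL_OWNER_USER;
	return 0;
}
```

In this model a driver that sets the flag can be bound and the group can still be assigned to userspace afterwards. A driver without the flag claims the group, and the later user assignment is refused.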
IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
then all of the downstream devices will be part of the same IOMMU group
as the bridge. The existing vfio framework allows the portdrv driver to
be bound to the bridge while its downstream devices are assigned to user
space. The pci_dma_configure() marks the iommu_group as containing only
devices with kernel drivers that manage DMA. Avoid this default behavior
for the portdrv driver in order for compatibility with the current vfio
policy.

The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") extended above
policy to all kernel drivers of bridge class. This is not always safe.
For example, The shpchp_core driver relies on the PCI MMIO access for the
controller functionality. With its downstream devices assigned to the
userspace, the MMIO might be changed through user initiated P2P accesses
without any notification. This might break the kernel driver integrity
and lead to some unpredictable consequences.

For any bridge driver, in order to avoiding default kernel DMA ownership
claiming, we should consider:

 1) Does the bridge driver use DMA? Calling pci_set_master() or
    a dma_map_* API is a sure indicate the driver is doing DMA

 2) If the bridge driver uses MMIO, is it tolerant to hostile
    userspace also touching the same MMIO registers via P2P DMA
    attacks?

Conservatively if the driver maps an MMIO region at all, we can say that
it fails the test.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/pci/pcie/portdrv_pci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
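The two questions in the commit message amount to a conservative checklist, which can be written down as a small predicate. This is a userspace sketch only: `struct model_bridge_driver` and `model_may_suppress_dma_claim()` are hypothetical names, and the three booleans stand in for facts a reviewer would establish by reading the driver source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a bridge driver, filled in by code review. */
struct model_bridge_driver {
	bool calls_pci_set_master;	/* question 1: indicates DMA use */
	bool uses_dma_map_api;		/* question 1: any dma_map_* call */
	bool maps_mmio;			/* question 2: driver maps an MMIO region */
};

/*
 * Return true only if the driver passes both questions, i.e. skipping the
 * default kernel DMA ownership claim looks acceptable under this checklist.
 */
bool model_may_suppress_dma_claim(const struct model_bridge_driver *d)
{
	assert(d);
	if (d->calls_pci_set_master || d->uses_dma_map_api)
		return false;	/* the driver does DMA */
	if (d->maps_mmio)
		return false;	/* conservative: any MMIO mapping fails the test */
	return true;
}
```

An entry with only `maps_mmio` set, which is how the commit message characterises shpchp_core, fails the check. A driver that neither does DMA nor maps MMIO passes it.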