[v4,04/13] PCI: portdrv: Suppress kernel DMA ownership auto-claiming

Message ID 20211217063708.1740334-5-baolu.lu@linux.intel.com (mailing list archive)
State New, archived
Series Fix BUG_ON in vfio_iommu_group_notifier()

Commit Message

Baolu Lu Dec. 17, 2021, 6:36 a.m. UTC
IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
then all of the downstream devices will be part of the same IOMMU group
as the bridge. The existing vfio framework allows the portdrv driver to
be bound to the bridge while its downstream devices are assigned to user
space. The pci_dma_configure() marks the iommu_group as containing only
devices with kernel drivers that manage DMA. Avoid this default behavior
for the portdrv driver in order for compatibility with the current vfio
policy.

The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") extended above
policy to all kernel drivers of bridge class. This is not always safe.
For example, The shpchp_core driver relies on the PCI MMIO access for the
controller functionality. With its downstream devices assigned to the
userspace, the MMIO might be changed through user initiated P2P accesses
without any notification. This might break the kernel driver integrity
and lead to some unpredictable consequences.

For any bridge driver, in order to avoiding default kernel DMA ownership
claiming, we should consider:

 1) Does the bridge driver use DMA? Calling pci_set_master() or
    a dma_map_* API is a sure indicate the driver is doing DMA

 2) If the bridge driver uses MMIO, is it tolerant to hostile
    userspace also touching the same MMIO registers via P2P DMA
    attacks?

Conservatively if the driver maps an MMIO region at all, we can say that
it fails the test.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/pci/pcie/portdrv_pci.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
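
The two questions in the commit message above can be read as a conservative
predicate over what a bridge driver does with its device. The following is
only an illustrative userspace sketch, not kernel code: struct
bridge_driver_traits and may_suppress_auto_claim() are hypothetical names
introduced here, and the two boolean fields simply restate the checklist.

```c
#include <stdbool.h>

/*
 * Hypothetical summary of a bridge driver's behavior. These fields are
 * illustrative only and do not correspond to any kernel data structure.
 */
struct bridge_driver_traits {
	bool uses_dma;   /* calls pci_set_master() or a dma_map_* API */
	bool maps_mmio;  /* maps any MMIO region of the bridge */
};

/*
 * Conservative test from the commit message: a driver that does DMA, or
 * that maps an MMIO region at all, fails and should keep the default
 * kernel DMA ownership claim.
 */
static bool may_suppress_auto_claim(const struct bridge_driver_traits *t)
{
	if (t->uses_dma)
		return false;
	if (t->maps_mmio)
		return false;
	return true;
}
```

Under this reading, a driver like shpchp_core, which the commit message
describes as relying on PCI MMIO access, would fail the test on the second
check alone.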

Comments

Bjorn Helgaas Dec. 29, 2021, 9:16 p.m. UTC | #1
On Fri, Dec 17, 2021 at 02:36:59PM +0800, Lu Baolu wrote:
> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
> then all of the downstream devices will be part of the same IOMMU group
> as the bridge. The existing vfio framework allows the portdrv driver to
> be bound to the bridge while its downstream devices are assigned to user
> space. The pci_dma_configure() marks the iommu_group as containing only
> devices with kernel drivers that manage DMA. Avoid this default behavior
> for the portdrv driver in order for compatibility with the current vfio
> policy.

A word about the isolation would be useful.  I think you're referring
to some specific ACS controls, probably P2P Request Redirect?

I guess this is just a wording issue, but I think it's actually the
*lack* of some ACS controls that forces us to put several devices in
the same IOMMU group, isn't it?  It's not that we start with "IOMMU
grouping" and that necessitates something else.

Maybe something like this?

  If a switch lacks ACS P2P Request Redirect (and possibly other
  controls?), a device below the switch can bypass the IOMMU and DMA
  directly to other devices below the switch, so all the downstream
  devices must be in the same IOMMU group as the switch itself.

> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") extended above
> policy to all kernel drivers of bridge class. This is not always safe.
> For example, The shpchp_core driver relies on the PCI MMIO access for the
> controller functionality. With its downstream devices assigned to the
> userspace, the MMIO might be changed through user initiated P2P accesses
> without any notification. This might break the kernel driver integrity
> and lead to some unpredictable consequences.
> 
> For any bridge driver, in order to avoiding default kernel DMA ownership
> claiming, we should consider:
> 
>  1) Does the bridge driver use DMA? Calling pci_set_master() or
>     a dma_map_* API is a sure indicate the driver is doing DMA
> 
>  2) If the bridge driver uses MMIO, is it tolerant to hostile
>     userspace also touching the same MMIO registers via P2P DMA
>     attacks?
> 
> Conservatively if the driver maps an MMIO region at all, we can say that
> it fails the test.

I'm not sure what all this explanation is telling me.  It says
something done by 5f096b14d421 is not always safe, but this patch
doesn't fix any of those unsafe things.

If it doesn't explain why we need this patch or how this patch works,
I don't think we need it in the commit log.

Maybe this is an explanation for why you didn't set
.suppress_auto_claim_dma_owner for shpc_driver?

Minor typos above:
  s/in order to avoiding default/before avoiding default/
  s/relies on the PCI MMIO access/relies on PCI MMIO access/
  s/For example, The/For example, the/
  s/is a sure indicate the/is a sure indication the/

> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/pci/pcie/portdrv_pci.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 35eca6277a96..c48a8734f9c4 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -202,7 +202,10 @@ static struct pci_driver pcie_portdriver = {
>  
>  	.err_handler	= &pcie_portdrv_err_handler,
>  
> -	.driver.pm	= PCIE_PORTDRV_PM_OPS,
> +	.driver		= {
> +		.pm = PCIE_PORTDRV_PM_OPS,
> +		.suppress_auto_claim_dma_owner = true,
> +	},
>  };
>  
>  static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
> -- 
> 2.25.1
>
Baolu Lu Dec. 30, 2021, 5:49 a.m. UTC | #2
Hi Bjorn,

On 12/30/21 5:16 AM, Bjorn Helgaas wrote:
> On Fri, Dec 17, 2021 at 02:36:59PM +0800, Lu Baolu wrote:
>> IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
>> then all of the downstream devices will be part of the same IOMMU group
>> as the bridge. The existing vfio framework allows the portdrv driver to
>> be bound to the bridge while its downstream devices are assigned to user
>> space. The pci_dma_configure() marks the iommu_group as containing only
>> devices with kernel drivers that manage DMA. Avoid this default behavior
>> for the portdrv driver in order for compatibility with the current vfio
>> policy.
> 
> A word about the isolation would be useful.  I think you're referring
> to some specific ACS controls, probably P2P Request Redirect?
> 
> I guess this is just a wording issue, but I think it's actually the
> *lack* of some ACS controls that forces us to put several devices in
> the same IOMMU group, isn't it?  It's not that we start with "IOMMU
> grouping" and that necessitates something else.
> 
> Maybe something like this?
> 
>    If a switch lacks ACS P2P Request Redirect (and possibly other
>    controls?), a device below the switch can bypass the IOMMU and DMA
>    directly to other devices below the switch, so all the downstream
>    devices must be in the same IOMMU group as the switch itself.

Yes. That's what it means from the perspective of PCI/PCIe. I will use
this in the next version. Thanks!

> 
>> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") extended above
>> policy to all kernel drivers of bridge class. This is not always safe.
>> For example, The shpchp_core driver relies on the PCI MMIO access for the
>> controller functionality. With its downstream devices assigned to the
>> userspace, the MMIO might be changed through user initiated P2P accesses
>> without any notification. This might break the kernel driver integrity
>> and lead to some unpredictable consequences.
>>
>> For any bridge driver, in order to avoiding default kernel DMA ownership
>> claiming, we should consider:
>>
>>   1) Does the bridge driver use DMA? Calling pci_set_master() or
>>      a dma_map_* API is a sure indicate the driver is doing DMA
>>
>>   2) If the bridge driver uses MMIO, is it tolerant to hostile
>>      userspace also touching the same MMIO registers via P2P DMA
>>      attacks?
>>
>> Conservatively if the driver maps an MMIO region at all, we can say that
>> it fails the test.
> 
> I'm not sure what all this explanation is telling me.  It says
> something done by 5f096b14d421 is not always safe, but this patch
> doesn't fix any of those unsafe things.
> 
> If it doesn't explain why we need this patch or how this patch works,
> I don't think we need it in the commit log.
> 
> Maybe this is an explanation for why you didn't set
> .suppress_auto_claim_dma_owner for shpc_driver?

You are right. This doesn't explain why this is needed and how it works.
It only explains why we don't do the same thing to other pci port
drivers. I will move this out of the commit message. Perhaps put it
in the cover letter or some patches for vfio.

> 
> Minor typos above:
>    s/in order to avoiding default/before avoiding default/
>    s/relies on the PCI MMIO access/relies on PCI MMIO access/
>    s/For example, The/For example, the/
>    s/is a sure indicate the/is a sure indication the/

Thank you! I will correct these.

> 
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Suggested-by: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/pci/pcie/portdrv_pci.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
>> index 35eca6277a96..c48a8734f9c4 100644
>> --- a/drivers/pci/pcie/portdrv_pci.c
>> +++ b/drivers/pci/pcie/portdrv_pci.c
>> @@ -202,7 +202,10 @@ static struct pci_driver pcie_portdriver = {
>>   
>>   	.err_handler	= &pcie_portdrv_err_handler,
>>   
>> -	.driver.pm	= PCIE_PORTDRV_PM_OPS,
>> +	.driver		= {
>> +		.pm = PCIE_PORTDRV_PM_OPS,
>> +		.suppress_auto_claim_dma_owner = true,
>> +	},
>>   };
>>   
>>   static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
>> -- 
>> 2.25.1
>>

Best regards,
baolu
Patch

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 35eca6277a96..c48a8734f9c4 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -202,7 +202,10 @@  static struct pci_driver pcie_portdriver = {
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
-	.driver.pm	= PCIE_PORTDRV_PM_OPS,
+	.driver		= {
+		.pm = PCIE_PORTDRV_PM_OPS,
+		.suppress_auto_claim_dma_owner = true,
+	},
 };
 
 static int __init dmi_pcie_pme_disable_msi(const struct dmi_system_id *d)
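
The hunk above replaces the dotted designator .driver.pm with a nested
initializer for the whole embedded .driver member, so that a second field
can be set alongside .pm. A minimal userspace sketch of that C idiom
follows. All mock_* types and names are stand-ins invented for this
illustration and model only the fields the patch touches; they are not the
kernel definitions.

```c
#include <stdbool.h>

/* Stand-ins for the kernel types; only the fields used here are modeled. */
struct mock_pm_ops {
	int unused;
};

struct mock_device_driver {
	const struct mock_pm_ops *pm;
	bool suppress_auto_claim_dma_owner;
};

struct mock_pci_driver {
	const char *name;
	struct mock_device_driver driver;
};

static const struct mock_pm_ops mock_portdrv_pm_ops = { 0 };

/* Before: one nested member is set through a dotted designator. */
static struct mock_pci_driver before = {
	.name = "mock-port",
	.driver.pm = &mock_portdrv_pm_ops,
};

/*
 * After: the embedded struct gets its own initializer list, which keeps
 * .pm and adds the new flag. Members not named are zero-initialized.
 */
static struct mock_pci_driver after = {
	.name = "mock-port",
	.driver = {
		.pm = &mock_portdrv_pm_ops,
		.suppress_auto_claim_dma_owner = true,
	},
};
```

Both forms leave .pm pointing at the same ops structure; the only
difference in the resulting object is the new flag, which is false in the
first form because unnamed members default to zero.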