Message ID | 20220218005521.172832-11-baolu.lu@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Fix BUG_ON in vfio_iommu_group_notifier() |
On Fri, 18 Feb 2022 08:55:20 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> The iommu core and driver core have been enhanced to avoid unsafe driver
> binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER)
> has been called. There's no need to register iommu group notifier. This
> removes the iommu group notifier which contains BUG_ON() and WARN().
>
> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") allowed all
> pcieport drivers to be bound with devices while the group is assigned to
> user space. This is not always safe. For example, the shpchp_core driver
> relies on the PCI MMIO access for the controller functionality. With its
> downstream devices assigned to the userspace, the MMIO might be changed
> through user initiated P2P accesses without any notification. This might
> break the kernel driver integrity and lead to some unpredictable
> consequences. As a result, currently we only allow the portdrv driver.
>
> For any bridge driver, in order to avoid default kernel DMA ownership
> claiming, we should consider:
>
> 1) Does the bridge driver use DMA? Calling pci_set_master() or
>    a dma_map_* API is a sure indication the driver is doing DMA
>
> 2) If the bridge driver uses MMIO, is it tolerant to hostile
>    userspace also touching the same MMIO registers via P2P DMA
>    attacks?
>
> Conservatively if the driver maps an MMIO region at all, we can say that
> it fails the test.

IIUC, there's a chance we're going to break user configurations if
they're assigning devices from a group containing a bridge that uses a
driver other than pcieport. The recommendation to such an affected user
would be that the previously allowed host bridge driver was unsafe for
this use case and to continue to enable assignment of devices within
that group, the driver should be unbound from the bridge device or
replaced with the pci-stub driver. Is that right?

Unfortunately I also think a bisect of such a breakage wouldn't land
here, I think it was actually broken in "vfio: Set DMA ownership for
VFIO" since that's where vfio starts to make use of
iommu_group_claim_dma_owner() which should fail due to
pci_dma_configure() calling iommu_device_use_default_domain() for
any driver not identifying itself as driver_managed_dma.

If that's correct, can we leave a breadcrumb in the correct commit log
indicating why this potential breakage is intentional and how the
bridge driver might be reconfigured to continue to allow assignment from
within the group more safely? Thanks,

Alex

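The ordering problem described above can be illustrated with a small userspace model. This is only a sketch: the model_* names are hypothetical, the -EBUSY return value is chosen for illustration, and the real logic lives in the iommu core rather than in anything resembling this code. It assumes only what the thread states: binding a driver that does not set driver_managed_dma claims the default domain for the group, and a later attempt to claim the group for user space then fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an iommu group's DMA ownership state. */
struct model_group {
	unsigned int default_domain_users; /* drivers relying on kernel DMA */
	bool user_owned;                   /* claimed for user space (vfio) */
};

/* Hypothetical stand-in for a driver; only the flag matters here. */
struct model_driver {
	const char *name;
	bool driver_managed_dma;
};

/*
 * Models the bind path: pci_dma_configure() calling
 * iommu_device_use_default_domain() unless the driver is
 * driver_managed_dma.
 */
static int model_driver_bind(struct model_group *grp,
			     const struct model_driver *drv)
{
	assert(grp != NULL && drv != NULL);

	if (drv->driver_managed_dma)
		return 0;	/* no kernel DMA ownership is claimed */
	if (grp->user_owned)
		return -EBUSY;	/* group already belongs to user space */

	grp->default_domain_users++;
	return 0;
}

/* Models vfio calling iommu_group_claim_dma_owner(). */
static int model_claim_user_owner(struct model_group *grp)
{
	assert(grp != NULL);

	if (grp->user_owned || grp->default_domain_users > 0)
		return -EBUSY;	/* conflicting ownership */

	grp->user_owned = true;
	return 0;
}
```

In this model, a bridge bound to a driver with driver_managed_dma set leaves the group claimable by user space, while a bridge bound to any other driver makes the claim fail. That is why the behaviour change would bisect to the commit that introduces the claim.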
Hi Alex,

On 2/24/22 5:53 AM, Alex Williamson wrote:
> On Fri, 18 Feb 2022 08:55:20 +0800
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
>
>> The iommu core and driver core have been enhanced to avoid unsafe driver
>> binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER)
>> has been called. There's no need to register iommu group notifier. This
>> removes the iommu group notifier which contains BUG_ON() and WARN().
>>
>> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") allowed all
>> pcieport drivers to be bound with devices while the group is assigned to
>> user space. This is not always safe. For example, the shpchp_core driver
>> relies on the PCI MMIO access for the controller functionality. With its
>> downstream devices assigned to the userspace, the MMIO might be changed
>> through user initiated P2P accesses without any notification. This might
>> break the kernel driver integrity and lead to some unpredictable
>> consequences. As a result, currently we only allow the portdrv driver.
>>
>> For any bridge driver, in order to avoid default kernel DMA ownership
>> claiming, we should consider:
>>
>> 1) Does the bridge driver use DMA? Calling pci_set_master() or
>>    a dma_map_* API is a sure indication the driver is doing DMA
>>
>> 2) If the bridge driver uses MMIO, is it tolerant to hostile
>>    userspace also touching the same MMIO registers via P2P DMA
>>    attacks?
>>
>> Conservatively if the driver maps an MMIO region at all, we can say that
>> it fails the test.
>
> IIUC, there's a chance we're going to break user configurations if
> they're assigning devices from a group containing a bridge that uses a
> driver other than pcieport. The recommendation to such an affected user
> would be that the previously allowed host bridge driver was unsafe for
> this use case and to continue to enable assignment of devices within
> that group, the driver should be unbound from the bridge device or
> replaced with the pci-stub driver. Is that right?

Yes. You are right.

Another possible solution (for the long term) is to re-audit the bridge
driver code and set the .driver_managed_dma field, on the premise that
the driver is not exposed to the potential hazards above.

>
> Unfortunately I also think a bisect of such a breakage wouldn't land
> here, I think it was actually broken in "vfio: Set DMA ownership for
> VFIO" since that's where vfio starts to make use of
> iommu_group_claim_dma_owner() which should fail due to
> pci_dma_configure() calling iommu_device_use_default_domain() for
> any driver not identifying itself as driver_managed_dma.

Yes. Great point. Thank you!

>
> If that's correct, can we leave a breadcrumb in the correct commit log
> indicating why this potential breakage is intentional and how the
> bridge driver might be reconfigured to continue to allow assignment from
> within the group more safely? Thanks,

Sure. I will add the text below to the commit message of "vfio: Set DMA
ownership for VFIO":

"
This change disallows some unsafe bridge drivers from binding to non-ACS
bridges while devices under them are assigned to user space. This is an
intentional enhancement and possibly breaks some existing
configurations. The recommendation to such an affected user would be
that the previously allowed host bridge driver was unsafe for this use
case and to continue to enable assignment of devices within that group,
the driver should be unbound from the bridge device or replaced with the
pci-stub driver.

For any bridge driver, we consider it unsafe if it satisfies any of the
following conditions:

  1) The bridge driver uses DMA. Calling pci_set_master() or calling any
     kernel DMA API (dma_map_*(), etc.) is an indication that the
     driver is doing DMA.

  2) The bridge driver uses MMIO and is not tolerant of hostile
     userspace also touching the same MMIO registers via P2P DMA
     attacks.

If the bridge driver turns out to be a safe one, it could be used as
before by setting the driver's .driver_managed_dma field, just like what
we have done in the pcieport driver.
"

Best regards,
baolu

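The audit criteria in the proposed commit message can be read as a simple predicate. The sketch below is a hypothetical userspace checklist and not kernel code: struct bridge_driver_audit, its fields, and bridge_driver_is_unsafe() are invented for illustration. The conservative flag reflects the patch description's remark that a driver mapping any MMIO region can be treated as failing the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical audit record filled in by whoever reviews the driver. */
struct bridge_driver_audit {
	bool calls_pci_set_master;   /* condition 1: bus mastering enabled */
	bool uses_dma_mapping_api;   /* condition 1: dma_map_*() and similar */
	bool maps_mmio;              /* condition 2: driver touches MMIO */
	bool mmio_tolerates_p2p;     /* condition 2: survives hostile P2P writes */
};

/*
 * Returns true if the audited driver should NOT set .driver_managed_dma.
 * With conservative set, any MMIO mapping counts as unsafe.
 */
static bool bridge_driver_is_unsafe(const struct bridge_driver_audit *a,
				    bool conservative)
{
	assert(a != NULL);

	if (a->calls_pci_set_master || a->uses_dma_mapping_api)
		return true;
	if (a->maps_mmio)
		return conservative || !a->mmio_tolerates_p2p;
	return false;
}
```

A driver that passes this kind of audit is a candidate for setting .driver_managed_dma, as described for the pcieport driver above. One that fails should be unbound from the bridge or replaced with pci-stub before devices in the group are assigned.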
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index e0df2bc692b2..dd3fac0d6bc9 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -71,7 +71,6 @@ struct vfio_group {
 	struct vfio_container		*container;
 	struct list_head		device_list;
 	struct mutex			device_lock;
-	struct notifier_block		nb;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
 	atomic_t			opened;
@@ -274,8 +273,6 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
 }
 EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
 
-static int vfio_iommu_group_notifier(struct notifier_block *nb,
-				     unsigned long action, void *data);
 static void vfio_group_get(struct vfio_group *group);
 
 /*
@@ -395,13 +392,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 		goto err_put;
 	}
 
-	group->nb.notifier_call = vfio_iommu_group_notifier;
-	err = iommu_group_register_notifier(iommu_group, &group->nb);
-	if (err) {
-		ret = ERR_PTR(err);
-		goto err_put;
-	}
-
 	mutex_lock(&vfio.group_lock);
 
 	/* Did we race creating this group? */
@@ -422,7 +412,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 
 err_unlock:
 	mutex_unlock(&vfio.group_lock);
-	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
 err_put:
 	put_device(&group->dev);
 	return ret;
@@ -447,7 +436,6 @@ static void vfio_group_put(struct vfio_group *group)
 	cdev_device_del(&group->cdev, &group->dev);
 	mutex_unlock(&vfio.group_lock);
 
-	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
 	put_device(&group->dev);
 }
 
@@ -503,141 +491,6 @@ static struct vfio_device *vfio_group_get_device(struct vfio_group *group,
 	return NULL;
 }
 
-/*
- * Some drivers, like pci-stub, are only used to prevent other drivers from
- * claiming a device and are therefore perfectly legitimate for a user owned
- * group. The pci-stub driver has no dependencies on DMA or the IOVA mapping
- * of the device, but it does prevent the user from having direct access to
- * the device, which is useful in some circumstances.
- *
- * We also assume that we can include PCI interconnect devices, ie. bridges.
- * IOMMU grouping on PCI necessitates that if we lack isolation on a bridge
- * then all of the downstream devices will be part of the same IOMMU group as
- * the bridge. Thus, if placing the bridge into the user owned IOVA space
- * breaks anything, it only does so for user owned devices downstream. Note
- * that error notification via MSI can be affected for platforms that handle
- * MSI within the same IOVA space as DMA.
- */
-static const char * const vfio_driver_allowed[] = { "pci-stub" };
-
-static bool vfio_dev_driver_allowed(struct device *dev,
-				    struct device_driver *drv)
-{
-	if (dev_is_pci(dev)) {
-		struct pci_dev *pdev = to_pci_dev(dev);
-
-		if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
-			return true;
-	}
-
-	return match_string(vfio_driver_allowed,
-			    ARRAY_SIZE(vfio_driver_allowed),
-			    drv->name) >= 0;
-}
-
-/*
- * A vfio group is viable for use by userspace if all devices are in
- * one of the following states:
- * - driver-less
- * - bound to a vfio driver
- * - bound to an otherwise allowed driver
- * - a PCI interconnect device
- *
- * We use two methods to determine whether a device is bound to a vfio
- * driver. The first is to test whether the device exists in the vfio
- * group. The second is to test if the device exists on the group
- * unbound_list, indicating it's in the middle of transitioning from
- * a vfio driver to driver-less.
- */
-static int vfio_dev_viable(struct device *dev, void *data)
-{
-	struct vfio_group *group = data;
-	struct vfio_device *device;
-	struct device_driver *drv = READ_ONCE(dev->driver);
-
-	if (!drv || vfio_dev_driver_allowed(dev, drv))
-		return 0;
-
-	device = vfio_group_get_device(group, dev);
-	if (device) {
-		vfio_device_put(device);
-		return 0;
-	}
-
-	return -EINVAL;
-}
-
-/*
- * Async device support
- */
-static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
-{
-	struct vfio_device *device;
-
-	/* Do we already know about it? We shouldn't */
-	device = vfio_group_get_device(group, dev);
-	if (WARN_ON_ONCE(device)) {
-		vfio_device_put(device);
-		return 0;
-	}
-
-	/* Nothing to do for idle groups */
-	if (!atomic_read(&group->container_users))
-		return 0;
-
-	/* TODO Prevent device auto probing */
-	dev_WARN(dev, "Device added to live group %d!\n",
-		 iommu_group_id(group->iommu_group));
-
-	return 0;
-}
-
-static int vfio_group_nb_verify(struct vfio_group *group, struct device *dev)
-{
-	/* We don't care what happens when the group isn't in use */
-	if (!atomic_read(&group->container_users))
-		return 0;
-
-	return vfio_dev_viable(dev, group);
-}
-
-static int vfio_iommu_group_notifier(struct notifier_block *nb,
-				     unsigned long action, void *data)
-{
-	struct vfio_group *group = container_of(nb, struct vfio_group, nb);
-	struct device *dev = data;
-
-	switch (action) {
-	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
-		vfio_group_nb_add_dev(group, dev);
-		break;
-	case IOMMU_GROUP_NOTIFY_DEL_DEVICE:
-		/*
-		 * Nothing to do here. If the device is in use, then the
-		 * vfio sub-driver should block the remove callback until
-		 * it is unused. If the device is unused or attached to a
-		 * stub driver, then it should be released and we don't
-		 * care that it will be going away.
-		 */
-		break;
-	case IOMMU_GROUP_NOTIFY_BIND_DRIVER:
-		dev_dbg(dev, "%s: group %d binding to driver\n", __func__,
-			iommu_group_id(group->iommu_group));
-		break;
-	case IOMMU_GROUP_NOTIFY_BOUND_DRIVER:
-		dev_dbg(dev, "%s: group %d bound to driver %s\n", __func__,
-			iommu_group_id(group->iommu_group), dev->driver->name);
-		BUG_ON(vfio_group_nb_verify(group, dev));
-		break;
-	case IOMMU_GROUP_NOTIFY_UNBIND_DRIVER:
-		dev_dbg(dev, "%s: group %d unbinding from driver %s\n",
-			__func__, iommu_group_id(group->iommu_group),
-			dev->driver->name);
-		break;
-	}
-	return NOTIFY_OK;
-}
-
 /*
  * VFIO driver API
  */

The iommu core and driver core have been enhanced to avoid unsafe driver
binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER)
has been called. There's no need to register iommu group notifier. This
removes the iommu group notifier which contains BUG_ON() and WARN().

The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") allowed all
pcieport drivers to be bound with devices while the group is assigned to
user space. This is not always safe. For example, the shpchp_core driver
relies on the PCI MMIO access for the controller functionality. With its
downstream devices assigned to the userspace, the MMIO might be changed
through user initiated P2P accesses without any notification. This might
break the kernel driver integrity and lead to some unpredictable
consequences. As a result, currently we only allow the portdrv driver.

For any bridge driver, in order to avoid default kernel DMA ownership
claiming, we should consider:

 1) Does the bridge driver use DMA? Calling pci_set_master() or
    a dma_map_* API is a sure indication the driver is doing DMA

 2) If the bridge driver uses MMIO, is it tolerant to hostile
    userspace also touching the same MMIO registers via P2P DMA
    attacks?

Conservatively if the driver maps an MMIO region at all, we can say that
it fails the test.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/vfio/vfio.c | 147 --------------------------------------------
 1 file changed, 147 deletions(-)