Message ID: 0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia.com
Series: Add iommufd physical device operations for replace and alloc hwpt
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, February 25, 2023 8:28 AM
> [...]
> The implementation is complicated because we have to introduce some
> per-iommu_group memory in iommufd and redo how we think about
> multi-device groups to be more explicit. This solves all the locking
> problems in the prior attempts.

Now think about the pasid case.

pasid attach is managed as a device operation today:
iommu_attach_device_pasid()

Following it naturally we'll have a pasid array per iommufd_device to
track the attached HWPT per pasid.

But internally there is only one pasid table per iommu group, i.e. the
same story as RID attach: once dev1 replaces the hwpt on pasid1, it
takes effect on all other devices in the same group.

Then confusion comes. If we must have an explicit group object in
iommufd to manage domain replacement per RID, then do we need the same
explicit mechanism for pasid, e.g. tracking the pasid-attached hwpt in
iommufd_group instead of in iommufd_device, plus an
iommu_attach_group_pasid() API?

In contrast, if it's fine to continue managing pasid attach/replace per
device, does it imply there is another way of solving the RID case
cleanly w/o an explicit group object?

Thanks
Kevin
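As a rough sketch of the per-device tracking described above (every
name here is hypothetical, invented for this discussion rather than
taken from the iommufd code; only xa_insert()/xa_erase() and
iommu_attach_device_pasid() are real kernel APIs):

#include <linux/iommu.h>
#include <linux/xarray.h>

/* Hypothetical iommufd-side types, for illustration only */
struct hwpt_sketch {
	struct iommu_domain *domain;
};

struct iommufd_device_sketch {
	struct device *dev;
	struct xarray pasid_hwpts;	/* pasid -> struct hwpt_sketch * */
};

static int device_pasid_attach_sketch(struct iommufd_device_sketch *idev,
				      ioasid_t pasid,
				      struct hwpt_sketch *hwpt)
{
	int ret;

	/* Claim the pasid slot first; -EBUSY if something is attached */
	ret = xa_insert(&idev->pasid_hwpts, pasid, hwpt, GFP_KERNEL);
	if (ret)
		return ret;

	ret = iommu_attach_device_pasid(hwpt->domain, idev->dev, pasid);
	if (ret)
		xa_erase(&idev->pasid_hwpts, pasid);
	return ret;
}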
On Tue, Mar 07, 2023 at 08:42:06AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, February 25, 2023 8:28 AM
> > [...]
> > The implementation is complicated because we have to introduce some
> > per-iommu_group memory in iommufd and redo how we think about
> > multi-device groups to be more explicit. This solves all the locking
> > problems in the prior attempts.
>
> Now think about the pasid case.
>
> pasid attach is managed as a device operation today:
> iommu_attach_device_pasid()
>
> Following it naturally we'll have a pasid array per iommufd_device to
> track the attached HWPT per pasid.
>
> But internally there is only one pasid table per iommu group, i.e. the
> same story as RID attach: once dev1 replaces the hwpt on pasid1, it
> takes effect on all other devices in the same group.

IMHO I can't believe that any actual system that supports PASID has a
RID aliasing problem too.

I think we should fix the iommu core to make PASID per-device and
require systems that have a RID aliasing problem to block PASID.

There is a bigger picture here: if drivers have to optionally share
their PASID tables with other drivers then we can't have per-driver
PASID allocators at all either.

> Then confusion comes. If we must have an explicit group object in
> iommufd to manage domain replacement per RID, then do we need the same
> explicit mechanism for pasid, e.g. tracking the pasid-attached hwpt in
> iommufd_group instead of in iommufd_device, plus an
> iommu_attach_group_pasid() API?

If we make PASID per-group then yes.

But no actual HW models per-group PASID to the VM, so this is a
complete API disaster for vIOMMU. Better not to do it.

Jason
On 3/7/23 8:46 PM, Jason Gunthorpe wrote:
> On Tue, Mar 07, 2023 at 08:42:06AM +0000, Tian, Kevin wrote:
>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>> Sent: Saturday, February 25, 2023 8:28 AM
>>> [...]
>> Now think about the pasid case.
>>
>> pasid attach is managed as a device operation today:
>> iommu_attach_device_pasid()
>>
>> Following it naturally we'll have a pasid array per iommufd_device to
>> track the attached HWPT per pasid.
>>
>> But internally there is only one pasid table per iommu group, i.e. the
>> same story as RID attach: once dev1 replaces the hwpt on pasid1, it
>> takes effect on all other devices in the same group.
>
> IMHO I can't believe that any actual system that supports PASID has a
> RID aliasing problem too.
>
> I think we should fix the iommu core to make PASID per-device and
> require systems that have a RID aliasing problem to block PASID.
>
> There is a bigger picture here: if drivers have to optionally share
> their PASID tables with other drivers then we can't have per-driver
> PASID allocators at all either.

This is actually enforced by the PCI and IOMMU core. pci_enable_pasid()
requires full ACS support on the device's upstream path:

	if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
		return -EINVAL;

and, for such a PCI topology, the iommu core always allocates an
exclusive iommu group.

The only place that seems to be a little messy is:

static int __iommu_set_group_pasid(struct iommu_domain *domain,
				   struct iommu_group *group, ioasid_t pasid)
{
	struct group_device *device;
	int ret = 0;

	list_for_each_entry(device, &group->devices, list) {
		ret = domain->ops->set_dev_pasid(domain, device->dev, pasid);
		if (ret)
			break;
	}

	return ret;
}

Perhaps we need a check on singleton group?

Best regards,
baolu
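If such a check is wanted, one possible shape for it is sketched below,
using the stock list_is_singular() helper from <linux/list.h>; whether
this function is the right place for the restriction is exactly the
open question:

static int __iommu_set_group_pasid(struct iommu_domain *domain,
				   struct iommu_group *group, ioasid_t pasid)
{
	struct group_device *device;
	int ret = 0;

	/* Sketch: refuse PASID attach for multi-device groups */
	if (!list_is_singular(&group->devices))
		return -EINVAL;

	list_for_each_entry(device, &group->devices, list) {
		ret = domain->ops->set_dev_pasid(domain, device->dev, pasid);
		if (ret)
			break;
	}

	return ret;
}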
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, March 8, 2023 10:08 AM
>
> On 3/7/23 8:46 PM, Jason Gunthorpe wrote:
> > On Tue, Mar 07, 2023 at 08:42:06AM +0000, Tian, Kevin wrote:
> > [...]
> > I think we should fix the iommu core to make PASID per-device and
> > require systems that have a RID aliasing problem to block PASID.
> >
> > There is a bigger picture here: if drivers have to optionally share
> > their PASID tables with other drivers then we can't have per-driver
> > PASID allocators at all either.
>
> This is actually enforced by the PCI and IOMMU core. pci_enable_pasid()
> requires full ACS support on the device's upstream path:
>
> 	if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
> 		return -EINVAL;
>
> and, for such a PCI topology, the iommu core always allocates an
> exclusive iommu group.
>
> The only place that seems to be a little messy is:
>
> static int __iommu_set_group_pasid(struct iommu_domain *domain,
> 				   struct iommu_group *group, ioasid_t pasid)
> {
> 	struct group_device *device;
> 	int ret = 0;
>
> 	list_for_each_entry(device, &group->devices, list) {
> 		ret = domain->ops->set_dev_pasid(domain, device->dev,
> 						 pasid);
> 		if (ret)
> 			break;
> 	}
>
> 	return ret;
> }

Yes, this is exactly where my confusion comes from. I thought we had
agreed that PASID usage won't be supported on multi-device groups. Then
I saw the above code (but didn't catch the implication of the PCI ACS
check), which led me to think about the group mess.

Sticking to per-device pasid would certainly simplify the logic in
iommufd.
On Wed, Mar 08, 2023 at 07:38:38AM +0000, Tian, Kevin wrote:
> > Perhaps we need a check on singleton group?
>
> or move the xarray to device? Storing it in group while adding a
> singleton restriction only causes confusion.

Makes sense

Jason
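Concretely, the move Kevin suggests would look something like the
sketch below. The "before" side reflects the iommu core of this era,
where the pasid tracking xarray lives in the group; the dev_iommu
placement on the "after" side is hypothetical:

/* Today: pasid -> domain tracking is per-group (drivers/iommu/iommu.c) */
struct iommu_group {
	/* ... */
	struct xarray pasid_array;	/* pasid -> struct iommu_domain * */
};

/* Sketch: move it into the per-device dev_iommu instead, so the attach
 * path can act on a single device directly rather than walking
 * group->devices as __iommu_set_group_pasid() does above. */
struct dev_iommu {
	/* ... existing members ... */
	struct xarray pasid_array;	/* hypothetical new home */
};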