[v13,07/13] iommu/vt-d: Add SVA domain support

Message ID 20220906124458.46461-8-baolu.lu@linux.intel.com (mailing list archive)
State Superseded
Series iommu: SVA and IOPF refactoring

Commit Message

Baolu Lu Sept. 6, 2022, 12:44 p.m. UTC
Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops. This implementation is based on the existing SVA
code. Possible cleanup and refactoring are left for incremental
changes later.

The VT-d driver will also need to support attaching a DMA domain to a
PASID of a device. The current SVA implementation uses different data
structures to track the domain and device PASID relationship, which is
why we need to check the domain type in the remove_dev_pasid callback.
Eventually we'll consolidate the data structures and remove the need
for the domain type check.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Tony Zhu <tony.zhu@intel.com>
---
 drivers/iommu/intel/iommu.h | 10 ++++++++
 drivers/iommu/intel/iommu.c | 25 ++++++++++++++++++++
 drivers/iommu/intel/svm.c   | 47 +++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)
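
For context, here is a minimal sketch of the consumer path that ends up
exercising these callbacks, assuming the two-argument
iommu_sva_bind_device() signature this series converges on; the
example_* names are placeholders, not part of this patch:

#include <linux/iommu.h>

/*
 * Illustrative only: binding an mm to a device PASID through the core
 * SVA API. The core allocates an SVA domain via
 * domain_alloc(IOMMU_DOMAIN_SVA), i.e. intel_svm_domain_alloc(), and
 * attaches it to a PASID via intel_svm_set_dev_pasid().
 */
static int example_enable_sva(struct device *dev, struct mm_struct *mm)
{
	struct iommu_sva *handle;
	u32 pasid;

	handle = iommu_sva_bind_device(dev, mm);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	pasid = iommu_sva_get_pasid(handle);
	/* ... submit device work tagged with pasid ... */

	/* Unbind reaches intel_iommu_remove_dev_pasid(), which tears
	 * down the PASID table entry. */
	iommu_sva_unbind_device(handle);
	return 0;
}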

Comments

Jason Gunthorpe Sept. 22, 2022, 3:49 p.m. UTC | #1
On Tue, Sep 06, 2022 at 08:44:52PM +0800, Lu Baolu wrote:
> +static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
> +{
> +	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> +	struct iommu_domain *domain;
> +
> +	/* Domain type specific cleanup: */
> +	domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
> +	if (domain) {
> +		switch (domain->type) {
> +		case IOMMU_DOMAIN_SVA:
> +			intel_svm_remove_dev_pasid(dev, pasid);
> +			break;
> +		default:
> +			/* should never reach here */
> +			WARN_ON(1);
> +			break;

This is eventually going to need a lot more cleaning up to split out
the PASID from the SVM stuff.

SVA should *only* be a set of predefined handlers (in the core code!)
for the generic PRI mechanism; it shouldn't be entangled this deeply
with PASID or the drivers.

When we get done with this, the flow should have the core code attach
an SVA domain to a PASID with PRI enabled, and the core code should
supply a generic PRI implementation that does the mmu_notifier
stuff.
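
A rough sketch of what that core-owned flow could look like;
iommu_sva_domain_alloc() and iommu_attach_device_pasid() are real
helpers from this series, while the notifier helpers are hypothetical
stand-ins for the core PRI plumbing:

static int core_sva_attach(struct device *dev, struct mm_struct *mm,
			   ioasid_t pasid)
{
	struct iommu_domain *domain;
	int ret;

	domain = iommu_sva_domain_alloc(dev, mm);
	if (!domain)
		return -ENOMEM;

	/* Hypothetical: one generic mmu_notifier per SVA domain, owned
	 * by the core, doing the invalidation and release handling. */
	ret = iommu_sva_register_notifier(domain, mm);
	if (ret)
		goto err_free;

	/* Drivers only see a plain PASID attach; PRI faults route to a
	 * generic handler in the core. */
	ret = iommu_attach_device_pasid(domain, dev, pasid);
	if (ret)
		goto err_unregister;

	return 0;

err_unregister:
	iommu_sva_unregister_notifier(domain);	/* hypothetical */
err_free:
	iommu_domain_free(domain);
	return ret;
}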

Also, stuff like this:

				/* We mandate that no page faults may be outstanding
				 * for the PASID when intel_svm_unbind_mm() is called.
				 * If that is not obeyed, subtle errors will happen.
				 * Let's make them less subtle... */

Are going to be problematic for VFIO as well. In a VFIO world the
entire RID and its entire PASID table have to be owned by VFIO and
never shared - so these sequencing issues should be solvable.

But this is all for further series..

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
Baolu Lu Sept. 23, 2022, 2:21 a.m. UTC | #2
On 2022/9/22 23:49, Jason Gunthorpe wrote:
> On Tue, Sep 06, 2022 at 08:44:52PM +0800, Lu Baolu wrote:
>> +static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
>> +{
>> +	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
>> +	struct iommu_domain *domain;
>> +
>> +	/* Domain type specific cleanup: */
>> +	domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
>> +	if (domain) {
>> +		switch (domain->type) {
>> +		case IOMMU_DOMAIN_SVA:
>> +			intel_svm_remove_dev_pasid(dev, pasid);
>> +			break;
>> +		default:
>> +			/* should never reach here */
>> +			WARN_ON(1);
>> +			break;
> 
> This is eventually going to need a lot more cleaning up to split out
> the PASID from the SVM stuff.
> 
> SVA should *only* be a set of predefined handlers (in the core code!)
> for the generic PRI mechanism; it shouldn't be entangled this deeply
> with PASID or the drivers.
> 
> When we get done with this, the flow should have the core code attach
> an SVA domain to a PASID with PRI enabled, and the core code should
> supply a generic PRI implementation that does the mmu_notifier
> stuff.

Yes. Agreed.

At the beginning of this project, I wanted to consolidate the mm
notifications into the core. However, ARM SMMUv3 and Intel handle the mm
notifications a little differently. Then I decided to do this work
separately from the current series.

> 
> Also, stuff like this:
> 
> 				/* We mandate that no page faults may be outstanding
> 				 * for the PASID when intel_svm_unbind_mm() is called.
> 				 * If that is not obeyed, subtle errors will happen.
> 				 * Let's make them less subtle... */
> 
> Are going to be problematic for VFIO as well. In a VFIO world the
> entire RID and its entire PASID table have to be owned by VFIO and
> never shared - so these sequencing issues should be solvable.
> 
> But this is all for further series..

Yes. All of these sound like interesting future series.

> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

I very much appreciate your time.

Best regards,
baolu
Jason Gunthorpe Sept. 23, 2022, 12:15 p.m. UTC | #3
On Fri, Sep 23, 2022 at 10:21:51AM +0800, Baolu Lu wrote:

> At the beginning of this project, I wanted to consolidate the mm
> notifications into the core. However, ARM SMMUv3 and Intel handle the mm
> notifications a little differently. Then I decided to do this work
> separately from the current series.

It doesn't look really different..

The SVA iommu domain needs two new ops triggered by the notifier:

 - 'stop using the mm, substitute a blocking domain' tied to release
 - Cache invalidate a range, maybe this is just iotlb_sync_map()

And we could even think about the first as the core code literally
attaching a dummy blocking domain and releasing the SVA domain. There
is no reason we need to have every driver do this tricky bit of
lifecycle management.
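
Sketched as code, those two ops might look like this; neither exists in
this series and the names are invented:

/* Hypothetical domain ops, invoked from a core-owned mmu_notifier. */
struct iommu_sva_domain_ops_sketch {
	/* Tied to the notifier's release(): the core substitutes a
	 * blocking domain for the dying mm, then tells the driver to
	 * stop walking its page tables. */
	void (*mm_release)(struct iommu_domain *domain,
			   struct device *dev, ioasid_t pasid);

	/* Range invalidation driven from invalidate_range(); possibly
	 * the same operation as iotlb_sync_map(). */
	void (*mm_invalidate)(struct iommu_domain *domain,
			      unsigned long start, unsigned long end);
};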

Jason
Baolu Lu Sept. 23, 2022, 12:41 p.m. UTC | #4
On 2022/9/23 20:15, Jason Gunthorpe wrote:
> On Fri, Sep 23, 2022 at 10:21:51AM +0800, Baolu Lu wrote:
> 
>> At the beginning of this project, I wanted to consolidate the mm
>> notifications into the core. However, ARM SMMUv3 and Intel handle the mm
>> notifications a little differently. Then I decided to do this work
>> separately from the current series.
> 
> It doesn't look really different..

They are essentially the same, but slightly different. For example, ARM
SMMUv3 provides .free_notifier, and I don't think it could be merged
into the release callback.

> 
> The SVA iommu domain needs two new ops triggered by the notifier:
> 
>   - 'stop using the mm, substitute a blocking domain' tied to release
>   - Cache invalidate a range, maybe this is just iotlb_sync_map()
> 
> And we could even think about the first as the core code literally
> attaching a dummy blocking domain and releasing the SVA domain. There
> is no reason we need to have every driver do this tricky bit of
> lifecycle management.

Yes, I have similar ideas. We can discuss it further with real code
later.

Best regards,
baolu
Jason Gunthorpe Sept. 23, 2022, 1:07 p.m. UTC | #5
On Fri, Sep 23, 2022 at 08:41:56PM +0800, Baolu Lu wrote:
> On 2022/9/23 20:15, Jason Gunthorpe wrote:
> > On Fri, Sep 23, 2022 at 10:21:51AM +0800, Baolu Lu wrote:
> > 
> > > At the beginning of this project, I wanted to consolidate the mm
> > > notifications into the core. However, ARM SMMUv3 and Intel handle the mm
> > > notifications a little differently. Then I decided to do this work
> > > separately from the current series.
> > 
> > It doesn't look really different..
> 
> They are essentially the same, but slightly different. For example, ARM
> SMMUv3 provides .free_notifier, and I don't think it could be merged
> into the release callback.

free_notifier allows using mmu_notifier_put() instead of
mmu_notifier_unregister(), which avoids a synchronize_rcu() penalty on
teardown. Intel should copy the same design.
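
A minimal sketch of that pattern, modeled on the ARM SMMUv3 SVA code;
the example_* wrapper is a placeholder, not existing Intel code:

#include <linux/mmu_notifier.h>
#include <linux/slab.h>

struct example_svm_notifier {
	struct mmu_notifier mn;
	/* driver-private per-mm state ... */
};

/* Runs from an RCU callback after the last mmu_notifier_put(). */
static void example_free_notifier(struct mmu_notifier *mn)
{
	kfree(container_of(mn, struct example_svm_notifier, mn));
}

static const struct mmu_notifier_ops example_mn_ops = {
	/* .release, .invalidate_range, ... */
	.free_notifier	= example_free_notifier,
};

static void example_teardown(struct example_svm_notifier *smn)
{
	/* mmu_notifier_unregister() can pay a synchronize_rcu() on
	 * every teardown; mmu_notifier_put() just drops the reference
	 * and lets ->free_notifier() run from RCU callback context. */
	mmu_notifier_put(&smn->mn);
}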

Jason
Baolu Lu Sept. 23, 2022, 1:12 p.m. UTC | #6
On 2022/9/23 21:07, Jason Gunthorpe wrote:
> On Fri, Sep 23, 2022 at 08:41:56PM +0800, Baolu Lu wrote:
>> On 2022/9/23 20:15, Jason Gunthorpe wrote:
>>> On Fri, Sep 23, 2022 at 10:21:51AM +0800, Baolu Lu wrote:
>>>
>>>> At the beginning of this project, I wanted to consolidate the mm
>>>> notifications into the core. However, ARM SMMUv3 and Intel handle the mm
>>>> notifications a little differently. Then I decided to do this work
>>>> separately from the current series.
>>> It doesn't look really different..
>> They are essentially the same, but slightly different. For example, ARM
>> SMMUv3 provides .free_notifier, and I don't think it could be merged
>> into the release callback.
> free_notifier allows using mmu_notifier_put() instead of
> mmu_notifier_unregister(), which avoids a synchronize_rcu() penalty on
> teardown. Intel should copy the same design.

Ah! Thanks for the guidance. I will head in this direction.

Best regards,
baolu

Patch

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 91312aabdd53..8310bc2f5f0d 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -751,6 +751,8 @@  void intel_svm_unbind(struct iommu_sva *handle);
 u32 intel_svm_get_pasid(struct iommu_sva *handle);
 int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
 			    struct iommu_page_response *msg);
+struct iommu_domain *intel_svm_domain_alloc(void);
+void intel_svm_remove_dev_pasid(struct device *dev, ioasid_t pasid);
 
 struct intel_svm_dev {
 	struct list_head list;
@@ -776,6 +778,14 @@  struct intel_svm {
 };
 #else
 static inline void intel_svm_check(struct intel_iommu *iommu) {}
+static inline struct iommu_domain *intel_svm_domain_alloc(void)
+{
+	return NULL;
+}
+
+static inline void intel_svm_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+}
 #endif
 
 #ifdef CONFIG_INTEL_IOMMU_DEBUGFS
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 64d30895a4c8..bca38dd79569 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4156,6 +4156,8 @@  static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 		return domain;
 	case IOMMU_DOMAIN_IDENTITY:
 		return &si_domain->domain;
+	case IOMMU_DOMAIN_SVA:
+		return intel_svm_domain_alloc();
 	default:
 		return NULL;
 	}
@@ -4746,6 +4748,28 @@  static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
 		__mapping_notify_one(info->iommu, dmar_domain, pfn, pages);
 }
 
+static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+	struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
+	struct iommu_domain *domain;
+
+	/* Domain type specific cleanup: */
+	domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
+	if (domain) {
+		switch (domain->type) {
+		case IOMMU_DOMAIN_SVA:
+			intel_svm_remove_dev_pasid(dev, pasid);
+			break;
+		default:
+			/* should never reach here */
+			WARN_ON(1);
+			break;
+		}
+	}
+
+	intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
@@ -4758,6 +4782,7 @@  const struct iommu_ops intel_iommu_ops = {
 	.dev_disable_feat	= intel_iommu_dev_disable_feat,
 	.is_attach_deferred	= intel_iommu_is_attach_deferred,
 	.def_domain_type	= device_def_domain_type,
+	.remove_dev_pasid	= intel_iommu_remove_dev_pasid,
 	.pgsize_bitmap		= SZ_4K,
 #ifdef CONFIG_INTEL_IOMMU_SVM
 	.sva_bind		= intel_svm_bind,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 2420fa5c2360..8fe7cff78356 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -928,3 +928,50 @@  int intel_svm_page_response(struct device *dev,
 	mutex_unlock(&pasid_mutex);
 	return ret;
 }
+
+void intel_svm_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+	mutex_lock(&pasid_mutex);
+	intel_svm_unbind_mm(dev, pasid);
+	mutex_unlock(&pasid_mutex);
+}
+
+static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
+				   struct device *dev, ioasid_t pasid)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct mm_struct *mm = domain->mm;
+	struct iommu_sva *sva;
+	int ret = 0;
+
+	mutex_lock(&pasid_mutex);
+	sva = intel_svm_bind_mm(iommu, dev, mm);
+	if (IS_ERR(sva))
+		ret = PTR_ERR(sva);
+	mutex_unlock(&pasid_mutex);
+
+	return ret;
+}
+
+static void intel_svm_domain_free(struct iommu_domain *domain)
+{
+	kfree(to_dmar_domain(domain));
+}
+
+static const struct iommu_domain_ops intel_svm_domain_ops = {
+	.set_dev_pasid		= intel_svm_set_dev_pasid,
+	.free			= intel_svm_domain_free
+};
+
+struct iommu_domain *intel_svm_domain_alloc(void)
+{
+	struct dmar_domain *domain;
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+	if (!domain)
+		return NULL;
+	domain->domain.ops = &intel_svm_domain_ops;
+
+	return &domain->domain;
+}