Message ID | 20240812205255.97781-3-smostafa@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix handling of S2 stalls | expand |
On 12/08/2024 9:52 pm, Mostafa Saleh wrote: > Previously, S2 stall was disabled and in case there was an event it > wouldn't be reported on the assumption that it's always pinned by VFIO. > > However, now since we can enable stall, devices that use S2 outside > VFIO should be able to report the stalls similar to S1. > > Also, to keep the old behaviour were S2 events from nested domains were > not reported as they are pinned (from VFIO) add a new flag to track this. > > Signed-off-by: Mostafa Saleh <smostafa@google.com> > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 +++++++++++++----- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++ > 2 files changed, 15 insertions(+), 5 deletions(-) > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > index 8d573d9ca93c..ffa865529d73 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > @@ -1733,6 +1733,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]); > struct iopf_fault fault_evt = { }; > struct iommu_fault *flt = &fault_evt.fault; > + struct arm_smmu_domain *smmu_domain; > > switch (FIELD_GET(EVTQ_0_ID, evt[0])) { > case EVT_ID_TRANSLATION_FAULT: > @@ -1744,10 +1745,6 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > return -EOPNOTSUPP; > } > > - /* Stage-2 is always pinned at the moment */ > - if (evt[1] & EVTQ_1_S2) > - return -EFAULT; > - > if (!(evt[1] & EVTQ_1_STALL)) > return -EOPNOTSUPP; > > @@ -1782,6 +1779,15 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > goto out_unlock; > } > > + /* It is guaranteed that smmu_domain exists as EVTQ_1_STALL is checked. */ > + smmu_domain = to_smmu_domain(iommu_get_domain_for_dev(master->dev)); > + > + /* nesting domain is always pinned at the moment */ > + if (smmu_domain->enable_nesting) { Ugh, has the whole enable_nesting method still not gone away already? However, at least for now, isn't this functionally equivalent to just testing !(smmu->features & ARM_SMMU_FEAT_TRANS_S1) anyway? We still won't be able to differentiate a nominally-pinned non-nested VFIO domain from a nominally-stallable non-VFIO domain on S2-only hardware. Thanks, Robin. > + ret = -EINVAL; > + goto out_unlock; > + } > + > iommu_report_device_fault(master->dev, &fault_evt); > out_unlock: > mutex_unlock(&smmu->streams_mutex); > @@ -3373,8 +3379,10 @@ static int arm_smmu_enable_nesting(struct iommu_domain *domain) > mutex_lock(&smmu_domain->init_mutex); > if (smmu_domain->smmu) > ret = -EPERM; > - else > + else { > smmu_domain->stage = ARM_SMMU_DOMAIN_S2; > + smmu_domain->enable_nesting = true; > + } > mutex_unlock(&smmu_domain->init_mutex); > > return ret; > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > index 0dc7ad43c64c..f66efeec2cf8 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > @@ -745,6 +745,8 @@ struct arm_smmu_domain { > spinlock_t devices_lock; > > struct mmu_notifier mmu_notifier; > + > + bool enable_nesting; > }; > > /* The following are exposed for testing purposes. */
Hi Robin, On Tue, Aug 13, 2024 at 12:57:02PM +0100, Robin Murphy wrote: > On 12/08/2024 9:52 pm, Mostafa Saleh wrote: > > Previously, S2 stall was disabled and in case there was an event it > > wouldn't be reported on the assumption that it's always pinned by VFIO. > > > > However, now since we can enable stall, devices that use S2 outside > > VFIO should be able to report the stalls similar to S1. > > > > Also, to keep the old behaviour were S2 events from nested domains were > > not reported as they are pinned (from VFIO) add a new flag to track this. > > > > Signed-off-by: Mostafa Saleh <smostafa@google.com> > > --- > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 +++++++++++++----- > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++ > > 2 files changed, 15 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > index 8d573d9ca93c..ffa865529d73 100644 > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > @@ -1733,6 +1733,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > > u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]); > > struct iopf_fault fault_evt = { }; > > struct iommu_fault *flt = &fault_evt.fault; > > + struct arm_smmu_domain *smmu_domain; > > switch (FIELD_GET(EVTQ_0_ID, evt[0])) { > > case EVT_ID_TRANSLATION_FAULT: > > @@ -1744,10 +1745,6 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > > return -EOPNOTSUPP; > > } > > - /* Stage-2 is always pinned at the moment */ > > - if (evt[1] & EVTQ_1_S2) > > - return -EFAULT; > > - > > if (!(evt[1] & EVTQ_1_STALL)) > > return -EOPNOTSUPP; > > @@ -1782,6 +1779,15 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > > goto out_unlock; > > } > > + /* It is guaranteed that smmu_domain exists as EVTQ_1_STALL is checked. */ > > + smmu_domain = to_smmu_domain(iommu_get_domain_for_dev(master->dev)); > > + > > + /* nesting domain is always pinned at the moment */ > > + if (smmu_domain->enable_nesting) { > > Ugh, has the whole enable_nesting method still not gone away already? It should go away with Jason latest nesting patches: https://lore.kernel.org/linux-iommu/0-v1-54e734311a7f+14f72-smmuv3_nesting_jgg@nvidia.com/T/#mc5558cb88b10fcdc4b91076cf813123cac75f11d I can just ignore that here, and only remove the S2 check, and as VFIO anyway doesn't register a handler it shouldn't really matter. So, it's one line change, I can also squash with the previous patch, please let me know what you think. Thanks, Mostafa > > However, at least for now, isn't this functionally equivalent to just > testing !(smmu->features & ARM_SMMU_FEAT_TRANS_S1) anyway? We still won't be > able to differentiate a nominally-pinned non-nested VFIO domain from a > nominally-stallable non-VFIO domain on S2-only hardware. > > Thanks, > Robin. > > > + ret = -EINVAL; > > + goto out_unlock; > > + } > > + > > iommu_report_device_fault(master->dev, &fault_evt); > > out_unlock: > > mutex_unlock(&smmu->streams_mutex); > > @@ -3373,8 +3379,10 @@ static int arm_smmu_enable_nesting(struct iommu_domain *domain) > > mutex_lock(&smmu_domain->init_mutex); > > if (smmu_domain->smmu) > > ret = -EPERM; > > - else > > + else { > > smmu_domain->stage = ARM_SMMU_DOMAIN_S2; > > + smmu_domain->enable_nesting = true; > > + } > > mutex_unlock(&smmu_domain->init_mutex); > > return ret; > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > > index 0dc7ad43c64c..f66efeec2cf8 100644 > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h > > @@ -745,6 +745,8 @@ struct arm_smmu_domain { > > spinlock_t devices_lock; > > struct mmu_notifier mmu_notifier; > > + > > + bool enable_nesting; > > }; > > /* The following are exposed for testing purposes. */
On Mon, Aug 12, 2024 at 08:52:55PM +0000, Mostafa Saleh wrote: > Previously, S2 stall was disabled and in case there was an event it > wouldn't be reported on the assumption that it's always pinned by VFIO. > > However, now since we can enable stall, devices that use S2 outside > VFIO should be able to report the stalls similar to S1. > > Also, to keep the old behaviour were S2 events from nested domains were > not reported as they are pinned (from VFIO) add a new flag to track this. I'm not entirely clear on every detail of this stall feature... But from a core perspective device fault reporting should only ever be turned on in the STE/CD if the attached domain->iopf_handler is not NULL. If it is NULL then any access to a non-present address should trigger some kind of device error failure automatically. This is new core functionality since this code would have been originally written. Now it is all handled transparently by the core code. The driver should just deliver all fault events to iommu_report_device_fault() and it will sort it out. > Signed-off-by: Mostafa Saleh <smostafa@google.com> > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 +++++++++++++----- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++ > 2 files changed, 15 insertions(+), 5 deletions(-) > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > index 8d573d9ca93c..ffa865529d73 100644 > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > @@ -1733,6 +1733,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]); > struct iopf_fault fault_evt = { }; > struct iommu_fault *flt = &fault_evt.fault; > + struct arm_smmu_domain *smmu_domain; > > switch (FIELD_GET(EVTQ_0_ID, evt[0])) { > case EVT_ID_TRANSLATION_FAULT: > @@ -1744,10 +1745,6 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > return -EOPNOTSUPP; > } > > - /* Stage-2 is always pinned at the moment */ > - if (evt[1] & EVTQ_1_S2) > - return -EFAULT; > - This makes sense at first blush since the domain mode shouldn't define if events should be processed or not, and the events should be failed anyhow right? If someone did turn on fault reporting in the STE then it should always be processed to conclusion. > if (!(evt[1] & EVTQ_1_STALL)) > return -EOPNOTSUPP; > > @@ -1782,6 +1779,15 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > goto out_unlock; > } > > + /* It is guaranteed that smmu_domain exists as EVTQ_1_STALL is checked. */ > + smmu_domain = to_smmu_domain(iommu_get_domain_for_dev(master->dev)); Strongly discouraging drivers from calling iommu_get_domain_for_dev() in async paths like this. The locking is tricky and the core code does... > + /* nesting domain is always pinned at the moment */ > + if (smmu_domain->enable_nesting) { This is not necessary - a nesting domain will never have an iopf_handler set. It immediately calls iommu_report_device_fault() which will reject it because of: if (!group->attach_handle->domain->iopf_handler) goto err_abort; Which after the rework will end up in find_fault_handler() at the top of the function: https://lore.kernel.org/r/ZrTNGepJXbmfuKBK@google.com So I think these parts are not necessary. Though arguably we should be rejecting domains with iopf_handler set in some of the attach calls.. Jason
On Tue, Aug 13, 2024 at 02:51:55PM -0300, Jason Gunthorpe wrote: > On Mon, Aug 12, 2024 at 08:52:55PM +0000, Mostafa Saleh wrote: > > Previously, S2 stall was disabled and in case there was an event it > > wouldn't be reported on the assumption that it's always pinned by VFIO. > > > > However, now since we can enable stall, devices that use S2 outside > > VFIO should be able to report the stalls similar to S1. > > > > Also, to keep the old behaviour were S2 events from nested domains were > > not reported as they are pinned (from VFIO) add a new flag to track this. > > I'm not entirely clear on every detail of this stall feature... > > But from a core perspective device fault reporting should only ever be > turned on in the STE/CD if the attached domain->iopf_handler is not NULL. > > If it is NULL then any access to a non-present address should trigger > some kind of device error failure automatically. > > This is new core functionality since this code would have been > originally written. Now it is all handled transparently by the core > code. The driver should just deliver all fault events to > iommu_report_device_fault() and it will sort it out. > I agree, as there is no iopf handler in this case, we should just report it to the iommu code and it will reject it instead of tracking this in the driver. And as all the “enable_nesting” stuff is going away soon anyway it’s not worth adding extra code for it. (plus we shouldn’t really assume the intention of caller) > > Signed-off-by: Mostafa Saleh <smostafa@google.com> > > --- > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 +++++++++++++----- > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++ > > 2 files changed, 15 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > index 8d573d9ca93c..ffa865529d73 100644 > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > > @@ -1733,6 +1733,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > > u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]); > > struct iopf_fault fault_evt = { }; > > struct iommu_fault *flt = &fault_evt.fault; > > + struct arm_smmu_domain *smmu_domain; > > > > switch (FIELD_GET(EVTQ_0_ID, evt[0])) { > > case EVT_ID_TRANSLATION_FAULT: > > @@ -1744,10 +1745,6 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > > return -EOPNOTSUPP; > > } > > > > - /* Stage-2 is always pinned at the moment */ > > - if (evt[1] & EVTQ_1_S2) > > - return -EFAULT; > > - > > This makes sense at first blush since the domain mode shouldn't define > if events should be processed or not, and the events should be failed > anyhow right? If someone did turn on fault reporting in the STE then > it should always be processed to conclusion. > > > if (!(evt[1] & EVTQ_1_STALL)) > > return -EOPNOTSUPP; > > > > @@ -1782,6 +1779,15 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) > > goto out_unlock; > > } > > > > + /* It is guaranteed that smmu_domain exists as EVTQ_1_STALL is checked. */ > > + smmu_domain = to_smmu_domain(iommu_get_domain_for_dev(master->dev)); > > Strongly discouraging drivers from calling iommu_get_domain_for_dev() > in async paths like this. The locking is tricky and the core code does... > > > + /* nesting domain is always pinned at the moment */ > > + if (smmu_domain->enable_nesting) { > > This is not necessary - a nesting domain will never have an > iopf_handler set. > > It immediately calls iommu_report_device_fault() which will reject it > because of: > > if (!group->attach_handle->domain->iopf_handler) > goto err_abort; > > Which after the rework will end up in find_fault_handler() at the top > of the function: > > https://lore.kernel.org/r/ZrTNGepJXbmfuKBK@google.com > > So I think these parts are not necessary. Yes, I will remove this patch and squash the removal of ignoring S2 events. Thanks, Mostafa > > Though arguably we should be rejecting domains with iopf_handler set > in some of the attach calls.. > > Jason
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 8d573d9ca93c..ffa865529d73 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -1733,6 +1733,7 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]); struct iopf_fault fault_evt = { }; struct iommu_fault *flt = &fault_evt.fault; + struct arm_smmu_domain *smmu_domain; switch (FIELD_GET(EVTQ_0_ID, evt[0])) { case EVT_ID_TRANSLATION_FAULT: @@ -1744,10 +1745,6 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) return -EOPNOTSUPP; } - /* Stage-2 is always pinned at the moment */ - if (evt[1] & EVTQ_1_S2) - return -EFAULT; - if (!(evt[1] & EVTQ_1_STALL)) return -EOPNOTSUPP; @@ -1782,6 +1779,15 @@ static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) goto out_unlock; } + /* It is guaranteed that smmu_domain exists as EVTQ_1_STALL is checked. */ + smmu_domain = to_smmu_domain(iommu_get_domain_for_dev(master->dev)); + + /* nesting domain is always pinned at the moment */ + if (smmu_domain->enable_nesting) { + ret = -EINVAL; + goto out_unlock; + } + iommu_report_device_fault(master->dev, &fault_evt); out_unlock: mutex_unlock(&smmu->streams_mutex); @@ -3373,8 +3379,10 @@ static int arm_smmu_enable_nesting(struct iommu_domain *domain) mutex_lock(&smmu_domain->init_mutex); if (smmu_domain->smmu) ret = -EPERM; - else + else { smmu_domain->stage = ARM_SMMU_DOMAIN_S2; + smmu_domain->enable_nesting = true; + } mutex_unlock(&smmu_domain->init_mutex); return ret; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 0dc7ad43c64c..f66efeec2cf8 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -745,6 +745,8 @@ struct arm_smmu_domain { spinlock_t devices_lock; struct mmu_notifier mmu_notifier; + + bool enable_nesting; }; /* The following are exposed for testing purposes. */
Previously, S2 stall was disabled and in case there was an event it wouldn't be reported on the assumption that it's always pinned by VFIO. However, now since we can enable stall, devices that use S2 outside VFIO should be able to report the stalls similar to S1. Also, to keep the old behaviour were S2 events from nested domains were not reported as they are pinned (from VFIO) add a new flag to track this. Signed-off-by: Mostafa Saleh <smostafa@google.com> --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 18 +++++++++++++----- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 ++ 2 files changed, 15 insertions(+), 5 deletions(-)