Message ID | 20230306163138.587484-9-fenghua.yu@intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Enable DSA 2.0 Event Log and completion record faulting features | expand |
On 3/7/23 12:31 AM, Fenghua Yu wrote: > Define and export iommu_access_remote_vm() to allow IOMMU related > drivers to access user address space by PASID. > > The IDXD driver would like to use it to write the user's completion > record that the hardware device is not able to write to due to user > page fault. I don't quite follow here. Isn't I/O page fault already supported? Best regards, baolu
Hi Fenghua, On Mon, Mar 06, 2023 at 08:31:30AM -0800, Fenghua Yu wrote: > Define and export iommu_access_remote_vm() to allow IOMMU related > drivers to access user address space by PASID. > > The IDXD driver would like to use it to write the user's completion > record that the hardware device is not able to write to due to user > page fault. > > Without the API, it's complex for IDXD driver to copy completion record > to a process' fault address for two reasons: > 1. access_remote_vm() is not exported and shouldn't be exported for > drivers because drivers may easily cause mm reference issue. > 2. user frees fault address pages to trigger fault by IDXD device. > > The driver has to call iommu_sva_find(), kthread_use_mm(), re-implement > majority of access_remote_vm() etc to access remote vm. > > This IOMMU specific API hides these details and provides a clean interface > for idxd driver and potentially other IOMMU related drivers. > > Suggested-by: Alistair Popple <apopple@nvidia.com> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> > Cc: Joerg Roedel <joro@8bytes.org> > Cc: Will Deacon <will@kernel.org> > Cc: Robin Murphy <robin.murphy@arm.com> > Cc: Alistair Popple <apopple@nvidia.com> > Cc: Lorenzo Stoakes <lstoakes@gmail.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: iommu@lists.linux.dev > --- > v2: > - Define and export iommu_access_remote_vm() for IDXD driver to write > completion record to user address space. This change removes > patch 8 and 9 in v1 (Alistair Popple) > > drivers/iommu/iommu-sva.c | 35 +++++++++++++++++++++++++++++++++++ > include/linux/iommu.h | 9 +++++++++ > 2 files changed, 44 insertions(+) > > diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c > index 24bf9b2b58aa..1d7a0aee58f7 100644 > --- a/drivers/iommu/iommu-sva.c > +++ b/drivers/iommu/iommu-sva.c > @@ -71,6 +71,41 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid) > } > EXPORT_SYMBOL_GPL(iommu_sva_find); > > +/** > + * iommu_access_remote_vm - access another process' address space by PASID > + * @pasid: Process Address Space ID assigned to the mm > + * @addr: start address to access > + * @buf: source or destination buffer > + * @len: number of bytes to transfer > + * @gup_flags: flags modifying lookup behaviour > + * > + * Another process' address space is found by PASID. A reference on @mm > + * is taken and released inside the function. > + * > + * Return: number of bytes copied from source to destination. > + */ > +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void *buf, > + int len, unsigned int gup_flags) > +{ > + struct mm_struct *mm; > + int copied; > + > + mm = iommu_sva_find(pasid); The ability to find a mm by PASID is being removed, see https://lore.kernel.org/linux-iommu/20230301235646.2692846-4-jacob.jun.pan@linux.intel.com/ Thanks, Jean > + if (IS_ERR_OR_NULL(mm)) > + return 0; > + > + /* > + * A reference on @mm has been held by mmget_not_zero() > + * during iommu_sva_find(). > + */ > + copied = access_remote_vm(mm, addr, buf, len, gup_flags); > + /* The reference is released. */ > + mmput(mm); > + > + return copied; > +} > +EXPORT_SYMBOL_GPL(iommu_access_remote_vm); > + > /** > * iommu_sva_bind_device() - Bind a process address space to a device > * @dev: the device > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 6595454d4f48..414a46a53799 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -1177,6 +1177,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, > struct mm_struct *mm); > void iommu_sva_unbind_device(struct iommu_sva *handle); > u32 iommu_sva_get_pasid(struct iommu_sva *handle); > +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void *buf, > + int len, unsigned int gup_flags); > #else > static inline struct iommu_sva * > iommu_sva_bind_device(struct device *dev, struct mm_struct *mm) > @@ -1192,6 +1194,13 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle) > { > return IOMMU_PASID_INVALID; > } > + > +static inline int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, > + void *buf, int len, > + unsigned int gup_flags) > +{ > + return 0; > +} > #endif /* CONFIG_IOMMU_SVA */ > > #endif /* __LINUX_IOMMU_H */ > -- > 2.37.1 > >
Hi, Jean, On 3/7/23 00:40, Jean-Philippe Brucker wrote: > Hi Fenghua, > > On Mon, Mar 06, 2023 at 08:31:30AM -0800, Fenghua Yu wrote: >> Define and export iommu_access_remote_vm() to allow IOMMU related >> drivers to access user address space by PASID. >> >> The IDXD driver would like to use it to write the user's completion >> record that the hardware device is not able to write to due to user >> page fault. >> >> Without the API, it's complex for IDXD driver to copy completion record >> to a process' fault address for two reasons: >> 1. access_remote_vm() is not exported and shouldn't be exported for >> drivers because drivers may easily cause mm reference issue. >> 2. user frees fault address pages to trigger fault by IDXD device. >> >> The driver has to call iommu_sva_find(), kthread_use_mm(), re-implement >> majority of access_remote_vm() etc to access remote vm. >> >> This IOMMU specific API hides these details and provides a clean interface >> for idxd driver and potentially other IOMMU related drivers. >> >> Suggested-by: Alistair Popple <apopple@nvidia.com> >> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> >> Cc: Joerg Roedel <joro@8bytes.org> >> Cc: Will Deacon <will@kernel.org> >> Cc: Robin Murphy <robin.murphy@arm.com> >> Cc: Alistair Popple <apopple@nvidia.com> >> Cc: Lorenzo Stoakes <lstoakes@gmail.com> >> Cc: Christoph Hellwig <hch@infradead.org> >> Cc: iommu@lists.linux.dev >> --- >> v2: >> - Define and export iommu_access_remote_vm() for IDXD driver to write >> completion record to user address space. This change removes >> patch 8 and 9 in v1 (Alistair Popple) >> >> drivers/iommu/iommu-sva.c | 35 +++++++++++++++++++++++++++++++++++ >> include/linux/iommu.h | 9 +++++++++ >> 2 files changed, 44 insertions(+) >> >> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c >> index 24bf9b2b58aa..1d7a0aee58f7 100644 >> --- a/drivers/iommu/iommu-sva.c >> +++ b/drivers/iommu/iommu-sva.c >> @@ -71,6 +71,41 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid) >> } >> EXPORT_SYMBOL_GPL(iommu_sva_find); >> >> +/** >> + * iommu_access_remote_vm - access another process' address space by PASID >> + * @pasid: Process Address Space ID assigned to the mm >> + * @addr: start address to access >> + * @buf: source or destination buffer >> + * @len: number of bytes to transfer >> + * @gup_flags: flags modifying lookup behaviour >> + * >> + * Another process' address space is found by PASID. A reference on @mm >> + * is taken and released inside the function. >> + * >> + * Return: number of bytes copied from source to destination. >> + */ >> +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void *buf, >> + int len, unsigned int gup_flags) >> +{ >> + struct mm_struct *mm; >> + int copied; >> + >> + mm = iommu_sva_find(pasid); > > The ability to find a mm by PASID is being removed, see > https://lore.kernel.org/linux-iommu/20230301235646.2692846-4-jacob.jun.pan@linux.intel.com/ > Thank you very much for pointing out this. I talked to Jacob just now. He will keep iommu_sva_find() function in his next version because this patch is still using the function. He agrees that I can still call iommu_sva_find() in this patch. -Fenghua
Hi, Baolu, On 3/6/23 17:41, Baolu Lu wrote: > On 3/7/23 12:31 AM, Fenghua Yu wrote: >> Define and export iommu_access_remote_vm() to allow IOMMU related >> drivers to access user address space by PASID. >> >> The IDXD driver would like to use it to write the user's completion >> record that the hardware device is not able to write to due to user >> page fault. > > I don't quite follow here. Isn't I/O page fault already supported? The following patch 9 in this series explains in details why IDXD device cannot use page fault to write to user memory: https://lore.kernel.org/dmaengine/20230306163138.587484-10-fenghua.yu@intel.com/ "DSA supports page fault handling through PRS. However, the DMA engine that's processing the descriptor is blocked until the PRS response is received. Other workqueues sharing the engine are also blocked. Page fault handing by the driver with PRS disabled can be used to mitigate the stalling. With PRS disabled while ATS remain enabled, DSA handles page faults on a completion record by reporting an event in the event log. In this instance, the descriptor is completed and the event log contains the completion record address and the contents of the completion record." That's why IDXD driver needs this IOMMU's helper iommu_access_remote_vm() to copy the completion record from event log buffer to user space. Thanks. -Fenghua
On 3/8/23 1:55 AM, Fenghua Yu wrote: > Hi, Baolu, > > On 3/6/23 17:41, Baolu Lu wrote: >> On 3/7/23 12:31 AM, Fenghua Yu wrote: >>> Define and export iommu_access_remote_vm() to allow IOMMU related >>> drivers to access user address space by PASID. >>> >>> The IDXD driver would like to use it to write the user's completion >>> record that the hardware device is not able to write to due to user >>> page fault. >> >> I don't quite follow here. Isn't I/O page fault already supported? > > The following patch 9 in this series explains in details why IDXD device > cannot use page fault to write to user memory: > https://lore.kernel.org/dmaengine/20230306163138.587484-10-fenghua.yu@intel.com/ > > "DSA supports page fault handling through PRS. However, the DMA engine > that's processing the descriptor is blocked until the PRS response is > received. Other workqueues sharing the engine are also blocked. > Page fault handing by the driver with PRS disabled can be used to > mitigate the stalling. Ah! Get your point now. Thanks for the explanation. > > With PRS disabled while ATS remain enabled, DSA handles page faults on > a completion record by reporting an event in the event log. In this > instance, the descriptor is completed and the event log contains the > completion record address and the contents of the completion record." > > That's why IDXD driver needs this IOMMU's helper > iommu_access_remote_vm() to copy the completion record from event log > buffer to user space. > > Thanks. > > -Fenghua Best regards, baolu
Hi, Jean and Jacob, On 3/7/23 08:33, Fenghua Yu wrote: > Hi, Jean, > > On 3/7/23 00:40, Jean-Philippe Brucker wrote: >> Hi Fenghua, >> >> On Mon, Mar 06, 2023 at 08:31:30AM -0800, Fenghua Yu wrote: >>> Define and export iommu_access_remote_vm() to allow IOMMU related >>> drivers to access user address space by PASID. >>> >>> The IDXD driver would like to use it to write the user's completion >>> record that the hardware device is not able to write to due to user >>> page fault. >>> >>> Without the API, it's complex for IDXD driver to copy completion record >>> to a process' fault address for two reasons: >>> 1. access_remote_vm() is not exported and shouldn't be exported for >>> drivers because drivers may easily cause mm reference issue. >>> 2. user frees fault address pages to trigger fault by IDXD device. >>> >>> The driver has to call iommu_sva_find(), kthread_use_mm(), re-implement >>> majority of access_remote_vm() etc to access remote vm. >>> >>> This IOMMU specific API hides these details and provides a clean >>> interface >>> for idxd driver and potentially other IOMMU related drivers. >>> >>> Suggested-by: Alistair Popple <apopple@nvidia.com> >>> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> >>> Cc: Joerg Roedel <joro@8bytes.org> >>> Cc: Will Deacon <will@kernel.org> >>> Cc: Robin Murphy <robin.murphy@arm.com> >>> Cc: Alistair Popple <apopple@nvidia.com> >>> Cc: Lorenzo Stoakes <lstoakes@gmail.com> >>> Cc: Christoph Hellwig <hch@infradead.org> >>> Cc: iommu@lists.linux.dev >>> --- >>> v2: >>> - Define and export iommu_access_remote_vm() for IDXD driver to write >>> completion record to user address space. This change removes >>> patch 8 and 9 in v1 (Alistair Popple) >>> >>> drivers/iommu/iommu-sva.c | 35 +++++++++++++++++++++++++++++++++++ >>> include/linux/iommu.h | 9 +++++++++ >>> 2 files changed, 44 insertions(+) >>> >>> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c >>> index 24bf9b2b58aa..1d7a0aee58f7 100644 >>> --- a/drivers/iommu/iommu-sva.c >>> +++ b/drivers/iommu/iommu-sva.c >>> @@ -71,6 +71,41 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid) >>> } >>> EXPORT_SYMBOL_GPL(iommu_sva_find); >>> +/** >>> + * iommu_access_remote_vm - access another process' address space by >>> PASID >>> + * @pasid: Process Address Space ID assigned to the mm >>> + * @addr: start address to access >>> + * @buf: source or destination buffer >>> + * @len: number of bytes to transfer >>> + * @gup_flags: flags modifying lookup behaviour >>> + * >>> + * Another process' address space is found by PASID. A reference on @mm >>> + * is taken and released inside the function. >>> + * >>> + * Return: number of bytes copied from source to destination. >>> + */ >>> +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void >>> *buf, >>> + int len, unsigned int gup_flags) >>> +{ >>> + struct mm_struct *mm; >>> + int copied; >>> + >>> + mm = iommu_sva_find(pasid); >> >> The ability to find a mm by PASID is being removed, see >> https://lore.kernel.org/linux-iommu/20230301235646.2692846-4-jacob.jun.pan@linux.intel.com/ >> >> > > Thank you very much for pointing out this. > > I talked to Jacob just now. He will keep iommu_sva_find() function > in his next version because this patch is still using the function. He > agrees that I can still call iommu_sva_find() in this patch. Further comment from Jason confirms that iommu_sva_find() will be removed (https://lore.kernel.org/lkml/ZAjSsm4%2FPDRqViwa@nvidia.com/). So cannot call iommu_sva_find() any more. Will maintain mm and find mm from PASID inside IDXD driver. And will implement accessing the remote mm inside IDXD driver although the implementation will have duplicate code as access_remote_vm(). Next version will only change IDXD driver code. There won't be IOMMU code change. Thanks. -Fenghua
On Tue, Mar 07, 2023 at 09:55:28AM -0800, Fenghua Yu wrote: > > > > I don't quite follow here. Isn't I/O page fault already supported? > > The following patch 9 in this series explains in details why IDXD device > cannot use page fault to write to user memory: https://lore.kernel.org/dmaengine/20230306163138.587484-10-fenghua.yu@intel.com/ > > "DSA supports page fault handling through PRS. However, the DMA engine > that's processing the descriptor is blocked until the PRS response is > received. Other workqueues sharing the engine are also blocked. > Page fault handing by the driver with PRS disabled can be used to > mitigate the stalling. > > With PRS disabled while ATS remain enabled, DSA handles page faults on > a completion record by reporting an event in the event log. In this > instance, the descriptor is completed and the event log contains the > completion record address and the contents of the completion record." This seems like a completely broken I/O model, and I don't think Linux should support this model when it requires operations like access_remote_vm.
Hi, Christoph, On 3/20/23 06:35, Christoph Hellwig wrote: > On Tue, Mar 07, 2023 at 09:55:28AM -0800, Fenghua Yu wrote: >>> >>> I don't quite follow here. Isn't I/O page fault already supported? >> >> The following patch 9 in this series explains in details why IDXD device >> cannot use page fault to write to user memory: https://lore.kernel.org/dmaengine/20230306163138.587484-10-fenghua.yu@intel.com/ >> >> "DSA supports page fault handling through PRS. However, the DMA engine >> that's processing the descriptor is blocked until the PRS response is >> received. Other workqueues sharing the engine are also blocked. >> Page fault handing by the driver with PRS disabled can be used to >> mitigate the stalling. >> >> With PRS disabled while ATS remain enabled, DSA handles page faults on >> a completion record by reporting an event in the event log. In this >> instance, the descriptor is completed and the event log contains the >> completion record address and the contents of the completion record." > > This seems like a completely broken I/O model, and I don't think Linux > should support this model when it requires operations like > access_remote_vm. This patch set have two parts: 1. Basic event log support and PRS disabling knob. 2. Completion record page fault fixup part. The current patch is the major patch in this part. It tries to fix up completion record page fault from user space. Since the fix up in part 2 is debatable and part 1 can be sent out independently, how about we send out the parts separately? In the new part 1, simply warn on completion record page fault and don't try to fix it up. Usually a process executes ENQCMD instruction and then keeps polling completion record periodically. So the completion record will be likely backed by page and won't generate page fault. If page fault is really an issue, sysadmin can enable PRS (which is enabled by default anyway) and let PRS to handle page fault. Then in the next step, we will send out a new part 2 to eliminate completion record page fault. One proposal is to mmap() kernel allocated completion record area. So there won't be completion record page fault and fix up(no access_remote_vm() of course). Is this OK for you? Thank you very much for your comment! -Fenghua
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c index 24bf9b2b58aa..1d7a0aee58f7 100644 --- a/drivers/iommu/iommu-sva.c +++ b/drivers/iommu/iommu-sva.c @@ -71,6 +71,41 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid) } EXPORT_SYMBOL_GPL(iommu_sva_find); +/** + * iommu_access_remote_vm - access another process' address space by PASID + * @pasid: Process Address Space ID assigned to the mm + * @addr: start address to access + * @buf: source or destination buffer + * @len: number of bytes to transfer + * @gup_flags: flags modifying lookup behaviour + * + * Another process' address space is found by PASID. A reference on @mm + * is taken and released inside the function. + * + * Return: number of bytes copied from source to destination. + */ +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void *buf, + int len, unsigned int gup_flags) +{ + struct mm_struct *mm; + int copied; + + mm = iommu_sva_find(pasid); + if (IS_ERR_OR_NULL(mm)) + return 0; + + /* + * A reference on @mm has been held by mmget_not_zero() + * during iommu_sva_find(). + */ + copied = access_remote_vm(mm, addr, buf, len, gup_flags); + /* The reference is released. */ + mmput(mm); + + return copied; +} +EXPORT_SYMBOL_GPL(iommu_access_remote_vm); + /** * iommu_sva_bind_device() - Bind a process address space to a device * @dev: the device diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 6595454d4f48..414a46a53799 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1177,6 +1177,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm); void iommu_sva_unbind_device(struct iommu_sva *handle); u32 iommu_sva_get_pasid(struct iommu_sva *handle); +int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, void *buf, + int len, unsigned int gup_flags); #else static inline struct iommu_sva * iommu_sva_bind_device(struct device *dev, struct mm_struct *mm) @@ -1192,6 +1194,13 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle) { return IOMMU_PASID_INVALID; } + +static inline int iommu_access_remote_vm(ioasid_t pasid, unsigned long addr, + void *buf, int len, + unsigned int gup_flags) +{ + return 0; +} #endif /* CONFIG_IOMMU_SVA */ #endif /* __LINUX_IOMMU_H */
Define and export iommu_access_remote_vm() to allow IOMMU related drivers to access user address space by PASID. The IDXD driver would like to use it to write the user's completion record that the hardware device is not able to write to due to user page fault. Without the API, it's complex for IDXD driver to copy completion record to a process' fault address for two reasons: 1. access_remote_vm() is not exported and shouldn't be exported for drivers because drivers may easily cause mm reference issue. 2. user frees fault address pages to trigger fault by IDXD device. The driver has to call iommu_sva_find(), kthread_use_mm(), re-implement majority of access_remote_vm() etc to access remote vm. This IOMMU specific API hides these details and provides a clean interface for idxd driver and potentially other IOMMU related drivers. Suggested-by: Alistair Popple <apopple@nvidia.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Will Deacon <will@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: iommu@lists.linux.dev --- v2: - Define and export iommu_access_remote_vm() for IDXD driver to write completion record to user address space. This change removes patch 8 and 9 in v1 (Alistair Popple) drivers/iommu/iommu-sva.c | 35 +++++++++++++++++++++++++++++++++++ include/linux/iommu.h | 9 +++++++++ 2 files changed, 44 insertions(+)