diff mbox series

[v4,10/17] iommufd: Support IOMMU_HWPT_ALLOC allocation with user data

Message ID 20230921075138.124099-11-yi.l.liu@intel.com (mailing list archive)
State New, archived
Headers show
Series iommufd: Add nesting infrastructure | expand

Commit Message

Yi Liu Sept. 21, 2023, 7:51 a.m. UTC
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
But it can only allocate a hw_pagetable that associates to a given IOAS,
i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.

IOMMU drivers can now support user-managed hw_pagetables, for two-stage
translation use cases, that require user data input from the user space.

Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
type specified user data. Also, update the @pt_id to accept hwpt_id too
besides an ioas_id. Then, pass them to the downstream alloc_fn().

Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
 include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
 2 files changed, 39 insertions(+), 3 deletions(-)

Comments

Jason Gunthorpe Oct. 13, 2023, 3:19 p.m. UTC | #1
On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
> IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> But it can only allocate a hw_pagetable that associates to a given IOAS,
> i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
> 
> IOMMU drivers can now support user-managed hw_pagetables, for two-stage
> translation use cases, that require user data input from the user space.
> 
> Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
> type specified user data. Also, update the @pt_id to accept hwpt_id too
> besides an ioas_id. Then, pass them to the downstream alloc_fn().
> 
> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
>  include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
>  2 files changed, 39 insertions(+), 3 deletions(-)

Can we also come with a small vt-d patch that does implement an op for
this? Or is it too big?

It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
self-contained series and another series for invalidate.

Jason
Nicolin Chen Oct. 13, 2023, 8:58 p.m. UTC | #2
On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
> > IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> > But it can only allocate a hw_pagetable that associates to a given IOAS,
> > i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
> > 
> > IOMMU drivers can now support user-managed hw_pagetables, for two-stage
> > translation use cases, that require user data input from the user space.
> > 
> > Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
> > type specified user data. Also, update the @pt_id to accept hwpt_id too
> > besides an ioas_id. Then, pass them to the downstream alloc_fn().
> > 
> > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > ---
> >  drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
> >  include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
> >  2 files changed, 39 insertions(+), 3 deletions(-)
> 
> Can we also come with a small vt-d patch that does implement an op for
> this? Or is it too big?
> 
> It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
> self-contained series and another series for invalidate.

We now only use IOMMU_HWPT_ALLOC for nested domain allocations,
which won't be supported until the cache_invalidate_user ops is
preset?

/* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */

+       /* Driver is buggy by missing cache_invalidate_user in domain_ops */
+       if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
+               rc = -EINVAL;
+               goto out_abort;
+       }

Thanks
Nic
Jason Gunthorpe Oct. 14, 2023, 12:07 a.m. UTC | #3
On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
> On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
> > > IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> > > But it can only allocate a hw_pagetable that associates to a given IOAS,
> > > i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
> > > 
> > > IOMMU drivers can now support user-managed hw_pagetables, for two-stage
> > > translation use cases, that require user data input from the user space.
> > > 
> > > Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
> > > type specified user data. Also, update the @pt_id to accept hwpt_id too
> > > besides an ioas_id. Then, pass them to the downstream alloc_fn().
> > > 
> > > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > ---
> > >  drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
> > >  include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
> > >  2 files changed, 39 insertions(+), 3 deletions(-)
> > 
> > Can we also come with a small vt-d patch that does implement an op for
> > this? Or is it too big?
> > 
> > It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
> > self-contained series and another series for invalidate.
> 
> We now only use IOMMU_HWPT_ALLOC for nested domain allocations,
> which won't be supported until the cache_invalidate_user ops is
> preset?
> 
> /* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
> 
> +       /* Driver is buggy by missing cache_invalidate_user in domain_ops */
> +       if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
> +               rc = -EINVAL;
> +               goto out_abort;
> +       }
>

Hm. That hunk could migrate to the invalidate series. 

I'm just leeary of doing the invalidation too considering how
complicated it is

Jason
Nicolin Chen Oct. 14, 2023, 12:51 a.m. UTC | #4
On Fri, Oct 13, 2023 at 09:07:09PM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
> > On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
> > > > IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> > > > But it can only allocate a hw_pagetable that associates to a given IOAS,
> > > > i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
> > > > 
> > > > IOMMU drivers can now support user-managed hw_pagetables, for two-stage
> > > > translation use cases, that require user data input from the user space.
> > > > 
> > > > Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
> > > > type specified user data. Also, update the @pt_id to accept hwpt_id too
> > > > besides an ioas_id. Then, pass them to the downstream alloc_fn().
> > > > 
> > > > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > ---
> > > >  drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
> > > >  include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
> > > >  2 files changed, 39 insertions(+), 3 deletions(-)
> > > 
> > > Can we also come with a small vt-d patch that does implement an op for
> > > this? Or is it too big?
> > > 
> > > It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
> > > self-contained series and another series for invalidate.
> > 
> > We now only use IOMMU_HWPT_ALLOC for nested domain allocations,
> > which won't be supported until the cache_invalidate_user ops is
> > preset?
> > 
> > /* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
> > 
> > +       /* Driver is buggy by missing cache_invalidate_user in domain_ops */
> > +       if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
> > +               rc = -EINVAL;
> > +               goto out_abort;
> > +       }
> >
> 
> Hm. That hunk could migrate to the invalidate series. 
> 
> I'm just leeary of doing the invalidation too considering how
> complicated it is

OK. Let's see how Yi/Kevin/Baolu reply about the feasibility
with the VT-d driver. Then Yi and I can accordingly separate
the allocation part into a smaller series.

Thanks
Nic
Yi Liu Oct. 16, 2023, 7:03 a.m. UTC | #5
On 2023/10/14 08:51, Nicolin Chen wrote:
> On Fri, Oct 13, 2023 at 09:07:09PM -0300, Jason Gunthorpe wrote:
>> On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
>>> On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
>>>> On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
>>>>> IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
>>>>> But it can only allocate a hw_pagetable that associates to a given IOAS,
>>>>> i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
>>>>>
>>>>> IOMMU drivers can now support user-managed hw_pagetables, for two-stage
>>>>> translation use cases, that require user data input from the user space.
>>>>>
>>>>> Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
>>>>> type specified user data. Also, update the @pt_id to accept hwpt_id too
>>>>> besides an ioas_id. Then, pass them to the downstream alloc_fn().
>>>>>
>>>>> Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
>>>>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>> ---
>>>>>   drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
>>>>>   include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
>>>>>   2 files changed, 39 insertions(+), 3 deletions(-)
>>>>
>>>> Can we also come with a small vt-d patch that does implement an op for
>>>> this? Or is it too big?
>>>>
>>>> It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
>>>> self-contained series and another series for invalidate.
>>>
>>> We now only use IOMMU_HWPT_ALLOC for nested domain allocations,
>>> which won't be supported until the cache_invalidate_user ops is
>>> preset?
>>>
>>> /* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
>>>
>>> +       /* Driver is buggy by missing cache_invalidate_user in domain_ops */
>>> +       if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
>>> +               rc = -EINVAL;
>>> +               goto out_abort;
>>> +       }
>>>
>>
>> Hm. That hunk could migrate to the invalidate series.
>>
>> I'm just leeary of doing the invalidation too considering how
>> complicated it is
> 
> OK. Let's see how Yi/Kevin/Baolu reply about the feasibility
> with the VT-d driver. Then Yi and I can accordingly separate
> the allocation part into a smaller series.

Current nesting series actually extends HWPT_ALLOC ioctl to accept user
data for allocating domain with vendor specific data. Nested translation
happens to be the usage of it. But nesting requires invalidation. If we
want to do further split, then this new series would be just "extending
HWPT_ALLOC to accept vendor specific data from userspace". But it will
lack of a user if nesting is separated. Is this acceptable? @Jason

BTW. Do we still have unsolved issue on the invalidation path?
Jason Gunthorpe Oct. 16, 2023, 11:59 a.m. UTC | #6
On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
> On 2023/10/14 08:51, Nicolin Chen wrote:
> > On Fri, Oct 13, 2023 at 09:07:09PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
> > > > On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
> > > > > On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
> > > > > > IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce.
> > > > > > But it can only allocate a hw_pagetable that associates to a given IOAS,
> > > > > > i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.
> > > > > > 
> > > > > > IOMMU drivers can now support user-managed hw_pagetables, for two-stage
> > > > > > translation use cases, that require user data input from the user space.
> > > > > > 
> > > > > > Extend the IOMMU_HWPT_ALLOC ioctl to accept non-default hwpt_type with a
> > > > > > type specified user data. Also, update the @pt_id to accept hwpt_id too
> > > > > > besides an ioas_id. Then, pass them to the downstream alloc_fn().
> > > > > > 
> > > > > > Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > > > > > Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> > > > > > ---
> > > > > >   drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++-
> > > > > >   include/uapi/linux/iommufd.h         | 23 +++++++++++++++++++++--
> > > > > >   2 files changed, 39 insertions(+), 3 deletions(-)
> > > > > 
> > > > > Can we also come with a small vt-d patch that does implement an op for
> > > > > this? Or is it too big?
> > > > > 
> > > > > It would be nice if we could wrap IOMMU_HWPT_ALLOC into one
> > > > > self-contained series and another series for invalidate.
> > > > 
> > > > We now only use IOMMU_HWPT_ALLOC for nested domain allocations,
> > > > which won't be supported until the cache_invalidate_user ops is
> > > > preset?
> > > > 
> > > > /* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
> > > > 
> > > > +       /* Driver is buggy by missing cache_invalidate_user in domain_ops */
> > > > +       if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
> > > > +               rc = -EINVAL;
> > > > +               goto out_abort;
> > > > +       }
> > > > 
> > > 
> > > Hm. That hunk could migrate to the invalidate series.
> > > 
> > > I'm just leeary of doing the invalidation too considering how
> > > complicated it is
> > 
> > OK. Let's see how Yi/Kevin/Baolu reply about the feasibility
> > with the VT-d driver. Then Yi and I can accordingly separate
> > the allocation part into a smaller series.
> 
> Current nesting series actually extends HWPT_ALLOC ioctl to accept user
> data for allocating domain with vendor specific data. Nested translation
> happens to be the usage of it. But nesting requires invalidation. If we
> want to do further split, then this new series would be just "extending
> HWPT_ALLOC to accept vendor specific data from userspace". But it will
> lack of a user if nesting is separated. Is this acceptable? @Jason

I'd still like to include the nesting allocation and attach parts
though, even if they are not usable without invalidation ..

> BTW. Do we still have unsolved issue on the invalidation path?

I'm not sure, there were so many different versions of it we need to
go back over it and check the dirver implementations again

Jason
Nicolin Chen Oct. 16, 2023, 6:44 p.m. UTC | #7
On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
> On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
> > Current nesting series actually extends HWPT_ALLOC ioctl to accept user
> > data for allocating domain with vendor specific data. Nested translation
> > happens to be the usage of it. But nesting requires invalidation. If we
> > want to do further split, then this new series would be just "extending
> > HWPT_ALLOC to accept vendor specific data from userspace". But it will
> > lack of a user if nesting is separated. Is this acceptable? @Jason
> 
> I'd still like to include the nesting allocation and attach parts
> though, even if they are not usable without invalidation ..

This is the latest series that I reworked (in bottom-up order):
 iommu: Add a pair of helper to copy struct iommu_user_data{_array}
 iommufd: Add IOMMU_HWPT_INVALIDATE
 iommufd: Add a nested HW pagetable object
 iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
 iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
 iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
 iommufd/device: Add helpers to enforce/remove device reserved regions
 iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
 iommu: Pass in parent domain with user_data to domain_alloc_user op

Perhaps we can have a preparatory series to merge first:
 iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
 iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
 iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
 iommufd/device: Add helpers to enforce/remove device reserved regions

Then next cycle would be basically 4 patches + selftests:
 iommufd: Add IOMMU_HWPT_INVALIDATE
 iommufd: Add a nested HW pagetable object
 iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
 iommu: Pass in parent domain with user_data to domain_alloc_user op

The preparatory series doesn't involve functional changes yet have
a good amount of pieces to simplify the "nested HW pagetable" that
is basically nested_alloc/abort/destroy.

> > BTW. Do we still have unsolved issue on the invalidation path?
> 
> I'm not sure, there were so many different versions of it we need to
> go back over it and check the dirver implementations again

Only this v4 has the latest array-based invalidation design. And
it should be straightforward for drivers to define entry/request
structures. It might be a bit rush to review/finalize it at the
stage of rc6 though.

Thanks
Nic
Yi Liu Oct. 17, 2023, 8:55 a.m. UTC | #8
On 2023/10/17 02:44, Nicolin Chen wrote:
> On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
>> On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
>>> Current nesting series actually extends HWPT_ALLOC ioctl to accept user
>>> data for allocating domain with vendor specific data. Nested translation
>>> happens to be the usage of it. But nesting requires invalidation. If we
>>> want to do further split, then this new series would be just "extending
>>> HWPT_ALLOC to accept vendor specific data from userspace". But it will
>>> lack of a user if nesting is separated. Is this acceptable? @Jason
>>
>> I'd still like to include the nesting allocation and attach parts
>> though, even if they are not usable without invalidation ..
> 
> This is the latest series that I reworked (in bottom-up order):
>   iommu: Add a pair of helper to copy struct iommu_user_data{_array}
>   iommufd: Add IOMMU_HWPT_INVALIDATE
>   iommufd: Add a nested HW pagetable object
>   iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
>   iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
>   iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
>   iommufd/device: Add helpers to enforce/remove device reserved regions
>   iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
>   iommu: Pass in parent domain with user_data to domain_alloc_user op

following Jason's comment, it looks like we can just split the cache
invalidation path out. Then the above looks good after removing
"iommufd: Add IOMMU_HWPT_INVALIDATE" and also the cache_invalidate_user
callback in "iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op".
Is it? @Jason

> Perhaps we can have a preparatory series to merge first:
>   iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
>   iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
>   iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
>   iommufd/device: Add helpers to enforce/remove device reserved regions
> 
> Then next cycle would be basically 4 patches + selftests:
>   iommufd: Add IOMMU_HWPT_INVALIDATE
>   iommufd: Add a nested HW pagetable object
>   iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
>   iommu: Pass in parent domain with user_data to domain_alloc_user op
> 
> The preparatory series doesn't involve functional changes yet have
> a good amount of pieces to simplify the "nested HW pagetable" that
> is basically nested_alloc/abort/destroy.
> >>> BTW. Do we still have unsolved issue on the invalidation path?
>>
>> I'm not sure, there were so many different versions of it we need to
>> go back over it and check the dirver implementations again
> 
> Only this v4 has the latest array-based invalidation design. And
> it should be straightforward for drivers to define entry/request
> structures. It might be a bit rush to review/finalize it at the
> stage of rc6 though.

yes, before v4, the cache invalidation path is simple and vendor
drivers have their own handling.
Jason Gunthorpe Oct. 17, 2023, 3:50 p.m. UTC | #9
On Tue, Oct 17, 2023 at 04:55:12PM +0800, Yi Liu wrote:
> On 2023/10/17 02:44, Nicolin Chen wrote:
> > On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
> > > > Current nesting series actually extends HWPT_ALLOC ioctl to accept user
> > > > data for allocating domain with vendor specific data. Nested translation
> > > > happens to be the usage of it. But nesting requires invalidation. If we
> > > > want to do further split, then this new series would be just "extending
> > > > HWPT_ALLOC to accept vendor specific data from userspace". But it will
> > > > lack of a user if nesting is separated. Is this acceptable? @Jason
> > > 
> > > I'd still like to include the nesting allocation and attach parts
> > > though, even if they are not usable without invalidation ..
> > 
> > This is the latest series that I reworked (in bottom-up order):
> >   iommu: Add a pair of helper to copy struct iommu_user_data{_array}
> >   iommufd: Add IOMMU_HWPT_INVALIDATE
> >   iommufd: Add a nested HW pagetable object
> >   iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
> >   iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
> >   iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
> >   iommufd/device: Add helpers to enforce/remove device reserved regions
> >   iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
> >   iommu: Pass in parent domain with user_data to domain_alloc_user op
> 
> following Jason's comment, it looks like we can just split the cache
> invalidation path out. Then the above looks good after removing
> "iommufd: Add IOMMU_HWPT_INVALIDATE" and also the cache_invalidate_user
> callback in "iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op".
> Is it? @Jason

If it can make sense, sure. It would be nice to be finished with the
alloc path

> > Only this v4 has the latest array-based invalidation design. And
> > it should be straightforward for drivers to define entry/request
> > structures. It might be a bit rush to review/finalize it at the
> > stage of rc6 though.
> 
> yes, before v4, the cache invalidation path is simple and vendor
> drivers have their own handling.

Have driver implementations of v4 been done to look at?

Jason
Nicolin Chen Oct. 17, 2023, 7:32 p.m. UTC | #10
On Tue, Oct 17, 2023 at 12:50:11PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 17, 2023 at 04:55:12PM +0800, Yi Liu wrote:
> > On 2023/10/17 02:44, Nicolin Chen wrote:
> > > On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
> > > > On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
> > > > > Current nesting series actually extends HWPT_ALLOC ioctl to accept user
> > > > > data for allocating domain with vendor specific data. Nested translation
> > > > > happens to be the usage of it. But nesting requires invalidation. If we
> > > > > want to do further split, then this new series would be just "extending
> > > > > HWPT_ALLOC to accept vendor specific data from userspace". But it will
> > > > > lack of a user if nesting is separated. Is this acceptable? @Jason
> > > > 
> > > > I'd still like to include the nesting allocation and attach parts
> > > > though, even if they are not usable without invalidation ..
> > > 
> > > This is the latest series that I reworked (in bottom-up order):
> > >   iommu: Add a pair of helper to copy struct iommu_user_data{_array}
> > >   iommufd: Add IOMMU_HWPT_INVALIDATE
> > >   iommufd: Add a nested HW pagetable object
> > >   iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
> > >   iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
> > >   iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
> > >   iommufd/device: Add helpers to enforce/remove device reserved regions
> > >   iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
> > >   iommu: Pass in parent domain with user_data to domain_alloc_user op
> > 
> > following Jason's comment, it looks like we can just split the cache
> > invalidation path out. Then the above looks good after removing
> > "iommufd: Add IOMMU_HWPT_INVALIDATE" and also the cache_invalidate_user
> > callback in "iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op".
> > Is it? @Jason
> 
> If it can make sense, sure. It would be nice to be finished with the
> alloc path

I can do the split today. Shall we have a domain_alloc_user op
in VT-d driver? Can we accept a core series only? I understood
it's better to have though...

> > > Only this v4 has the latest array-based invalidation design. And
> > > it should be straightforward for drivers to define entry/request
> > > structures. It might be a bit rush to review/finalize it at the
> > > stage of rc6 though.
> > 
> > yes, before v4, the cache invalidation path is simple and vendor
> > drivers have their own handling.
> 
> Have driver implementations of v4 been done to look at?

I think so:
https://lore.kernel.org/linux-iommu/20230921075431.125239-10-yi.l.liu@intel.com/

Thanks
Nicolin
diff mbox series

Patch

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 90fd65859e28..ab25de149ae6 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -300,6 +300,7 @@  int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 					bool flag);
 	struct iommufd_hw_pagetable *hwpt, *parent;
 	struct iommu_hwpt_alloc *cmd = ucmd->cmd;
+	struct iommu_user_data *data = NULL;
 	struct iommufd_object *pt_obj;
 	struct iommufd_device *idev;
 	struct iommufd_ioas *ioas;
@@ -308,6 +309,11 @@  int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 
 	if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved)
 		return -EOPNOTSUPP;
+	if (!cmd->data_len && cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT)
+		return -EINVAL;
+	if (cmd->flags & IOMMU_HWPT_ALLOC_NEST_PARENT &&
+	    cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT)
+		return -EINVAL;
 
 	idev = iommufd_get_device(ucmd, cmd->dev_id);
 	if (IS_ERR(idev))
@@ -340,9 +346,19 @@  int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 		goto out_put_pt;
 	}
 
+	if (cmd->data_len) {
+		data = kzalloc(sizeof(*data), GFP_KERNEL);
+		if (!data) {
+			rc = -ENOMEM;
+			goto out_put_pt;
+		}
+		data->uptr = u64_to_user_ptr(cmd->data_uptr);
+		data->len = cmd->data_len;
+	}
+
 	mutex_lock(mutex);
 	hwpt = alloc_fn(ucmd->ictx, pt_obj, idev, cmd->flags,
-			IOMMU_HWPT_TYPE_DEFAULT, NULL, false);
+			cmd->hwpt_type, data, false);
 	if (IS_ERR(hwpt)) {
 		rc = PTR_ERR(hwpt);
 		goto out_unlock;
@@ -359,6 +375,7 @@  int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
 	iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
 out_unlock:
 	mutex_unlock(mutex);
+	kfree(data);
 out_put_pt:
 	iommufd_put_object(pt_obj);
 out_put_idev:
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 3c8660fe9bb1..c46b1f772f20 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -370,15 +370,31 @@  enum iommu_hwpt_type {
  * @size: sizeof(struct iommu_hwpt_alloc)
  * @flags: Combination of enum iommufd_hwpt_alloc_flags
  * @dev_id: The device to allocate this HWPT for
- * @pt_id: The IOAS to connect this HWPT to
+ * @pt_id: The IOAS or HWPT to connect this HWPT to
  * @out_hwpt_id: The ID of the new HWPT
  * @__reserved: Must be 0
+ * @hwpt_type: One of enum iommu_hwpt_type
+ * @data_len: Length of the type specific data
+ * @data_uptr: User pointer to the type specific data
  *
  * Explicitly allocate a hardware page table object. This is the same object
  * type that is returned by iommufd_device_attach() and represents the
  * underlying iommu driver's iommu_domain kernel object.
  *
- * A HWPT will be created with the IOVA mappings from the given IOAS.
+ * A kernel-managed HWPT will be created with the mappings from the given
+ * IOAS via the @pt_id. The @hwpt_type for this allocation can be set to
+ * either IOMMU_HWPT_TYPE_DEFAULT or a pre-defined type corresponding to
+ * an I/O page table type supported by the underlying IOMMU hardware.
+ *
+ * A user-managed HWPT will be created from a given parent HWPT via the
+ * @pt_id, in which the parent HWPT must be allocated previously via the
+ * same ioctl from a given IOAS (@pt_id). In this case, the @hwpt_type
+ * must be set to a pre-defined type corresponding to an I/O page table
+ * type supported by the underlying IOMMU hardware.
+ *
+ * If the @hwpt_type is set to IOMMU_HWPT_TYPE_DEFAULT, @data_len and
+ * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr
+ * must be given.
  */
 struct iommu_hwpt_alloc {
 	__u32 size;
@@ -387,6 +403,9 @@  struct iommu_hwpt_alloc {
 	__u32 pt_id;
 	__u32 out_hwpt_id;
 	__u32 __reserved;
+	__u32 hwpt_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
 };
 #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)