Message ID | 0-v5-9a37e0c884ce+31e3-smmuv3_newapi_p2_jgg@nvidia.com (mailing list archive) |
---|---|
Series | Update SMMUv3 to the modern iommu API (part 2/3) |
> -----Original Message----- > From: Jason Gunthorpe <jgg@nvidia.com> > Sent: Monday, March 4, 2024 11:44 PM > To: iommu@lists.linux.dev; Joerg Roedel <joro@8bytes.org>; linux-arm- > kernel@lists.infradead.org; Robin Murphy <robin.murphy@arm.com>; Will > Deacon <will@kernel.org> > Cc: Eric Auger <eric.auger@redhat.com>; Jean-Philippe Brucker <jean- > philippe@linaro.org>; Moritz Fischer <mdf@kernel.org>; Michael Shavit > <mshavit@google.com>; Nicolin Chen <nicolinc@nvidia.com>; > patches@lists.linux.dev; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com> > Subject: [PATCH v5 00/27] Update SMMUv3 to the modern iommu API (part 2/3) > > Continuing the work of part 1 this focuses on the CD, PASID and SVA > components: > > - attach_dev failure does not change the HW configuration. > > - Full PASID API support including: > - S1/SVA domains attached to PASIDs > - IDENTITY/BLOCKED/S1 attached to RID > - Change of the RID domain while PASIDs are attached > > - Streamlined SVA support using the core infrastructure > > - Hitless, whenever possible, change between two domains > > Making the CD programming work like the new STE programming allows > untangling some of the confusing SVA flows. From there the focus is on > building out the core infrastructure for dealing with PASID and CD > entries, then keeping track of unique SSID's for ATS invalidation. > > The ATS ordering is generalized so that the PASID flow can use it and put > into a form where it is fully hitless, whenever possible. Care is taken to > ensure that ATC flushes are present after any change in translation. > > Finally we simply kill the entire outdated SVA mmu_notifier implementation > in one shot and switch it over to the newly created generic PASID & CD > code. This avoids the messy and confusing approach of trying to > incrementally untangle this in place. The new code is small and simple > enough this is much better than trying to figure out smaller steps. 
>
> Once SVA is resting on the right CD code it is straightforward to make the
> PASID interface functionally complete.
>
> It achieves the same goals as the several series from Michael and the S1DSS
> series from Nicolin that were trying to improve portions of the API.
>
> This is on github:
> https://github.com/jgunthorpe/linux/commits/smmuv3_newapi

I performed a few tests with this series on a HiSilicon D06 board (SMMUv3).

- Host kernel: boot with translated and passthrough cases.
- Host kernel: ACC dev SVA test run with the uadk/uadk_tool benchmark.

With Qemu branch: https://github.com/nicolinc/qemu/commits/wip/iommufd_vsmmu-02292024/

- Guest with a n/w VF dev, legacy VFIO mode.
- Guest with a n/w VF dev, IOMMUFD mode.
- Hot plug (add/del) in both VFIO and IOMMUFD modes.

Everything works as expected.

FWIW:
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

Thanks,
Shameer
Hi Jason,

On Mon, Mar 04, 2024 at 07:43:48PM -0400, Jason Gunthorpe wrote:
> Continuing the work of part 1 this focuses on the CD, PASID and SVA
> components:
>
> - attach_dev failure does not change the HW configuration.
>
> - Full PASID API support including:
>   - S1/SVA domains attached to PASIDs

I am still going through the series, but I see at the end the main SMMUv3
driver has a set_dev_pasid operation. Are there any in-tree drivers that
use that? (And how can I test it?)

>   - IDENTITY/BLOCKED/S1 attached to RID
>   - Change of the RID domain while PASIDs are attached
>
> - Streamlined SVA support using the core infrastructure
>
> - Hitless, whenever possible, change between two domains

Can you please clarify what cases are expected to be hitless?
From what I see, if ASID and TTB0 change, that would break the CD.

>
> Making the CD programming work like the new STE programming allows
> untangling some of the confusing SVA flows. From there the focus is on
> building out the core infrastructure for dealing with PASID and CD
> entries, then keeping track of unique SSID's for ATS invalidation.
>
> The ATS ordering is generalized so that the PASID flow can use it and put
> into a form where it is fully hitless, whenever possible. Care is taken to
> ensure that ATC flushes are present after any change in translation.
>
> Finally we simply kill the entire outdated SVA mmu_notifier implementation
> in one shot and switch it over to the newly created generic PASID & CD
> code. This avoids the messy and confusing approach of trying to
> incrementally untangle this in place. The new code is small and simple
> enough this is much better than trying to figure out smaller steps.
>
> Once SVA is resting on the right CD code it is straightforward to make the
> PASID interface functionally complete.
>
> It achieves the same goals as the several series from Michael and the S1DSS
> series from Nicolin that were trying to improve portions of the API.
> > This is on github: > https://github.com/jgunthorpe/linux/commits/smmuv3_newapi > > v5: > - Rebase on v6.8-rc7 & Will's tree > - Accommodate the SVA rc patch removing the master list iteration > - Move the kfree(to_smmu_domain(domain)) hunk to the right patch > - Move S1DSS get_used hunk to "Allow IDENTITY/BLOCKED to be set while > PASID is used" > v4: https://lore.kernel.org/r/0-v4-e7091cdd9e8d+43b1-smmuv3_newapi_p2_jgg@nvidia.com > - Rebase on v6.8-rc1, adjust to use mm_get_enqcmd_pasid() and eventually > remove all references from ARM. Move the new ARM_SMMU_FEAT_STALL_FORCE > stuff to arm_smmu_make_sva_cd() > - Adjust to use the new shared STE/CD writer logic. Disable some of the > sanity checks for the interior of the series > - Return ERR_PTR from domain_alloc functions > - Move the ATS disablement flow into arm_smmu_attach_prepare()/commit() > which lets all the STE update flows use the same sequence. This is > needed for nesting in part 3 > - Put ssid in attach_state > - Replace to_smmu_domain_safe() with to_smmu_domain_devices() > v3: https://lore.kernel.org/r/0-v3-9083a9368a5c+23fb-smmuv3_newapi_p2_jgg@nvidia.com > - Rebase on the latest part 1 > - update comments and commit messages > - Fix error exit in arm_smmu_set_pasid() > - Fix inverted logic for btm_invalidation > - Add missing ATC invalidation on mm release > - Add a big comment explaining that BTM is not enabled and what is > missing to enable it. > v2: https://lore.kernel.org/r/0-v2-16665a652079+5947-smmuv3_newapi_p2_jgg@nvidia.com > - Rebased on iommufd + Joerg's tree > - Use sid_smmu_domain consistently to refer to the domain attached to the > device (eg the PCIe RID) > - Rework how arm_smmu_attach_*() and callers flow to be more careful > about ordering around ATC invalidation. The ATC must be invalidated > after it is impossible to establish stale entries. > - ATS disable is now entirely part of arm_smmu_attach_dev_ste(), which is > the only STE type that ever disables ATS.
> - Remove the 'existing_master_domain' optimization, the code is > functionally fine without it. > - Whitespace, spelling, and checkpatch related items > - Fixed wrong value stored in the xa for the BTM flows > - Use pasid more consistently instead of id > v1: https://lore.kernel.org/r/0-v1-afbb86647bbd+5-smmuv3_newapi_p2_jgg@nvidia.com > > Jason Gunthorpe (27): > iommu/arm-smmu-v3: Do not allow a SVA domain to be set on the wrong > PASID > iommu/arm-smmu-v3: Do not ATC invalidate the entire domain > iommu/arm-smmu-v3: Add a type for the CD entry > iommu/arm-smmu-v3: Add an ops indirection to the STE code > iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() > iommu/arm-smmu-v3: Consolidate clearing a CD table entry > iommu/arm-smmu-v3: Move the CD generation for S1 domains into a > function > iommu/arm-smmu-v3: Move allocation of the cdtable into > arm_smmu_get_cd_ptr() > iommu/arm-smmu-v3: Allocate the CD table entry in advance > iommu/arm-smmu-v3: Move the CD generation for SVA into a function > iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd() > iommu/arm-smmu-v3: Start building a generic PASID layer > iommu/arm-smmu-v3: Make smmu_domain->devices into an allocated list > iommu/arm-smmu-v3: Make changing domains be hitless for ATS > iommu/arm-smmu-v3: Add ssid to struct arm_smmu_master_domain > iommu/arm-smmu-v3: Keep track of valid CD entries in the cd_table > iommu/arm-smmu-v3: Thread SSID through the arm_smmu_attach_*() > interface > iommu/arm-smmu-v3: Make SVA allocate a normal arm_smmu_domain > iommu/arm-smmu-v3: Keep track of arm_smmu_master_domain for SVA > iommu: Add ops->domain_alloc_sva() > iommu/arm-smmu-v3: Put the SVA mmu notifier in the smmu_domain > iommu/arm-smmu-v3: Consolidate freeing the ASID/VMID > iommu/arm-smmu-v3: Move the arm_smmu_asid_xa to per-smmu like vmid > iommu/arm-smmu-v3: Bring back SVA BTM support > iommu/arm-smmu-v3: Allow IDENTITY/BLOCKED to be set while PASID is > used > iommu/arm-smmu-v3: Allow a 
PASID to be set when RID is > IDENTITY/BLOCKED > iommu/arm-smmu-v3: Allow setting a S1 domain to a PASID > > .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 639 +++++----- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1036 +++++++++++------ > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 79 +- > drivers/iommu/iommu-sva.c | 4 +- > drivers/iommu/iommu.c | 12 +- > include/linux/iommu.h | 3 + > 6 files changed, 1024 insertions(+), 749 deletions(-) > > > base-commit: 98b23ebb0c84657a135957d727eedebd1280cbbf > -- > 2.43.2 >

Thanks,
Mostafa
Hi Jason, On Mon, Mar 04, 2024 at 07:43:48PM -0400, Jason Gunthorpe wrote: > Continuing the work of part 1 this focuses on the CD, PASID and SVA > components: > > - attach_dev failure does not change the HW configuration. > > - Full PASID API support including: > - S1/SVA domains attached to PASIDs > - IDENTITY/BLOCKED/S1 attached to RID > - Change of the RID domain while PASIDs are attached > > - Streamlined SVA support using the core infrastructure > > - Hitless, whenever possible, change between two domains > > Making the CD programming work like the new STE programming allows > untangling some of the confusing SVA flows. From there the focus is on > building out the core infrastructure for dealing with PASID and CD > entries, then keeping track of unique SSID's for ATS invalidation. > > The ATS ordering is generalized so that the PASID flow can use it and put > into a form where it is fully hitless, whenever possible. Care is taken to > ensure that ATC flushes are present after any change in translation. > > Finally we simply kill the entire outdated SVA mmu_notifier implementation > in one shot and switch it over to the newly created generic PASID & CD > code. This avoids the messy and confusing approach of trying to > incrementally untangle this in place. The new code is small and simple > enough this is much better than trying to figure out smaller steps. > > Once SVA is resting on the right CD code it is straightforward to make the > PASID interface functionally complete. > > It achieves the same goals as the several series from Michael and the S1DSS > series from Nicolin that were trying to improve portions of the API. 
>
> This is on github:
> https://github.com/jgunthorpe/linux/commits/smmuv3_newapi

Testing on qemu[1], with the same VMM Shameer tested with[2]:

qemu/build/qemu-system-aarch64 -M virt -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
-cpu cortex-a53,pmu=off -smp 1 -m 2048 \
-kernel Image \
-drive file=rootfs.ext4,if=virtio,format=raw \
-object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic \
-append 'console=ttyAMA0 rootwait root=/dev/vda' \
-device virtio-scsi-pci,id=scsi0 \
-device ioh3420,id=pcie.1,chassis=1 \
-object iommufd,id=iommufd0 \
-device vfio-pci,host=0000:00:03.0,iommufd=iommufd0

I see the following panic:

[ 155.141233] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 155.142416] Mem abort info:
[ 155.142722]   ESR = 0x0000000086000004
[ 155.143106]   EC = 0x21: IABT (current EL), IL = 32 bits
[ 155.143827]   SET = 0, FnV = 0
[ 155.144266]   EA = 0, S1PTW = 0
[ 155.144721]   FSC = 0x04: level 0 translation fault
[ 155.145432] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101059000
[ 155.146234] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 155.148162] Internal error: Oops: 0000000086000004 [#1] PREEMPT SMP
[ 155.149399] Modules linked in:
[ 155.150366] CPU: 2 PID: 371 Comm: qemu-system-aar Not tainted 6.8.0-rc7-gde77230ac23a #9
[ 155.151728] Hardware name: linux,dummy-virt (DT)
[ 155.152770] pstate: 81400809 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=-c)
[ 155.153895] pc : 0x0
[ 155.154889] lr : iommufd_hwpt_invalidate+0xa4/0x204
[ 155.156272] sp : ffff800080f3bcc0
[ 155.156971] x29: ffff800080f3bcf0 x28: ffff0000c369b300 x27: 0000000000000000
[ 155.158135] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 155.159175] x23: 0000000000000000 x22: 00000000c1e334a0 x21: ffff0000c1e334a0
[ 155.160343] x20: ffff800080f3bd38 x19: ffff800080f3bd58 x18: 0000000000000000
[ 155.161298] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff8240d6d8
[ 155.162355] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 155.163463] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 155.164947] x8 : 0000001000000002 x7 : 0000fffeac1ec950 x6 : 0000000000000000
[ 155.166057] x5 : ffff800080f3bd78 x4 : 0000000000000003 x3 : 0000000000000002
[ 155.167343] x2 : 0000000000000000 x1 : ffff800080f3bcc8 x0 : ffff0000c6034d80
[ 155.168851] Call trace:
[ 155.169738]  0x0
[ 155.170623]  iommufd_fops_ioctl+0x154/0x274
[ 155.171555]  __arm64_sys_ioctl+0xac/0xf0
[ 155.172095]  invoke_syscall+0x48/0x110
[ 155.172633]  el0_svc_common.constprop.0+0x40/0xe0
[ 155.173277]  do_el0_svc+0x1c/0x28
[ 155.173847]  el0_svc+0x34/0xb4
[ 155.174312]  el0t_64_sync_handler+0x120/0x12c
[ 155.174969]  el0t_64_sync+0x190/0x194
[ 155.176006] Code: ???????? ???????? ???????? ???????? (????????)
[ 155.178349] ---[ end trace 0000000000000000 ]---

The core IOMMUFD code calls domain->ops->cache_invalidate_user
unconditionally from IOCTL:IOMMU_HWPT_INVALIDATE, and the SMMUv3 driver
doesn't implement it. That seems to be missing, as otherwise the VMM can't
invalidate S1 mappings, or am I missing something?

[1] https://lore.kernel.org/all/20240325101442.1306300-1-smostafa@google.com/
[2] https://github.com/nicolinc/qemu/commits/wip/iommufd_vsmmu-02292024/

>
> v5:
> - Rebase on v6.8-rc7 & Will's tree
> - Accommodate the SVA rc patch removing the master list iteration
> - Move the kfree(to_smmu_domain(domain)) hunk to the right patch
> - Move S1DSS get_used hunk to "Allow IDENTITY/BLOCKED to be set while
>   PASID is used"
> v4: https://lore.kernel.org/r/0-v4-e7091cdd9e8d+43b1-smmuv3_newapi_p2_jgg@nvidia.com
> - Rebase on v6.8-rc1, adjust to use mm_get_enqcmd_pasid() and eventually
>   remove all references from ARM. Move the new ARM_SMMU_FEAT_STALL_FORCE
>   stuff to arm_smmu_make_sva_cd()
> - Adjust to use the new shared STE/CD writer logic.
Disable some of the > sanity checks for the interior of the series > - Return ERR_PTR from domain_alloc functions > - Move the ATS disablement flow into arm_smmu_attach_prepare()/commit() > which lets all the STE update flows use the same sequence. This is > needed for nesting in part 3 > - Put ssid in attach_state > - Replace to_smmu_domain_safe() with to_smmu_domain_devices() > v3: https://lore.kernel.org/r/0-v3-9083a9368a5c+23fb-smmuv3_newapi_p2_jgg@nvidia.com > - Rebase on the latest part 1 > - update comments and commit messages > - Fix error exit in arm_smmu_set_pasid() > - Fix inverted logic for btm_invalidation > - Add missing ATC invalidation on mm release > - Add a big comment explaining that BTM is not enabled and what is > missing to enable it. > v2: https://lore.kernel.org/r/0-v2-16665a652079+5947-smmuv3_newapi_p2_jgg@nvidia.com > - Rebased on iommufd + Joerg's tree > - Use sid_smmu_domain consistently to refer to the domain attached to the > device (eg the PCIe RID) > - Rework how arm_smmu_attach_*() and callers flow to be more careful > about ordering around ATC invalidation. The ATC must be invalidated > after it is impossible to establish stale entries. > - ATS disable is now entirely part of arm_smmu_attach_dev_ste(), which is > the only STE type that ever disables ATS. > - Remove the 'existing_master_domain' optimization, the code is > functionally fine without it.
> - Whitespace, spelling, and checkpatch related items > - Fixed wrong value stored in the xa for the BTM flows > - Use pasid more consistently instead of id > v1: https://lore.kernel.org/r/0-v1-afbb86647bbd+5-smmuv3_newapi_p2_jgg@nvidia.com > > Jason Gunthorpe (27): > iommu/arm-smmu-v3: Do not allow a SVA domain to be set on the wrong > PASID > iommu/arm-smmu-v3: Do not ATC invalidate the entire domain > iommu/arm-smmu-v3: Add a type for the CD entry > iommu/arm-smmu-v3: Add an ops indirection to the STE code > iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() > iommu/arm-smmu-v3: Consolidate clearing a CD table entry > iommu/arm-smmu-v3: Move the CD generation for S1 domains into a > function > iommu/arm-smmu-v3: Move allocation of the cdtable into > arm_smmu_get_cd_ptr() > iommu/arm-smmu-v3: Allocate the CD table entry in advance > iommu/arm-smmu-v3: Move the CD generation for SVA into a function > iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd() > iommu/arm-smmu-v3: Start building a generic PASID layer > iommu/arm-smmu-v3: Make smmu_domain->devices into an allocated list > iommu/arm-smmu-v3: Make changing domains be hitless for ATS > iommu/arm-smmu-v3: Add ssid to struct arm_smmu_master_domain > iommu/arm-smmu-v3: Keep track of valid CD entries in the cd_table > iommu/arm-smmu-v3: Thread SSID through the arm_smmu_attach_*() > interface > iommu/arm-smmu-v3: Make SVA allocate a normal arm_smmu_domain > iommu/arm-smmu-v3: Keep track of arm_smmu_master_domain for SVA > iommu: Add ops->domain_alloc_sva() > iommu/arm-smmu-v3: Put the SVA mmu notifier in the smmu_domain > iommu/arm-smmu-v3: Consolidate freeing the ASID/VMID > iommu/arm-smmu-v3: Move the arm_smmu_asid_xa to per-smmu like vmid > iommu/arm-smmu-v3: Bring back SVA BTM support > iommu/arm-smmu-v3: Allow IDENTITY/BLOCKED to be set while PASID is > used > iommu/arm-smmu-v3: Allow a PASID to be set when RID is > IDENTITY/BLOCKED > iommu/arm-smmu-v3: Allow setting a S1 domain to a 
PASID > > .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 639 +++++----- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1036 +++++++++++------ > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 79 +- > drivers/iommu/iommu-sva.c | 4 +- > drivers/iommu/iommu.c | 12 +- > include/linux/iommu.h | 3 + > 6 files changed, 1024 insertions(+), 749 deletions(-) > > > base-commit: 98b23ebb0c84657a135957d727eedebd1280cbbf > -- > 2.43.2 >
> -----Original Message----- > From: Mostafa Saleh <smostafa@google.com> > Sent: Monday, March 25, 2024 10:22 AM > To: Jason Gunthorpe <jgg@nvidia.com> > Cc: iommu@lists.linux.dev; Joerg Roedel <joro@8bytes.org>; linux-arm- > kernel@lists.infradead.org; Robin Murphy <robin.murphy@arm.com>; Will > Deacon <will@kernel.org>; Eric Auger <eric.auger@redhat.com>; Jean- > Philippe Brucker <jean-philippe@linaro.org>; Moritz Fischer > <mdf@kernel.org>; Michael Shavit <mshavit@google.com>; Nicolin Chen > <nicolinc@nvidia.com>; patches@lists.linux.dev; Shameerali Kolothum Thodi > <shameerali.kolothum.thodi@huawei.com> > Subject: Re: [PATCH v5 00/27] Update SMMUv3 to the modern iommu API > (part 2/3) > > Hi Jason, > > On Mon, Mar 04, 2024 at 07:43:48PM -0400, Jason Gunthorpe wrote: > > Continuing the work of part 1 this focuses on the CD, PASID and SVA > > components: > > > > - attach_dev failure does not change the HW configuration. > > > > - Full PASID API support including: > > - S1/SVA domains attached to PASIDs > > - IDENTITY/BLOCKED/S1 attached to RID > > - Change of the RID domain while PASIDs are attached > > > > - Streamlined SVA support using the core infrastructure > > > > - Hitless, whenever possible, change between two domains > > > > Making the CD programming work like the new STE programming allows > > untangling some of the confusing SVA flows. From there the focus is on > > building out the core infrastructure for dealing with PASID and CD > > entries, then keeping track of unique SSID's for ATS invalidation. > > > > The ATS ordering is generalized so that the PASID flow can use it and put > > into a form where it is fully hitless, whenever possible. Care is taken to > > ensure that ATC flushes are present after any change in translation. > > > > Finally we simply kill the entire outdated SVA mmu_notifier > implementation > > in one shot and switch it over to the newly created generic PASID & CD > > code. 
This avoids the messy and confusing approach of trying to
> > incrementally untangle this in place. The new code is small and simple
> > enough this is much better than trying to figure out smaller steps.
> >
> > Once SVA is resting on the right CD code it is straightforward to make the
> > PASID interface functionally complete.
> >
> > It achieves the same goals as the several series from Michael and the S1DSS
> > series from Nicolin that were trying to improve portions of the API.
> >
> > This is on github:
> > https://github.com/jgunthorpe/linux/commits/smmuv3_newapi
>
> Testing on qemu[1], with the same VMM Shameer tested with[2]:
> qemu/build/qemu-system-aarch64 -M virt -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> -cpu cortex-a53,pmu=off -smp 1 -m 2048 \
> -kernel Image \
> -drive file=rootfs.ext4,if=virtio,format=raw \
> -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic \
> -append 'console=ttyAMA0 rootwait root=/dev/vda' \
> -device virtio-scsi-pci,id=scsi0 \
> -device ioh3420,id=pcie.1,chassis=1 \
> -object iommufd,id=iommufd0 \
> -device vfio-pci,host=0000:00:03.0,iommufd=iommufd0
>
> I see the following panic:

I think that is probably because you are testing with "nested-smmuv3". This
series does not yet fully enable that. For that, I think you are missing a few
patches from Nicolin's iommufd branch:
https://github.com/nicolinc/iommufd/commits/wip/iommufd_nesting-03112024/

Thanks,
Shameer
Hi Shameer, On Mon, Mar 25, 2024 at 10:44 AM Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> wrote: > > > > > -----Original Message----- > > From: Mostafa Saleh <smostafa@google.com> > > Sent: Monday, March 25, 2024 10:22 AM > > To: Jason Gunthorpe <jgg@nvidia.com> > > Cc: iommu@lists.linux.dev; Joerg Roedel <joro@8bytes.org>; linux-arm- > > kernel@lists.infradead.org; Robin Murphy <robin.murphy@arm.com>; Will > > Deacon <will@kernel.org>; Eric Auger <eric.auger@redhat.com>; Jean- > > Philippe Brucker <jean-philippe@linaro.org>; Moritz Fischer > > <mdf@kernel.org>; Michael Shavit <mshavit@google.com>; Nicolin Chen > > <nicolinc@nvidia.com>; patches@lists.linux.dev; Shameerali Kolothum Thodi > > <shameerali.kolothum.thodi@huawei.com> > > Subject: Re: [PATCH v5 00/27] Update SMMUv3 to the modern iommu API > > (part 2/3) > > > > Hi Jason, > > > > On Mon, Mar 04, 2024 at 07:43:48PM -0400, Jason Gunthorpe wrote: > > > Continuing the work of part 1 this focuses on the CD, PASID and SVA > > > components: > > > > > > - attach_dev failure does not change the HW configuration. > > > > > > - Full PASID API support including: > > > - S1/SVA domains attached to PASIDs > > > - IDENTITY/BLOCKED/S1 attached to RID > > > - Change of the RID domain while PASIDs are attached > > > > > > - Streamlined SVA support using the core infrastructure > > > > > > - Hitless, whenever possible, change between two domains > > > > > > Making the CD programming work like the new STE programming allows > > > untangling some of the confusing SVA flows. From there the focus is on > > > building out the core infrastructure for dealing with PASID and CD > > > entries, then keeping track of unique SSID's for ATS invalidation. > > > > > > The ATS ordering is generalized so that the PASID flow can use it and put > > > into a form where it is fully hitless, whenever possible. Care is taken to > > > ensure that ATC flushes are present after any change in translation. 
> > >
> > > Finally we simply kill the entire outdated SVA mmu_notifier implementation
> > > in one shot and switch it over to the newly created generic PASID & CD
> > > code. This avoids the messy and confusing approach of trying to
> > > incrementally untangle this in place. The new code is small and simple
> > > enough this is much better than trying to figure out smaller steps.
> > >
> > > Once SVA is resting on the right CD code it is straightforward to make the
> > > PASID interface functionally complete.
> > >
> > > It achieves the same goals as the several series from Michael and the S1DSS
> > > series from Nicolin that were trying to improve portions of the API.
> > >
> > > This is on github:
> > > https://github.com/jgunthorpe/linux/commits/smmuv3_newapi
> >
> > Testing on qemu[1], with the same VMM Shameer tested with[2]:
> > qemu/build/qemu-system-aarch64 -M virt -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
> > -cpu cortex-a53,pmu=off -smp 1 -m 2048 \
> > -kernel Image \
> > -drive file=rootfs.ext4,if=virtio,format=raw \
> > -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic \
> > -append 'console=ttyAMA0 rootwait root=/dev/vda' \
> > -device virtio-scsi-pci,id=scsi0 \
> > -device ioh3420,id=pcie.1,chassis=1 \
> > -object iommufd,id=iommufd0 \
> > -device vfio-pci,host=0000:00:03.0,iommufd=iommufd0
> >
> > I see the following panic:
>
> I think that is probably because you are testing with "nested-smmuv3". This
> series does not yet fully enable that. For that, I think you are missing a few
> patches from Nicolin's iommufd branch:
> https://github.com/nicolinc/iommufd/commits/wip/iommufd_nesting-03112024/

I see, thanks for clarifying. I think we still shouldn't crash the
kernel, but that's a problem for part 3.

Thanks,
Mostafa
On Sat, Mar 23, 2024 at 01:38:04PM +0000, Mostafa Saleh wrote:
> Hi Jason,
>
> On Mon, Mar 04, 2024 at 07:43:48PM -0400, Jason Gunthorpe wrote:
> > Continuing the work of part 1 this focuses on the CD, PASID and SVA
> > components:
> >
> > - attach_dev failure does not change the HW configuration.
> >
> > - Full PASID API support including:
> >   - S1/SVA domains attached to PASIDs
>
> I am still going through the series, but I see at the end the main SMMUv3
> driver has a set_dev_pasid operation. Are there any in-tree drivers that
> use that? (And how can I test it?)

Not yet, but some will be coming. Currently only the Intel driver supports
it, but Intel HW has other problems making it unusable.

A big part of the effort here is to enable the platform ecosystem so
devices and drivers can use it. Moritz has access to a device that
can exercise this, though we are still working on it.

> >   - IDENTITY/BLOCKED/S1 attached to RID
> >   - Change of the RID domain while PASIDs are attached
> >
> > - Streamlined SVA support using the core infrastructure
> >
> > - Hitless, whenever possible, change between two domains
>
> Can you please clarify what cases are expected to be hitless?
> From what I see, if ASID and TTB0 change, that would break the CD.

Right. For CD it is only the SVA mm release flow, setting EPD0.

Jason
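
The hitless question in this exchange can be made concrete with a small userspace model. This is only a sketch under stated assumptions, not the driver's code: the `model_*` names, the 8-qword entry layout and the bit positions are chosen for illustration, and the rule "at most one 64-bit word changes" is a simplification of the shared entry writer, which as far as I understand also considers which bits the hardware actually uses. Under that model, setting EPD0 touches a single qword, while changing both ASID and TTB0 touches two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CD_QWORDS 8

/* Bit positions assumed for this model only. */
#define MODEL_CD_V       (1ULL << 31)
#define MODEL_CD_EPD0    (1ULL << 14)
#define MODEL_CD_ASID(x) ((uint64_t)(x) << 48)

struct model_cd {
	uint64_t data[MODEL_CD_QWORDS];
};

/* Count the qwords that differ between the installed and target entries. */
static size_t model_qwords_changed(const struct model_cd *cur,
				   const struct model_cd *target)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_CD_QWORDS; i++)
		if (cur->data[i] != target->data[i])
			n++;
	return n;
}

/*
 * Simplified rule: with single-copy-atomic 64-bit stores, an update that
 * touches at most one qword never exposes a half-written entry.
 */
static bool model_update_is_hitless(const struct model_cd *cur,
				    const struct model_cd *target)
{
	return model_qwords_changed(cur, target) <= 1;
}

/* Build a stage 1 style entry: ASID lives in qword 0, TTB0 in qword 1. */
static struct model_cd model_make_s1_cd(uint16_t asid, uint64_t ttb0)
{
	struct model_cd cd = { { 0 } };

	cd.data[0] = MODEL_CD_V | MODEL_CD_ASID(asid);
	cd.data[1] = ttb0;
	return cd;
}

/* The mm release case from the thread: only EPD0 in qword 0 is set. */
static struct model_cd model_set_epd0(struct model_cd cd)
{
	cd.data[0] |= MODEL_CD_EPD0;
	return cd;
}
```

Comparing `model_make_s1_cd(1, 0x1000)` against its `model_set_epd0()` variant reports a hitless update, while comparing it against `model_make_s1_cd(2, 0x2000)` reports two changed qwords, which is the breaking case raised above.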
On Mon, Mar 25, 2024 at 11:22:19AM +0000, Mostafa Saleh wrote:
> > I think that is probably because you are testing with "nested-smmuv3". This
> > series does not yet fully enable that. For that, I think you are missing a few
> > patches from Nicolin's iommufd branch:
> > https://github.com/nicolinc/iommufd/commits/wip/iommufd_nesting-03112024/
>
> I see, thanks for clarifying. I think we still shouldn't crash the
> kernel, but that's a problem for part 3.

Yeah, definitely. Part 3 needs to include the invalidation bits too; I
haven't integrated them from Nicolin.

I'll send a patch like this for iommufd to stop the oops:

@@ -236,7 +236,8 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
 	}
 	hwpt->domain->owner = ops;
 
-	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) {
+	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
+			 !hwpt->domain->ops->cache_invalidate_user)) {
 		rc = -EINVAL;
 		goto out_abort;
 	}

Jason
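
The idea behind that hunk, validating an optional callback once at allocation time so the later call site can stay unconditional, can be sketched in plain userspace C. Everything here is a hypothetical stand-in (`model_domain`, `model_domain_ops`, `model_validate_nested()` and friends are not kernel structures or functions); it only illustrates the pattern, not the iommufd implementation.

```c
#include <errno.h>
#include <stddef.h>

struct model_domain;

/* Hypothetical ops table with one optional callback. */
struct model_domain_ops {
	int (*cache_invalidate_user)(struct model_domain *domain);
};

struct model_domain {
	int is_nested;
	const struct model_domain_ops *ops;
};

/*
 * Allocation-time check: reject a nested domain whose driver does not
 * provide the invalidation callback, instead of crashing later.
 */
static int model_validate_nested(const struct model_domain *domain)
{
	if (!domain->is_nested || !domain->ops->cache_invalidate_user)
		return -EINVAL;
	return 0;
}

/* The call site may stay unconditional once validation has succeeded. */
static int model_invalidate(struct model_domain *domain)
{
	return domain->ops->cache_invalidate_user(domain);
}

/* Trivial callback used to exercise the happy path. */
static int model_noop_invalidate(struct model_domain *domain)
{
	(void)domain;
	return 0;
}
```

A domain whose ops table leaves `cache_invalidate_user` as NULL fails `model_validate_nested()` with `-EINVAL`, so `model_invalidate()` is never reached with a NULL function pointer, which is the jump to `pc : 0x0` seen in the oops.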
On Mon, Mar 25, 2024 at 11:35:03AM -0300, Jason Gunthorpe wrote:
> On Sat, Mar 23, 2024 at 01:38:04PM +0000, Mostafa Saleh wrote:
> > Hi Jason,
> >
> > On Mon, Mar 04, 2024 at 07:43:48PM -0400, Jason Gunthorpe wrote:
> > > Continuing the work of part 1 this focuses on the CD, PASID and SVA
> > > components:
> > >
> > > - attach_dev failure does not change the HW configuration.
> > >
> > > - Full PASID API support including:
> > >   - S1/SVA domains attached to PASIDs
> >
> > I am still going through the series, but I see at the end the main SMMUv3
> > driver has a set_dev_pasid operation. Are there any in-tree drivers that
> > use that? (And how can I test it?)
>
> Not yet, but some will be coming. Currently only the Intel driver supports
> it, but Intel HW has other problems making it unusable.
>
> A big part of the effort here is to enable the platform ecosystem so
> devices and drivers can use it. Moritz has access to a device that
> can exercise this, though we are still working on it.
>

Just out of curiosity, are there plans to upstream that driver?

> > >   - IDENTITY/BLOCKED/S1 attached to RID
> > >   - Change of the RID domain while PASIDs are attached
> > >
> > > - Streamlined SVA support using the core infrastructure
> > >
> > > - Hitless, whenever possible, change between two domains
> >
> > Can you please clarify what cases are expected to be hitless?
> > From what I see, if ASID and TTB0 change, that would break the CD.
>
> Right. For CD it is only the SVA mm release flow, setting EPD0.
>

I see, thanks for confirming. I am still going through the series, but
I now wonder if this case is worth the extra complexity, unlike the STE,
where the hitless transition was useful in many cases.

Thanks,
Mostafa.
On Mon, Mar 25, 2024 at 09:06:23PM +0000, Mostafa Saleh wrote:
> On Mon, Mar 25, 2024 at 11:35:03AM -0300, Jason Gunthorpe wrote:
> > On Sat, Mar 23, 2024 at 01:38:04PM +0000, Mostafa Saleh wrote:
> > > Hi Jason,
> > >
> > > On Mon, Mar 04, 2024 at 07:43:48PM -0400, Jason Gunthorpe wrote:
> > > > Continuing the work of part 1 this focuses on the CD, PASID and SVA
> > > > components:
> > > >
> > > > - attach_dev failure does not change the HW configuration.
> > > >
> > > > - Full PASID API support including:
> > > >   - S1/SVA domains attached to PASIDs
> > >
> > > I am still going through the series, but I see at the end the main SMMUv3
> > > driver has a set_dev_pasid operation. Are there any in-tree drivers that
> > > use that? (And how can I test it?)
> >
> > Not yet, but some will be coming. Currently only the Intel driver supports
> > it, but Intel HW has other problems making it unusable.
> >
> > A big part of the effort here is to enable the platform ecosystem so
> > devices and drivers can use it. Moritz has access to a device that
> > can exercise this, though we are still working on it.
>
> Just out of curiosity, are there plans to upstream that driver?

I expect so, but until it passes out of the evaluation stage and into a
production stage it isn't guaranteed. The team working on it needs a
HW/SW ecosystem to test the device on, which is only now just barely
starting to exist.

> I see, thanks for confirming. I am still going through the series, but
> I now wonder if this case is worth the extra complexity, unlike the STE,
> where the hitless transition was useful in many cases.

Well, it is worth it to convert everything into 'make' functions for
sure. At that point it is just re-using the complexity that already
exists. Implementing special programming logic just for the CD, with the
V/0=1 and EPD0 special cases open coded, would be more code than adding
ops.

Jason