Message ID | 20240208151837.35068-1-shameerali.kolothum.thodi@huawei.com (mailing list archive) |
---|---|
Headers | show |
Series | iommu/arm-smmu-v3: Use pinned KVM VMID for stage 2 | expand |
On Thu, Feb 08, 2024 at 03:18:30PM +0000, Shameer Kolothum wrote: > Hi, > > On an ARM64 system with a SMMUv3 implementation that fully supports > Broadcast TLB Maintenance(BTM) feature as part of the Distributed > Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are > received by SMMUv3. This is very useful when the SMMUv3 shares the > page tables with the CPU(eg: Guest SVA use case). For this to work, > the SMMUv3 must use the same VMID that is allocated by KVM to configure > the nested stage 2(S2) translations. Ah so you are going all the way and looking to enable BTM within a vSVA environment too? > An earlier proposal sent out[1] a while back resulted in changing the > ARM64/KVM VMID allocator similar to the ASID allocator to make it > better suited for this. > > This RFC adds, > -Support for pinned KVM VMID. > -Support associating KVM pointer and iommufd ctx. > -Changes to domain_alloc_user() to receive a kvm pointer. > -Configure SMMUV3 S2 using KVM VMID > -Finally enable BTM only if SMMUV3 supports S1 translation. This > is to make sure that PAGING domains always use S1 and S2 is only > used for nested domains with a valid KVM. The idea is to make sure > when BTM is enabled in Guest, we use KVM VMID for S2. > > Not sure I miss any explicit TLB invalidations with any use case > that may configure a S2 with a private VMID that matches a KVM > one. > > This is based on Jason's ongoing SMMUv3 refactor series[2]. I'm glad to see this, thank you for finishing the BTM stuff! If someone is using it I wonder if we need to get a more solid answer on the races with invalidation an ASID reassign.. I added some notes in comments in part 2 after I audited all of it. Jason
> -----Original Message----- > From: Jason Gunthorpe <jgg@ziepe.ca> > Sent: Thursday, February 8, 2024 3:36 PM > To: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> > Cc: kvmarm@lists.linux.dev; iommu@lists.linux.dev; linux-arm- > kernel@lists.infradead.org; linuxarm@huawei.com; kevin.tian@intel.com; > alex.williamson@redhat.com; maz@kernel.org; oliver.upton@linux.dev; > will@kernel.org; robin.murphy@arm.com; jean-philippe@linaro.org; > jonathan.cameron@huawei.com > Subject: Re: [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM VMID > for stage 2 > > On Thu, Feb 08, 2024 at 03:18:30PM +0000, Shameer Kolothum wrote: > > Hi, > > > > On an ARM64 system with a SMMUv3 implementation that fully supports > > Broadcast TLB Maintenance(BTM) feature as part of the Distributed > > Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are > > received by SMMUv3. This is very useful when the SMMUv3 shares the > > page tables with the CPU(eg: Guest SVA use case). For this to work, > > the SMMUv3 must use the same VMID that is allocated by KVM to configure > > the nested stage 2(S2) translations. > > Ah so you are going all the way and looking to enable BTM within a > vSVA environment too? Yes, eager to get the vSVA support
On Thu, Feb 08, 2024 at 03:49:20PM +0000, Shameerali Kolothum Thodi wrote: > > On Thu, Feb 08, 2024 at 03:18:30PM +0000, Shameer Kolothum wrote: > > > Hi, > > > > > > On an ARM64 system with a SMMUv3 implementation that fully supports > > > Broadcast TLB Maintenance(BTM) feature as part of the Distributed > > > Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are > > > received by SMMUv3. This is very useful when the SMMUv3 shares the > > > page tables with the CPU(eg: Guest SVA use case). For this to work, > > > the SMMUv3 must use the same VMID that is allocated by KVM to configure > > > the nested stage 2(S2) translations. > > > > Ah so you are going all the way and looking to enable BTM within a > > vSVA environment too? > > Yes, eager to get the vSVA support
Hi Shameer, On Thu, Feb 08, 2024 at 03:18:30PM +0000, Shameer Kolothum wrote: > Hi, > > On an ARM64 system with a SMMUv3 implementation that fully supports > Broadcast TLB Maintenance(BTM) feature as part of the Distributed > Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are > received by SMMUv3. This is very useful when the SMMUv3 shares the > page tables with the CPU(eg: Guest SVA use case). For this to work, > the SMMUv3 must use the same VMID that is allocated by KVM to configure > the nested stage 2(S2) translations. The series makes sense to me. Maybe a little more detail to help the KVM maintainers understand why we need something like this even though the s2 page tables aren't shared between CPU and SMMU: * When enabling BTM in the SMMU, all TLB invalidations to the inner-shareable domain issued by the CPU are taken into account by the SMMU. That includes for example the TLBI IPAS2E1IS from __kvm_tlb_flush_vmid_range(). * BTM is enabled globally in the SMMU CR2 register. If we enable BTM for host SVA, then it also affects KVM. * Stage-1 TLB entries in the SMMU have a bit (ASET) saying "this entry is private and does not participate in BTM", which we set for private SMMU address spaces. Annoyingly, the stage-2 TLB entries do not have it. With BTM all VMIDs are shared between CPU and SMMU. * So, if the SMMU driver allocates VMID privately and we enable BTM, then CPU invalidations will remove unrelated SMMU TLB entries. Instead, the SMMU driver needs to coordinate with KVM on VMID allocation. * Private stage-2 address spaces in the SMMU would need to allocate VMIDs that aren't used by KVM, but that's not a use-case at the moment: - For assigning devices to a host process or to a VM, we use private stage-1 mappings. stage-2 will be used to enable nesting translation, and will typically mirror the KVM stage-2 since it pins the guest address space. - If the SMMU doesn't support stage-1, the driver falls back to stage-2 for private address spaces. For such an implementation we disable BTM. - The old VFIO_TYPE1_NESTING_IOMMU lets userspace allocate a private stage-2, and has only been used for testing as far as I know. I don't think I ever found a program that used it in the wild, but haven't checked recently. The effects of using it with BTM enabled is performance degradation: TLB entries of that VFIO container get invalidated by unrelated KVM activity, and maybe that can be used in a side-channel attack. It needs to be deprecated over a few releases (starting with a warning maybe?), and the replacement API shouldn't allow creating a stage-2 without a KVM context. Thanks, Jean > > An earlier proposal sent out[1] a while back resulted in changing the > ARM64/KVM VMID allocator similar to the ASID allocator to make it > better suited for this. > > This RFC adds, > -Support for pinned KVM VMID. > -Support associating KVM pointer and iommufd ctx. > -Changes to domain_alloc_user() to receive a kvm pointer. > -Configure SMMUV3 S2 using KVM VMID > -Finally enable BTM only if SMMUV3 supports S1 translation. This > is to make sure that PAGING domains always use S1 and S2 is only > used for nested domains with a valid KVM. The idea is to make sure > when BTM is enabled in Guest, we use KVM VMID for S2. > > Not sure I miss any explicit TLB invalidations with any use case > that may configure a S2 with a private VMID that matches a KVM > one. > > This is based on Jason's ongoing SMMUv3 refactor series[2]. > > Please take a look and let me know. > > Thanks, > Shameer > > 1. https://lore.kernel.org/linux-arm-kernel/20210222155338.26132-1-shameerali.kolothum.thodi@huawei.com/ > 2. https://lore.kernel.org/linux-arm-kernel/0-v5-cd1be8dd9c71+3fa-smmuv3_newapi_p1_jgg@nvidia.com/ > > Jean-Philippe Brucker (1): > iommu/arm-smmu-v3: Enable broadcast TLB maintenance > > Shameer Kolothum (6): > KVM: Add generic infrastructure to support pinned VMIDs > KVM: arm64: Introduce support to pin VMIDs > KVM: arm64: Add interfaces for pinned VMID support > iommufd: Associate kvm pointer to iommufd ctx > iommu: Pass in kvm pointer to domain_alloc_user > iommu/arm-smmu-v3: Use KVM VMID for s2 stage > > arch/arm64/include/asm/kvm_host.h | 3 + > arch/arm64/kvm/Kconfig | 1 + > arch/arm64/kvm/arm.c | 14 ++++ > arch/arm64/kvm/vmid.c | 84 ++++++++++++++++++++- > drivers/iommu/amd/iommu.c | 1 + > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 42 +++++++++-- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 + > drivers/iommu/intel/iommu.c | 1 + > drivers/iommu/iommufd/hw_pagetable.c | 5 +- > drivers/iommu/iommufd/iommufd_private.h | 3 + > drivers/iommu/iommufd/main.c | 14 ++++ > drivers/iommu/iommufd/selftest.c | 1 + > drivers/vfio/device_cdev.c | 3 + > include/linux/iommu.h | 3 +- > include/linux/iommufd.h | 7 ++ > include/linux/kvm_host.h | 18 +++++ > virt/kvm/Kconfig | 3 + > virt/kvm/kvm_main.c | 23 ++++++ > 18 files changed, 218 insertions(+), 11 deletions(-) > > -- > 2.34.1 >
On Fri, Feb 09, 2024 at 11:58:24AM +0000, Jean-Philippe Brucker wrote: > * Stage-1 TLB entries in the SMMU have a bit (ASET) saying "this entry > is private and does not participate in BTM", which we set for private > SMMU address spaces. > > Annoyingly, the stage-2 TLB entries do not have it. With BTM all VMIDs > are shared between CPU and SMMU. Right, the spec justified this decision like this: Note: Arm expects that SMMU stage 2 address spaces are generally shared with their respective PE virtual machine stage 2 configuration. If broadcast invalidation is required to be avoided for a particular SMMU stage 2 address space, Arm recommends that a hypervisor configures the STE with a VMID that is not allocated for virtual machine use on the PEs. Which doesn't match how Linux works and I think after the recent KVM PUCK call on this topic we can say it does not match how Linux will work going into the future. Assuming the KVM S2 and IOMMU S2 are shared is not true. So unfortuntely this creates a waste as the BTM will generate worthless invalidation workload on the related IOMMU S2. We cannot do as the spec suggests to avoid broadcast invalidation here with a unique VMID as that will break vSVA vBTM invalidation to the S1. I do wonder how much of a performance negative this will create. At least the S1 isn't flushed so perhaps the performace hit is small. Anyhow, I view it as a defect that the HW doesn't have a BTM ignore bit at the S2 level so that we can use the same VMID to make vBTM work but not participate in CPU originated invalidations for a non-shared S2. A global bit to disable S2 BTM would have been fine for Linux. > - The old VFIO_TYPE1_NESTING_IOMMU lets userspace allocate a private > stage-2, and has only been used for testing as far as I know. I don't > think I ever found a program that used it in the wild, but haven't > checked recently. There isn't, it is useless and cannot do anything. A patch has been waiting to remove it for a while now, I've got it in my part 3 right now. > It needs to be deprecated over a few releases (starting with a > warning maybe?), No need, we just NOP'd it. It has no user visible side effect, replacing the S2 with a S1 is fine. > and the replacement API shouldn't allow creating a > stage-2 without a KVM context. See my remarks yesterday. Jason
> -----Original Message----- > From: Jean-Philippe Brucker <jean-philippe@linaro.org> > Sent: Friday, February 9, 2024 11:58 AM > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> > Cc: kvmarm@lists.linux.dev; iommu@lists.linux.dev; linux-arm- > kernel@lists.infradead.org; Linuxarm <linuxarm@huawei.com>; > kevin.tian@intel.com; jgg@ziepe.ca; alex.williamson@redhat.com; > maz@kernel.org; oliver.upton@linux.dev; will@kernel.org; > robin.murphy@arm.com; Jonathan Cameron > <jonathan.cameron@huawei.com> > Subject: Re: [RFC PATCH v2 0/7] iommu/arm-smmu-v3: Use pinned KVM > VMID for stage 2 > > Hi Shameer, > > On Thu, Feb 08, 2024 at 03:18:30PM +0000, Shameer Kolothum wrote: > > Hi, > > > > On an ARM64 system with a SMMUv3 implementation that fully supports > > Broadcast TLB Maintenance(BTM) feature as part of the Distributed > > Virtual Memory(DVM) protocol, the CPU TLB invalidate instructions are > > received by SMMUv3. This is very useful when the SMMUv3 shares the > > page tables with the CPU(eg: Guest SVA use case). For this to work, > > the SMMUv3 must use the same VMID that is allocated by KVM to > configure > > the nested stage 2(S2) translations. > > The series makes sense to me. Maybe a little more detail to help the KVM > maintainers understand why we need something like this even though the s2 > page tables aren't shared between CPU and SMMU: Thanks Jean for the nice write up. I will use this for next revisions of this series. Shameer