Message ID | 20241108125242.60136-1-shameerali.kolothum.thodi@huawei.com (mailing list archive) |
---|---|
Series | hw/arm/virt: Add support for user-creatable nested SMMUv3 |
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> Few ToDos to note,
> 1. At present default-bus-bypass-iommu=on should be set when
>    arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
>    related boot error. Requires fixing.
> 2. Hot adding a device is not working at the moment. Looks like pcihp irq issue.
>    Could be a bug in IORT id mappings.

Do we have enough bus number space for each pxb bus in IORT?

The bus range is defined by min_/max_bus in iort_host_bridges(),
where the pci_bus_range() function call might not leave enough
space in the range for hotplugs IIRC.

> ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
> -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> -object iommufd,id=iommufd0 \
> -bios QEMU_EFI.fd \
> -kernel Image \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> -net none \
> -nographic
..
> With a pci topology like below,
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
>  |           +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
>  |           +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
>  |           \-03.0 Virtio: Virtio filesystem
>  +-[0000:08]---00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>  \-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> [root@localhost ~]#
>
> And if you want to add another HNS VF, it should be added to the same SMMUv3
> as of the first HNS dev,
>
> -device pcie-root-port,id=pcie.port3,bus=pcie.1,chassis=3 \
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0 \
..
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.

Nathan from NVIDIA side is working on the libvirt. And he already
did some prototype coding in libvirt that could generate required
PCI topology. I think he can take these patches for a combined test.

Thanks
Nicolin

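The bus-number question raised above comes down to simple arithmetic. What follows is a minimal Python sketch, not QEMU code. It assumes that each pxb-pcie could grow up to the bus number just below the next expander's bus_nr, and that a requester ID is (bus << 8) | (device << 3) | function as defined by PCI. The helper names (bus_windows, rid_range, spare_buses) are hypothetical and exist only for illustration.

```python
"""Sketch: bus-number headroom and requester-ID ranges for pxb-pcie buses.

Assumptions (not taken from QEMU source): each expander bus may grow up to
the bus number just below the next expander's bus_nr, and a requester ID is
(bus << 8) | (device << 3) | function, as defined by PCI.
"""

MAX_BUS = 255  # highest bus number in a PCI segment


def bus_windows(bus_nrs):
    """Map each pxb bus_nr to the (first, last) bus numbers it could use."""
    ordered = sorted(bus_nrs)
    windows = {}
    for i, first in enumerate(ordered):
        last = ordered[i + 1] - 1 if i + 1 < len(ordered) else MAX_BUS
        windows[first] = (first, last)
    return windows


def rid_range(min_bus, max_bus):
    """Return (input_base, id_count) covering every RID on min_bus..max_bus."""
    return min_bus << 8, (max_bus - min_bus + 1) << 8


def spare_buses(window, used_max_bus):
    """Bus numbers still free above used_max_bus inside a window."""
    return window[1] - used_max_bus


if __name__ == "__main__":
    # bus_nr values taken from the command line quoted in this thread
    windows = bus_windows([8, 16])
    for first, window in windows.items():
        base, count = rid_range(*window)
        print(f"pxb bus_nr={first}: buses {window}, RIDs {base:#06x}+{count:#x}")
```

If an ID mapping were built only from the buses in use at boot (8 and 9 for the first expander in the example), any bus number assigned later would fall outside it. Whether that is what actually happens in this series is the open question in the thread, so treat the numbers as a way to reason about it rather than a diagnosis.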
Hi Shameer,

On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> Hi,
>
> This series adds initial support for a user-creatable "arm-smmuv3-nested"
> device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> and cannot support multiple SMMUv3s.
>

I had a quick look at the SMMUv3 files, as now SMMUv3 supports nested
translation emulation, would it make sense to rename this? As AFAIU,
this is about virt (stage-1) SMMUv3 that is emulated to a guest.
Including vSMMU or virt would help distinguish the code, as now
some new function as smmu_nested_realize() looks confusing.

Thanks,
Mostafa

> In order to support vfio-pci dev assignment with vSMMUv3, the physical
> SMMUv3 has to be configured in nested mode. Having a pluggable
> "arm-smmuv3-nested" device enables us to have multiple vSMMUv3 for Guests
> running on a host with multiple physical SMMUv3s. A few benefits of doing
> this are,
>
> 1. Avoid invalidation broadcast or lookup in case devices are behind
>    multiple phys SMMUv3s.
> 2. Makes it easy to handle phys SMMUv3s that differ in features.
> 3. Easy to handle future requirements such as vCMDQ support.
>
> This is based on discussions/suggestions received for a previous RFC by
> Nicolin here[0].
>
> This series includes,
>  -Adds support for "arm-smmuv3-nested" device. At present only virt is
>   supported and is using _plug_cb() callback to hook the sysbus mem
>   and irq (Not sure this has any negative repercussions). Patch #3.
>  -Provides a way to associate a pci-bus(pxb-pcie) to the above device.
>   Patch #3.
>  -The last patch is adding RMR support for MSI doorbell handling. Patch #5.
>   This may change in future[1].
>
> This RFC is for initial discussion/test purposes only and includes patches
> that are only relevant for adding the "arm-smmuv3-nested" support. For the
> complete branch please find,
> https://github.com/hisilicon/qemu/tree/private-smmuv3-nested-dev-rfc-v1
>
> Few ToDos to note,
> 1. At present default-bus-bypass-iommu=on should be set when
>    arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
>    related boot error. Requires fixing.
> 2. Hot adding a device is not working at the moment. Looks like pcihp irq issue.
>    Could be a bug in IORT id mappings.
> 3. The above branch doesn't support vSVA yet.
>
> Hopefully this is helpful in taking the discussion forward. Please take a
> look and let me know.
>
> How to use it(Eg:):
>
> On a HiSilicon platform that has multiple physical SMMUv3s, the ACC ZIP VF
> devices and HNS VF devices are behind different SMMUv3s. So for a Guest,
> specify two smmuv3-nested devices each behind a pxb-pcie as below,
>
> ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
> -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> -object iommufd,id=iommufd0 \
> -bios QEMU_EFI.fd \
> -kernel Image \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> -net none \
> -nographic
>
> Guest will boot with two SMMUv3s,
> [ 1.608130] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> [ 1.609655] arm-smmu-v3 arm-smmu-v3.0.auto: ias 48-bit, oas 48-bit (features 0x00020b25)
> [ 1.612475] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> [ 1.614444] arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> [ 1.617451] arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> [ 1.618842] arm-smmu-v3 arm-smmu-v3.1.auto: ias 48-bit, oas 48-bit (features 0x00020b25)
> [ 1.621366] arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> [ 1.623225] arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
>
> With a pci topology like below,
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
>  |           +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
>  |           +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
>  |           \-03.0 Virtio: Virtio filesystem
>  +-[0000:08]---00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>  \-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> [root@localhost ~]#
>
> And if you want to add another HNS VF, it should be added to the same SMMUv3
> as of the first HNS dev,
>
> -device pcie-root-port,id=pcie.port3,bus=pcie.1,chassis=3 \
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0 \
>
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
>  |           +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
>  |           +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
>  |           \-03.0 Virtio: Virtio filesystem
>  +-[0000:08]-+-00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>  |           \-01.0-[0a]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>  \-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> [root@localhost ~]#
>
> Attempt to add the HNS VF to a different SMMUv3 will result in,
>
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: Unable to attach viommu
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0: vfio 0000:7d:02.2:
> Failed to set iommu_device: [iommufd=29] error attach 0000:7d:02.2 (38) to id=11: Invalid argument
>
> At present Qemu is not doing any extra validation other than the above
> failure to make sure the user configuration is correct or not. The
> assumption is libvirt will take care of this.
>
> Thanks,
> Shameer
> [0] https://lore.kernel.org/qemu-devel/cover.1719361174.git.nicolinc@nvidia.com/
> [1] https://lore.kernel.org/linux-iommu/ZrVN05VylFq8lK4q@Asurada-Nvidia/
>
> Eric Auger (1):
>   hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested
>     binding
>
> Nicolin Chen (2):
>   hw/arm/virt: Add an SMMU_IO_LEN macro
>   hw/arm/virt-acpi-build: Build IORT with multiple SMMU nodes
>
> Shameer Kolothum (2):
>   hw/arm/smmuv3: Add initial support for SMMUv3 Nested device
>   hw/arm/smmuv3: Associate a pci bus with a SMMUv3 Nested device
>
>  hw/arm/smmuv3.c          |  61 ++++++++++++++++++++++
>  hw/arm/virt-acpi-build.c | 109 ++++++++++++++++++++++++++++++++-------
>  hw/arm/virt.c            |  33 ++++++++++--
>  hw/core/sysbus-fdt.c     |   1 +
>  include/hw/arm/smmuv3.h  |  17 ++++++
>  include/hw/arm/virt.h    |  15 ++++++
>  6 files changed, 215 insertions(+), 21 deletions(-)
>
> --
> 2.34.1
>
>

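The cover letter leaves it to a management layer to make sure every device behind one "arm-smmuv3-nested" instance sits behind the same physical SMMUv3. A minimal Python sketch of such a pre-flight check could look like the following. It is an illustration only, not something QEMU or libvirt is known to implement: the function name check_placement, the physical SMMU labels "phys-smmu-A" and "phys-smmu-B", and the idea of passing the host mapping in as a dictionary are all assumptions.

```python
"""Sketch: check that every vSMMU fronts devices from one physical SMMUv3.

The host-device-to-physical-SMMU map is a hypothetical input; how it is
obtained on a real host is outside this sketch.
"""
from collections import defaultdict


def check_placement(assignments, phys_smmu_of):
    """Return a list of error strings; an empty list means the layout is ok.

    assignments:  iterable of (host_bdf, vsmmu_id) pairs
    phys_smmu_of: dict mapping host_bdf -> physical SMMU label
    """
    seen = defaultdict(set)  # vsmmu_id -> physical SMMUs found behind it
    for host_bdf, vsmmu_id in assignments:
        seen[vsmmu_id].add(phys_smmu_of[host_bdf])
    return [
        f"{vsmmu_id}: devices from multiple physical SMMUs {sorted(phys)}"
        for vsmmu_id, phys in sorted(seen.items())
        if len(phys) > 1
    ]


if __name__ == "__main__":
    # Host BDFs are the ones from the cover letter; the labels are made up.
    phys = {
        "0000:7d:02.1": "phys-smmu-A",
        "0000:7d:02.2": "phys-smmu-A",
        "0000:75:00.1": "phys-smmu-B",
    }
    layout = [
        ("0000:7d:02.1", "smmuv1"),
        ("0000:75:00.1", "smmuv2"),
        ("0000:7d:02.2", "smmuv2"),  # second HNS VF on the wrong vSMMU
    ]
    for problem in check_placement(layout, phys):
        print(problem)
```

Catching the mismatch before launch would give a clearer message than the "Unable to attach viommu" failure quoted above, which only appears once the vfio-pci device is realized.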
On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> This RFC is for initial discussion/test purposes only and includes patches
> that are only relevant for adding the "arm-smmuv3-nested" support. For the
> complete branch please find,
> https://github.com/hisilicon/qemu/commits/private-smmuv3-nested-dev-rfc-v1/

I guess the QEMU branch above pairs with this (vIOMMU v6)?
https://github.com/nicolinc/iommufd/commits/smmuv3_nesting-with-rmr

Thanks
Nicolin

> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, November 12, 2024 11:00 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; nathanc@nvidia.com
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> > Few ToDos to note,
> > 1. At present default-bus-bypass-iommu=on should be set when
> >    arm-smmuv3-nested dev is specified. Otherwise you may get an IORT
> >    related boot error. Requires fixing.
> > 2. Hot adding a device is not working at the moment. Looks like pcihp irq issue.
> >    Could be a bug in IORT id mappings.
>
> Do we have enough bus number space for each pxb bus in IORT?
>
> The bus range is defined by min_/max_bus in iort_host_bridges(),
> where the pci_bus_range() function call might not leave enough
> space in the range for hotplugs IIRC.

Ok. Thanks for the pointer. I will debug that.

> > ./qemu-system-aarch64 -machine virt,gic-version=3,default-bus-bypass-iommu=on \
> > -enable-kvm -cpu host -m 4G -smp cpus=8,maxcpus=8 \
> > -object iommufd,id=iommufd0 \
> > -bios QEMU_EFI.fd \
> > -kernel Image \
> > -device virtio-blk-device,drive=fs \
> > -drive if=none,file=rootfs.qcow2,id=fs \
> > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1 \
> > -device arm-smmuv3-nested,id=smmuv1,pci-bus=pcie.1 \
> > -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> > -device pxb-pcie,id=pcie.2,bus_nr=16,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port2,bus=pcie.2,chassis=2 \
> > -device arm-smmuv3-nested,id=smmuv2,pci-bus=pcie.2 \
> > -device vfio-pci,host=0000:75:00.1,bus=pcie.port2,iommufd=iommufd0 \
> > -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> > -device virtio-9p-pci,fsdev=p9fs2,mount_tag=p9,bus=pcie.0 \
> > -fsdev local,id=p9fs2,path=p9root,security_model=mapped \
> > -net none \
> > -nographic
> ..
> > With a pci topology like below,
> > [root@localhost ~]# lspci -tv
> > -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host bridge
> >  |           +-01.0 Red Hat, Inc. QEMU PCIe Expander bridge
> >  |           +-02.0 Red Hat, Inc. QEMU PCIe Expander bridge
> >  |           \-03.0 Virtio: Virtio filesystem
> >  +-[0000:08]---00.0-[09]----00.0 Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
> >  \-[0000:10]---00.0-[11]----00.0 Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> > [root@localhost ~]#
> >
> > And if you want to add another HNS VF, it should be added to the same SMMUv3
> > as of the first HNS dev,
> >
> > -device pcie-root-port,id=pcie.port3,bus=pcie.1,chassis=3 \
> > -device vfio-pci,host=0000:7d:02.2,bus=pcie.port3,iommufd=iommufd0 \
> ..
> > At present Qemu is not doing any extra validation other than the above
> > failure to make sure the user configuration is correct or not. The
> > assumption is libvirt will take care of this.
>
> Nathan from NVIDIA side is working on the libvirt. And he already
> did some prototype coding in libvirt that could generate required
> PCI topology. I think he can take these patches for a combined test.

Cool. That's good to know.

Thanks,
Shameer

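As a rough illustration of what "generating the required PCI topology" involves, here is a minimal Python sketch that emits the -device options for one pxb-pcie and one vSMMU per group of host devices. The option strings are copied from the command line quoted in this thread, and "arm-smmuv3-nested" exists only in the RFC branch. The function name nested_smmu_args, the bus_stride parameter, and the ordering of options within a group are assumptions made for this example; it is not how libvirt does it.

```python
"""Sketch: emit -device options for one pxb-pcie + vSMMU per physical SMMU.

Option names are copied from the command line in this thread;
"arm-smmuv3-nested" exists only in the RFC branch. The function name and
the choice of bus_nr stride are illustrative assumptions.
"""


def nested_smmu_args(groups, iommufd="iommufd0", bus_stride=8):
    """Build QEMU arguments for groups of host BDFs sharing a physical SMMU.

    groups: list of lists; each inner list holds the host BDFs that sit
            behind the same physical SMMUv3.
    """
    args = []
    port = 0
    for idx, host_bdfs in enumerate(groups, start=1):
        # The expander's own bus plus one bus per root port must fit
        # below the next expander's bus_nr.
        if len(host_bdfs) + 1 > bus_stride:
            raise ValueError(f"group {idx} needs more than {bus_stride} buses")
        bus = f"pcie.{idx}"
        args += ["-device",
                 f"pxb-pcie,id={bus},bus_nr={idx * bus_stride},bus=pcie.0"]
        ports = []
        for _ in host_bdfs:
            port += 1
            ports.append(port)
            args += ["-device",
                     f"pcie-root-port,id=pcie.port{port},bus={bus},chassis={port}"]
        args += ["-device", f"arm-smmuv3-nested,id=smmuv{idx},pci-bus={bus}"]
        for bdf, p in zip(host_bdfs, ports):
            args += ["-device",
                     f"vfio-pci,host={bdf},bus=pcie.port{p},iommufd={iommufd}"]
    return args


if __name__ == "__main__":
    # One HNS VF and one ZIP VF, each behind its own physical SMMUv3.
    print(" \\\n".join(nested_smmu_args([["0000:7d:02.1"], ["0000:75:00.1"]])))
```

Grouping by physical SMMU up front means a second HNS VF is simply appended to the first group, so it lands on the same expander bus and vSMMU as the first one.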
Hi Mostafa,

> -----Original Message-----
> From: Mostafa Saleh <smostafa@google.com>
> Sent: Wednesday, November 13, 2024 4:17 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> Hi Shameer,
>
> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > Hi,
> >
> > This series adds initial support for a user-creatable "arm-smmuv3-nested"
> > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> > and cannot support multiple SMMUv3s.
> >
>
> I had a quick look at the SMMUv3 files, as now SMMUv3 supports nested
> translation emulation, would it make sense to rename this? As AFAIU,
> this is about virt (stage-1) SMMUv3 that is emulated to a guest.
> Including vSMMU or virt would help distinguish the code, as now
> some new function as smmu_nested_realize() looks confusing.

Yes. I have noticed that. We need to call it something else to avoid the
confusion. Not sure including "virt" is a good idea as it may indicate virt
machine. Probably "acc" as Nicolin suggested to indicate hw accelerated.
I will think about a better one. Open to suggestions.

Thanks,
Shameer

> -----Original Message-----
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, November 13, 2024 9:43 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> ddutile@redhat.com; Linuxarm <linuxarm@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>; jiangkunkun <jiangkunkun@huawei.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>;
> zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> nested SMMUv3
>
> On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum wrote:
> > This RFC is for initial discussion/test purposes only and includes
> > patches that are only relevant for adding the "arm-smmuv3-nested"
> > support. For the complete branch please find,
> > https://github.com/hisilicon/qemu/commits/private-smmuv3-nested-dev-rfc-v1/
>
> I guess the QEMU branch above pairs with this (vIOMMU v6)?
> https://github.com/nicolinc/iommufd/commits/smmuv3_nesting-with-rmr

I actually based it on top of a kernel branch that Zhangfei is keeping
for his verification tests.
https://github.com/Linaro/linux-kernel-uadk/commits/6.12-wip-10.26/

But yes, it does indeed look like it is based on the branch you mentioned above.

Thanks,
Shameer.

Hi Shameer,

On Thu, Nov 14, 2024 at 08:01:28AM +0000, Shameerali Kolothum Thodi wrote:
> Hi Mostafa,
>
> > -----Original Message-----
> > From: Mostafa Saleh <smostafa@google.com>
> > Sent: Wednesday, November 13, 2024 4:17 PM
> > To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
> > Cc: qemu-arm@nongnu.org; qemu-devel@nongnu.org;
> > eric.auger@redhat.com; peter.maydell@linaro.org; jgg@nvidia.com;
> > nicolinc@nvidia.com; ddutile@redhat.com; Linuxarm
> > <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> > jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> > <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> > Subject: Re: [RFC PATCH 0/5] hw/arm/virt: Add support for user-creatable
> > nested SMMUv3
> >
> > Hi Shameer,
> >
> > On Fri, Nov 08, 2024 at 12:52:37PM +0000, Shameer Kolothum via wrote:
> > > Hi,
> > >
> > > This series adds initial support for a user-creatable "arm-smmuv3-nested"
> > > device to Qemu. At present the Qemu ARM SMMUv3 emulation is per machine
> > > and cannot support multiple SMMUv3s.
> > >
> >
> > I had a quick look at the SMMUv3 files, as now SMMUv3 supports nested
> > translation emulation, would it make sense to rename this? As AFAIU,
> > this is about virt (stage-1) SMMUv3 that is emulated to a guest.
> > Including vSMMU or virt would help distinguish the code, as now
> > some new function as smmu_nested_realize() looks confusing.
>
> Yes. I have noticed that. We need to call it something else to avoid the
> confusion. Not sure including "virt" is a good idea as it may indicate virt
> machine. Probably "acc" as Nicolin suggested to indicate hw accelerated.
> I will think about a better one. Open to suggestions.

"acc" sounds good to me, also if possible we can have smmuv3-acc.c where
it has all the specific logic, and the main file just calls into it.

Thanks,
Mostafa

>
> Thanks,
> Shameer
>