Message ID | 1580300216-86172-3-git-send-email-yi.l.liu@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | intel_iommu: expose Shared Virtual Addressing to VMs |
On Wed, Jan 29, 2020 at 04:16:33AM -0800, Liu, Yi L wrote:
> From: Liu Yi L <yi.l.liu@intel.com>
>
> Currently, many platform vendors provide the capability of dual stage
> DMA address translation in hardware. For example, nested translation
> on Intel VT-d scalable mode, nested stage translation on ARM SMMUv3,
> and etc. In dual stage DMA address translation, there are two stages
> address translation, stage-1 (a.k.a first-level) and stage-2 (a.k.a
> second-level) translation structures. Stage-1 translation results are
> also subjected to stage-2 translation structures. Take vSVA (Virtual
> Shared Virtual Addressing) as an example, guest IOMMU driver owns
> stage-1 translation structures (covers GVA->GPA translation), and host
> IOMMU driver owns stage-2 translation structures (covers GPA->HPA
> translation). VMM is responsible to bind stage-1 translation structures
> to host, thus hardware could achieve GVA->GPA and then GPA->HPA
> translation. For more background on SVA, refer the below links.
>  - https://www.youtube.com/watch?v=Kq_nfGK5MwQ
>  - https://events19.lfasiallc.com/wp-content/uploads/2017/11/\
> Shared-Virtual-Memory-in-KVM_Yi-Liu.pdf
>
> As above, dual stage DMA translation offers two stage address mappings,
> which could have better DMA address translation support for passthru
> devices. This is also what vIOMMU developers are doing so far. Efforts
> includes vSVA enabling from Yi Liu and SMMUv3 Nested Stage Setup from
> Eric Auger.
> https://www.spinics.net/lists/kvm/msg198556.html
> https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02842.html
>
> Both efforts are aiming to expose a vIOMMU with dual stage hardware
> backed. As so, QEMU needs to have an explicit object to stand for
> the dual stage capability from hardware. Such object offers abstract
> for the dual stage DMA translation related operations, like:
>
>  1) PASID allocation (allow host to intercept in PASID allocation)
>  2) bind stage-1 translation structures to host
>  3) propagate stage-1 cache invalidation to host
>  4) DMA address translation fault (I/O page fault) servicing etc.
>
> This patch introduces DualStageIOMMUObject to stand for the hardware
> dual stage DMA translation capability. PASID allocation/free are the
> first operation included in it, in future, there will be more operations
> like bind_stage1_pgtbl and invalidate_stage1_cache and etc.
>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Eric Auger <eric.auger@redhat.com>
> Cc: Yi Sun <yi.y.sun@linux.intel.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Liu Yi L <yi.l.liu@intel.com>

Several overall queries about this:

1) Since it's explicitly handling PASIDs, this seems a lot more
   specific to SVM than the name suggests. I'd suggest a rename.

2) Why are you hand rolling structures of pointers, rather than making
   this a QOM class or interface and putting those things into methods?

3) It's not really clear to me if this is for the case where both
   stages of translation are visible to the guest, or only one of
   them.
Hi David,

> From: David Gibson [mailto:david@gibson.dropbear.id.au]
> Sent: Friday, January 31, 2020 11:59 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [RFC v3 02/25] hw/iommu: introduce DualStageIOMMUObject
>
> On Wed, Jan 29, 2020 at 04:16:33AM -0800, Liu, Yi L wrote:
> > From: Liu Yi L <yi.l.liu@intel.com>
> >
> > [...]
> >
> > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
>
> Several overall queries about this:
>
> 1) Since it's explicitly handling PASIDs, this seems a lot more
>    specific to SVM than the name suggests. I'd suggest a rename.

It is not specific to SVM in future. We have efforts to move guest
IOVA support based on host IOMMU's dual-stage DMA translation
capability. Then, guest IOVA support will also re-use the methods
provided by this abstract layer. e.g. the bind_guest_pgtbl() and
flush_iommu_iotlb().

For the naming, how about HostIOMMUContext? This layer is to provide
explicit methods for setting up dual-stage DMA translation in host.

>
> 2) Why are you hand rolling structures of pointers, rather than making
>    this a QOM class or interface and putting those things into methods?

Maybe the name is not proper. Although I named it as DualStageIOMMUObject,
it is actually a kind of abstract layer we discussed in previous email. I
think this is similar with VFIO_MAP/UNMAP. The difference is that
VFIO_MAP/UNMAP programs mappings to host iommu domain. While the newly
added explicit method is to link guest page table to host iommu domain.
VFIO_MAP/UNMAP is exposed to vIOMMU emulators via MemoryRegion layer.
right? Maybe adding a similar abstract layer is enough. Is adding QOM
really necessary for this case?

> 3) It's not really clear to me if this is for the case where both
>    stages of translation are visible to the guest, or only one of
>    them.

For this case, vIOMMU will only expose a single stage translation to VM.
e.g. Intel VT-d, vIOMMU exposes first-level translation to guest. Hardware
IOMMUs with the dual-stage translation capability lets guest own stage-1
translation structures and host owns the stage-2 translation structures.
VMM is responsible to bind guest's translation structures to host and
enable dual-stage translation. e.g. on Intel VT-d, config translation type
to be NESTED.

Take guest SVM as an example, guest iommu driver owns the gVA->gPA mappings,
which is treated as stage-1 translation from host point of view. Host itself
owns the gPA->hPA translation and called stage-2 translation when dual-stage
translation is configured.

For guest IOVA, it is similar with guest SVM. Guest iommu driver owns the
gIOVA->gPA mappings, which is treated as stage-1 translation. Host owns the
gPA->hPA translation.

Regards,
Yi Liu
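The stage-1/stage-2 ownership split described in the reply above can be illustrated with a small, self-contained C sketch. It is only a toy model, not QEMU or hardware code: `ToyStage`, `ToyMapping`, `toy_stage_translate()` and `toy_nested_translate()` are hypothetical names invented here, and each "page table" is just a flat array of page-frame mappings. Real nested translation also passes the stage-1 table walk itself through stage-2, which this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12
#define TOY_PAGE_MASK   ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_XLATE_FAULT UINT64_MAX   /* returned when no mapping exists */

/* One page-granular mapping: input page frame -> output page frame. */
typedef struct {
    uint64_t in_pfn;
    uint64_t out_pfn;
} ToyMapping;

/* A toy translation stage: a flat list of mappings (hypothetical). */
typedef struct {
    const ToyMapping *maps;
    size_t nr;
} ToyStage;

/* Translate one address through a single stage, keeping the page offset. */
uint64_t toy_stage_translate(const ToyStage *st, uint64_t addr)
{
    uint64_t pfn = addr >> TOY_PAGE_SHIFT;

    for (size_t i = 0; i < st->nr; i++) {
        if (st->maps[i].in_pfn == pfn) {
            return (st->maps[i].out_pfn << TOY_PAGE_SHIFT) |
                   (addr & TOY_PAGE_MASK);
        }
    }
    return TOY_XLATE_FAULT;
}

/*
 * Nested translation: stage-1 is owned by the guest (gVA or gIOVA -> gPA),
 * stage-2 is owned by the host (gPA -> hPA). A miss in either stage faults.
 */
uint64_t toy_nested_translate(const ToyStage *stage1, const ToyStage *stage2,
                              uint64_t gva)
{
    uint64_t gpa = toy_stage_translate(stage1, gva);

    if (gpa == TOY_XLATE_FAULT) {
        return TOY_XLATE_FAULT;
    }
    return toy_stage_translate(stage2, gpa);
}
```

With a stage-1 entry mapping page frame 0x10 to 0x20 and a stage-2 entry mapping 0x20 to 0x300, a guest address 0x10abc resolves to host address 0x300abc. The guest only ever edits the first table and the host only the second, which is the division of labour the thread is discussing.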
On Fri, Jan 31, 2020 at 11:42:06AM +0000, Liu, Yi L wrote:
> Hi David,
>
> > From: David Gibson [mailto:david@gibson.dropbear.id.au]
> > Sent: Friday, January 31, 2020 11:59 AM
> > To: Liu, Yi L <yi.l.liu@intel.com>
> > Subject: Re: [RFC v3 02/25] hw/iommu: introduce DualStageIOMMUObject
> >
> > On Wed, Jan 29, 2020 at 04:16:33AM -0800, Liu, Yi L wrote:
> > > From: Liu Yi L <yi.l.liu@intel.com>
> > >
> > > [...]
> > >
> > > Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
> >
> > Several overall queries about this:
> >
> > 1) Since it's explicitly handling PASIDs, this seems a lot more
> >    specific to SVM than the name suggests. I'd suggest a rename.
>
> It is not specific to SVM in future. We have efforts to move guest
> IOVA support based on host IOMMU's dual-stage DMA translation
> capability.

It's assuming the existence of pasids though, which is a rather more
specific model than simply having two translation stages.

> Then, guest IOVA support will also re-use the methods
> provided by this abstract layer. e.g. the bind_guest_pgtbl() and
> flush_iommu_iotlb().
>
> For the naming, how about HostIOMMUContext? This layer is to provide
> explicit methods for setting up dual-stage DMA translation in host.

Uh.. maybe? I'm still having trouble figuring out what this object
really represents.

> > 2) Why are you hand rolling structures of pointers, rather than making
> >    this a QOM class or interface and putting those things into methods?
>
> Maybe the name is not proper. Although I named it as DualStageIOMMUObject,
> it is actually a kind of abstract layer we discussed in previous email. I
> think this is similar with VFIO_MAP/UNMAP. The difference is that
> VFIO_MAP/UNMAP programs mappings to host iommu domain. While the newly
> added explicit method is to link guest page table to host iommu domain.
> VFIO_MAP/UNMAP is exposed to vIOMMU emulators via MemoryRegion layer.
> right? Maybe adding a similar abstract layer is enough. Is adding QOM
> really necessary for this case?

Um... sorry, I'm having a lot of trouble making any sense of that.

> > 3) It's not really clear to me if this is for the case where both
> >    stages of translation are visible to the guest, or only one of
> >    them.
>
> For this case, vIOMMU will only expose a single stage translation to VM.
> e.g. Intel VT-d, vIOMMU exposes first-level translation to guest. Hardware
> IOMMUs with the dual-stage translation capability lets guest own stage-1
> translation structures and host owns the stage-2 translation structures.
> VMM is responsible to bind guest's translation structures to host and
> enable dual-stage translation. e.g. on Intel VT-d, config translation type
> to be NESTED.

Ok, understood.

> Take guest SVM as an example, guest iommu driver owns the gVA->gPA mappings,
> which is treated as stage-1 translation from host point of view. Host itself
> owns the gPA->hPA translation and called stage-2 translation when dual-stage
> translation is configured.
>
> For guest IOVA, it is similar with guest SVM. Guest iommu driver owns the
> gIOVA->gPA mappings, which is treated as stage-1 translation. Host owns the
> gPA->hPA translation.

Ok, that makes sense. It's still not really clear to me which part of
this setup this object represents.
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 660e2b4..cab83fe 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -40,6 +40,7 @@ devices-dirs-$(CONFIG_MEM_DEVICE) += mem/
 devices-dirs-$(CONFIG_NUBUS) += nubus/
 devices-dirs-y += semihosting/
 devices-dirs-y += smbios/
+devices-dirs-y += iommu/
 endif
 
 common-obj-y += $(devices-dirs-y)
diff --git a/hw/iommu/Makefile.objs b/hw/iommu/Makefile.objs
new file mode 100644
index 0000000..d4f3b39
--- /dev/null
+++ b/hw/iommu/Makefile.objs
@@ -0,0 +1 @@
+obj-y += dual_stage_iommu.o
diff --git a/hw/iommu/dual_stage_iommu.c b/hw/iommu/dual_stage_iommu.c
new file mode 100644
index 0000000..be4179d
--- /dev/null
+++ b/hw/iommu/dual_stage_iommu.c
@@ -0,0 +1,59 @@
+/*
+ * QEMU abstract of Hardware Dual Stage DMA translation capability
+ *
+ * Copyright (C) 2020 Intel Corporation.
+ *
+ * Authors: Liu Yi L <yi.l.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/iommu/dual_stage_iommu.h"
+
+int ds_iommu_pasid_alloc(DualStageIOMMUObject *dsi_obj, uint32_t min,
+                         uint32_t max, uint32_t *pasid)
+{
+    if (!dsi_obj) {
+        return -ENOENT;
+    }
+
+    if (dsi_obj->ops && dsi_obj->ops->pasid_alloc) {
+        return dsi_obj->ops->pasid_alloc(dsi_obj, min, max, pasid);
+    }
+    return -ENOENT;
+}
+
+int ds_iommu_pasid_free(DualStageIOMMUObject *dsi_obj, uint32_t pasid)
+{
+    if (!dsi_obj) {
+        return -ENOENT;
+    }
+
+    if (dsi_obj->ops && dsi_obj->ops->pasid_free) {
+        return dsi_obj->ops->pasid_free(dsi_obj, pasid);
+    }
+    return -ENOENT;
+}
+
+void ds_iommu_object_init(DualStageIOMMUObject *dsi_obj,
+                          DualStageIOMMUOps *ops)
+{
+    dsi_obj->ops = ops;
+}
+
+void ds_iommu_object_destroy(DualStageIOMMUObject *dsi_obj)
+{
+    dsi_obj->ops = NULL;
+}
diff --git a/include/hw/iommu/dual_stage_iommu.h b/include/hw/iommu/dual_stage_iommu.h
new file mode 100644
index 0000000..e9891e3
--- /dev/null
+++ b/include/hw/iommu/dual_stage_iommu.h
@@ -0,0 +1,59 @@
+/*
+ * QEMU abstraction of IOMMU Context
+ *
+ * Copyright (C) 2020 Red Hat Inc.
+ *
+ * Authors: Liu, Yi L <yi.l.liu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_DS_IOMMU_H
+#define HW_DS_IOMMU_H
+
+#include "qemu/queue.h"
+#ifndef CONFIG_USER_ONLY
+#include "exec/hwaddr.h"
+#endif
+
+typedef struct DualStageIOMMUObject DualStageIOMMUObject;
+typedef struct DualStageIOMMUOps DualStageIOMMUOps;
+
+struct DualStageIOMMUOps {
+    /* Allocate pasid from DualStageIOMMU (a.k.a. host IOMMU) */
+    int (*pasid_alloc)(DualStageIOMMUObject *dsi_obj,
+                       uint32_t min,
+                       uint32_t max,
+                       uint32_t *pasid);
+    /* Reclaim a pasid from DualStageIOMMU (a.k.a. host IOMMU) */
+    int (*pasid_free)(DualStageIOMMUObject *dsi_obj,
+                      uint32_t pasid);
+};
+
+/*
+ * This is an abstraction of Dual-stage IOMMU.
+ */
+struct DualStageIOMMUObject {
+    DualStageIOMMUOps *ops;
+};
+
+int ds_iommu_pasid_alloc(DualStageIOMMUObject *dsi_obj, uint32_t min,
+                         uint32_t max, uint32_t *pasid);
+int ds_iommu_pasid_free(DualStageIOMMUObject *dsi_obj, uint32_t pasid);
+
+void ds_iommu_object_init(DualStageIOMMUObject *dsi_obj,
+                          DualStageIOMMUOps *ops);
+void ds_iommu_object_destroy(DualStageIOMMUObject *dsi_obj);
+
+#endif
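To show how a provider might plug into the ops table introduced by this patch, here is a minimal standalone sketch. The two structs and the wrapper functions mirror the patch (the wrappers are condensed but keep the same -ENOENT behaviour); everything prefixed `toy_` or `Toy` is hypothetical and exists only for illustration, including the bitmap allocator, which is an assumption and not how VFIO or the host IOMMU driver hands out PASIDs.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Declarations mirrored from the patch so the sketch compiles on its own. */
typedef struct DualStageIOMMUObject DualStageIOMMUObject;
typedef struct DualStageIOMMUOps DualStageIOMMUOps;

struct DualStageIOMMUOps {
    int (*pasid_alloc)(DualStageIOMMUObject *dsi_obj,
                       uint32_t min, uint32_t max, uint32_t *pasid);
    int (*pasid_free)(DualStageIOMMUObject *dsi_obj, uint32_t pasid);
};

struct DualStageIOMMUObject {
    DualStageIOMMUOps *ops;
};

/* Wrappers as in the patch: -ENOENT when the object or callback is absent. */
int ds_iommu_pasid_alloc(DualStageIOMMUObject *dsi_obj, uint32_t min,
                         uint32_t max, uint32_t *pasid)
{
    if (dsi_obj && dsi_obj->ops && dsi_obj->ops->pasid_alloc) {
        return dsi_obj->ops->pasid_alloc(dsi_obj, min, max, pasid);
    }
    return -ENOENT;
}

int ds_iommu_pasid_free(DualStageIOMMUObject *dsi_obj, uint32_t pasid)
{
    if (dsi_obj && dsi_obj->ops && dsi_obj->ops->pasid_free) {
        return dsi_obj->ops->pasid_free(dsi_obj, pasid);
    }
    return -ENOENT;
}

void ds_iommu_object_init(DualStageIOMMUObject *dsi_obj,
                          DualStageIOMMUOps *ops)
{
    dsi_obj->ops = ops;
}

/* Hypothetical provider: embeds the object and tracks PASIDs in a bitmap. */
#define TOY_NR_PASIDS 64

typedef struct ToyHostIOMMU {
    DualStageIOMMUObject dsi_obj;   /* first member, so the cast below is valid */
    uint64_t used;                  /* bit i set => PASID i is allocated */
} ToyHostIOMMU;

static int toy_pasid_alloc(DualStageIOMMUObject *dsi_obj, uint32_t min,
                           uint32_t max, uint32_t *pasid)
{
    ToyHostIOMMU *toy = (ToyHostIOMMU *)dsi_obj;

    /* First-fit search within [min, max], capped at the toy bitmap size. */
    for (uint32_t i = min; i <= max && i < TOY_NR_PASIDS; i++) {
        if (!(toy->used & (UINT64_C(1) << i))) {
            toy->used |= UINT64_C(1) << i;
            *pasid = i;
            return 0;
        }
    }
    return -ENOSPC;
}

static int toy_pasid_free(DualStageIOMMUObject *dsi_obj, uint32_t pasid)
{
    ToyHostIOMMU *toy = (ToyHostIOMMU *)dsi_obj;

    if (pasid >= TOY_NR_PASIDS || !(toy->used & (UINT64_C(1) << pasid))) {
        return -EINVAL;
    }
    toy->used &= ~(UINT64_C(1) << pasid);
    return 0;
}

static DualStageIOMMUOps toy_ops = {
    .pasid_alloc = toy_pasid_alloc,
    .pasid_free = toy_pasid_free,
};

/* Register the toy callbacks on the embedded object. */
void toy_host_iommu_init(ToyHostIOMMU *toy)
{
    toy->used = 0;
    ds_iommu_object_init(&toy->dsi_obj, &toy_ops);
}
```

A vIOMMU emulator holding only a `DualStageIOMMUObject *` would call `ds_iommu_pasid_alloc()` and `ds_iommu_pasid_free()` without knowing which provider sits behind it. This is the hand-rolled structure of function pointers that the review asks about; a QOM class or interface would express the same dispatch through QEMU's type system instead.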