Message ID | 20180411072027.5656-1-tiwei.bie@intel.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:

[...]

> This is just a RFC for now. It seems that, it doesn't work
> as expected when guest is using kernel driver (To handle
> this case, it seems that some RAM regions' events also need
> to be listened). Any comments would be appreciated! Thanks!

Hi, Tiwei,

What's your kernel command line in the guest? Is iommu=pt there?
On 2018年04月11日 15:20, Tiwei Bie wrote:
> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> feature for vhost-user. By default, vhost-user backend needs
> to query the IOTLBs from QEMU after meeting unknown IOVAs.
> With this protocol feature negotiated, QEMU will provide all
> the IOTLBs to vhost-user backend without waiting for the
> queries from backend. This is helpful when using a hardware
> accelerator which is not able to handle unknown IOVAs at the
> vhost-user backend.
>
> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
> ---
> The idea of this patch is to let QEMU push all the IOTLBs
> to vhost-user backend without waiting for the queries from
> the backend. Because hardware accelerator at the vhost-user
> backend may not be able to handle unknown IOVAs.
>
> This is just a RFC for now. It seems that, it doesn't work
> as expected when guest is using kernel driver (To handle
> this case, it seems that some RAM regions' events also need
> to be listened). Any comments would be appreciated! Thanks!

Interesting, a quick question is why this is needed? Can we just use exist
IOTLB update message?

It looks to me at least kernel does not need this.

Thanks
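For readers following the thread, the proposed protocol bit only changes who initiates the IOTLB traffic, not the message format. Below is a minimal sketch of how a backend might advertise and test such a bit; only the bit number (10) is taken from the RFC patch, while the struct and helper names are hypothetical.

```c
/* Hedged sketch, not from the patch: only the protocol bit number (10)
 * comes from the RFC; the backend-side types and helpers are hypothetical. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10

/* Hypothetical per-device backend state. */
struct backend_dev {
    uint64_t supported_protocol_features;  /* advertised to QEMU */
    uint64_t negotiated_protocol_features; /* acked by QEMU */
};

static void backend_advertise_features(struct backend_dev *d)
{
    /* Ask QEMU to push all IOTLB entries instead of waiting for misses. */
    d->supported_protocol_features |=
        1ULL << VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB;
}

static bool backend_expects_pushed_iotlb(const struct backend_dev *d)
{
    return d->negotiated_protocol_features &
           (1ULL << VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB);
}

int main(void)
{
    struct backend_dev d = { 0 };

    backend_advertise_features(&d);
    /* Pretend QEMU acked everything we offered. */
    d.negotiated_protocol_features = d.supported_protocol_features;

    printf("push mode: %s\n",
           backend_expects_pushed_iotlb(&d) ? "on" : "off");
    return 0;
}
```

Nothing about the IOTLB update message itself changes; the bit only tells QEMU not to wait for miss queries before sending updates.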
Hi Peter,

On Wed, Apr 11, 2018 at 04:00:36PM +0800, Peter Xu wrote:
> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>
> [...]
>
> > This is just a RFC for now. It seems that, it doesn't work
> > as expected when guest is using kernel driver (To handle
> > this case, it seems that some RAM regions' events also need
> > to be listened). Any comments would be appreciated! Thanks!
>
> Hi, Tiwei,
>
> What's your kernel command line in the guest? Is iommu=pt there?

Yeah, you are right! The related things in kernel command line are:

iommu=pt intel_iommu=on

Hmm, how will this param affect vIOMMU's behaviour?..

Best regards,
Tiwei Bie

>
> --
> Peter Xu
On Wed, Apr 11, 2018 at 04:25:56PM +0800, Tiwei Bie wrote:
> Hi Peter,
>
> On Wed, Apr 11, 2018 at 04:00:36PM +0800, Peter Xu wrote:
> > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> >
> > [...]
> >
> > > This is just a RFC for now. It seems that, it doesn't work
> > > as expected when guest is using kernel driver (To handle
> > > this case, it seems that some RAM regions' events also need
> > > to be listened). Any comments would be appreciated! Thanks!
> >
> > Hi, Tiwei,
> >
> > What's your kernel command line in the guest? Is iommu=pt there?
>
> Yeah, you are right! The related things in kernel command line are:
>
> iommu=pt intel_iommu=on
>
> Hmm, how will this param affect vIOMMU's behaviour?..

If iommu=pt is there, guest kernel will try to bypass IOMMU, the IOMMU
regions will be disabled completely in that case, hence it's very
possible that your IOMMU memory listeners won't get anything useful.

Maybe you can consider removing iommu=pt in the guest parameter to see
whether the guest kernel driver could work.
On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote: > On 2018年04月11日 15:20, Tiwei Bie wrote: > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > feature for vhost-user. By default, vhost-user backend needs > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > With this protocol feature negotiated, QEMU will provide all > > the IOTLBs to vhost-user backend without waiting for the > > queries from backend. This is helpful when using a hardware > > accelerator which is not able to handle unknown IOVAs at the > > vhost-user backend. > > > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com> > > --- > > The idea of this patch is to let QEMU push all the IOTLBs > > to vhost-user backend without waiting for the queries from > > the backend. Because hardware accelerator at the vhost-user > > backend may not be able to handle unknown IOVAs. > > > > This is just a RFC for now. It seems that, it doesn't work > > as expected when guest is using kernel driver (To handle > > this case, it seems that some RAM regions' events also need > > to be listened). Any comments would be appreciated! Thanks! > > Interesting, a quick question is why this is needed? Can we just use exist > IOTLB update message? Yeah, we are still using the existing IOTLB update messages to send the IOTLB messages to backend. The only difference is that, QEMU won't wait for the queries before sending the IOTLB update messages. > > It looks to me at least kernel does not need this. Something similar in kernel vhost is that, for kernel vhost, QEMU needs to push the IOTLBs of some ring addrs to kernel vhost backend without waiting for the queries. Best regards, Tiwei Bie > > Thanks
On Wed, Apr 11, 2018 at 04:37:16PM +0800, Peter Xu wrote:
> On Wed, Apr 11, 2018 at 04:25:56PM +0800, Tiwei Bie wrote:
> > On Wed, Apr 11, 2018 at 04:00:36PM +0800, Peter Xu wrote:
> > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > >
> > > [...]
> > >
> > > > This is just a RFC for now. It seems that, it doesn't work
> > > > as expected when guest is using kernel driver (To handle
> > > > this case, it seems that some RAM regions' events also need
> > > > to be listened). Any comments would be appreciated! Thanks!
> > >
> > > Hi, Tiwei,
> > >
> > > What's your kernel command line in the guest? Is iommu=pt there?
> >
> > Yeah, you are right! The related things in kernel command line are:
> >
> > iommu=pt intel_iommu=on
> >
> > Hmm, how will this param affect vIOMMU's behaviour?..
>
> If iommu=pt is there, guest kernel will try to bypass IOMMU, the IOMMU
> regions will be disabled completely in that case, hence it's very
> possible that your IOMMU memory listeners won't get anything useful.
>
> Maybe you can consider removing iommu=pt in the guest parameter to see
> whether the guest kernel driver could work.

Cool. I'll give it a try! Considering we may also need to
handle the iommu=pt case, is there any event in QEMU can
be used to know whether the IOMMU regions are disabled
or enabled by the guest?

Best regards,
Tiwei Bie

>
> --
> Peter Xu
On Wed, Apr 11, 2018 at 04:55:25PM +0800, Tiwei Bie wrote:
> On Wed, Apr 11, 2018 at 04:37:16PM +0800, Peter Xu wrote:
> > On Wed, Apr 11, 2018 at 04:25:56PM +0800, Tiwei Bie wrote:
> > > On Wed, Apr 11, 2018 at 04:00:36PM +0800, Peter Xu wrote:
> > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > >
> > > > [...]
> > > >
> > > > > This is just a RFC for now. It seems that, it doesn't work
> > > > > as expected when guest is using kernel driver (To handle
> > > > > this case, it seems that some RAM regions' events also need
> > > > > to be listened). Any comments would be appreciated! Thanks!
> > > >
> > > > Hi, Tiwei,
> > > >
> > > > What's your kernel command line in the guest? Is iommu=pt there?
> > >
> > > Yeah, you are right! The related things in kernel command line are:
> > >
> > > iommu=pt intel_iommu=on
> > >
> > > Hmm, how will this param affect vIOMMU's behaviour?..
> >
> > If iommu=pt is there, guest kernel will try to bypass IOMMU, the IOMMU
> > regions will be disabled completely in that case, hence it's very
> > possible that your IOMMU memory listeners won't get anything useful.
> >
> > Maybe you can consider removing iommu=pt in the guest parameter to see
> > whether the guest kernel driver could work.
>
> Cool. I'll give it a try! Considering we may also need to
> handle the iommu=pt case, is there any event in QEMU can
> be used to know whether the IOMMU regions are disabled
> or enabled by the guest?

You may consider to use similar way like below patch to detect that:

https://patchwork.kernel.org/patch/9735739/

That was a patch trying to preheat the vhost cache. It was refused at
that time, but IMHO the logic can be used. You can just send the
updates only if your new flag set. Then that won't be a preheat any
more, but a must for your hardwares to work.
On Wed, Apr 11, 2018 at 05:16:47PM +0800, Peter Xu wrote:
> On Wed, Apr 11, 2018 at 04:55:25PM +0800, Tiwei Bie wrote:
> > On Wed, Apr 11, 2018 at 04:37:16PM +0800, Peter Xu wrote:
> > > On Wed, Apr 11, 2018 at 04:25:56PM +0800, Tiwei Bie wrote:
> > > > On Wed, Apr 11, 2018 at 04:00:36PM +0800, Peter Xu wrote:
> > > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > > > >
> > > > > [...]
> > > > >
> > > > > > This is just a RFC for now. It seems that, it doesn't work
> > > > > > as expected when guest is using kernel driver (To handle
> > > > > > this case, it seems that some RAM regions' events also need
> > > > > > to be listened). Any comments would be appreciated! Thanks!
> > > > >
> > > > > Hi, Tiwei,
> > > > >
> > > > > What's your kernel command line in the guest? Is iommu=pt there?
> > > >
> > > > Yeah, you are right! The related things in kernel command line are:
> > > >
> > > > iommu=pt intel_iommu=on
> > > >
> > > > Hmm, how will this param affect vIOMMU's behaviour?..
> > >
> > > If iommu=pt is there, guest kernel will try to bypass IOMMU, the IOMMU
> > > regions will be disabled completely in that case, hence it's very
> > > possible that your IOMMU memory listeners won't get anything useful.
> > >
> > > Maybe you can consider removing iommu=pt in the guest parameter to see
> > > whether the guest kernel driver could work.
> >
> > Cool. I'll give it a try! Considering we may also need to
> > handle the iommu=pt case, is there any event in QEMU can
> > be used to know whether the IOMMU regions are disabled
> > or enabled by the guest?
>
> You may consider to use similar way like below patch to detect that:
>
> https://patchwork.kernel.org/patch/9735739/
>
> That was a patch trying to preheat the vhost cache. It was refused at
> that time, but IMHO the logic can be used. You can just send the
> updates only if your new flag set. Then that won't be a preheat any
> more, but a must for your hardwares to work.

It looks pretty helpful in my case! Thank you so much!

Best regards,
Tiwei Bie

>
> --
> Peter Xu
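The detection Peter suggests boils down to watching the vIOMMU's memory region come and go: when the guest enables DMA remapping (e.g. it boots without iommu=pt), the IOMMU memory region appears and the listener's region_add callback fires, which is a natural point to replay all existing mappings; when the guest switches back to passthrough, region_del fires and the backend can fall back to IOVA == GPA. Below is a condensed sketch of that shape, reusing the QEMU 2.12-era memory API calls already visible in the RFC patch; it is not the preheat patch linked above, and the accel_dev type and callback names are hypothetical.

```c
/*
 * Illustrative sketch only (not the preheat patch linked above): how a
 * device model could notice the guest enabling or disabling DMA remapping,
 * using the same QEMU 2.12-era memory API calls that the RFC patch uses.
 * The accel_dev type, the callback names and accel_dev_start() are
 * hypothetical.
 */
#include "qemu/osdep.h"
#include "exec/memory.h"

struct accel_dev {
    MemoryListener iommu_listener;
    IOMMUNotifier n;
};

static void accel_iommu_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
{
    /* Forward MAP/UNMAP events to the vhost-user backend, as in the RFC. */
}

static void accel_iommu_region_add(MemoryListener *listener,
                                   MemoryRegionSection *section)
{
    struct accel_dev *dev = container_of(listener, struct accel_dev,
                                         iommu_listener);
    Int128 end;

    if (!memory_region_is_iommu(section->mr)) {
        return;
    }

    /*
     * The guest enabled DMA remapping for this range (e.g. it booted
     * without iommu=pt): ask for both MAP and UNMAP events, and replay
     * what is already mapped so the backend starts with a full IOTLB.
     */
    end = int128_add(int128_make64(section->offset_within_region),
                     section->size);
    end = int128_sub(end, int128_one());
    iommu_notifier_init(&dev->n, accel_iommu_notify, IOMMU_NOTIFIER_ALL,
                        section->offset_within_region, int128_get64(end));
    memory_region_register_iommu_notifier(section->mr, &dev->n);
    memory_region_iommu_replay(IOMMU_MEMORY_REGION(section->mr), &dev->n);
}

static void accel_iommu_region_del(MemoryListener *listener,
                                   MemoryRegionSection *section)
{
    struct accel_dev *dev = container_of(listener, struct accel_dev,
                                         iommu_listener);

    if (!memory_region_is_iommu(section->mr)) {
        return;
    }
    /* Guest went back to passthrough: treat IOVA == GPA from now on. */
    memory_region_unregister_iommu_notifier(section->mr, &dev->n);
}

static void accel_dev_start(struct accel_dev *dev, AddressSpace *dma_as)
{
    dev->iommu_listener = (MemoryListener) {
        .region_add = accel_iommu_region_add,
        .region_del = accel_iommu_region_del,
    };
    memory_listener_register(&dev->iommu_listener, dma_as);
}
```

The RFC patch quoted later in the thread follows the same shape inside vhost_iommu_region_add(), with the replay made conditional on the new protocol feature.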
On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > feature for vhost-user. By default, vhost-user backend needs > to query the IOTLBs from QEMU after meeting unknown IOVAs. > With this protocol feature negotiated, QEMU will provide all > the IOTLBs to vhost-user backend without waiting for the > queries from backend. This is helpful when using a hardware > accelerator which is not able to handle unknown IOVAs at the > vhost-user backend. > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> This is potentially a large amount of data to be sent on a socket. I had an impression that a hardware accelerator was using VFIO anyway. Given this, can't we have QEMU program the shadow IOMMU tables into VFIO directly? > --- > The idea of this patch is to let QEMU push all the IOTLBs > to vhost-user backend without waiting for the queries from > the backend. Because hardware accelerator at the vhost-user > backend may not be able to handle unknown IOVAs. > > This is just a RFC for now. It seems that, it doesn't work > as expected when guest is using kernel driver (To handle > this case, it seems that some RAM regions' events also need > to be listened). Any comments would be appreciated! Thanks! > > docs/interop/vhost-user.txt | 9 ++++++++ > hw/virtio/vhost-backend.c | 7 ++++++ > hw/virtio/vhost-user.c | 8 +++++++ > hw/virtio/vhost.c | 47 ++++++++++++++++++++++++++++++++++++--- > include/hw/virtio/vhost-backend.h | 3 +++ > 5 files changed, 71 insertions(+), 3 deletions(-) > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt > index 534caab18a..73e07f9dad 100644 > --- a/docs/interop/vhost-user.txt > +++ b/docs/interop/vhost-user.txt > @@ -380,6 +380,7 @@ Protocol features > #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 > #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 > #define VHOST_USER_PROTOCOL_F_CONFIG 9 > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10 > > Master message types > -------------------- > @@ -797,3 +798,11 @@ resilient for selective requests. > For the message types that already solicit a reply from the client, the > presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings > no behavioural change. (See the 'Communication' section for details.) > + > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > +------------------------------------ > +By default, vhost-user backend needs to query the IOTLBs from QEMU after > +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will > +provide all the IOTLBs to vhost backend without waiting for the queries > +from backend. This is helpful when using a hardware accelerator which is > +not able to handle unknown IOVAs at the vhost-user backend. 
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c > index 7f09efab8b..d72994e9a5 100644 > --- a/hw/virtio/vhost-backend.c > +++ b/hw/virtio/vhost-backend.c > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev, > qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL); > } > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev) > +{ > + return false; > +} > + > static const VhostOps kernel_ops = { > .backend_type = VHOST_BACKEND_TYPE_KERNEL, > .vhost_backend_init = vhost_kernel_init, > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = { > #endif /* CONFIG_VHOST_VSOCK */ > .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback, > .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg, > + .vhost_backend_need_all_device_iotlb = > + vhost_kernel_need_all_device_iotlb, > }; > > int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > index 38da8692bb..30e0b1c13b 100644 > --- a/hw/virtio/vhost-user.c > +++ b/hw/virtio/vhost-user.c > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature { > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, > VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, > VHOST_USER_PROTOCOL_F_CONFIG = 9, > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10, > VHOST_USER_PROTOCOL_F_MAX > }; > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id) > return 0; > } > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev) > +{ > + return virtio_has_feature(dev->protocol_features, > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB); > +} > + > const VhostOps user_ops = { > .backend_type = VHOST_BACKEND_TYPE_USER, > .vhost_backend_init = vhost_user_init, > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = { > .vhost_set_config = vhost_user_set_config, > .vhost_crypto_create_session = vhost_user_crypto_create_session, > .vhost_crypto_close_session = vhost_user_crypto_close_session, > + .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb, > }; > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > index f51bf573d5..70922ee998 100644 > --- a/hw/virtio/vhost.c > +++ b/hw/virtio/vhost.c > @@ -48,6 +48,9 @@ static unsigned int used_memslots; > static QLIST_HEAD(, vhost_dev) vhost_devices = > QLIST_HEAD_INITIALIZER(vhost_devices); > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa, > + uint64_t *uaddr, uint64_t *len); > + > bool vhost_has_free_slot(void) > { > unsigned int slots_limit = ~0U; > @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener, > vhost_region_add_section(dev, section); > } > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > { > struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n); > struct vhost_dev *hdev = iommu->hdev; > hwaddr iova = iotlb->iova + iommu->iommu_offset; > > + if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { > + uint64_t uaddr, len; > + > + rcu_read_lock(); > + > + if (iotlb->target_as != NULL) { > + if (vhost_memory_region_lookup(hdev, iotlb->translated_addr, > + &uaddr, &len)) { > + error_report("Fail to lookup the translated address " > + "%"PRIx64, iotlb->translated_addr); > + goto out; > + } > + > + len = MIN(iotlb->addr_mask + 1, len); > + iova = iova & ~iotlb->addr_mask; > + > + if (vhost_backend_update_device_iotlb(hdev, iova, uaddr, > + len, 
iotlb->perm)) { > + error_report("Fail to update device iotlb"); > + goto out; > + } > + } > +out: > + rcu_read_unlock(); > + return; > + } > + > if (vhost_backend_invalidate_device_iotlb(hdev, iova, > iotlb->addr_mask + 1)) { > error_report("Fail to invalidate device iotlb"); > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener, > struct vhost_dev *dev = container_of(listener, struct vhost_dev, > iommu_listener); > struct vhost_iommu *iommu; > + IOMMUNotifierFlag flags; > Int128 end; > > if (!memory_region_is_iommu(section->mr)) { > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener, > end = int128_add(int128_make64(section->offset_within_region), > section->size); > end = int128_sub(end, int128_one()); > - iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify, > - IOMMU_NOTIFIER_UNMAP, > + > + if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) { > + flags = IOMMU_NOTIFIER_ALL; > + } else { > + flags = IOMMU_NOTIFIER_UNMAP; > + } > + > + iommu_notifier_init(&iommu->n, vhost_iommu_map_notify, > + flags, > section->offset_within_region, > int128_get64(end)); > iommu->mr = section->mr; > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener, > memory_region_register_iommu_notifier(section->mr, &iommu->n); > QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next); > /* TODO: can replay help performance here? */ > + if (flags == IOMMU_NOTIFIER_ALL) { > + memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n); > + } > } > > static void vhost_iommu_region_del(MemoryListener *listener, > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h > index 5dac61f9ea..602fd987a4 100644 > --- a/include/hw/virtio/vhost-backend.h > +++ b/include/hw/virtio/vhost-backend.h > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev, > typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, > uint64_t session_id); > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev); > + > typedef struct VhostOps { > VhostBackendType backend_type; > vhost_backend_init vhost_backend_init; > @@ -138,6 +140,7 @@ typedef struct VhostOps { > vhost_set_config_op vhost_set_config; > vhost_crypto_create_session_op vhost_crypto_create_session; > vhost_crypto_close_session_op vhost_crypto_close_session; > + vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb; > } VhostOps; > > extern const VhostOps user_ops; > -- > 2.11.0
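Seen from the backend, the "push" mode under review just means that the same IOTLB update and invalidate messages arrive unsolicited, so the backend's job reduces to maintaining a local table it can search on the data path. A self-contained sketch of such a table follows; the message layout mirrors struct vhost_iotlb_msg from linux/vhost_types.h, which vhost-user reuses for its IOTLB messages, while the flat table and helper names are hypothetical (a real backend would use an interval tree rather than a linear scan).

```c
/* Hedged sketch of a backend-side device IOTLB fed by pushed updates. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VHOST_IOTLB_UPDATE      2
#define VHOST_IOTLB_INVALIDATE  3

struct iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;  /* backend (user-space) virtual address */
    uint8_t  perm;
    uint8_t  type;
};

#define IOTLB_MAX_ENTRIES 256

struct iotlb {
    struct iotlb_msg entry[IOTLB_MAX_ENTRIES];
    unsigned int n;
};

/* Apply one pushed message; returns false if it could not be applied. */
static bool iotlb_apply(struct iotlb *tlb, const struct iotlb_msg *msg)
{
    if (msg->type == VHOST_IOTLB_INVALIDATE) {
        unsigned int i = 0;

        while (i < tlb->n) {
            struct iotlb_msg *e = &tlb->entry[i];

            if (msg->iova < e->iova + e->size &&
                e->iova < msg->iova + msg->size) {
                *e = tlb->entry[--tlb->n];   /* drop overlapping entry */
            } else {
                i++;
            }
        }
        return true;
    }
    if (msg->type != VHOST_IOTLB_UPDATE || tlb->n == IOTLB_MAX_ENTRIES) {
        return false;
    }
    tlb->entry[tlb->n++] = *msg;
    return true;
}

/*
 * Data-path lookup: IOVA -> backend virtual address, NULL if unknown.
 * With the proposed "push all IOTLBs" feature a miss should not happen;
 * without it, a miss means stalling and asking QEMU (VHOST_IOTLB_MISS).
 */
static void *iotlb_translate(const struct iotlb *tlb,
                             uint64_t iova, uint64_t len)
{
    unsigned int i;

    for (i = 0; i < tlb->n; i++) {
        const struct iotlb_msg *e = &tlb->entry[i];

        if (iova >= e->iova && iova - e->iova + len <= e->size) {
            return (void *)(uintptr_t)(e->uaddr + (iova - e->iova));
        }
    }
    return NULL;
}
```

With the feature negotiated, iotlb_apply() is driven purely by QEMU-initiated updates; without it, the same table would be filled on demand after each miss.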
On 2018年04月11日 16:38, Tiwei Bie wrote: > On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote: >> On 2018年04月11日 15:20, Tiwei Bie wrote: >>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB >>> feature for vhost-user. By default, vhost-user backend needs >>> to query the IOTLBs from QEMU after meeting unknown IOVAs. >>> With this protocol feature negotiated, QEMU will provide all >>> the IOTLBs to vhost-user backend without waiting for the >>> queries from backend. This is helpful when using a hardware >>> accelerator which is not able to handle unknown IOVAs at the >>> vhost-user backend. >>> >>> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com> >>> --- >>> The idea of this patch is to let QEMU push all the IOTLBs >>> to vhost-user backend without waiting for the queries from >>> the backend. Because hardware accelerator at the vhost-user >>> backend may not be able to handle unknown IOVAs. >>> >>> This is just a RFC for now. It seems that, it doesn't work >>> as expected when guest is using kernel driver (To handle >>> this case, it seems that some RAM regions' events also need >>> to be listened). Any comments would be appreciated! Thanks! >> Interesting, a quick question is why this is needed? Can we just use exist >> IOTLB update message? > Yeah, we are still using the existing IOTLB update messages > to send the IOTLB messages to backend. The only difference > is that, QEMU won't wait for the queries before sending the > IOTLB update messages. Yes, my question is not very clear. I mean why must need a new feature bit? It looks to me qemu code can work without this. Thanks > >> It looks to me at least kernel does not need this. > Something similar in kernel vhost is that, for kernel vhost, > QEMU needs to push the IOTLBs of some ring addrs to kernel > vhost backend without waiting for the queries. > > Best regards, > Tiwei Bie > >> Thanks
On 2018年04月11日 21:22, Michael S. Tsirkin wrote:
> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>> feature for vhost-user. By default, vhost-user backend needs
>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>> With this protocol feature negotiated, QEMU will provide all
>> the IOTLBs to vhost-user backend without waiting for the
>> queries from backend. This is helpful when using a hardware
>> accelerator which is not able to handle unknown IOVAs at the
>> vhost-user backend.
>>
>> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
> This is potentially a large amount of data to be sent
> on a socket.
>
> I had an impression that a hardware accelerator was using
> VFIO anyway. Given this, can't we have QEMU program
> the shadow IOMMU tables into VFIO directly?
>
>

So if we want to do this, my understanding is we need to implement the
driver inside qemu (or link it to qemu).

Thanks
On Wed, Apr 11, 2018 at 09:41:05PM +0800, Jason Wang wrote: > > > On 2018年04月11日 16:38, Tiwei Bie wrote: > > On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote: > > > On 2018年04月11日 15:20, Tiwei Bie wrote: > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > feature for vhost-user. By default, vhost-user backend needs > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > > With this protocol feature negotiated, QEMU will provide all > > > > the IOTLBs to vhost-user backend without waiting for the > > > > queries from backend. This is helpful when using a hardware > > > > accelerator which is not able to handle unknown IOVAs at the > > > > vhost-user backend. > > > > > > > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com> > > > > --- > > > > The idea of this patch is to let QEMU push all the IOTLBs > > > > to vhost-user backend without waiting for the queries from > > > > the backend. Because hardware accelerator at the vhost-user > > > > backend may not be able to handle unknown IOVAs. > > > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > > as expected when guest is using kernel driver (To handle > > > > this case, it seems that some RAM regions' events also need > > > > to be listened). Any comments would be appreciated! Thanks! > > > Interesting, a quick question is why this is needed? Can we just use exist > > > IOTLB update message? > > Yeah, we are still using the existing IOTLB update messages > > to send the IOTLB messages to backend. The only difference > > is that, QEMU won't wait for the queries before sending the > > IOTLB update messages. > > Yes, my question is not very clear. I mean why must need a new feature bit? > It looks to me qemu code can work without this. > > Thanks Generally we avoid adding new messages without a protocol feature bit. While careful analysis might sometimes prove it's not a strict requirement, it's just overall a clean and robust approach. > > > > > It looks to me at least kernel does not need this. > > Something similar in kernel vhost is that, for kernel vhost, > > QEMU needs to push the IOTLBs of some ring addrs to kernel > > vhost backend without waiting for the queries. > > > > Best regards, > > Tiwei Bie > > > > > Thanks
On Wed, Apr 11, 2018 at 04:38:53PM +0800, Tiwei Bie wrote: > On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote: > > On 2018年04月11日 15:20, Tiwei Bie wrote: > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > feature for vhost-user. By default, vhost-user backend needs > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > With this protocol feature negotiated, QEMU will provide all > > > the IOTLBs to vhost-user backend without waiting for the > > > queries from backend. This is helpful when using a hardware > > > accelerator which is not able to handle unknown IOVAs at the > > > vhost-user backend. > > > > > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com> > > > --- > > > The idea of this patch is to let QEMU push all the IOTLBs > > > to vhost-user backend without waiting for the queries from > > > the backend. Because hardware accelerator at the vhost-user > > > backend may not be able to handle unknown IOVAs. > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > as expected when guest is using kernel driver (To handle > > > this case, it seems that some RAM regions' events also need > > > to be listened). Any comments would be appreciated! Thanks! > > > > Interesting, a quick question is why this is needed? Can we just use exist > > IOTLB update message? > > Yeah, we are still using the existing IOTLB update messages > to send the IOTLB messages to backend. The only difference > is that, QEMU won't wait for the queries before sending the > IOTLB update messages. So I have a concern with that, in that without any flow control the socket buffer used by vhost-user might become full. We don't currently expect that. Again, my understanding is that the biggest benefit from use of a hardware accelerator is when notifications can be passed-through to the guest. And since that involves VFIO, and since VFIO already needs to support all kinds of IOMMUs, one wonders whether one can just pass that directly to the VFIO instead of shipping it to vhost-user. > > > > It looks to me at least kernel does not need this. > > Something similar in kernel vhost is that, for kernel vhost, > QEMU needs to push the IOTLBs of some ring addrs to kernel > vhost backend without waiting for the queries. > > Best regards, > Tiwei Bie That's really to work around a bug in kernel, we keep this around since we want to support old kernels. > > > > Thanks
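For context on the VFIO alternative raised here: once the accelerator sits behind a VFIO type1 container, installing one shadow mapping is a single VFIO_IOMMU_MAP_DMA ioctl, so "programming the shadow IOMMU tables into VFIO directly" would amount to QEMU issuing one such call per vIOMMU mapping. A minimal sketch follows, assuming a container file descriptor obtained through the usual (not shown) VFIO container and group setup.

```c
/* Minimal sketch: map/unmap one IOVA range in a VFIO type1 container.
 * Assumes container_fd was set up elsewhere with VFIO_TYPE1_IOMMU. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int shadow_map_one(int container_fd, uint64_t iova,
                          void *vaddr, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (uintptr_t)vaddr,
        .iova  = iova,
        .size  = size,
    };

    /* One vIOMMU mapping -> one entry in the host IOMMU via VFIO. */
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

static int shadow_unmap_one(int container_fd, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_unmap unmap = {
        .argsz = sizeof(unmap),
        .iova  = iova,
        .size  = size,
    };

    /* The UNMAP notifier events would translate into this call. */
    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
}
```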
On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote: > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > feature for vhost-user. By default, vhost-user backend needs > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > With this protocol feature negotiated, QEMU will provide all > > the IOTLBs to vhost-user backend without waiting for the > > queries from backend. This is helpful when using a hardware > > accelerator which is not able to handle unknown IOVAs at the > > vhost-user backend. > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> > > This is potentially a large amount of data to be sent > on a socket. If we take the hardware accelerator out of this picture, we will find that it's actually a question of "pre-loading" vs "lazy-loading". I think neither of them is perfect. For "pre-loading", as you said, we may have a tough starting. But for "lazy-loading", we can't have a predictable performance. A sudden, unexpected performance drop may happen at any time, because we may meet an unknown IOVA at any time in this case. Once we meet an unknown IOVA, the backend's data path will need to stop and query the mapping of the IOVA via the socket and wait for the reply. And the latency is not negligible (sometimes it's even unacceptable), especially in high performance network case. So maybe it's better to make both of them available to the vhost backend. > > I had an impression that a hardware accelerator was using > VFIO anyway. Given this, can't we have QEMU program > the shadow IOMMU tables into VFIO directly? I think it's a good idea! Currently, my concern about it is that, the hardware device may not use IOMMU and it may have its builtin address translation unit. And it will be a pain for device vendors to teach VFIO to be able to work with the builtin address translation unit. Best regards, Tiwei Bie > > > > --- > > The idea of this patch is to let QEMU push all the IOTLBs > > to vhost-user backend without waiting for the queries from > > the backend. Because hardware accelerator at the vhost-user > > backend may not be able to handle unknown IOVAs. > > > > This is just a RFC for now. It seems that, it doesn't work > > as expected when guest is using kernel driver (To handle > > this case, it seems that some RAM regions' events also need > > to be listened). Any comments would be appreciated! Thanks! > > > > docs/interop/vhost-user.txt | 9 ++++++++ > > hw/virtio/vhost-backend.c | 7 ++++++ > > hw/virtio/vhost-user.c | 8 +++++++ > > hw/virtio/vhost.c | 47 ++++++++++++++++++++++++++++++++++++--- > > include/hw/virtio/vhost-backend.h | 3 +++ > > 5 files changed, 71 insertions(+), 3 deletions(-) > > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt > > index 534caab18a..73e07f9dad 100644 > > --- a/docs/interop/vhost-user.txt > > +++ b/docs/interop/vhost-user.txt > > @@ -380,6 +380,7 @@ Protocol features > > #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 > > #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 > > #define VHOST_USER_PROTOCOL_F_CONFIG 9 > > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10 > > > > Master message types > > -------------------- > > @@ -797,3 +798,11 @@ resilient for selective requests. > > For the message types that already solicit a reply from the client, the > > presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings > > no behavioural change. (See the 'Communication' section for details.) 
> > + > > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > +------------------------------------ > > +By default, vhost-user backend needs to query the IOTLBs from QEMU after > > +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will > > +provide all the IOTLBs to vhost backend without waiting for the queries > > +from backend. This is helpful when using a hardware accelerator which is > > +not able to handle unknown IOVAs at the vhost-user backend. > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c > > index 7f09efab8b..d72994e9a5 100644 > > --- a/hw/virtio/vhost-backend.c > > +++ b/hw/virtio/vhost-backend.c > > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev, > > qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL); > > } > > > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev) > > +{ > > + return false; > > +} > > + > > static const VhostOps kernel_ops = { > > .backend_type = VHOST_BACKEND_TYPE_KERNEL, > > .vhost_backend_init = vhost_kernel_init, > > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = { > > #endif /* CONFIG_VHOST_VSOCK */ > > .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback, > > .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg, > > + .vhost_backend_need_all_device_iotlb = > > + vhost_kernel_need_all_device_iotlb, > > }; > > > > int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > index 38da8692bb..30e0b1c13b 100644 > > --- a/hw/virtio/vhost-user.c > > +++ b/hw/virtio/vhost-user.c > > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature { > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, > > VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, > > VHOST_USER_PROTOCOL_F_CONFIG = 9, > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10, > > VHOST_USER_PROTOCOL_F_MAX > > }; > > > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id) > > return 0; > > } > > > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev) > > +{ > > + return virtio_has_feature(dev->protocol_features, > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB); > > +} > > + > > const VhostOps user_ops = { > > .backend_type = VHOST_BACKEND_TYPE_USER, > > .vhost_backend_init = vhost_user_init, > > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = { > > .vhost_set_config = vhost_user_set_config, > > .vhost_crypto_create_session = vhost_user_crypto_create_session, > > .vhost_crypto_close_session = vhost_user_crypto_close_session, > > + .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb, > > }; > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > index f51bf573d5..70922ee998 100644 > > --- a/hw/virtio/vhost.c > > +++ b/hw/virtio/vhost.c > > @@ -48,6 +48,9 @@ static unsigned int used_memslots; > > static QLIST_HEAD(, vhost_dev) vhost_devices = > > QLIST_HEAD_INITIALIZER(vhost_devices); > > > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa, > > + uint64_t *uaddr, uint64_t *len); > > + > > bool vhost_has_free_slot(void) > > { > > unsigned int slots_limit = ~0U; > > @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener, > > vhost_region_add_section(dev, section); > > } > > > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > { > > struct vhost_iommu *iommu = container_of(n, struct 
vhost_iommu, n); > > struct vhost_dev *hdev = iommu->hdev; > > hwaddr iova = iotlb->iova + iommu->iommu_offset; > > > > + if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { > > + uint64_t uaddr, len; > > + > > + rcu_read_lock(); > > + > > + if (iotlb->target_as != NULL) { > > + if (vhost_memory_region_lookup(hdev, iotlb->translated_addr, > > + &uaddr, &len)) { > > + error_report("Fail to lookup the translated address " > > + "%"PRIx64, iotlb->translated_addr); > > + goto out; > > + } > > + > > + len = MIN(iotlb->addr_mask + 1, len); > > + iova = iova & ~iotlb->addr_mask; > > + > > + if (vhost_backend_update_device_iotlb(hdev, iova, uaddr, > > + len, iotlb->perm)) { > > + error_report("Fail to update device iotlb"); > > + goto out; > > + } > > + } > > +out: > > + rcu_read_unlock(); > > + return; > > + } > > + > > if (vhost_backend_invalidate_device_iotlb(hdev, iova, > > iotlb->addr_mask + 1)) { > > error_report("Fail to invalidate device iotlb"); > > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > struct vhost_dev *dev = container_of(listener, struct vhost_dev, > > iommu_listener); > > struct vhost_iommu *iommu; > > + IOMMUNotifierFlag flags; > > Int128 end; > > > > if (!memory_region_is_iommu(section->mr)) { > > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > end = int128_add(int128_make64(section->offset_within_region), > > section->size); > > end = int128_sub(end, int128_one()); > > - iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify, > > - IOMMU_NOTIFIER_UNMAP, > > + > > + if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) { > > + flags = IOMMU_NOTIFIER_ALL; > > + } else { > > + flags = IOMMU_NOTIFIER_UNMAP; > > + } > > + > > + iommu_notifier_init(&iommu->n, vhost_iommu_map_notify, > > + flags, > > section->offset_within_region, > > int128_get64(end)); > > iommu->mr = section->mr; > > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > memory_region_register_iommu_notifier(section->mr, &iommu->n); > > QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next); > > /* TODO: can replay help performance here? */ > > + if (flags == IOMMU_NOTIFIER_ALL) { > > + memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n); > > + } > > } > > > > static void vhost_iommu_region_del(MemoryListener *listener, > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h > > index 5dac61f9ea..602fd987a4 100644 > > --- a/include/hw/virtio/vhost-backend.h > > +++ b/include/hw/virtio/vhost-backend.h > > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev, > > typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, > > uint64_t session_id); > > > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev); > > + > > typedef struct VhostOps { > > VhostBackendType backend_type; > > vhost_backend_init vhost_backend_init; > > @@ -138,6 +140,7 @@ typedef struct VhostOps { > > vhost_set_config_op vhost_set_config; > > vhost_crypto_create_session_op vhost_crypto_create_session; > > vhost_crypto_close_session_op vhost_crypto_close_session; > > + vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb; > > } VhostOps; > > > > extern const VhostOps user_ops; > > -- > > 2.11.0
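To make the lazy-loading cost described above concrete: without the proposed feature, every unknown IOVA on the data path turns into a synchronous round trip to QEMU. A hedged sketch of that miss path is below; the helper functions are hypothetical placeholders for the backend's own IOTLB and slave-channel code, and only the message-type names in the comments come from the vhost IOTLB API.

```c
/* Hedged sketch of the lazy-loading miss path; helpers are hypothetical. */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers provided elsewhere in the backend. */
void *iotlb_lookup(uint64_t iova, uint64_t len);            /* NULL on miss */
void  iotlb_insert(uint64_t iova, uint64_t uaddr, uint64_t size);
int   slave_send_iotlb_miss(uint64_t iova, uint8_t perm);
int   slave_recv_iotlb_update(uint64_t *iova, uint64_t *uaddr,
                              uint64_t *size);

void *translate_or_stall(uint64_t iova, uint64_t len, uint8_t perm)
{
    void *va = iotlb_lookup(iova, len);

    while (va == NULL) {
        uint64_t miss_iova, uaddr, size;

        /*
         * The data path stalls here: send a VHOST_IOTLB_MISS for the
         * unknown IOVA and block until the matching VHOST_IOTLB_UPDATE
         * arrives.  This round trip is the unpredictable latency that
         * pushing all IOTLB entries up front is meant to avoid.
         */
        if (slave_send_iotlb_miss(iova, perm) < 0 ||
            slave_recv_iotlb_update(&miss_iova, &uaddr, &size) < 0) {
            return NULL;
        }
        iotlb_insert(miss_iova, uaddr, size);
        va = iotlb_lookup(iova, len);
    }
    return va;
}
```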
On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote: > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote: > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > feature for vhost-user. By default, vhost-user backend needs > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > With this protocol feature negotiated, QEMU will provide all > > > the IOTLBs to vhost-user backend without waiting for the > > > queries from backend. This is helpful when using a hardware > > > accelerator which is not able to handle unknown IOVAs at the > > > vhost-user backend. > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> > > > > This is potentially a large amount of data to be sent > > on a socket. > > If we take the hardware accelerator out of this picture, we > will find that it's actually a question of "pre-loading" vs > "lazy-loading". I think neither of them is perfect. > > For "pre-loading", as you said, we may have a tough starting. > But for "lazy-loading", we can't have a predictable performance. > A sudden, unexpected performance drop may happen at any time, > because we may meet an unknown IOVA at any time in this case. That's how hardware behaves too though. So we can expect guests to try to optimize locality. > Once we meet an unknown IOVA, the backend's data path will need > to stop and query the mapping of the IOVA via the socket and > wait for the reply. And the latency is not negligible (sometimes > it's even unacceptable), especially in high performance network > case. So maybe it's better to make both of them available to > the vhost backend. > > > > > I had an impression that a hardware accelerator was using > > VFIO anyway. Given this, can't we have QEMU program > > the shadow IOMMU tables into VFIO directly? > > I think it's a good idea! Currently, my concern about it is > that, the hardware device may not use IOMMU and it may have > its builtin address translation unit. And it will be a pain > for device vendors to teach VFIO to be able to work with the > builtin address translation unit. I think such drivers would have to interate with VFIO somehow. Otherwise, what is the plan for assigning such devices then? > Best regards, > Tiwei Bie > > > > > > > > --- > > > The idea of this patch is to let QEMU push all the IOTLBs > > > to vhost-user backend without waiting for the queries from > > > the backend. Because hardware accelerator at the vhost-user > > > backend may not be able to handle unknown IOVAs. > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > as expected when guest is using kernel driver (To handle > > > this case, it seems that some RAM regions' events also need > > > to be listened). Any comments would be appreciated! Thanks! 
> > > > > > docs/interop/vhost-user.txt | 9 ++++++++ > > > hw/virtio/vhost-backend.c | 7 ++++++ > > > hw/virtio/vhost-user.c | 8 +++++++ > > > hw/virtio/vhost.c | 47 ++++++++++++++++++++++++++++++++++++--- > > > include/hw/virtio/vhost-backend.h | 3 +++ > > > 5 files changed, 71 insertions(+), 3 deletions(-) > > > > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt > > > index 534caab18a..73e07f9dad 100644 > > > --- a/docs/interop/vhost-user.txt > > > +++ b/docs/interop/vhost-user.txt > > > @@ -380,6 +380,7 @@ Protocol features > > > #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 > > > #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 > > > #define VHOST_USER_PROTOCOL_F_CONFIG 9 > > > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10 > > > > > > Master message types > > > -------------------- > > > @@ -797,3 +798,11 @@ resilient for selective requests. > > > For the message types that already solicit a reply from the client, the > > > presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings > > > no behavioural change. (See the 'Communication' section for details.) > > > + > > > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > +------------------------------------ > > > +By default, vhost-user backend needs to query the IOTLBs from QEMU after > > > +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will > > > +provide all the IOTLBs to vhost backend without waiting for the queries > > > +from backend. This is helpful when using a hardware accelerator which is > > > +not able to handle unknown IOVAs at the vhost-user backend. > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c > > > index 7f09efab8b..d72994e9a5 100644 > > > --- a/hw/virtio/vhost-backend.c > > > +++ b/hw/virtio/vhost-backend.c > > > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev, > > > qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL); > > > } > > > > > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev) > > > +{ > > > + return false; > > > +} > > > + > > > static const VhostOps kernel_ops = { > > > .backend_type = VHOST_BACKEND_TYPE_KERNEL, > > > .vhost_backend_init = vhost_kernel_init, > > > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = { > > > #endif /* CONFIG_VHOST_VSOCK */ > > > .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback, > > > .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg, > > > + .vhost_backend_need_all_device_iotlb = > > > + vhost_kernel_need_all_device_iotlb, > > > }; > > > > > > int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > index 38da8692bb..30e0b1c13b 100644 > > > --- a/hw/virtio/vhost-user.c > > > +++ b/hw/virtio/vhost-user.c > > > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature { > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, > > > VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, > > > VHOST_USER_PROTOCOL_F_CONFIG = 9, > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10, > > > VHOST_USER_PROTOCOL_F_MAX > > > }; > > > > > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id) > > > return 0; > > > } > > > > > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev) > > > +{ > > > + return virtio_has_feature(dev->protocol_features, > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB); > > > +} > > > + > > > const VhostOps user_ops = { > > > .backend_type = VHOST_BACKEND_TYPE_USER, > 
> > .vhost_backend_init = vhost_user_init, > > > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = { > > > .vhost_set_config = vhost_user_set_config, > > > .vhost_crypto_create_session = vhost_user_crypto_create_session, > > > .vhost_crypto_close_session = vhost_user_crypto_close_session, > > > + .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb, > > > }; > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > > index f51bf573d5..70922ee998 100644 > > > --- a/hw/virtio/vhost.c > > > +++ b/hw/virtio/vhost.c > > > @@ -48,6 +48,9 @@ static unsigned int used_memslots; > > > static QLIST_HEAD(, vhost_dev) vhost_devices = > > > QLIST_HEAD_INITIALIZER(vhost_devices); > > > > > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa, > > > + uint64_t *uaddr, uint64_t *len); > > > + > > > bool vhost_has_free_slot(void) > > > { > > > unsigned int slots_limit = ~0U; > > > @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener, > > > vhost_region_add_section(dev, section); > > > } > > > > > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > { > > > struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n); > > > struct vhost_dev *hdev = iommu->hdev; > > > hwaddr iova = iotlb->iova + iommu->iommu_offset; > > > > > > + if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { > > > + uint64_t uaddr, len; > > > + > > > + rcu_read_lock(); > > > + > > > + if (iotlb->target_as != NULL) { > > > + if (vhost_memory_region_lookup(hdev, iotlb->translated_addr, > > > + &uaddr, &len)) { > > > + error_report("Fail to lookup the translated address " > > > + "%"PRIx64, iotlb->translated_addr); > > > + goto out; > > > + } > > > + > > > + len = MIN(iotlb->addr_mask + 1, len); > > > + iova = iova & ~iotlb->addr_mask; > > > + > > > + if (vhost_backend_update_device_iotlb(hdev, iova, uaddr, > > > + len, iotlb->perm)) { > > > + error_report("Fail to update device iotlb"); > > > + goto out; > > > + } > > > + } > > > +out: > > > + rcu_read_unlock(); > > > + return; > > > + } > > > + > > > if (vhost_backend_invalidate_device_iotlb(hdev, iova, > > > iotlb->addr_mask + 1)) { > > > error_report("Fail to invalidate device iotlb"); > > > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > struct vhost_dev *dev = container_of(listener, struct vhost_dev, > > > iommu_listener); > > > struct vhost_iommu *iommu; > > > + IOMMUNotifierFlag flags; > > > Int128 end; > > > > > > if (!memory_region_is_iommu(section->mr)) { > > > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > end = int128_add(int128_make64(section->offset_within_region), > > > section->size); > > > end = int128_sub(end, int128_one()); > > > - iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify, > > > - IOMMU_NOTIFIER_UNMAP, > > > + > > > + if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) { > > > + flags = IOMMU_NOTIFIER_ALL; > > > + } else { > > > + flags = IOMMU_NOTIFIER_UNMAP; > > > + } > > > + > > > + iommu_notifier_init(&iommu->n, vhost_iommu_map_notify, > > > + flags, > > > section->offset_within_region, > > > int128_get64(end)); > > > iommu->mr = section->mr; > > > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > memory_region_register_iommu_notifier(section->mr, &iommu->n); > > > QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next); > > > /* TODO: can 
replay help performance here? */ > > > + if (flags == IOMMU_NOTIFIER_ALL) { > > > + memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n); > > > + } > > > } > > > > > > static void vhost_iommu_region_del(MemoryListener *listener, > > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h > > > index 5dac61f9ea..602fd987a4 100644 > > > --- a/include/hw/virtio/vhost-backend.h > > > +++ b/include/hw/virtio/vhost-backend.h > > > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev, > > > typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, > > > uint64_t session_id); > > > > > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev); > > > + > > > typedef struct VhostOps { > > > VhostBackendType backend_type; > > > vhost_backend_init vhost_backend_init; > > > @@ -138,6 +140,7 @@ typedef struct VhostOps { > > > vhost_set_config_op vhost_set_config; > > > vhost_crypto_create_session_op vhost_crypto_create_session; > > > vhost_crypto_close_session_op vhost_crypto_close_session; > > > + vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb; > > > } VhostOps; > > > > > > extern const VhostOps user_ops; > > > -- > > > 2.11.0
On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote: > On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote: > > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote: > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > feature for vhost-user. By default, vhost-user backend needs > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > > With this protocol feature negotiated, QEMU will provide all > > > > the IOTLBs to vhost-user backend without waiting for the > > > > queries from backend. This is helpful when using a hardware > > > > accelerator which is not able to handle unknown IOVAs at the > > > > vhost-user backend. > > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> > > > > > > This is potentially a large amount of data to be sent > > > on a socket. > > > > If we take the hardware accelerator out of this picture, we > > will find that it's actually a question of "pre-loading" vs > > "lazy-loading". I think neither of them is perfect. > > > > For "pre-loading", as you said, we may have a tough starting. > > But for "lazy-loading", we can't have a predictable performance. > > A sudden, unexpected performance drop may happen at any time, > > because we may meet an unknown IOVA at any time in this case. > > That's how hardware behaves too though. So we can expect guests > to try to optimize locality. The difference is that, the software implementation needs to query the mappings via socket. And it's much slower.. > > > Once we meet an unknown IOVA, the backend's data path will need > > to stop and query the mapping of the IOVA via the socket and > > wait for the reply. And the latency is not negligible (sometimes > > it's even unacceptable), especially in high performance network > > case. So maybe it's better to make both of them available to > > the vhost backend. > > > > > > > > I had an impression that a hardware accelerator was using > > > VFIO anyway. Given this, can't we have QEMU program > > > the shadow IOMMU tables into VFIO directly? > > > > I think it's a good idea! Currently, my concern about it is > > that, the hardware device may not use IOMMU and it may have > > its builtin address translation unit. And it will be a pain > > for device vendors to teach VFIO to be able to work with the > > builtin address translation unit. > > I think such drivers would have to interate with VFIO somehow. > Otherwise, what is the plan for assigning such devices then? Such devices are just for vhost data path acceleration. They have many available queue pairs, the switch logic will be done among those queue pairs. And different queue pairs will serve different VMs directly. Best regards, Tiwei Bie > > > > Best regards, > > Tiwei Bie > > > > > > > > > > > > --- > > > > The idea of this patch is to let QEMU push all the IOTLBs > > > > to vhost-user backend without waiting for the queries from > > > > the backend. Because hardware accelerator at the vhost-user > > > > backend may not be able to handle unknown IOVAs. > > > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > > as expected when guest is using kernel driver (To handle > > > > this case, it seems that some RAM regions' events also need > > > > to be listened). Any comments would be appreciated! Thanks! 
> > > > > > > > docs/interop/vhost-user.txt | 9 ++++++++ > > > > hw/virtio/vhost-backend.c | 7 ++++++ > > > > hw/virtio/vhost-user.c | 8 +++++++ > > > > hw/virtio/vhost.c | 47 ++++++++++++++++++++++++++++++++++++--- > > > > include/hw/virtio/vhost-backend.h | 3 +++ > > > > 5 files changed, 71 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt > > > > index 534caab18a..73e07f9dad 100644 > > > > --- a/docs/interop/vhost-user.txt > > > > +++ b/docs/interop/vhost-user.txt > > > > @@ -380,6 +380,7 @@ Protocol features > > > > #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 > > > > #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 > > > > #define VHOST_USER_PROTOCOL_F_CONFIG 9 > > > > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10 > > > > > > > > Master message types > > > > -------------------- > > > > @@ -797,3 +798,11 @@ resilient for selective requests. > > > > For the message types that already solicit a reply from the client, the > > > > presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings > > > > no behavioural change. (See the 'Communication' section for details.) > > > > + > > > > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > +------------------------------------ > > > > +By default, vhost-user backend needs to query the IOTLBs from QEMU after > > > > +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will > > > > +provide all the IOTLBs to vhost backend without waiting for the queries > > > > +from backend. This is helpful when using a hardware accelerator which is > > > > +not able to handle unknown IOVAs at the vhost-user backend. > > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c > > > > index 7f09efab8b..d72994e9a5 100644 > > > > --- a/hw/virtio/vhost-backend.c > > > > +++ b/hw/virtio/vhost-backend.c > > > > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev, > > > > qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL); > > > > } > > > > > > > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev) > > > > +{ > > > > + return false; > > > > +} > > > > + > > > > static const VhostOps kernel_ops = { > > > > .backend_type = VHOST_BACKEND_TYPE_KERNEL, > > > > .vhost_backend_init = vhost_kernel_init, > > > > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = { > > > > #endif /* CONFIG_VHOST_VSOCK */ > > > > .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback, > > > > .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg, > > > > + .vhost_backend_need_all_device_iotlb = > > > > + vhost_kernel_need_all_device_iotlb, > > > > }; > > > > > > > > int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > index 38da8692bb..30e0b1c13b 100644 > > > > --- a/hw/virtio/vhost-user.c > > > > +++ b/hw/virtio/vhost-user.c > > > > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature { > > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, > > > > VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, > > > > VHOST_USER_PROTOCOL_F_CONFIG = 9, > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10, > > > > VHOST_USER_PROTOCOL_F_MAX > > > > }; > > > > > > > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id) > > > > return 0; > > > > } > > > > > > > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev) > > > > +{ > > > > + return 
virtio_has_feature(dev->protocol_features, > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB); > > > > +} > > > > + > > > > const VhostOps user_ops = { > > > > .backend_type = VHOST_BACKEND_TYPE_USER, > > > > .vhost_backend_init = vhost_user_init, > > > > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = { > > > > .vhost_set_config = vhost_user_set_config, > > > > .vhost_crypto_create_session = vhost_user_crypto_create_session, > > > > .vhost_crypto_close_session = vhost_user_crypto_close_session, > > > > + .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb, > > > > }; > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > > > index f51bf573d5..70922ee998 100644 > > > > --- a/hw/virtio/vhost.c > > > > +++ b/hw/virtio/vhost.c > > > > @@ -48,6 +48,9 @@ static unsigned int used_memslots; > > > > static QLIST_HEAD(, vhost_dev) vhost_devices = > > > > QLIST_HEAD_INITIALIZER(vhost_devices); > > > > > > > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa, > > > > + uint64_t *uaddr, uint64_t *len); > > > > + > > > > bool vhost_has_free_slot(void) > > > > { > > > > unsigned int slots_limit = ~0U; > > > > @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener, > > > > vhost_region_add_section(dev, section); > > > > } > > > > > > > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > { > > > > struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n); > > > > struct vhost_dev *hdev = iommu->hdev; > > > > hwaddr iova = iotlb->iova + iommu->iommu_offset; > > > > > > > > + if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { > > > > + uint64_t uaddr, len; > > > > + > > > > + rcu_read_lock(); > > > > + > > > > + if (iotlb->target_as != NULL) { > > > > + if (vhost_memory_region_lookup(hdev, iotlb->translated_addr, > > > > + &uaddr, &len)) { > > > > + error_report("Fail to lookup the translated address " > > > > + "%"PRIx64, iotlb->translated_addr); > > > > + goto out; > > > > + } > > > > + > > > > + len = MIN(iotlb->addr_mask + 1, len); > > > > + iova = iova & ~iotlb->addr_mask; > > > > + > > > > + if (vhost_backend_update_device_iotlb(hdev, iova, uaddr, > > > > + len, iotlb->perm)) { > > > > + error_report("Fail to update device iotlb"); > > > > + goto out; > > > > + } > > > > + } > > > > +out: > > > > + rcu_read_unlock(); > > > > + return; > > > > + } > > > > + > > > > if (vhost_backend_invalidate_device_iotlb(hdev, iova, > > > > iotlb->addr_mask + 1)) { > > > > error_report("Fail to invalidate device iotlb"); > > > > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > struct vhost_dev *dev = container_of(listener, struct vhost_dev, > > > > iommu_listener); > > > > struct vhost_iommu *iommu; > > > > + IOMMUNotifierFlag flags; > > > > Int128 end; > > > > > > > > if (!memory_region_is_iommu(section->mr)) { > > > > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > end = int128_add(int128_make64(section->offset_within_region), > > > > section->size); > > > > end = int128_sub(end, int128_one()); > > > > - iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify, > > > > - IOMMU_NOTIFIER_UNMAP, > > > > + > > > > + if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) { > > > > + flags = IOMMU_NOTIFIER_ALL; > > > > + } else { > > > > + flags = IOMMU_NOTIFIER_UNMAP; > > > > + } > > > > + > > > > + iommu_notifier_init(&iommu->n, 
vhost_iommu_map_notify, > > > > + flags, > > > > section->offset_within_region, > > > > int128_get64(end)); > > > > iommu->mr = section->mr; > > > > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > memory_region_register_iommu_notifier(section->mr, &iommu->n); > > > > QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next); > > > > /* TODO: can replay help performance here? */ > > > > + if (flags == IOMMU_NOTIFIER_ALL) { > > > > + memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n); > > > > + } > > > > } > > > > > > > > static void vhost_iommu_region_del(MemoryListener *listener, > > > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h > > > > index 5dac61f9ea..602fd987a4 100644 > > > > --- a/include/hw/virtio/vhost-backend.h > > > > +++ b/include/hw/virtio/vhost-backend.h > > > > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev, > > > > typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, > > > > uint64_t session_id); > > > > > > > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev); > > > > + > > > > typedef struct VhostOps { > > > > VhostBackendType backend_type; > > > > vhost_backend_init vhost_backend_init; > > > > @@ -138,6 +140,7 @@ typedef struct VhostOps { > > > > vhost_set_config_op vhost_set_config; > > > > vhost_crypto_create_session_op vhost_crypto_create_session; > > > > vhost_crypto_close_session_op vhost_crypto_close_session; > > > > + vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb; > > > > } VhostOps; > > > > > > > > extern const VhostOps user_ops; > > > > -- > > > > 2.11.0
On Wed, Apr 11, 2018 at 08:37:17PM +0300, Michael S. Tsirkin wrote: > On Wed, Apr 11, 2018 at 04:38:53PM +0800, Tiwei Bie wrote: > > On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote: > > > On 2018年04月11日 15:20, Tiwei Bie wrote: > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > feature for vhost-user. By default, vhost-user backend needs > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > > With this protocol feature negotiated, QEMU will provide all > > > > the IOTLBs to vhost-user backend without waiting for the > > > > queries from backend. This is helpful when using a hardware > > > > accelerator which is not able to handle unknown IOVAs at the > > > > vhost-user backend. > > > > > > > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com> > > > > --- > > > > The idea of this patch is to let QEMU push all the IOTLBs > > > > to vhost-user backend without waiting for the queries from > > > > the backend. Because hardware accelerator at the vhost-user > > > > backend may not be able to handle unknown IOVAs. > > > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > > as expected when guest is using kernel driver (To handle > > > > this case, it seems that some RAM regions' events also need > > > > to be listened). Any comments would be appreciated! Thanks! > > > > > > Interesting, a quick question is why this is needed? Can we just use exist > > > IOTLB update message? > > > > Yeah, we are still using the existing IOTLB update messages > > to send the IOTLB messages to backend. The only difference > > is that, QEMU won't wait for the queries before sending the > > IOTLB update messages. > > So I have a concern with that, in that without any flow > control the socket buffer used by vhost-user might become > full. Each IOTLB update message needs a reply. So I think it won't happen. Best regards, Tiwei Bie > > We don't currently expect that. > > > Again, my understanding is that the biggest benefit from use of a > hardware accelerator is when notifications can be passed-through to the > guest. > > And since that involves VFIO, and since VFIO already needs to support > all kinds of IOMMUs, one wonders whether one can just pass that directly > to the VFIO instead of shipping it to vhost-user. > > > > > > > > It looks to me at least kernel does not need this. > > > > Something similar in kernel vhost is that, for kernel vhost, > > QEMU needs to push the IOTLBs of some ring addrs to kernel > > vhost backend without waiting for the queries. > > > > Best regards, > > Tiwei Bie > > That's really to work around a bug in kernel, we keep this around since > we want to support old kernels. > > > > > > > Thanks
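For reference, the exchange discussed above reuses struct vhost_iotlb_msg from the Linux uapi headers as the payload of VHOST_USER_IOTLB_MSG, and, as noted, each update is acknowledged by the backend, which is where the flow control comes from. A minimal sketch of the backend side, assuming hypothetical helpers backend_iotlb_insert() and backend_iotlb_remove() for the backend's local IOTLB cache:

    #include <stdint.h>
    #include <linux/vhost.h>    /* struct vhost_iotlb_msg, VHOST_IOTLB_* */

    /* Hypothetical helpers managing the backend's local IOTLB cache. */
    int64_t backend_iotlb_insert(uint64_t iova, uint64_t size,
                                 uint64_t uaddr, uint8_t perm);
    int64_t backend_iotlb_remove(uint64_t iova, uint64_t size);

    /*
     * Sketch of a backend handling the payload of a VHOST_USER_IOTLB_MSG.
     * The returned status is sent back as the acknowledgement mentioned
     * above, which keeps the socket from filling up with updates.
     */
    static int64_t handle_iotlb_msg(const struct vhost_iotlb_msg *msg)
    {
        switch (msg->type) {
        case VHOST_IOTLB_UPDATE:
            /* Map [iova, iova + size) to uaddr with the given perm. */
            return backend_iotlb_insert(msg->iova, msg->size,
                                        msg->uaddr, msg->perm);
        case VHOST_IOTLB_INVALIDATE:
            return backend_iotlb_remove(msg->iova, msg->size);
        default:
            return -1;
        }
    }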
On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote: > On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote: > > On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote: > > > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote: > > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > feature for vhost-user. By default, vhost-user backend needs > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > > > With this protocol feature negotiated, QEMU will provide all > > > > > the IOTLBs to vhost-user backend without waiting for the > > > > > queries from backend. This is helpful when using a hardware > > > > > accelerator which is not able to handle unknown IOVAs at the > > > > > vhost-user backend. > > > > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> > > > > > > > > This is potentially a large amount of data to be sent > > > > on a socket. > > > > > > If we take the hardware accelerator out of this picture, we > > > will find that it's actually a question of "pre-loading" vs > > > "lazy-loading". I think neither of them is perfect. > > > > > > For "pre-loading", as you said, we may have a tough starting. > > > But for "lazy-loading", we can't have a predictable performance. > > > A sudden, unexpected performance drop may happen at any time, > > > because we may meet an unknown IOVA at any time in this case. > > > > That's how hardware behaves too though. So we can expect guests > > to try to optimize locality. > > The difference is that, the software implementation needs to > query the mappings via socket. And it's much slower.. If you are proposing this new feature as an optimization, then I'd like to see numbers showing the performance gains. It's definitely possible to optimize things out. Pre-loading isn't where I would start optimizing though. For example, DPDK could have its own VTD emulation, then it could access guest memory directly. > > > > > Once we meet an unknown IOVA, the backend's data path will need > > > to stop and query the mapping of the IOVA via the socket and > > > wait for the reply. And the latency is not negligible (sometimes > > > it's even unacceptable), especially in high performance network > > > case. So maybe it's better to make both of them available to > > > the vhost backend. > > > > > > > > > > > I had an impression that a hardware accelerator was using > > > > VFIO anyway. Given this, can't we have QEMU program > > > > the shadow IOMMU tables into VFIO directly? > > > > > > I think it's a good idea! Currently, my concern about it is > > > that, the hardware device may not use IOMMU and it may have > > > its builtin address translation unit. And it will be a pain > > > for device vendors to teach VFIO to be able to work with the > > > builtin address translation unit. > > > > I think such drivers would have to interate with VFIO somehow. > > Otherwise, what is the plan for assigning such devices then? > > Such devices are just for vhost data path acceleration. That's not true I think. E.g. RDMA devices have an on-card MMU. > They have many available queue pairs, the switch logic > will be done among those queue pairs. And different queue > pairs will serve different VMs directly. > > Best regards, > Tiwei Bie The way I would do it is attach different PASID values to different queues. This way you can use the standard IOMMU to enforce protection. 
> > > > > > > Best regards, > > > Tiwei Bie > > > > > > > > > > > > > > > > --- > > > > > The idea of this patch is to let QEMU push all the IOTLBs > > > > > to vhost-user backend without waiting for the queries from > > > > > the backend. Because hardware accelerator at the vhost-user > > > > > backend may not be able to handle unknown IOVAs. > > > > > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > > > as expected when guest is using kernel driver (To handle > > > > > this case, it seems that some RAM regions' events also need > > > > > to be listened). Any comments would be appreciated! Thanks! > > > > > > > > > > docs/interop/vhost-user.txt | 9 ++++++++ > > > > > hw/virtio/vhost-backend.c | 7 ++++++ > > > > > hw/virtio/vhost-user.c | 8 +++++++ > > > > > hw/virtio/vhost.c | 47 ++++++++++++++++++++++++++++++++++++--- > > > > > include/hw/virtio/vhost-backend.h | 3 +++ > > > > > 5 files changed, 71 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt > > > > > index 534caab18a..73e07f9dad 100644 > > > > > --- a/docs/interop/vhost-user.txt > > > > > +++ b/docs/interop/vhost-user.txt > > > > > @@ -380,6 +380,7 @@ Protocol features > > > > > #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 > > > > > #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 > > > > > #define VHOST_USER_PROTOCOL_F_CONFIG 9 > > > > > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10 > > > > > > > > > > Master message types > > > > > -------------------- > > > > > @@ -797,3 +798,11 @@ resilient for selective requests. > > > > > For the message types that already solicit a reply from the client, the > > > > > presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings > > > > > no behavioural change. (See the 'Communication' section for details.) > > > > > + > > > > > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > +------------------------------------ > > > > > +By default, vhost-user backend needs to query the IOTLBs from QEMU after > > > > > +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will > > > > > +provide all the IOTLBs to vhost backend without waiting for the queries > > > > > +from backend. This is helpful when using a hardware accelerator which is > > > > > +not able to handle unknown IOVAs at the vhost-user backend. 
> > > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c > > > > > index 7f09efab8b..d72994e9a5 100644 > > > > > --- a/hw/virtio/vhost-backend.c > > > > > +++ b/hw/virtio/vhost-backend.c > > > > > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev, > > > > > qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL); > > > > > } > > > > > > > > > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev) > > > > > +{ > > > > > + return false; > > > > > +} > > > > > + > > > > > static const VhostOps kernel_ops = { > > > > > .backend_type = VHOST_BACKEND_TYPE_KERNEL, > > > > > .vhost_backend_init = vhost_kernel_init, > > > > > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = { > > > > > #endif /* CONFIG_VHOST_VSOCK */ > > > > > .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback, > > > > > .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg, > > > > > + .vhost_backend_need_all_device_iotlb = > > > > > + vhost_kernel_need_all_device_iotlb, > > > > > }; > > > > > > > > > > int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > > index 38da8692bb..30e0b1c13b 100644 > > > > > --- a/hw/virtio/vhost-user.c > > > > > +++ b/hw/virtio/vhost-user.c > > > > > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature { > > > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, > > > > > VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, > > > > > VHOST_USER_PROTOCOL_F_CONFIG = 9, > > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10, > > > > > VHOST_USER_PROTOCOL_F_MAX > > > > > }; > > > > > > > > > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id) > > > > > return 0; > > > > > } > > > > > > > > > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev) > > > > > +{ > > > > > + return virtio_has_feature(dev->protocol_features, > > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB); > > > > > +} > > > > > + > > > > > const VhostOps user_ops = { > > > > > .backend_type = VHOST_BACKEND_TYPE_USER, > > > > > .vhost_backend_init = vhost_user_init, > > > > > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = { > > > > > .vhost_set_config = vhost_user_set_config, > > > > > .vhost_crypto_create_session = vhost_user_crypto_create_session, > > > > > .vhost_crypto_close_session = vhost_user_crypto_close_session, > > > > > + .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb, > > > > > }; > > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > > > > index f51bf573d5..70922ee998 100644 > > > > > --- a/hw/virtio/vhost.c > > > > > +++ b/hw/virtio/vhost.c > > > > > @@ -48,6 +48,9 @@ static unsigned int used_memslots; > > > > > static QLIST_HEAD(, vhost_dev) vhost_devices = > > > > > QLIST_HEAD_INITIALIZER(vhost_devices); > > > > > > > > > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa, > > > > > + uint64_t *uaddr, uint64_t *len); > > > > > + > > > > > bool vhost_has_free_slot(void) > > > > > { > > > > > unsigned int slots_limit = ~0U; > > > > > @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener, > > > > > vhost_region_add_section(dev, section); > > > > > } > > > > > > > > > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > > { > > > > > struct vhost_iommu *iommu = container_of(n, 
struct vhost_iommu, n); > > > > > struct vhost_dev *hdev = iommu->hdev; > > > > > hwaddr iova = iotlb->iova + iommu->iommu_offset; > > > > > > > > > > + if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { > > > > > + uint64_t uaddr, len; > > > > > + > > > > > + rcu_read_lock(); > > > > > + > > > > > + if (iotlb->target_as != NULL) { > > > > > + if (vhost_memory_region_lookup(hdev, iotlb->translated_addr, > > > > > + &uaddr, &len)) { > > > > > + error_report("Fail to lookup the translated address " > > > > > + "%"PRIx64, iotlb->translated_addr); > > > > > + goto out; > > > > > + } > > > > > + > > > > > + len = MIN(iotlb->addr_mask + 1, len); > > > > > + iova = iova & ~iotlb->addr_mask; > > > > > + > > > > > + if (vhost_backend_update_device_iotlb(hdev, iova, uaddr, > > > > > + len, iotlb->perm)) { > > > > > + error_report("Fail to update device iotlb"); > > > > > + goto out; > > > > > + } > > > > > + } > > > > > +out: > > > > > + rcu_read_unlock(); > > > > > + return; > > > > > + } > > > > > + > > > > > if (vhost_backend_invalidate_device_iotlb(hdev, iova, > > > > > iotlb->addr_mask + 1)) { > > > > > error_report("Fail to invalidate device iotlb"); > > > > > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > struct vhost_dev *dev = container_of(listener, struct vhost_dev, > > > > > iommu_listener); > > > > > struct vhost_iommu *iommu; > > > > > + IOMMUNotifierFlag flags; > > > > > Int128 end; > > > > > > > > > > if (!memory_region_is_iommu(section->mr)) { > > > > > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > end = int128_add(int128_make64(section->offset_within_region), > > > > > section->size); > > > > > end = int128_sub(end, int128_one()); > > > > > - iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify, > > > > > - IOMMU_NOTIFIER_UNMAP, > > > > > + > > > > > + if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) { > > > > > + flags = IOMMU_NOTIFIER_ALL; > > > > > + } else { > > > > > + flags = IOMMU_NOTIFIER_UNMAP; > > > > > + } > > > > > + > > > > > + iommu_notifier_init(&iommu->n, vhost_iommu_map_notify, > > > > > + flags, > > > > > section->offset_within_region, > > > > > int128_get64(end)); > > > > > iommu->mr = section->mr; > > > > > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > memory_region_register_iommu_notifier(section->mr, &iommu->n); > > > > > QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next); > > > > > /* TODO: can replay help performance here? 
*/ > > > > > + if (flags == IOMMU_NOTIFIER_ALL) { > > > > > + memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n); > > > > > + } > > > > > } > > > > > > > > > > static void vhost_iommu_region_del(MemoryListener *listener, > > > > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h > > > > > index 5dac61f9ea..602fd987a4 100644 > > > > > --- a/include/hw/virtio/vhost-backend.h > > > > > +++ b/include/hw/virtio/vhost-backend.h > > > > > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev, > > > > > typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, > > > > > uint64_t session_id); > > > > > > > > > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev); > > > > > + > > > > > typedef struct VhostOps { > > > > > VhostBackendType backend_type; > > > > > vhost_backend_init vhost_backend_init; > > > > > @@ -138,6 +140,7 @@ typedef struct VhostOps { > > > > > vhost_set_config_op vhost_set_config; > > > > > vhost_crypto_create_session_op vhost_crypto_create_session; > > > > > vhost_crypto_close_session_op vhost_crypto_close_session; > > > > > + vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb; > > > > > } VhostOps; > > > > > > > > > > extern const VhostOps user_ops; > > > > > -- > > > > > 2.11.0
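On the "pre-loading" versus "lazy-loading" point above, the cost difference sits in the backend's datapath: a software backend can stop and fetch a missing translation, but that fetch is a blocking socket round trip to QEMU. A rough sketch of that datapath, where struct backend, iotlb_lookup(), request_iotlb_miss() and wait_for_iotlb_update() are illustrative names rather than existing APIs:

    #include <stdint.h>

    struct backend;             /* opaque, illustrative only */

    void *iotlb_lookup(struct backend *b, uint64_t iova, uint64_t len);
    void request_iotlb_miss(struct backend *b, uint64_t iova);
    void wait_for_iotlb_update(struct backend *b, uint64_t iova);

    /*
     * Illustrative IOVA translation in a vhost-user backend datapath.
     * With lazy-loading the miss branch stalls packet processing on a
     * socket round trip; with all IOTLBs pushed up front (or with a
     * device that cannot fault at all) the miss branch must never be hit.
     */
    static void *translate_iova(struct backend *b, uint64_t iova, uint64_t len)
    {
        void *va = iotlb_lookup(b, iova, len);      /* local cache */

        if (va != NULL) {
            return va;                              /* fast path */
        }

        /* Slow path: report VHOST_IOTLB_MISS to QEMU and wait for the
         * corresponding VHOST_IOTLB_UPDATE before continuing. */
        request_iotlb_miss(b, iova);
        wait_for_iotlb_update(b, iova);

        return iotlb_lookup(b, iova, len);
    }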
On Thu, Apr 12, 2018 at 04:57:13AM +0300, Michael S. Tsirkin wrote: > On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote: > > On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote: > > > On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote: > > > > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote: > > > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: > > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > > feature for vhost-user. By default, vhost-user backend needs > > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > > > > With this protocol feature negotiated, QEMU will provide all > > > > > > the IOTLBs to vhost-user backend without waiting for the > > > > > > queries from backend. This is helpful when using a hardware > > > > > > accelerator which is not able to handle unknown IOVAs at the > > > > > > vhost-user backend. > > > > > > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> > > > > > > > > > > This is potentially a large amount of data to be sent > > > > > on a socket. > > > > > > > > If we take the hardware accelerator out of this picture, we > > > > will find that it's actually a question of "pre-loading" vs > > > > "lazy-loading". I think neither of them is perfect. > > > > > > > > For "pre-loading", as you said, we may have a tough starting. > > > > But for "lazy-loading", we can't have a predictable performance. > > > > A sudden, unexpected performance drop may happen at any time, > > > > because we may meet an unknown IOVA at any time in this case. > > > > > > That's how hardware behaves too though. So we can expect guests > > > to try to optimize locality. > > > > The difference is that, the software implementation needs to > > query the mappings via socket. And it's much slower.. > > If you are proposing this new feature as an optimization, > then I'd like to see numbers showing the performance gains. > > It's definitely possible to optimize things out. Pre-loading isn't > where I would start optimizing though. For example, DPDK could have its > own VTD emulation, then it could access guest memory directly. > > > > > > > > > Once we meet an unknown IOVA, the backend's data path will need > > > > to stop and query the mapping of the IOVA via the socket and > > > > wait for the reply. And the latency is not negligible (sometimes > > > > it's even unacceptable), especially in high performance network > > > > case. So maybe it's better to make both of them available to > > > > the vhost backend. > > > > > > > > > > > > > > I had an impression that a hardware accelerator was using > > > > > VFIO anyway. Given this, can't we have QEMU program > > > > > the shadow IOMMU tables into VFIO directly? > > > > > > > > I think it's a good idea! Currently, my concern about it is > > > > that, the hardware device may not use IOMMU and it may have > > > > its builtin address translation unit. And it will be a pain > > > > for device vendors to teach VFIO to be able to work with the > > > > builtin address translation unit. > > > > > > I think such drivers would have to interate with VFIO somehow. > > > Otherwise, what is the plan for assigning such devices then? > > > > Such devices are just for vhost data path acceleration. > > That's not true I think. E.g. RDMA devices have an on-card MMU. > > > They have many available queue pairs, the switch logic > > will be done among those queue pairs. And different queue > > pairs will serve different VMs directly. 
> > > > Best regards, > > Tiwei Bie > > The way I would do it is attach different PASID values to > different queues. This way you can use the standard IOMMU > to enforce protection. I'm thinking about the case that device wants to use its builtin address translation, although I'm not really sure whether there will be a real product work in this way. Honestly, I like your idea, and I don't object to it. I'll do more investigations on it. And for the feature proposed in this RFC, I just think maybe it's better to provide one more possibility for the backend to support vIOMMU. Anyway, the work about adding the vIOMMU support in vDPA is just started few days ago. I'll do more investigations on each possibility. Thanks! :) Best regards, Tiwei Bie > > > > > > > > > > > > Best regards, > > > > Tiwei Bie > > > > > > > > > > > > > > > > > > > > --- > > > > > > The idea of this patch is to let QEMU push all the IOTLBs > > > > > > to vhost-user backend without waiting for the queries from > > > > > > the backend. Because hardware accelerator at the vhost-user > > > > > > backend may not be able to handle unknown IOVAs. > > > > > > > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > > > > as expected when guest is using kernel driver (To handle > > > > > > this case, it seems that some RAM regions' events also need > > > > > > to be listened). Any comments would be appreciated! Thanks! > > > > > > > > > > > > docs/interop/vhost-user.txt | 9 ++++++++ > > > > > > hw/virtio/vhost-backend.c | 7 ++++++ > > > > > > hw/virtio/vhost-user.c | 8 +++++++ > > > > > > hw/virtio/vhost.c | 47 ++++++++++++++++++++++++++++++++++++--- > > > > > > include/hw/virtio/vhost-backend.h | 3 +++ > > > > > > 5 files changed, 71 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt > > > > > > index 534caab18a..73e07f9dad 100644 > > > > > > --- a/docs/interop/vhost-user.txt > > > > > > +++ b/docs/interop/vhost-user.txt > > > > > > @@ -380,6 +380,7 @@ Protocol features > > > > > > #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 > > > > > > #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 > > > > > > #define VHOST_USER_PROTOCOL_F_CONFIG 9 > > > > > > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10 > > > > > > > > > > > > Master message types > > > > > > -------------------- > > > > > > @@ -797,3 +798,11 @@ resilient for selective requests. > > > > > > For the message types that already solicit a reply from the client, the > > > > > > presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings > > > > > > no behavioural change. (See the 'Communication' section for details.) > > > > > > + > > > > > > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > > +------------------------------------ > > > > > > +By default, vhost-user backend needs to query the IOTLBs from QEMU after > > > > > > +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will > > > > > > +provide all the IOTLBs to vhost backend without waiting for the queries > > > > > > +from backend. This is helpful when using a hardware accelerator which is > > > > > > +not able to handle unknown IOVAs at the vhost-user backend. 
> > > > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c > > > > > > index 7f09efab8b..d72994e9a5 100644 > > > > > > --- a/hw/virtio/vhost-backend.c > > > > > > +++ b/hw/virtio/vhost-backend.c > > > > > > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev, > > > > > > qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL); > > > > > > } > > > > > > > > > > > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev) > > > > > > +{ > > > > > > + return false; > > > > > > +} > > > > > > + > > > > > > static const VhostOps kernel_ops = { > > > > > > .backend_type = VHOST_BACKEND_TYPE_KERNEL, > > > > > > .vhost_backend_init = vhost_kernel_init, > > > > > > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = { > > > > > > #endif /* CONFIG_VHOST_VSOCK */ > > > > > > .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback, > > > > > > .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg, > > > > > > + .vhost_backend_need_all_device_iotlb = > > > > > > + vhost_kernel_need_all_device_iotlb, > > > > > > }; > > > > > > > > > > > > int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) > > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > > > index 38da8692bb..30e0b1c13b 100644 > > > > > > --- a/hw/virtio/vhost-user.c > > > > > > +++ b/hw/virtio/vhost-user.c > > > > > > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature { > > > > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, > > > > > > VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, > > > > > > VHOST_USER_PROTOCOL_F_CONFIG = 9, > > > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10, > > > > > > VHOST_USER_PROTOCOL_F_MAX > > > > > > }; > > > > > > > > > > > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id) > > > > > > return 0; > > > > > > } > > > > > > > > > > > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev) > > > > > > +{ > > > > > > + return virtio_has_feature(dev->protocol_features, > > > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB); > > > > > > +} > > > > > > + > > > > > > const VhostOps user_ops = { > > > > > > .backend_type = VHOST_BACKEND_TYPE_USER, > > > > > > .vhost_backend_init = vhost_user_init, > > > > > > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = { > > > > > > .vhost_set_config = vhost_user_set_config, > > > > > > .vhost_crypto_create_session = vhost_user_crypto_create_session, > > > > > > .vhost_crypto_close_session = vhost_user_crypto_close_session, > > > > > > + .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb, > > > > > > }; > > > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > > > > > index f51bf573d5..70922ee998 100644 > > > > > > --- a/hw/virtio/vhost.c > > > > > > +++ b/hw/virtio/vhost.c > > > > > > @@ -48,6 +48,9 @@ static unsigned int used_memslots; > > > > > > static QLIST_HEAD(, vhost_dev) vhost_devices = > > > > > > QLIST_HEAD_INITIALIZER(vhost_devices); > > > > > > > > > > > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa, > > > > > > + uint64_t *uaddr, uint64_t *len); > > > > > > + > > > > > > bool vhost_has_free_slot(void) > > > > > > { > > > > > > unsigned int slots_limit = ~0U; > > > > > > @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener, > > > > > > vhost_region_add_section(dev, section); > > > > > > } > > > > > > > > > > > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > 
> > > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > > > { > > > > > > struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n); > > > > > > struct vhost_dev *hdev = iommu->hdev; > > > > > > hwaddr iova = iotlb->iova + iommu->iommu_offset; > > > > > > > > > > > > + if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { > > > > > > + uint64_t uaddr, len; > > > > > > + > > > > > > + rcu_read_lock(); > > > > > > + > > > > > > + if (iotlb->target_as != NULL) { > > > > > > + if (vhost_memory_region_lookup(hdev, iotlb->translated_addr, > > > > > > + &uaddr, &len)) { > > > > > > + error_report("Fail to lookup the translated address " > > > > > > + "%"PRIx64, iotlb->translated_addr); > > > > > > + goto out; > > > > > > + } > > > > > > + > > > > > > + len = MIN(iotlb->addr_mask + 1, len); > > > > > > + iova = iova & ~iotlb->addr_mask; > > > > > > + > > > > > > + if (vhost_backend_update_device_iotlb(hdev, iova, uaddr, > > > > > > + len, iotlb->perm)) { > > > > > > + error_report("Fail to update device iotlb"); > > > > > > + goto out; > > > > > > + } > > > > > > + } > > > > > > +out: > > > > > > + rcu_read_unlock(); > > > > > > + return; > > > > > > + } > > > > > > + > > > > > > if (vhost_backend_invalidate_device_iotlb(hdev, iova, > > > > > > iotlb->addr_mask + 1)) { > > > > > > error_report("Fail to invalidate device iotlb"); > > > > > > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > struct vhost_dev *dev = container_of(listener, struct vhost_dev, > > > > > > iommu_listener); > > > > > > struct vhost_iommu *iommu; > > > > > > + IOMMUNotifierFlag flags; > > > > > > Int128 end; > > > > > > > > > > > > if (!memory_region_is_iommu(section->mr)) { > > > > > > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > end = int128_add(int128_make64(section->offset_within_region), > > > > > > section->size); > > > > > > end = int128_sub(end, int128_one()); > > > > > > - iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify, > > > > > > - IOMMU_NOTIFIER_UNMAP, > > > > > > + > > > > > > + if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) { > > > > > > + flags = IOMMU_NOTIFIER_ALL; > > > > > > + } else { > > > > > > + flags = IOMMU_NOTIFIER_UNMAP; > > > > > > + } > > > > > > + > > > > > > + iommu_notifier_init(&iommu->n, vhost_iommu_map_notify, > > > > > > + flags, > > > > > > section->offset_within_region, > > > > > > int128_get64(end)); > > > > > > iommu->mr = section->mr; > > > > > > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > memory_region_register_iommu_notifier(section->mr, &iommu->n); > > > > > > QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next); > > > > > > /* TODO: can replay help performance here? 
*/ > > > > > > + if (flags == IOMMU_NOTIFIER_ALL) { > > > > > > + memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n); > > > > > > + } > > > > > > } > > > > > > > > > > > > static void vhost_iommu_region_del(MemoryListener *listener, > > > > > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h > > > > > > index 5dac61f9ea..602fd987a4 100644 > > > > > > --- a/include/hw/virtio/vhost-backend.h > > > > > > +++ b/include/hw/virtio/vhost-backend.h > > > > > > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev, > > > > > > typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, > > > > > > uint64_t session_id); > > > > > > > > > > > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev); > > > > > > + > > > > > > typedef struct VhostOps { > > > > > > VhostBackendType backend_type; > > > > > > vhost_backend_init vhost_backend_init; > > > > > > @@ -138,6 +140,7 @@ typedef struct VhostOps { > > > > > > vhost_set_config_op vhost_set_config; > > > > > > vhost_crypto_create_session_op vhost_crypto_create_session; > > > > > > vhost_crypto_close_session_op vhost_crypto_close_session; > > > > > > + vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb; > > > > > > } VhostOps; > > > > > > > > > > > > extern const VhostOps user_ops; > > > > > > -- > > > > > > 2.11.0
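To illustrate the "builtin address translation unit" case mentioned above: such a device cannot pause and ask for a missing translation, so the backend has to program the device's translation table from the updates QEMU pushes, before the device ever sees the corresponding IOVA. A purely hypothetical sketch, where struct accel_dev and dev_map_iova() stand in for a vendor-specific interface:

    #include <stdint.h>
    #include <linux/vhost.h>    /* struct vhost_iotlb_msg, VHOST_IOTLB_* */

    struct accel_dev;           /* opaque, stands in for a vendor device */

    /* Hypothetical vendor-specific call programming the on-device
     * translation unit. */
    int dev_map_iova(struct accel_dev *dev, uint64_t iova, uint64_t size,
                     uint64_t uaddr, uint8_t perm);

    /*
     * Apply a pushed VHOST_IOTLB_UPDATE to the device.  Since the device
     * cannot handle unknown IOVAs, every mapping must be installed ahead
     * of use, which is the motivation for the feature in this RFC.
     * Invalidations would be handled symmetrically and are omitted here.
     */
    static int apply_iotlb_update(struct accel_dev *dev,
                                  const struct vhost_iotlb_msg *msg)
    {
        if (msg->type != VHOST_IOTLB_UPDATE) {
            return 0;           /* only updates are shown in this sketch */
        }
        return dev_map_iova(dev, msg->iova, msg->size,
                            msg->uaddr, msg->perm);
    }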
On Thu, Apr 12, 2018 at 10:35:05AM +0800, Tiwei Bie wrote: > On Thu, Apr 12, 2018 at 04:57:13AM +0300, Michael S. Tsirkin wrote: > > On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote: > > > On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote: > > > > On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote: > > > > > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote: > > > > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: > > > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > > > feature for vhost-user. By default, vhost-user backend needs > > > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > > > > > With this protocol feature negotiated, QEMU will provide all > > > > > > > the IOTLBs to vhost-user backend without waiting for the > > > > > > > queries from backend. This is helpful when using a hardware > > > > > > > accelerator which is not able to handle unknown IOVAs at the > > > > > > > vhost-user backend. > > > > > > > > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> > > > > > > > > > > > > This is potentially a large amount of data to be sent > > > > > > on a socket. > > > > > > > > > > If we take the hardware accelerator out of this picture, we > > > > > will find that it's actually a question of "pre-loading" vs > > > > > "lazy-loading". I think neither of them is perfect. > > > > > > > > > > For "pre-loading", as you said, we may have a tough starting. > > > > > But for "lazy-loading", we can't have a predictable performance. > > > > > A sudden, unexpected performance drop may happen at any time, > > > > > because we may meet an unknown IOVA at any time in this case. > > > > > > > > That's how hardware behaves too though. So we can expect guests > > > > to try to optimize locality. > > > > > > The difference is that, the software implementation needs to > > > query the mappings via socket. And it's much slower.. > > > > If you are proposing this new feature as an optimization, > > then I'd like to see numbers showing the performance gains. > > > > It's definitely possible to optimize things out. Pre-loading isn't > > where I would start optimizing though. For example, DPDK could have its > > own VTD emulation, then it could access guest memory directly. > > > > > > > > > > > > > Once we meet an unknown IOVA, the backend's data path will need > > > > > to stop and query the mapping of the IOVA via the socket and > > > > > wait for the reply. And the latency is not negligible (sometimes > > > > > it's even unacceptable), especially in high performance network > > > > > case. So maybe it's better to make both of them available to > > > > > the vhost backend. > > > > > > > > > > > > > > > > > I had an impression that a hardware accelerator was using > > > > > > VFIO anyway. Given this, can't we have QEMU program > > > > > > the shadow IOMMU tables into VFIO directly? > > > > > > > > > > I think it's a good idea! Currently, my concern about it is > > > > > that, the hardware device may not use IOMMU and it may have > > > > > its builtin address translation unit. And it will be a pain > > > > > for device vendors to teach VFIO to be able to work with the > > > > > builtin address translation unit. > > > > > > > > I think such drivers would have to interate with VFIO somehow. > > > > Otherwise, what is the plan for assigning such devices then? > > > > > > Such devices are just for vhost data path acceleration. > > > > That's not true I think. E.g. 
RDMA devices have an on-card MMU. > > > > > They have many available queue pairs, the switch logic > > > will be done among those queue pairs. And different queue > > > pairs will serve different VMs directly. > > > > > > Best regards, > > > Tiwei Bie > > > > The way I would do it is attach different PASID values to > > different queues. This way you can use the standard IOMMU > > to enforce protection. > > I'm thinking about the case that device wants to use its > builtin address translation, although I'm not really sure > whether there will be a real product work in this way. > > Honestly, I like your idea, and I don't object to it. I'll > do more investigations on it. And for the feature proposed > in this RFC, I just think maybe it's better to provide one > more possibility for the backend to support vIOMMU. > > Anyway, the work about adding the vIOMMU support in vDPA is > just started few days ago. I'll do more investigations on > each possibility. Thanks! :) > > Best regards, > Tiwei Bie There are more advantages to using request with PASID: You can use hardware support for nesting, having guest supply 1st level translation and host second level translation. I actually had an idea to do something like this for AMD and ARM which support nesting even for requests with PASID, having intel benefit too would be nice. > > > > > > > > > > > > > > > > > Best regards, > > > > > Tiwei Bie > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > The idea of this patch is to let QEMU push all the IOTLBs > > > > > > > to vhost-user backend without waiting for the queries from > > > > > > > the backend. Because hardware accelerator at the vhost-user > > > > > > > backend may not be able to handle unknown IOVAs. > > > > > > > > > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > > > > > as expected when guest is using kernel driver (To handle > > > > > > > this case, it seems that some RAM regions' events also need > > > > > > > to be listened). Any comments would be appreciated! Thanks! > > > > > > > > > > > > > > docs/interop/vhost-user.txt | 9 ++++++++ > > > > > > > hw/virtio/vhost-backend.c | 7 ++++++ > > > > > > > hw/virtio/vhost-user.c | 8 +++++++ > > > > > > > hw/virtio/vhost.c | 47 ++++++++++++++++++++++++++++++++++++--- > > > > > > > include/hw/virtio/vhost-backend.h | 3 +++ > > > > > > > 5 files changed, 71 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt > > > > > > > index 534caab18a..73e07f9dad 100644 > > > > > > > --- a/docs/interop/vhost-user.txt > > > > > > > +++ b/docs/interop/vhost-user.txt > > > > > > > @@ -380,6 +380,7 @@ Protocol features > > > > > > > #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 > > > > > > > #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 > > > > > > > #define VHOST_USER_PROTOCOL_F_CONFIG 9 > > > > > > > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10 > > > > > > > > > > > > > > Master message types > > > > > > > -------------------- > > > > > > > @@ -797,3 +798,11 @@ resilient for selective requests. > > > > > > > For the message types that already solicit a reply from the client, the > > > > > > > presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings > > > > > > > no behavioural change. (See the 'Communication' section for details.) 
> > > > > > > + > > > > > > > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > > > +------------------------------------ > > > > > > > +By default, vhost-user backend needs to query the IOTLBs from QEMU after > > > > > > > +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will > > > > > > > +provide all the IOTLBs to vhost backend without waiting for the queries > > > > > > > +from backend. This is helpful when using a hardware accelerator which is > > > > > > > +not able to handle unknown IOVAs at the vhost-user backend. > > > > > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c > > > > > > > index 7f09efab8b..d72994e9a5 100644 > > > > > > > --- a/hw/virtio/vhost-backend.c > > > > > > > +++ b/hw/virtio/vhost-backend.c > > > > > > > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev, > > > > > > > qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL); > > > > > > > } > > > > > > > > > > > > > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev) > > > > > > > +{ > > > > > > > + return false; > > > > > > > +} > > > > > > > + > > > > > > > static const VhostOps kernel_ops = { > > > > > > > .backend_type = VHOST_BACKEND_TYPE_KERNEL, > > > > > > > .vhost_backend_init = vhost_kernel_init, > > > > > > > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = { > > > > > > > #endif /* CONFIG_VHOST_VSOCK */ > > > > > > > .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback, > > > > > > > .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg, > > > > > > > + .vhost_backend_need_all_device_iotlb = > > > > > > > + vhost_kernel_need_all_device_iotlb, > > > > > > > }; > > > > > > > > > > > > > > int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) > > > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > > > > index 38da8692bb..30e0b1c13b 100644 > > > > > > > --- a/hw/virtio/vhost-user.c > > > > > > > +++ b/hw/virtio/vhost-user.c > > > > > > > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature { > > > > > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, > > > > > > > VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, > > > > > > > VHOST_USER_PROTOCOL_F_CONFIG = 9, > > > > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10, > > > > > > > VHOST_USER_PROTOCOL_F_MAX > > > > > > > }; > > > > > > > > > > > > > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id) > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > > > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev) > > > > > > > +{ > > > > > > > + return virtio_has_feature(dev->protocol_features, > > > > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB); > > > > > > > +} > > > > > > > + > > > > > > > const VhostOps user_ops = { > > > > > > > .backend_type = VHOST_BACKEND_TYPE_USER, > > > > > > > .vhost_backend_init = vhost_user_init, > > > > > > > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = { > > > > > > > .vhost_set_config = vhost_user_set_config, > > > > > > > .vhost_crypto_create_session = vhost_user_crypto_create_session, > > > > > > > .vhost_crypto_close_session = vhost_user_crypto_close_session, > > > > > > > + .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb, > > > > > > > }; > > > > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > > > > > > index f51bf573d5..70922ee998 100644 > > > > > > > --- a/hw/virtio/vhost.c > > > > > > > +++ b/hw/virtio/vhost.c > > > > > > > @@ -48,6 +48,9 @@ static 
unsigned int used_memslots; > > > > > > > static QLIST_HEAD(, vhost_dev) vhost_devices = > > > > > > > QLIST_HEAD_INITIALIZER(vhost_devices); > > > > > > > > > > > > > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa, > > > > > > > + uint64_t *uaddr, uint64_t *len); > > > > > > > + > > > > > > > bool vhost_has_free_slot(void) > > > > > > > { > > > > > > > unsigned int slots_limit = ~0U; > > > > > > > @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener, > > > > > > > vhost_region_add_section(dev, section); > > > > > > > } > > > > > > > > > > > > > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > > > > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > > > > { > > > > > > > struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n); > > > > > > > struct vhost_dev *hdev = iommu->hdev; > > > > > > > hwaddr iova = iotlb->iova + iommu->iommu_offset; > > > > > > > > > > > > > > + if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { > > > > > > > + uint64_t uaddr, len; > > > > > > > + > > > > > > > + rcu_read_lock(); > > > > > > > + > > > > > > > + if (iotlb->target_as != NULL) { > > > > > > > + if (vhost_memory_region_lookup(hdev, iotlb->translated_addr, > > > > > > > + &uaddr, &len)) { > > > > > > > + error_report("Fail to lookup the translated address " > > > > > > > + "%"PRIx64, iotlb->translated_addr); > > > > > > > + goto out; > > > > > > > + } > > > > > > > + > > > > > > > + len = MIN(iotlb->addr_mask + 1, len); > > > > > > > + iova = iova & ~iotlb->addr_mask; > > > > > > > + > > > > > > > + if (vhost_backend_update_device_iotlb(hdev, iova, uaddr, > > > > > > > + len, iotlb->perm)) { > > > > > > > + error_report("Fail to update device iotlb"); > > > > > > > + goto out; > > > > > > > + } > > > > > > > + } > > > > > > > +out: > > > > > > > + rcu_read_unlock(); > > > > > > > + return; > > > > > > > + } > > > > > > > + > > > > > > > if (vhost_backend_invalidate_device_iotlb(hdev, iova, > > > > > > > iotlb->addr_mask + 1)) { > > > > > > > error_report("Fail to invalidate device iotlb"); > > > > > > > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > > struct vhost_dev *dev = container_of(listener, struct vhost_dev, > > > > > > > iommu_listener); > > > > > > > struct vhost_iommu *iommu; > > > > > > > + IOMMUNotifierFlag flags; > > > > > > > Int128 end; > > > > > > > > > > > > > > if (!memory_region_is_iommu(section->mr)) { > > > > > > > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > > end = int128_add(int128_make64(section->offset_within_region), > > > > > > > section->size); > > > > > > > end = int128_sub(end, int128_one()); > > > > > > > - iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify, > > > > > > > - IOMMU_NOTIFIER_UNMAP, > > > > > > > + > > > > > > > + if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) { > > > > > > > + flags = IOMMU_NOTIFIER_ALL; > > > > > > > + } else { > > > > > > > + flags = IOMMU_NOTIFIER_UNMAP; > > > > > > > + } > > > > > > > + > > > > > > > + iommu_notifier_init(&iommu->n, vhost_iommu_map_notify, > > > > > > > + flags, > > > > > > > section->offset_within_region, > > > > > > > int128_get64(end)); > > > > > > > iommu->mr = section->mr; > > > > > > > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > > memory_region_register_iommu_notifier(section->mr, &iommu->n); > > > > > > > 
QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next); > > > > > > > /* TODO: can replay help performance here? */ > > > > > > > + if (flags == IOMMU_NOTIFIER_ALL) { > > > > > > > + memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n); > > > > > > > + } > > > > > > > } > > > > > > > > > > > > > > static void vhost_iommu_region_del(MemoryListener *listener, > > > > > > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h > > > > > > > index 5dac61f9ea..602fd987a4 100644 > > > > > > > --- a/include/hw/virtio/vhost-backend.h > > > > > > > +++ b/include/hw/virtio/vhost-backend.h > > > > > > > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev, > > > > > > > typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, > > > > > > > uint64_t session_id); > > > > > > > > > > > > > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev); > > > > > > > + > > > > > > > typedef struct VhostOps { > > > > > > > VhostBackendType backend_type; > > > > > > > vhost_backend_init vhost_backend_init; > > > > > > > @@ -138,6 +140,7 @@ typedef struct VhostOps { > > > > > > > vhost_set_config_op vhost_set_config; > > > > > > > vhost_crypto_create_session_op vhost_crypto_create_session; > > > > > > > vhost_crypto_close_session_op vhost_crypto_close_session; > > > > > > > + vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb; > > > > > > > } VhostOps; > > > > > > > > > > > > > > extern const VhostOps user_ops; > > > > > > > -- > > > > > > > 2.11.0
On 2018年04月12日 01:00, Michael S. Tsirkin wrote: > On Wed, Apr 11, 2018 at 09:41:05PM +0800, Jason Wang wrote: >> On 2018年04月11日 16:38, Tiwei Bie wrote: >>> On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote: >>>> On 2018年04月11日 15:20, Tiwei Bie wrote: >>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB >>>>> feature for vhost-user. By default, vhost-user backend needs >>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs. >>>>> With this protocol feature negotiated, QEMU will provide all >>>>> the IOTLBs to vhost-user backend without waiting for the >>>>> queries from backend. This is helpful when using a hardware >>>>> accelerator which is not able to handle unknown IOVAs at the >>>>> vhost-user backend. >>>>> >>>>> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com> >>>>> --- >>>>> The idea of this patch is to let QEMU push all the IOTLBs >>>>> to vhost-user backend without waiting for the queries from >>>>> the backend. Because hardware accelerator at the vhost-user >>>>> backend may not be able to handle unknown IOVAs. >>>>> >>>>> This is just a RFC for now. It seems that, it doesn't work >>>>> as expected when guest is using kernel driver (To handle >>>>> this case, it seems that some RAM regions' events also need >>>>> to be listened). Any comments would be appreciated! Thanks! >>>> Interesting, a quick question is why this is needed? Can we just use exist >>>> IOTLB update message? >>> Yeah, we are still using the existing IOTLB update messages >>> to send the IOTLB messages to backend. The only difference >>> is that, QEMU won't wait for the queries before sending the >>> IOTLB update messages. >> Yes, my question is not very clear. I mean why must need a new feature bit? >> It looks to me qemu code can work without this. >> >> Thanks > Generally we avoid adding new messages without a protocol feature bit. > While careful analysis might sometimes prove it's not a strict > requirement, it's just overall a clean and robust approach. > Right, but it looks like the patch does not introduce any new type of messages. Thanks
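For completeness, negotiating the bit would follow the existing protocol-feature handshake and, as noted, adds no new message type; only QEMU's behaviour changes once the bit is acked. A minimal backend-side sketch, assuming the value 10 from the patch above (the surrounding GET/SET_PROTOCOL_FEATURES plumbing is omitted):

    #include <stdbool.h>
    #include <stdint.h>

    /* Value taken from the patch above. */
    #define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10

    /* Protocol features the backend offers in its GET_PROTOCOL_FEATURES
     * reply; other bits are elided for brevity. */
    static uint64_t offered_protocol_features(void)
    {
        return 1ULL << VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB;
    }

    /* After SET_PROTOCOL_FEATURES, decide whether to expect pushed IOTLB
     * updates or to fall back to sending miss requests on demand. */
    static bool expect_pushed_iotlbs(uint64_t acked_protocol_features)
    {
        return (acked_protocol_features &
                (1ULL << VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB)) != 0;
    }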
On Thu, Apr 12, 2018 at 06:20:55AM +0300, Michael S. Tsirkin wrote: > On Thu, Apr 12, 2018 at 10:35:05AM +0800, Tiwei Bie wrote: > > On Thu, Apr 12, 2018 at 04:57:13AM +0300, Michael S. Tsirkin wrote: > > > On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote: > > > > On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote: > > > > > On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote: > > > > > > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote: > > > > > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: > > > > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > > > > feature for vhost-user. By default, vhost-user backend needs > > > > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > > > > > > With this protocol feature negotiated, QEMU will provide all > > > > > > > > the IOTLBs to vhost-user backend without waiting for the > > > > > > > > queries from backend. This is helpful when using a hardware > > > > > > > > accelerator which is not able to handle unknown IOVAs at the > > > > > > > > vhost-user backend. > > > > > > > > > > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> > > > > > > > > > > > > > > This is potentially a large amount of data to be sent > > > > > > > on a socket. > > > > > > > > > > > > If we take the hardware accelerator out of this picture, we > > > > > > will find that it's actually a question of "pre-loading" vs > > > > > > "lazy-loading". I think neither of them is perfect. > > > > > > > > > > > > For "pre-loading", as you said, we may have a tough starting. > > > > > > But for "lazy-loading", we can't have a predictable performance. > > > > > > A sudden, unexpected performance drop may happen at any time, > > > > > > because we may meet an unknown IOVA at any time in this case. > > > > > > > > > > That's how hardware behaves too though. So we can expect guests > > > > > to try to optimize locality. > > > > > > > > The difference is that, the software implementation needs to > > > > query the mappings via socket. And it's much slower.. > > > > > > If you are proposing this new feature as an optimization, > > > then I'd like to see numbers showing the performance gains. > > > > > > It's definitely possible to optimize things out. Pre-loading isn't > > > where I would start optimizing though. For example, DPDK could have its > > > own VTD emulation, then it could access guest memory directly. > > > > > > > > > > > > > > > > > Once we meet an unknown IOVA, the backend's data path will need > > > > > > to stop and query the mapping of the IOVA via the socket and > > > > > > wait for the reply. And the latency is not negligible (sometimes > > > > > > it's even unacceptable), especially in high performance network > > > > > > case. So maybe it's better to make both of them available to > > > > > > the vhost backend. > > > > > > > > > > > > > > > > > > > > I had an impression that a hardware accelerator was using > > > > > > > VFIO anyway. Given this, can't we have QEMU program > > > > > > > the shadow IOMMU tables into VFIO directly? > > > > > > > > > > > > I think it's a good idea! Currently, my concern about it is > > > > > > that, the hardware device may not use IOMMU and it may have > > > > > > its builtin address translation unit. And it will be a pain > > > > > > for device vendors to teach VFIO to be able to work with the > > > > > > builtin address translation unit. > > > > > > > > > > I think such drivers would have to interate with VFIO somehow. 
> > > > > Otherwise, what is the plan for assigning such devices then? > > > > > > > > Such devices are just for vhost data path acceleration. > > > > > > That's not true I think. E.g. RDMA devices have an on-card MMU. > > > > > > > They have many available queue pairs, the switch logic > > > > will be done among those queue pairs. And different queue > > > > pairs will serve different VMs directly. > > > > > > > > Best regards, > > > > Tiwei Bie > > > > > > The way I would do it is attach different PASID values to > > > different queues. This way you can use the standard IOMMU > > > to enforce protection. > > > > I'm thinking about the case that device wants to use its > > builtin address translation, although I'm not really sure > > whether there will be a real product work in this way. > > > > Honestly, I like your idea, and I don't object to it. I'll > > do more investigations on it. And for the feature proposed > > in this RFC, I just think maybe it's better to provide one > > more possibility for the backend to support vIOMMU. > > > > Anyway, the work about adding the vIOMMU support in vDPA is > > just started few days ago. I'll do more investigations on > > each possibility. Thanks! :) > > > > Best regards, > > Tiwei Bie > > There are more advantages to using request with PASID: > > You can use hardware support for nesting, having guest supply 1st level > translation and host second level translation. > > I actually had an idea to do something like this for AMD > and ARM which support nesting even for requests with PASID, > having intel benefit too would be nice. Something else to consider is implementing PRS capability. In theory this could then go like this: - get page request from device - fetch request from VTD page tables - use response to issue a page response message This would match the current vhost-user model. > > > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > Tiwei Bie > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > > The idea of this patch is to let QEMU push all the IOTLBs > > > > > > > > to vhost-user backend without waiting for the queries from > > > > > > > > the backend. Because hardware accelerator at the vhost-user > > > > > > > > backend may not be able to handle unknown IOVAs. > > > > > > > > > > > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > > > > > > as expected when guest is using kernel driver (To handle > > > > > > > > this case, it seems that some RAM regions' events also need > > > > > > > > to be listened). Any comments would be appreciated! Thanks! 
> > > > > > > > > > > > > > > > docs/interop/vhost-user.txt | 9 ++++++++ > > > > > > > > hw/virtio/vhost-backend.c | 7 ++++++ > > > > > > > > hw/virtio/vhost-user.c | 8 +++++++ > > > > > > > > hw/virtio/vhost.c | 47 ++++++++++++++++++++++++++++++++++++--- > > > > > > > > include/hw/virtio/vhost-backend.h | 3 +++ > > > > > > > > 5 files changed, 71 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt > > > > > > > > index 534caab18a..73e07f9dad 100644 > > > > > > > > --- a/docs/interop/vhost-user.txt > > > > > > > > +++ b/docs/interop/vhost-user.txt > > > > > > > > @@ -380,6 +380,7 @@ Protocol features > > > > > > > > #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 > > > > > > > > #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 > > > > > > > > #define VHOST_USER_PROTOCOL_F_CONFIG 9 > > > > > > > > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10 > > > > > > > > > > > > > > > > Master message types > > > > > > > > -------------------- > > > > > > > > @@ -797,3 +798,11 @@ resilient for selective requests. > > > > > > > > For the message types that already solicit a reply from the client, the > > > > > > > > presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings > > > > > > > > no behavioural change. (See the 'Communication' section for details.) > > > > > > > > + > > > > > > > > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > > > > +------------------------------------ > > > > > > > > +By default, vhost-user backend needs to query the IOTLBs from QEMU after > > > > > > > > +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will > > > > > > > > +provide all the IOTLBs to vhost backend without waiting for the queries > > > > > > > > +from backend. This is helpful when using a hardware accelerator which is > > > > > > > > +not able to handle unknown IOVAs at the vhost-user backend. 
> > > > > > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c > > > > > > > > index 7f09efab8b..d72994e9a5 100644 > > > > > > > > --- a/hw/virtio/vhost-backend.c > > > > > > > > +++ b/hw/virtio/vhost-backend.c > > > > > > > > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev, > > > > > > > > qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL); > > > > > > > > } > > > > > > > > > > > > > > > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev) > > > > > > > > +{ > > > > > > > > + return false; > > > > > > > > +} > > > > > > > > + > > > > > > > > static const VhostOps kernel_ops = { > > > > > > > > .backend_type = VHOST_BACKEND_TYPE_KERNEL, > > > > > > > > .vhost_backend_init = vhost_kernel_init, > > > > > > > > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = { > > > > > > > > #endif /* CONFIG_VHOST_VSOCK */ > > > > > > > > .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback, > > > > > > > > .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg, > > > > > > > > + .vhost_backend_need_all_device_iotlb = > > > > > > > > + vhost_kernel_need_all_device_iotlb, > > > > > > > > }; > > > > > > > > > > > > > > > > int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) > > > > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > > > > > index 38da8692bb..30e0b1c13b 100644 > > > > > > > > --- a/hw/virtio/vhost-user.c > > > > > > > > +++ b/hw/virtio/vhost-user.c > > > > > > > > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature { > > > > > > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, > > > > > > > > VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, > > > > > > > > VHOST_USER_PROTOCOL_F_CONFIG = 9, > > > > > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10, > > > > > > > > VHOST_USER_PROTOCOL_F_MAX > > > > > > > > }; > > > > > > > > > > > > > > > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id) > > > > > > > > return 0; > > > > > > > > } > > > > > > > > > > > > > > > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev) > > > > > > > > +{ > > > > > > > > + return virtio_has_feature(dev->protocol_features, > > > > > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB); > > > > > > > > +} > > > > > > > > + > > > > > > > > const VhostOps user_ops = { > > > > > > > > .backend_type = VHOST_BACKEND_TYPE_USER, > > > > > > > > .vhost_backend_init = vhost_user_init, > > > > > > > > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = { > > > > > > > > .vhost_set_config = vhost_user_set_config, > > > > > > > > .vhost_crypto_create_session = vhost_user_crypto_create_session, > > > > > > > > .vhost_crypto_close_session = vhost_user_crypto_close_session, > > > > > > > > + .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb, > > > > > > > > }; > > > > > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > > > > > > > index f51bf573d5..70922ee998 100644 > > > > > > > > --- a/hw/virtio/vhost.c > > > > > > > > +++ b/hw/virtio/vhost.c > > > > > > > > @@ -48,6 +48,9 @@ static unsigned int used_memslots; > > > > > > > > static QLIST_HEAD(, vhost_dev) vhost_devices = > > > > > > > > QLIST_HEAD_INITIALIZER(vhost_devices); > > > > > > > > > > > > > > > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa, > > > > > > > > + uint64_t *uaddr, uint64_t *len); > > > > > > > > + > > > > > > > > bool vhost_has_free_slot(void) > > > > > > > > { > > > > > > > > unsigned int 
slots_limit = ~0U; > > > > > > > > @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener, > > > > > > > > vhost_region_add_section(dev, section); > > > > > > > > } > > > > > > > > > > > > > > > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > > > > > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > > > > > { > > > > > > > > struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n); > > > > > > > > struct vhost_dev *hdev = iommu->hdev; > > > > > > > > hwaddr iova = iotlb->iova + iommu->iommu_offset; > > > > > > > > > > > > > > > > + if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { > > > > > > > > + uint64_t uaddr, len; > > > > > > > > + > > > > > > > > + rcu_read_lock(); > > > > > > > > + > > > > > > > > + if (iotlb->target_as != NULL) { > > > > > > > > + if (vhost_memory_region_lookup(hdev, iotlb->translated_addr, > > > > > > > > + &uaddr, &len)) { > > > > > > > > + error_report("Fail to lookup the translated address " > > > > > > > > + "%"PRIx64, iotlb->translated_addr); > > > > > > > > + goto out; > > > > > > > > + } > > > > > > > > + > > > > > > > > + len = MIN(iotlb->addr_mask + 1, len); > > > > > > > > + iova = iova & ~iotlb->addr_mask; > > > > > > > > + > > > > > > > > + if (vhost_backend_update_device_iotlb(hdev, iova, uaddr, > > > > > > > > + len, iotlb->perm)) { > > > > > > > > + error_report("Fail to update device iotlb"); > > > > > > > > + goto out; > > > > > > > > + } > > > > > > > > + } > > > > > > > > +out: > > > > > > > > + rcu_read_unlock(); > > > > > > > > + return; > > > > > > > > + } > > > > > > > > + > > > > > > > > if (vhost_backend_invalidate_device_iotlb(hdev, iova, > > > > > > > > iotlb->addr_mask + 1)) { > > > > > > > > error_report("Fail to invalidate device iotlb"); > > > > > > > > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > > > struct vhost_dev *dev = container_of(listener, struct vhost_dev, > > > > > > > > iommu_listener); > > > > > > > > struct vhost_iommu *iommu; > > > > > > > > + IOMMUNotifierFlag flags; > > > > > > > > Int128 end; > > > > > > > > > > > > > > > > if (!memory_region_is_iommu(section->mr)) { > > > > > > > > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > > > end = int128_add(int128_make64(section->offset_within_region), > > > > > > > > section->size); > > > > > > > > end = int128_sub(end, int128_one()); > > > > > > > > - iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify, > > > > > > > > - IOMMU_NOTIFIER_UNMAP, > > > > > > > > + > > > > > > > > + if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) { > > > > > > > > + flags = IOMMU_NOTIFIER_ALL; > > > > > > > > + } else { > > > > > > > > + flags = IOMMU_NOTIFIER_UNMAP; > > > > > > > > + } > > > > > > > > + > > > > > > > > + iommu_notifier_init(&iommu->n, vhost_iommu_map_notify, > > > > > > > > + flags, > > > > > > > > section->offset_within_region, > > > > > > > > int128_get64(end)); > > > > > > > > iommu->mr = section->mr; > > > > > > > > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > > > memory_region_register_iommu_notifier(section->mr, &iommu->n); > > > > > > > > QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next); > > > > > > > > /* TODO: can replay help performance here? 
*/ > > > > > > > > + if (flags == IOMMU_NOTIFIER_ALL) { > > > > > > > > + memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n); > > > > > > > > + } > > > > > > > > } > > > > > > > > > > > > > > > > static void vhost_iommu_region_del(MemoryListener *listener, > > > > > > > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h > > > > > > > > index 5dac61f9ea..602fd987a4 100644 > > > > > > > > --- a/include/hw/virtio/vhost-backend.h > > > > > > > > +++ b/include/hw/virtio/vhost-backend.h > > > > > > > > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev, > > > > > > > > typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, > > > > > > > > uint64_t session_id); > > > > > > > > > > > > > > > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev); > > > > > > > > + > > > > > > > > typedef struct VhostOps { > > > > > > > > VhostBackendType backend_type; > > > > > > > > vhost_backend_init vhost_backend_init; > > > > > > > > @@ -138,6 +140,7 @@ typedef struct VhostOps { > > > > > > > > vhost_set_config_op vhost_set_config; > > > > > > > > vhost_crypto_create_session_op vhost_crypto_create_session; > > > > > > > > vhost_crypto_close_session_op vhost_crypto_close_session; > > > > > > > > + vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb; > > > > > > > > } VhostOps; > > > > > > > > > > > > > > > > extern const VhostOps user_ops; > > > > > > > > -- > > > > > > > > 2.11.0
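To make the PASID-per-queue idea above a little more concrete: each queue pair exposed by the accelerator would carry its own PASID, so the host IOMMU, rather than an on-device translation unit, isolates the VMs served by different queue pairs, and a nesting-capable IOMMU can layer the guest's first-level translation over the host's second-level one. A purely illustrative sketch; the structures and the assignment helper below are hypothetical, not an existing driver interface:

#include <stdint.h>

/* Hypothetical per-queue-pair state in an accelerator driver. */
struct accel_queue_pair {
    uint16_t qp_index;   /* queue pair handed to one VM's virtio device */
    uint32_t pasid;      /* PASID the device tags this queue's DMA with */
};

/*
 * Hypothetical setup step: give every queue pair a distinct PASID so the
 * standard IOMMU translates (and isolates) each queue's DMA with its own
 * page table, instead of relying on a builtin translation unit.
 */
static void accel_assign_pasids(struct accel_queue_pair *qps, int n,
                                uint32_t first_pasid)
{
    for (int i = 0; i < n; i++) {
        qps[i].qp_index = (uint16_t)i;
        qps[i].pasid = first_pasid + (uint32_t)i;
        /* device-specific programming of the PASID into the queue
         * context would go here */
    }
}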
On Thu, Apr 12, 2018 at 11:23:31AM +0800, Jason Wang wrote: > > > On 2018年04月12日 01:00, Michael S. Tsirkin wrote: > > On Wed, Apr 11, 2018 at 09:41:05PM +0800, Jason Wang wrote: > > > On 2018年04月11日 16:38, Tiwei Bie wrote: > > > > On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote: > > > > > On 2018年04月11日 15:20, Tiwei Bie wrote: > > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > > feature for vhost-user. By default, vhost-user backend needs > > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > > > > With this protocol feature negotiated, QEMU will provide all > > > > > > the IOTLBs to vhost-user backend without waiting for the > > > > > > queries from backend. This is helpful when using a hardware > > > > > > accelerator which is not able to handle unknown IOVAs at the > > > > > > vhost-user backend. > > > > > > > > > > > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com> > > > > > > --- > > > > > > The idea of this patch is to let QEMU push all the IOTLBs > > > > > > to vhost-user backend without waiting for the queries from > > > > > > the backend. Because hardware accelerator at the vhost-user > > > > > > backend may not be able to handle unknown IOVAs. > > > > > > > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > > > > as expected when guest is using kernel driver (To handle > > > > > > this case, it seems that some RAM regions' events also need > > > > > > to be listened). Any comments would be appreciated! Thanks! > > > > > Interesting, a quick question is why this is needed? Can we just use exist > > > > > IOTLB update message? > > > > Yeah, we are still using the existing IOTLB update messages > > > > to send the IOTLB messages to backend. The only difference > > > > is that, QEMU won't wait for the queries before sending the > > > > IOTLB update messages. > > > Yes, my question is not very clear. I mean why must need a new feature bit? > > > It looks to me qemu code can work without this. > > > > > > Thanks > > Generally we avoid adding new messages without a protocol feature bit. > > While careful analysis might sometimes prove it's not a strict > > requirement, it's just overall a clean and robust approach. > > > > Right but the looks like the patch does not introduce any new type of > messages. > > Thanks In this case remote needs to know that it will send these messages.
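To illustrate what "the remote needs to know" means in practice: once the bit is negotiated, a backend can stop issuing IOTLB miss requests and rely on the pushed entries instead, treating an unknown IOVA as an error. A minimal backend-side sketch, assuming hypothetical helpers iotlb_lookup() and request_iotlb_miss(); the feature bit value matches the proposed patch:

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10  /* from the proposed patch */

/* Hypothetical helpers implemented elsewhere in the backend. */
int iotlb_lookup(uint64_t iova, uint64_t *uaddr);
int request_iotlb_miss(uint64_t iova, uint64_t *uaddr);

static bool need_all_iotlb;

/* Called once protocol features have been negotiated with QEMU. */
static void backend_set_protocol_features(uint64_t features)
{
    need_all_iotlb = features & (1ULL << VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB);
}

/* Data-path translation: with the bit set, every valid IOVA should
 * already be in the local cache, because QEMU pushed all mappings. */
static int backend_translate(uint64_t iova, uint64_t *uaddr)
{
    if (iotlb_lookup(iova, uaddr) == 0) {
        return 0;
    }
    if (need_all_iotlb) {
        return -EFAULT;  /* nothing sensible to wait for in this mode */
    }
    return request_iotlb_miss(iova, uaddr);  /* classic miss-and-wait path */
}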
On 2018年04月12日 09:57, Michael S. Tsirkin wrote: > On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote: >> On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote: >>> On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote: >>>> On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote: >>>>> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: >>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB >>>>>> feature for vhost-user. By default, vhost-user backend needs >>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs. >>>>>> With this protocol feature negotiated, QEMU will provide all >>>>>> the IOTLBs to vhost-user backend without waiting for the >>>>>> queries from backend. This is helpful when using a hardware >>>>>> accelerator which is not able to handle unknown IOVAs at the >>>>>> vhost-user backend. >>>>>> >>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> >>>>> This is potentially a large amount of data to be sent >>>>> on a socket. >>>> If we take the hardware accelerator out of this picture, we >>>> will find that it's actually a question of "pre-loading" vs >>>> "lazy-loading". I think neither of them is perfect. >>>> >>>> For "pre-loading", as you said, we may have a tough starting. >>>> But for "lazy-loading", we can't have a predictable performance. >>>> A sudden, unexpected performance drop may happen at any time, >>>> because we may meet an unknown IOVA at any time in this case. >>> That's how hardware behaves too though. So we can expect guests >>> to try to optimize locality. >> The difference is that, the software implementation needs to >> query the mappings via socket. And it's much slower.. > If you are proposing this new feature as an optimization, > then I'd like to see numbers showing the performance gains. > > It's definitely possible to optimize things out. Pre-loading isn't > where I would start optimizing though. For example, DPDK could have its > own VTD emulation, then it could access guest memory directly. Have vtd emulation in dpdk have many disadvantages: - vendor locked, can only work for intel - duplication of codes and bugs - a huge number of new message types needs to be invented So I tend to go to a reverse way, link dpdk to qemu. > > >>>> Once we meet an unknown IOVA, the backend's data path will need >>>> to stop and query the mapping of the IOVA via the socket and >>>> wait for the reply. And the latency is not negligible (sometimes >>>> it's even unacceptable), especially in high performance network >>>> case. So maybe it's better to make both of them available to >>>> the vhost backend. >>>> >>>>> I had an impression that a hardware accelerator was using >>>>> VFIO anyway. Given this, can't we have QEMU program >>>>> the shadow IOMMU tables into VFIO directly? >>>> I think it's a good idea! Currently, my concern about it is >>>> that, the hardware device may not use IOMMU and it may have >>>> its builtin address translation unit. And it will be a pain >>>> for device vendors to teach VFIO to be able to work with the >>>> builtin address translation unit. >>> I think such drivers would have to interate with VFIO somehow. >>> Otherwise, what is the plan for assigning such devices then? >> Such devices are just for vhost data path acceleration. > That's not true I think. E.g. RDMA devices have an on-card MMU. > >> They have many available queue pairs, the switch logic >> will be done among those queue pairs. And different queue >> pairs will serve different VMs directly. 
>> >> Best regards, >> Tiwei Bie > The way I would do it is attach different PASID values to > different queues. This way you can use the standard IOMMU > to enforce protection. So that's just shared virtual memory on host which can share iova address space between a specific queue pair and a process. I'm not sure how hard can exist vhost-user backend to support this. Thanks > > > >>> >>>> Best regards, >>>> Tiwei Bie >>>> >>>>> >>>>>> --- >>>>>> The idea of this patch is to let QEMU push all the IOTLBs >>>>>> to vhost-user backend without waiting for the queries from >>>>>> the backend. Because hardware accelerator at the vhost-user >>>>>> backend may not be able to handle unknown IOVAs. >>>>>> >>>>>> This is just a RFC for now. It seems that, it doesn't work >>>>>> as expected when guest is using kernel driver (To handle >>>>>> this case, it seems that some RAM regions' events also need >>>>>> to be listened). Any comments would be appreciated! Thanks! >>>>>> >>>>>> docs/interop/vhost-user.txt | 9 ++++++++ >>>>>> hw/virtio/vhost-backend.c | 7 ++++++ >>>>>> hw/virtio/vhost-user.c | 8 +++++++ >>>>>> hw/virtio/vhost.c | 47 ++++++++++++++++++++++++++++++++++++--- >>>>>> include/hw/virtio/vhost-backend.h | 3 +++ >>>>>> 5 files changed, 71 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt >>>>>> index 534caab18a..73e07f9dad 100644 >>>>>> --- a/docs/interop/vhost-user.txt >>>>>> +++ b/docs/interop/vhost-user.txt >>>>>> @@ -380,6 +380,7 @@ Protocol features >>>>>> #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 >>>>>> #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 >>>>>> #define VHOST_USER_PROTOCOL_F_CONFIG 9 >>>>>> +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10 >>>>>> >>>>>> Master message types >>>>>> -------------------- >>>>>> @@ -797,3 +798,11 @@ resilient for selective requests. >>>>>> For the message types that already solicit a reply from the client, the >>>>>> presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings >>>>>> no behavioural change. (See the 'Communication' section for details.) >>>>>> + >>>>>> +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB >>>>>> +------------------------------------ >>>>>> +By default, vhost-user backend needs to query the IOTLBs from QEMU after >>>>>> +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will >>>>>> +provide all the IOTLBs to vhost backend without waiting for the queries >>>>>> +from backend. This is helpful when using a hardware accelerator which is >>>>>> +not able to handle unknown IOVAs at the vhost-user backend. 
>>>>>> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c >>>>>> index 7f09efab8b..d72994e9a5 100644 >>>>>> --- a/hw/virtio/vhost-backend.c >>>>>> +++ b/hw/virtio/vhost-backend.c >>>>>> @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev, >>>>>> qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL); >>>>>> } >>>>>> >>>>>> +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev) >>>>>> +{ >>>>>> + return false; >>>>>> +} >>>>>> + >>>>>> static const VhostOps kernel_ops = { >>>>>> .backend_type = VHOST_BACKEND_TYPE_KERNEL, >>>>>> .vhost_backend_init = vhost_kernel_init, >>>>>> @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = { >>>>>> #endif /* CONFIG_VHOST_VSOCK */ >>>>>> .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback, >>>>>> .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg, >>>>>> + .vhost_backend_need_all_device_iotlb = >>>>>> + vhost_kernel_need_all_device_iotlb, >>>>>> }; >>>>>> >>>>>> int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) >>>>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >>>>>> index 38da8692bb..30e0b1c13b 100644 >>>>>> --- a/hw/virtio/vhost-user.c >>>>>> +++ b/hw/virtio/vhost-user.c >>>>>> @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature { >>>>>> VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, >>>>>> VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, >>>>>> VHOST_USER_PROTOCOL_F_CONFIG = 9, >>>>>> + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10, >>>>>> VHOST_USER_PROTOCOL_F_MAX >>>>>> }; >>>>>> >>>>>> @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id) >>>>>> return 0; >>>>>> } >>>>>> >>>>>> +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev) >>>>>> +{ >>>>>> + return virtio_has_feature(dev->protocol_features, >>>>>> + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB); >>>>>> +} >>>>>> + >>>>>> const VhostOps user_ops = { >>>>>> .backend_type = VHOST_BACKEND_TYPE_USER, >>>>>> .vhost_backend_init = vhost_user_init, >>>>>> @@ -1611,4 +1618,5 @@ const VhostOps user_ops = { >>>>>> .vhost_set_config = vhost_user_set_config, >>>>>> .vhost_crypto_create_session = vhost_user_crypto_create_session, >>>>>> .vhost_crypto_close_session = vhost_user_crypto_close_session, >>>>>> + .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb, >>>>>> }; >>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c >>>>>> index f51bf573d5..70922ee998 100644 >>>>>> --- a/hw/virtio/vhost.c >>>>>> +++ b/hw/virtio/vhost.c >>>>>> @@ -48,6 +48,9 @@ static unsigned int used_memslots; >>>>>> static QLIST_HEAD(, vhost_dev) vhost_devices = >>>>>> QLIST_HEAD_INITIALIZER(vhost_devices); >>>>>> >>>>>> +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa, >>>>>> + uint64_t *uaddr, uint64_t *len); >>>>>> + >>>>>> bool vhost_has_free_slot(void) >>>>>> { >>>>>> unsigned int slots_limit = ~0U; >>>>>> @@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener, >>>>>> vhost_region_add_section(dev, section); >>>>>> } >>>>>> >>>>>> -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) >>>>>> +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) >>>>>> { >>>>>> struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n); >>>>>> struct vhost_dev *hdev = iommu->hdev; >>>>>> hwaddr iova = iotlb->iova + iommu->iommu_offset; >>>>>> >>>>>> + if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { >>>>>> + uint64_t uaddr, len; >>>>>> + >>>>>> + 
rcu_read_lock(); >>>>>> + >>>>>> + if (iotlb->target_as != NULL) { >>>>>> + if (vhost_memory_region_lookup(hdev, iotlb->translated_addr, >>>>>> + &uaddr, &len)) { >>>>>> + error_report("Fail to lookup the translated address " >>>>>> + "%"PRIx64, iotlb->translated_addr); >>>>>> + goto out; >>>>>> + } >>>>>> + >>>>>> + len = MIN(iotlb->addr_mask + 1, len); >>>>>> + iova = iova & ~iotlb->addr_mask; >>>>>> + >>>>>> + if (vhost_backend_update_device_iotlb(hdev, iova, uaddr, >>>>>> + len, iotlb->perm)) { >>>>>> + error_report("Fail to update device iotlb"); >>>>>> + goto out; >>>>>> + } >>>>>> + } >>>>>> +out: >>>>>> + rcu_read_unlock(); >>>>>> + return; >>>>>> + } >>>>>> + >>>>>> if (vhost_backend_invalidate_device_iotlb(hdev, iova, >>>>>> iotlb->addr_mask + 1)) { >>>>>> error_report("Fail to invalidate device iotlb"); >>>>>> @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener, >>>>>> struct vhost_dev *dev = container_of(listener, struct vhost_dev, >>>>>> iommu_listener); >>>>>> struct vhost_iommu *iommu; >>>>>> + IOMMUNotifierFlag flags; >>>>>> Int128 end; >>>>>> >>>>>> if (!memory_region_is_iommu(section->mr)) { >>>>>> @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener, >>>>>> end = int128_add(int128_make64(section->offset_within_region), >>>>>> section->size); >>>>>> end = int128_sub(end, int128_one()); >>>>>> - iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify, >>>>>> - IOMMU_NOTIFIER_UNMAP, >>>>>> + >>>>>> + if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) { >>>>>> + flags = IOMMU_NOTIFIER_ALL; >>>>>> + } else { >>>>>> + flags = IOMMU_NOTIFIER_UNMAP; >>>>>> + } >>>>>> + >>>>>> + iommu_notifier_init(&iommu->n, vhost_iommu_map_notify, >>>>>> + flags, >>>>>> section->offset_within_region, >>>>>> int128_get64(end)); >>>>>> iommu->mr = section->mr; >>>>>> @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener, >>>>>> memory_region_register_iommu_notifier(section->mr, &iommu->n); >>>>>> QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next); >>>>>> /* TODO: can replay help performance here? */ >>>>>> + if (flags == IOMMU_NOTIFIER_ALL) { >>>>>> + memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n); >>>>>> + } >>>>>> } >>>>>> >>>>>> static void vhost_iommu_region_del(MemoryListener *listener, >>>>>> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h >>>>>> index 5dac61f9ea..602fd987a4 100644 >>>>>> --- a/include/hw/virtio/vhost-backend.h >>>>>> +++ b/include/hw/virtio/vhost-backend.h >>>>>> @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev, >>>>>> typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, >>>>>> uint64_t session_id); >>>>>> >>>>>> +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev); >>>>>> + >>>>>> typedef struct VhostOps { >>>>>> VhostBackendType backend_type; >>>>>> vhost_backend_init vhost_backend_init; >>>>>> @@ -138,6 +140,7 @@ typedef struct VhostOps { >>>>>> vhost_set_config_op vhost_set_config; >>>>>> vhost_crypto_create_session_op vhost_crypto_create_session; >>>>>> vhost_crypto_close_session_op vhost_crypto_close_session; >>>>>> + vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb; >>>>>> } VhostOps; >>>>>> >>>>>> extern const VhostOps user_ops; >>>>>> -- >>>>>> 2.11.0
On Thu, Apr 12, 2018 at 11:37:35AM +0800, Jason Wang wrote: > > > On 2018年04月12日 09:57, Michael S. Tsirkin wrote: > > On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote: > > > On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote: > > > > On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote: > > > > > On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote: > > > > > > On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: > > > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > > > feature for vhost-user. By default, vhost-user backend needs > > > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > > > > > With this protocol feature negotiated, QEMU will provide all > > > > > > > the IOTLBs to vhost-user backend without waiting for the > > > > > > > queries from backend. This is helpful when using a hardware > > > > > > > accelerator which is not able to handle unknown IOVAs at the > > > > > > > vhost-user backend. > > > > > > > > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> > > > > > > This is potentially a large amount of data to be sent > > > > > > on a socket. > > > > > If we take the hardware accelerator out of this picture, we > > > > > will find that it's actually a question of "pre-loading" vs > > > > > "lazy-loading". I think neither of them is perfect. > > > > > > > > > > For "pre-loading", as you said, we may have a tough starting. > > > > > But for "lazy-loading", we can't have a predictable performance. > > > > > A sudden, unexpected performance drop may happen at any time, > > > > > because we may meet an unknown IOVA at any time in this case. > > > > That's how hardware behaves too though. So we can expect guests > > > > to try to optimize locality. > > > The difference is that, the software implementation needs to > > > query the mappings via socket. And it's much slower.. > > If you are proposing this new feature as an optimization, > > then I'd like to see numbers showing the performance gains. > > > > It's definitely possible to optimize things out. Pre-loading isn't > > where I would start optimizing though. For example, DPDK could have its > > own VTD emulation, then it could access guest memory directly. > > Have vtd emulation in dpdk have many disadvantages: > > - vendor locked, can only work for intel I don't see what would prevent other vendors from doing the same. > - duplication of codes and bugs > - a huge number of new message types needs to be invented Oh, just the flush I'd wager. > So I tend to go to a reverse way, link dpdk to qemu. Won't really help as people want to build software using dpdk. > > > > > > > > > Once we meet an unknown IOVA, the backend's data path will need > > > > > to stop and query the mapping of the IOVA via the socket and > > > > > wait for the reply. And the latency is not negligible (sometimes > > > > > it's even unacceptable), especially in high performance network > > > > > case. So maybe it's better to make both of them available to > > > > > the vhost backend. > > > > > > > > > > > I had an impression that a hardware accelerator was using > > > > > > VFIO anyway. Given this, can't we have QEMU program > > > > > > the shadow IOMMU tables into VFIO directly? > > > > > I think it's a good idea! Currently, my concern about it is > > > > > that, the hardware device may not use IOMMU and it may have > > > > > its builtin address translation unit. 
And it will be a pain > > > > > for device vendors to teach VFIO to be able to work with the > > > > > builtin address translation unit. > > > > I think such drivers would have to interate with VFIO somehow. > > > > Otherwise, what is the plan for assigning such devices then? > > > Such devices are just for vhost data path acceleration. > > That's not true I think. E.g. RDMA devices have an on-card MMU. > > > > > They have many available queue pairs, the switch logic > > > will be done among those queue pairs. And different queue > > > pairs will serve different VMs directly. > > > > > > Best regards, > > > Tiwei Bie > > The way I would do it is attach different PASID values to > > different queues. This way you can use the standard IOMMU > > to enforce protection. > > So that's just shared virtual memory on host which can share iova address > space between a specific queue pair and a process. I'm not sure how hard can > exist vhost-user backend to support this. > > Thanks That would be VFIO's job, nothing to do with vhost-user besides sharing the VFIO descriptor. > > > > > > > > > > > > > > > Best regards, > > > > > Tiwei Bie > > > > > > > > > > > > > > > > > > --- > > > > > > > The idea of this patch is to let QEMU push all the IOTLBs > > > > > > > to vhost-user backend without waiting for the queries from > > > > > > > the backend. Because hardware accelerator at the vhost-user > > > > > > > backend may not be able to handle unknown IOVAs. > > > > > > > > > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > > > > > as expected when guest is using kernel driver (To handle > > > > > > > this case, it seems that some RAM regions' events also need > > > > > > > to be listened). Any comments would be appreciated! Thanks! > > > > > > > > > > > > > > docs/interop/vhost-user.txt | 9 ++++++++ > > > > > > > hw/virtio/vhost-backend.c | 7 ++++++ > > > > > > > hw/virtio/vhost-user.c | 8 +++++++ > > > > > > > hw/virtio/vhost.c | 47 ++++++++++++++++++++++++++++++++++++--- > > > > > > > include/hw/virtio/vhost-backend.h | 3 +++ > > > > > > > 5 files changed, 71 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt > > > > > > > index 534caab18a..73e07f9dad 100644 > > > > > > > --- a/docs/interop/vhost-user.txt > > > > > > > +++ b/docs/interop/vhost-user.txt > > > > > > > @@ -380,6 +380,7 @@ Protocol features > > > > > > > #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7 > > > > > > > #define VHOST_USER_PROTOCOL_F_PAGEFAULT 8 > > > > > > > #define VHOST_USER_PROTOCOL_F_CONFIG 9 > > > > > > > +#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10 > > > > > > > Master message types > > > > > > > -------------------- > > > > > > > @@ -797,3 +798,11 @@ resilient for selective requests. > > > > > > > For the message types that already solicit a reply from the client, the > > > > > > > presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings > > > > > > > no behavioural change. (See the 'Communication' section for details.) > > > > > > > + > > > > > > > +VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > > > +------------------------------------ > > > > > > > +By default, vhost-user backend needs to query the IOTLBs from QEMU after > > > > > > > +meeting unknown IOVAs. With this protocol feature negotiated, QEMU will > > > > > > > +provide all the IOTLBs to vhost backend without waiting for the queries > > > > > > > +from backend. 
This is helpful when using a hardware accelerator which is > > > > > > > +not able to handle unknown IOVAs at the vhost-user backend. > > > > > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c > > > > > > > index 7f09efab8b..d72994e9a5 100644 > > > > > > > --- a/hw/virtio/vhost-backend.c > > > > > > > +++ b/hw/virtio/vhost-backend.c > > > > > > > @@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev, > > > > > > > qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL); > > > > > > > } > > > > > > > +static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev) > > > > > > > +{ > > > > > > > + return false; > > > > > > > +} > > > > > > > + > > > > > > > static const VhostOps kernel_ops = { > > > > > > > .backend_type = VHOST_BACKEND_TYPE_KERNEL, > > > > > > > .vhost_backend_init = vhost_kernel_init, > > > > > > > @@ -264,6 +269,8 @@ static const VhostOps kernel_ops = { > > > > > > > #endif /* CONFIG_VHOST_VSOCK */ > > > > > > > .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback, > > > > > > > .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg, > > > > > > > + .vhost_backend_need_all_device_iotlb = > > > > > > > + vhost_kernel_need_all_device_iotlb, > > > > > > > }; > > > > > > > int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type) > > > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > > > > index 38da8692bb..30e0b1c13b 100644 > > > > > > > --- a/hw/virtio/vhost-user.c > > > > > > > +++ b/hw/virtio/vhost-user.c > > > > > > > @@ -47,6 +47,7 @@ enum VhostUserProtocolFeature { > > > > > > > VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7, > > > > > > > VHOST_USER_PROTOCOL_F_PAGEFAULT = 8, > > > > > > > VHOST_USER_PROTOCOL_F_CONFIG = 9, > > > > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10, > > > > > > > VHOST_USER_PROTOCOL_F_MAX > > > > > > > }; > > > > > > > @@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id) > > > > > > > return 0; > > > > > > > } > > > > > > > +static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev) > > > > > > > +{ > > > > > > > + return virtio_has_feature(dev->protocol_features, > > > > > > > + VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB); > > > > > > > +} > > > > > > > + > > > > > > > const VhostOps user_ops = { > > > > > > > .backend_type = VHOST_BACKEND_TYPE_USER, > > > > > > > .vhost_backend_init = vhost_user_init, > > > > > > > @@ -1611,4 +1618,5 @@ const VhostOps user_ops = { > > > > > > > .vhost_set_config = vhost_user_set_config, > > > > > > > .vhost_crypto_create_session = vhost_user_crypto_create_session, > > > > > > > .vhost_crypto_close_session = vhost_user_crypto_close_session, > > > > > > > + .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb, > > > > > > > }; > > > > > > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > > > > > > > index f51bf573d5..70922ee998 100644 > > > > > > > --- a/hw/virtio/vhost.c > > > > > > > +++ b/hw/virtio/vhost.c > > > > > > > @@ -48,6 +48,9 @@ static unsigned int used_memslots; > > > > > > > static QLIST_HEAD(, vhost_dev) vhost_devices = > > > > > > > QLIST_HEAD_INITIALIZER(vhost_devices); > > > > > > > +static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa, > > > > > > > + uint64_t *uaddr, uint64_t *len); > > > > > > > + > > > > > > > bool vhost_has_free_slot(void) > > > > > > > { > > > > > > > unsigned int slots_limit = ~0U; > > > > > > > @@ -634,12 +637,39 @@ static void 
vhost_region_addnop(MemoryListener *listener, > > > > > > > vhost_region_add_section(dev, section); > > > > > > > } > > > > > > > -static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > > > > +static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > > > > > > { > > > > > > > struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n); > > > > > > > struct vhost_dev *hdev = iommu->hdev; > > > > > > > hwaddr iova = iotlb->iova + iommu->iommu_offset; > > > > > > > + if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { > > > > > > > + uint64_t uaddr, len; > > > > > > > + > > > > > > > + rcu_read_lock(); > > > > > > > + > > > > > > > + if (iotlb->target_as != NULL) { > > > > > > > + if (vhost_memory_region_lookup(hdev, iotlb->translated_addr, > > > > > > > + &uaddr, &len)) { > > > > > > > + error_report("Fail to lookup the translated address " > > > > > > > + "%"PRIx64, iotlb->translated_addr); > > > > > > > + goto out; > > > > > > > + } > > > > > > > + > > > > > > > + len = MIN(iotlb->addr_mask + 1, len); > > > > > > > + iova = iova & ~iotlb->addr_mask; > > > > > > > + > > > > > > > + if (vhost_backend_update_device_iotlb(hdev, iova, uaddr, > > > > > > > + len, iotlb->perm)) { > > > > > > > + error_report("Fail to update device iotlb"); > > > > > > > + goto out; > > > > > > > + } > > > > > > > + } > > > > > > > +out: > > > > > > > + rcu_read_unlock(); > > > > > > > + return; > > > > > > > + } > > > > > > > + > > > > > > > if (vhost_backend_invalidate_device_iotlb(hdev, iova, > > > > > > > iotlb->addr_mask + 1)) { > > > > > > > error_report("Fail to invalidate device iotlb"); > > > > > > > @@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > > struct vhost_dev *dev = container_of(listener, struct vhost_dev, > > > > > > > iommu_listener); > > > > > > > struct vhost_iommu *iommu; > > > > > > > + IOMMUNotifierFlag flags; > > > > > > > Int128 end; > > > > > > > if (!memory_region_is_iommu(section->mr)) { > > > > > > > @@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > > end = int128_add(int128_make64(section->offset_within_region), > > > > > > > section->size); > > > > > > > end = int128_sub(end, int128_one()); > > > > > > > - iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify, > > > > > > > - IOMMU_NOTIFIER_UNMAP, > > > > > > > + > > > > > > > + if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) { > > > > > > > + flags = IOMMU_NOTIFIER_ALL; > > > > > > > + } else { > > > > > > > + flags = IOMMU_NOTIFIER_UNMAP; > > > > > > > + } > > > > > > > + > > > > > > > + iommu_notifier_init(&iommu->n, vhost_iommu_map_notify, > > > > > > > + flags, > > > > > > > section->offset_within_region, > > > > > > > int128_get64(end)); > > > > > > > iommu->mr = section->mr; > > > > > > > @@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener, > > > > > > > memory_region_register_iommu_notifier(section->mr, &iommu->n); > > > > > > > QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next); > > > > > > > /* TODO: can replay help performance here? 
*/ > > > > > > > + if (flags == IOMMU_NOTIFIER_ALL) { > > > > > > > + memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n); > > > > > > > + } > > > > > > > } > > > > > > > static void vhost_iommu_region_del(MemoryListener *listener, > > > > > > > diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h > > > > > > > index 5dac61f9ea..602fd987a4 100644 > > > > > > > --- a/include/hw/virtio/vhost-backend.h > > > > > > > +++ b/include/hw/virtio/vhost-backend.h > > > > > > > @@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev, > > > > > > > typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev, > > > > > > > uint64_t session_id); > > > > > > > +typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev); > > > > > > > + > > > > > > > typedef struct VhostOps { > > > > > > > VhostBackendType backend_type; > > > > > > > vhost_backend_init vhost_backend_init; > > > > > > > @@ -138,6 +140,7 @@ typedef struct VhostOps { > > > > > > > vhost_set_config_op vhost_set_config; > > > > > > > vhost_crypto_create_session_op vhost_crypto_create_session; > > > > > > > vhost_crypto_close_session_op vhost_crypto_close_session; > > > > > > > + vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb; > > > > > > > } VhostOps; > > > > > > > extern const VhostOps user_ops; > > > > > > > -- > > > > > > > 2.11.0
On 2018年04月12日 11:35, Michael S. Tsirkin wrote: >> There are more advantages to using request with PASID: >> >> You can use hardware support for nesting, having guest supply 1st level >> translation and host second level translation. >> >> I actually had an idea to do something like this for AMD >> and ARM which support nesting even for requests with PASID, >> having intel benefit too would be nice. > Something else to consider is implementing PRS capability. > > > In theory this could then go like this: > > - get page request from device > - fetch request from VTD page tables > - use response to issue a page response message > > > This would match the current vhost-user model. This requires that the IOMMU driver can forward this to VFIO and that VFIO can then forward it to userspace. That looks like a lot of changes, and it would be even slower than what is proposed in this patch. Thanks
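For reference, the page-request flow sketched above would look roughly like the loop below in whatever component ends up handling it; as noted, the kernel plumbing to deliver such requests to userspace does not exist today, so every type and helper here is hypothetical:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page request as it might be delivered to userspace. */
struct page_req {
    uint32_t pasid;
    uint64_t addr;       /* faulting IOVA, page aligned */
    uint32_t group_id;   /* page request group to echo in the response */
    bool     write;
};

/* Hypothetical helpers: walk the emulated VT-d tables and answer. */
bool vtd_translate(uint32_t pasid, uint64_t iova, bool write);
void send_page_response(uint32_t pasid, uint32_t group_id, bool success);

/*
 * Take a page request from the device, resolve it against the
 * guest-provided VT-d tables, and issue a page response so the device
 * can retry the DMA.
 */
static void handle_page_request(const struct page_req *req)
{
    bool ok = vtd_translate(req->pasid, req->addr, req->write);

    send_page_response(req->pasid, req->group_id, ok);
}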
On 2018年04月12日 11:37, Michael S. Tsirkin wrote: >>>>> Yeah, we are still using the existing IOTLB update messages >>>>> to send the IOTLB messages to backend. The only difference >>>>> is that, QEMU won't wait for the queries before sending the >>>>> IOTLB update messages. >>>> Yes, my question is not very clear. I mean why must need a new feature bit? >>>> It looks to me qemu code can work without this. >>>> >>>> Thanks >>> Generally we avoid adding new messages without a protocol feature bit. >>> While careful analysis might sometimes prove it's not a strict >>> requirement, it's just overall a clean and robust approach. >>> >> Right but the looks like the patch does not introduce any new type of >> messages. >> >> Thanks > In this case remote needs to know that it will send these messages. > > -- MST OK, if some backend does not expect QEMU to send IOTLB updates without any query from itself, I agree we need a new bit. But it looks more like a workaround for a buggy backend; at least vhost kernel does not need to know about this. Thanks
On Thu, Apr 12, 2018 at 11:43:41AM +0800, Jason Wang wrote: > > > On 2018年04月12日 11:35, Michael S. Tsirkin wrote: > > > There are more advantages to using request with PASID: > > > > > > You can use hardware support for nesting, having guest supply 1st level > > > translation and host second level translation. > > > > > > I actually had an idea to do something like this for AMD > > > and ARM which support nesting even for requests with PASID, > > > having intel benefit too would be nice. > > Something else to consider is implementing PRS capability. > > > > > > In theory this could then go like this: > > > > - get page request from device > > - fetch request from VTD page tables > > - use response to issue a page response message > > > > > > This would match the current vhost-user model. > > This requires IOMMU driver can forward this to VFIO and then VFIO can > forward it to userspace. Looks like a lot of changes and it would be even > slower than what is proposed in this patch. > > Thanks It would work better for a static table as only accessed pages would need to be sent. Slower for the dynamic case but dynamic case needs hardware support to work properly in any case.
On 2018年04月12日 11:41, Michael S. Tsirkin wrote: > On Thu, Apr 12, 2018 at 11:37:35AM +0800, Jason Wang wrote: >> >> On 2018年04月12日 09:57, Michael S. Tsirkin wrote: >>> On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote: >>>> On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote: >>>>> On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote: >>>>>> On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote: >>>>>>> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote: >>>>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB >>>>>>>> feature for vhost-user. By default, vhost-user backend needs >>>>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs. >>>>>>>> With this protocol feature negotiated, QEMU will provide all >>>>>>>> the IOTLBs to vhost-user backend without waiting for the >>>>>>>> queries from backend. This is helpful when using a hardware >>>>>>>> accelerator which is not able to handle unknown IOVAs at the >>>>>>>> vhost-user backend. >>>>>>>> >>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> >>>>>>> This is potentially a large amount of data to be sent >>>>>>> on a socket. >>>>>> If we take the hardware accelerator out of this picture, we >>>>>> will find that it's actually a question of "pre-loading" vs >>>>>> "lazy-loading". I think neither of them is perfect. >>>>>> >>>>>> For "pre-loading", as you said, we may have a tough starting. >>>>>> But for "lazy-loading", we can't have a predictable performance. >>>>>> A sudden, unexpected performance drop may happen at any time, >>>>>> because we may meet an unknown IOVA at any time in this case. >>>>> That's how hardware behaves too though. So we can expect guests >>>>> to try to optimize locality. >>>> The difference is that, the software implementation needs to >>>> query the mappings via socket. And it's much slower.. >>> If you are proposing this new feature as an optimization, >>> then I'd like to see numbers showing the performance gains. >>> >>> It's definitely possible to optimize things out. Pre-loading isn't >>> where I would start optimizing though. For example, DPDK could have its >>> own VTD emulation, then it could access guest memory directly. >> Have vtd emulation in dpdk have many disadvantages: >> >> - vendor locked, can only work for intel > I don't see what would prevent other vendors from doing the same. Technically it can, two questions here: - Shouldn't we keep vhost-user vendor/transport independent? - Do we really prefer the split device model here, it means to implement datapath in two places at least. Personally I prefer to keep all virt stuffs inside qemu. > >> - duplication of codes and bugs >> - a huge number of new message types needs to be invented > Oh, just the flush I'd wager. Not only flush, but also error reporting, context entry programming and even PRS in the future. And we need a feature negotiation between them like vhost to keep the compatibility for future features. This sounds not good. > >> So I tend to go to a reverse way, link dpdk to qemu. > Won't really help as people want to build software using dpdk. Well I believe the main use case it vDPA which is hardware virtio offload. For building software using dpdk like ovs-dpdk, it's another interesting topic. We can seek solution other than linking dpdk to qemu, e.g we can do all virtio and packet copy stuffs inside a qemu IOThread and use another inter-process channel to communicate with OVS-dpdk (or another virtio-user here). 
The key is to hide all virtualization details from OVS-dpdk. > > >>> >>>>>> Once we meet an unknown IOVA, the backend's data path will need >>>>>> to stop and query the mapping of the IOVA via the socket and >>>>>> wait for the reply. And the latency is not negligible (sometimes >>>>>> it's even unacceptable), especially in high performance network >>>>>> case. So maybe it's better to make both of them available to >>>>>> the vhost backend. >>>>>> >>>>>>> I had an impression that a hardware accelerator was using >>>>>>> VFIO anyway. Given this, can't we have QEMU program >>>>>>> the shadow IOMMU tables into VFIO directly? >>>>>> I think it's a good idea! Currently, my concern about it is >>>>>> that, the hardware device may not use IOMMU and it may have >>>>>> its builtin address translation unit. And it will be a pain >>>>>> for device vendors to teach VFIO to be able to work with the >>>>>> builtin address translation unit. >>>>> I think such drivers would have to interate with VFIO somehow. >>>>> Otherwise, what is the plan for assigning such devices then? >>>> Such devices are just for vhost data path acceleration. >>> That's not true I think. E.g. RDMA devices have an on-card MMU. >>> >>>> They have many available queue pairs, the switch logic >>>> will be done among those queue pairs. And different queue >>>> pairs will serve different VMs directly. >>>> >>>> Best regards, >>>> Tiwei Bie >>> The way I would do it is attach different PASID values to >>> different queues. This way you can use the standard IOMMU >>> to enforce protection. >> So that's just shared virtual memory on host which can share iova address >> space between a specific queue pair and a process. I'm not sure how hard can >> exist vhost-user backend to support this. >> >> Thanks > That would be VFIO's job, nothing to do with vhost-user besides > sharing the VFIO descriptor. At least dpdk need to offload DMA mapping setup to qemu. Thanks
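For comparison, the "program the shadow IOMMU tables into VFIO directly" alternative discussed above boils down to turning each guest IOVA-to-host-virtual-address mapping into a VFIO type1 DMA mapping on the container fd. A rough sketch using the existing VFIO ioctls; error handling is trimmed, and how the container fd gets shared between QEMU and the backend is exactly the open question in this thread:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Install one shadow entry: guest IOVA -> host virtual address. */
static int shadow_map(int container_fd, uint64_t iova,
                      void *hva, uint64_t size, bool writable)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ |
                (writable ? VFIO_DMA_MAP_FLAG_WRITE : 0);
    map.vaddr = (uintptr_t)hva;
    map.iova  = iova;
    map.size  = size;

    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

/* And the matching path for an invalidation. */
static int shadow_unmap(int container_fd, uint64_t iova, uint64_t size)
{
    struct vfio_iommu_type1_dma_unmap unmap;

    memset(&unmap, 0, sizeof(unmap));
    unmap.argsz = sizeof(unmap);
    unmap.iova  = iova;
    unmap.size  = size;

    return ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
}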
On 2018年04月12日 09:44, Tiwei Bie wrote: > On Wed, Apr 11, 2018 at 08:37:17PM +0300, Michael S. Tsirkin wrote: >> On Wed, Apr 11, 2018 at 04:38:53PM +0800, Tiwei Bie wrote: >>> On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote: >>>> On 2018年04月11日 15:20, Tiwei Bie wrote: >>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB >>>>> feature for vhost-user. By default, vhost-user backend needs >>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs. >>>>> With this protocol feature negotiated, QEMU will provide all >>>>> the IOTLBs to vhost-user backend without waiting for the >>>>> queries from backend. This is helpful when using a hardware >>>>> accelerator which is not able to handle unknown IOVAs at the >>>>> vhost-user backend. >>>>> >>>>> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com> >>>>> --- >>>>> The idea of this patch is to let QEMU push all the IOTLBs >>>>> to vhost-user backend without waiting for the queries from >>>>> the backend. Because hardware accelerator at the vhost-user >>>>> backend may not be able to handle unknown IOVAs. >>>>> >>>>> This is just a RFC for now. It seems that, it doesn't work >>>>> as expected when guest is using kernel driver (To handle >>>>> this case, it seems that some RAM regions' events also need >>>>> to be listened). Any comments would be appreciated! Thanks! >>>> Interesting, a quick question is why this is needed? Can we just use exist >>>> IOTLB update message? >>> Yeah, we are still using the existing IOTLB update messages >>> to send the IOTLB messages to backend. The only difference >>> is that, QEMU won't wait for the queries before sending the >>> IOTLB update messages. >> So I have a concern with that, in that without any flow >> control the socket buffer used by vhost-user might become >> full. > Each IOTLB update message needs a reply. So I think it > won't happen. Is this what we've already done now? I don't find any statement on this in vhost-user.txt? """ * VHOST_USER_IOTLB_MSG Id: 22 Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type) Master payload: struct vhost_iotlb_msg Slave payload: u64 Send IOTLB messages with struct vhost_iotlb_msg as payload. Master sends such requests to update and invalidate entries in the device IOTLB. The slave has to acknowledge the request with sending zero as u64 payload for success, non-zero otherwise. This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature has been successfully negotiated. """ And you probably need to modify the following statement: """ The master isn't expected to take the initiative to send IOTLB update messages, as the slave sends IOTLB miss messages for the guest virtual memory areas it needs to access. """ Thanks > > Best regards, > Tiwei Bie >
On Thu, Apr 12, 2018 at 03:38:50PM +0800, Jason Wang wrote: > On 2018年04月12日 09:44, Tiwei Bie wrote: > > On Wed, Apr 11, 2018 at 08:37:17PM +0300, Michael S. Tsirkin wrote: > > > On Wed, Apr 11, 2018 at 04:38:53PM +0800, Tiwei Bie wrote: > > > > On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote: > > > > > On 2018年04月11日 15:20, Tiwei Bie wrote: > > > > > > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB > > > > > > feature for vhost-user. By default, vhost-user backend needs > > > > > > to query the IOTLBs from QEMU after meeting unknown IOVAs. > > > > > > With this protocol feature negotiated, QEMU will provide all > > > > > > the IOTLBs to vhost-user backend without waiting for the > > > > > > queries from backend. This is helpful when using a hardware > > > > > > accelerator which is not able to handle unknown IOVAs at the > > > > > > vhost-user backend. > > > > > > > > > > > > Signed-off-by: Tiwei Bie<tiwei.bie@intel.com> > > > > > > --- > > > > > > The idea of this patch is to let QEMU push all the IOTLBs > > > > > > to vhost-user backend without waiting for the queries from > > > > > > the backend. Because hardware accelerator at the vhost-user > > > > > > backend may not be able to handle unknown IOVAs. > > > > > > > > > > > > This is just a RFC for now. It seems that, it doesn't work > > > > > > as expected when guest is using kernel driver (To handle > > > > > > this case, it seems that some RAM regions' events also need > > > > > > to be listened). Any comments would be appreciated! Thanks! > > > > > Interesting, a quick question is why this is needed? Can we just use exist > > > > > IOTLB update message? > > > > Yeah, we are still using the existing IOTLB update messages > > > > to send the IOTLB messages to backend. The only difference > > > > is that, QEMU won't wait for the queries before sending the > > > > IOTLB update messages. > > > So I have a concern with that, in that without any flow > > > control the socket buffer used by vhost-user might become > > > full. > > Each IOTLB update message needs a reply. So I think it > > won't happen. > > Is this what we've already done now? I don't find any statement on this in > vhost-user.txt? > > """ > * VHOST_USER_IOTLB_MSG > > Id: 22 > Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type) > Master payload: struct vhost_iotlb_msg > Slave payload: u64 > > Send IOTLB messages with struct vhost_iotlb_msg as payload. > Master sends such requests to update and invalidate entries in the > device > IOTLB. The slave has to acknowledge the request with sending zero as > u64 > payload for success, non-zero otherwise. > This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature > has been successfully negotiated. > """ Yeah, it's what we've already done now. It's in the above statement you quoted: "The slave has to acknowledge the request with sending zero as u64 payload for success, non-zero otherwise." > > And you probably need to modify the following statement: > > """ > The master isn't expected to take the initiative to send IOTLB update > messages, > as the slave sends IOTLB miss messages for the guest virtual memory areas it > needs to access. > """ Yeah, you're right. Thanks! This statement needs some minor updates. Best regards, Tiwei Bie > > Thanks > > > > > > Best regards, > > Tiwei Bie > > >
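To spell out why the per-message ack also acts as flow control: the master pushes one IOTLB entry, then blocks until the slave's u64 reply arrives, so unread data cannot pile up in the socket buffer. A simplified sketch of the exchange; the payload layout follows the vhost kernel uapi (<linux/vhost.h>), and the two transport helpers are hypothetical stand-ins for whatever socket I/O the implementation uses:

#include <stdint.h>

/* Payload of VHOST_USER_IOTLB_MSG (layout as in the vhost kernel uapi). */
struct vhost_iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t  perm;   /* VHOST_ACCESS_RO / _WO / _RW */
    uint8_t  type;   /* VHOST_IOTLB_UPDATE, _INVALIDATE, _MISS, ... */
};

/* Hypothetical transport helpers over the vhost-user socket. */
int vhost_user_write_iotlb(int fd, const struct vhost_iotlb_msg *m);
int vhost_user_read_u64_ack(int fd, uint64_t *ack);

/* Send one update and wait for the slave's u64 ack before sending the
 * next one; zero means success, per the spec text quoted above. */
static int push_iotlb_update(int fd, const struct vhost_iotlb_msg *m)
{
    uint64_t ack;

    if (vhost_user_write_iotlb(fd, m) < 0 ||
        vhost_user_read_u64_ack(fd, &ack) < 0) {
        return -1;
    }
    return ack == 0 ? 0 : -1;
}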
On 2018年04月12日 16:10, Tiwei Bie wrote:
> On Thu, Apr 12, 2018 at 03:38:50PM +0800, Jason Wang wrote:
>> On 2018年04月12日 09:44, Tiwei Bie wrote:
>>> On Wed, Apr 11, 2018 at 08:37:17PM +0300, Michael S. Tsirkin wrote:
>>>> On Wed, Apr 11, 2018 at 04:38:53PM +0800, Tiwei Bie wrote:
>>>>> On Wed, Apr 11, 2018 at 04:01:19PM +0800, Jason Wang wrote:
>>>>>> On 2018年04月11日 15:20, Tiwei Bie wrote:
>>>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>>>>>> feature for vhost-user. By default, vhost-user backend needs
>>>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>>>>>>> With this protocol feature negotiated, QEMU will provide all
>>>>>>> the IOTLBs to vhost-user backend without waiting for the
>>>>>>> queries from backend. This is helpful when using a hardware
>>>>>>> accelerator which is not able to handle unknown IOVAs at the
>>>>>>> vhost-user backend.
>>>>>>>
>>>>>>> Signed-off-by: Tiwei Bie<tiwei.bie@intel.com>
>>>>>>> ---
>>>>>>> The idea of this patch is to let QEMU push all the IOTLBs
>>>>>>> to vhost-user backend without waiting for the queries from
>>>>>>> the backend. Because hardware accelerator at the vhost-user
>>>>>>> backend may not be able to handle unknown IOVAs.
>>>>>>>
>>>>>>> This is just a RFC for now. It seems that, it doesn't work
>>>>>>> as expected when guest is using kernel driver (To handle
>>>>>>> this case, it seems that some RAM regions' events also need
>>>>>>> to be listened). Any comments would be appreciated! Thanks!
>>>>>> Interesting, a quick question is why this is needed? Can we just use exist
>>>>>> IOTLB update message?
>>>>> Yeah, we are still using the existing IOTLB update messages
>>>>> to send the IOTLB messages to backend. The only difference
>>>>> is that, QEMU won't wait for the queries before sending the
>>>>> IOTLB update messages.
>>>> So I have a concern with that, in that without any flow
>>>> control the socket buffer used by vhost-user might become
>>>> full.
>>> Each IOTLB update message needs a reply. So I think it
>>> won't happen.
>> Is this what we've already done now? I don't find any statement on this in
>> vhost-user.txt?
>>
>> """
>> * VHOST_USER_IOTLB_MSG
>>
>>       Id: 22
>>       Equivalent ioctl: N/A (equivalent to VHOST_IOTLB_MSG message type)
>>       Master payload: struct vhost_iotlb_msg
>>       Slave payload: u64
>>
>>       Send IOTLB messages with struct vhost_iotlb_msg as payload.
>>       Master sends such requests to update and invalidate entries in the device
>>       IOTLB. The slave has to acknowledge the request with sending zero as u64
>>       payload for success, non-zero otherwise.
>>       This request should be send only when VIRTIO_F_IOMMU_PLATFORM feature
>>       has been successfully negotiated.
>> """
> Yeah, it's what we've already done now. It's in the above
> statement you quoted:
>
> "The slave has to acknowledge the request with sending zero as
> u64 payload for success, non-zero otherwise."

My bad.

Actually, there's a minor optimization possible here: when QI is enabled, we
only need a reply ack for the wait descriptor, which allows some kinds of
batching.

Another interesting idea is to send multiqueue requests through a single
message.

Thanks

>
>> And you probably need to modify the following statement:
>>
>> """
>> The master isn't expected to take the initiative to send IOTLB update
>> messages, as the slave sends IOTLB miss messages for the guest virtual
>> memory areas it needs to access.
>> """
> Yeah, you're right. Thanks! This statement needs some minor updates.
>
> Best regards,
> Tiwei Bie
>
>> Thanks
>>
>>
>>> Best regards,
>>> Tiwei Bie
>>>
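[Editor's note] The acknowledgement discussed above is what bounds the number of in-flight updates: the master sends a VHOST_USER_IOTLB_MSG and waits for the slave's u64 reply before pushing the next one, so the socket buffer cannot fill up with unacknowledged updates. The sketch below illustrates that slave-side ack path; the framing is simplified and vu_iotlb_insert() is a hypothetical helper, not part of any existing vhost-user library.

/* Minimal sketch of a slave acknowledging an IOTLB update (assumptions:
 * simplified framing, hypothetical vu_iotlb_insert() helper). */
#include <stdint.h>
#include <unistd.h>

struct vhost_iotlb_msg {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
    uint8_t  perm;
    uint8_t  type;  /* update or invalidate */
};

/* Hypothetical helper that records the mapping in the backend's shadow table. */
extern int vu_iotlb_insert(uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm);

static int handle_iotlb_update(int sock, const struct vhost_iotlb_msg *msg)
{
    /* Apply the update; reply 0 on success, non-zero otherwise. */
    uint64_t reply = vu_iotlb_insert(msg->iova, msg->size, msg->uaddr, msg->perm) ? 1 : 0;

    /* This u64 ack is the flow control: the master does not send the next
     * IOTLB update before it has read this reply. */
    if (write(sock, &reply, sizeof(reply)) != (ssize_t)sizeof(reply)) {
        return -1;
    }
    return 0;
}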
On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
Naming suggestion: "PREFILL_IOTLB" instead of "NEED_ALL_IOTLB".
I wasn't able to guess what "NEED_ALL_IOTLB" does. "PREFILL_IOTLB"
communicates that memory translation will be set up in advance.
On 2018年04月16日 15:47, Stefan Hajnoczi wrote:
> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
> Naming suggestion: "PREFILL_IOTLB" instead of "NEED_ALL_IOTLB".
>
> I wasn't able to guess what "NEED_ALL_IOTLB" does. "PREFILL_IOTLB"
> communicates that memory translation will be set up in advance.

I agree we need a better name here, but it should not contain anything like
"IOTLB", since what it actually tries to do is page table shadowing on the
remote vhost-user backend.

Thanks
On Mon, Apr 16, 2018 at 03:47:06PM +0800, Stefan Hajnoczi wrote:
> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
> > This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>
> Naming suggestion: "PREFILL_IOTLB" instead of "NEED_ALL_IOTLB".
>
> I wasn't able to guess what "NEED_ALL_IOTLB" does. "PREFILL_IOTLB"
> communicates that memory translation will be set up in advance.

By naming it NEED_ALL_IOTLB, I wanted to indicate that, without this feature,
it's hard or impossible for the backend to get all the IOTLBs: the backend has
to meet an IOVA first before it can query the corresponding IOTLB entry, and
it can't tell whether it has met all the IOVAs yet, especially when the guest
is using dynamic mapping. With this feature negotiated, the backend can expect
to receive all the IOTLBs. But it seems the name (NEED_ALL_IOTLB) isn't good.
Thanks for your suggestion! :)

Best regards,
Tiwei Bie
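[Editor's note] To make the semantics described above concrete, here is a rough sketch of a backend's translation path with and without the proposed feature; shadow_iotlb_lookup() and report_iotlb_miss() are hypothetical helpers used only for illustration, not an existing API.

/* Sketch only: how a negotiated "push all IOTLBs" feature changes miss
 * handling. shadow_iotlb_lookup()/report_iotlb_miss() are hypothetical. */
#include <stdbool.h>
#include <stdint.h>

extern bool shadow_iotlb_lookup(uint64_t iova, uint64_t *uaddr);  /* local shadow table */
extern int  report_iotlb_miss(uint64_t iova, uint8_t perm);       /* ask the master, then wait */

static int translate_iova(uint64_t iova, uint8_t perm,
                          bool prefill_negotiated, uint64_t *uaddr)
{
    if (shadow_iotlb_lookup(iova, uaddr)) {
        return 0;
    }

    if (prefill_negotiated) {
        /* QEMU is expected to have pushed every mapping in advance, so a
         * miss means the IOVA is simply not mapped; a hardware accelerator
         * that cannot fault has no way to recover here, so fail. */
        return -1;
    }

    /* Default behaviour: report the miss (e.g. over the slave channel) and
     * retry once the corresponding IOTLB update has been applied. */
    if (report_iotlb_miss(iova, perm) != 0) {
        return -1;
    }
    return shadow_iotlb_lookup(iova, uaddr) ? 0 : -1;
}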
diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
index 534caab18a..73e07f9dad 100644
--- a/docs/interop/vhost-user.txt
+++ b/docs/interop/vhost-user.txt
@@ -380,6 +380,7 @@ Protocol features
 #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION 7
 #define VHOST_USER_PROTOCOL_F_PAGEFAULT      8
 #define VHOST_USER_PROTOCOL_F_CONFIG         9
+#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10
 
 Master message types
 --------------------
@@ -797,3 +798,11 @@ resilient for selective requests.
 For the message types that already solicit a reply from the client, the
 presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
 no behavioural change. (See the 'Communication' section for details.)
+
+VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
+------------------------------------
+By default, vhost-user backend needs to query the IOTLBs from QEMU after
+meeting unknown IOVAs. With this protocol feature negotiated, QEMU will
+provide all the IOTLBs to vhost backend without waiting for the queries
+from backend. This is helpful when using a hardware accelerator which is
+not able to handle unknown IOVAs at the vhost-user backend.
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 7f09efab8b..d72994e9a5 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -233,6 +233,11 @@ static void vhost_kernel_set_iotlb_callback(struct vhost_dev *dev,
         qemu_set_fd_handler((uintptr_t)dev->opaque, NULL, NULL, NULL);
 }
 
+static bool vhost_kernel_need_all_device_iotlb(struct vhost_dev *dev)
+{
+    return false;
+}
+
 static const VhostOps kernel_ops = {
         .backend_type = VHOST_BACKEND_TYPE_KERNEL,
         .vhost_backend_init = vhost_kernel_init,
@@ -264,6 +269,8 @@ static const VhostOps kernel_ops = {
 #endif /* CONFIG_VHOST_VSOCK */
         .vhost_set_iotlb_callback = vhost_kernel_set_iotlb_callback,
         .vhost_send_device_iotlb_msg = vhost_kernel_send_device_iotlb_msg,
+        .vhost_backend_need_all_device_iotlb =
+                                vhost_kernel_need_all_device_iotlb,
 };
 
 int vhost_set_backend_type(struct vhost_dev *dev, VhostBackendType backend_type)
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 38da8692bb..30e0b1c13b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -47,6 +47,7 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_CRYPTO_SESSION = 7,
     VHOST_USER_PROTOCOL_F_PAGEFAULT = 8,
     VHOST_USER_PROTOCOL_F_CONFIG = 9,
+    VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB = 10,
     VHOST_USER_PROTOCOL_F_MAX
 };
 
@@ -1581,6 +1582,12 @@ vhost_user_crypto_close_session(struct vhost_dev *dev, uint64_t session_id)
     return 0;
 }
 
+static bool vhost_user_need_all_device_iotlb(struct vhost_dev *dev)
+{
+    return virtio_has_feature(dev->protocol_features,
+                              VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB);
+}
+
 const VhostOps user_ops = {
         .backend_type = VHOST_BACKEND_TYPE_USER,
         .vhost_backend_init = vhost_user_init,
@@ -1611,4 +1618,5 @@ const VhostOps user_ops = {
         .vhost_set_config = vhost_user_set_config,
         .vhost_crypto_create_session = vhost_user_crypto_create_session,
         .vhost_crypto_close_session = vhost_user_crypto_close_session,
+        .vhost_backend_need_all_device_iotlb = vhost_user_need_all_device_iotlb,
 };
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f51bf573d5..70922ee998 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -48,6 +48,9 @@ static unsigned int used_memslots;
 static QLIST_HEAD(, vhost_dev) vhost_devices =
     QLIST_HEAD_INITIALIZER(vhost_devices);
 
+static int vhost_memory_region_lookup(struct vhost_dev *hdev, uint64_t gpa,
+                                      uint64_t *uaddr, uint64_t *len);
+
 bool vhost_has_free_slot(void)
 {
     unsigned int slots_limit = ~0U;
@@ -634,12 +637,39 @@ static void vhost_region_addnop(MemoryListener *listener,
     vhost_region_add_section(dev, section);
 }
 
-static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
+static void vhost_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
     struct vhost_iommu *iommu = container_of(n, struct vhost_iommu, n);
     struct vhost_dev *hdev = iommu->hdev;
     hwaddr iova = iotlb->iova + iommu->iommu_offset;
 
+    if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
+        uint64_t uaddr, len;
+
+        rcu_read_lock();
+
+        if (iotlb->target_as != NULL) {
+            if (vhost_memory_region_lookup(hdev, iotlb->translated_addr,
+                                           &uaddr, &len)) {
+                error_report("Fail to lookup the translated address "
+                             "%"PRIx64, iotlb->translated_addr);
+                goto out;
+            }
+
+            len = MIN(iotlb->addr_mask + 1, len);
+            iova = iova & ~iotlb->addr_mask;
+
+            if (vhost_backend_update_device_iotlb(hdev, iova, uaddr,
+                                                  len, iotlb->perm)) {
+                error_report("Fail to update device iotlb");
+                goto out;
+            }
+        }
+out:
+        rcu_read_unlock();
+        return;
+    }
+
     if (vhost_backend_invalidate_device_iotlb(hdev, iova,
                                               iotlb->addr_mask + 1)) {
         error_report("Fail to invalidate device iotlb");
@@ -652,6 +682,7 @@ static void vhost_iommu_region_add(MemoryListener *listener,
     struct vhost_dev *dev = container_of(listener, struct vhost_dev,
                                          iommu_listener);
     struct vhost_iommu *iommu;
+    IOMMUNotifierFlag flags;
     Int128 end;
 
     if (!memory_region_is_iommu(section->mr)) {
@@ -662,8 +693,15 @@ static void vhost_iommu_region_add(MemoryListener *listener,
     end = int128_add(int128_make64(section->offset_within_region),
                      section->size);
     end = int128_sub(end, int128_one());
-    iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify,
-                        IOMMU_NOTIFIER_UNMAP,
+
+    if (dev->vhost_ops->vhost_backend_need_all_device_iotlb(dev)) {
+        flags = IOMMU_NOTIFIER_ALL;
+    } else {
+        flags = IOMMU_NOTIFIER_UNMAP;
+    }
+
+    iommu_notifier_init(&iommu->n, vhost_iommu_map_notify,
+                        flags,
                         section->offset_within_region,
                         int128_get64(end));
     iommu->mr = section->mr;
@@ -673,6 +711,9 @@ static void vhost_iommu_region_add(MemoryListener *listener,
     memory_region_register_iommu_notifier(section->mr, &iommu->n);
     QLIST_INSERT_HEAD(&dev->iommu_list, iommu, iommu_next);
     /* TODO: can replay help performance here? */
+    if (flags == IOMMU_NOTIFIER_ALL) {
+        memory_region_iommu_replay(IOMMU_MEMORY_REGION(iommu->mr), &iommu->n);
+    }
 }
 
 static void vhost_iommu_region_del(MemoryListener *listener,
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index 5dac61f9ea..602fd987a4 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -101,6 +101,8 @@ typedef int (*vhost_crypto_create_session_op)(struct vhost_dev *dev,
 typedef int (*vhost_crypto_close_session_op)(struct vhost_dev *dev,
                                              uint64_t session_id);
 
+typedef bool (*vhost_backend_need_all_device_iotlb_op)(struct vhost_dev *dev);
+
 typedef struct VhostOps {
     VhostBackendType backend_type;
     vhost_backend_init vhost_backend_init;
@@ -138,6 +140,7 @@ typedef struct VhostOps {
     vhost_set_config_op vhost_set_config;
     vhost_crypto_create_session_op vhost_crypto_create_session;
     vhost_crypto_close_session_op vhost_crypto_close_session;
+    vhost_backend_need_all_device_iotlb_op vhost_backend_need_all_device_iotlb;
 } VhostOps;
 
 extern const VhostOps user_ops;
This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
feature for vhost-user. By default, vhost-user backend needs
to query the IOTLBs from QEMU after meeting unknown IOVAs.
With this protocol feature negotiated, QEMU will provide all
the IOTLBs to vhost-user backend without waiting for the
queries from backend. This is helpful when using a hardware
accelerator which is not able to handle unknown IOVAs at the
vhost-user backend.

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
The idea of this patch is to let QEMU push all the IOTLBs
to vhost-user backend without waiting for the queries from
the backend. Because hardware accelerator at the vhost-user
backend may not be able to handle unknown IOVAs.

This is just a RFC for now. It seems that, it doesn't work
as expected when guest is using kernel driver (To handle
this case, it seems that some RAM regions' events also need
to be listened). Any comments would be appreciated! Thanks!

 docs/interop/vhost-user.txt       |  9 ++++++++
 hw/virtio/vhost-backend.c         |  7 ++++++
 hw/virtio/vhost-user.c            |  8 +++++++
 hw/virtio/vhost.c                 | 47 ++++++++++++++++++++++++++++++++++++---
 include/hw/virtio/vhost-backend.h |  3 +++
 5 files changed, 71 insertions(+), 3 deletions(-)
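[Editor's note] For completeness, a minimal sketch of how a backend might advertise and check the proposed protocol bit during feature negotiation; the my_backend_* names and struct are made up for illustration, and only the bit number (10) comes from the patch above.

/* Sketch of feature negotiation for the proposed bit; everything except the
 * bit number is a made-up placeholder. */
#include <stdbool.h>
#include <stdint.h>

#define VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB 10

struct my_backend {
    uint64_t protocol_features;  /* bits acked by the master */
};

/* Advertise the bit in the reply to VHOST_USER_GET_PROTOCOL_FEATURES. */
static uint64_t my_backend_get_protocol_features(uint64_t supported)
{
    return supported | (1ULL << VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB);
}

/* Remember what the master acked via VHOST_USER_SET_PROTOCOL_FEATURES. */
static void my_backend_set_protocol_features(struct my_backend *b, uint64_t features)
{
    b->protocol_features = features;
}

/* Later, the datapath setup can rely on translations being prefilled. */
static bool my_backend_expects_prefill(const struct my_backend *b)
{
    return b->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB);
}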