Message ID: 1701970793-6865-11-git-send-email-si-wei.liu@oracle.com (mailing list archive)
State: New, archived
Series: vdpa-net: improve migration downtime through descriptor ASID and persistent IOTLB
On Thu, Dec 7, 2023 at 7:50 PM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> When backend supports the VHOST_BACKEND_F_DESC_ASID feature
> and all the data vqs can support one or more descriptor group
> to host SVQ vrings and descriptors, we assign them a different
> ASID than where its buffers reside in guest memory address
> space. With this dedicated ASID for SVQs, the IOVA for what
> vdpa device may care effectively becomes the GPA, thus there's
> no need to translate IOVA address. For this reason, shadow_data
> can be turned off accordingly. It doesn't mean the SVQ is not
> enabled, but just that the translation is not needed from iova
> tree perspective.
>
> We can reuse CVQ's address space ID to host SVQ descriptors
> because both CVQ and SVQ are emulated in the same QEMU
> process, which will share the same VA address space.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c |  5 ++++-
>  net/vhost-vdpa.c       | 57 ++++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 57 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 24844b5..30dff95 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -627,6 +627,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>      uint64_t qemu_backend_features = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
>                                       0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
>                                       0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
> +                                     0x1ULL << VHOST_BACKEND_F_DESC_ASID |
>                                       0x1ULL << VHOST_BACKEND_F_SUSPEND;
>      int ret;
>
> @@ -1249,7 +1250,9 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
>              goto err;
>          }
>
> -        vhost_svq_start(svq, dev->vdev, vq, v->shared->iova_tree);
> +        vhost_svq_start(svq, dev->vdev, vq,
> +                        v->desc_group >= 0 && v->address_space_id ?

v->address_space_id != VHOST_VDPA_GUEST_PA_ASID?

> +                        NULL : v->shared->iova_tree);
>          ok = vhost_vdpa_svq_map_rings(dev, svq, &addr, &err);
>          if (unlikely(!ok)) {
>              goto err_map;
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 2555897..aebaa53 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -366,20 +366,50 @@ static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
>  static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>  {
>      struct vhost_vdpa *v = &s->vhost_vdpa;
> +    int r;
>
>      migration_add_notifier(&s->migration_state,
>                             vdpa_net_migration_state_notifier);
>
> +    if (!v->shadow_vqs_enabled) {

&& VHOST_BACKEND_F_DESC_ASID feature is acked?

> +        if (v->desc_group >= 0 &&
> +            v->address_space_id != VHOST_VDPA_GUEST_PA_ASID) {
> +            vhost_vdpa_set_address_space_id(v, v->desc_group,
> +                                            VHOST_VDPA_GUEST_PA_ASID);
> +            s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> +        }
> +        return;
> +    }
> +
>      /* iova_tree may be initialized by vhost_vdpa_net_load_setup */
> -    if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
> +    if (!v->shared->iova_tree) {
>          v->shared->iova_tree = vhost_iova_tree_new(v->shared->iova_range.first,
>                                                     v->shared->iova_range.last);
>      }

Maybe not so popular opinion, but I would add a previous patch modifying:

    if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
        iova_tree = new()
    }

to:

    if (!v->shadow_vqs_enabled) {
        return
    }

    if (!v->shared->iova_tree) {
        iova_tree = new()
    }

> +
> +    if (s->always_svq || v->desc_group < 0) {
> +        return;
> +    }
> +
> +    r = vhost_vdpa_set_address_space_id(v, v->desc_group,
> +                                        VHOST_VDPA_NET_CVQ_ASID);
> +    if (unlikely(r < 0)) {
> +        /* The other data vqs should also fall back to using the same ASID */
> +        s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> +        return;
> +    }
> +
> +    /* No translation needed on data SVQ when descriptor group is used */
> +    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;

I'm not sure "address_space_id" is a good name for this member anymore. If any, I think we can add a comment explaining that it only applies to descs vring if VHOST_BACKEND_F_DESC_ASID is acked and all the needed conditions are met.

Also, maybe it is better to define a new constant VHOST_VDPA_NET_VRING_DESCS_ASID, set it to VHOST_VDPA_NET_CVQ_ASID, and explain why it is ok to reuse that ASID?

> +    s->vhost_vdpa.shared->shadow_data = false;
> +    return;
>  }
>
>  static int vhost_vdpa_net_data_start(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +
>      struct vhost_vdpa *v = &s->vhost_vdpa;
>
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> @@ -397,6 +427,18 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
>          return 0;
>      }
>
> +    if (v->desc_group >= 0 && v->desc_group != s0->vhost_vdpa.desc_group) {
> +        unsigned asid;
> +        asid = v->shadow_vqs_enabled ?
> +            s0->vhost_vdpa.address_space_id : VHOST_VDPA_GUEST_PA_ASID;
> +        if (asid != s->vhost_vdpa.address_space_id) {
> +            vhost_vdpa_set_address_space_id(v, v->desc_group, asid);
> +        }
> +        s->vhost_vdpa.address_space_id = asid;
> +    } else {
> +        s->vhost_vdpa.address_space_id = s0->vhost_vdpa.address_space_id;
> +    }
> +

Ok, here I see how all vqs are configured so I think some of my previous comments are not 100% valid. However I think we can improve this, as this omits the case where two vrings different from s0 vring have the same vring descriptor group. But I guess we can always optimize on top.

>      return 0;
>  }
>
> @@ -603,13 +645,19 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>          return 0;
>      }
>
> -    if (!s->cvq_isolated) {
> +    if (!s->cvq_isolated && v->desc_group < 0) {
> +        if (s0->vhost_vdpa.shadow_vqs_enabled &&
> +            s0->vhost_vdpa.desc_group >= 0 &&
> +            s0->vhost_vdpa.address_space_id) {
> +            v->shadow_vqs_enabled = false;
> +        }
>          return 0;
>      }
>
> -    cvq_group = vhost_vdpa_get_vring_group(v->shared->device_fd,
> +    cvq_group = s->cvq_isolated ?
> +        vhost_vdpa_get_vring_group(v->shared->device_fd,
>                                             v->dev->vq_index_end - 1,
> -                                           &err);
> +                                           &err) : v->desc_group;

I'm not sure if we can happily set v->desc_group if !s->cvq_isolated. If CVQ buffers share its group with data queues, but its vring is effectively isolated, we are setting all the dataplane buffers to the ASID of the CVQ descriptors, isn't it?

>      if (unlikely(cvq_group < 0)) {
>          error_report_err(err);
>          return cvq_group;
> @@ -1840,6 +1888,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>      s->always_svq = svq;
>      s->migration_state.notify = NULL;
>      s->vhost_vdpa.shadow_vqs_enabled = svq;
> +    s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;

Isn't this overridden at each vhost_vdpa_net_*_start?

>      if (queue_pair_index == 0) {
>          vhost_vdpa_net_valid_svq_features(features,
>                                            &s->vhost_vdpa.migration_blocker);
> --
> 1.8.3.1
>
On Fri, Dec 8, 2023 at 2:50 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>
> When backend supports the VHOST_BACKEND_F_DESC_ASID feature
> and all the data vqs can support one or more descriptor group
> to host SVQ vrings and descriptors, we assign them a different
> ASID than where its buffers reside in guest memory address
> space. With this dedicated ASID for SVQs, the IOVA for what
> vdpa device may care effectively becomes the GPA, thus there's
> no need to translate IOVA address. For this reason, shadow_data
> can be turned off accordingly. It doesn't mean the SVQ is not
> enabled, but just that the translation is not needed from iova
> tree perspective.
>
> We can reuse CVQ's address space ID to host SVQ descriptors
> because both CVQ and SVQ are emulated in the same QEMU
> process, which will share the same VA address space.
>
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> ---
>  hw/virtio/vhost-vdpa.c |  5 ++++-
>  net/vhost-vdpa.c       | 57 ++++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 57 insertions(+), 5 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 24844b5..30dff95 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -627,6 +627,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>      uint64_t qemu_backend_features = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
>                                       0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
>                                       0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
> +                                     0x1ULL << VHOST_BACKEND_F_DESC_ASID |
>                                       0x1ULL << VHOST_BACKEND_F_SUSPEND;
>      int ret;
>
> @@ -1249,7 +1250,9 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
>              goto err;
>          }
>
> -        vhost_svq_start(svq, dev->vdev, vq, v->shared->iova_tree);
> +        vhost_svq_start(svq, dev->vdev, vq,
> +                        v->desc_group >= 0 && v->address_space_id ?
> +                        NULL : v->shared->iova_tree);

Nit: it might be a little bit more clear if we use a helper to check, like vhost_svq_needs_iova_tree().

>          ok = vhost_vdpa_svq_map_rings(dev, svq, &addr, &err);
>          if (unlikely(!ok)) {
>              goto err_map;
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 2555897..aebaa53 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -366,20 +366,50 @@ static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
>  static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
>  {
>      struct vhost_vdpa *v = &s->vhost_vdpa;
> +    int r;
>
>      migration_add_notifier(&s->migration_state,
>                             vdpa_net_migration_state_notifier);
>
> +    if (!v->shadow_vqs_enabled) {
> +        if (v->desc_group >= 0 &&
> +            v->address_space_id != VHOST_VDPA_GUEST_PA_ASID) {
> +            vhost_vdpa_set_address_space_id(v, v->desc_group,
> +                                            VHOST_VDPA_GUEST_PA_ASID);
> +            s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> +        }
> +        return;
> +    }
> +
>      /* iova_tree may be initialized by vhost_vdpa_net_load_setup */
> -    if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
> +    if (!v->shared->iova_tree) {
>          v->shared->iova_tree = vhost_iova_tree_new(v->shared->iova_range.first,
>                                                     v->shared->iova_range.last);
>      }
> +
> +    if (s->always_svq || v->desc_group < 0) {

I think the always_svq mode deserves a TODO there since it can utilize the desc_group actually?

> +        return;
> +    }
> +
> +    r = vhost_vdpa_set_address_space_id(v, v->desc_group,
> +                                        VHOST_VDPA_NET_CVQ_ASID);

Any reason why we only set the descriptor group for the first nc? (This seems to imply the device has one descriptor group for all virtqueues, which might not be true.)

> +    if (unlikely(r < 0)) {
> +        /* The other data vqs should also fall back to using the same ASID */
> +        s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
> +        return;
> +    }
> +
> +    /* No translation needed on data SVQ when descriptor group is used */
> +    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
> +    s->vhost_vdpa.shared->shadow_data = false;
> +    return;
>  }
>
>  static int vhost_vdpa_net_data_start(NetClientState *nc)
>  {
>      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
> +    VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
> +
>      struct vhost_vdpa *v = &s->vhost_vdpa;
>
>      assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> @@ -397,6 +427,18 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
>          return 0;
>      }
>
> +    if (v->desc_group >= 0 && v->desc_group != s0->vhost_vdpa.desc_group) {
> +        unsigned asid;
> +        asid = v->shadow_vqs_enabled ?
> +            s0->vhost_vdpa.address_space_id : VHOST_VDPA_GUEST_PA_ASID;
> +        if (asid != s->vhost_vdpa.address_space_id) {
> +            vhost_vdpa_set_address_space_id(v, v->desc_group, asid);
> +        }
> +        s->vhost_vdpa.address_space_id = asid;

Can we unify the logic for nc0 and others here? Then we don't need the trick in start_first().

> +    } else {
> +        s->vhost_vdpa.address_space_id = s0->vhost_vdpa.address_space_id;
> +    }
> +
>      return 0;
>  }
>
> @@ -603,13 +645,19 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
>          return 0;
>      }
>
> -    if (!s->cvq_isolated) {
> +    if (!s->cvq_isolated && v->desc_group < 0) {
> +        if (s0->vhost_vdpa.shadow_vqs_enabled &&
> +            s0->vhost_vdpa.desc_group >= 0 &&

I think we should fail if v->desc_group < 0 but s0->vhost_vdpa.desc_group >= 0 ?

> +            s0->vhost_vdpa.address_space_id) {

If this is a check for VHOST_VDPA_GUEST_PA_ASID, let's explicitly check it against the macro here.

But it's not clear to me the logic here: it looks to me like the code tries to work when CVQ is not isolated, is this intended? This makes the logic rather complicated here.

Thanks

> +            v->shadow_vqs_enabled = false;
> +        }
>          return 0;
>      }
>
> -    cvq_group = vhost_vdpa_get_vring_group(v->shared->device_fd,
> +    cvq_group = s->cvq_isolated ?
> +        vhost_vdpa_get_vring_group(v->shared->device_fd,
>                                             v->dev->vq_index_end - 1,
> -                                           &err);
> +                                           &err) : v->desc_group;
>      if (unlikely(cvq_group < 0)) {
>          error_report_err(err);
>          return cvq_group;
> @@ -1840,6 +1888,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>      s->always_svq = svq;
>      s->migration_state.notify = NULL;
>      s->vhost_vdpa.shadow_vqs_enabled = svq;
> +    s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
>      if (queue_pair_index == 0) {
>          vhost_vdpa_net_valid_svq_features(features,
>                                            &s->vhost_vdpa.migration_blocker);
> --
> 1.8.3.1
>
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 24844b5..30dff95 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -627,6 +627,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
     uint64_t qemu_backend_features = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
                                      0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
                                      0x1ULL << VHOST_BACKEND_F_IOTLB_ASID |
+                                     0x1ULL << VHOST_BACKEND_F_DESC_ASID |
                                      0x1ULL << VHOST_BACKEND_F_SUSPEND;
     int ret;

@@ -1249,7 +1250,9 @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
             goto err;
         }

-        vhost_svq_start(svq, dev->vdev, vq, v->shared->iova_tree);
+        vhost_svq_start(svq, dev->vdev, vq,
+                        v->desc_group >= 0 && v->address_space_id ?
+                        NULL : v->shared->iova_tree);
         ok = vhost_vdpa_svq_map_rings(dev, svq, &addr, &err);
         if (unlikely(!ok)) {
             goto err_map;
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 2555897..aebaa53 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -366,20 +366,50 @@ static int vhost_vdpa_set_address_space_id(struct vhost_vdpa *v,
 static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
 {
     struct vhost_vdpa *v = &s->vhost_vdpa;
+    int r;

     migration_add_notifier(&s->migration_state,
                            vdpa_net_migration_state_notifier);

+    if (!v->shadow_vqs_enabled) {
+        if (v->desc_group >= 0 &&
+            v->address_space_id != VHOST_VDPA_GUEST_PA_ASID) {
+            vhost_vdpa_set_address_space_id(v, v->desc_group,
+                                            VHOST_VDPA_GUEST_PA_ASID);
+            s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
+        }
+        return;
+    }
+
     /* iova_tree may be initialized by vhost_vdpa_net_load_setup */
-    if (v->shadow_vqs_enabled && !v->shared->iova_tree) {
+    if (!v->shared->iova_tree) {
         v->shared->iova_tree = vhost_iova_tree_new(v->shared->iova_range.first,
                                                    v->shared->iova_range.last);
     }
+
+    if (s->always_svq || v->desc_group < 0) {
+        return;
+    }
+
+    r = vhost_vdpa_set_address_space_id(v, v->desc_group,
+                                        VHOST_VDPA_NET_CVQ_ASID);
+    if (unlikely(r < 0)) {
+        /* The other data vqs should also fall back to using the same ASID */
+        s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
+        return;
+    }
+
+    /* No translation needed on data SVQ when descriptor group is used */
+    s->vhost_vdpa.address_space_id = VHOST_VDPA_NET_CVQ_ASID;
+    s->vhost_vdpa.shared->shadow_data = false;
+    return;
 }

 static int vhost_vdpa_net_data_start(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    VhostVDPAState *s0 = vhost_vdpa_net_first_nc_vdpa(s);
+
     struct vhost_vdpa *v = &s->vhost_vdpa;

     assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
@@ -397,6 +427,18 @@ static int vhost_vdpa_net_data_start(NetClientState *nc)
         return 0;
     }

+    if (v->desc_group >= 0 && v->desc_group != s0->vhost_vdpa.desc_group) {
+        unsigned asid;
+        asid = v->shadow_vqs_enabled ?
+            s0->vhost_vdpa.address_space_id : VHOST_VDPA_GUEST_PA_ASID;
+        if (asid != s->vhost_vdpa.address_space_id) {
+            vhost_vdpa_set_address_space_id(v, v->desc_group, asid);
+        }
+        s->vhost_vdpa.address_space_id = asid;
+    } else {
+        s->vhost_vdpa.address_space_id = s0->vhost_vdpa.address_space_id;
+    }
+
     return 0;
 }

@@ -603,13 +645,19 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
         return 0;
     }

-    if (!s->cvq_isolated) {
+    if (!s->cvq_isolated && v->desc_group < 0) {
+        if (s0->vhost_vdpa.shadow_vqs_enabled &&
+            s0->vhost_vdpa.desc_group >= 0 &&
+            s0->vhost_vdpa.address_space_id) {
+            v->shadow_vqs_enabled = false;
+        }
         return 0;
     }

-    cvq_group = vhost_vdpa_get_vring_group(v->shared->device_fd,
+    cvq_group = s->cvq_isolated ?
+        vhost_vdpa_get_vring_group(v->shared->device_fd,
                                            v->dev->vq_index_end - 1,
-                                           &err);
+                                           &err) : v->desc_group;
     if (unlikely(cvq_group < 0)) {
         error_report_err(err);
         return cvq_group;
@@ -1840,6 +1888,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->always_svq = svq;
     s->migration_state.notify = NULL;
     s->vhost_vdpa.shadow_vqs_enabled = svq;
+    s->vhost_vdpa.address_space_id = VHOST_VDPA_GUEST_PA_ASID;
     if (queue_pair_index == 0) {
         vhost_vdpa_net_valid_svq_features(features,
                                           &s->vhost_vdpa.migration_blocker);
When backend supports the VHOST_BACKEND_F_DESC_ASID feature and all the data vqs can support one or more descriptor group to host SVQ vrings and descriptors, we assign them a different ASID than where its buffers reside in guest memory address space. With this dedicated ASID for SVQs, the IOVA for what vdpa device may care effectively becomes the GPA, thus there's no need to translate IOVA address. For this reason, shadow_data can be turned off accordingly. It doesn't mean the SVQ is not enabled, but just that the translation is not needed from iova tree perspective.

We can reuse CVQ's address space ID to host SVQ descriptors because both CVQ and SVQ are emulated in the same QEMU process, which will share the same VA address space.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
---
 hw/virtio/vhost-vdpa.c |  5 ++++-
 net/vhost-vdpa.c       | 57 ++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 57 insertions(+), 5 deletions(-)