Message ID: 20181206063552.6701-1-xieyongji@baidu.com (mailing list archive)
Series: vhost-user-blk: Add support for backend reconnecting
Hi

On Thu, Dec 6, 2018 at 10:36 AM <elohimes@gmail.com> wrote:
>
> From: Xie Yongji <xieyongji@baidu.com>
>
> This patchset is aimed at supporting qemu to reconnect to the
> vhost-user-blk backend after the backend crashes or restarts.
>
> Patch 1 tries to implement the sync connection for
> "reconnect socket".
>
> Patch 2 introduces a new message, VHOST_USER_SET_VRING_INFLIGHT,
> to support offering shared memory to the backend to record
> its inflight I/O.
>
> Patches 3 and 4 are the corresponding libvhost-user patches of
> patch 2. They make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
>
> Patch 5 supports vhost-user-blk reconnecting to the backend when
> the connection is closed.
>
> Patch 6 tells qemu that we support reconnecting now.
>
> To use it, we could start qemu with:
>
> qemu-system-x86_64 \
>     -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
>     -device vhost-user-blk-pci,chardev=char0 \

Why do you want qemu to be the client, since it is actually the one
that serves and remains alive? Why make it try to reconnect regularly
when it could instead wait for a connection to come up?

>
> and start the vhost-user-blk backend with:
>
> vhost-user-blk -b /path/file -s /path/vhost.socket
>
> Then we can restart vhost-user-blk at any time while the VM is running.
>
> Xie Yongji (6):
>   char-socket: Enable "wait" option for client mode
>   vhost-user: Add shared memory to record inflight I/O
>   libvhost-user: Introduce vu_queue_map_desc()
>   libvhost-user: Support recording inflight I/O in shared memory
>   vhost-user-blk: Add support for reconnecting backend
>   contrib/vhost-user-blk: enable inflight I/O recording
>
>  chardev/char-socket.c                   |   5 +-
>  contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
>  contrib/libvhost-user/libvhost-user.h   |  19 +++
>  contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
>  hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
>  hw/virtio/vhost-user.c                  |  69 ++++++++
>  hw/virtio/vhost.c                       |   8 +
>  include/hw/virtio/vhost-backend.h       |   4 +
>  include/hw/virtio/vhost-user-blk.h      |   4 +
>  include/hw/virtio/vhost-user.h          |   8 +
>  10 files changed, 452 insertions(+), 52 deletions(-)
>
> --
> 2.17.1
On Thu, 6 Dec 2018 at 15:23, Marc-André Lureau
<marcandre.lureau@gmail.com> wrote:
>
> Hi
>
> On Thu, Dec 6, 2018 at 10:36 AM <elohimes@gmail.com> wrote:
> >
> > [...]
>
> Why do you want qemu to be the client, since it is actually the one
> that serves and remains alive? Why make it try to reconnect regularly
> when it could instead wait for a connection to come up?
>

Actually, this patchset should also work when qemu is in server mode.
The reason I make qemu the client is that some vhost-user backends,
such as SPDK's vhost-user-blk, may still work in server mode, and it
seems we cannot be sure that every vhost-user backend works in client
mode.

Thanks,
Yongji
Hi, it's a very interesting patchset.

I have also been researching the reconnection issue for vhost-user-blk
and SPDK. Do you support the case where the vhost backend is not yet
started but QEMU is?

Regards,
Yury

06.12.2018, 09:37, "elohimes@gmail.com" <elohimes@gmail.com>:
> From: Xie Yongji <xieyongji@baidu.com>
>
> [...]
On Thu, 6 Dec 2018 at 17:21, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>
> Hi, it's a very interesting patchset.
>
> I have also been researching the reconnection issue for vhost-user-blk
> and SPDK. Do you support the case where the vhost backend is not yet
> started but QEMU is?
>

No, we do not support this case now, because qemu has to get the config
from the vhost-user backend through VHOST_USER_GET_CONFIG when it
initializes the vhost-user-blk device.

If we delayed getting the config until we connect to the vhost-user
backend, the guest driver might get an incorrect configuration. That's
why I modified the "wait" option to support client mode.

Thanks,
Yongji
Yes, I also think that realize should be synchronous.

But maybe it's better to add a 'disconnected' option to init the chardev
in the disconnected state, and then do the first connection with
qemu_chr_fe_wait_connected from vhost_user_blk_realize. That way, when
the connection breaks during realize, we can try again.
What do you think about it?

Regards,
Yury

06.12.2018, 12:42, "Yongji Xie" <elohimes@gmail.com>:
> [...]
On Thu, 6 Dec 2018 at 17:52, Yury Kotov <yury-kotov@yandex-team.ru> wrote:
>
> Yes, I also think that realize should be synchronous.
>
> But maybe it's better to add a 'disconnected' option to init the chardev
> in the disconnected state, and then do the first connection with
> qemu_chr_fe_wait_connected from vhost_user_blk_realize. That way, when
> the connection breaks during realize, we can try again.
> What do you think about it?
>

Sounds good to me. And maybe we need to add a timeout for the retrying.

Thanks,
Yongji
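The idea discussed above, connecting synchronously during realize and retrying until a timeout expires, might look roughly like the following. This is a minimal POSIX sketch, not QEMU's actual qemu_chr_fe_wait_connected() implementation; the function name, 100 ms retry interval, and socket path are illustrative assumptions:

```c
/*
 * Sketch: during device realize, synchronously try to connect to the
 * backend's Unix socket, retrying until a timeout expires.  The retry
 * policy here is an assumption for illustration only.
 */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Return a connected fd, or -1 if no backend showed up within timeout_ms. */
static int connect_with_timeout(const char *path, int timeout_ms)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    snprintf(addr.sun_path, sizeof(addr.sun_path), "%s", path);

    for (int waited = 0; waited <= timeout_ms; waited += 100) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
            return -1;
        }
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            return fd;          /* backend is up: proceed with realize */
        }
        close(fd);
        usleep(100 * 1000);     /* backend not up yet: retry in 100 ms */
    }
    return -1;                  /* give up so realize can fail cleanly */
}
```

With such a helper, realize could retry through a transient disconnect instead of failing immediately, and still report an error if the backend never appears.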
On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
> From: Xie Yongji <xieyongji@baidu.com>
>
> [...]
>
> Then we can restart vhost-user-blk at any time while the VM is running.

I wonder whether or not it's better to handle this at the level of the
virtio protocol itself instead of at the vhost-user level. E.g.,
exposing last_avail_idx to the driver might be sufficient?

Another possible issue: it looks like you need to deal with different
kinds of ring layouts, e.g. packed virtqueues.

Thanks
On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
>
> On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
> > From: Xie Yongji <xieyongji@baidu.com>
> >
> > [...]
>
> I wonder whether or not it's better to handle this at the level of the
> virtio protocol itself instead of at the vhost-user level. E.g.,
> exposing last_avail_idx to the driver might be sufficient?
>
> Another possible issue: it looks like you need to deal with different
> kinds of ring layouts, e.g. packed virtqueues.
>
> Thanks

I'm not sure I understand your comments here. All of these would be
guest-visible extensions. Possible for sure, but how is that related to
a patch supporting transparent reconnects?
On 2018/12/6 下午9:57, Jason Wang wrote:
>
> On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
> > [...]
>
> I wonder whether or not it's better to handle this at the level of the
> virtio protocol itself instead of at the vhost-user level. E.g.,
> exposing last_avail_idx to the driver might be sufficient?
>
> Another possible issue: it looks like you need to deal with different
> kinds of ring layouts, e.g. packed virtqueues.
>
> Thanks

Cc Maxime, who wrote the vhost-user reconnecting support, for more
thoughts.

Thanks
On Thu, 6 Dec 2018 at 21:57, Jason Wang <jasowang@redhat.com> wrote:
>
> On 2018/12/6 下午2:35, elohimes@gmail.com wrote:
> > [...]
>
> I wonder whether or not it's better to handle this at the level of the
> virtio protocol itself instead of at the vhost-user level. E.g.,
> exposing last_avail_idx to the driver might be sufficient?
>
> Another possible issue: it looks like you need to deal with different
> kinds of ring layouts, e.g. packed virtqueues.
>

Yes, we should be able to deal with packed virtqueues. But this should
be handled in the backend rather than in qemu. Qemu just sends a region
of shared memory to the backend; the backend then uses it to record its
inflight I/O and to replay that I/O when necessary.

Thanks,
Yongji
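The mechanism Yongji describes, a backend recording per-descriptor inflight state in a QEMU-provided shared memory region and scanning it for requests to replay after reconnect, might be sketched as follows. The structure and function names are illustrative assumptions, not the actual VHOST_USER_SET_VRING_INFLIGHT layout from this series:

```c
/*
 * Illustrative sketch (NOT the layout from the series): the backend
 * marks a descriptor head inflight when it fetches it from the avail
 * ring, clears the mark when it pushes the completion to the used ring,
 * and on reconnect scans the region to find requests to resubmit.
 */
#include <assert.h>
#include <stdint.h>

#define VQ_SIZE 8   /* assumed queue size for the sketch */

/* One byte per descriptor head, living in the shared memory region. */
typedef struct {
    uint8_t inflight[VQ_SIZE];  /* 1 = fetched but not yet completed */
} InflightRegion;

static void inflight_mark(InflightRegion *r, uint16_t head)
{
    r->inflight[head] = 1;      /* just before processing the request */
}

static void inflight_clear(InflightRegion *r, uint16_t head)
{
    r->inflight[head] = 0;      /* just before filling the used ring */
}

/* After reconnect: collect descriptor heads that must be resubmitted. */
static int inflight_collect(const InflightRegion *r,
                            uint16_t *heads, int max)
{
    int n = 0;
    for (uint16_t i = 0; i < VQ_SIZE && n < max; i++) {
        if (r->inflight[i]) {
            heads[n++] = i;
        }
    }
    return n;
}
```

Because the region lives in shared memory owned by qemu, the record survives a backend crash, which is exactly what makes replay possible without any guest-visible change.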
On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote:
> >
> > [...]
> >
> > I wonder whether or not it's better to handle this at the level of the
> > virtio protocol itself instead of at the vhost-user level. E.g.,
> > exposing last_avail_idx to the driver might be sufficient?
> >
> > Another possible issue: it looks like you need to deal with different
> > kinds of ring layouts, e.g. packed virtqueues.
> >
> > Thanks
>
> I'm not sure I understand your comments here.
> All of these would be guest-visible extensions.

Looks not; it only introduces a shared memory region between qemu and
the vhost-user backend?

> Possible for sure, but how is that related to
> a patch supporting transparent reconnects?

I might be missing something. My understanding is that we support
transparent reconnects, but we can't deduce an accurate last_avail_idx,
and that is the capability this series tries to add. To me, this series
is functionally equivalent to exposing last_avail_idx (or avail_idx_cons)
in the available ring, so that the information is inside guest memory and
the vhost-user backend can access and update it directly. I believe some
modern NICs do this as well (but the index is in an MMIO area, of
course).

Thanks
On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
>
> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote:
> >
> > [...]
>
> I might be missing something. My understanding is that we support
> transparent reconnects, but we can't deduce an accurate last_avail_idx,
> and that is the capability this series tries to add. To me, this series
> is functionally equivalent to exposing last_avail_idx (or avail_idx_cons)
> in the available ring, so that the information is inside guest memory and
> the vhost-user backend can access and update it directly. I believe some
> modern NICs do this as well (but the index is in an MMIO area, of
> course).
>

Hi Jason,

If my understanding is correct, it might not be enough to only expose
last_avail_idx, because sometimes we do not process descriptors in the
same order in which they were made available. If so, we can't get
correct inflight I/O information from the available ring alone.

Thanks,
Yongji
On 2018/12/12 上午10:48, Yongji Xie wrote:
> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote:
> >
> > [...]
>
> Hi Jason,
>
> If my understanding is correct, it might not be enough to only expose
> last_avail_idx, because sometimes we do not process descriptors in the
> same order in which they were made available. If so, we can't get
> correct inflight I/O information from the available ring alone.

You can get this with the help of both the used ring and last_avail_idx,
I believe. Or maybe you can give us an example?

Thanks
On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote:
>
> On 2018/12/12 上午10:48, Yongji Xie wrote:
> >
> > [...]
> >
> > If my understanding is correct, it might not be enough to only expose
> > last_avail_idx, because sometimes we do not process descriptors in the
> > same order in which they were made available. If so, we can't get
> > correct inflight I/O information from the available ring alone.
>
> You can get this with the help of both the used ring and last_avail_idx,
> I believe. Or maybe you can give us an example?
>

A simple example, assuming the ring size is 8:

1. The guest fills the avail ring:

   avail ring: 0 1 2 3 4 5 6 7
   used ring:

2. The vhost-user backend completes 4, 5, 6, 7 and fills the used ring:

   avail ring: 0 1 2 3 4 5 6 7
   used ring:  4 5 6 7

3. The guest fills the avail ring again (reusing descriptors 4-7, whose
   new entries wrap around and overwrite the ring slots that held 0-3):

   avail ring: 4 5 6 7 4 5 6 7
   used ring:  4 5 6 7

4. The vhost-user backend crashes.

The inflight descriptors 0, 1, 2, 3 are lost.

Thanks,
Yongji
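Yongji's four steps can be turned into a small simulation showing why the rings alone are insufficient: after the wrap, every avail entry between used_idx and last_avail_idx reads 4..7, so descriptors 0..3 are unrecoverable. This is an illustrative sketch of a split ring, not libvhost-user code, and the helper name is an assumption:

```c
/*
 * Simulation of the example above with a split ring of size 8.
 * Indices are free-running; ring slot = index % QSZ, as in virtio.
 */
#include <assert.h>
#include <stdint.h>

#define QSZ 8

/* Returns 1 if descriptors 0..3 are still visible when rescanning the
 * avail ring after the crash, 0 if they have been overwritten (lost). */
static int lost_descs_recoverable(void)
{
    uint16_t avail[QSZ];
    uint16_t avail_idx = 0;        /* guest: entries made available */
    uint16_t last_avail_idx;       /* backend: entries fetched      */
    uint16_t used_idx;             /* backend: entries completed    */

    /* 1. Guest makes descriptors 0..7 available; backend fetches all. */
    for (uint16_t d = 0; d < 8; d++) avail[avail_idx++ % QSZ] = d;
    last_avail_idx = avail_idx;

    /* 2. Backend completes only descriptors 4..7. */
    used_idx = 4;

    /* 3. Guest reuses 4..7: ring slots 8..11 wrap onto slots 0..3,
     *    overwriting the entries that held descriptors 0..3. */
    for (uint16_t d = 4; d < 8; d++) avail[avail_idx++ % QSZ] = d;
    last_avail_idx = avail_idx;

    /* 4. Backend crashes; on reconnect, naively rescan the avail ring
     *    between used_idx and last_avail_idx. */
    for (uint16_t i = used_idx; i != last_avail_idx; i++) {
        if (avail[i % QSZ] < 4) {
            return 1;   /* found one of 0..3 */
        }
    }
    return 0;           /* 0..3 are gone from the ring */
}
```

The scan only ever sees descriptors 4..7, which is why the series keeps the inflight record in a separate shared memory region instead of relying on ring contents.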
On 2018/12/12 上午11:21, Yongji Xie wrote: > On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote: >> >> On 2018/12/12 上午10:48, Yongji Xie wrote: >>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote: >>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote: >>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote: >>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com wrote: >>>>>>> From: Xie Yongji<xieyongji@baidu.com> >>>>>>> >>>>>>> This patchset is aimed at supporting qemu to reconnect >>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or >>>>>>> restart. >>>>>>> >>>>>>> The patch 1 tries to implenment the sync connection for >>>>>>> "reconnect socket". >>>>>>> >>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT >>>>>>> to support offering shared memory to backend to record >>>>>>> its inflight I/O. >>>>>>> >>>>>>> The patch 3,4 are the corresponding libvhost-user patches of >>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT. >>>>>>> >>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when >>>>>>> connection closed. >>>>>>> >>>>>>> The patch 6 tells qemu that we support reconnecting now. >>>>>>> >>>>>>> To use it, we could start qemu with: >>>>>>> >>>>>>> qemu-system-x86_64 \ >>>>>>> -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \ >>>>>>> -device vhost-user-blk-pci,chardev=char0 \ >>>>>>> >>>>>>> and start vhost-user-blk backend with: >>>>>>> >>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket >>>>>>> >>>>>>> Then we can restart vhost-user-blk at any time during VM running. >>>>>> I wonder whether or not it's better to handle this at the level of virtio >>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to >>>>>> driver might be sufficient? >>>>>> >>>>>> Another possible issue is, looks like you need to deal with different kinds >>>>>> of ring layouts e.g packed virtqueues. 
>>>>>>
>>>>>> Thanks
>>>>> I'm not sure I understand your comments here.
>>>>> All these would be guest-visible extensions.
>>>> Looks not, it only introduces a shared memory between qemu and
>>>> vhost-user backend?
>>>>
>>>>
>>>>> Possible for sure but how is this related to
>>>>> a patch supporting transparent reconnects?
>>>> I might miss something. My understanding is that we support transparent
>>>> reconnects, but we can't deduce an accurate last_avail_idx and this is
>>>> what capability this series try to add. To me, this series is functional
>>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available
>>>> ring. So the information is inside guest memory, vhost-user backend can
>>>> access it and update it directly. I believe this is some modern NIC did
>>>> as well (but index is in MMIO area of course).
>>>>
>>> Hi Jason,
>>>
>>> If my understand is correct, it might be not enough to only expose
>>> last_avail_idx.
>>> Because we would not process descriptors in the same order in which they have
>>> been made available sometimes. If so, we can't get correct inflight
>>> I/O information
>>> from available ring.
>>
>> You can get this with the help of the both used ring and last_avail_idx
>> I believe. Or maybe you can give us an example?
>>
> A simple example, we assume ring size is 8:
>
> 1. guest fill avail ring
>
> avail ring: 0 1 2 3 4 5 6 7
> used ring:
>
> 2. vhost-user backend complete 4,5,6,7 and fill used ring
>
> avail ring: 0 1 2 3 4 5 6 7
> used ring: 4 5 6 7
>
> 3. guest fill avail ring again
>
> avail ring: 4 5 6 7 4 5 6 7
> used ring: 4 5 6 7
>
> 4. vhost-user backend crash
>
> The inflight descriptors 0, 1, 2, 3 lost.
>
> Thanks,
> Yongji

Ok, then we can simply forbid increasing the avail_idx in this case?

Basically, it's a question of whether or not it's better to do it at the
level of virtio instead of vhost. I'm pretty sure that if we expose
sufficient information, it could be done without touching vhost-user.
And we won't deal with e.g. migration and other cases.

Thanks
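The descriptor loss in the example above can be reproduced with a short, language-neutral simulation. This is an illustrative sketch, not code from the series:

```python
# Standalone sketch reproducing the example above with ring size 8:
# after out-of-order completion and a guest refill, the ids of the
# in-flight descriptors 0-3 appear nowhere in the rings, so
# last_avail_idx plus the used ring cannot recover them.

QSIZE = 8

# 1. guest fills the avail ring with descriptor heads 0..7
avail = list(range(QSIZE))
avail_idx = QSIZE
last_avail_idx = QSIZE          # backend has fetched all 8 descriptors

# 2. backend completes 4,5,6,7 out of order and fills the used ring
used = [4, 5, 6, 7]

# 3. guest reuses the completed heads 4..7; the new avail entries
#    wrap around and overwrite the slots that held 0..3
for i, head in enumerate([4, 5, 6, 7]):
    avail[(avail_idx + i) % QSIZE] = head
avail_idx += 4

# 4. backend crashes; the in-flight set should be {0,1,2,3}, but those
#    ids survive neither in the avail ring nor in the used ring
inflight = [d for d in range(QSIZE) if d not in used]
recoverable = [d for d in inflight if d in avail or d in used]
print("inflight:", inflight, "recoverable:", recoverable)
# -> inflight: [0, 1, 2, 3] recoverable: []
```

This is why the series records inflight descriptors in a separate shared memory region instead of rescanning the rings.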
On Wed, 12 Dec 2018 at 12:07, Jason Wang <jasowang@redhat.com> wrote: > > > On 2018/12/12 上午11:21, Yongji Xie wrote: > > On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote: > >> > >> On 2018/12/12 上午10:48, Yongji Xie wrote: > >>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote: > >>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote: > >>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote: > >>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com wrote: > >>>>>>> From: Xie Yongji<xieyongji@baidu.com> > >>>>>>> > >>>>>>> This patchset is aimed at supporting qemu to reconnect > >>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or > >>>>>>> restart. > >>>>>>> > >>>>>>> The patch 1 tries to implenment the sync connection for > >>>>>>> "reconnect socket". > >>>>>>> > >>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT > >>>>>>> to support offering shared memory to backend to record > >>>>>>> its inflight I/O. > >>>>>>> > >>>>>>> The patch 3,4 are the corresponding libvhost-user patches of > >>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT. > >>>>>>> > >>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when > >>>>>>> connection closed. > >>>>>>> > >>>>>>> The patch 6 tells qemu that we support reconnecting now. > >>>>>>> > >>>>>>> To use it, we could start qemu with: > >>>>>>> > >>>>>>> qemu-system-x86_64 \ > >>>>>>> -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \ > >>>>>>> -device vhost-user-blk-pci,chardev=char0 \ > >>>>>>> > >>>>>>> and start vhost-user-blk backend with: > >>>>>>> > >>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket > >>>>>>> > >>>>>>> Then we can restart vhost-user-blk at any time during VM running. > >>>>>> I wonder whether or not it's better to handle this at the level of virtio > >>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to > >>>>>> driver might be sufficient? 
> >>>>>> > >>>>>> Another possible issue is, looks like you need to deal with different kinds > >>>>>> of ring layouts e.g packed virtqueues. > >>>>>> > >>>>>> Thanks > >>>>> I'm not sure I understand your comments here. > >>>>> All these would be guest-visible extensions. > >>>> Looks not, it only introduces a shared memory between qemu and > >>>> vhost-user backend? > >>>> > >>>> > >>>>> Possible for sure but how is this related to > >>>>> a patch supporting transparent reconnects? > >>>> I might miss something. My understanding is that we support transparent > >>>> reconnects, but we can't deduce an accurate last_avail_idx and this is > >>>> what capability this series try to add. To me, this series is functional > >>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available > >>>> ring. So the information is inside guest memory, vhost-user backend can > >>>> access it and update it directly. I believe this is some modern NIC did > >>>> as well (but index is in MMIO area of course). > >>>> > >>> Hi Jason, > >>> > >>> If my understand is correct, it might be not enough to only expose > >>> last_avail_idx. > >>> Because we would not process descriptors in the same order in which they have > >>> been made available sometimes. If so, we can't get correct inflight > >>> I/O information > >>> from available ring. > >> > >> You can get this with the help of the both used ring and last_avail_idx > >> I believe. Or maybe you can give us an example? > >> > > A simple example, we assume ring size is 8: > > > > 1. guest fill avail ring > > > > avail ring: 0 1 2 3 4 5 6 7 > > used ring: > > > > 2. vhost-user backend complete 4,5,6,7 and fill used ring > > > > avail ring: 0 1 2 3 4 5 6 7 > > used ring: 4 5 6 7 > > > > 3. guest fill avail ring again > > > > avail ring: 4 5 6 7 4 5 6 7 > > used ring: 4 5 6 7 > > > > 4. vhost-user backend crash > > > > The inflight descriptors 0, 1, 2, 3 lost. 
> >
> > Thanks,
> > Yongji
>
> Ok, then we can simply forbid increasing the avail_idx in this case?
>
> Basically, it's a question of whether or not it's better to done it in
> the level of virtio instead of vhost. I'm pretty sure if we expose
> sufficient information, it could be done without touching vhost-user.
> And we won't deal with e.g migration and other cases.
>
OK, I get your point. That's indeed an alternative way. But this feature
seems to be only useful to the vhost-user backend. I'm not sure whether
it makes sense to touch the virtio protocol for this feature.

Thanks,
Yongji
On 2018/12/12 下午2:41, Yongji Xie wrote: > On Wed, 12 Dec 2018 at 12:07, Jason Wang <jasowang@redhat.com> wrote: >> >> On 2018/12/12 上午11:21, Yongji Xie wrote: >>> On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote: >>>> On 2018/12/12 上午10:48, Yongji Xie wrote: >>>>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote: >>>>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote: >>>>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote: >>>>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com wrote: >>>>>>>>> From: Xie Yongji<xieyongji@baidu.com> >>>>>>>>> >>>>>>>>> This patchset is aimed at supporting qemu to reconnect >>>>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or >>>>>>>>> restart. >>>>>>>>> >>>>>>>>> The patch 1 tries to implenment the sync connection for >>>>>>>>> "reconnect socket". >>>>>>>>> >>>>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT >>>>>>>>> to support offering shared memory to backend to record >>>>>>>>> its inflight I/O. >>>>>>>>> >>>>>>>>> The patch 3,4 are the corresponding libvhost-user patches of >>>>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT. >>>>>>>>> >>>>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when >>>>>>>>> connection closed. >>>>>>>>> >>>>>>>>> The patch 6 tells qemu that we support reconnecting now. >>>>>>>>> >>>>>>>>> To use it, we could start qemu with: >>>>>>>>> >>>>>>>>> qemu-system-x86_64 \ >>>>>>>>> -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \ >>>>>>>>> -device vhost-user-blk-pci,chardev=char0 \ >>>>>>>>> >>>>>>>>> and start vhost-user-blk backend with: >>>>>>>>> >>>>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket >>>>>>>>> >>>>>>>>> Then we can restart vhost-user-blk at any time during VM running. >>>>>>>> I wonder whether or not it's better to handle this at the level of virtio >>>>>>>> protocol itself instead of vhost-user level. 
E.g expose last_avail_idx to >>>>>>>> driver might be sufficient? >>>>>>>> >>>>>>>> Another possible issue is, looks like you need to deal with different kinds >>>>>>>> of ring layouts e.g packed virtqueues. >>>>>>>> >>>>>>>> Thanks >>>>>>> I'm not sure I understand your comments here. >>>>>>> All these would be guest-visible extensions. >>>>>> Looks not, it only introduces a shared memory between qemu and >>>>>> vhost-user backend? >>>>>> >>>>>> >>>>>>> Possible for sure but how is this related to >>>>>>> a patch supporting transparent reconnects? >>>>>> I might miss something. My understanding is that we support transparent >>>>>> reconnects, but we can't deduce an accurate last_avail_idx and this is >>>>>> what capability this series try to add. To me, this series is functional >>>>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available >>>>>> ring. So the information is inside guest memory, vhost-user backend can >>>>>> access it and update it directly. I believe this is some modern NIC did >>>>>> as well (but index is in MMIO area of course). >>>>>> >>>>> Hi Jason, >>>>> >>>>> If my understand is correct, it might be not enough to only expose >>>>> last_avail_idx. >>>>> Because we would not process descriptors in the same order in which they have >>>>> been made available sometimes. If so, we can't get correct inflight >>>>> I/O information >>>>> from available ring. >>>> You can get this with the help of the both used ring and last_avail_idx >>>> I believe. Or maybe you can give us an example? >>>> >>> A simple example, we assume ring size is 8: >>> >>> 1. guest fill avail ring >>> >>> avail ring: 0 1 2 3 4 5 6 7 >>> used ring: >>> >>> 2. vhost-user backend complete 4,5,6,7 and fill used ring >>> >>> avail ring: 0 1 2 3 4 5 6 7 >>> used ring: 4 5 6 7 >>> >>> 3. guest fill avail ring again >>> >>> avail ring: 4 5 6 7 4 5 6 7 >>> used ring: 4 5 6 7 >>> >>> 4. vhost-user backend crash >>> >>> The inflight descriptors 0, 1, 2, 3 lost. 
>>>
>>> Thanks,
>>> Yongji
>>
>> Ok, then we can simply forbid increasing the avail_idx in this case?
>>
>> Basically, it's a question of whether or not it's better to done it in
>> the level of virtio instead of vhost. I'm pretty sure if we expose
>> sufficient information, it could be done without touching vhost-user.
>> And we won't deal with e.g migration and other cases.
>>
> OK, I get your point. That's indeed an alternative way. But this feature seems
> to be only useful to vhost-user backend.

I admit I could not think of a use case other than vhost-user.

> I'm not sure whether it make sense to
> touch virtio protocol for this feature.

Some possible advantages:

- The feature could be determined and noticed by the user or management
layer.

- There's no need to invent a ring-layout-specific protocol to record
in-flight descriptors. E.g. if my understanding is correct, this series
and the example above still cannot work for packed virtqueues, since the
descriptor id is not sufficient (a descriptor could be overwritten by a
used one). You probably need a (partial) copy of the descriptor ring for
this.

- No need to deal with migration; all the information is in guest memory.

Thanks

>
> Thanks,
> Yongji
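The packed-virtqueue concern raised in the advantages list can be sketched as follows. This is an illustrative simulation of the overwrite problem only, hedged and not spec-accurate in every detail:

```python
# Hedged sketch of the packed-virtqueue point: the device writes used
# elements back into the SAME descriptor ring, so a slot describing a
# still-in-flight buffer can be overwritten before that buffer completes.
# Recording only descriptor ids is then not enough; the backend would need
# a (partial) copy of the descriptors themselves to resubmit after a crash.

# driver-written descriptors: (buffer id, guest addr, length)
ring = [{"id": i, "addr": 0x1000 * i, "len": 512} for i in range(4)]

# Device fetches all four chains, then completes id 3 first. The used
# element is written at the device's used position -- here slot 0, which
# still described the in-flight buffer with id 0.
ring[0] = {"id": 3, "len": 512, "used": True}

# The addr/len of in-flight id 0 is now gone from the ring:
inflight_id = 0
addr_known = any(d.get("id") == inflight_id and "addr" in d for d in ring)
print("can recover addr of inflight id 0 from ring:", addr_known)
# -> can recover addr of inflight id 0 from ring: False
```

With the split layout a completed descriptor only disappears from the avail ring; with the packed layout its ring slot is physically overwritten, which is why a plain list of ids does not suffice there.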
On Wed, 12 Dec 2018 at 15:47, Jason Wang <jasowang@redhat.com> wrote: > > > On 2018/12/12 下午2:41, Yongji Xie wrote: > > On Wed, 12 Dec 2018 at 12:07, Jason Wang <jasowang@redhat.com> wrote: > >> > >> On 2018/12/12 上午11:21, Yongji Xie wrote: > >>> On Wed, 12 Dec 2018 at 11:00, Jason Wang <jasowang@redhat.com> wrote: > >>>> On 2018/12/12 上午10:48, Yongji Xie wrote: > >>>>> On Mon, 10 Dec 2018 at 17:32, Jason Wang <jasowang@redhat.com> wrote: > >>>>>> On 2018/12/6 下午9:59, Michael S. Tsirkin wrote: > >>>>>>> On Thu, Dec 06, 2018 at 09:57:22PM +0800, Jason Wang wrote: > >>>>>>>> On 2018/12/6 下午2:35,elohimes@gmail.com wrote: > >>>>>>>>> From: Xie Yongji<xieyongji@baidu.com> > >>>>>>>>> > >>>>>>>>> This patchset is aimed at supporting qemu to reconnect > >>>>>>>>> vhost-user-blk backend after vhost-user-blk backend crash or > >>>>>>>>> restart. > >>>>>>>>> > >>>>>>>>> The patch 1 tries to implenment the sync connection for > >>>>>>>>> "reconnect socket". > >>>>>>>>> > >>>>>>>>> The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT > >>>>>>>>> to support offering shared memory to backend to record > >>>>>>>>> its inflight I/O. > >>>>>>>>> > >>>>>>>>> The patch 3,4 are the corresponding libvhost-user patches of > >>>>>>>>> patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT. > >>>>>>>>> > >>>>>>>>> The patch 5 supports vhost-user-blk to reconnect backend when > >>>>>>>>> connection closed. > >>>>>>>>> > >>>>>>>>> The patch 6 tells qemu that we support reconnecting now. > >>>>>>>>> > >>>>>>>>> To use it, we could start qemu with: > >>>>>>>>> > >>>>>>>>> qemu-system-x86_64 \ > >>>>>>>>> -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \ > >>>>>>>>> -device vhost-user-blk-pci,chardev=char0 \ > >>>>>>>>> > >>>>>>>>> and start vhost-user-blk backend with: > >>>>>>>>> > >>>>>>>>> vhost-user-blk -b /path/file -s /path/vhost.socket > >>>>>>>>> > >>>>>>>>> Then we can restart vhost-user-blk at any time during VM running. 
> >>>>>>>> I wonder whether or not it's better to handle this at the level of virtio > >>>>>>>> protocol itself instead of vhost-user level. E.g expose last_avail_idx to > >>>>>>>> driver might be sufficient? > >>>>>>>> > >>>>>>>> Another possible issue is, looks like you need to deal with different kinds > >>>>>>>> of ring layouts e.g packed virtqueues. > >>>>>>>> > >>>>>>>> Thanks > >>>>>>> I'm not sure I understand your comments here. > >>>>>>> All these would be guest-visible extensions. > >>>>>> Looks not, it only introduces a shared memory between qemu and > >>>>>> vhost-user backend? > >>>>>> > >>>>>> > >>>>>>> Possible for sure but how is this related to > >>>>>>> a patch supporting transparent reconnects? > >>>>>> I might miss something. My understanding is that we support transparent > >>>>>> reconnects, but we can't deduce an accurate last_avail_idx and this is > >>>>>> what capability this series try to add. To me, this series is functional > >>>>>> equivalent to expose last_avail_idx (or avail_idx_cons) in available > >>>>>> ring. So the information is inside guest memory, vhost-user backend can > >>>>>> access it and update it directly. I believe this is some modern NIC did > >>>>>> as well (but index is in MMIO area of course). > >>>>>> > >>>>> Hi Jason, > >>>>> > >>>>> If my understand is correct, it might be not enough to only expose > >>>>> last_avail_idx. > >>>>> Because we would not process descriptors in the same order in which they have > >>>>> been made available sometimes. If so, we can't get correct inflight > >>>>> I/O information > >>>>> from available ring. > >>>> You can get this with the help of the both used ring and last_avail_idx > >>>> I believe. Or maybe you can give us an example? > >>>> > >>> A simple example, we assume ring size is 8: > >>> > >>> 1. guest fill avail ring > >>> > >>> avail ring: 0 1 2 3 4 5 6 7 > >>> used ring: > >>> > >>> 2. 
vhost-user backend complete 4,5,6,7 and fill used ring
> >>>
> >>> avail ring: 0 1 2 3 4 5 6 7
> >>> used ring: 4 5 6 7
> >>>
> >>> 3. guest fill avail ring again
> >>>
> >>> avail ring: 4 5 6 7 4 5 6 7
> >>> used ring: 4 5 6 7
> >>>
> >>> 4. vhost-user backend crash
> >>>
> >>> The inflight descriptors 0, 1, 2, 3 lost.
> >>>
> >>> Thanks,
> >>> Yongji
> >>
> >> Ok, then we can simply forbid increasing the avail_idx in this case?
> >>
> >> Basically, it's a question of whether or not it's better to done it in
> >> the level of virtio instead of vhost. I'm pretty sure if we expose
> >> sufficient information, it could be done without touching vhost-user.
> >> And we won't deal with e.g migration and other cases.
> >>
> > OK, I get your point. That's indeed an alternative way. But this feature seems
> > to be only useful to vhost-user backend.
>
> I admit I could not think of a use case other than vhost-user.
>
> > I'm not sure whether it make sense to
> > touch virtio protocol for this feature.
>
> Some possible advantages:
>
> - Feature could be determined and noticed by user or management layer.
>
> - There's no need to invent ring layout specific protocol to record in
> flight descriptors. E.g if my understanding is correct, for this series
> and for the example above, it still can not work for packed virtqueue
> since descriptor id is not sufficient (descriptor could be overwritten
> by used one). You probably need to have a (partial) copy of descriptor
> ring for this.
>
> - No need to deal with migration, all information was in guest memory.
>
Yes, we have those advantages. But it seems like handling this at the
vhost-user level could be easier to maintain in a production environment.
We can support old guests. And bug fixes will not depend on guest kernel
updates.

Thanks,
Yongji
On 2018/12/12 下午5:18, Yongji Xie wrote:
>>>> Ok, then we can simply forbid increasing the avail_idx in this case?
>>>>
>>>> Basically, it's a question of whether or not it's better to done it in
>>>> the level of virtio instead of vhost. I'm pretty sure if we expose
>>>> sufficient information, it could be done without touching vhost-user.
>>>> And we won't deal with e.g migration and other cases.
>>>>
>>> OK, I get your point. That's indeed an alternative way. But this feature seems
>>> to be only useful to vhost-user backend.
>> I admit I could not think of a use case other than vhost-user.
>>
>>
>>> I'm not sure whether it make sense to
>>> touch virtio protocol for this feature.
>> Some possible advantages:
>>
>> - Feature could be determined and noticed by user or management layer.
>>
>> - There's no need to invent ring layout specific protocol to record in
>> flight descriptors. E.g if my understanding is correct, for this series
>> and for the example above, it still can not work for packed virtqueue
>> since descriptor id is not sufficient (descriptor could be overwritten
>> by used one). You probably need to have a (partial) copy of descriptor
>> ring for this.
>>
>> - No need to deal with migration, all information was in guest memory.
>>
> Yes, we have those advantages. But seems like handle this in vhost-user
> level could be easier to be maintained in production environment. We can
> support old guest. And the bug fix will not depend on guest kernel updating.

Yes. But my main concern is the layout-specific data structure. If it
could be done through a generic structure (can it?), it would be fine.
Otherwise, I believe we don't want another negotiation about what kind of
layout the backend supports for reconnect.

Thanks

>
> Thanks,
> Yongji
On Thu, 13 Dec 2018 at 10:58, Jason Wang <jasowang@redhat.com> wrote: > > > On 2018/12/12 下午5:18, Yongji Xie wrote: > >>>> Ok, then we can simply forbid increasing the avail_idx in this case? > >>>> > >>>> Basically, it's a question of whether or not it's better to done it in > >>>> the level of virtio instead of vhost. I'm pretty sure if we expose > >>>> sufficient information, it could be done without touching vhost-user. > >>>> And we won't deal with e.g migration and other cases. > >>>> > >>> OK, I get your point. That's indeed an alternative way. But this feature seems > >>> to be only useful to vhost-user backend. > >> I admit I could not think of a use case other than vhost-user. > >> > >> > >>> I'm not sure whether it make sense to > >>> touch virtio protocol for this feature. > >> Some possible advantages: > >> > >> - Feature could be determined and noticed by user or management layer. > >> > >> - There's no need to invent ring layout specific protocol to record in > >> flight descriptors. E.g if my understanding is correct, for this series > >> and for the example above, it still can not work for packed virtqueue > >> since descriptor id is not sufficient (descriptor could be overwritten > >> by used one). You probably need to have a (partial) copy of descriptor > >> ring for this. > >> > >> - No need to deal with migration, all information was in guest memory. > >> > > Yes, we have those advantages. But seems like handle this in vhost-user > > level could be easier to be maintained in production environment. We can > > support old guest. And the bug fix will not depend on guest kernel updating. > > > Yes. But the my main concern is the layout specific data structure. If > it could be done through a generic structure (can it?), it would be > fine. Otherwise, I believe we don't want another negotiation about what > kind of layout that backend support for reconnect. 
>
Yes, the current layout in shared memory doesn't support packed
virtqueues, because the information of a descriptor in the descriptor
ring will no longer be available once the device fetches it.

I also thought about a generic structure before, but I failed... So I
tried another way to achieve that in this series. On the QEMU side, we
just provide shared memory to the backend and don't define anything for
this memory. On the backend side, it should know how to use that memory
to record inflight I/O no matter what kind of virtqueue it uses. Thus,
if we update the virtqueue for a new virtio spec in the future, we don't
need to touch QEMU or the guest. What do you think about it?

Thanks,
Yongji
On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote: > From: Xie Yongji <xieyongji@baidu.com> > > This patchset is aimed at supporting qemu to reconnect > vhost-user-blk backend after vhost-user-blk backend crash or > restart. > > The patch 1 tries to implenment the sync connection for > "reconnect socket". > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT > to support offering shared memory to backend to record > its inflight I/O. > > The patch 3,4 are the corresponding libvhost-user patches of > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT. > > The patch 5 supports vhost-user-blk to reconnect backend when > connection closed. > > The patch 6 tells qemu that we support reconnecting now. > > To use it, we could start qemu with: > > qemu-system-x86_64 \ > -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \ > -device vhost-user-blk-pci,chardev=char0 \ > > and start vhost-user-blk backend with: > > vhost-user-blk -b /path/file -s /path/vhost.socket > > Then we can restart vhost-user-blk at any time during VM running. > > Xie Yongji (6): > char-socket: Enable "wait" option for client mode > vhost-user: Add shared memory to record inflight I/O > libvhost-user: Introduce vu_queue_map_desc() > libvhost-user: Support recording inflight I/O in shared memory > vhost-user-blk: Add support for reconnecting backend > contrib/vhost-user-blk: enable inflight I/O recording What is missing in all this is documentation. Specifically docs/interop/vhost-user.txt. At a high level the design is IMO a good one. However I would prefer reading the protocol first before the code. 
So here's what I managed to figure out, and it matches how I imagined it
would work when I was still thinking about out of order for net:

- backend allocates memory to keep its stuff around
- sends it to qemu so it can maintain it
- gets it back on reconnect

Format, size, etc. are all up to the backend; a good implementation would
probably implement some kind of versioning.

Is this what this implements?

> chardev/char-socket.c | 5 +-
> contrib/libvhost-user/libvhost-user.c | 215 ++++++++++++++++++++----
> contrib/libvhost-user/libvhost-user.h | 19 +++
> contrib/vhost-user-blk/vhost-user-blk.c | 3 +-
> hw/block/vhost-user-blk.c | 169 +++++++++++++++++--
> hw/virtio/vhost-user.c | 69 ++++++++
> hw/virtio/vhost.c | 8 +
> include/hw/virtio/vhost-backend.h | 4 +
> include/hw/virtio/vhost-user-blk.h | 4 +
> include/hw/virtio/vhost-user.h | 8 +
> 10 files changed, 452 insertions(+), 52 deletions(-)
>
> --
> 2.17.1
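The allocate/save/restore lifecycle described above could look roughly like the following sketch. The field names, the header layout, and the version handshake are all hypothetical assumptions for illustration; nothing here is taken from the series:

```python
# Hypothetical sketch of the inflight-region lifecycle: the backend owns
# the format; QEMU only stores the opaque blob and hands it back after a
# reconnect. The header layout and validation rules are assumptions.
import struct

HDR = struct.Struct("<HHI")     # version, queue size, payload length

def backend_alloc_region(queue_size, version=1):
    """Backend: allocate an inflight region; here one byte per descriptor head."""
    payload = bytes(queue_size)
    return HDR.pack(version, queue_size, len(payload)) + payload

class Qemu:
    """QEMU side: keeps the opaque blob alive across backend restarts."""
    def __init__(self):
        self.saved = None
    def set_vring_inflight(self, blob):   # e.g. on VHOST_USER_SET_VRING_INFLIGHT
        self.saved = blob
    def get_vring_inflight(self):
        return self.saved

def backend_restore(blob, queue_size, version=1):
    """Backend after restart: validate the old region before trusting it."""
    v, qs, n = HDR.unpack_from(blob)
    if v != version or qs != queue_size:
        # layout mismatch: discard the stale region and start clean
        return backend_alloc_region(queue_size, version)
    return blob

qemu = Qemu()
qemu.set_vring_inflight(backend_alloc_region(128))
restored = backend_restore(qemu.get_vring_inflight(), 128)
print(len(restored))   # header (8 bytes) + 128-byte payload
```

The key property is that QEMU never interprets the payload, so the backend can change its recording scheme by bumping the version without touching QEMU.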
On Thu, Dec 13, 2018 at 11:41:06AM +0800, Yongji Xie wrote: > On Thu, 13 Dec 2018 at 10:58, Jason Wang <jasowang@redhat.com> wrote: > > > > > > On 2018/12/12 下午5:18, Yongji Xie wrote: > > >>>> Ok, then we can simply forbid increasing the avail_idx in this case? > > >>>> > > >>>> Basically, it's a question of whether or not it's better to done it in > > >>>> the level of virtio instead of vhost. I'm pretty sure if we expose > > >>>> sufficient information, it could be done without touching vhost-user. > > >>>> And we won't deal with e.g migration and other cases. > > >>>> > > >>> OK, I get your point. That's indeed an alternative way. But this feature seems > > >>> to be only useful to vhost-user backend. > > >> I admit I could not think of a use case other than vhost-user. > > >> > > >> > > >>> I'm not sure whether it make sense to > > >>> touch virtio protocol for this feature. > > >> Some possible advantages: > > >> > > >> - Feature could be determined and noticed by user or management layer. > > >> > > >> - There's no need to invent ring layout specific protocol to record in > > >> flight descriptors. E.g if my understanding is correct, for this series > > >> and for the example above, it still can not work for packed virtqueue > > >> since descriptor id is not sufficient (descriptor could be overwritten > > >> by used one). You probably need to have a (partial) copy of descriptor > > >> ring for this. > > >> > > >> - No need to deal with migration, all information was in guest memory. > > >> > > > Yes, we have those advantages. But seems like handle this in vhost-user > > > level could be easier to be maintained in production environment. We can > > > support old guest. And the bug fix will not depend on guest kernel updating. > > > > > > Yes. But the my main concern is the layout specific data structure. If > > it could be done through a generic structure (can it?), it would be > > fine. 
Otherwise, I believe we don't want another negotiation about what > > kind of layout that backend support for reconnect. > > > > Yes, the current layout in shared memory didn't support packed virtqueue because > the information of one descriptor in descriptor ring will not be > available once device fetch it. > > I also thought about a generic structure before. But I failed... So I > tried another way > to acheive that in this series. In QEMU side, we just provide a shared > memory to backend > and we didn't define anything for this memory. In backend side, they > should know how to > use those memory to record inflight I/O no matter what kind of > virtqueue they used. > Thus, If we updates virtqueue for new virtio spec in the feature, we > don't need to touch > QEMU and guest. What do you think about it? > > Thanks, > Yongji I think that's a good direction to take, yes. Backends need to be very careful about the layout, with versioning etc.
On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote: > > From: Xie Yongji <xieyongji@baidu.com> > > > > This patchset is aimed at supporting qemu to reconnect > > vhost-user-blk backend after vhost-user-blk backend crash or > > restart. > > > > The patch 1 tries to implenment the sync connection for > > "reconnect socket". > > > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT > > to support offering shared memory to backend to record > > its inflight I/O. > > > > The patch 3,4 are the corresponding libvhost-user patches of > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT. > > > > The patch 5 supports vhost-user-blk to reconnect backend when > > connection closed. > > > > The patch 6 tells qemu that we support reconnecting now. > > > > To use it, we could start qemu with: > > > > qemu-system-x86_64 \ > > -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \ > > -device vhost-user-blk-pci,chardev=char0 \ > > > > and start vhost-user-blk backend with: > > > > vhost-user-blk -b /path/file -s /path/vhost.socket > > > > Then we can restart vhost-user-blk at any time during VM running. > > > > Xie Yongji (6): > > char-socket: Enable "wait" option for client mode > > vhost-user: Add shared memory to record inflight I/O > > libvhost-user: Introduce vu_queue_map_desc() > > libvhost-user: Support recording inflight I/O in shared memory > > vhost-user-blk: Add support for reconnecting backend > > contrib/vhost-user-blk: enable inflight I/O recording > > What is missing in all this is documentation. > Specifically docs/interop/vhost-user.txt. > > At a high level the design is IMO a good one. > > However I would prefer reading the protocol first before > the code. 
>
> So here's what I managed to figure out, and it matches
> how I imagined it would work when I was still
> thinking about out of order for net:
>
> - backend allocates memory to keep its stuff around
> - sends it to qemu so it can maintain it
> - gets it back on reconnect
>
> format and size etc are all up to the backend,
> a good implementation would probably implement some
> kind of versioning.
>
> Is this what this implements?
>

Definitely, yes. And the comments look good to me. QEMU gets the size
and version from the backend, then allocates memory and sends it back
with the version. The backend knows how to use the memory according to
the version. If we do that, should we allocate the memory per device
rather than per virtqueue?

Thanks,
Yongji
On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote: > On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote: > > > From: Xie Yongji <xieyongji@baidu.com> > > > > > > This patchset is aimed at supporting qemu to reconnect > > > vhost-user-blk backend after vhost-user-blk backend crash or > > > restart. > > > > > > The patch 1 tries to implenment the sync connection for > > > "reconnect socket". > > > > > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT > > > to support offering shared memory to backend to record > > > its inflight I/O. > > > > > > The patch 3,4 are the corresponding libvhost-user patches of > > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT. > > > > > > The patch 5 supports vhost-user-blk to reconnect backend when > > > connection closed. > > > > > > The patch 6 tells qemu that we support reconnecting now. > > > > > > To use it, we could start qemu with: > > > > > > qemu-system-x86_64 \ > > > -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \ > > > -device vhost-user-blk-pci,chardev=char0 \ > > > > > > and start vhost-user-blk backend with: > > > > > > vhost-user-blk -b /path/file -s /path/vhost.socket > > > > > > Then we can restart vhost-user-blk at any time during VM running. > > > > > > Xie Yongji (6): > > > char-socket: Enable "wait" option for client mode > > > vhost-user: Add shared memory to record inflight I/O > > > libvhost-user: Introduce vu_queue_map_desc() > > > libvhost-user: Support recording inflight I/O in shared memory > > > vhost-user-blk: Add support for reconnecting backend > > > contrib/vhost-user-blk: enable inflight I/O recording > > > > What is missing in all this is documentation. > > Specifically docs/interop/vhost-user.txt. > > > > At a high level the design is IMO a good one. > > > > However I would prefer reading the protocol first before > > the code. 
> > > > So here's what I managed to figure out, and it matches > > how I imagined it would work when I was still > > thinking about out of order for net: > > > > - backend allocates memory to keep its stuff around > > - sends it to qemu so it can maintain it > > - gets it back on reconnect > > > > format and size etc are all up to the backend, > > a good implementation would probably implement some > > kind of versioning. > > > > Is this what this implements? > > > > Definitely, yes. And the comments looks good to me. Qemu get size and > version from backend, then allocate memory and send it back with > version. Backend knows how to use the memory according to the version. > If we do that, should we allocate the memory per device rather than > per virtqueue? > > Thanks, > Yongji It's up to you. Maybe both.
On Fri, 14 Dec 2018 at 10:20, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote: > > On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote: > > > > From: Xie Yongji <xieyongji@baidu.com> > > > > > > > > This patchset is aimed at supporting qemu to reconnect > > > > vhost-user-blk backend after vhost-user-blk backend crash or > > > > restart. > > > > > > > > The patch 1 tries to implenment the sync connection for > > > > "reconnect socket". > > > > > > > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT > > > > to support offering shared memory to backend to record > > > > its inflight I/O. > > > > > > > > The patch 3,4 are the corresponding libvhost-user patches of > > > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT. > > > > > > > > The patch 5 supports vhost-user-blk to reconnect backend when > > > > connection closed. > > > > > > > > The patch 6 tells qemu that we support reconnecting now. > > > > > > > > To use it, we could start qemu with: > > > > > > > > qemu-system-x86_64 \ > > > > -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \ > > > > -device vhost-user-blk-pci,chardev=char0 \ > > > > > > > > and start vhost-user-blk backend with: > > > > > > > > vhost-user-blk -b /path/file -s /path/vhost.socket > > > > > > > > Then we can restart vhost-user-blk at any time during VM running. > > > > > > > > Xie Yongji (6): > > > > char-socket: Enable "wait" option for client mode > > > > vhost-user: Add shared memory to record inflight I/O > > > > libvhost-user: Introduce vu_queue_map_desc() > > > > libvhost-user: Support recording inflight I/O in shared memory > > > > vhost-user-blk: Add support for reconnecting backend > > > > contrib/vhost-user-blk: enable inflight I/O recording > > > > > > What is missing in all this is documentation. 
> > > Specifically docs/interop/vhost-user.txt.
> > >
> > > At a high level the design is IMO a good one.
> > >
> > > However I would prefer reading the protocol first before
> > > the code.
> > >
> > > So here's what I managed to figure out, and it matches
> > > how I imagined it would work when I was still
> > > thinking about out of order for net:
> > >
> > > - backend allocates memory to keep its stuff around
> > > - sends it to qemu so it can maintain it
> > > - gets it back on reconnect
> > >
> > > format and size etc are all up to the backend,
> > > a good implementation would probably implement some
> > > kind of versioning.
> > >
> > > Is this what this implements?
> > >
> >
> > Definitely, yes. And the comments looks good to me. Qemu get size and
> > version from backend, then allocate memory and send it back with
> > version. Backend knows how to use the memory according to the version.
> > If we do that, should we allocate the memory per device rather than
> > per virtqueue?
> >
> > Thanks,
> > Yongji
>
> It's up to you. Maybe both.
>

OK. I think I may still keep it at the virtqueue level in v2. Thank you.

Thanks,
Yongji
On 2018/12/13 10:56 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 13, 2018 at 11:41:06AM +0800, Yongji Xie wrote:
>> On Thu, 13 Dec 2018 at 10:58, Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> On 2018/12/12 5:18 PM, Yongji Xie wrote:
>>>>>>> Ok, then we can simply forbid increasing the avail_idx in this case?
>>>>>>>
>>>>>>> Basically, it's a question of whether or not it's better to done it in
>>>>>>> the level of virtio instead of vhost. I'm pretty sure if we expose
>>>>>>> sufficient information, it could be done without touching vhost-user.
>>>>>>> And we won't deal with e.g migration and other cases.
>>>>>>>
>>>>>> OK, I get your point. That's indeed an alternative way. But this feature seems
>>>>>> to be only useful to vhost-user backend.
>>>>> I admit I could not think of a use case other than vhost-user.
>>>>>
>>>>>
>>>>>> I'm not sure whether it make sense to
>>>>>> touch virtio protocol for this feature.
>>>>> Some possible advantages:
>>>>>
>>>>> - Feature could be determined and noticed by user or management layer.
>>>>>
>>>>> - There's no need to invent ring layout specific protocol to record in
>>>>> flight descriptors. E.g if my understanding is correct, for this series
>>>>> and for the example above, it still can not work for packed virtqueue
>>>>> since descriptor id is not sufficient (descriptor could be overwritten
>>>>> by used one). You probably need to have a (partial) copy of descriptor
>>>>> ring for this.
>>>>>
>>>>> - No need to deal with migration, all information was in guest memory.
>>>>>
>>>> Yes, we have those advantages. But seems like handle this in vhost-user
>>>> level could be easier to be maintained in production environment. We can
>>>> support old guest. And the bug fix will not depend on guest kernel updating.
>>>
>>> Yes. But the my main concern is the layout specific data structure. If
>>> it could be done through a generic structure (can it?), it would be
>>> fine. Otherwise, I believe we don't want another negotiation about what
>>> kind of layout that backend support for reconnect.
>>>
>> Yes, the current layout in shared memory didn't support packed virtqueue because
>> the information of one descriptor in descriptor ring will not be
>> available once device fetch it.
>>
>> I also thought about a generic structure before. But I failed... So I
>> tried another way to acheive that in this series. In QEMU side, we just
>> provide a shared memory to backend and we didn't define anything for
>> this memory. In backend side, they should know how to use those memory
>> to record inflight I/O no matter what kind of virtqueue they used.
>> Thus, if we update the virtqueue for a new virtio spec in the future,
>> we don't need to touch QEMU and guest. What do you think about it?
>>
>> Thanks,
>> Yongji
> I think that's a good direction to take, yes.
> Backends need to be very careful about the layout,
> with versioning etc.

I'm not sure this could be done 100% transparently to QEMU. E.g. you
need to deal with reset, I think, and you need to carefully choose the
size of the region. This means you need to negotiate the size and
layout with the backend, and need to deal with migration for them.
This is another sin of splitting the virtio dataplane from QEMU anyway.

Thanks
On Fri, Dec 14, 2018 at 12:36:01PM +0800, Jason Wang wrote: > > On 2018/12/13 下午10:56, Michael S. Tsirkin wrote: > > On Thu, Dec 13, 2018 at 11:41:06AM +0800, Yongji Xie wrote: > > > On Thu, 13 Dec 2018 at 10:58, Jason Wang <jasowang@redhat.com> wrote: > > > > > > > > On 2018/12/12 下午5:18, Yongji Xie wrote: > > > > > > > > Ok, then we can simply forbid increasing the avail_idx in this case? > > > > > > > > > > > > > > > > Basically, it's a question of whether or not it's better to done it in > > > > > > > > the level of virtio instead of vhost. I'm pretty sure if we expose > > > > > > > > sufficient information, it could be done without touching vhost-user. > > > > > > > > And we won't deal with e.g migration and other cases. > > > > > > > > > > > > > > > OK, I get your point. That's indeed an alternative way. But this feature seems > > > > > > > to be only useful to vhost-user backend. > > > > > > I admit I could not think of a use case other than vhost-user. > > > > > > > > > > > > > > > > > > > I'm not sure whether it make sense to > > > > > > > touch virtio protocol for this feature. > > > > > > Some possible advantages: > > > > > > > > > > > > - Feature could be determined and noticed by user or management layer. > > > > > > > > > > > > - There's no need to invent ring layout specific protocol to record in > > > > > > flight descriptors. E.g if my understanding is correct, for this series > > > > > > and for the example above, it still can not work for packed virtqueue > > > > > > since descriptor id is not sufficient (descriptor could be overwritten > > > > > > by used one). You probably need to have a (partial) copy of descriptor > > > > > > ring for this. > > > > > > > > > > > > - No need to deal with migration, all information was in guest memory. > > > > > > > > > > > Yes, we have those advantages. But seems like handle this in vhost-user > > > > > level could be easier to be maintained in production environment. We can > > > > > support old guest. 
And the bug fix will not depend on guest kernel updating.
> > > >
> > > > Yes. But the my main concern is the layout specific data structure. If
> > > > it could be done through a generic structure (can it?), it would be
> > > > fine. Otherwise, I believe we don't want another negotiation about what
> > > > kind of layout that backend support for reconnect.
> > > >
> > > Yes, the current layout in shared memory didn't support packed virtqueue because
> > > the information of one descriptor in descriptor ring will not be
> > > available once device fetch it.
> > >
> > > I also thought about a generic structure before. But I failed... So I
> > > tried another way to acheive that in this series. In QEMU side, we just
> > > provide a shared memory to backend and we didn't define anything for
> > > this memory. In backend side, they should know how to use those memory
> > > to record inflight I/O no matter what kind of virtqueue they used.
> > > Thus, if we update the virtqueue for a new virtio spec in the future,
> > > we don't need to touch QEMU and guest. What do you think about it?
> > >
> > > Thanks,
> > > Yongji
> > I think that's a good direction to take, yes.
> > Backends need to be very careful about the layout,
> > with versioning etc.
>
> I'm not sure this could be done 100% transparent to qemu. E.g you need to
> deal with reset I think and you need to carefully choose the size of the
> region. Which means you need negotiate the size, layout through backend.

I am not sure I follow. The point is all this state is internal to the
backend. QEMU does not care at all - it just helps a little by hanging
on to it.

> And
> need to deal with migration with them.

Good catch. There definitely is an issue in that you cannot migrate
while the backend is disconnected: migration needs to flush the backend
and we can't do that when it's disconnected. This needs to be addressed.
I think it's cleanest to just defer migration until the backend does
reconnect.
Backend cross-version migration is all messed up in vhost-user, I agree.
There was a plan to fix it that was never executed, unfortunately.

Maxime, do you still plan to look into it?

> This is another sin of splitting
> virtio dataplane from qemu anyway.
>
> Thanks

It wasn't split as such - DPDK was never a part of QEMU. We just
enabled it without FUSE hacks.
On Fri, Dec 14, 2018 at 10:33:54AM +0800, Yongji Xie wrote: > On Fri, 14 Dec 2018 at 10:20, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote: > > > On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote: > > > > > From: Xie Yongji <xieyongji@baidu.com> > > > > > > > > > > This patchset is aimed at supporting qemu to reconnect > > > > > vhost-user-blk backend after vhost-user-blk backend crash or > > > > > restart. > > > > > > > > > > The patch 1 tries to implenment the sync connection for > > > > > "reconnect socket". > > > > > > > > > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT > > > > > to support offering shared memory to backend to record > > > > > its inflight I/O. > > > > > > > > > > The patch 3,4 are the corresponding libvhost-user patches of > > > > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT. > > > > > > > > > > The patch 5 supports vhost-user-blk to reconnect backend when > > > > > connection closed. > > > > > > > > > > The patch 6 tells qemu that we support reconnecting now. > > > > > > > > > > To use it, we could start qemu with: > > > > > > > > > > qemu-system-x86_64 \ > > > > > -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \ > > > > > -device vhost-user-blk-pci,chardev=char0 \ > > > > > > > > > > and start vhost-user-blk backend with: > > > > > > > > > > vhost-user-blk -b /path/file -s /path/vhost.socket > > > > > > > > > > Then we can restart vhost-user-blk at any time during VM running. 
> > > > > > > > > > Xie Yongji (6): > > > > > char-socket: Enable "wait" option for client mode > > > > > vhost-user: Add shared memory to record inflight I/O > > > > > libvhost-user: Introduce vu_queue_map_desc() > > > > > libvhost-user: Support recording inflight I/O in shared memory > > > > > vhost-user-blk: Add support for reconnecting backend > > > > > contrib/vhost-user-blk: enable inflight I/O recording > > > > > > > > What is missing in all this is documentation. > > > > Specifically docs/interop/vhost-user.txt. > > > > > > > > At a high level the design is IMO a good one. > > > > > > > > However I would prefer reading the protocol first before > > > > the code. > > > > > > > > So here's what I managed to figure out, and it matches > > > > how I imagined it would work when I was still > > > > thinking about out of order for net: > > > > > > > > - backend allocates memory to keep its stuff around > > > > - sends it to qemu so it can maintain it > > > > - gets it back on reconnect > > > > > > > > format and size etc are all up to the backend, > > > > a good implementation would probably implement some > > > > kind of versioning. > > > > > > > > Is this what this implements? > > > > > > > > > > Definitely, yes. And the comments looks good to me. Qemu get size and > > > version from backend, then allocate memory and send it back with > > > version. Backend knows how to use the memory according to the version. > > > If we do that, should we allocate the memory per device rather than > > > per virtqueue? > > > > > > Thanks, > > > Yongji > > > > It's up to you. Maybe both. > > > > OK. I think I may still keep it in virtqueue level in v2. Thank you. > > Thanks, > Yongji I'd actually add options for both, and backend can set size 0 if it wants to.
On Sat, 15 Dec 2018 at 05:23, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Dec 14, 2018 at 10:33:54AM +0800, Yongji Xie wrote: > > On Fri, 14 Dec 2018 at 10:20, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote: > > > > On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohimes@gmail.com wrote: > > > > > > From: Xie Yongji <xieyongji@baidu.com> > > > > > > > > > > > > This patchset is aimed at supporting qemu to reconnect > > > > > > vhost-user-blk backend after vhost-user-blk backend crash or > > > > > > restart. > > > > > > > > > > > > The patch 1 tries to implenment the sync connection for > > > > > > "reconnect socket". > > > > > > > > > > > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT > > > > > > to support offering shared memory to backend to record > > > > > > its inflight I/O. > > > > > > > > > > > > The patch 3,4 are the corresponding libvhost-user patches of > > > > > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT. > > > > > > > > > > > > The patch 5 supports vhost-user-blk to reconnect backend when > > > > > > connection closed. > > > > > > > > > > > > The patch 6 tells qemu that we support reconnecting now. > > > > > > > > > > > > To use it, we could start qemu with: > > > > > > > > > > > > qemu-system-x86_64 \ > > > > > > -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \ > > > > > > -device vhost-user-blk-pci,chardev=char0 \ > > > > > > > > > > > > and start vhost-user-blk backend with: > > > > > > > > > > > > vhost-user-blk -b /path/file -s /path/vhost.socket > > > > > > > > > > > > Then we can restart vhost-user-blk at any time during VM running. 
> > > > > > > > > > > > Xie Yongji (6): > > > > > > char-socket: Enable "wait" option for client mode > > > > > > vhost-user: Add shared memory to record inflight I/O > > > > > > libvhost-user: Introduce vu_queue_map_desc() > > > > > > libvhost-user: Support recording inflight I/O in shared memory > > > > > > vhost-user-blk: Add support for reconnecting backend > > > > > > contrib/vhost-user-blk: enable inflight I/O recording > > > > > > > > > > What is missing in all this is documentation. > > > > > Specifically docs/interop/vhost-user.txt. > > > > > > > > > > At a high level the design is IMO a good one. > > > > > > > > > > However I would prefer reading the protocol first before > > > > > the code. > > > > > > > > > > So here's what I managed to figure out, and it matches > > > > > how I imagined it would work when I was still > > > > > thinking about out of order for net: > > > > > > > > > > - backend allocates memory to keep its stuff around > > > > > - sends it to qemu so it can maintain it > > > > > - gets it back on reconnect > > > > > > > > > > format and size etc are all up to the backend, > > > > > a good implementation would probably implement some > > > > > kind of versioning. > > > > > > > > > > Is this what this implements? > > > > > > > > > > > > > Definitely, yes. And the comments looks good to me. Qemu get size and > > > > version from backend, then allocate memory and send it back with > > > > version. Backend knows how to use the memory according to the version. > > > > If we do that, should we allocate the memory per device rather than > > > > per virtqueue? > > > > > > > > Thanks, > > > > Yongji > > > > > > It's up to you. Maybe both. > > > > > > > OK. I think I may still keep it in virtqueue level in v2. Thank you. > > > > Thanks, > > Yongji > > I'd actually add options for both, and backend can set size 0 if it > wants to. > OK, let me try it in v2. Thanks, Yongji
From: Xie Yongji <xieyongji@baidu.com>

This patchset aims to let QEMU reconnect to the vhost-user-blk backend
after the backend crashes or restarts.

Patch 1 implements the sync connection for "reconnect socket".

Patch 2 introduces a new message, VHOST_USER_SET_VRING_INFLIGHT, to
support offering shared memory to the backend to record its inflight
I/O.

Patches 3 and 4 are the corresponding libvhost-user patches for patch
2: they make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.

Patch 5 supports vhost-user-blk reconnecting to the backend when the
connection is closed.

Patch 6 tells QEMU that we support reconnecting now.

To use it, we could start QEMU with:

    qemu-system-x86_64 \
        -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
        -device vhost-user-blk-pci,chardev=char0 \

and start the vhost-user-blk backend with:

    vhost-user-blk -b /path/file -s /path/vhost.socket

Then we can restart vhost-user-blk at any time while the VM is running.

Xie Yongji (6):
  char-socket: Enable "wait" option for client mode
  vhost-user: Add shared memory to record inflight I/O
  libvhost-user: Introduce vu_queue_map_desc()
  libvhost-user: Support recording inflight I/O in shared memory
  vhost-user-blk: Add support for reconnecting backend
  contrib/vhost-user-blk: enable inflight I/O recording

 chardev/char-socket.c                   |   5 +-
 contrib/libvhost-user/libvhost-user.c   | 215 ++++++++++++++++++++----
 contrib/libvhost-user/libvhost-user.h   |  19 +++
 contrib/vhost-user-blk/vhost-user-blk.c |   3 +-
 hw/block/vhost-user-blk.c               | 169 +++++++++++++++++--
 hw/virtio/vhost-user.c                  |  69 ++++++++
 hw/virtio/vhost.c                       |   8 +
 include/hw/virtio/vhost-backend.h       |   4 +
 include/hw/virtio/vhost-user-blk.h      |   4 +
 include/hw/virtio/vhost-user.h          |   8 +
 10 files changed, 452 insertions(+), 52 deletions(-)