Message ID | 20230115170903.3416105-1-antonkuchin@yandex-team.ru (mailing list archive) |
---|---|
State | New, archived |
Series | vhost-user-fs: add capability to allow migration |
On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > > Now any vhost-user-fs device makes VM unmigratable, that also prevents > qemu update without stopping the VM. In most cases that makes sense > because qemu has no way to transfer FUSE session state. > > But we can give an option to orchestrator to override this if it can > guarantee that state will be preserved (e.g. it uses migration to > update qemu and dst will run on the same host as src and use the same > socket endpoints). > > This patch keeps default behavior that prevents migration with such devices > but adds migration capability 'vhost-user-fs' to explicitly allow migration. > > Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> > --- > hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > qapi/migration.json | 7 ++++++- > 2 files changed, 30 insertions(+), 2 deletions(-) > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > index f5049735ac..13d920423e 100644 > --- a/hw/virtio/vhost-user-fs.c > +++ b/hw/virtio/vhost-user-fs.c > @@ -24,6 +24,7 @@ > #include "hw/virtio/vhost-user-fs.h" > #include "monitor/monitor.h" > #include "sysemu/sysemu.h" > +#include "migration/migration.h" > > static const int user_feature_bits[] = { > VIRTIO_F_VERSION_1, > @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) > return &fs->vhost_dev; > } > > +static int vhost_user_fs_pre_save(void *opaque) > +{ > + MigrationState *s = migrate_get_current(); > + > + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { > + error_report("Migration of vhost-user-fs devices requires internal FUSE " > + "state of backend to be preserved. If orchestrator can " > + "guarantee this (e.g. dst connects to the same backend " > + "instance or backend state is migrated) set 'vhost-user-fs' " > + "migration capability to true to enable migration."); > + return -1; > + } > + > + return 0; > +} > + > static const VMStateDescription vuf_vmstate = { > .name = "vhost-user-fs", > - .unmigratable = 1, > + .minimum_version_id = 0, > + .version_id = 0, > + .fields = (VMStateField[]) { > + VMSTATE_VIRTIO_DEVICE, > + VMSTATE_END_OF_LIST() > + }, > + .pre_save = vhost_user_fs_pre_save, > }; Will it be possible to extend this vmstate when virtiofsd adds support for stateful migration without breaking migration compatibility? If not, then I think a marker field should be added to the vmstate: 0 - stateless/reconnect migration (the approach you're adding in this patch) 1 - stateful migration (future virtiofsd feature) When the field is 0 there are no further vmstate fields and we trust that the destination vhost-user-fs server already has the necessary state. When the field is 1 there are additional vmstate fields that contain the virtiofsd state. The goal is for QEMU to support 3 migration modes, depending on the vhost-user-fs server: 1. No migration support. 2. Stateless migration. 3. Stateful migration. > > static Property vuf_properties[] = { > diff --git a/qapi/migration.json b/qapi/migration.json > index 88ecf86ac8..9a229ea884 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -477,6 +477,11 @@ > # will be handled faster. This is a performance feature and > # should not affect the correctness of postcopy migration. > # (since 7.1) > +# @vhost-user-fs: If enabled, the migration process will allow migration of > +# vhost-user-fs devices, this should be enabled only when > +# backend can preserve local FUSE state e.g. 
for qemu update > +# when dst reconects to the same endpoints after migration. > +# (since 8.0) This is global but a guest can have multiple vhost-user-fs devices connected to different servers. I would add a qdev property to the device instead of introducing a migration capability. The property would enable "stateless migration". When the property is not set, migration would be prohibited. > # > # Features: > # @unstable: Members @x-colo and @x-ignore-shared are experimental. > @@ -492,7 +497,7 @@ > 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', > { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, > 'validate-uuid', 'background-snapshot', > - 'zero-copy-send', 'postcopy-preempt'] } > + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } > > ## > # @MigrationCapabilityStatus: > -- > 2.34.1 > >
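(A minimal sketch of the marker-field idea suggested above, assuming a hypothetical migration_type member added to the VHostUserFS struct from the quoted file; the names, values and field ordering are illustrative only and are not part of the posted patch or of virtiofsd.)

```c
/* Hypothetical marker distinguishing migration modes (illustration only). */
enum {
    VUF_MIGRATION_STATELESS = 0, /* trust the reconnected backend's state */
    VUF_MIGRATION_STATEFUL  = 1, /* vmstate would carry virtiofsd state (future) */
};

static const VMStateDescription vuf_vmstate_with_marker = {
    .name = "vhost-user-fs",
    .version_id = 0,
    .minimum_version_id = 0,
    .fields = (VMStateField[]) {
        /* assumed: a uint8_t migration_type member added to VHostUserFS */
        VMSTATE_UINT8(migration_type, VHostUserFS),
        VMSTATE_VIRTIO_DEVICE,
        VMSTATE_END_OF_LIST()
    },
    .pre_save = vhost_user_fs_pre_save,
};
```

With such a marker, a destination that reads 0 would expect no further device-specific state and trust the reconnected backend, while 1 would announce that virtiofsd state follows in the stream.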
On 18/01/2023 17:52, Stefan Hajnoczi wrote: > On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >> Now any vhost-user-fs device makes VM unmigratable, that also prevents >> qemu update without stopping the VM. In most cases that makes sense >> because qemu has no way to transfer FUSE session state. >> >> But we can give an option to orchestrator to override this if it can >> guarantee that state will be preserved (e.g. it uses migration to >> update qemu and dst will run on the same host as src and use the same >> socket endpoints). >> >> This patch keeps default behavior that prevents migration with such devices >> but adds migration capability 'vhost-user-fs' to explicitly allow migration. >> >> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> >> --- >> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- >> qapi/migration.json | 7 ++++++- >> 2 files changed, 30 insertions(+), 2 deletions(-) >> >> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c >> index f5049735ac..13d920423e 100644 >> --- a/hw/virtio/vhost-user-fs.c >> +++ b/hw/virtio/vhost-user-fs.c >> @@ -24,6 +24,7 @@ >> #include "hw/virtio/vhost-user-fs.h" >> #include "monitor/monitor.h" >> #include "sysemu/sysemu.h" >> +#include "migration/migration.h" >> >> static const int user_feature_bits[] = { >> VIRTIO_F_VERSION_1, >> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) >> return &fs->vhost_dev; >> } >> >> +static int vhost_user_fs_pre_save(void *opaque) >> +{ >> + MigrationState *s = migrate_get_current(); >> + >> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { >> + error_report("Migration of vhost-user-fs devices requires internal FUSE " >> + "state of backend to be preserved. If orchestrator can " >> + "guarantee this (e.g. dst connects to the same backend " >> + "instance or backend state is migrated) set 'vhost-user-fs' " >> + "migration capability to true to enable migration."); >> + return -1; >> + } >> + >> + return 0; >> +} >> + >> static const VMStateDescription vuf_vmstate = { >> .name = "vhost-user-fs", >> - .unmigratable = 1, >> + .minimum_version_id = 0, >> + .version_id = 0, >> + .fields = (VMStateField[]) { >> + VMSTATE_VIRTIO_DEVICE, >> + VMSTATE_END_OF_LIST() >> + }, >> + .pre_save = vhost_user_fs_pre_save, >> }; > Will it be possible to extend this vmstate when virtiofsd adds support > for stateful migration without breaking migration compatibility? > > If not, then I think a marker field should be added to the vmstate: > 0 - stateless/reconnect migration (the approach you're adding in this patch) > 1 - stateful migration (future virtiofsd feature) > > When the field is 0 there are no further vmstate fields and we trust > that the destination vhost-user-fs server already has the necessary > state. > > When the field is 1 there are additional vmstate fields that contain > the virtiofsd state. > > The goal is for QEMU to support 3 migration modes, depending on the > vhost-user-fs server: > 1. No migration support. > 2. Stateless migration. > 3. Stateful migration. Sure. These vmstate fields are very generic and mandatory for any virtio device. If in future more state can be transfer in migration stream the vmstate can be extended with additional fields. This can be done with new subsections and/or bumping version_id. The main purpose of this patch is to allow update VM to newer version of qemu via local migration without disruption to guest. 
And future versions hopefully could pack more state from external environment to migration stream. > >> static Property vuf_properties[] = { >> diff --git a/qapi/migration.json b/qapi/migration.json >> index 88ecf86ac8..9a229ea884 100644 >> --- a/qapi/migration.json >> +++ b/qapi/migration.json >> @@ -477,6 +477,11 @@ >> # will be handled faster. This is a performance feature and >> # should not affect the correctness of postcopy migration. >> # (since 7.1) >> +# @vhost-user-fs: If enabled, the migration process will allow migration of >> +# vhost-user-fs devices, this should be enabled only when >> +# backend can preserve local FUSE state e.g. for qemu update >> +# when dst reconects to the same endpoints after migration. >> +# (since 8.0) > This is global but a guest can have multiple vhost-user-fs devices > connected to different servers. AFAIK vhost-user requires unix socket and memory shared from guest so devices can't be connected to different servers, just to different endpoints on current host. > > I would add a qdev property to the device instead of introducing a > migration capability. The property would enable "stateless migration". > When the property is not set, migration would be prohibited. I did thought about that, but this is really not a property of device, this is the capability of management software and applies to exactly one particular migration process that it initiates. It should not persist across migration or be otherwise stored in device. The idea here is that orchestrator can ensure destination qemu will run on the same host, will reconnect to the same unix sockets and only then sets the flag (because inside qemu we can't know anything about the destination). This is somewhat similar to ignore-shared migration capability when qemu avoids saving and loading guest memory that is stores in shmem because it will be picked up by destination process right where source left it. >> # >> # Features: >> # @unstable: Members @x-colo and @x-ignore-shared are experimental. >> @@ -492,7 +497,7 @@ >> 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', >> { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, >> 'validate-uuid', 'background-snapshot', >> - 'zero-copy-send', 'postcopy-preempt'] } >> + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } >> >> ## >> # @MigrationCapabilityStatus: >> -- >> 2.34.1 >> >>
On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: > Now any vhost-user-fs device makes VM unmigratable, that also prevents > qemu update without stopping the VM. In most cases that makes sense > because qemu has no way to transfer FUSE session state. > > But we can give an option to orchestrator to override this if it can > guarantee that state will be preserved (e.g. it uses migration to > update qemu and dst will run on the same host as src and use the same > socket endpoints). > > This patch keeps default behavior that prevents migration with such devices > but adds migration capability 'vhost-user-fs' to explicitly allow migration. > > Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> > --- > hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > qapi/migration.json | 7 ++++++- > 2 files changed, 30 insertions(+), 2 deletions(-) > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > index f5049735ac..13d920423e 100644 > --- a/hw/virtio/vhost-user-fs.c > +++ b/hw/virtio/vhost-user-fs.c > @@ -24,6 +24,7 @@ > #include "hw/virtio/vhost-user-fs.h" > #include "monitor/monitor.h" > #include "sysemu/sysemu.h" > +#include "migration/migration.h" > > static const int user_feature_bits[] = { > VIRTIO_F_VERSION_1, > @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) > return &fs->vhost_dev; > } > > +static int vhost_user_fs_pre_save(void *opaque) > +{ > + MigrationState *s = migrate_get_current(); > + > + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { > + error_report("Migration of vhost-user-fs devices requires internal FUSE " > + "state of backend to be preserved. If orchestrator can " > + "guarantee this (e.g. dst connects to the same backend " > + "instance or backend state is migrated) set 'vhost-user-fs' " > + "migration capability to true to enable migration."); Isn't it possible that some backends are same and some are not? Shouldn't this be a device property then? > + return -1; > + } > + > + return 0; > +} > + > static const VMStateDescription vuf_vmstate = { > .name = "vhost-user-fs", > - .unmigratable = 1, > + .minimum_version_id = 0, > + .version_id = 0, > + .fields = (VMStateField[]) { > + VMSTATE_VIRTIO_DEVICE, > + VMSTATE_END_OF_LIST() > + }, > + .pre_save = vhost_user_fs_pre_save, > }; > > static Property vuf_properties[] = { > diff --git a/qapi/migration.json b/qapi/migration.json > index 88ecf86ac8..9a229ea884 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -477,6 +477,11 @@ > # will be handled faster. This is a performance feature and > # should not affect the correctness of postcopy migration. > # (since 7.1) > +# @vhost-user-fs: If enabled, the migration process will allow migration of > +# vhost-user-fs devices, this should be enabled only when > +# backend can preserve local FUSE state e.g. for qemu update > +# when dst reconects to the same endpoints after migration. > +# (since 8.0) > # > # Features: > # @unstable: Members @x-colo and @x-ignore-shared are experimental. > @@ -492,7 +497,7 @@ > 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', > { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, > 'validate-uuid', 'background-snapshot', > - 'zero-copy-send', 'postcopy-preempt'] } > + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } I kind of dislike that it's such a specific flag. Is only vhost-user-fs ever going to be affected? Any way to put it in a way that is more generic? > ## > # @MigrationCapabilityStatus: > -- > 2.34.1
On 19/01/2023 14:51, Michael S. Tsirkin wrote: > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: >> Now any vhost-user-fs device makes VM unmigratable, that also prevents >> qemu update without stopping the VM. In most cases that makes sense >> because qemu has no way to transfer FUSE session state. >> >> But we can give an option to orchestrator to override this if it can >> guarantee that state will be preserved (e.g. it uses migration to >> update qemu and dst will run on the same host as src and use the same >> socket endpoints). >> >> This patch keeps default behavior that prevents migration with such devices >> but adds migration capability 'vhost-user-fs' to explicitly allow migration. >> >> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> >> --- >> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- >> qapi/migration.json | 7 ++++++- >> 2 files changed, 30 insertions(+), 2 deletions(-) >> >> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c >> index f5049735ac..13d920423e 100644 >> --- a/hw/virtio/vhost-user-fs.c >> +++ b/hw/virtio/vhost-user-fs.c >> @@ -24,6 +24,7 @@ >> #include "hw/virtio/vhost-user-fs.h" >> #include "monitor/monitor.h" >> #include "sysemu/sysemu.h" >> +#include "migration/migration.h" >> >> static const int user_feature_bits[] = { >> VIRTIO_F_VERSION_1, >> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) >> return &fs->vhost_dev; >> } >> >> +static int vhost_user_fs_pre_save(void *opaque) >> +{ >> + MigrationState *s = migrate_get_current(); >> + >> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { >> + error_report("Migration of vhost-user-fs devices requires internal FUSE " >> + "state of backend to be preserved. If orchestrator can " >> + "guarantee this (e.g. dst connects to the same backend " >> + "instance or backend state is migrated) set 'vhost-user-fs' " >> + "migration capability to true to enable migration."); > Isn't it possible that some backends are same and some are not? > Shouldn't this be a device property then? If some are not the same it is not guaranteed that correct FUSE state is present there, so orchestrator shouldn't set the capability because this can result in destination devices being broken (they'll be fine after the remount in guest, but this is guest visible and is not acceptable). I can imagine smart orchestrator and backend that can transfer internal FUSE state, but we are not there yet, and this would be their responsibility then to ensure endpoint compatibility between src and dst and set the capability (that's why I put "e.g." and "or" in the error description). > > > >> + return -1; >> + } >> + >> + return 0; >> +} >> + >> static const VMStateDescription vuf_vmstate = { >> .name = "vhost-user-fs", >> - .unmigratable = 1, >> + .minimum_version_id = 0, >> + .version_id = 0, >> + .fields = (VMStateField[]) { >> + VMSTATE_VIRTIO_DEVICE, >> + VMSTATE_END_OF_LIST() >> + }, >> + .pre_save = vhost_user_fs_pre_save, >> }; >> >> static Property vuf_properties[] = { >> diff --git a/qapi/migration.json b/qapi/migration.json >> index 88ecf86ac8..9a229ea884 100644 >> --- a/qapi/migration.json >> +++ b/qapi/migration.json >> @@ -477,6 +477,11 @@ >> # will be handled faster. This is a performance feature and >> # should not affect the correctness of postcopy migration. 
>> # (since 7.1) >> +# @vhost-user-fs: If enabled, the migration process will allow migration of >> +# vhost-user-fs devices, this should be enabled only when >> +# backend can preserve local FUSE state e.g. for qemu update >> +# when dst reconects to the same endpoints after migration. >> +# (since 8.0) >> # >> # Features: >> # @unstable: Members @x-colo and @x-ignore-shared are experimental. >> @@ -492,7 +497,7 @@ >> 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', >> { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, >> 'validate-uuid', 'background-snapshot', >> - 'zero-copy-send', 'postcopy-preempt'] } >> + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } > I kind of dislike that it's such a specific flag. Is only vhost-user-fs > ever going to be affected? Any way to put it in a way that is more generic? Here I agree with you: I would prefer less narrow naming too. But I didn't manage to come up with one. Looks like many other vhost-user devices could benefit from this so maybe "vhost-user-stateless" or something like this would be better. I'm not sure that other types of devices could handle reconnect to the old endpoint as easy as vhost-user-fs, but anyway the support for this flag needs to be implemented for each device individually. What do you think? Any ideas would be appreciated. > > >> ## >> # @MigrationCapabilityStatus: >> -- >> 2.34.1
On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > > On 18/01/2023 17:52, Stefan Hajnoczi wrote: > > On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > >> Now any vhost-user-fs device makes VM unmigratable, that also prevents > >> qemu update without stopping the VM. In most cases that makes sense > >> because qemu has no way to transfer FUSE session state. > >> > >> But we can give an option to orchestrator to override this if it can > >> guarantee that state will be preserved (e.g. it uses migration to > >> update qemu and dst will run on the same host as src and use the same > >> socket endpoints). > >> > >> This patch keeps default behavior that prevents migration with such devices > >> but adds migration capability 'vhost-user-fs' to explicitly allow migration. > >> > >> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> > >> --- > >> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > >> qapi/migration.json | 7 ++++++- > >> 2 files changed, 30 insertions(+), 2 deletions(-) > >> > >> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > >> index f5049735ac..13d920423e 100644 > >> --- a/hw/virtio/vhost-user-fs.c > >> +++ b/hw/virtio/vhost-user-fs.c > >> @@ -24,6 +24,7 @@ > >> #include "hw/virtio/vhost-user-fs.h" > >> #include "monitor/monitor.h" > >> #include "sysemu/sysemu.h" > >> +#include "migration/migration.h" > >> > >> static const int user_feature_bits[] = { > >> VIRTIO_F_VERSION_1, > >> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) > >> return &fs->vhost_dev; > >> } > >> > >> +static int vhost_user_fs_pre_save(void *opaque) > >> +{ > >> + MigrationState *s = migrate_get_current(); > >> + > >> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { > >> + error_report("Migration of vhost-user-fs devices requires internal FUSE " > >> + "state of backend to be preserved. If orchestrator can " > >> + "guarantee this (e.g. dst connects to the same backend " > >> + "instance or backend state is migrated) set 'vhost-user-fs' " > >> + "migration capability to true to enable migration."); > >> + return -1; > >> + } > >> + > >> + return 0; > >> +} > >> + > >> static const VMStateDescription vuf_vmstate = { > >> .name = "vhost-user-fs", > >> - .unmigratable = 1, > >> + .minimum_version_id = 0, > >> + .version_id = 0, > >> + .fields = (VMStateField[]) { > >> + VMSTATE_VIRTIO_DEVICE, > >> + VMSTATE_END_OF_LIST() > >> + }, > >> + .pre_save = vhost_user_fs_pre_save, > >> }; > > Will it be possible to extend this vmstate when virtiofsd adds support > > for stateful migration without breaking migration compatibility? > > > > If not, then I think a marker field should be added to the vmstate: > > 0 - stateless/reconnect migration (the approach you're adding in this patch) > > 1 - stateful migration (future virtiofsd feature) > > > > When the field is 0 there are no further vmstate fields and we trust > > that the destination vhost-user-fs server already has the necessary > > state. > > > > When the field is 1 there are additional vmstate fields that contain > > the virtiofsd state. > > > > The goal is for QEMU to support 3 migration modes, depending on the > > vhost-user-fs server: > > 1. No migration support. > > 2. Stateless migration. > > 3. Stateful migration. > > Sure. These vmstate fields are very generic and mandatory for any > virtio device. If in future more state can be transfer in migration > stream the vmstate can be extended with additional fields. 
This can > be done with new subsections and/or bumping version_id. My concern is that the vmstate introduced in this patch may be unusable when stateful migration is added. So additional compatibility code will need to be introduced to make your stateless migration continue working with extended vmstate. By adding a marker field in this patch it should be possible to continue using the same vmstate for stateless migration without adding extra compatibility code in the future. > The main purpose of this patch is to allow update VM to newer version > of qemu via local migration without disruption to guest. And future > versions hopefully could pack more state from external environment > to migration stream. > > > > > >> static Property vuf_properties[] = { > >> diff --git a/qapi/migration.json b/qapi/migration.json > >> index 88ecf86ac8..9a229ea884 100644 > >> --- a/qapi/migration.json > >> +++ b/qapi/migration.json > >> @@ -477,6 +477,11 @@ > >> # will be handled faster. This is a performance feature and > >> # should not affect the correctness of postcopy migration. > >> # (since 7.1) > >> +# @vhost-user-fs: If enabled, the migration process will allow migration of > >> +# vhost-user-fs devices, this should be enabled only when > >> +# backend can preserve local FUSE state e.g. for qemu update > >> +# when dst reconects to the same endpoints after migration. > >> +# (since 8.0) > > This is global but a guest can have multiple vhost-user-fs devices > > connected to different servers. > AFAIK vhost-user requires unix socket and memory shared from guest so > devices can't be connected to different servers, just to different > endpoints on current host. Different vhost-user-fs server software. vhost-user-fs is a protocol, there can be multiple server implementations. These implementations may have different capabilities (some may not support stateless migration). So I think stateless migration should be a per-instance setting, not global. > > > > > I would add a qdev property to the device instead of introducing a > > migration capability. The property would enable "stateless migration". > > When the property is not set, migration would be prohibited. > I did thought about that, but this is really not a property of device, > this is the capability of management software and applies to exactly one > particular migration process that it initiates. It should not persist > across migration or be otherwise stored in device. I disagree. The vhost-user-fs server software must implement stateless migration in order for this to work. For example, https://gitlab.com/virtio-fs/virtiofsd doesn't support stateless migration as far as I know. > > The idea here is that orchestrator can ensure destination qemu will > run on the same host, will reconnect to the same unix sockets and only > then sets the flag (because inside qemu we can't know anything about > the destination). > This is somewhat similar to ignore-shared migration capability when > qemu avoids saving and loading guest memory that is stores in shmem > because it will be picked up by destination process right where source > left it. > > >> # > >> # Features: > >> # @unstable: Members @x-colo and @x-ignore-shared are experimental. 
> >> @@ -492,7 +497,7 @@ > >> 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', > >> { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, > >> 'validate-uuid', 'background-snapshot', > >> - 'zero-copy-send', 'postcopy-preempt'] } > >> + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } > >> > >> ## > >> # @MigrationCapabilityStatus: > >> -- > >> 2.34.1 > >> > >>
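(To make the per-device alternative discussed above concrete, here is a rough sketch of what a qdev property plus the corresponding pre_save check could look like. The property name "stateless-migration", the migrate_stateless member and the error text are assumptions for illustration, not code from this series.)

```c
/* Sketch: a per-device switch instead of a global migration capability. */
static Property vuf_properties_sketch[] = {
    /* ...existing vhost-user-fs properties (tag, chardev, queue sizes)... */
    DEFINE_PROP_BOOL("stateless-migration", VHostUserFS, migrate_stateless, false),
    DEFINE_PROP_END_OF_LIST(),
};

static int vhost_user_fs_pre_save_sketch(void *opaque)
{
    VHostUserFS *fs = opaque;

    if (!fs->migrate_stateless) {
        /* Property not set: keep the current behaviour and block migration. */
        error_report("vhost-user-fs: enable the 'stateless-migration' "
                     "property to allow migrating this device");
        return -1;
    }
    return 0;
}
```

Under this (hypothetical) scheme the switch would be set per instance, e.g. `-device vhost-user-fs-pci,chardev=char0,tag=myfs,stateless-migration=on`, which is what makes it usable when a guest has several vhost-user-fs devices backed by servers with different capabilities.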
On 19/01/2023 16:30, Stefan Hajnoczi wrote: > On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >> On 18/01/2023 17:52, Stefan Hajnoczi wrote: >>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents >>>> qemu update without stopping the VM. In most cases that makes sense >>>> because qemu has no way to transfer FUSE session state. >>>> >>>> But we can give an option to orchestrator to override this if it can >>>> guarantee that state will be preserved (e.g. it uses migration to >>>> update qemu and dst will run on the same host as src and use the same >>>> socket endpoints). >>>> >>>> This patch keeps default behavior that prevents migration with such devices >>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration. >>>> >>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> >>>> --- >>>> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- >>>> qapi/migration.json | 7 ++++++- >>>> 2 files changed, 30 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c >>>> index f5049735ac..13d920423e 100644 >>>> --- a/hw/virtio/vhost-user-fs.c >>>> +++ b/hw/virtio/vhost-user-fs.c >>>> @@ -24,6 +24,7 @@ >>>> #include "hw/virtio/vhost-user-fs.h" >>>> #include "monitor/monitor.h" >>>> #include "sysemu/sysemu.h" >>>> +#include "migration/migration.h" >>>> >>>> static const int user_feature_bits[] = { >>>> VIRTIO_F_VERSION_1, >>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) >>>> return &fs->vhost_dev; >>>> } >>>> >>>> +static int vhost_user_fs_pre_save(void *opaque) >>>> +{ >>>> + MigrationState *s = migrate_get_current(); >>>> + >>>> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { >>>> + error_report("Migration of vhost-user-fs devices requires internal FUSE " >>>> + "state of backend to be preserved. If orchestrator can " >>>> + "guarantee this (e.g. dst connects to the same backend " >>>> + "instance or backend state is migrated) set 'vhost-user-fs' " >>>> + "migration capability to true to enable migration."); >>>> + return -1; >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> static const VMStateDescription vuf_vmstate = { >>>> .name = "vhost-user-fs", >>>> - .unmigratable = 1, >>>> + .minimum_version_id = 0, >>>> + .version_id = 0, >>>> + .fields = (VMStateField[]) { >>>> + VMSTATE_VIRTIO_DEVICE, >>>> + VMSTATE_END_OF_LIST() >>>> + }, >>>> + .pre_save = vhost_user_fs_pre_save, >>>> }; >>> Will it be possible to extend this vmstate when virtiofsd adds support >>> for stateful migration without breaking migration compatibility? >>> >>> If not, then I think a marker field should be added to the vmstate: >>> 0 - stateless/reconnect migration (the approach you're adding in this patch) >>> 1 - stateful migration (future virtiofsd feature) >>> >>> When the field is 0 there are no further vmstate fields and we trust >>> that the destination vhost-user-fs server already has the necessary >>> state. >>> >>> When the field is 1 there are additional vmstate fields that contain >>> the virtiofsd state. >>> >>> The goal is for QEMU to support 3 migration modes, depending on the >>> vhost-user-fs server: >>> 1. No migration support. >>> 2. Stateless migration. >>> 3. Stateful migration. >> Sure. These vmstate fields are very generic and mandatory for any >> virtio device. 
If in future more state can be transfer in migration >> stream the vmstate can be extended with additional fields. This can >> be done with new subsections and/or bumping version_id. > My concern is that the vmstate introduced in this patch may be > unusable when stateful migration is added. So additional compatibility > code will need to be introduced to make your stateless migration > continue working with extended vmstate. > > By adding a marker field in this patch it should be possible to > continue using the same vmstate for stateless migration without adding > extra compatibility code in the future. I understand, but this fields in vmstate just packs generic virtio device state that is accessible by qemu. All additional data could be added later by extra fields. I believe we couldn't pull off any type of virtio device migration without transferring virtqueues so more sophisticated types of migration would require adding more data and not modification to this part of vmstate. > >> The main purpose of this patch is to allow update VM to newer version >> of qemu via local migration without disruption to guest. And future >> versions hopefully could pack more state from external environment >> to migration stream. >> >> >>>> static Property vuf_properties[] = { >>>> diff --git a/qapi/migration.json b/qapi/migration.json >>>> index 88ecf86ac8..9a229ea884 100644 >>>> --- a/qapi/migration.json >>>> +++ b/qapi/migration.json >>>> @@ -477,6 +477,11 @@ >>>> # will be handled faster. This is a performance feature and >>>> # should not affect the correctness of postcopy migration. >>>> # (since 7.1) >>>> +# @vhost-user-fs: If enabled, the migration process will allow migration of >>>> +# vhost-user-fs devices, this should be enabled only when >>>> +# backend can preserve local FUSE state e.g. for qemu update >>>> +# when dst reconects to the same endpoints after migration. >>>> +# (since 8.0) >>> This is global but a guest can have multiple vhost-user-fs devices >>> connected to different servers. >> AFAIK vhost-user requires unix socket and memory shared from guest so >> devices can't be connected to different servers, just to different >> endpoints on current host. > Different vhost-user-fs server software. vhost-user-fs is a protocol, > there can be multiple server implementations. These implementations > may have different capabilities (some may not support stateless > migration). So I think stateless migration should be a per-instance > setting, not global. From qemu point of view we have no way of knowing which endpoints implement stateless migration and if they can perform it at the moment or need some additional setup to handle it, so after all the final decision to migrate statelessly originates from the orchestrator that knows backends states better. Setting per-device flag can be more confusing because qemu can't know or control backends states and capabilities. So I'd rather hand off this task completely to the orchestrator and not try to split the responsibility between it and qemu. > >>> I would add a qdev property to the device instead of introducing a >>> migration capability. The property would enable "stateless migration". >>> When the property is not set, migration would be prohibited. >> I did thought about that, but this is really not a property of device, >> this is the capability of management software and applies to exactly one >> particular migration process that it initiates. It should not persist >> across migration or be otherwise stored in device. > I disagree. 
The vhost-user-fs server software must implement stateless > migration in order for this to work. For example, > https://gitlab.com/virtio-fs/virtiofsd doesn't support stateless > migration as far as I know. Yes, the implementation in the backend is necessary, but this is a backend property and is not related at all to the device in qemu. So management software should know better what backends are capable of. In fact I'm now working on support for this in rust virtiofsd (my rust skills are not yet good enough to do this quickly) and plan to propose a spec patch exposing a "reconnect" vhost-user backend capability so that orchestrators have a standard way to check for it. > >> The idea here is that orchestrator can ensure destination qemu will >> run on the same host, will reconnect to the same unix sockets and only >> then sets the flag (because inside qemu we can't know anything about >> the destination). >> This is somewhat similar to ignore-shared migration capability when >> qemu avoids saving and loading guest memory that is stores in shmem >> because it will be picked up by destination process right where source >> left it. >> >>>> # >>>> # Features: >>>> # @unstable: Members @x-colo and @x-ignore-shared are experimental. >>>> @@ -492,7 +497,7 @@ >>>> 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', >>>> { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, >>>> 'validate-uuid', 'background-snapshot', >>>> - 'zero-copy-send', 'postcopy-preempt'] } >>>> + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } >>>> >>>> ## >>>> # @MigrationCapabilityStatus: >>>> -- >>>> 2.34.1 >>>> >>>>
On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > > On 19/01/2023 16:30, Stefan Hajnoczi wrote: > > On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > >> On 18/01/2023 17:52, Stefan Hajnoczi wrote: > >>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > >>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents > >>>> qemu update without stopping the VM. In most cases that makes sense > >>>> because qemu has no way to transfer FUSE session state. > >>>> > >>>> But we can give an option to orchestrator to override this if it can > >>>> guarantee that state will be preserved (e.g. it uses migration to > >>>> update qemu and dst will run on the same host as src and use the same > >>>> socket endpoints). > >>>> > >>>> This patch keeps default behavior that prevents migration with such devices > >>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration. > >>>> > >>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> > >>>> --- > >>>> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > >>>> qapi/migration.json | 7 ++++++- > >>>> 2 files changed, 30 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > >>>> index f5049735ac..13d920423e 100644 > >>>> --- a/hw/virtio/vhost-user-fs.c > >>>> +++ b/hw/virtio/vhost-user-fs.c > >>>> @@ -24,6 +24,7 @@ > >>>> #include "hw/virtio/vhost-user-fs.h" > >>>> #include "monitor/monitor.h" > >>>> #include "sysemu/sysemu.h" > >>>> +#include "migration/migration.h" > >>>> > >>>> static const int user_feature_bits[] = { > >>>> VIRTIO_F_VERSION_1, > >>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) > >>>> return &fs->vhost_dev; > >>>> } > >>>> > >>>> +static int vhost_user_fs_pre_save(void *opaque) > >>>> +{ > >>>> + MigrationState *s = migrate_get_current(); > >>>> + > >>>> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { > >>>> + error_report("Migration of vhost-user-fs devices requires internal FUSE " > >>>> + "state of backend to be preserved. If orchestrator can " > >>>> + "guarantee this (e.g. dst connects to the same backend " > >>>> + "instance or backend state is migrated) set 'vhost-user-fs' " > >>>> + "migration capability to true to enable migration."); > >>>> + return -1; > >>>> + } > >>>> + > >>>> + return 0; > >>>> +} > >>>> + > >>>> static const VMStateDescription vuf_vmstate = { > >>>> .name = "vhost-user-fs", > >>>> - .unmigratable = 1, > >>>> + .minimum_version_id = 0, > >>>> + .version_id = 0, > >>>> + .fields = (VMStateField[]) { > >>>> + VMSTATE_VIRTIO_DEVICE, > >>>> + VMSTATE_END_OF_LIST() > >>>> + }, > >>>> + .pre_save = vhost_user_fs_pre_save, > >>>> }; > >>> Will it be possible to extend this vmstate when virtiofsd adds support > >>> for stateful migration without breaking migration compatibility? > >>> > >>> If not, then I think a marker field should be added to the vmstate: > >>> 0 - stateless/reconnect migration (the approach you're adding in this patch) > >>> 1 - stateful migration (future virtiofsd feature) > >>> > >>> When the field is 0 there are no further vmstate fields and we trust > >>> that the destination vhost-user-fs server already has the necessary > >>> state. > >>> > >>> When the field is 1 there are additional vmstate fields that contain > >>> the virtiofsd state. 
> >>> > >>> The goal is for QEMU to support 3 migration modes, depending on the > >>> vhost-user-fs server: > >>> 1. No migration support. > >>> 2. Stateless migration. > >>> 3. Stateful migration. > >> Sure. These vmstate fields are very generic and mandatory for any > >> virtio device. If in future more state can be transfer in migration > >> stream the vmstate can be extended with additional fields. This can > >> be done with new subsections and/or bumping version_id. > > My concern is that the vmstate introduced in this patch may be > > unusable when stateful migration is added. So additional compatibility > > code will need to be introduced to make your stateless migration > > continue working with extended vmstate. > > > > By adding a marker field in this patch it should be possible to > > continue using the same vmstate for stateless migration without adding > > extra compatibility code in the future. > I understand, but this fields in vmstate just packs generic virtio > device state that is accessible by qemu. All additional data could be > added later by extra fields. I believe we couldn't pull off any type > of virtio device migration without transferring virtqueues so more > sophisticated types of migration would require adding more data and > not modification to this part of vmstate. What I'm saying is that your patch could define the vmstate such that it that contains a field to differentiate between stateless and stateful migration. That way QEMU versions that only support stateless migration (this patch) will be able to migrate to future QEMU versions that support both stateless and stateful without compatibility issues. I'm not sure if my suggestion to add a marker field to vuf_vmstate is the best way to do this, but have you thought of how to handle the future addition of stateful migration to the vmstate without breaking stateless vmstates? Maybe David Gilbert has a suggestion for how to do this cleanly. Stefan
On 19/01/2023 18:02, Stefan Hajnoczi wrote: > On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >> On 19/01/2023 16:30, Stefan Hajnoczi wrote: >>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote: >>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents >>>>>> qemu update without stopping the VM. In most cases that makes sense >>>>>> because qemu has no way to transfer FUSE session state. >>>>>> >>>>>> But we can give an option to orchestrator to override this if it can >>>>>> guarantee that state will be preserved (e.g. it uses migration to >>>>>> update qemu and dst will run on the same host as src and use the same >>>>>> socket endpoints). >>>>>> >>>>>> This patch keeps default behavior that prevents migration with such devices >>>>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration. >>>>>> >>>>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> >>>>>> --- >>>>>> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- >>>>>> qapi/migration.json | 7 ++++++- >>>>>> 2 files changed, 30 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c >>>>>> index f5049735ac..13d920423e 100644 >>>>>> --- a/hw/virtio/vhost-user-fs.c >>>>>> +++ b/hw/virtio/vhost-user-fs.c >>>>>> @@ -24,6 +24,7 @@ >>>>>> #include "hw/virtio/vhost-user-fs.h" >>>>>> #include "monitor/monitor.h" >>>>>> #include "sysemu/sysemu.h" >>>>>> +#include "migration/migration.h" >>>>>> >>>>>> static const int user_feature_bits[] = { >>>>>> VIRTIO_F_VERSION_1, >>>>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) >>>>>> return &fs->vhost_dev; >>>>>> } >>>>>> >>>>>> +static int vhost_user_fs_pre_save(void *opaque) >>>>>> +{ >>>>>> + MigrationState *s = migrate_get_current(); >>>>>> + >>>>>> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { >>>>>> + error_report("Migration of vhost-user-fs devices requires internal FUSE " >>>>>> + "state of backend to be preserved. If orchestrator can " >>>>>> + "guarantee this (e.g. dst connects to the same backend " >>>>>> + "instance or backend state is migrated) set 'vhost-user-fs' " >>>>>> + "migration capability to true to enable migration."); >>>>>> + return -1; >>>>>> + } >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> static const VMStateDescription vuf_vmstate = { >>>>>> .name = "vhost-user-fs", >>>>>> - .unmigratable = 1, >>>>>> + .minimum_version_id = 0, >>>>>> + .version_id = 0, >>>>>> + .fields = (VMStateField[]) { >>>>>> + VMSTATE_VIRTIO_DEVICE, >>>>>> + VMSTATE_END_OF_LIST() >>>>>> + }, >>>>>> + .pre_save = vhost_user_fs_pre_save, >>>>>> }; >>>>> Will it be possible to extend this vmstate when virtiofsd adds support >>>>> for stateful migration without breaking migration compatibility? >>>>> >>>>> If not, then I think a marker field should be added to the vmstate: >>>>> 0 - stateless/reconnect migration (the approach you're adding in this patch) >>>>> 1 - stateful migration (future virtiofsd feature) >>>>> >>>>> When the field is 0 there are no further vmstate fields and we trust >>>>> that the destination vhost-user-fs server already has the necessary >>>>> state. >>>>> >>>>> When the field is 1 there are additional vmstate fields that contain >>>>> the virtiofsd state. 
>>>>> >>>>> The goal is for QEMU to support 3 migration modes, depending on the >>>>> vhost-user-fs server: >>>>> 1. No migration support. >>>>> 2. Stateless migration. >>>>> 3. Stateful migration. >>>> Sure. These vmstate fields are very generic and mandatory for any >>>> virtio device. If in future more state can be transfer in migration >>>> stream the vmstate can be extended with additional fields. This can >>>> be done with new subsections and/or bumping version_id. >>> My concern is that the vmstate introduced in this patch may be >>> unusable when stateful migration is added. So additional compatibility >>> code will need to be introduced to make your stateless migration >>> continue working with extended vmstate. >>> >>> By adding a marker field in this patch it should be possible to >>> continue using the same vmstate for stateless migration without adding >>> extra compatibility code in the future. >> I understand, but this fields in vmstate just packs generic virtio >> device state that is accessible by qemu. All additional data could be >> added later by extra fields. I believe we couldn't pull off any type >> of virtio device migration without transferring virtqueues so more >> sophisticated types of migration would require adding more data and >> not modification to this part of vmstate. > What I'm saying is that your patch could define the vmstate such that > it that contains a field to differentiate between stateless and > stateful migration. That way QEMU versions that only support stateless > migration (this patch) will be able to migrate to future QEMU versions > that support both stateless and stateful without compatibility issues. I double-checked migration documentation for subsections at https://www.qemu.org/docs/master/devel/migration.html#subsections and believe it perfectly describes our case: virtio device state should always be transferred both in stateless or stateful migration. With stateful one we would add new subsection with extra data that will be transferred only if stateless capability is not set. We can connect this subsection to device property and machine type if we need to. On the receiving side we always need basic virtio device state and newer versions will be able to load extra data from subsection if it is present, while older versions will be still able to accept the migrations that were initiated from new versions with stateless flag set and don't have extra subsection. The only scenario that will fail is older qemu won't be able to load new migration stream with additional subsection that it can't understand - this is general limitation of migration compatibility. So we can't completely protect older versions from future migration stream format because we don't know what will be in that stream but looks like we have all the tools to maintain compatibility reasonably wide. > I'm not sure if my suggestion to add a marker field to vuf_vmstate is > the best way to do this, but have you thought of how to handle the > future addition of stateful migration to the vmstate without breaking > stateless vmstates? Maybe David Gilbert has a suggestion for how to do > this cleanly. > > Stefan I think we'd be better without a new marker because migration code has standard generic way of solving such puzzles that I described above. So adding new marker would go against existing practice. But if you could show me where I missed something I'll be grateful and will fix it to avoid potential problems. I'd also be happy to know the opinion of Dr. 
David Alan Gilbert.
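(To illustrate the subsection mechanism referred to above, here is a sketch assuming a future stateful mode adds a "backend-state" subsection gated by a .needed callback; the subsection name, the stubbed predicate and the empty field list are hypothetical, not an actual virtiofsd interface.)

```c
/* Hypothetical predicate: would report whether the backend negotiated
 * stateful migration.  Stubbed out for this sketch. */
static bool vuf_stateful_migration_enabled(void *opaque)
{
    return false; /* stateless-only, as in the posted patch */
}

static bool vuf_backend_state_needed(void *opaque)
{
    /* Only emit backend state when doing (future) stateful migration. */
    return vuf_stateful_migration_enabled(opaque);
}

static const VMStateDescription vuf_backend_state = {
    .name = "vhost-user-fs/backend-state",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = vuf_backend_state_needed,
    .fields = (VMStateField[]) {
        /* future fields carrying FUSE/virtiofsd session state would go here */
        VMSTATE_END_OF_LIST()
    },
};

static const VMStateDescription vuf_vmstate_sketch = {
    .name = "vhost-user-fs",
    .version_id = 0,
    .minimum_version_id = 0,
    .fields = (VMStateField[]) {
        VMSTATE_VIRTIO_DEVICE,
        VMSTATE_END_OF_LIST()
    },
    .subsections = (const VMStateDescription*[]) {
        &vuf_backend_state,
        NULL
    },
    .pre_save = vhost_user_fs_pre_save,
};
```

A stateless source simply never emits the subsection, so the stream stays loadable by a destination that only knows the base vmstate; a stateful source adds it, and only a destination new enough to understand the subsection can accept that stream.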
* Anton Kuchin (antonkuchin@yandex-team.ru) wrote: > On 19/01/2023 14:51, Michael S. Tsirkin wrote: > > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: > > > Now any vhost-user-fs device makes VM unmigratable, that also prevents > > > qemu update without stopping the VM. In most cases that makes sense > > > because qemu has no way to transfer FUSE session state. > > > > > > But we can give an option to orchestrator to override this if it can > > > guarantee that state will be preserved (e.g. it uses migration to > > > update qemu and dst will run on the same host as src and use the same > > > socket endpoints). > > > > > > This patch keeps default behavior that prevents migration with such devices > > > but adds migration capability 'vhost-user-fs' to explicitly allow migration. > > > > > > Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> > > > --- > > > hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > > > qapi/migration.json | 7 ++++++- > > > 2 files changed, 30 insertions(+), 2 deletions(-) > > > > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > > > index f5049735ac..13d920423e 100644 > > > --- a/hw/virtio/vhost-user-fs.c > > > +++ b/hw/virtio/vhost-user-fs.c > > > @@ -24,6 +24,7 @@ > > > #include "hw/virtio/vhost-user-fs.h" > > > #include "monitor/monitor.h" > > > #include "sysemu/sysemu.h" > > > +#include "migration/migration.h" > > > static const int user_feature_bits[] = { > > > VIRTIO_F_VERSION_1, > > > @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) > > > return &fs->vhost_dev; > > > } > > > +static int vhost_user_fs_pre_save(void *opaque) > > > +{ > > > + MigrationState *s = migrate_get_current(); > > > + > > > + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { > > > + error_report("Migration of vhost-user-fs devices requires internal FUSE " > > > + "state of backend to be preserved. If orchestrator can " > > > + "guarantee this (e.g. dst connects to the same backend " > > > + "instance or backend state is migrated) set 'vhost-user-fs' " > > > + "migration capability to true to enable migration."); > > Isn't it possible that some backends are same and some are not? > > Shouldn't this be a device property then? > If some are not the same it is not guaranteed that correct FUSE > state is present there, so orchestrator shouldn't set the capability > because this can result in destination devices being broken (they'll > be fine after the remount in guest, but this is guest visible and is > not acceptable). Shouldn't this be something that's negotiated as a feature bit in the vhost-user protocol - i.e. to say whether the device can do migration? However, I think what you're saying is that a migration might only be doable in some cases - e.g. a local migration - and that's hard for either qemu or the daemon to discover by itself; so it's right that an orchestrator enables it. (I'm not sure the error_report message needs to be quite so verbose - normally a one line clear message is enough that people can then look up). > I can imagine smart orchestrator and backend that can transfer > internal FUSE state, but we are not there yet, and this would be > their responsibility then to ensure endpoint compatibility between src > and dst and set the capability (that's why I put "e.g." and "or" in > the error description). You also need the vhost-user device to be able to do the dirty bitmap updates; is that being checked somewhere? 
Dave > > > > > > > > > + return -1; > > > + } > > > + > > > + return 0; > > > +} > > > + > > > static const VMStateDescription vuf_vmstate = { > > > .name = "vhost-user-fs", > > > - .unmigratable = 1, > > > + .minimum_version_id = 0, > > > + .version_id = 0, > > > + .fields = (VMStateField[]) { > > > + VMSTATE_VIRTIO_DEVICE, > > > + VMSTATE_END_OF_LIST() > > > + }, > > > + .pre_save = vhost_user_fs_pre_save, > > > }; > > > static Property vuf_properties[] = { > > > diff --git a/qapi/migration.json b/qapi/migration.json > > > index 88ecf86ac8..9a229ea884 100644 > > > --- a/qapi/migration.json > > > +++ b/qapi/migration.json > > > @@ -477,6 +477,11 @@ > > > # will be handled faster. This is a performance feature and > > > # should not affect the correctness of postcopy migration. > > > # (since 7.1) > > > +# @vhost-user-fs: If enabled, the migration process will allow migration of > > > +# vhost-user-fs devices, this should be enabled only when > > > +# backend can preserve local FUSE state e.g. for qemu update > > > +# when dst reconects to the same endpoints after migration. > > > +# (since 8.0) > > > # > > > # Features: > > > # @unstable: Members @x-colo and @x-ignore-shared are experimental. > > > @@ -492,7 +497,7 @@ > > > 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', > > > { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, > > > 'validate-uuid', 'background-snapshot', > > > - 'zero-copy-send', 'postcopy-preempt'] } > > > + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } > > I kind of dislike that it's such a specific flag. Is only vhost-user-fs > > ever going to be affected? Any way to put it in a way that is more generic? > Here I agree with you: I would prefer less narrow naming too. But I > didn't manage to come up with one. Looks like many other vhost-user > devices could benefit from this so maybe "vhost-user-stateless" or > something like this would be better. > I'm not sure that other types of devices could handle reconnect to > the old endpoint as easy as vhost-user-fs, but anyway the support for > this flag needs to be implemented for each device individually. > What do you think? Any ideas would be appreciated. > > > > > > > > ## > > > # @MigrationCapabilityStatus: > > > -- > > > 2.34.1 >
On Thu, 19 Jan 2023 at 11:58, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > > On 19/01/2023 18:02, Stefan Hajnoczi wrote: > > On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > >> On 19/01/2023 16:30, Stefan Hajnoczi wrote: > >>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > >>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote: > >>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > >>>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents > >>>>>> qemu update without stopping the VM. In most cases that makes sense > >>>>>> because qemu has no way to transfer FUSE session state. > >>>>>> > >>>>>> But we can give an option to orchestrator to override this if it can > >>>>>> guarantee that state will be preserved (e.g. it uses migration to > >>>>>> update qemu and dst will run on the same host as src and use the same > >>>>>> socket endpoints). > >>>>>> > >>>>>> This patch keeps default behavior that prevents migration with such devices > >>>>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration. > >>>>>> > >>>>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> > >>>>>> --- > >>>>>> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > >>>>>> qapi/migration.json | 7 ++++++- > >>>>>> 2 files changed, 30 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > >>>>>> index f5049735ac..13d920423e 100644 > >>>>>> --- a/hw/virtio/vhost-user-fs.c > >>>>>> +++ b/hw/virtio/vhost-user-fs.c > >>>>>> @@ -24,6 +24,7 @@ > >>>>>> #include "hw/virtio/vhost-user-fs.h" > >>>>>> #include "monitor/monitor.h" > >>>>>> #include "sysemu/sysemu.h" > >>>>>> +#include "migration/migration.h" > >>>>>> > >>>>>> static const int user_feature_bits[] = { > >>>>>> VIRTIO_F_VERSION_1, > >>>>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) > >>>>>> return &fs->vhost_dev; > >>>>>> } > >>>>>> > >>>>>> +static int vhost_user_fs_pre_save(void *opaque) > >>>>>> +{ > >>>>>> + MigrationState *s = migrate_get_current(); > >>>>>> + > >>>>>> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { > >>>>>> + error_report("Migration of vhost-user-fs devices requires internal FUSE " > >>>>>> + "state of backend to be preserved. If orchestrator can " > >>>>>> + "guarantee this (e.g. dst connects to the same backend " > >>>>>> + "instance or backend state is migrated) set 'vhost-user-fs' " > >>>>>> + "migration capability to true to enable migration."); > >>>>>> + return -1; > >>>>>> + } > >>>>>> + > >>>>>> + return 0; > >>>>>> +} > >>>>>> + > >>>>>> static const VMStateDescription vuf_vmstate = { > >>>>>> .name = "vhost-user-fs", > >>>>>> - .unmigratable = 1, > >>>>>> + .minimum_version_id = 0, > >>>>>> + .version_id = 0, > >>>>>> + .fields = (VMStateField[]) { > >>>>>> + VMSTATE_VIRTIO_DEVICE, > >>>>>> + VMSTATE_END_OF_LIST() > >>>>>> + }, > >>>>>> + .pre_save = vhost_user_fs_pre_save, > >>>>>> }; > >>>>> Will it be possible to extend this vmstate when virtiofsd adds support > >>>>> for stateful migration without breaking migration compatibility? 
> >>>>> > >>>>> If not, then I think a marker field should be added to the vmstate: > >>>>> 0 - stateless/reconnect migration (the approach you're adding in this patch) > >>>>> 1 - stateful migration (future virtiofsd feature) > >>>>> > >>>>> When the field is 0 there are no further vmstate fields and we trust > >>>>> that the destination vhost-user-fs server already has the necessary > >>>>> state. > >>>>> > >>>>> When the field is 1 there are additional vmstate fields that contain > >>>>> the virtiofsd state. > >>>>> > >>>>> The goal is for QEMU to support 3 migration modes, depending on the > >>>>> vhost-user-fs server: > >>>>> 1. No migration support. > >>>>> 2. Stateless migration. > >>>>> 3. Stateful migration. > >>>> Sure. These vmstate fields are very generic and mandatory for any > >>>> virtio device. If in future more state can be transfer in migration > >>>> stream the vmstate can be extended with additional fields. This can > >>>> be done with new subsections and/or bumping version_id. > >>> My concern is that the vmstate introduced in this patch may be > >>> unusable when stateful migration is added. So additional compatibility > >>> code will need to be introduced to make your stateless migration > >>> continue working with extended vmstate. > >>> > >>> By adding a marker field in this patch it should be possible to > >>> continue using the same vmstate for stateless migration without adding > >>> extra compatibility code in the future. > >> I understand, but this fields in vmstate just packs generic virtio > >> device state that is accessible by qemu. All additional data could be > >> added later by extra fields. I believe we couldn't pull off any type > >> of virtio device migration without transferring virtqueues so more > >> sophisticated types of migration would require adding more data and > >> not modification to this part of vmstate. > > What I'm saying is that your patch could define the vmstate such that > > it that contains a field to differentiate between stateless and > > stateful migration. That way QEMU versions that only support stateless > > migration (this patch) will be able to migrate to future QEMU versions > > that support both stateless and stateful without compatibility issues. > I double-checked migration documentation for subsections at > https://www.qemu.org/docs/master/devel/migration.html#subsections > and believe it perfectly describes our case: virtio device state > should always be transferred both in stateless or stateful migration. > With stateful one we would add new subsection with extra data that > will be transferred only if stateless capability is not set. We can > connect this subsection to device property and machine type if we > need to. > On the receiving side we always need basic virtio device state and > newer versions will be able to load extra data from subsection if it > is present, while older versions will be still able to accept the > migrations that were initiated from new versions with stateless flag > set and don't have extra subsection. > > The only scenario that will fail is older qemu won't be able to load > new migration stream with additional subsection that it can't > understand - this is general limitation of migration compatibility. > So we can't completely protect older versions from future migration > stream format because we don't know what will be in that stream > but looks like we have all the tools to maintain compatibility > reasonably wide. 
> > I'm not sure if my suggestion to add a marker field to vuf_vmstate is > > the best way to do this, but have you thought of how to handle the > > future addition of stateful migration to the vmstate without breaking > > stateless vmstates? Maybe David Gilbert has a suggestion for how to do > > this cleanly. > > > > Stefan > > I think we'd be better off without a new marker because the migration code > has a standard, generic way of solving such puzzles, which I described > above. So adding a new marker would go against existing practice. That sounds good to me, thanks! Stefan
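For illustration, here is a minimal sketch of the subsection approach the thread converges on: vuf_vmstate keeps the stateless fields from the patch and could later gain an optional subsection carrying virtiofsd state. The subsection name, the needed() predicate and the vuf_stateful_migration_enabled() helper are assumptions for the sketch, not part of this patch or of current QEMU.

    /* Hypothetical sketch only: names and gating are assumed. */
    static bool vuf_internal_state_needed(void *opaque)
    {
        /* e.g. tied to a future device property or negotiated feature */
        return vuf_stateful_migration_enabled(opaque);   /* hypothetical */
    }

    static const VMStateDescription vuf_internal_state_vmstate = {
        .name = "vhost-user-fs/internal-state",          /* hypothetical */
        .version_id = 0,
        .minimum_version_id = 0,
        .needed = vuf_internal_state_needed,
        .fields = (VMStateField[]) {
            /* fields carrying the virtiofsd state would go here */
            VMSTATE_END_OF_LIST()
        },
    };

    static const VMStateDescription vuf_vmstate = {
        .name = "vhost-user-fs",
        .version_id = 0,
        .minimum_version_id = 0,
        .fields = (VMStateField[]) {
            VMSTATE_VIRTIO_DEVICE,
            VMSTATE_END_OF_LIST()
        },
        .pre_save = vhost_user_fs_pre_save,
        /* absent from the wire for stateless migration, so old streams still load */
        .subsections = (const VMStateDescription*[]) {
            &vuf_internal_state_vmstate,
            NULL
        }
    };

When the subsection is not needed, nothing extra is emitted, so a destination that predates the stateful feature can still load a stateless stream, which is the compatibility property Anton describes above.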
On 19/01/2023 21:00, Dr. David Alan Gilbert wrote: > * Anton Kuchin (antonkuchin@yandex-team.ru) wrote: >> On 19/01/2023 14:51, Michael S. Tsirkin wrote: >>> On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: >>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents >>>> qemu update without stopping the VM. In most cases that makes sense >>>> because qemu has no way to transfer FUSE session state. >>>> >>>> But we can give an option to orchestrator to override this if it can >>>> guarantee that state will be preserved (e.g. it uses migration to >>>> update qemu and dst will run on the same host as src and use the same >>>> socket endpoints). >>>> >>>> This patch keeps default behavior that prevents migration with such devices >>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration. >>>> >>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> >>>> --- >>>> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- >>>> qapi/migration.json | 7 ++++++- >>>> 2 files changed, 30 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c >>>> index f5049735ac..13d920423e 100644 >>>> --- a/hw/virtio/vhost-user-fs.c >>>> +++ b/hw/virtio/vhost-user-fs.c >>>> @@ -24,6 +24,7 @@ >>>> #include "hw/virtio/vhost-user-fs.h" >>>> #include "monitor/monitor.h" >>>> #include "sysemu/sysemu.h" >>>> +#include "migration/migration.h" >>>> static const int user_feature_bits[] = { >>>> VIRTIO_F_VERSION_1, >>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) >>>> return &fs->vhost_dev; >>>> } >>>> +static int vhost_user_fs_pre_save(void *opaque) >>>> +{ >>>> + MigrationState *s = migrate_get_current(); >>>> + >>>> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { >>>> + error_report("Migration of vhost-user-fs devices requires internal FUSE " >>>> + "state of backend to be preserved. If orchestrator can " >>>> + "guarantee this (e.g. dst connects to the same backend " >>>> + "instance or backend state is migrated) set 'vhost-user-fs' " >>>> + "migration capability to true to enable migration."); >>> Isn't it possible that some backends are same and some are not? >>> Shouldn't this be a device property then? >> If some are not the same it is not guaranteed that correct FUSE >> state is present there, so orchestrator shouldn't set the capability >> because this can result in destination devices being broken (they'll >> be fine after the remount in guest, but this is guest visible and is >> not acceptable). > Shouldn't this be something that's negotiated as a feature bit in the > vhost-user protocol - i.e. to say whether the device can do migration? > > However, I think what you're saying is that a migration might only be > doable in some cases - e.g. a local migration - and that's hard for > either qemu or the daemon to discover by itself; so it's right that > an orchestrator enables it. > (I'm not sure the error_report message needs to be quite so verbose - > normally a one line clear message is enough that people can then look > up). Exactly, migration depends not only on device capabilities but also on backends, hosts, management and other environment not known to qemu or backend so I can't see how this can be device config or negotiated via the protocol. In the message I tried to make clear that just enabling the capability without proper setup can be dangerous to reduce temptation to use it recklessly and get broken virtiofs device later. 
I'll try to shorten the message. I would appreciate it if you could help me with the wording (I'm not a native speaker, so building a short sentence with the proper level of warning is challenging) > >> I can imagine smart orchestrator and backend that can transfer >> internal FUSE state, but we are not there yet, and this would be >> their responsibility then to ensure endpoint compatibility between src >> and dst and set the capability (that's why I put "e.g." and "or" in >> the error description). > You also need the vhost-user device to be able to do the dirty bitmap > updates; is that being checked somewhere? > > Dave Yes, this is done by the generic vhost and vhost-user code on device init. In vhost_user_backend_init(), if the backend doesn't support the VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature, a migration blocker is assigned to the device. And vhost_dev_init() has one more check, for the VHOST_F_LOG_ALL feature, that assigns another migration blocker if it is not supported. > >>> >>> >>>> + return -1; >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> static const VMStateDescription vuf_vmstate = { >>>> .name = "vhost-user-fs", >>>> - .unmigratable = 1, >>>> + .minimum_version_id = 0, >>>> + .version_id = 0, >>>> + .fields = (VMStateField[]) { >>>> + VMSTATE_VIRTIO_DEVICE, >>>> + VMSTATE_END_OF_LIST() >>>> + }, >>>> + .pre_save = vhost_user_fs_pre_save, >>>> }; >>>> static Property vuf_properties[] = { >>>> diff --git a/qapi/migration.json b/qapi/migration.json >>>> index 88ecf86ac8..9a229ea884 100644 >>>> --- a/qapi/migration.json >>>> +++ b/qapi/migration.json >>>> @@ -477,6 +477,11 @@ >>>> # will be handled faster. This is a performance feature and >>>> # should not affect the correctness of postcopy migration. >>>> # (since 7.1) >>>> +# @vhost-user-fs: If enabled, the migration process will allow migration of >>>> +# vhost-user-fs devices, this should be enabled only when >>>> +# backend can preserve local FUSE state e.g. for qemu update >>>> +# when dst reconects to the same endpoints after migration. >>>> +# (since 8.0) >>>> # >>>> # Features: >>>> # @unstable: Members @x-colo and @x-ignore-shared are experimental. >>>> @@ -492,7 +497,7 @@ >>>> 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', >>>> { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, >>>> 'validate-uuid', 'background-snapshot', >>>> - 'zero-copy-send', 'postcopy-preempt'] } >>>> + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } >>> I kind of dislike that it's such a specific flag. Is only vhost-user-fs >>> ever going to be affected? Any way to put it in a way that is more generic? >> Here I agree with you: I would prefer less narrow naming too. But I >> didn't manage to come up with one. Looks like many other vhost-user >> devices could benefit from this so maybe "vhost-user-stateless" or >> something like this would be better. >> I'm not sure that other types of devices could handle reconnect to >> the old endpoint as easy as vhost-user-fs, but anyway the support for >> this flag needs to be implemented for each device individually. >> What do you think? Any ideas would be appreciated. >> >>> >>>> ## >>>> # @MigrationCapabilityStatus: >>>> -- >>>> 2.34.1
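For reference, the checks Anton refers to look roughly like the following, condensed and paraphrased from hw/virtio/vhost-user.c and hw/virtio/vhost.c around the time of this thread; the exact conditions, error paths and message wording differ slightly in the real code.

    /* vhost_user_backend_init(): backend can't share a dirty-log memfd */
    if (!virtio_has_feature(dev->protocol_features,
                            VHOST_USER_PROTOCOL_F_LOG_SHMFD)) {
        error_setg(&dev->migration_blocker,
                   "Migration disabled: vhost-user backend lacks "
                   "VHOST_USER_PROTOCOL_F_LOG_SHMFD feature.");
    }

    /* vhost_dev_init(): backend can't log dirty pages at all */
    if (!(hdev->features & (0x1ULL << VHOST_F_LOG_ALL))) {
        error_setg(&hdev->migration_blocker,
                   "Migration disabled: vhost lacks VHOST_F_LOG_ALL feature.");
    }
    if (hdev->migration_blocker != NULL) {
        r = migrate_add_blocker(hdev->migration_blocker, errp);
        if (r < 0) {
            error_free(hdev->migration_blocker);
            return r;
        }
    }

So a backend that cannot do dirty logging never reaches the new pre_save hook: it already carries a per-device migration blocker from init time.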
On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote: > On 19/01/2023 14:51, Michael S. Tsirkin wrote: > > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: > > > Now any vhost-user-fs device makes VM unmigratable, that also prevents > > > qemu update without stopping the VM. In most cases that makes sense > > > because qemu has no way to transfer FUSE session state. > > > > > > But we can give an option to orchestrator to override this if it can > > > guarantee that state will be preserved (e.g. it uses migration to > > > update qemu and dst will run on the same host as src and use the same > > > socket endpoints). > > > > > > This patch keeps default behavior that prevents migration with such devices > > > but adds migration capability 'vhost-user-fs' to explicitly allow migration. > > > > > > Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> > > > --- > > > hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > > > qapi/migration.json | 7 ++++++- > > > 2 files changed, 30 insertions(+), 2 deletions(-) > > > > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > > > index f5049735ac..13d920423e 100644 > > > --- a/hw/virtio/vhost-user-fs.c > > > +++ b/hw/virtio/vhost-user-fs.c > > > @@ -24,6 +24,7 @@ > > > #include "hw/virtio/vhost-user-fs.h" > > > #include "monitor/monitor.h" > > > #include "sysemu/sysemu.h" > > > +#include "migration/migration.h" > > > static const int user_feature_bits[] = { > > > VIRTIO_F_VERSION_1, > > > @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) > > > return &fs->vhost_dev; > > > } > > > +static int vhost_user_fs_pre_save(void *opaque) > > > +{ > > > + MigrationState *s = migrate_get_current(); > > > + > > > + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { > > > + error_report("Migration of vhost-user-fs devices requires internal FUSE " > > > + "state of backend to be preserved. If orchestrator can " > > > + "guarantee this (e.g. dst connects to the same backend " > > > + "instance or backend state is migrated) set 'vhost-user-fs' " > > > + "migration capability to true to enable migration."); > > Isn't it possible that some backends are same and some are not? > > Shouldn't this be a device property then? > If some are not the same it is not guaranteed that correct FUSE > state is present there, so orchestrator shouldn't set the capability > because this can result in destination devices being broken (they'll > be fine after the remount in guest, but this is guest visible and is > not acceptable). > > I can imagine smart orchestrator and backend that can transfer > internal FUSE state, but we are not there yet, and this would be > their responsibility then to ensure endpoint compatibility between src > and dst and set the capability (that's why I put "e.g." and "or" in > the error description). So instead of relying on the orchestrator how about making it a device property? 
> > > > > > > > > + return -1; > > > + } > > > + > > > + return 0; > > > +} > > > + > > > static const VMStateDescription vuf_vmstate = { > > > .name = "vhost-user-fs", > > > - .unmigratable = 1, > > > + .minimum_version_id = 0, > > > + .version_id = 0, > > > + .fields = (VMStateField[]) { > > > + VMSTATE_VIRTIO_DEVICE, > > > + VMSTATE_END_OF_LIST() > > > + }, > > > + .pre_save = vhost_user_fs_pre_save, > > > }; > > > static Property vuf_properties[] = { > > > diff --git a/qapi/migration.json b/qapi/migration.json > > > index 88ecf86ac8..9a229ea884 100644 > > > --- a/qapi/migration.json > > > +++ b/qapi/migration.json > > > @@ -477,6 +477,11 @@ > > > # will be handled faster. This is a performance feature and > > > # should not affect the correctness of postcopy migration. > > > # (since 7.1) > > > +# @vhost-user-fs: If enabled, the migration process will allow migration of > > > +# vhost-user-fs devices, this should be enabled only when > > > +# backend can preserve local FUSE state e.g. for qemu update > > > +# when dst reconects to the same endpoints after migration. > > > +# (since 8.0) > > > # > > > # Features: > > > # @unstable: Members @x-colo and @x-ignore-shared are experimental. > > > @@ -492,7 +497,7 @@ > > > 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', > > > { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, > > > 'validate-uuid', 'background-snapshot', > > > - 'zero-copy-send', 'postcopy-preempt'] } > > > + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } > > I kind of dislike that it's such a specific flag. Is only vhost-user-fs > > ever going to be affected? Any way to put it in a way that is more generic? > Here I agree with you: I would prefer less narrow naming too. But I > didn't manage to come up with one. Looks like many other vhost-user > devices could benefit from this so maybe "vhost-user-stateless" or > something like this would be better. > I'm not sure that other types of devices could handle reconnect to > the old endpoint as easy as vhost-user-fs, but anyway the support for > this flag needs to be implemented for each device individually. > What do you think? Any ideas would be appreciated. Let's try to create a better description of when this flag should be set. Then shorten it up to create the name. > > > > > > > ## > > > # @MigrationCapabilityStatus: > > > -- > > > 2.34.1
On 20/01/2023 15:58, Michael S. Tsirkin wrote: > On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote: >> On 19/01/2023 14:51, Michael S. Tsirkin wrote: >>> On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: >>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents >>>> qemu update without stopping the VM. In most cases that makes sense >>>> because qemu has no way to transfer FUSE session state. >>>> >>>> But we can give an option to orchestrator to override this if it can >>>> guarantee that state will be preserved (e.g. it uses migration to >>>> update qemu and dst will run on the same host as src and use the same >>>> socket endpoints). >>>> >>>> This patch keeps default behavior that prevents migration with such devices >>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration. >>>> >>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> >>>> --- >>>> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- >>>> qapi/migration.json | 7 ++++++- >>>> 2 files changed, 30 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c >>>> index f5049735ac..13d920423e 100644 >>>> --- a/hw/virtio/vhost-user-fs.c >>>> +++ b/hw/virtio/vhost-user-fs.c >>>> @@ -24,6 +24,7 @@ >>>> #include "hw/virtio/vhost-user-fs.h" >>>> #include "monitor/monitor.h" >>>> #include "sysemu/sysemu.h" >>>> +#include "migration/migration.h" >>>> static const int user_feature_bits[] = { >>>> VIRTIO_F_VERSION_1, >>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) >>>> return &fs->vhost_dev; >>>> } >>>> +static int vhost_user_fs_pre_save(void *opaque) >>>> +{ >>>> + MigrationState *s = migrate_get_current(); >>>> + >>>> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { >>>> + error_report("Migration of vhost-user-fs devices requires internal FUSE " >>>> + "state of backend to be preserved. If orchestrator can " >>>> + "guarantee this (e.g. dst connects to the same backend " >>>> + "instance or backend state is migrated) set 'vhost-user-fs' " >>>> + "migration capability to true to enable migration."); >>> Isn't it possible that some backends are same and some are not? >>> Shouldn't this be a device property then? >> If some are not the same it is not guaranteed that correct FUSE >> state is present there, so orchestrator shouldn't set the capability >> because this can result in destination devices being broken (they'll >> be fine after the remount in guest, but this is guest visible and is >> not acceptable). >> >> I can imagine smart orchestrator and backend that can transfer >> internal FUSE state, but we are not there yet, and this would be >> their responsibility then to ensure endpoint compatibility between src >> and dst and set the capability (that's why I put "e.g." and "or" in >> the error description). > So instead of relying on the orchestrator how about making it a device > property? We have to rely on the orchestrator here and I can't see how a property can help us with this: same device can be migratable or not depending on if the destination is the same host, what features backend supports, how management software works and other factors of environment that are not accessible from qemu or backend daemon. 
So in the end we'll end up with orchestrator having to setup flags for each device before each migration based on information only it can have - in my opinion this is worse than just giving the orchestrator a single flag that it can set after calculating the decision for the particular migration that it organizes. > > >>> >>> >>>> + return -1; >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> static const VMStateDescription vuf_vmstate = { >>>> .name = "vhost-user-fs", >>>> - .unmigratable = 1, >>>> + .minimum_version_id = 0, >>>> + .version_id = 0, >>>> + .fields = (VMStateField[]) { >>>> + VMSTATE_VIRTIO_DEVICE, >>>> + VMSTATE_END_OF_LIST() >>>> + }, >>>> + .pre_save = vhost_user_fs_pre_save, >>>> }; >>>> static Property vuf_properties[] = { >>>> diff --git a/qapi/migration.json b/qapi/migration.json >>>> index 88ecf86ac8..9a229ea884 100644 >>>> --- a/qapi/migration.json >>>> +++ b/qapi/migration.json >>>> @@ -477,6 +477,11 @@ >>>> # will be handled faster. This is a performance feature and >>>> # should not affect the correctness of postcopy migration. >>>> # (since 7.1) >>>> +# @vhost-user-fs: If enabled, the migration process will allow migration of >>>> +# vhost-user-fs devices, this should be enabled only when >>>> +# backend can preserve local FUSE state e.g. for qemu update >>>> +# when dst reconects to the same endpoints after migration. >>>> +# (since 8.0) >>>> # >>>> # Features: >>>> # @unstable: Members @x-colo and @x-ignore-shared are experimental. >>>> @@ -492,7 +497,7 @@ >>>> 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', >>>> { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, >>>> 'validate-uuid', 'background-snapshot', >>>> - 'zero-copy-send', 'postcopy-preempt'] } >>>> + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } >>> I kind of dislike that it's such a specific flag. Is only vhost-user-fs >>> ever going to be affected? Any way to put it in a way that is more generic? >> Here I agree with you: I would prefer less narrow naming too. But I >> didn't manage to come up with one. Looks like many other vhost-user >> devices could benefit from this so maybe "vhost-user-stateless" or >> something like this would be better. >> I'm not sure that other types of devices could handle reconnect to >> the old endpoint as easy as vhost-user-fs, but anyway the support for >> this flag needs to be implemented for each device individually. >> What do you think? Any ideas would be appreciated. > Let's try to create a better description of when this flag should be > set. Then shorten it up to create the name. This flag should be set when qemu don't need to worry about any external state stored in vhost-user daemons during migration: don't fail migration, just pack generic virtio device states to migration stream and orchestrator guarantees that the rest of the state will be present at the destination to restore full context and continue running. > >>> >>>> ## >>>> # @MigrationCapabilityStatus: >>>> -- >>>> 2.34.1
On Fri, Jan 20, 2023 at 07:37:18PM +0200, Anton Kuchin wrote: > On 20/01/2023 15:58, Michael S. Tsirkin wrote: > > On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote: > > > On 19/01/2023 14:51, Michael S. Tsirkin wrote: > > > > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: > > > > > Now any vhost-user-fs device makes VM unmigratable, that also prevents > > > > > qemu update without stopping the VM. In most cases that makes sense > > > > > because qemu has no way to transfer FUSE session state. > > > > > > > > > > But we can give an option to orchestrator to override this if it can > > > > > guarantee that state will be preserved (e.g. it uses migration to > > > > > update qemu and dst will run on the same host as src and use the same > > > > > socket endpoints). > > > > > > > > > > This patch keeps default behavior that prevents migration with such devices > > > > > but adds migration capability 'vhost-user-fs' to explicitly allow migration. > > > > > > > > > > Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> > > > > > --- > > > > > hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > > > > > qapi/migration.json | 7 ++++++- > > > > > 2 files changed, 30 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c > > > > > index f5049735ac..13d920423e 100644 > > > > > --- a/hw/virtio/vhost-user-fs.c > > > > > +++ b/hw/virtio/vhost-user-fs.c > > > > > @@ -24,6 +24,7 @@ > > > > > #include "hw/virtio/vhost-user-fs.h" > > > > > #include "monitor/monitor.h" > > > > > #include "sysemu/sysemu.h" > > > > > +#include "migration/migration.h" > > > > > static const int user_feature_bits[] = { > > > > > VIRTIO_F_VERSION_1, > > > > > @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) > > > > > return &fs->vhost_dev; > > > > > } > > > > > +static int vhost_user_fs_pre_save(void *opaque) > > > > > +{ > > > > > + MigrationState *s = migrate_get_current(); > > > > > + > > > > > + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { > > > > > + error_report("Migration of vhost-user-fs devices requires internal FUSE " > > > > > + "state of backend to be preserved. If orchestrator can " > > > > > + "guarantee this (e.g. dst connects to the same backend " > > > > > + "instance or backend state is migrated) set 'vhost-user-fs' " > > > > > + "migration capability to true to enable migration."); > > > > Isn't it possible that some backends are same and some are not? > > > > Shouldn't this be a device property then? > > > If some are not the same it is not guaranteed that correct FUSE > > > state is present there, so orchestrator shouldn't set the capability > > > because this can result in destination devices being broken (they'll > > > be fine after the remount in guest, but this is guest visible and is > > > not acceptable). > > > > > > I can imagine smart orchestrator and backend that can transfer > > > internal FUSE state, but we are not there yet, and this would be > > > their responsibility then to ensure endpoint compatibility between src > > > and dst and set the capability (that's why I put "e.g." and "or" in > > > the error description). > > So instead of relying on the orchestrator how about making it a device > > property? 
> > We have to rely on the orchestrator here and I can't see how a property > can help us with this: same device can be migratable or not depending > on if the destination is the same host, what features backend supports, > how management software works and other factors of environment that > are not accessible from qemu or backend daemon. > So in the end we'll end up with orchestrator having to setup flags for > each device before each migration based on information only it can > have - in my opinion this is worse than just giving the orchestrator > a single flag that it can set after calculating the decision for > the particular migration that it organizes. > > > > > > > > > > > > > > > > > > + return -1; > > > > > + } > > > > > + > > > > > + return 0; > > > > > +} > > > > > + > > > > > static const VMStateDescription vuf_vmstate = { > > > > > .name = "vhost-user-fs", > > > > > - .unmigratable = 1, > > > > > + .minimum_version_id = 0, > > > > > + .version_id = 0, > > > > > + .fields = (VMStateField[]) { > > > > > + VMSTATE_VIRTIO_DEVICE, > > > > > + VMSTATE_END_OF_LIST() > > > > > + }, > > > > > + .pre_save = vhost_user_fs_pre_save, > > > > > }; > > > > > static Property vuf_properties[] = { > > > > > diff --git a/qapi/migration.json b/qapi/migration.json > > > > > index 88ecf86ac8..9a229ea884 100644 > > > > > --- a/qapi/migration.json > > > > > +++ b/qapi/migration.json > > > > > @@ -477,6 +477,11 @@ > > > > > # will be handled faster. This is a performance feature and > > > > > # should not affect the correctness of postcopy migration. > > > > > # (since 7.1) > > > > > +# @vhost-user-fs: If enabled, the migration process will allow migration of > > > > > +# vhost-user-fs devices, this should be enabled only when > > > > > +# backend can preserve local FUSE state e.g. for qemu update > > > > > +# when dst reconects to the same endpoints after migration. > > > > > +# (since 8.0) > > > > > # > > > > > # Features: > > > > > # @unstable: Members @x-colo and @x-ignore-shared are experimental. > > > > > @@ -492,7 +497,7 @@ > > > > > 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', > > > > > { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, > > > > > 'validate-uuid', 'background-snapshot', > > > > > - 'zero-copy-send', 'postcopy-preempt'] } > > > > > + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } > > > > I kind of dislike that it's such a specific flag. Is only vhost-user-fs > > > > ever going to be affected? Any way to put it in a way that is more generic? > > > Here I agree with you: I would prefer less narrow naming too. But I > > > didn't manage to come up with one. Looks like many other vhost-user > > > devices could benefit from this so maybe "vhost-user-stateless" or > > > something like this would be better. > > > I'm not sure that other types of devices could handle reconnect to > > > the old endpoint as easy as vhost-user-fs, but anyway the support for > > > this flag needs to be implemented for each device individually. > > > What do you think? Any ideas would be appreciated. > > Let's try to create a better description of when this flag should be > > set. Then shorten it up to create the name. > > This flag should be set when qemu don't need to worry about any > external state stored in vhost-user daemons during migration: > don't fail migration, just pack generic virtio device states to > migration stream and orchestrator guarantees that the rest of the > state will be present at the destination to restore full context and > continue running. 
Sorry I still do not get it. So fundamentally, why do we need this property? vhost-user-fs is not created by default that we'd then need opt-in to the special "migrateable" case. That's why I said it might make some sense as a device property as qemu tracks whether device is unplugged for us. But as written, if you are going to teach the orchestrator about vhost-user-fs and its special needs, just teach it when to migrate and where not to migrate. Either we describe the special situation to qemu and let qemu make an intelligent decision whether to allow migration, or we trust the orchestrator. And if it's the latter, then 'migrate' already says orchestrator decided to migrate. > > > > > > > > > > > ## > > > > > # @MigrationCapabilityStatus: > > > > > -- > > > > > 2.34.1
On 22/01/2023 10:16, Michael S. Tsirkin wrote: > On Fri, Jan 20, 2023 at 07:37:18PM +0200, Anton Kuchin wrote: >> On 20/01/2023 15:58, Michael S. Tsirkin wrote: >>> On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote: >>>> On 19/01/2023 14:51, Michael S. Tsirkin wrote: >>>>> On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: >>>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents >>>>>> qemu update without stopping the VM. In most cases that makes sense >>>>>> because qemu has no way to transfer FUSE session state. >>>>>> >>>>>> But we can give an option to orchestrator to override this if it can >>>>>> guarantee that state will be preserved (e.g. it uses migration to >>>>>> update qemu and dst will run on the same host as src and use the same >>>>>> socket endpoints). >>>>>> >>>>>> This patch keeps default behavior that prevents migration with such devices >>>>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration. >>>>>> >>>>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> >>>>>> --- >>>>>> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- >>>>>> qapi/migration.json | 7 ++++++- >>>>>> 2 files changed, 30 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c >>>>>> index f5049735ac..13d920423e 100644 >>>>>> --- a/hw/virtio/vhost-user-fs.c >>>>>> +++ b/hw/virtio/vhost-user-fs.c >>>>>> @@ -24,6 +24,7 @@ >>>>>> #include "hw/virtio/vhost-user-fs.h" >>>>>> #include "monitor/monitor.h" >>>>>> #include "sysemu/sysemu.h" >>>>>> +#include "migration/migration.h" >>>>>> static const int user_feature_bits[] = { >>>>>> VIRTIO_F_VERSION_1, >>>>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev) >>>>>> return &fs->vhost_dev; >>>>>> } >>>>>> +static int vhost_user_fs_pre_save(void *opaque) >>>>>> +{ >>>>>> + MigrationState *s = migrate_get_current(); >>>>>> + >>>>>> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { >>>>>> + error_report("Migration of vhost-user-fs devices requires internal FUSE " >>>>>> + "state of backend to be preserved. If orchestrator can " >>>>>> + "guarantee this (e.g. dst connects to the same backend " >>>>>> + "instance or backend state is migrated) set 'vhost-user-fs' " >>>>>> + "migration capability to true to enable migration."); >>>>> Isn't it possible that some backends are same and some are not? >>>>> Shouldn't this be a device property then? >>>> If some are not the same it is not guaranteed that correct FUSE >>>> state is present there, so orchestrator shouldn't set the capability >>>> because this can result in destination devices being broken (they'll >>>> be fine after the remount in guest, but this is guest visible and is >>>> not acceptable). >>>> >>>> I can imagine smart orchestrator and backend that can transfer >>>> internal FUSE state, but we are not there yet, and this would be >>>> their responsibility then to ensure endpoint compatibility between src >>>> and dst and set the capability (that's why I put "e.g." and "or" in >>>> the error description). >>> So instead of relying on the orchestrator how about making it a device >>> property? 
>> We have to rely on the orchestrator here and I can't see how a property >> can help us with this: same device can be migratable or not depending >> on if the destination is the same host, what features backend supports, >> how management software works and other factors of environment that >> are not accessible from qemu or backend daemon. >> So in the end we'll end up with orchestrator having to setup flags for >> each device before each migration based on information only it can >> have - in my opinion this is worse than just giving the orchestrator >> a single flag that it can set after calculating the decision for >> the particular migration that it organizes. >> >>> >>>>> >>>>>> + return -1; >>>>>> + } >>>>>> + >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> static const VMStateDescription vuf_vmstate = { >>>>>> .name = "vhost-user-fs", >>>>>> - .unmigratable = 1, >>>>>> + .minimum_version_id = 0, >>>>>> + .version_id = 0, >>>>>> + .fields = (VMStateField[]) { >>>>>> + VMSTATE_VIRTIO_DEVICE, >>>>>> + VMSTATE_END_OF_LIST() >>>>>> + }, >>>>>> + .pre_save = vhost_user_fs_pre_save, >>>>>> }; >>>>>> static Property vuf_properties[] = { >>>>>> diff --git a/qapi/migration.json b/qapi/migration.json >>>>>> index 88ecf86ac8..9a229ea884 100644 >>>>>> --- a/qapi/migration.json >>>>>> +++ b/qapi/migration.json >>>>>> @@ -477,6 +477,11 @@ >>>>>> # will be handled faster. This is a performance feature and >>>>>> # should not affect the correctness of postcopy migration. >>>>>> # (since 7.1) >>>>>> +# @vhost-user-fs: If enabled, the migration process will allow migration of >>>>>> +# vhost-user-fs devices, this should be enabled only when >>>>>> +# backend can preserve local FUSE state e.g. for qemu update >>>>>> +# when dst reconects to the same endpoints after migration. >>>>>> +# (since 8.0) >>>>>> # >>>>>> # Features: >>>>>> # @unstable: Members @x-colo and @x-ignore-shared are experimental. >>>>>> @@ -492,7 +497,7 @@ >>>>>> 'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate', >>>>>> { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] }, >>>>>> 'validate-uuid', 'background-snapshot', >>>>>> - 'zero-copy-send', 'postcopy-preempt'] } >>>>>> + 'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] } >>>>> I kind of dislike that it's such a specific flag. Is only vhost-user-fs >>>>> ever going to be affected? Any way to put it in a way that is more generic? >>>> Here I agree with you: I would prefer less narrow naming too. But I >>>> didn't manage to come up with one. Looks like many other vhost-user >>>> devices could benefit from this so maybe "vhost-user-stateless" or >>>> something like this would be better. >>>> I'm not sure that other types of devices could handle reconnect to >>>> the old endpoint as easy as vhost-user-fs, but anyway the support for >>>> this flag needs to be implemented for each device individually. >>>> What do you think? Any ideas would be appreciated. >>> Let's try to create a better description of when this flag should be >>> set. Then shorten it up to create the name. >> This flag should be set when qemu don't need to worry about any >> external state stored in vhost-user daemons during migration: >> don't fail migration, just pack generic virtio device states to >> migration stream and orchestrator guarantees that the rest of the >> state will be present at the destination to restore full context and >> continue running. > Sorry I still do not get it. So fundamentally, why do we need this property? 
> vhost-user-fs is not created by default that we'd then need opt-in to > the special "migrateable" case. > That's why I said it might make some sense as a device property as qemu > tracks whether device is unplugged for us. > > But as written, if you are going to teach the orchestrator about > vhost-user-fs and its special needs, just teach it when to migrate and > where not to migrate. > > Either we describe the special situation to qemu and let qemu > make an intelligent decision whether to allow migration, > or we trust the orchestrator. And if it's the latter, then 'migrate' > already says orchestrator decided to migrate. The problem I'm trying to solve is that most of vhost-user devices now block migration in qemu. And this makes sense since qemu can't extract and transfer backend daemon state. But this prevents us from updating qemu executable via local migration. So this flag is intended more as a safety check that says "I know what I'm doing". I agree that it is not really necessary if we trust the orchestrator to request migration only when the migration can be performed in a safe way. But changing the current behavior of vhost-user-fs from "always blocks migration" to "migrates partial state whenever orchestrator requests it" seems a little dangerous and can be misinterpreted as full support for migration in all cases. > >>>>>> ## >>>>>> # @MigrationCapabilityStatus: >>>>>> -- >>>>>> 2.34.1
On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: > > > This flag should be set when qemu don't need to worry about any > > > external state stored in vhost-user daemons during migration: > > > don't fail migration, just pack generic virtio device states to > > > migration stream and orchestrator guarantees that the rest of the > > > state will be present at the destination to restore full context and > > > continue running. > > Sorry I still do not get it. So fundamentally, why do we need this property? > > vhost-user-fs is not created by default that we'd then need opt-in to > > the special "migrateable" case. > > That's why I said it might make some sense as a device property as qemu > > tracks whether device is unplugged for us. > > > > But as written, if you are going to teach the orchestrator about > > vhost-user-fs and its special needs, just teach it when to migrate and > > where not to migrate. > > > > Either we describe the special situation to qemu and let qemu > > make an intelligent decision whether to allow migration, > > or we trust the orchestrator. And if it's the latter, then 'migrate' > > already says orchestrator decided to migrate. > > The problem I'm trying to solve is that most of vhost-user devices > now block migration in qemu. And this makes sense since qemu can't > extract and transfer backend daemon state. But this prevents us from > updating qemu executable via local migration. So this flag is > intended more as a safety check that says "I know what I'm doing". > > I agree that it is not really necessary if we trust the orchestrator > to request migration only when the migration can be performed in a > safe way. But changing the current behavior of vhost-user-fs from > "always blocks migration" to "migrates partial state whenever > orchestrator requests it" seems a little dangerous and can be > misinterpreted as full support for migration in all cases. It's not really different from block is it? orchestrator has to arrange for backend migration. I think we just assumed there's no use-case where this is practical for vhost-user-fs so we blocked it. But in any case it's orchestrator's responsibility.
On 22/01/2023 16:46, Michael S. Tsirkin wrote: > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: >>>> This flag should be set when qemu don't need to worry about any >>>> external state stored in vhost-user daemons during migration: >>>> don't fail migration, just pack generic virtio device states to >>>> migration stream and orchestrator guarantees that the rest of the >>>> state will be present at the destination to restore full context and >>>> continue running. >>> Sorry I still do not get it. So fundamentally, why do we need this property? >>> vhost-user-fs is not created by default that we'd then need opt-in to >>> the special "migrateable" case. >>> That's why I said it might make some sense as a device property as qemu >>> tracks whether device is unplugged for us. >>> >>> But as written, if you are going to teach the orchestrator about >>> vhost-user-fs and its special needs, just teach it when to migrate and >>> where not to migrate. >>> >>> Either we describe the special situation to qemu and let qemu >>> make an intelligent decision whether to allow migration, >>> or we trust the orchestrator. And if it's the latter, then 'migrate' >>> already says orchestrator decided to migrate. >> The problem I'm trying to solve is that most of vhost-user devices >> now block migration in qemu. And this makes sense since qemu can't >> extract and transfer backend daemon state. But this prevents us from >> updating qemu executable via local migration. So this flag is >> intended more as a safety check that says "I know what I'm doing". >> >> I agree that it is not really necessary if we trust the orchestrator >> to request migration only when the migration can be performed in a >> safe way. But changing the current behavior of vhost-user-fs from >> "always blocks migration" to "migrates partial state whenever >> orchestrator requests it" seems a little dangerous and can be >> misinterpreted as full support for migration in all cases. > It's not really different from block is it? orchestrator has to arrange > for backend migration. I think we just assumed there's no use-case where > this is practical for vhost-user-fs so we blocked it. > But in any case it's orchestrator's responsibility. Yes, you are right. So do you think we should just drop the blocker without adding a new flag?
On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote: > > On 22/01/2023 16:46, Michael S. Tsirkin wrote: > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: > > > > > This flag should be set when qemu don't need to worry about any > > > > > external state stored in vhost-user daemons during migration: > > > > > don't fail migration, just pack generic virtio device states to > > > > > migration stream and orchestrator guarantees that the rest of the > > > > > state will be present at the destination to restore full context and > > > > > continue running. > > > > Sorry I still do not get it. So fundamentally, why do we need this property? > > > > vhost-user-fs is not created by default that we'd then need opt-in to > > > > the special "migrateable" case. > > > > That's why I said it might make some sense as a device property as qemu > > > > tracks whether device is unplugged for us. > > > > > > > > But as written, if you are going to teach the orchestrator about > > > > vhost-user-fs and its special needs, just teach it when to migrate and > > > > where not to migrate. > > > > > > > > Either we describe the special situation to qemu and let qemu > > > > make an intelligent decision whether to allow migration, > > > > or we trust the orchestrator. And if it's the latter, then 'migrate' > > > > already says orchestrator decided to migrate. > > > The problem I'm trying to solve is that most of vhost-user devices > > > now block migration in qemu. And this makes sense since qemu can't > > > extract and transfer backend daemon state. But this prevents us from > > > updating qemu executable via local migration. So this flag is > > > intended more as a safety check that says "I know what I'm doing". > > > > > > I agree that it is not really necessary if we trust the orchestrator > > > to request migration only when the migration can be performed in a > > > safe way. But changing the current behavior of vhost-user-fs from > > > "always blocks migration" to "migrates partial state whenever > > > orchestrator requests it" seems a little dangerous and can be > > > misinterpreted as full support for migration in all cases. > > It's not really different from block is it? orchestrator has to arrange > > for backend migration. I think we just assumed there's no use-case where > > this is practical for vhost-user-fs so we blocked it. > > But in any case it's orchestrator's responsibility. > > Yes, you are right. So do you think we should just drop the blocker > without adding a new flag? I'd be inclined to. I am curious what do dgilbert and stefanha think though.
On Sun, 22 Jan 2023 at 11:18, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote: > > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote: > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: > > > > > > This flag should be set when qemu don't need to worry about any > > > > > > external state stored in vhost-user daemons during migration: > > > > > > don't fail migration, just pack generic virtio device states to > > > > > > migration stream and orchestrator guarantees that the rest of the > > > > > > state will be present at the destination to restore full context and > > > > > > continue running. > > > > > Sorry I still do not get it. So fundamentally, why do we need this property? > > > > > vhost-user-fs is not created by default that we'd then need opt-in to > > > > > the special "migrateable" case. > > > > > That's why I said it might make some sense as a device property as qemu > > > > > tracks whether device is unplugged for us. > > > > > > > > > > But as written, if you are going to teach the orchestrator about > > > > > vhost-user-fs and its special needs, just teach it when to migrate and > > > > > where not to migrate. > > > > > > > > > > Either we describe the special situation to qemu and let qemu > > > > > make an intelligent decision whether to allow migration, > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate' > > > > > already says orchestrator decided to migrate. > > > > The problem I'm trying to solve is that most of vhost-user devices > > > > now block migration in qemu. And this makes sense since qemu can't > > > > extract and transfer backend daemon state. But this prevents us from > > > > updating qemu executable via local migration. So this flag is > > > > intended more as a safety check that says "I know what I'm doing". > > > > > > > > I agree that it is not really necessary if we trust the orchestrator > > > > to request migration only when the migration can be performed in a > > > > safe way. But changing the current behavior of vhost-user-fs from > > > > "always blocks migration" to "migrates partial state whenever > > > > orchestrator requests it" seems a little dangerous and can be > > > > misinterpreted as full support for migration in all cases. > > > It's not really different from block is it? orchestrator has to arrange > > > for backend migration. I think we just assumed there's no use-case where > > > this is practical for vhost-user-fs so we blocked it. > > > But in any case it's orchestrator's responsibility. > > > > Yes, you are right. So do you think we should just drop the blocker > > without adding a new flag? > > I'd be inclined to. I am curious what do dgilbert and stefanha think though. If the migration blocker is removed, what happens when a user attempts to migrate with a management tool and/or vhost-user-fs server implementation that don't support migration? Anton: Can you explain how stateless migration will work on the vhost-user-fs back-end side? Is it reusing vhost-user reconnect functionality or introducing a new mode for stateless migration? I guess the vhost-user-fs back-end implementation is required to implement VHOST_F_LOG_ALL so dirty memory can be tracked and drain all in-flight requests when vrings are stopped? Stefan
On 23/01/2023 16:09, Stefan Hajnoczi wrote: > On Sun, 22 Jan 2023 at 11:18, Michael S. Tsirkin <mst@redhat.com> wrote: >> On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote: >>> On 22/01/2023 16:46, Michael S. Tsirkin wrote: >>>> On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: >>>>>>> This flag should be set when qemu don't need to worry about any >>>>>>> external state stored in vhost-user daemons during migration: >>>>>>> don't fail migration, just pack generic virtio device states to >>>>>>> migration stream and orchestrator guarantees that the rest of the >>>>>>> state will be present at the destination to restore full context and >>>>>>> continue running. >>>>>> Sorry I still do not get it. So fundamentally, why do we need this property? >>>>>> vhost-user-fs is not created by default that we'd then need opt-in to >>>>>> the special "migrateable" case. >>>>>> That's why I said it might make some sense as a device property as qemu >>>>>> tracks whether device is unplugged for us. >>>>>> >>>>>> But as written, if you are going to teach the orchestrator about >>>>>> vhost-user-fs and its special needs, just teach it when to migrate and >>>>>> where not to migrate. >>>>>> >>>>>> Either we describe the special situation to qemu and let qemu >>>>>> make an intelligent decision whether to allow migration, >>>>>> or we trust the orchestrator. And if it's the latter, then 'migrate' >>>>>> already says orchestrator decided to migrate. >>>>> The problem I'm trying to solve is that most of vhost-user devices >>>>> now block migration in qemu. And this makes sense since qemu can't >>>>> extract and transfer backend daemon state. But this prevents us from >>>>> updating qemu executable via local migration. So this flag is >>>>> intended more as a safety check that says "I know what I'm doing". >>>>> >>>>> I agree that it is not really necessary if we trust the orchestrator >>>>> to request migration only when the migration can be performed in a >>>>> safe way. But changing the current behavior of vhost-user-fs from >>>>> "always blocks migration" to "migrates partial state whenever >>>>> orchestrator requests it" seems a little dangerous and can be >>>>> misinterpreted as full support for migration in all cases. >>>> It's not really different from block is it? orchestrator has to arrange >>>> for backend migration. I think we just assumed there's no use-case where >>>> this is practical for vhost-user-fs so we blocked it. >>>> But in any case it's orchestrator's responsibility. >>> Yes, you are right. So do you think we should just drop the blocker >>> without adding a new flag? >> I'd be inclined to. I am curious what do dgilbert and stefanha think though. > If the migration blocker is removed, what happens when a user attempts > to migrate with a management tool and/or vhost-user-fs server > implementation that don't support migration? There will be no matching fuse-session in destination endpoint so all requests to this fs will fail until it is remounted from guest to send new FUSE_INIT message that does session setup. > > Anton: Can you explain how stateless migration will work on the > vhost-user-fs back-end side? Is it reusing vhost-user reconnect > functionality or introducing a new mode for stateless migration? I > guess the vhost-user-fs back-end implementation is required to > implement VHOST_F_LOG_ALL so dirty memory can be tracked and drain all > in-flight requests when vrings are stopped? It reuses existing vhost-user reconnect code to resubmit inflight requests. 
Sure, backend needs to support this feature - presence of required features is checked by generic vhost and vhost-user code during init and if something is missing migration blocker is assigned to the device (not a static one in vmstate that I remove in this patch, but other per-device kind of blocker). > > Stefan
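For readers unfamiliar with that reconnect path, here is a rough sketch of the inflight-region handling it relies on, loosely modeled on how vhost-user-blk does it. The vuf_setup_inflight() helper and the fs->inflight field are hypothetical; the vhost-user-fs device does not wire this up in mainline, only the generic vhost_dev_get_inflight()/vhost_dev_set_inflight() helpers exist.

    /*
     * Hypothetical sketch (names assumed, modeled on vhost-user-blk):
     * before (re)starting the device - after a backend reconnect or on the
     * migration destination - hand the backend an inflight descriptor
     * region so it can resubmit requests that were still in flight when
     * the vrings were stopped.
     */
    static int vuf_setup_inflight(VHostUserFS *fs, Error **errp)
    {
        int ret;

        if (!fs->inflight) {                       /* hypothetical field */
            fs->inflight = g_new0(struct vhost_inflight, 1);
            ret = vhost_dev_get_inflight(&fs->vhost_dev, fs->conf.queue_size,
                                         fs->inflight);
            if (ret < 0) {
                error_setg_errno(errp, -ret, "Failed to get inflight region");
                return ret;
            }
        }

        ret = vhost_dev_set_inflight(&fs->vhost_dev, fs->inflight);
        if (ret < 0) {
            error_setg_errno(errp, -ret, "Failed to set inflight region");
        }
        return ret;
    }

Whether that is sufficient for a handover to a different backend process, rather than a reconnect to the same one, is exactly the open question Stefan raises below.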
* Michael S. Tsirkin (mst@redhat.com) wrote: > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote: > > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote: > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: > > > > > > This flag should be set when qemu don't need to worry about any > > > > > > external state stored in vhost-user daemons during migration: > > > > > > don't fail migration, just pack generic virtio device states to > > > > > > migration stream and orchestrator guarantees that the rest of the > > > > > > state will be present at the destination to restore full context and > > > > > > continue running. > > > > > Sorry I still do not get it. So fundamentally, why do we need this property? > > > > > vhost-user-fs is not created by default that we'd then need opt-in to > > > > > the special "migrateable" case. > > > > > That's why I said it might make some sense as a device property as qemu > > > > > tracks whether device is unplugged for us. > > > > > > > > > > But as written, if you are going to teach the orchestrator about > > > > > vhost-user-fs and its special needs, just teach it when to migrate and > > > > > where not to migrate. > > > > > > > > > > Either we describe the special situation to qemu and let qemu > > > > > make an intelligent decision whether to allow migration, > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate' > > > > > already says orchestrator decided to migrate. > > > > The problem I'm trying to solve is that most of vhost-user devices > > > > now block migration in qemu. And this makes sense since qemu can't > > > > extract and transfer backend daemon state. But this prevents us from > > > > updating qemu executable via local migration. So this flag is > > > > intended more as a safety check that says "I know what I'm doing". > > > > > > > > I agree that it is not really necessary if we trust the orchestrator > > > > to request migration only when the migration can be performed in a > > > > safe way. But changing the current behavior of vhost-user-fs from > > > > "always blocks migration" to "migrates partial state whenever > > > > orchestrator requests it" seems a little dangerous and can be > > > > misinterpreted as full support for migration in all cases. > > > It's not really different from block is it? orchestrator has to arrange > > > for backend migration. I think we just assumed there's no use-case where > > > this is practical for vhost-user-fs so we blocked it. > > > But in any case it's orchestrator's responsibility. > > > > Yes, you are right. So do you think we should just drop the blocker > > without adding a new flag? > > I'd be inclined to. I am curious what do dgilbert and stefanha think though. Yes I think that's probably OK, as long as we use the flag for knowing how to handle the discard bitmap as a proxy for the daemon knowing how to handle *some* migrations; knowing which migrations is then the job for the orchestrator to be careful of. Dave > -- > MST >
On Mon, Jan 23, 2023 at 05:52:17PM +0200, Anton Kuchin wrote: > > On 23/01/2023 16:09, Stefan Hajnoczi wrote: > > On Sun, 22 Jan 2023 at 11:18, Michael S. Tsirkin <mst@redhat.com> wrote: > > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote: > > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote: > > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: > > > > > > > > This flag should be set when qemu don't need to worry about any > > > > > > > > external state stored in vhost-user daemons during migration: > > > > > > > > don't fail migration, just pack generic virtio device states to > > > > > > > > migration stream and orchestrator guarantees that the rest of the > > > > > > > > state will be present at the destination to restore full context and > > > > > > > > continue running. > > > > > > > Sorry I still do not get it. So fundamentally, why do we need this property? > > > > > > > vhost-user-fs is not created by default that we'd then need opt-in to > > > > > > > the special "migrateable" case. > > > > > > > That's why I said it might make some sense as a device property as qemu > > > > > > > tracks whether device is unplugged for us. > > > > > > > > > > > > > > But as written, if you are going to teach the orchestrator about > > > > > > > vhost-user-fs and its special needs, just teach it when to migrate and > > > > > > > where not to migrate. > > > > > > > > > > > > > > Either we describe the special situation to qemu and let qemu > > > > > > > make an intelligent decision whether to allow migration, > > > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate' > > > > > > > already says orchestrator decided to migrate. > > > > > > The problem I'm trying to solve is that most of vhost-user devices > > > > > > now block migration in qemu. And this makes sense since qemu can't > > > > > > extract and transfer backend daemon state. But this prevents us from > > > > > > updating qemu executable via local migration. So this flag is > > > > > > intended more as a safety check that says "I know what I'm doing". > > > > > > > > > > > > I agree that it is not really necessary if we trust the orchestrator > > > > > > to request migration only when the migration can be performed in a > > > > > > safe way. But changing the current behavior of vhost-user-fs from > > > > > > "always blocks migration" to "migrates partial state whenever > > > > > > orchestrator requests it" seems a little dangerous and can be > > > > > > misinterpreted as full support for migration in all cases. > > > > > It's not really different from block is it? orchestrator has to arrange > > > > > for backend migration. I think we just assumed there's no use-case where > > > > > this is practical for vhost-user-fs so we blocked it. > > > > > But in any case it's orchestrator's responsibility. > > > > Yes, you are right. So do you think we should just drop the blocker > > > > without adding a new flag? > > > I'd be inclined to. I am curious what do dgilbert and stefanha think though. > > If the migration blocker is removed, what happens when a user attempts > > to migrate with a management tool and/or vhost-user-fs server > > implementation that don't support migration? > > There will be no matching fuse-session in destination endpoint so all > requests to this fs will fail until it is remounted from guest to > send new FUSE_INIT message that does session setup. The point of the migration blocker is to prevent breaking running guests. 
Situations where a migration completes but results in a broken guest are problematic for users (especially when they are not logged in to guests and able to fix them interactively). If a command-line option is set to override the blocker, that's fine. But there needs to be a blocker by default if external knowledge is required to decide whether or not it's safe to migrate. > > > > Anton: Can you explain how stateless migration will work on the > > vhost-user-fs back-end side? Is it reusing vhost-user reconnect > > functionality or introducing a new mode for stateless migration? I > > guess the vhost-user-fs back-end implementation is required to > > implement VHOST_F_LOG_ALL so dirty memory can be tracked and drain all > > in-flight requests when vrings are stopped? > > It reuses existing vhost-user reconnect code to resubmit inflight > requests. > Sure, backend needs to support this feature - presence of required > features is checked by generic vhost and vhost-user code during init > and if something is missing migration blocker is assigned to the > device (not a static one in vmstate that I remove in this patch, but > other per-device kind of blocker). This is not enough detail. Please post the QEMU patches before we commit to a user-visible vhost-user-fs command-line parameter. I think what you're trying is a new approach that can be made to work. However, both vhost-user and migration are fragile and you have not explained how it will work. I don't have confidence in merging this incrementally because I'm afraid of committing to user-visible or vhost-user protocol behavior that turns out to be broken just a little while later. The kind of detail I was hoping to hear was, for example, how vhost_user_blk_device_realize() blocks and tries to reconnect 3 times. Does this approach work for stateless migration? The destination QEMU is launched before the source QEMU disconnects from the vhost-user UNIX domain socket, so I guess the destination QEMU cannot connect in the current version of vhost-user reconnect as implemented by QEMU's vhost-user-blk device. Have you come up with a new handover protocol? Stefan
On Mon, Jan 23, 2023 at 06:27:23PM +0000, Dr. David Alan Gilbert wrote: > * Michael S. Tsirkin (mst@redhat.com) wrote: > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote: > > > > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote: > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: > > > > > > > This flag should be set when qemu don't need to worry about any > > > > > > > external state stored in vhost-user daemons during migration: > > > > > > > don't fail migration, just pack generic virtio device states to > > > > > > > migration stream and orchestrator guarantees that the rest of the > > > > > > > state will be present at the destination to restore full context and > > > > > > > continue running. > > > > > > Sorry I still do not get it. So fundamentally, why do we need this property? > > > > > > vhost-user-fs is not created by default that we'd then need opt-in to > > > > > > the special "migrateable" case. > > > > > > That's why I said it might make some sense as a device property as qemu > > > > > > tracks whether device is unplugged for us. > > > > > > > > > > > > But as written, if you are going to teach the orchestrator about > > > > > > vhost-user-fs and its special needs, just teach it when to migrate and > > > > > > where not to migrate. > > > > > > > > > > > > Either we describe the special situation to qemu and let qemu > > > > > > make an intelligent decision whether to allow migration, > > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate' > > > > > > already says orchestrator decided to migrate. > > > > > The problem I'm trying to solve is that most of vhost-user devices > > > > > now block migration in qemu. And this makes sense since qemu can't > > > > > extract and transfer backend daemon state. But this prevents us from > > > > > updating qemu executable via local migration. So this flag is > > > > > intended more as a safety check that says "I know what I'm doing". > > > > > > > > > > I agree that it is not really necessary if we trust the orchestrator > > > > > to request migration only when the migration can be performed in a > > > > > safe way. But changing the current behavior of vhost-user-fs from > > > > > "always blocks migration" to "migrates partial state whenever > > > > > orchestrator requests it" seems a little dangerous and can be > > > > > misinterpreted as full support for migration in all cases. > > > > It's not really different from block is it? orchestrator has to arrange > > > > for backend migration. I think we just assumed there's no use-case where > > > > this is practical for vhost-user-fs so we blocked it. > > > > But in any case it's orchestrator's responsibility. > > > > > > Yes, you are right. So do you think we should just drop the blocker > > > without adding a new flag? > > > > I'd be inclined to. I am curious what do dgilbert and stefanha think though. > > Yes I think that's probably OK, as long as we use the flag for knowing > how to handle the discard bitmap as a proxy for the daemon knowing how > to handle *some* migrations; knowing which migrations is then the job > for the orchestrator to be careful of. I think the feature bit is not a good way to detect live migration support. vhost-user backends typically use libvhost-user, rust-vmm's vhost-user-backend crate, etc where this feature can be implemented for free. If the feature bit is advertized we don't know if the device implementation (net, blk, fs, etc) is aware of migration at all. Stefan
On Mon, Jan 23, 2023 at 02:49:39PM -0500, Stefan Hajnoczi wrote: > The point of the migration blocker is to prevent breaking running > guests. Situations where a migration completes but results in a broken > guest are problematic for users (especially when they are not logged in > to guests and able to fix them interactively). I thought it's the reverse - we block when we are sure it can't work. If we are not sure we should leave policy to orchestrators. You can always get into situations like this with stateful storage (as opposed to RO one). For example, naively scp the backend file then start migration. Will seem to work but corrupts the disk (I didn't try, for sure with raw but what about qcow2?) > If a command-line option is set to override the blocker, that's fine. > But there needs to be a blocker by default if external knowledge is > required to decide whether or not it's safe to migrate. If all the command line says is "I want migration to work" then that's more like shifting the blame than helping users. They just learn this one weird trick and it seems to work until it doesn't. Then we are like "we told you only to set this flag if you are sure" and they are like "well I was sure".
On Mon, Jan 23, 2023 at 04:00:50PM -0500, Michael S. Tsirkin wrote: > On Mon, Jan 23, 2023 at 02:49:39PM -0500, Stefan Hajnoczi wrote: > > The point of the migration blocker is to prevent breaking running > > guests. Situations where a migration completes but results in a broken > > guest are problematic for users (especially when they are not logged in > > to guests and able to fix them interactively). > > I thought it's the reverse - we block when we are sure it can't work. > If we are not sure we should leave policy to orchestrators. > > You can always get into situations like this with stateful storage (as > opposed to RO one). For example, naively scp the backend file then > start migration. Will seem to work but corrupts the disk (I didn't try, > for sure with raw but what about qcow2?) You're right that ultimately QEMU cannot verify that the destination will 100% work. Who knows if the destination is even a real QEMU or just a process that throws away the migration stream? :) > > If a command-line option is set to override the blocker, that's fine. > > But there needs to be a blocker by default if external knowledge is > > required to decide whether or not it's safe to migrate. > > If all the command line says is "I want migration to work" then > that's more like shifting the blame than helping users. > They just learn this one weird trick and it seems to work > until it doesn't. Then we are like "we told you only to set this > flag if you are sure" and they are like "well I was sure". What I'm getting at is that this is a breaking change. Previously the management tool didn't need to be aware of vhost-user-fs migration support. QEMU would reject migrations. We cannot start allowing them because management tools may depend on QEMU's migration blocker. If management tools need to be aware now then the safe way to introduce this is via a parameter that new management tools must explicitly pass to QEMU. Anton's same host migration case is valid but the majority of users migrate between hosts and that case is not supported yet. Most of the time vhost-user-fs migration won't work. Let's not break existing management tools and vhost-user-fs back-ends. Stefan
On Mon, 23 Jan 2023 at 14:54, Stefan Hajnoczi <stefanha@redhat.com> wrote: > > On Mon, Jan 23, 2023 at 06:27:23PM +0000, Dr. David Alan Gilbert wrote: > > * Michael S. Tsirkin (mst@redhat.com) wrote: > > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote: > > > > > > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote: > > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: > > > > > > > > This flag should be set when qemu don't need to worry about any > > > > > > > > external state stored in vhost-user daemons during migration: > > > > > > > > don't fail migration, just pack generic virtio device states to > > > > > > > > migration stream and orchestrator guarantees that the rest of the > > > > > > > > state will be present at the destination to restore full context and > > > > > > > > continue running. > > > > > > > Sorry I still do not get it. So fundamentally, why do we need this property? > > > > > > > vhost-user-fs is not created by default that we'd then need opt-in to > > > > > > > the special "migrateable" case. > > > > > > > That's why I said it might make some sense as a device property as qemu > > > > > > > tracks whether device is unplugged for us. > > > > > > > > > > > > > > But as written, if you are going to teach the orchestrator about > > > > > > > vhost-user-fs and its special needs, just teach it when to migrate and > > > > > > > where not to migrate. > > > > > > > > > > > > > > Either we describe the special situation to qemu and let qemu > > > > > > > make an intelligent decision whether to allow migration, > > > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate' > > > > > > > already says orchestrator decided to migrate. > > > > > > The problem I'm trying to solve is that most of vhost-user devices > > > > > > now block migration in qemu. And this makes sense since qemu can't > > > > > > extract and transfer backend daemon state. But this prevents us from > > > > > > updating qemu executable via local migration. So this flag is > > > > > > intended more as a safety check that says "I know what I'm doing". > > > > > > > > > > > > I agree that it is not really necessary if we trust the orchestrator > > > > > > to request migration only when the migration can be performed in a > > > > > > safe way. But changing the current behavior of vhost-user-fs from > > > > > > "always blocks migration" to "migrates partial state whenever > > > > > > orchestrator requests it" seems a little dangerous and can be > > > > > > misinterpreted as full support for migration in all cases. > > > > > It's not really different from block is it? orchestrator has to arrange > > > > > for backend migration. I think we just assumed there's no use-case where > > > > > this is practical for vhost-user-fs so we blocked it. > > > > > But in any case it's orchestrator's responsibility. > > > > > > > > Yes, you are right. So do you think we should just drop the blocker > > > > without adding a new flag? > > > > > > I'd be inclined to. I am curious what do dgilbert and stefanha think though. > > > > Yes I think that's probably OK, as long as we use the flag for knowing > > how to handle the discard bitmap as a proxy for the daemon knowing how > > to handle *some* migrations; knowing which migrations is then the job > > for the orchestrator to be careful of. > > I think the feature bit is not a good way to detect live migration > support. 
vhost-user backends typically use libvhost-user, rust-vmm's > vhost-user-backend crate, etc where this feature can be implemented for > free. If the feature bit is advertized we don't know if the device > implementation (net, blk, fs, etc) is aware of migration at all. I checked how bad the situation is. libvhost-user currently enables LOG_ALL by default. :( So I don't think the front-end can use LOG_ALL alone to determine whether or not migration is supported by the back-end. There are several existing back-ends based on libvhost-user that have no concept of reconnection or migration but report the LOG_ALL feature bit. Stefan
* Stefan Hajnoczi (stefanha@gmail.com) wrote: > On Mon, 23 Jan 2023 at 14:54, Stefan Hajnoczi <stefanha@redhat.com> wrote: > > > > On Mon, Jan 23, 2023 at 06:27:23PM +0000, Dr. David Alan Gilbert wrote: > > > * Michael S. Tsirkin (mst@redhat.com) wrote: > > > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote: > > > > > > > > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote: > > > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: > > > > > > > > > This flag should be set when qemu don't need to worry about any > > > > > > > > > external state stored in vhost-user daemons during migration: > > > > > > > > > don't fail migration, just pack generic virtio device states to > > > > > > > > > migration stream and orchestrator guarantees that the rest of the > > > > > > > > > state will be present at the destination to restore full context and > > > > > > > > > continue running. > > > > > > > > Sorry I still do not get it. So fundamentally, why do we need this property? > > > > > > > > vhost-user-fs is not created by default that we'd then need opt-in to > > > > > > > > the special "migrateable" case. > > > > > > > > That's why I said it might make some sense as a device property as qemu > > > > > > > > tracks whether device is unplugged for us. > > > > > > > > > > > > > > > > But as written, if you are going to teach the orchestrator about > > > > > > > > vhost-user-fs and its special needs, just teach it when to migrate and > > > > > > > > where not to migrate. > > > > > > > > > > > > > > > > Either we describe the special situation to qemu and let qemu > > > > > > > > make an intelligent decision whether to allow migration, > > > > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate' > > > > > > > > already says orchestrator decided to migrate. > > > > > > > The problem I'm trying to solve is that most of vhost-user devices > > > > > > > now block migration in qemu. And this makes sense since qemu can't > > > > > > > extract and transfer backend daemon state. But this prevents us from > > > > > > > updating qemu executable via local migration. So this flag is > > > > > > > intended more as a safety check that says "I know what I'm doing". > > > > > > > > > > > > > > I agree that it is not really necessary if we trust the orchestrator > > > > > > > to request migration only when the migration can be performed in a > > > > > > > safe way. But changing the current behavior of vhost-user-fs from > > > > > > > "always blocks migration" to "migrates partial state whenever > > > > > > > orchestrator requests it" seems a little dangerous and can be > > > > > > > misinterpreted as full support for migration in all cases. > > > > > > It's not really different from block is it? orchestrator has to arrange > > > > > > for backend migration. I think we just assumed there's no use-case where > > > > > > this is practical for vhost-user-fs so we blocked it. > > > > > > But in any case it's orchestrator's responsibility. > > > > > > > > > > Yes, you are right. So do you think we should just drop the blocker > > > > > without adding a new flag? > > > > > > > > I'd be inclined to. I am curious what do dgilbert and stefanha think though. > > > > > > Yes I think that's probably OK, as long as we use the flag for knowing > > > how to handle the discard bitmap as a proxy for the daemon knowing how > > > to handle *some* migrations; knowing which migrations is then the job > > > for the orchestrator to be careful of. 
> > > > I think the feature bit is not a good way to detect live migration > > support. vhost-user backends typically use libvhost-user, rust-vmm's > > vhost-user-backend crate, etc where this feature can be implemented for > > free. If the feature bit is advertized we don't know if the device > > implementation (net, blk, fs, etc) is aware of migration at all. > > I checked how bad the situation is. libvhost-user currently enables > LOG_ALL by default. :( > > So I don't think the front-end can use LOG_ALL alone to determine > whether or not migration is supported by the back-end. > > There are several existing back-ends based on libvhost-user that have > no concept of reconnection or migration but report the LOG_ALL feature > bit. Ouch, yes that's messy. Going back to the original question; I don't think a command line flag will work though, because even for a given VM there's the possibility of some (local) migrations working but other (remote) migrations not working; so you don't know at the point you start the VM whether your migrations are going to work. Dave > Stefan >
On Tue, Jan 24, 2023, 04:50 Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > * Stefan Hajnoczi (stefanha@gmail.com) wrote: > > On Mon, 23 Jan 2023 at 14:54, Stefan Hajnoczi <stefanha@redhat.com> > wrote: > > > > > > On Mon, Jan 23, 2023 at 06:27:23PM +0000, Dr. David Alan Gilbert wrote: > > > > * Michael S. Tsirkin (mst@redhat.com) wrote: > > > > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote: > > > > > > > > > > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote: > > > > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote: > > > > > > > > > > This flag should be set when qemu don't need to worry > about any > > > > > > > > > > external state stored in vhost-user daemons during > migration: > > > > > > > > > > don't fail migration, just pack generic virtio device > states to > > > > > > > > > > migration stream and orchestrator guarantees that the > rest of the > > > > > > > > > > state will be present at the destination to restore full > context and > > > > > > > > > > continue running. > > > > > > > > > Sorry I still do not get it. So fundamentally, why do we > need this property? > > > > > > > > > vhost-user-fs is not created by default that we'd then > need opt-in to > > > > > > > > > the special "migrateable" case. > > > > > > > > > That's why I said it might make some sense as a device > property as qemu > > > > > > > > > tracks whether device is unplugged for us. > > > > > > > > > > > > > > > > > > But as written, if you are going to teach the orchestrator > about > > > > > > > > > vhost-user-fs and its special needs, just teach it when to > migrate and > > > > > > > > > where not to migrate. > > > > > > > > > > > > > > > > > > Either we describe the special situation to qemu and let > qemu > > > > > > > > > make an intelligent decision whether to allow migration, > > > > > > > > > or we trust the orchestrator. And if it's the latter, then > 'migrate' > > > > > > > > > already says orchestrator decided to migrate. > > > > > > > > The problem I'm trying to solve is that most of vhost-user > devices > > > > > > > > now block migration in qemu. And this makes sense since qemu > can't > > > > > > > > extract and transfer backend daemon state. But this prevents > us from > > > > > > > > updating qemu executable via local migration. So this flag is > > > > > > > > intended more as a safety check that says "I know what I'm > doing". > > > > > > > > > > > > > > > > I agree that it is not really necessary if we trust the > orchestrator > > > > > > > > to request migration only when the migration can be > performed in a > > > > > > > > safe way. But changing the current behavior of vhost-user-fs > from > > > > > > > > "always blocks migration" to "migrates partial state whenever > > > > > > > > orchestrator requests it" seems a little dangerous and can > be > > > > > > > > misinterpreted as full support for migration in all cases. > > > > > > > It's not really different from block is it? orchestrator has > to arrange > > > > > > > for backend migration. I think we just assumed there's no > use-case where > > > > > > > this is practical for vhost-user-fs so we blocked it. > > > > > > > But in any case it's orchestrator's responsibility. > > > > > > > > > > > > Yes, you are right. So do you think we should just drop the > blocker > > > > > > without adding a new flag? > > > > > > > > > > I'd be inclined to. I am curious what do dgilbert and stefanha > think though. 
> > > > > > > > Yes I think that's probably OK, as long as we use the flag for > knowing > > > > how to handle the discard bitmap as a proxy for the daemon knowing > how > > > > to handle *some* migrations; knowing which migrations is then the job > > > > for the orchestrator to be careful of. > > > > > > I think the feature bit is not a good way to detect live migration > > > support. vhost-user backends typically use libvhost-user, rust-vmm's > > > vhost-user-backend crate, etc where this feature can be implemented for > > > free. If the feature bit is advertized we don't know if the device > > > implementation (net, blk, fs, etc) is aware of migration at all. > > > > I checked how bad the situation is. libvhost-user currently enables > > LOG_ALL by default. :( > > > > So I don't think the front-end can use LOG_ALL alone to determine > > whether or not migration is supported by the back-end. > > > > There are several existing back-ends based on libvhost-user that have > > no concept of reconnection or migration but report the LOG_ALL feature > > bit. > > Ouch, yes that's messy. > > Going back to the original question; I don't think a command line flag > will work though, because even for a given VM there's the possibility > of some (local) migrations working but other (remote) migrations not > working; so you don't know at the point you start the VM whether > your migrations are going to work. > The user or management tool should know which types of migration a vhost-user-fs backend supports. That can be passed in as a per-device parameter. Then a migration parameter can be used to distinguish between same host and remote host migration? QEMU already distinguishes between pre-copy and post-copy migration, so this can be thought of as yet another type of migration. Stefan >
On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: > Now any vhost-user-fs device makes VM unmigratable, that also prevents > qemu update without stopping the VM. In most cases that makes sense > because qemu has no way to transfer FUSE session state. > > But we can give an option to orchestrator to override this if it can > guarantee that state will be preserved (e.g. it uses migration to > update qemu and dst will run on the same host as src and use the same > socket endpoints). > > This patch keeps default behavior that prevents migration with such devices > but adds migration capability 'vhost-user-fs' to explicitly allow migration. > > Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> > --- > hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > qapi/migration.json | 7 ++++++- > 2 files changed, 30 insertions(+), 2 deletions(-) Hi Anton, Sorry for holding up your work with the discussions that we had. I still feel it's important to agree on command-line and/or vhost-user protocol changes that will be able to support non-migratable, stateless migration/reconnect, and stateful migration vhost-user-fs back-ends. All three will exist. As a next step, could you share your code that implements the QEMU side of stateless migration? I think that will make it clearer whether a command-line option (migration capability or per-device) is sufficient or whether the vhost-user protocol needs to be extended. If the vhost-user protocol is extended then maybe no user-visible changes are necessary. QEMU will know if the vhost-user-fs backend supports migration and which type of migration. It can block migration in cases where it's not possible. Thanks, Stefan
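[A rough sketch of the protocol extension Stefan suggests, on the QEMU side only. The feature-bit name, its value and the per-device migration_blocker field are all invented for illustration -- no such bit exists in the vhost-user specification at this point. It assumes "hw/virtio/vhost-user-fs.h", "migration/blocker.h" and "qapi/error.h".]

#define VHOST_USER_PROTOCOL_F_STATELESS_MIGRATION 42  /* invented bit */

static void example_vuf_check_backend(VHostUserFS *fs, Error **errp)
{
    struct vhost_dev *hdev = &fs->vhost_dev;

    if (!virtio_has_feature(hdev->protocol_features,
                            VHOST_USER_PROTOCOL_F_STATELESS_MIGRATION)) {
        /* 'migration_blocker' is an invented per-device field here. */
        error_setg(&fs->migration_blocker,
                   "vhost-user-fs backend does not advertise support for "
                   "stateless migration");
        if (migrate_add_blocker(fs->migration_blocker, errp) < 0) {
            error_free(fs->migration_blocker);
            fs->migration_blocker = NULL;
        }
    }
}

[With something like this, QEMU could decide per device at realize time whether migration is possible, and no new user-visible knob would be needed.]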
On 25/01/2023 21:46, Stefan Hajnoczi wrote: > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: >> Now any vhost-user-fs device makes VM unmigratable, that also prevents >> qemu update without stopping the VM. In most cases that makes sense >> because qemu has no way to transfer FUSE session state. >> >> But we can give an option to orchestrator to override this if it can >> guarantee that state will be preserved (e.g. it uses migration to >> update qemu and dst will run on the same host as src and use the same >> socket endpoints). >> >> This patch keeps default behavior that prevents migration with such devices >> but adds migration capability 'vhost-user-fs' to explicitly allow migration. >> >> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> >> --- >> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- >> qapi/migration.json | 7 ++++++- >> 2 files changed, 30 insertions(+), 2 deletions(-) > Hi Anton, > Sorry for holding up your work with the discussions that we had. I still > feel it's important to agree on command-line and/or vhost-user protocol > changes that will be able to support non-migratable, stateless > migration/reconnect, and stateful migration vhost-user-fs back-ends. All > three will exist. > > As a next step, could you share your code that implements the QEMU side > of stateless migration? > > I think that will make it clearer whether a command-line option > (migration capability or per-device) is sufficient or whether the > vhost-user protocol needs to be extended. > > If the vhost-user protocol is extended then maybe no user-visible > changes are necessary. QEMU will know if the vhost-user-fs backend > supports migration and which type of migration. It can block migration > in cases where it's not possible. > > Thanks, > Stefan Thank you, Stefan, That's OK. The discussion is very helpful and showed me some parts that should to be checked to make sure no harm is done by this feature. I needed some time to step back, review my approach to this feature with all valuable feedback and ideas that were suggested and check what other backend implementations can or can't do. I'll answer today the emails with questions that were addressed to me. This is all the code that QEMU needs to support stateless migration. Do you mean backend and/or orchestrator parts? I'll think of how protocol extension can help us make this feature transparent to users.
On Thu, 26 Jan 2023 at 09:20, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > > On 25/01/2023 21:46, Stefan Hajnoczi wrote: > > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: > >> Now any vhost-user-fs device makes VM unmigratable, that also prevents > >> qemu update without stopping the VM. In most cases that makes sense > >> because qemu has no way to transfer FUSE session state. > >> > >> But we can give an option to orchestrator to override this if it can > >> guarantee that state will be preserved (e.g. it uses migration to > >> update qemu and dst will run on the same host as src and use the same > >> socket endpoints). > >> > >> This patch keeps default behavior that prevents migration with such devices > >> but adds migration capability 'vhost-user-fs' to explicitly allow migration. > >> > >> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> > >> --- > >> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- > >> qapi/migration.json | 7 ++++++- > >> 2 files changed, 30 insertions(+), 2 deletions(-) > > Hi Anton, > > Sorry for holding up your work with the discussions that we had. I still > > feel it's important to agree on command-line and/or vhost-user protocol > > changes that will be able to support non-migratable, stateless > > migration/reconnect, and stateful migration vhost-user-fs back-ends. All > > three will exist. > > > > As a next step, could you share your code that implements the QEMU side > > of stateless migration? > > > > I think that will make it clearer whether a command-line option > > (migration capability or per-device) is sufficient or whether the > > vhost-user protocol needs to be extended. > > > > If the vhost-user protocol is extended then maybe no user-visible > > changes are necessary. QEMU will know if the vhost-user-fs backend > > supports migration and which type of migration. It can block migration > > in cases where it's not possible. > > > > Thanks, > > Stefan > > > Thank you, Stefan, > > That's OK. The discussion is very helpful and showed me some parts > that should to be checked to make sure no harm is done by this feature. > I needed some time to step back, review my approach to this feature > with all valuable feedback and ideas that were suggested and check > what other backend implementations can or can't do. > I'll answer today the emails with questions that were addressed to me. > > This is all the code that QEMU needs to support stateless migration. > Do you mean backend and/or orchestrator parts? It's unclear to me how the destination QEMU is able to connect to the vhost-user back-end while the source QEMU is still connected? I thought additional QEMU changes would be necessary to make migration handover work. Stefan
On 26/01/2023 17:13, Stefan Hajnoczi wrote: > On Thu, 26 Jan 2023 at 09:20, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >> On 25/01/2023 21:46, Stefan Hajnoczi wrote: >>> On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote: >>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents >>>> qemu update without stopping the VM. In most cases that makes sense >>>> because qemu has no way to transfer FUSE session state. >>>> >>>> But we can give an option to orchestrator to override this if it can >>>> guarantee that state will be preserved (e.g. it uses migration to >>>> update qemu and dst will run on the same host as src and use the same >>>> socket endpoints). >>>> >>>> This patch keeps default behavior that prevents migration with such devices >>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration. >>>> >>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru> >>>> --- >>>> hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++- >>>> qapi/migration.json | 7 ++++++- >>>> 2 files changed, 30 insertions(+), 2 deletions(-) >>> Hi Anton, >>> Sorry for holding up your work with the discussions that we had. I still >>> feel it's important to agree on command-line and/or vhost-user protocol >>> changes that will be able to support non-migratable, stateless >>> migration/reconnect, and stateful migration vhost-user-fs back-ends. All >>> three will exist. >>> >>> As a next step, could you share your code that implements the QEMU side >>> of stateless migration? >>> >>> I think that will make it clearer whether a command-line option >>> (migration capability or per-device) is sufficient or whether the >>> vhost-user protocol needs to be extended. >>> >>> If the vhost-user protocol is extended then maybe no user-visible >>> changes are necessary. QEMU will know if the vhost-user-fs backend >>> supports migration and which type of migration. It can block migration >>> in cases where it's not possible. >>> >>> Thanks, >>> Stefan >> >> Thank you, Stefan, >> >> That's OK. The discussion is very helpful and showed me some parts >> that should to be checked to make sure no harm is done by this feature. >> I needed some time to step back, review my approach to this feature >> with all valuable feedback and ideas that were suggested and check >> what other backend implementations can or can't do. >> I'll answer today the emails with questions that were addressed to me. >> >> This is all the code that QEMU needs to support stateless migration. >> Do you mean backend and/or orchestrator parts? > It's unclear to me how the destination QEMU is able to connect to the > vhost-user back-end while the source QEMU is still connected? I > thought additional QEMU changes would be necessary to make migration > handover work. > > Stefan Oh, yes, that is exactly the part I'm checking right now: I was testing the scenario when vm is saved to file and then restored from file without host and destination running at the same time.
Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> On 19/01/2023 18:02, Stefan Hajnoczi wrote:
>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:

Hi

Sorry to come so late into the discussion.

>>>>>>> +static int vhost_user_fs_pre_save(void *opaque)
>>>>>>> +{
>>>>>>> +    MigrationState *s = migrate_get_current();
>>>>>>> +
>>>>>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
>>>>>>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
>>>>>>> +                     "state of backend to be preserved. If orchestrator can "
>>>>>>> +                     "guarantee this (e.g. dst connects to the same backend "
>>>>>>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
>>>>>>> +                     "migration capability to true to enable migration.");
>>>>>>> +        return -1;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>>  static const VMStateDescription vuf_vmstate = {
>>>>>>>      .name = "vhost-user-fs",
>>>>>>> -    .unmigratable = 1,
>>>>>>> +    .minimum_version_id = 0,
>>>>>>> +    .version_id = 0,
>>>>>>> +    .fields = (VMStateField[]) {
>>>>>>> +        VMSTATE_VIRTIO_DEVICE,
>>>>>>> +        VMSTATE_END_OF_LIST()
>>>>>>> +    },
>>>>>>> +    .pre_save = vhost_user_fs_pre_save,
>>>>>>>  };

I don't object to extending the vmstate this way.

But I object to the migration capability for several reasons:
- This feature has _nothing_ to do with migration; the problem, what it
  describes, etc. is related to vhost_user_fs.
- The number of migration capabilities is limited, and we add checks to
  see that they are valid, consistent, etc.; see
  qemu/migration/migration.c:migrate_caps_check()
- As Stefan says, we can have several vhost_user_fs devices, and each
  one should know if it can migrate or not.
- We have two options for the orchestrator:
  * it knows that all the vhost_user_fs devices can be migrated:
    then it can add a property to each vhost_user_fs device
  * it doesn't know it:
    then it is a good idea that we are not migrating this VM.

> I think we'd be better without a new marker because migration code
> has standard generic way of solving such puzzles that I described
> above. So adding new marker would go against existing practice.
> But if you could show me where I missed something I'll be grateful
> and will fix it to avoid potential problems.
> I'd also be happy to know the opinion of Dr. David Alan Gilbert.

If everybody agrees that any vhost_user_fs device is going to have a
virtio device, then I agree with you that the marker is not needed at
this point.

That said, I think that you are making your life harder in the future
when you add the other migratable devices.
I am assuming here that your "underlying device" is:

enum VhostUserFSType {
    VHOST_USER_NO_MIGRATABLE = 0,
    // The one we are doing here
    VHOST_USER_EXTERNAL_MIGRATABLE,
    // The one you describe for the future
    VHOST_USER_INTERNAL_MIGRATABLE,
};

struct VHostUserFS {
    /*< private >*/
    VirtIODevice parent;
    VHostUserFSConf conf;
    struct vhost_virtqueue *vhost_vqs;
    struct vhost_dev vhost_dev;
    VhostUserState vhost_user;
    VirtQueue **req_vqs;
    VirtQueue *hiprio_vq;
    int32_t bootindex;
    enum VhostUserFSType migration_type;
    /*< public >*/
};

static int vhost_user_fs_pre_save(void *opaque)
{
    VHostUserFS *s = opaque;

    if (s->migration_type == VHOST_USER_NO_MIGRATABLE) {
        error_report("Migration of vhost-user-fs devices requires internal FUSE "
                     "state of backend to be preserved. If orchestrator can "
                     "guarantee this (e.g. dst connects to the same backend "
                     "instance or backend state is migrated) set 'vhost-user-fs' "
                     "migration capability to true to enable migration.");
        return -1;
    }
    if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) {
        return 0;
    }
    if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) {
        error_report("still not implemented");
        return -1;
    }
    assert("we don't reach here");
}

Your initial VMStateDescription:

static const VMStateDescription vuf_vmstate = {
    .name = "vhost-user-fs",
    .unmigratable = 1,
    .minimum_version_id = 0,
    .version_id = 0,
    .fields = (VMStateField[]) {
        VMSTATE_INT8(migration_type, struct VHostUserFS),
        VMSTATE_VIRTIO_DEVICE,
        VMSTATE_END_OF_LIST()
    },
    .pre_save = vhost_user_fs_pre_save,
};

And later you change it to something like:

static bool vhost_fs_user_internal_state_needed(void *opaque)
{
    VHostUserFS *s = opaque;

    return s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE;
}

static const VMStateDescription vmstate_vhost_user_fs_internal_sub = {
    .name = "vhost-user-fs/internal",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = &vhost_fs_user_internal_state_needed,
    .fields = (VMStateField[]) {
        .... // Whatever
        VMSTATE_END_OF_LIST()
    }
};

static const VMStateDescription vuf_vmstate = {
    .name = "vhost-user-fs",
    .minimum_version_id = 0,
    .version_id = 0,
    .fields = (VMStateField[]) {
        VMSTATE_INT8(migration_type, struct VHostUserFS),
        VMSTATE_VIRTIO_DEVICE,
        VMSTATE_END_OF_LIST()
    },
    .pre_save = vhost_user_fs_pre_save,
    .subsections = (const VMStateDescription*[]) {
        &vmstate_vhost_user_fs_internal_sub,
        NULL
    }
};

And you are done.

I will propose to use a property to set migration_type, but I didn't
want to write the code right now.

I think that this proposal will make Stefan happy, and it is just
adding an extra uint8_t that is helpful to implement everything.

Later, Juan.

PD. One of the few things that Pascal got right and C got completely
wrong were Pascal variant records vs C unions. If you have a union,
it should be "required" that there is a field in the enclosing struct
that specifies which element of the union we have. This is exactly
that case.
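[For completeness, a minimal sketch of the property Juan mentions but deliberately leaves unwritten, assuming the migration_type field and the VhostUserFSType enum from the proposal above; the property name and default are assumptions, and the existing chardev/tag/queue properties are elided.]

#include "hw/qdev-properties.h"
#include "hw/virtio/vhost-user-fs.h"

static Property vuf_properties[] = {
    /* ... existing chardev/tag/queue properties stay as they are ... */
    DEFINE_PROP_UINT8("migration-type", VHostUserFS, migration_type,
                      VHOST_USER_NO_MIGRATABLE),
    DEFINE_PROP_END_OF_LIST(),
};

[The orchestrator could then pick the mode per device, e.g. with a hypothetical "-device vhost-user-fs-pci,...,migration-type=1". A nicer interface would be a proper QAPI enum property ("none"/"external"/"internal") instead of a raw number, but that needs more plumbing than this sketch shows.]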
Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, Jan 24, 2023, 04:50 Dr. David Alan Gilbert <dgilbert@redhat.com>
> wrote:

[...]

>> > I checked how bad the situation is. libvhost-user currently enables
>> > LOG_ALL by default. :(
>> >
>> > So I don't think the front-end can use LOG_ALL alone to determine
>> > whether or not migration is supported by the back-end.
>> >
>> > There are several existing back-ends based on libvhost-user that have
>> > no concept of reconnection or migration but report the LOG_ALL feature
>> > bit.
>>
>> Ouch, yes that's messy.
>>
>> Going back to the original question; I don't think a command line flag
>> will work though, because even for a given VM there's the possibility
>> of some (local) migrations working but other (remote) migrations not
>> working; so you don't know at the point you start the VM whether
>> your migrations are going to work.
>
> The user or management tool should know which types of migration a
> vhost-user-fs backend supports. That can be passed in as a per-device
> parameter.
>
> Then a migration parameter can be used to distinguish between same host and
> remote host migration? QEMU already distinguishes between pre-copy and
> post-copy migration, so this can be thought of as yet another type of
> migration.

I was going to suggest this (my previous answer was after reading only
the other part of the comments).

What we have here is that this device has "three" states:

- You can't migrate it to another host (now and the default behaviour).

- You can migrate some of the backends if you are migrating on the same
  host (note, we don't know directly that we are migrating inside the
  same host, so I would agree to add _that_ migration capability: it is
  related to migration, and it makes sense for migration code and
  devices to know that it is happening).

- In the future, perhaps, you can migrate "always" some vhost-user-fs
  devices; that would be a property in my opinion.

Later, Juan.
On 01/02/2023 16:26, Juan Quintela wrote: > Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >> On 19/01/2023 18:02, Stefan Hajnoczi wrote: >>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote: >>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote: >>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > Hi > > Sorry to come so late into the discussion. You are just in time, I'm working on v2 of this patch right now :-) > >>>>>>>> +static int vhost_user_fs_pre_save(void *opaque) >>>>>>>> +{ >>>>>>>> + MigrationState *s = migrate_get_current(); >>>>>>>> + >>>>>>>> + if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) { >>>>>>>> + error_report("Migration of vhost-user-fs devices requires internal FUSE " >>>>>>>> + "state of backend to be preserved. If orchestrator can " >>>>>>>> + "guarantee this (e.g. dst connects to the same backend " >>>>>>>> + "instance or backend state is migrated) set 'vhost-user-fs' " >>>>>>>> + "migration capability to true to enable migration."); >>>>>>>> + return -1; >>>>>>>> + } >>>>>>>> + >>>>>>>> + return 0; >>>>>>>> +} >>>>>>>> + >>>>>>>> static const VMStateDescription vuf_vmstate = { >>>>>>>> .name = "vhost-user-fs", >>>>>>>> - .unmigratable = 1, >>>>>>>> + .minimum_version_id = 0, >>>>>>>> + .version_id = 0, >>>>>>>> + .fields = (VMStateField[]) { >>>>>>>> + VMSTATE_VIRTIO_DEVICE, >>>>>>>> + VMSTATE_END_OF_LIST() >>>>>>>> + }, >>>>>>>> + .pre_save = vhost_user_fs_pre_save, >>>>>>>> }; > I don't object to extend the vmstate this way. > > But I object to the migration capability for several reasons: > - This feature has _nothing_ to do with migration, the problem, what it > describes, etc is related to vhost_user_fs. > - The number of migration capabilities is limited > And we add checks to see if they are valid, consistent, etc > see qemu/migration/migration.c:migrate_caps_check() > - As Stefan says, we can have several vhost_user_fs devices, and each > one should know if it can migrate or not. > - We have to options for the orchestator: > * it knows that all the vhost_user_fs devices can be migration > Then it can add a property to each vhost_user_fs device > * it don't know it > Then it is a good idea that we are not migrating this VM. > >> I think we'd be better without a new marker because migration code >> has standard generic way of solving such puzzles that I described >> above. So adding new marker would go against existing practice. >> But if you could show me where I missed something I'll be grateful >> and will fix it to avoid potential problems. >> I'd also be happy to know the opinion of Dr. David Alan Gilbert. > If everybody agrees that any vhost_user_fs device is going to have a > virtio device, then I agree with you that the marker is not needed at > this point. > > Once told that, I think that you are making your live harder in the > future when you add the other migratable devices. 
> > I am assuming here that your "underlying device" is: > > enum VhostUserFSType { > VHOST_USER_NO_MIGRATABLE = 0, > // The one we are doing here > VHOST_USER_EXTERNAL_MIGRATABLE, > // The one you describe for the future > VHOST_USER_INTERNAL_MIGRATABLE, > }; > > struct VHostUserFS { > /*< private >*/ > VirtIODevice parent; > VHostUserFSConf conf; > struct vhost_virtqueue *vhost_vqs; > struct vhost_dev vhost_dev; > VhostUserState vhost_user; > VirtQueue **req_vqs; > VirtQueue *hiprio_vq; > int32_t bootindex; > enum migration_type; > /*< public >*/ > }; > > > static int vhost_user_fs_pre_save(void *opaque) > { > VHostUserFS *s = opaque; > > if (s->migration_type == VHOST_USER_NO_MIGRATABLE)) { > error_report("Migration of vhost-user-fs devices requires internal FUSE " > "state of backend to be preserved. If orchestrator can " > "guarantee this (e.g. dst connects to the same backend " > "instance or backend state is migrated) set 'vhost-user-fs' " > "migration capability to true to enable migration."); > return -1; > } > if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) { > return 0; > } > if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) { > error_report("still not implemented"); > return -1; > } > assert("we don't reach here"); > } > > Your initial vmstateDescription > > static const VMStateDescription vuf_vmstate = { > .name = "vhost-user-fs", > .unmigratable = 1, > .minimum_version_id = 0, > .version_id = 0, > .fields = (VMStateField[]) { > VMSTATE_INT8(migration_type, struct VHostUserFS), > VMSTATE_VIRTIO_DEVICE, > VMSTATE_END_OF_LIST() > }, > .pre_save = vhost_user_fs_pre_save, > }; > > And later you change it to something like: > > static bool vhost_fs_user_internal_state_needed(void *opaque) > { > VHostUserFS *s = opaque; > > return s->migration_type == VMOST_USER_INTERNAL_MIGRATABLE; > } > > static const VMStateDescription vmstate_vhost_user_fs_internal_sub = { > .name = "vhost-user-fs/internal", > .version_id = 1, > .minimum_version_id = 1, > .needed = &vhost_fs_user_internal_state_needed, > .fields = (VMStateField[]) { > .... // Whatever > VMSTATE_END_OF_LIST() > } > }; > > static const VMStateDescription vuf_vmstate = { > .name = "vhost-user-fs", > .minimum_version_id = 0, > .version_id = 0, > .fields = (VMStateField[]) { > VMSTATE_INT8(migration_type, struct VHostUserFS), > VMSTATE_VIRTIO_DEVICE, > VMSTATE_END_OF_LIST() > }, > .pre_save = vhost_user_fs_pre_save, > .subsections = (const VMStateDescription*[]) { > &vmstate_vhost_user_fs_internal_sub, > NULL > } > }; > > And you are done. > > I will propose to use a property to set migration_type, but I didn't > want to write the code right now. > > I think that this proposal will make Stephan happy, and it is just > adding and extra uint8_t that is helpul to implement everything. That is exactly the approach I'm trying to implement it right now. Single flag can be convenient for orchestrator but makes it too hard to account in all cases for all devices on qemu side without breaking future compatibility. So I'm rewriting it with properties. BTW do you think each vhost-user device should have its own enum of migration types or maybe we could make them common for all device types? > > Later, Juan. > > PD. One of the few things that Pascal got right and C got completely > wrong were pascal variant registers vs C union's. If you have a > union, if should be "required" that there is a field in the > enclosing struct that specifies what element of the union we have. > This is exactly that case. >
Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > On 01/02/2023 16:26, Juan Quintela wrote: >> Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>> On 19/01/2023 18:02, Stefan Hajnoczi wrote: >>>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote: >>>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote: >>>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >> Once told that, I think that you are making your live harder in the >> future when you add the other migratable devices. >> >> I am assuming here that your "underlying device" is: >> >> enum VhostUserFSType { >> VHOST_USER_NO_MIGRATABLE = 0, >> // The one we are doing here >> VHOST_USER_EXTERNAL_MIGRATABLE, >> // The one you describe for the future >> VHOST_USER_INTERNAL_MIGRATABLE, >> }; >> >> struct VHostUserFS { >> /*< private >*/ >> VirtIODevice parent; >> VHostUserFSConf conf; >> struct vhost_virtqueue *vhost_vqs; >> struct vhost_dev vhost_dev; >> VhostUserState vhost_user; >> VirtQueue **req_vqs; >> VirtQueue *hiprio_vq; >> int32_t bootindex; >> enum migration_type; >> /*< public >*/ >> }; >> >> >> static int vhost_user_fs_pre_save(void *opaque) >> { >> VHostUserFS *s = opaque; >> >> if (s->migration_type == VHOST_USER_NO_MIGRATABLE)) { >> error_report("Migration of vhost-user-fs devices requires internal FUSE " >> "state of backend to be preserved. If orchestrator can " >> "guarantee this (e.g. dst connects to the same backend " >> "instance or backend state is migrated) set 'vhost-user-fs' " >> "migration capability to true to enable migration."); >> return -1; >> } >> if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) { >> return 0; >> } >> if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) { >> error_report("still not implemented"); >> return -1; >> } >> assert("we don't reach here"); >> } >> >> Your initial vmstateDescription >> >> static const VMStateDescription vuf_vmstate = { >> .name = "vhost-user-fs", >> .unmigratable = 1, >> .minimum_version_id = 0, >> .version_id = 0, >> .fields = (VMStateField[]) { >> VMSTATE_INT8(migration_type, struct VHostUserFS), >> VMSTATE_VIRTIO_DEVICE, >> VMSTATE_END_OF_LIST() >> }, >> .pre_save = vhost_user_fs_pre_save, >> }; >> >> And later you change it to something like: >> >> static bool vhost_fs_user_internal_state_needed(void *opaque) >> { >> VHostUserFS *s = opaque; >> >> return s->migration_type == VMOST_USER_INTERNAL_MIGRATABLE; >> } >> >> static const VMStateDescription vmstate_vhost_user_fs_internal_sub = { >> .name = "vhost-user-fs/internal", >> .version_id = 1, >> .minimum_version_id = 1, >> .needed = &vhost_fs_user_internal_state_needed, >> .fields = (VMStateField[]) { >> .... // Whatever >> VMSTATE_END_OF_LIST() >> } >> }; >> >> static const VMStateDescription vuf_vmstate = { >> .name = "vhost-user-fs", >> .minimum_version_id = 0, >> .version_id = 0, >> .fields = (VMStateField[]) { >> VMSTATE_INT8(migration_type, struct VHostUserFS), >> VMSTATE_VIRTIO_DEVICE, >> VMSTATE_END_OF_LIST() >> }, >> .pre_save = vhost_user_fs_pre_save, >> .subsections = (const VMStateDescription*[]) { >> &vmstate_vhost_user_fs_internal_sub, >> NULL >> } >> }; >> >> And you are done. >> >> I will propose to use a property to set migration_type, but I didn't >> want to write the code right now. 
>>
>> I think that this proposal will make Stephan happy, and it is just
>> adding and extra uint8_t that is helpul to implement everything.
> That is exactly the approach I'm trying to implement it right now. Single
> flag can be convenient for orchestrator but makes it too hard to account in
> all cases for all devices on qemu side without breaking future
> compatibility.
> So I'm rewriting it with properties.

Nice. That would be my proposal. Just a bit complicated for a proof of
concept.

> BTW do you think each vhost-user device should have its own enum of
> migration
> types or maybe we could make them common for all device types?

I will put it for vhost-user, because as far as I know nobody else is
asking for this functionality.

The most similar device that I can think of right now is vfio devices.
But they are implementing callbacks to save hardware device state, and
they go device by device, i.e. there is nothing general there.

Later, Juan.
On 02/02/2023 11:59, Juan Quintela wrote: > Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >> On 01/02/2023 16:26, Juan Quintela wrote: >>> Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>> On 19/01/2023 18:02, Stefan Hajnoczi wrote: >>>>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote: >>>>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote: >>>>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>> Once told that, I think that you are making your live harder in the >>> future when you add the other migratable devices. >>> >>> I am assuming here that your "underlying device" is: >>> >>> enum VhostUserFSType { >>> VHOST_USER_NO_MIGRATABLE = 0, >>> // The one we are doing here >>> VHOST_USER_EXTERNAL_MIGRATABLE, >>> // The one you describe for the future >>> VHOST_USER_INTERNAL_MIGRATABLE, >>> }; >>> >>> struct VHostUserFS { >>> /*< private >*/ >>> VirtIODevice parent; >>> VHostUserFSConf conf; >>> struct vhost_virtqueue *vhost_vqs; >>> struct vhost_dev vhost_dev; >>> VhostUserState vhost_user; >>> VirtQueue **req_vqs; >>> VirtQueue *hiprio_vq; >>> int32_t bootindex; >>> enum migration_type; >>> /*< public >*/ >>> }; >>> >>> >>> static int vhost_user_fs_pre_save(void *opaque) >>> { >>> VHostUserFS *s = opaque; >>> >>> if (s->migration_type == VHOST_USER_NO_MIGRATABLE)) { >>> error_report("Migration of vhost-user-fs devices requires internal FUSE " >>> "state of backend to be preserved. If orchestrator can " >>> "guarantee this (e.g. dst connects to the same backend " >>> "instance or backend state is migrated) set 'vhost-user-fs' " >>> "migration capability to true to enable migration."); >>> return -1; >>> } >>> if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) { >>> return 0; >>> } >>> if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) { >>> error_report("still not implemented"); >>> return -1; >>> } >>> assert("we don't reach here"); >>> } >>> >>> Your initial vmstateDescription >>> >>> static const VMStateDescription vuf_vmstate = { >>> .name = "vhost-user-fs", >>> .unmigratable = 1, >>> .minimum_version_id = 0, >>> .version_id = 0, >>> .fields = (VMStateField[]) { >>> VMSTATE_INT8(migration_type, struct VHostUserFS), >>> VMSTATE_VIRTIO_DEVICE, >>> VMSTATE_END_OF_LIST() >>> }, >>> .pre_save = vhost_user_fs_pre_save, >>> }; >>> >>> And later you change it to something like: >>> >>> static bool vhost_fs_user_internal_state_needed(void *opaque) >>> { >>> VHostUserFS *s = opaque; >>> >>> return s->migration_type == VMOST_USER_INTERNAL_MIGRATABLE; >>> } >>> >>> static const VMStateDescription vmstate_vhost_user_fs_internal_sub = { >>> .name = "vhost-user-fs/internal", >>> .version_id = 1, >>> .minimum_version_id = 1, >>> .needed = &vhost_fs_user_internal_state_needed, >>> .fields = (VMStateField[]) { >>> .... // Whatever >>> VMSTATE_END_OF_LIST() >>> } >>> }; >>> >>> static const VMStateDescription vuf_vmstate = { >>> .name = "vhost-user-fs", >>> .minimum_version_id = 0, >>> .version_id = 0, >>> .fields = (VMStateField[]) { >>> VMSTATE_INT8(migration_type, struct VHostUserFS), >>> VMSTATE_VIRTIO_DEVICE, >>> VMSTATE_END_OF_LIST() >>> }, >>> .pre_save = vhost_user_fs_pre_save, >>> .subsections = (const VMStateDescription*[]) { >>> &vmstate_vhost_user_fs_internal_sub, >>> NULL >>> } >>> }; >>> >>> And you are done. 
>>>
>>> I will propose to use a property to set migration_type, but I didn't
>>> want to write the code right now.

I have a little problem with the implementation here and need some advice:

Can we make this property adjustable at runtime after the device has been
realized? There is a statement in the QEMU wiki [1] that makes me think
this is possible, but I couldn't find any code for it or an example in
other devices.

> "Properties are, by default, locked while the device is realized. Exceptions
> can be made by the devices themselves in order to implement a way for a user
> to interact with a device while it is realized."

Or is your idea just to set this property once at construction and keep it
constant for the device's lifetime?

[1] https://wiki.qemu.org/Features/QOM

>>>
>>> I think that this proposal will make Stephan happy, and it is just
>>> adding and extra uint8_t that is helpul to implement everything.
>> That is exactly the approach I'm trying to implement it right now. Single
>> flag can be convenient for orchestrator but makes it too hard to account in
>> all cases for all devices on qemu side without breaking future
>> compatibility.
>> So I'm rewriting it with properties.
> Nice. That would be my proposal. Just a bit complicated for a proof of concetp.
>
>> BTW do you think each vhost-user device should have its own enum of
>> migration
>> types or maybe we could make them common for all device types?
> I will put it for vhost-user, because as far as I know nobody else is
> asking for this functionality.

I mean: do we need it for all vhost-user devices, or only for the
vhost-user-fs device that I'm implementing now?

> The most similar device that I can think of right now is vfio devices.
> But they are implemeting callbacks to save hardware device state, and
> they go device by device, i.e. there is nothing general there.
>
> Later, Juan.
>
Anton Kuchin <antonkuchin@yandex-team.ru> wrote: > On 02/02/2023 11:59, Juan Quintela wrote: >> Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>> On 01/02/2023 16:26, Juan Quintela wrote: >>>> Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>>> On 19/01/2023 18:02, Stefan Hajnoczi wrote: >>>>>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>>>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote: >>>>>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>>>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote: >>>>>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote: >>>> Once told that, I think that you are making your live harder in the >>>> future when you add the other migratable devices. >>>> >>>> static const VMStateDescription vuf_vmstate = { >>>> .name = "vhost-user-fs", >>>> .minimum_version_id = 0, >>>> .version_id = 0, >>>> .fields = (VMStateField[]) { >>>> VMSTATE_INT8(migration_type, struct VHostUserFS), >>>> VMSTATE_VIRTIO_DEVICE, >>>> VMSTATE_END_OF_LIST() >>>> }, >>>> .pre_save = vhost_user_fs_pre_save, >>>> .subsections = (const VMStateDescription*[]) { >>>> &vmstate_vhost_user_fs_internal_sub, >>>> NULL >>>> } >>>> }; >>>> >>>> And you are done. >>>> >>>> I will propose to use a property to set migration_type, but I didn't >>>> want to write the code right now. > > I have a little problem with implementation here and need an advice: > > Can we make this property adjustable at runtime after device was realized? > There is a statement in qemu wiki [1] that makes me think this is possible > but I couldn't find any code for it or example in other devices. >> "Properties are, by default, locked while the device is > realized. Exceptions >> can be made by the devices themselves in order to implement a way > for a user >> to interact with a device while it is realized." > > Or is your idea just to set this property once at construction and keep it > constant for device lifetime? > > [1] https://wiki.qemu.org/Features/QOM I have no clue here. Markus? Stefan? >>>> >>>> I think that this proposal will make Stephan happy, and it is just >>>> adding and extra uint8_t that is helpul to implement everything. >>> That is exactly the approach I'm trying to implement it right now. Single >>> flag can be convenient for orchestrator but makes it too hard to account in >>> all cases for all devices on qemu side without breaking future >>> compatibility. >>> So I'm rewriting it with properties. >> Nice. That would be my proposal. Just a bit complicated for a proof of concetp. >> >>> BTW do you think each vhost-user device should have its own enum of >>> migration >>> types or maybe we could make them common for all device types? >> I will put it for vhost-user, because as far as I know nobody else is >> asking for this functionality. > > I mean do we need it for all vhost-user devices or only for vhost-user-fs > that I'm implementing now? I will put it only for vhost-user-fs, except if there is a central place that is used for all vhost-user and its easy to put there. But I don't know enough about vhost-user to know if there is any common struct to put this. >> The most similar device that I can think of right now is vfio devices. >> But they are implemeting callbacks to save hardware device state, and >> they go device by device, i.e. there is nothing general there. >> >> Later, Juan. >> Later, Juan.
On Fri, Feb 10, 2023 at 05:08:16PM +0100, Juan Quintela wrote:
> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> > On 02/02/2023 11:59, Juan Quintela wrote:
> >> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>> On 01/02/2023 16:26, Juan Quintela wrote:
> >>>> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>>> On 19/01/2023 18:02, Stefan Hajnoczi wrote:
> >>>>>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>>>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
> >>>>>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>>>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
> >>>>>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>> Once told that, I think that you are making your life harder in the
> >>>> future when you add the other migratable devices.
> >>>>
> >>>> static const VMStateDescription vuf_vmstate = {
> >>>>     .name = "vhost-user-fs",
> >>>>     .minimum_version_id = 0,
> >>>>     .version_id = 0,
> >>>>     .fields = (VMStateField[]) {
> >>>>         VMSTATE_INT8(migration_type, struct VHostUserFS),
> >>>>         VMSTATE_VIRTIO_DEVICE,
> >>>>         VMSTATE_END_OF_LIST()
> >>>>     },
> >>>>     .pre_save = vhost_user_fs_pre_save,
> >>>>     .subsections = (const VMStateDescription*[]) {
> >>>>         &vmstate_vhost_user_fs_internal_sub,
> >>>>         NULL
> >>>>     }
> >>>> };
> >>>>
> >>>> And you are done.
> >>>>
> >>>> I will propose to use a property to set migration_type, but I didn't
> >>>> want to write the code right now.
> >
> > I have a little problem with implementation here and need some advice:
> >
> > Can we make this property adjustable at runtime after the device was realized?
> > There is a statement in the qemu wiki [1] that makes me think this is possible,
> > but I couldn't find any code for it or an example in other devices.
> >
> >> "Properties are, by default, locked while the device is realized. Exceptions
> >> can be made by the devices themselves in order to implement a way for a user
> >> to interact with a device while it is realized."
> >
> > Or is your idea just to set this property once at construction and keep it
> > constant for the device lifetime?
> >
> > [1] https://wiki.qemu.org/Features/QOM
>
> I have no clue here. Markus? Stefan?

Sorry for the late reply.

Yes, QOM properties can be set after realize (e.g. using the qom-set
command). The set() callback can return an error, so some properties are
implemented to refuse updates when ->realized is true.

Stefan
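To make the qom-set route concrete, here is a minimal sketch of a property whose setter stays reachable after realize. All names below are hypothetical (the "migration" property, the migration_type field and the VHOST_USER_MIGRATION_TYPE_* values); the point is that a property registered with object_property_add_str() from an instance_init hook is not locked by the qdev core, so the device's own setter decides what is allowed on a realized device:

/* Hypothetical getter/setter, for illustration only. */
static char *vuf_get_migration(Object *obj, Error **errp)
{
    VHostUserFS *fs = VHOST_USER_FS(obj);

    return g_strdup(fs->migration_type == VHOST_USER_MIGRATION_TYPE_EXTERNAL
                    ? "external" : "none");
}

static void vuf_set_migration(Object *obj, const char *value, Error **errp)
{
    VHostUserFS *fs = VHOST_USER_FS(obj);

    /*
     * No DEVICE(obj)->realized check here, so qom-set can still change
     * the value on a running VM; a device that wants to refuse late
     * changes would set an error when realized is true instead.
     */
    if (!strcmp(value, "external")) {
        fs->migration_type = VHOST_USER_MIGRATION_TYPE_EXTERNAL;
    } else if (!strcmp(value, "none")) {
        fs->migration_type = VHOST_USER_MIGRATION_TYPE_NONE;
    } else {
        error_setg(errp, "invalid migration type '%s'", value);
    }
}

static void vuf_instance_init(Object *obj)
{
    object_property_add_str(obj, "migration",
                            vuf_get_migration, vuf_set_migration);
}

With such a property the orchestrator could flip the mode on a running VM via the qom-set monitor command before starting migration, which is exactly the runtime adjustment asked about above.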
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
 #include "hw/virtio/vhost-user-fs.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
+#include "migration/migration.h"
 
 static const int user_feature_bits[] = {
     VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
     return &fs->vhost_dev;
 }
 
+static int vhost_user_fs_pre_save(void *opaque)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+        error_report("Migration of vhost-user-fs devices requires internal FUSE "
+                     "state of backend to be preserved. If orchestrator can "
+                     "guarantee this (e.g. dst connects to the same backend "
+                     "instance or backend state is migrated) set 'vhost-user-fs' "
+                     "migration capability to true to enable migration.");
+        return -1;
+    }
+
+    return 0;
+}
+
 static const VMStateDescription vuf_vmstate = {
     .name = "vhost-user-fs",
-    .unmigratable = 1,
+    .minimum_version_id = 0,
+    .version_id = 0,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        VMSTATE_END_OF_LIST()
+    },
+    .pre_save = vhost_user_fs_pre_save,
 };
 
 static Property vuf_properties[] = {
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..9a229ea884 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,11 @@
 #                     will be handled faster. This is a performance feature and
 #                     should not affect the correctness of postcopy migration.
 #                     (since 7.1)
+# @vhost-user-fs: If enabled, the migration process will allow migration of
+#                 vhost-user-fs devices, this should be enabled only when
+#                 backend can preserve local FUSE state e.g. for qemu update
+#                 when dst reconects to the same endpoints after migration.
+#                 (since 8.0)
 #
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -492,7 +497,7 @@
            'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
            'validate-uuid', 'background-snapshot',
-           'zero-copy-send', 'postcopy-preempt'] }
+           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
 
 ##
 # @MigrationCapabilityStatus:
Now any vhost-user-fs device makes VM unmigratable, that also prevents
qemu update without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But we can give an option to orchestrator to override this if it can
guarantee that state will be preserved (e.g. it uses migration to
update qemu and dst will run on the same host as src and use the same
socket endpoints).

This patch keeps default behavior that prevents migration with such devices
but adds migration capability 'vhost-user-fs' to explicitly allow migration.

Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
---
 hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
 qapi/migration.json       |  7 ++++++-
 2 files changed, 30 insertions(+), 2 deletions(-)
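The pre_save hook in this patch checks the global 'vhost-user-fs' migration capability, while the discussion above converges on a per-device property instead. A rough sketch of how the same check could look in that scheme, again using the hypothetical migration_type field and a hypothetical VHOST_USER_MIGRATION_TYPE_NONE value, not code from this patch:

/* Hypothetical per-device variant of the pre_save check, for illustration. */
static int vhost_user_fs_pre_save(void *opaque)
{
    VHostUserFS *fs = opaque;

    if (fs->migration_type == VHOST_USER_MIGRATION_TYPE_NONE) {
        error_report("Migration is not enabled for this vhost-user-fs device. "
                     "If the orchestrator preserves the backend FUSE state "
                     "(e.g. dst reconnects to the same virtiofsd instance), "
                     "set the device's migration property accordingly.");
        return -1;
    }

    return 0;
}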