Message ID | cover.1596528468.git.lukasstraub2@web.de (mailing list archive) |
---|---|
Series | Introduce 'yank' oob qmp command to recover from hanging qemu |
On Tue, 4 Aug 2020 10:11:22 +0200 Lukas Straub <lukasstraub2@web.de> wrote:

> Hello Everyone,
> In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> to some other server and that server dies or hangs, qemu hangs too.
> These patches introduce the new 'yank' out-of-band qmp command to recover from
> these kinds of hangs. The different subsystems register callbacks which get
> executed with the yank command. For example the callback can shutdown() a
> socket. This is intended for the colo use-case, but it can be used for other
> things too of course.
>
> Regards,
> Lukas Straub
>
> v7:
>  -yank_register_instance now returns error via Error **errp instead of aborting
>  -dropped "chardev/char.c: Check for duplicate id before creating chardev"
>
> v6:
>  -add Reviewed-by and Acked-by tags
>  -rebase on master
>  -lots of changes in nbd due to rebase
>  -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
>  -fix a crash discovered by the newly added chardev test
>  -fix the test itself
>
> v5:
>  -move yank.c to util/
>  -move yank.h to include/qemu/
>  -add license to yank.h
>  -use const char*
>  -nbd: use atomic_store_release and atomic_load_acquire
>  -io-channel: ensure thread-safety and document it
>  -add myself as maintainer for yank
>
> v4:
>  -fix build errors...
>
> v3:
>  -don't touch softmmu/vl.c, use __constructor__ attribute instead (Paolo Bonzini)
>  -fix build errors
>  -rewrite migration patch so it actually passes all tests
>
> v2:
>  -don't touch io/ code anymore
>  -always register yank functions
>  -'yank' now takes a list of instances to yank
>  -'query-yank' returns a list of yankable instances
>
> Lukas Straub (8):
>   Introduce yank feature
>   block/nbd.c: Add yank feature
>   chardev/char-socket.c: Add yank feature
>   migration: Add yank feature
>   io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
>   io: Document thread-safety of qio_channel_shutdown
>   MAINTAINERS: Add myself as maintainer for yank feature
>   tests/test-char.c: Wait for the chardev to connect in
>     char_socket_client_dupid_test
>
>  MAINTAINERS                   |   6 ++
>  block/nbd.c                   | 129 +++++++++++++++---------
>  chardev/char-socket.c         |  31 ++++++
>  include/io/channel.h          |   2 +
>  include/qemu/yank.h           |  80 +++++++++++++++
>  io/channel-tls.c              |   6 +-
>  migration/channel.c           |  12 +++
>  migration/migration.c         |  25 ++++-
>  migration/multifd.c           |  10 ++
>  migration/qemu-file-channel.c |   6 ++
>  migration/savevm.c            |   6 ++
>  qapi/misc.json                |  45 +++++++++
>  tests/Makefile.include        |   2 +-
>  tests/test-char.c             |   1 +
>  util/Makefile.objs            |   1 +
>  util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
>  16 files changed, 493 insertions(+), 53 deletions(-)
>  create mode 100644 include/qemu/yank.h
>  create mode 100644 util/yank.c
>
> --
> 2.20.1

Ping...
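
For readers new to the series, the mechanism the cover letter describes (subsystems register callbacks, the yank command runs them, and a callback may shutdown() a socket) can be illustrated with a small standalone sketch. This is not code from the patches: it is written in Python rather than QEMU's C, and `YankRegistry`, its methods, the instance name "chardev:demo" and `demo()` are all hypothetical names chosen for illustration. Raising on a duplicate instance name is likewise this sketch's own choice.

```python
import socket
import threading


class YankRegistry:
    """Hypothetical stand-in for the yank bookkeeping; not QEMU's API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._instances = {}  # instance name -> list of callbacks

    def register_instance(self, name):
        with self._lock:
            if name in self._instances:
                # Report an error to the caller instead of aborting.
                raise ValueError(f"duplicate yank instance: {name}")
            self._instances[name] = []

    def register_function(self, name, func):
        with self._lock:
            self._instances[name].append(func)

    def query(self):
        # Rough analogue of listing the yankable instances.
        with self._lock:
            return sorted(self._instances)

    def yank(self, names):
        # Collect callbacks under the lock, run them outside it.
        with self._lock:
            funcs = [f for name in names for f in self._instances[name]]
        for func in funcs:
            func()


def demo():
    registry = YankRegistry()
    # 'peer' never sends anything: it stands in for a hung remote server.
    local, peer = socket.socketpair()
    registry.register_instance("chardev:demo")  # hypothetical instance name
    registry.register_function(
        "chardev:demo", lambda: local.shutdown(socket.SHUT_RDWR))

    received = []
    # This thread blocks in recv(), like a subsystem waiting on the network.
    reader = threading.Thread(
        target=lambda: received.append(local.recv(1)), daemon=True)
    reader.start()

    # Yanking shuts the socket down, so the blocked recv() returns.
    registry.yank(["chardev:demo"])
    reader.join(timeout=5)
    local.close()
    peer.close()
    return received
```

The point of the sketch is only that the callback acts on the socket from outside the blocked thread, which is why the thread-safety of the shutdown path matters in the io/ patches of the series.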
On Tue, 18 Aug 2020 14:26:31 +0200 Lukas Straub <lukasstraub2@web.de> wrote:

> On Tue, 4 Aug 2020 10:11:22 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
>
> > Hello Everyone,
> > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > to some other server and that server dies or hangs, qemu hangs too.
> > [...]
>
> Ping...

Ping 2...

Also, can the different subsystems have a look at this and give their ok?

Regards,
Lukas Straub
On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:
> On Tue, 18 Aug 2020 14:26:31 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
>
> > On Tue, 4 Aug 2020 10:11:22 +0200
> > Lukas Straub <lukasstraub2@web.de> wrote:
> >
> > > Hello Everyone,
> > > [...]
> >
> > Ping...
>
> Ping 2...
>
> Also, can the different subsystems have a look at this and give their ok?

We need ACKs from the NBD, migration and chardev maintainers, for the
respective patches, then I think this series is ready for a pull request.

Once acks arrive, I'm happy to send a PULL unless someone else has a
desire to do it.

Regards,
Daniel
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:
[...]
>> Also, can the different subsystems have a look at this and give their ok?
>
> We need ACKs from the NBD, migration and chardev maintainers, for the
> respective patches, then I think this series is ready for a pull request.

The QMP interface and its documentation need a bit of work, see my
review of PATCH 1. I'm hopeful v8 will nail it.

> Once acks arrive, I'm happy to send a PULL unless someone else has a
> desire to do it.

Not yet, please.
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:
> > On Tue, 18 Aug 2020 14:26:31 +0200
> > Lukas Straub <lukasstraub2@web.de> wrote:
> >
> > > On Tue, 4 Aug 2020 10:11:22 +0200
> > > Lukas Straub <lukasstraub2@web.de> wrote:
> > >
> > > > Hello Everyone,
> > > > [...]
> > >
> > > Ping...
> >
> > Ping 2...
> >
> > Also, can the different subsystems have a look at this and give their ok?
>
> We need ACKs from the NBD, migration and chardev maintainers, for the
> respective patches, then I think this series is ready for a pull request.

I'm happy from Migration:

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> Once acks arrive, I'm happy to send a PULL unless someone else has a
> desire to do it.

Looks like Markus would like a QMP tweak; but other than that I'd also
be happy to take it via migration; whichever is easiest.

Dave

> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|