
[0/5] Introduce 'yank' oob qmp command to recover from hanging qemu

Message ID cover.1589193717.git.lukasstraub2@web.de

Message

Lukas Straub May 11, 2020, 11:14 a.m. UTC
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks which get
executed with the yank command. For example the callback can shutdown() a
socket. This is intended for the colo use-case, but it can be used for other
things too of course.

Regards,
Lukas Straub

Lukas Straub (5):
  Introduce yank feature
  io/channel.c,io/channel-socket.c: Add yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature

 Makefile.objs               |  2 +
 block/nbd.c                 | 68 ++++++++++++++++++++++++---------
 chardev/char-socket.c       |  8 ++++
 chardev/char.c              |  3 ++
 include/io/channel-socket.h |  1 +
 include/io/channel.h        | 12 ++++++
 io/channel-socket.c         | 29 ++++++++++++++
 io/channel.c                |  9 +++++
 migration/channel.c         |  2 +
 migration/migration.c       | 11 ++++++
 qapi/block-core.json        |  5 ++-
 qapi/char.json              |  5 ++-
 qapi/migration.json         | 17 +++++++--
 qapi/misc.json              | 15 ++++++++
 softmmu/vl.c                |  2 +
 yank.c                      | 75 +++++++++++++++++++++++++++++++++++++
 yank.h                      | 12 ++++++
 17 files changed, 254 insertions(+), 22 deletions(-)
 create mode 100644 yank.c
 create mode 100644 yank.h

Comments

Daniel P. Berrangé May 11, 2020, 11:49 a.m. UTC | #1
On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> Hello Everyone,
> In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> to some other server and that server dies or hangs, qemu hangs too.

If qemu as a whole hangs due to a stalled network connection, that is a
bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
I/O in general, such that if the network connection or remote server stalls,
we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
loop.

There are places in QEMU code which are not well behaved in this respect,
but many are, and others are getting fixed where found to be important.

Arguably any place in QEMU code which can result in a hang of QEMU in the
event of a stalled network should be considered a security flaw, because
the network is untrusted in general.

> These patches introduce the new 'yank' out-of-band qmp command to recover from
> these kinds of hangs. The different subsystems register callbacks which get
> executed with the yank command. For example the callback can shutdown() a
> socket. This is intended for the colo use-case, but it can be used for other
> things too of course.

IIUC, invoking the "yank" command unconditionally kills every single
network connection in QEMU that has registered with the "yank" subsystem.
IMHO this is way too big of a hammer, even if we accept there are bugs in
QEMU not handling stalled networking well.

eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
connection used for the guest disk, we needlessly break I/O.

eg doing this in the chardev backend is not desirable, because the bugs
with hanging QEMU are typically caused by the way the frontend device
uses the chardev blocking I/O calls, instead of non-blocking I/O calls.


Regards,
Daniel
Dr. David Alan Gilbert May 11, 2020, 12:07 p.m. UTC | #2
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > Hello Everyone,
> > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > to some other server and that server dies or hangs, qemu hangs too.
> 
> If qemu as a whole hangs due to a stalled network connection, that is a
> bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> I/O in general, such that if the network connection or remote server stalls,
> we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> loop.
> 
> There are places in QEMU code which are not well behaved in this respect,
> but many are, and others are getting fixed where found to be important.
> 
> Arguably any place in QEMU code which can result in a hang of QEMU in the
> event of a stalled network should be considered a security flaw, because
> the network is untrusted in general.

That's not really true of the 'management network' - people trust that
and I don't see a lot of the qemu code getting fixed safely for all of
them.

> > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > these kinds of hangs. The different subsystems register callbacks which get
> > executed with the yank command. For example the callback can shutdown() a
> > socket. This is intended for the colo use-case, but it can be used for other
> > things too of course.
> 
> IIUC, invoking the "yank" command unconditionally kills every single
> network connection in QEMU that has registered with the "yank" subsystem.
> IMHO this is way too big of a hammer, even if we accept there are bugs in
> QEMU not handling stalled networking well.

But isn't this hammer conditional - I see that it's a migration
capability for the migration socket, and a flag in nbd - so it only
yanks things you've told it to.

> eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> connection used for the guest disk, we needlessly break I/O.
> 
> eg doing this in the chardev backend is not desirable, because the bugs
> with hanging QEMU are typically caused by the way the frontend device
> uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
> 

Having a way to get out of any of these problems from a single point is
quite nice.  To be useful in COLO you need to know for sure you can get
out of any network screwup.

We already use shutdown(2) in migrate_cancel and migrate-pause for
basically the same reason; I don't think we've got anything similar for
NBD, and we probably should have (I think I asked for it fairly
recently).

Dave



> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Daniel P. Berrangé May 11, 2020, 12:17 p.m. UTC | #3
On Mon, May 11, 2020 at 01:07:18PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > > Hello Everyone,
> > > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > > to some other server and that server dies or hangs, qemu hangs too.
> > 
> > If qemu as a whole hangs due to a stalled network connection, that is a
> > bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> > I/O in general, such that if the network connection or remote server stalls,
> > we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> > loop.
> > 
> > There are places in QEMU code which are not well behaved in this respect,
> > but many are, and others are getting fixed where found to be important.
> > 
> > Arguably any place in QEMU code which can result in a hang of QEMU in the
> > event of a stalled network should be considered a security flaw, because
> > the network is untrusted in general.
> 
> That's not really true of the 'management network' - people trust that
> and I don't see a lot of the qemu code getting fixed safely for all of
> them.

It depends on the user / app / deployment scenario. In OpenStack a lot of
work was done to beef up security between services on the mgmt network,
with TLS encryption as standard to reduce attack vectors.

> > > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > > these kinds of hangs. The different subsystems register callbacks which get
> > > executed with the yank command. For example the callback can shutdown() a
> > > socket. This is intended for the colo use-case, but it can be used for other
> > > things too of course.
> > 
> > IIUC, invoking the "yank" command unconditionally kills every single
> > network connection in QEMU that has registered with the "yank" subsystem.
> > IMHO this is way too big of a hammer, even if we accept there are bugs in
> > QEMU not handling stalled networking well.
> 
> But isn't this hammer conditional - I see that it's a migration
> capability for the migration socket, and a flag in nbd - so it only
> yanks things you've told it to.

IIUC, you have to set these flags upfront when you launch QEMU, or
hotplug the device using the feature. When something gets stuck,
and you issue the "yank" command, then everything that has the flag
enabled gets torn down. So in practice it looks like the flag will
get enabled for everything at QEMU startup, and yanking will tear
down everything.

> > eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> > connection used for the guest disk, we needlessly break I/O.
> > 
> > eg doing this in the chardev backend is not desirable, because the bugs
> > with hanging QEMU are typically caused by the way the frontend device
> > uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
> > 
> 
> Having a way to get out of any of these problems from a single point is
> quite nice.  To be useful in COLO you need to know for sure you can get
> out of any network screwup.
> 
> We already use shutdown(2) in migrate_cancel and migrate-pause for
> basically the same reason; I don't think we've got anything similar for
> NBD, and we probably should have (I think I asked for it fairly
> recently).

Yes, the migrate_cancel is an example of a more fine-grained way to
recover. I was thinking that we need an equivalent fine-grained control knob
for NBD too.

That way if QEMU does get stuck, you can start by tearing down the
least disruptive channel. eg try tearing down the migration connection
first (which shouldn't negatively impact the guest), and only if that
doesn't work, move on to tear down the NBD connection (which risks
data loss).

Regards,
Daniel
Dr. David Alan Gilbert May 11, 2020, 3:46 p.m. UTC | #4
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, May 11, 2020 at 01:07:18PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > > > Hello Everyone,
> > > > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > > > to some other server and that server dies or hangs, qemu hangs too.
> > > 
> > > If qemu as a whole hangs due to a stalled network connection, that is a
> > > bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> > > I/O in general, such that if the network connection or remote server stalls,
> > > we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> > > loop.
> > > 
> > > There are places in QEMU code which are not well behaved in this respect,
> > > but many are, and others are getting fixed where found to be important.
> > > 
> > > Arguably any place in QEMU code which can result in a hang of QEMU in the
> > > event of a stalled network should be considered a security flaw, because
> > > the network is untrusted in general.
> > 
> > That's not really true of the 'management network' - people trust that
> > and I don't see a lot of the qemu code getting fixed safely for all of
> > them.
> 
> It depends on the user / app / deployment scenario. In OpenStack a lot of
> work was done to beef up security between services on the mgmt network,
> with TLS encryption as standard to reduce attack vectors.
> 
> > > > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > > > these kinds of hangs. The different subsystems register callbacks which get
> > > > executed with the yank command. For example the callback can shutdown() a
> > > > socket. This is intended for the colo use-case, but it can be used for other
> > > > things too of course.
> > > 
> > > IIUC, invoking the "yank" command unconditionally kills every single
> > > network connection in QEMU that has registered with the "yank" subsystem.
> > > IMHO this is way too big of a hammer, even if we accept there are bugs in
> > > QEMU not handling stalled networking well.
> > 
> > But isn't this hammer conditional - I see that it's a migration
> > capability for the migration socket, and a flag in nbd - so it only
> > yanks things you've told it to.
> 
> IIUC, you have to set these flags upfront when you launch QEMU, or
> hotplug the device using the feature. When something gets stuck,
> and you issue the "yank" command, then everything that has the flag
> enabled gets torn down. So in practice it looks like the flag will
> get enabled for everything at QEMU startup, and yanking down tear
> down everything.

For COLO I really expect it for the migration stream, the disk mirroring
stream and probably the network comparison/forwarding streams.

> > > eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> > > connection used for the guest disk, we needlessly break I/O.
> > > 
> > > eg doing this in the chardev backend is not desirable, because the bugs
> > > with hanging QEMU are typically caused by the way the frontend device
> > > uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
> > > 
> > 
> > Having a way to get out of any of these problems from a single point is
> > quite nice.  To be useful in COLO you need to know for sure you can get
> > out of any network screwup.
> > 
> > We already use shutdown(2) in migrate_cancel and migrate-pause for
> > basically the same reason; I don't think we've got anything similar for
> > NBD, and we probably should have (I think I asked for it fairly
> > recently).
> 
> Yes, the migrate_cancel is an example of a more fine grained way to
> recover. I was thinking that we need an equivalent fine control knob
> for NBD too.

I feel it might be nice not to have to create so many separate knobs.

> That way if QEMU does get stuck, you can start by tearing down the
> least disruptive channel. eg try tearing down the migration connection
> first (which shouldn't negatively impact the guest), and only if that
> doesn't work then, move on to tear down the NBD connection (which risks
> data loss)

I wonder if a different way would be to make all network connections
register with yank, but then make yank take a list of connections to
shutdown(2).

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Lukas Straub May 11, 2020, 6:12 p.m. UTC | #5
On Mon, 11 May 2020 12:49:47 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > Hello Everyone,
> > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > to some other server and that server dies or hangs, qemu hangs too.  
> 
> If qemu as a whole hangs due to a stalled network connection, that is a
> bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> I/O in general, such that if the network connection or remote server stalls,
> we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> loop.
> 
> There are places in QEMU code which are not well behaved in this respect,
> but many are, and others are getting fixed where found to be important.
> 
> Arguably any place in QEMU code which can result in a hang of QEMU in the
> event of a stalled network should be considered a security flaw, because
> the network is untrusted in general.

The fact that out-of-band qmp commands exist at all shows that we have to make tradeoffs of developer time vs. doing things right. Sure, the migration code can be rewritten to use non-blocking I/O and fine-grained locks. But as a hobbyist I don't have time to fix this.

> > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > these kinds of hangs. The different subsystems register callbacks which get
> > executed with the yank command. For example the callback can shutdown() a
> > socket. This is intended for the colo use-case, but it can be used for other
> > things too of course.  
> 
> IIUC, invoking the "yank" command unconditionally kills every single
> network connection in QEMU that has registered with the "yank" subsystem.
> IMHO this is way too big of a hammer, even if we accept there are bugs in
> QEMU not handling stalled networking well.
> 
> eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> connection used for the guest disk, we needlessly break I/O.

Yeah, these patches are intended to solve the problems with the colo use-case where all external connections (migration, chardevs, nbd) are just for replication. In other use-cases you'd enable the yank feature only on the non-essential connections.

> eg doing this in the chardev backend is not desirable, because the bugs
> with hanging QEMU are typically caused by the way the frontend device
> uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
> 
> 
> Regards,
> Daniel
Lukas Straub May 11, 2020, 7:41 p.m. UTC | #6
On Mon, 11 May 2020 13:17:14 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, May 11, 2020 at 01:07:18PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:  
> > > On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:  
> > > > Hello Everyone,
> > > > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > > > to some other server and that server dies or hangs, qemu hangs too.  
> > > 
> > > If qemu as a whole hangs due to a stalled network connection, that is a
> > > bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> > > I/O in general, such that if the network connection or remote server stalls,
> > > we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> > > loop.
> > > 
> > > There are places in QEMU code which are not well behaved in this respect,
> > > but many are, and others are getting fixed where found to be important.
> > > 
> > > Arguably any place in QEMU code which can result in a hang of QEMU in the
> > > event of a stalled network should be considered a security flaw, because
> > > the network is untrusted in general.  
> > 
> > That's not really true of the 'management network' - people trust that
> > and I don't see a lot of the qemu code getting fixed safely for all of
> > them.  
> 
> It depends on the user / app / deployment scenario. In OpenStack a lot of
> work was done to beef up security between services on the mgmt network,
> with TLS encryption as standard to reduce attack vectors.
> 
> > > > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > > > these kinds of hangs. The different subsystems register callbacks which get
> > > > executed with the yank command. For example the callback can shutdown() a
> > > > socket. This is intended for the colo use-case, but it can be used for other
> > > > things too of course.  
> > > 
> > > IIUC, invoking the "yank" command unconditionally kills every single
> > > network connection in QEMU that has registered with the "yank" subsystem.
> > > IMHO this is way too big of a hammer, even if we accept there are bugs in
> > > QEMU not handling stalled networking well.  
> > 
> > But isn't this hammer conditional - I see that it's a migration
> > capability for the migration socket, and a flag in nbd - so it only
> > yanks things you've told it to.  
> 
> IIUC, you have to set these flags upfront when you launch QEMU, or
> hotplug the device using the feature. When something gets stuck,
> and you issue the "yank" command, then everything that has the flag
> enabled gets torn down. So in practice it looks like the flag will
> get enabled for everything at QEMU startup, and yanking down tear
> down everything.
> 
> > > eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> > > connection used for the guest disk, we needlessly break I/O.
> > > 
> > > eg doing this in the chardev backend is not desirable, because the bugs
> > > with hanging QEMU are typically caused by the way the frontend device
> > > uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
> > >   
> > 
> > Having a way to get out of any of these problems from a single point is
> > quite nice.  To be useful in COLO you need to know for sure you can get
> > out of any network screwup.
> > 
> > We already use shutdown(2) in migrate_cancel and migrate-pause for
> > basically the same reason; I don't think we've got anything similar for
> > NBD, and we probably should have (I think I asked for it fairly
> > recently).  
> 
> Yes, the migrate_cancel is an example of a more fine grained way to
> recover. I was thinking that we need an equivalent fine control knob
> for NBD too.

One reason why the yank feature is done this way is that the management application may not know what state qemu is in, and so it doesn't know what to yank. Poking in the dark would work too in my case, but it's not that nice.

Regards,
Lukas Straub

> That way if QEMU does get stuck, you can start by tearing down the
> least disruptive channel. eg try tearing down the migration connection
> first (which shouldn't negatively impact the guest), and only if that
> doesn't work then, move on to tear down the NBD connection (which risks
> data loss)
> 
> Regards,
> Daniel
Daniel P. Berrangé May 12, 2020, 9:03 a.m. UTC | #7
On Mon, May 11, 2020 at 08:12:18PM +0200, Lukas Straub wrote:
> On Mon, 11 May 2020 12:49:47 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > > Hello Everyone,
> > > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > > to some other server and that server dies or hangs, qemu hangs too.  
> > 
> > If qemu as a whole hangs due to a stalled network connection, that is a
> > bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> > I/O in general, such that if the network connection or remote server stalls,
> > we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> > loop.
> > 
> > There are places in QEMU code which are not well behaved in this respect,
> > but many are, and others are getting fixed where found to be important.
> > 
> > Arguably any place in QEMU code which can result in a hang of QEMU in the
> > event of a stalled network should be considered a security flaw, because
> > the network is untrusted in general.
> 
> The fact that out-of-band qmp commands exist at all shows that we have to make tradeoffs of developer time vs. doing things right. Sure, the migration code can be rewritten to use non-blocking I/O and fine-grained locks. But as a hobbyist I don't have time to fix this.
> 
> > > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > > these kinds of hangs. The different subsystems register callbacks which get
> > > executed with the yank command. For example the callback can shutdown() a
> > > socket. This is intended for the colo use-case, but it can be used for other
> > > things too of course.  
> > 
> > IIUC, invoking the "yank" command unconditionally kills every single
> > network connection in QEMU that has registered with the "yank" subsystem.
> > IMHO this is way too big of a hammer, even if we accept there are bugs in
> > QEMU not handling stalled networking well.
> > 
> > eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> > connection used for the guest disk, we needlessly break I/O.
> 
> Yeah, these patches are intended to solve the problems with the colo
> use-case where all external connections (migration, chardevs, nbd)
> are just for replication. In other use-cases you'd enable the yank
> feature only on the non-essential connections.

That is a pretty inflexible design for other use cases though,
as "non-essential" is not a black & white list in general. There
are varying levels of importance to the different channels. We
can afford to lose migration without any user visible effects.
If that doesn't solve it, a serial device chardev, or VNC connection
can be dropped at the inconvenience of losing the interactive console,
which is an end-user-visible impact, so it may only want to be yanked
if the migration yank didn't fix it.

Regards,
Daniel
Dr. David Alan Gilbert May 12, 2020, 9:06 a.m. UTC | #8
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, May 11, 2020 at 08:12:18PM +0200, Lukas Straub wrote:
> > On Mon, 11 May 2020 12:49:47 +0100
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > 
> > > On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > > > Hello Everyone,
> > > > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > > > to some other server and that server dies or hangs, qemu hangs too.  
> > > 
> > > If qemu as a whole hangs due to a stalled network connection, that is a
> > > bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> > > I/O in general, such that if the network connection or remote server stalls,
> > > we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> > > loop.
> > > 
> > > There are places in QEMU code which are not well behaved in this respect,
> > > but many are, and others are getting fixed where found to be important.
> > > 
> > > Arguably any place in QEMU code which can result in a hang of QEMU in the
> > > event of a stalled network should be considered a security flaw, because
> > > the network is untrusted in general.
> > 
> > The fact that out-of-band qmp commands exist at all shows that we have to make tradeoffs of developer time vs. doing things right. Sure, the migration code can be rewritten to use non-blocking I/O and fine-grained locks. But as a hobbyist I don't have time to fix this.
> > 
> > > > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > > > these kinds of hangs. The different subsystems register callbacks which get
> > > > executed with the yank command. For example the callback can shutdown() a
> > > > socket. This is intended for the colo use-case, but it can be used for other
> > > > things too of course.  
> > > 
> > > IIUC, invoking the "yank" command unconditionally kills every single
> > > network connection in QEMU that has registered with the "yank" subsystem.
> > > IMHO this is way too big of a hammer, even if we accept there are bugs in
> > > QEMU not handling stalled networking well.
> > > 
> > > eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> > > connection used for the guest disk, we needlessly break I/O.
> > 
> > Yeah, these patches are intended to solve the problems with the colo
> > use-case where all external connections (migration, chardevs, nbd)
> > are just for replication. In other use-cases you'd enable the yank
> > feature only on the non-essential connections.
> 
> That is a pretty inflexible design for other use cases though,
> as "non-essential" is not a black & white list in general. There
> are varying levels of importance to the different channels. We
> can afford to lose migration without any user visible effects.
> If that doesn't solve it, a serial device chardev, or VNC connection
> can be dropped at the inconvenience of losing the interactive console,
> which is an end-user-visible impact, so it may only want to be yanked
> if the migration yank didn't fix it.

For COLO that's not the case though - here we explicitly want
to kill the migration so we can be sure we recover - and
we're under time pressure to get the other member of the pair running
again.

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Dr. David Alan Gilbert May 12, 2020, 9:09 a.m. UTC | #9
* Lukas Straub (lukasstraub2@web.de) wrote:
> On Mon, 11 May 2020 12:49:47 +0100
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> 
> > On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> > > Hello Everyone,
> > > In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> > > to some other server and that server dies or hangs, qemu hangs too.  
> > 
> > If qemu as a whole hangs due to a stalled network connection, that is a
> > bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
> > I/O in general, such that if the network connection or remote server stalls,
> > we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
> > loop.
> > 
> > There are places in QEMU code which are not well behaved in this respect,
> > but many are, and others are getting fixed where found to be important.
> > 
> > Arguably any place in QEMU code which can result in a hang of QEMU in the
> > event of a stalled network should be considered a security flaw, because
> > the network is untrusted in general.
> 
> The fact that out-of-band qmp commands exist at all shows that we have to make tradeoffs of developer time vs. doing things right. Sure, the migration code can be rewritten to use non-blocking I/O and fine-grained locks. But as a hobbyist I don't have time to fix this.

If it was just a hobbyist vs. full-time thing then I'd say that was a bad
way to make the decision; however the reality is that even those who are
paid to watch this code don't have the feeling we can make it fail
quickly/non-blocking - and for COLO you need to be absolutely sure you
nail every case, so you'd somehow have to audit the whole lot, and keep
watching it.

(and thank you for taking your time to do this!)

Dave


> > > These patches introduce the new 'yank' out-of-band qmp command to recover from
> > > these kinds of hangs. The different subsystems register callbacks which get
> > > executed with the yank command. For example the callback can shutdown() a
> > > socket. This is intended for the colo use-case, but it can be used for other
> > > things too of course.  
> > 
> > IIUC, invoking the "yank" command unconditionally kills every single
> > network connection in QEMU that has registered with the "yank" subsystem.
> > IMHO this is way too big of a hammer, even if we accept there are bugs in
> > QEMU not handling stalled networking well.
> > 
> > eg if a chardev hangs QEMU, and we tear down everything, killing the NBD
> > connection used for the guest disk, we needlessly break I/O.
> 
> Yeah, these patches are intended to solve the problems with the colo use-case where all external connections (migration, chardevs, nbd) are just for replication. In other use-cases you'd enable the yank feature only on the non-essential connections.
> 
> > eg doing this in the chardev backend is not desirable, because the bugs
> > with hanging QEMU are typically caused by the way the frontend device
> > uses the chardev blocking I/O calls, instead of non-blocking I/O calls.
> > 
> > 
> > Regards,
> > Daniel
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Lukas Straub May 12, 2020, 9:32 a.m. UTC | #10
On Mon, 11 May 2020 16:46:45 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Daniel P. Berrangé (berrange@redhat.com) wrote: 
> > ...
> > That way if QEMU does get stuck, you can start by tearing down the
> > least distruptive channel. eg try tearing down the migration connection
> > first (which shouldn't negatively impact the guest), and only if that
> > doesn't work then, move on to tear down the NBD connection (which risks
> > data loss)  
> 
> I wonder if a different way would be to make all network connections
> register with yank, but then make yank take a list of connections to
> shutdown(2).

Good Idea. We could name the connections (/yank callbacks) in the form "nbd:<node-name>", "chardev:<chardev-name>" and "migration" (and add "netdev:...", etc. in the future). Then make yank take a list of connection names as you suggest and silently ignore connections that don't exist. And maybe even add a 'query-yank' oob command returning a list of registered connections so the management application can do pattern matching if it wants.

Comments?
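Purely as an illustrative sketch of this naming idea (every identifier and limit below is invented for the example, not taken from the posted patches), a name-keyed registry along these lines could back both a list-taking 'yank' and 'query-yank':

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define YANK_MAX_INSTANCES 16

typedef void (*YankFn)(void *opaque);

static struct {
    char name[64];   /* e.g. "chardev:serial0", "nbd:node0", "migration" */
    YankFn fn;
    void *opaque;
    bool used;
} yank_instances[YANK_MAX_INSTANCES];

/* Register a callback under a connection name; fails on a duplicate
 * name or when the table is full. */
bool yank_register(const char *name, YankFn fn, void *opaque)
{
    int free_slot = -1;
    for (int i = 0; i < YANK_MAX_INSTANCES; i++) {
        if (yank_instances[i].used) {
            if (!strcmp(yank_instances[i].name, name)) {
                return false;          /* name already taken */
            }
        } else if (free_slot < 0) {
            free_slot = i;
        }
    }
    if (free_slot < 0) {
        return false;
    }
    snprintf(yank_instances[free_slot].name,
             sizeof(yank_instances[free_slot].name), "%s", name);
    yank_instances[free_slot].fn = fn;
    yank_instances[free_slot].opaque = opaque;
    yank_instances[free_slot].used = true;
    return true;
}

void yank_unregister(const char *name)
{
    for (int i = 0; i < YANK_MAX_INSTANCES; i++) {
        if (yank_instances[i].used && !strcmp(yank_instances[i].name, name)) {
            yank_instances[i].used = false;
        }
    }
}

/* Yank only the named connections; unknown names are silently ignored,
 * as proposed above.  Returns the number of callbacks that ran. */
int yank_by_names(const char **names, int n_names)
{
    int ran = 0;
    for (int j = 0; j < n_names; j++) {
        for (int i = 0; i < YANK_MAX_INSTANCES; i++) {
            if (yank_instances[i].used &&
                !strcmp(yank_instances[i].name, names[j])) {
                yank_instances[i].fn(yank_instances[i].opaque);
                ran++;
            }
        }
    }
    return ran;
}

/* 'query-yank': list the currently registered connection names. */
int yank_query(const char **out, int max)
{
    int n = 0;
    for (int i = 0; i < YANK_MAX_INSTANCES && n < max; i++) {
        if (yank_instances[i].used) {
            out[n++] = yank_instances[i].name;
        }
    }
    return n;
}
```

A real version would additionally need a lock around the table, since the yank callbacks would run from the out-of-band monitor context while other threads register and unregister connections.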

Regards,
Lukas Straub

> Dave
> 
> > Regards,
> > Daniel
> > -- 
> > |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> > |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|  
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
Daniel P. Berrangé May 12, 2020, 9:43 a.m. UTC | #11
On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> On Mon, 11 May 2020 16:46:45 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Daniel P. Berrangé (berrange@redhat.com) wrote: 
> > > ...
> > > That way if QEMU does get stuck, you can start by tearing down the
> > > least distruptive channel. eg try tearing down the migration connection
> > > first (which shouldn't negatively impact the guest), and only if that
> > > doesn't work then, move on to tear down the NBD connection (which risks
> > > data loss)  
> > 
> > I wonder if a different way would be to make all network connections
> > register with yank, but then make yank take a list of connections to
> > shutdown(2).
> 
> Good Idea. We could name the connections (/yank callbacks) in the
> form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> (and add "netdev:...", etc. in the future). Then make yank take a
> list of connection names as you suggest and silently ignore connections
> that don't exist. And maybe even add a 'query-yank' oob command returning
> a list of registered connections so the management application can do
> pattern matching if it wants.

Yes, that would make the yank command much more flexible in how it can
be used.

As an alternative to using formatted strings like this, it could be
modelled more explicitly in QAPI

  { 'struct':  'YankChannels',
    'data': { 'chardev': [ 'string' ],
              'nbd': ['string'],
	      'migration': bool } }

In this example, 'chardev' would accept a list of chardev IDs which
have it enabled, 'nbd' would accept a list of block node IDs which
have it enabled, and migration is a singleton on/off.

The benefit of this modelling is that you can introspect QEMU
to discover what classes of channels support being yanked by
this QEMU build, as well as what instances are configured to
be yanked. ie you can distinguish between a QEMU that doesn't
support yanking network devices, from a QEMU that does support
yanking network devices, but doesn't have it enabled for any
network device instances.
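For illustration only (this exact schema is a guess sketched from the discussion, not part of the posted series), the modelling described above could look roughly like this in QAPI syntax, using the quoted 'str' and 'bool' type names the schema language requires:

```
{ 'struct': 'YankChannels',
  'data': { 'chardev': [ 'str' ],
            'nbd': [ 'str' ],
            'migration': 'bool' } }

{ 'command': 'yank',
  'data': { 'channels': 'YankChannels' },
  'allow-oob': true }
```

A management application could then run query-qmp-schema and inspect which members 'YankChannels' has in a given QEMU build, which is exactly the introspection benefit described above.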

Regards,
Daniel
Dr. David Alan Gilbert May 12, 2020, 6:58 p.m. UTC | #12
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> > On Mon, 11 May 2020 16:46:45 +0100
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > 
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote: 
> > > > ...
> > > > That way if QEMU does get stuck, you can start by tearing down the
> > > > least distruptive channel. eg try tearing down the migration connection
> > > > first (which shouldn't negatively impact the guest), and only if that
> > > > doesn't work then, move on to tear down the NBD connection (which risks
> > > > data loss)  
> > > 
> > > I wonder if a different way would be to make all network connections
> > > register with yank, but then make yank take a list of connections to
> > > shutdown(2).
> > 
> > Good Idea. We could name the connections (/yank callbacks) in the
> > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> > (and add "netdev:...", etc. in the future). Then make yank take a
> > list of connection names as you suggest and silently ignore connections
> > that don't exist. And maybe even add a 'query-yank' oob command returning
> > a list of registered connections so the management application can do
> > pattern matching if it wants.
> 
> Yes, that would make the yank command much more flexible in how it can
> be used.
> 
> As an alternative to using formatted strings like this, it could be
> modelled more explicitly in QAPI
> 
>   { 'struct':  'YankChannels',
>     'data': { 'chardev': [ 'string' ],
>               'nbd': ['string'],
> 	      'migration': bool } }
> 
> In this example, 'chardev' would accept a list of chardev IDs which
> have it enabled, 'nbd' would accept a list of block node IDs which
> have it enabled, and migration is a singleton on/off.

Do we already have a QOM object name for each of these things?
Is that nbd/blockdevice unique - i.e. can you have multiple nbd clients
on the same node?

> The benefit of this modelling is that you can introspect QEMU
> to discover what classes of channels support being yanked by
> this QEMU build, as well as what instances are configured to
> be yanked. ie you can distinguish between a QEMU that doesn't
> support yanking network devices, from a QEMU that does support
> yanking network devices, but doesn't have it enabled for any
> network device instances.

What do we need to make it introspectable like that?

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Lukas Straub May 12, 2020, 7:42 p.m. UTC | #13
On Tue, 12 May 2020 10:43:37 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> > ... 
> > 
> > Good Idea. We could name the connections (/yank callbacks) in the
> > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> > (and add "netdev:...", etc. in the future). Then make yank take a
> > list of connection names as you suggest and silently ignore connections
> > that don't exist. And maybe even add a 'query-yank' oob command returning
> > a list of registered connections so the management application can do
> > pattern matching if it wants.  
> 
> Yes, that would make the yank command much more flexible in how it can
> be used.
> 
> As an alternative to using formatted strings like this, it could be
> modelled more explicitly in QAPI
> 
>   { 'struct':  'YankChannels',
>     'data': { 'chardev': [ 'string' ],
>               'nbd': ['string'],
> 	      'migration': bool } }
> 
> In this example, 'chardev' would accept a list of chardev IDs which
> have it enabled, 'nbd' would accept a list of block node IDs which
> have it enabled, and migration is a singleton on/off.

With the new command, the yank feature can then be enabled unconditionally on the instances.

> The benefit of this modelling is that you can introspect QEMU
> to discover what classes of channels support being yanked by
> this QEMU build, as well as what instances are configured to
> be yanked. ie you can distinguish between a QEMU that doesn't
> support yanking network devices, from a QEMU that does support
> yanking network devices, but doesn't have it enabled for any
> network device instances.
> 
> Regards,
> Daniel
Daniel P. Berrangé May 13, 2020, 8:41 a.m. UTC | #14
On Tue, May 12, 2020 at 07:58:17PM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> > > On Mon, 11 May 2020 16:46:45 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > 
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: 
> > > > > ...
> > > > > That way if QEMU does get stuck, you can start by tearing down the
> > > > > least distruptive channel. eg try tearing down the migration connection
> > > > > first (which shouldn't negatively impact the guest), and only if that
> > > > > doesn't work then, move on to tear down the NBD connection (which risks
> > > > > data loss)  
> > > > 
> > > > I wonder if a different way would be to make all network connections
> > > > register with yank, but then make yank take a list of connections to
> > > > shutdown(2).
> > > 
> > > Good Idea. We could name the connections (/yank callbacks) in the
> > > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> > > (and add "netdev:...", etc. in the future). Then make yank take a
> > > list of connection names as you suggest and silently ignore connections
> > > that don't exist. And maybe even add a 'query-yank' oob command returning
> > > a list of registered connections so the management application can do
> > > pattern matching if it wants.
> > 
> > Yes, that would make the yank command much more flexible in how it can
> > be used.
> > 
> > As an alternative to using formatted strings like this, it could be
> > modelled more explicitly in QAPI
> > 
> >   { 'struct':  'YankChannels',
> >     'data': { 'chardev': [ 'string' ],
> >               'nbd': ['string'],
> > 	      'migration': bool } }
> > 
> > In this example, 'chardev' would accept a list of chardev IDs which
> > have it enabled, 'nbd' would accept a list of block node IDs which
> > have it enabled, and migration is a singleton on/off.
> 
> Do we already have a QOM object name for each of these things?
> Is that nbd/blockdevice unique - i.e. can you have multiple nbd clients
> on the same node?
> 
> > The benefit of this modelling is that you can introspect QEMU
> > to discover what classes of channels support being yanked by
> > this QEMU build, as well as what instances are configured to
> > be yanked. ie you can distinguish between a QEMU that doesn't
> > support yanking network devices, from a QEMU that does support
> > yanking network devices, but doesn't have it enabled for any
> > network device instances.
> 
> What do we need to make it introspectable like that?

The model I describe above would work, because you can introspect the
QAPI schema to see what fields are in the "YankChannels" struct. So
if we added a "nic" field later, apps can discover it.

Regards,
Daniel
Kevin Wolf May 13, 2020, 10:32 a.m. UTC | #15
Am 12.05.2020 um 11:43 hat Daniel P. Berrangé geschrieben:
> On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> > On Mon, 11 May 2020 16:46:45 +0100
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > 
> > > * Daniel P. Berrangé (berrange@redhat.com) wrote: 
> > > > ...
> > > > That way if QEMU does get stuck, you can start by tearing down the
> > > > least distruptive channel. eg try tearing down the migration connection
> > > > first (which shouldn't negatively impact the guest), and only if that
> > > > doesn't work then, move on to tear down the NBD connection (which risks
> > > > data loss)  
> > > 
> > > I wonder if a different way would be to make all network connections
> > > register with yank, but then make yank take a list of connections to
> > > shutdown(2).
> > 
> > Good Idea. We could name the connections (/yank callbacks) in the
> > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> > (and add "netdev:...", etc. in the future). Then make yank take a
> > list of connection names as you suggest and silently ignore connections
> > that don't exist. And maybe even add a 'query-yank' oob command returning
> > a list of registered connections so the management application can do
> > pattern matching if it wants.

I'm generally not a big fan of silently ignoring things. Is there a
specific requirement to do it in this case, or can management
applications be expected to know which connections exist?

> Yes, that would make the yank command much more flexible in how it can
> be used.
> 
> As an alternative to using formatted strings like this, it could be
> modelled more explicitly in QAPI
> 
>   { 'struct':  'YankChannels',
>     'data': { 'chardev': [ 'string' ],
>               'nbd': ['string'],
> 	      'migration': bool } }
> 
> In this example, 'chardev' would accept a list of chardev IDs which
> have it enabled, 'nbd' would accept a list of block node IDs which
> have it enabled, and migration is a singleton on/off.

Of course, it also means that the yank code needs to know about every
single object that supports the operation, whereas if you only have
strings, the objects could keep registering their connection with a
generic function like yank_register_function() in this version.

I'm not sure if the additional complexity is worth the benefits.

> The benefit of this modelling is that you can introspect QEMU
> to discover what classes of channels support being yanked by
> this QEMU build, as well as what instances are configured to
> be yanked. ie you can distinguish between a QEMU that doesn't
> support yanking network devices, from a QEMU that does support
> yanking network devices, but doesn't have it enabled for any
> network device instances.

This is true, though I think Lukas' suggestion with query-yank should be
as good in practice (you can't check before creating the NBD device
then, but would you actually want to do this?).

And if all else fails, we can still add a few more feature flags to the
schema...

Kevin
Dr. David Alan Gilbert May 13, 2020, 10:53 a.m. UTC | #16
* Kevin Wolf (kwolf@redhat.com) wrote:
> Am 12.05.2020 um 11:43 hat Daniel P. Berrangé geschrieben:
> > On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> > > On Mon, 11 May 2020 16:46:45 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > 
> > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: 
> > > > > ...
> > > > > That way if QEMU does get stuck, you can start by tearing down the
> > > > > least distruptive channel. eg try tearing down the migration connection
> > > > > first (which shouldn't negatively impact the guest), and only if that
> > > > > doesn't work then, move on to tear down the NBD connection (which risks
> > > > > data loss)  
> > > > 
> > > > I wonder if a different way would be to make all network connections
> > > > register with yank, but then make yank take a list of connections to
> > > > shutdown(2).
> > > 
> > > Good Idea. We could name the connections (/yank callbacks) in the
> > > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> > > (and add "netdev:...", etc. in the future). Then make yank take a
> > > list of connection names as you suggest and silently ignore connections
> > > that don't exist. And maybe even add a 'query-yank' oob command returning
> > > a list of registered connections so the management application can do
> > > pattern matching if it wants.
> 
> I'm generally not a big fan of silently ignoring things. Is there a
> specific requirement to do it in this case, or can management
> applications be expected to know which connections exist?
> 
> > Yes, that would make the yank command much more flexible in how it can
> > be used.
> > 
> > As an alternative to using formatted strings like this, it could be
> > modelled more explicitly in QAPI
> > 
> >   { 'struct':  'YankChannels',
> >     'data': { 'chardev': [ 'string' ],
> >               'nbd': ['string'],
> > 	      'migration': bool } }
> > 
> > In this example, 'chardev' would accept a list of chardev IDs which
> > have it enabled, 'nbd' would accept a list of block node IDs which
> > have it enabled, and migration is a singleton on/off.
> 
> Of course, it also means that the yank code needs to know about every
> single object that supports the operation, whereas if you only have
> strings, the objects could keep registering their connection with a
> generic function like yank_register_function() in this version.
> 
> I'm not sure if the additional complexity is worth the benefits.

I tend to agree; although we do have to ensure we either use an existing
naming scheme (e.g. QOM object names?) or make sure we've got a
well-defined list of prefixes.

Dave

> 
> > The benefit of this modelling is that you can introspect QEMU
> > to discover what classes of channels support being yanked by
> > this QEMU build, as well as what instances are configured to
> > be yanked. ie you can distinguish between a QEMU that doesn't
> > support yanking network devices, from a QEMU that does support
> > yanking network devices, but doesn't have it enabled for any
> > network device instances.
> 
> This is true, though I think Lukas' suggestion with query-yank should be
> as good in practice (you can't check before creating the NBD device
> then, but would you actually want to do this?).
> 
> And if all else fails, we can still add a few more feature flags to the
> schema...
> 
> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Kevin Wolf May 13, 2020, 11:13 a.m. UTC | #17
Am 13.05.2020 um 12:53 hat Dr. David Alan Gilbert geschrieben:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > Am 12.05.2020 um 11:43 hat Daniel P. Berrangé geschrieben:
> > > On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> > > > On Mon, 11 May 2020 16:46:45 +0100
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > 
> > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: 
> > > > > > ...
> > > > > > That way if QEMU does get stuck, you can start by tearing down the
> > > > > > least distruptive channel. eg try tearing down the migration connection
> > > > > > first (which shouldn't negatively impact the guest), and only if that
> > > > > > doesn't work then, move on to tear down the NBD connection (which risks
> > > > > > data loss)  
> > > > > 
> > > > > I wonder if a different way would be to make all network connections
> > > > > register with yank, but then make yank take a list of connections to
> > > > > shutdown(2).
> > > > 
> > > > Good Idea. We could name the connections (/yank callbacks) in the
> > > > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> > > > (and add "netdev:...", etc. in the future). Then make yank take a
> > > > list of connection names as you suggest and silently ignore connections
> > > > that don't exist. And maybe even add a 'query-yank' oob command returning
> > > > a list of registered connections so the management application can do
> > > > pattern matching if it wants.
> > 
> > I'm generally not a big fan of silently ignoring things. Is there a
> > specific requirement to do it in this case, or can management
> > applications be expected to know which connections exist?
> > 
> > > Yes, that would make the yank command much more flexible in how it can
> > > be used.
> > > 
> > > As an alternative to using formatted strings like this, it could be
> > > modelled more explicitly in QAPI
> > > 
> > >   { 'struct':  'YankChannels',
> > >     'data': { 'chardev': [ 'string' ],
> > >               'nbd': ['string'],
> > > 	      'migration': bool } }
> > > 
> > > In this example, 'chardev' would accept a list of chardev IDs which
> > > have it enabled, 'nbd' would accept a list of block node IDs which
> > > have it enabled, and migration is a singleton on/off.
> > 
> > Of course, it also means that the yank code needs to know about every
> > single object that supports the operation, whereas if you only have
> > strings, the objects could keep registering their connection with a
> > generic function like yank_register_function() in this version.
> > 
> > I'm not sure if the additional complexity is worth the benefits.
> 
> I tend to agree; although we do have to ensure we either use an existing
> naming scheme (e.g. QOM object names?) or make sure we've got a
> well-defined list of prefixes.

Not everything that has a network connection is a QOM object (in fact,
neither migration nor chardev nor nbd are QOM objects).

I guess it would be nice to have a single namespace for everything in
QEMU, but the reality is that we have a few separate ones. As long as we
consistently add a prefix that identifies the namespace in question, I
think that would work.

This means that if we're using node-name to identify the NBD connection,
the namespace should be 'block' rather than 'nbd'.

One more thing to consider is, what if a single object has multiple
connections? In the case of node-names, we have a limited set of allowed
characters, so we can use one of the remaining characters as a separator
and then suffix a counter. In other places, the identifier isn't
restricted, so suffixing doesn't work. Maybe prefixing does, but it
would have to be there from the beginning then.
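To make the node-name case concrete: since node-names are restricted to a small character set (roughly letters, digits, '-', '.' and '_'), a character outside that set can act as the separator. A hypothetical helper (the name and format are invented for this sketch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: build a unique yank instance name for the n-th
 * connection belonging to a block node.  '/' cannot appear inside a
 * node-name, so "block:<node-name>/<n>" cannot collide with the name
 * derived from any other node. */
void yank_instance_name(char *buf, size_t len,
                        const char *node_name, unsigned conn)
{
    snprintf(buf, len, "block:%s/%u", node_name, conn);
}
```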

And another thing: Do we really want to document this as limited to
network connections? Another common cause of hangs is when you have
image files on an NFS mount and the connection goes away. Of course, in
the end this is still networking, but inside of QEMU it looks like
accessing any other file. I'm not sure that we'll allow yanking access
to image files anytime soon, but it might not hurt to keep it at the
back of our mind as a potential option we might want the design to
allow.

Kevin
Daniel P. Berrangé May 13, 2020, 11:26 a.m. UTC | #18
On Wed, May 13, 2020 at 01:13:20PM +0200, Kevin Wolf wrote:
> Am 13.05.2020 um 12:53 hat Dr. David Alan Gilbert geschrieben:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > Am 12.05.2020 um 11:43 hat Daniel P. Berrangé geschrieben:
> > > > On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> > > > > On Mon, 11 May 2020 16:46:45 +0100
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > 
> > > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: 
> > > > > > > ...
> > > > > > > That way if QEMU does get stuck, you can start by tearing down the
> > > > > > > least distruptive channel. eg try tearing down the migration connection
> > > > > > > first (which shouldn't negatively impact the guest), and only if that
> > > > > > > doesn't work then, move on to tear down the NBD connection (which risks
> > > > > > > data loss)  
> > > > > > 
> > > > > > I wonder if a different way would be to make all network connections
> > > > > > register with yank, but then make yank take a list of connections to
> > > > > > shutdown(2).
> > > > > 
> > > > > Good Idea. We could name the connections (/yank callbacks) in the
> > > > > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> > > > > (and add "netdev:...", etc. in the future). Then make yank take a
> > > > > list of connection names as you suggest and silently ignore connections
> > > > > that don't exist. And maybe even add a 'query-yank' oob command returning
> > > > > a list of registered connections so the management application can do
> > > > > pattern matching if it wants.
> > > 
> > > I'm generally not a big fan of silently ignoring things. Is there a
> > > specific requirement to do it in this case, or can management
> > > applications be expected to know which connections exist?
> > > 
> > > > Yes, that would make the yank command much more flexible in how it can
> > > > be used.
> > > > 
> > > > As an alternative to using formatted strings like this, it could be
> > > > modelled more explicitly in QAPI
> > > > 
> > > >   { 'struct':  'YankChannels',
> > > >     'data': { 'chardev': [ 'string' ],
> > > >               'nbd': ['string'],
> > > > 	      'migration': bool } }
> > > > 
> > > > In this example, 'chardev' would accept a list of chardev IDs which
> > > > have it enabled, 'nbd' would accept a list of block node IDs which
> > > > have it enabled, and migration is a singleton on/off.
> > > 
> > > Of course, it also means that the yank code needs to know about every
> > > single object that supports the operation, whereas if you only have
> > > strings, the objects could keep registering their connection with a
> > > generic function like yank_register_function() in this version.
> > > 
> > > I'm not sure if the additional complexity is worth the benefits.
> > 
> > I tend to agree; although we do have to ensure we either use an existing
> > naming scheme (e.g. QOM object names?) or make sure we've got a
> > well-defined list of prefixes.
> 
> Not everything that has a network connection is a QOM object (in fact,
> neither migration nor chardev nor nbd are QOM objects).
> 
> I guess it would be nice to have a single namespace for everything in
> QEMU, but the reality is that we have a few separate ones. As long as we
> consistently add a prefix that identifies the namespace in question, I
> think that would work.
> 
> This means that if we're using node-name to identify the NBD connection,
> the namespace should be 'block' rather than 'nbd'.
> 
> One more thing to consider is, what if a single object has multiple
> connections? In the case of node-names, we have a limited set of allowed
> characters, so we can use one of the remaining characters as a separator
> and then suffix a counter. In other places, the identifier isn't
> restricted, so suffixing doesn't work. Maybe prefixing does, but it
> would have to be there from the beginning then.
> 
> And another thing: Do we really want to document this as limited to
> network connections? Another common cause of hangs is when you have
> image files on an NFS mount and the connection goes away. Of course, in
> the end this is still networking, but inside of QEMU it looks like
> accessing any other file. I'm not sure that we'll allow yanking access
> to image files anytime soon, but it might not hurt to keep it at the
> back of our mind as a potential option we might want the design to
> allow.

Are you referring to the in-kernel NFS client hangs here?  AFAIK, it is
impossible to do anything to get out of those hangs from userspace, because
the thread is stuck in an uninterruptible sleep in kernel space.

If using the in-QEMU NFS client, then there is a network connection that
can be yanked just like the NBD client.

Regards,
Daniel
Paolo Bonzini May 13, 2020, 11:58 a.m. UTC | #19
On 13/05/20 13:26, Daniel P. Berrangé wrote:
> Are you referring to the in-kernel NFS client hangs here?  AFAIK, it is
> impossible to do anything to get out of those hangs from userspace, because
> the thread is stuck in an uninterruptible sleep in kernel space.
> 
> If using the in-QEMU NFS client, then there is a network connection that
> can be yanked just like the NBD client.

But it's a bad idea to yank it (and also the NBD client) because you're
not sure which writes have made it to the server (and to the medium) and
which haven't.

Effectively, the in-QEMU NFS client and NBD client are always operating
in "soft" mode, but we should always treat that as a bug (which cannot
be fixed) and not a feature for read-write images.

Paolo
Kevin Wolf May 13, 2020, 12:18 p.m. UTC | #20
Am 13.05.2020 um 13:26 hat Daniel P. Berrangé geschrieben:
> On Wed, May 13, 2020 at 01:13:20PM +0200, Kevin Wolf wrote:
> > Am 13.05.2020 um 12:53 hat Dr. David Alan Gilbert geschrieben:
> > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > Am 12.05.2020 um 11:43 hat Daniel P. Berrangé geschrieben:
> > > > > On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> > > > > > On Mon, 11 May 2020 16:46:45 +0100
> > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > 
> > > > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: 
> > > > > > > > ...
> > > > > > > > That way if QEMU does get stuck, you can start by tearing down the
> > > > > > > > least distruptive channel. eg try tearing down the migration connection
> > > > > > > > first (which shouldn't negatively impact the guest), and only if that
> > > > > > > > doesn't work then, move on to tear down the NBD connection (which risks
> > > > > > > > data loss)  
> > > > > > > 
> > > > > > > I wonder if a different way would be to make all network connections
> > > > > > > register with yank, but then make yank take a list of connections to
> > > > > > > shutdown(2).
> > > > > > 
> > > > > > Good Idea. We could name the connections (/yank callbacks) in the
> > > > > > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> > > > > > (and add "netdev:...", etc. in the future). Then make yank take a
> > > > > > list of connection names as you suggest and silently ignore connections
> > > > > > that don't exist. And maybe even add a 'query-yank' oob command returning
> > > > > > a list of registered connections so the management application can do
> > > > > > pattern matching if it wants.
> > > > 
> > > > I'm generally not a big fan of silently ignoring things. Is there a
> > > > specific requirement to do it in this case, or can management
> > > > applications be expected to know which connections exist?
> > > > 
> > > > > Yes, that would make the yank command much more flexible in how it can
> > > > > be used.
> > > > > 
> > > > > As an alternative to using formatted strings like this, it could be
> > > > > modelled more explicitly in QAPI
> > > > > 
> > > > >   { 'struct':  'YankChannels',
> > > > >     'data': { 'chardev': [ 'string' ],
> > > > >               'nbd': ['string'],
> > > > > 	      'migration': bool } }
> > > > > 
> > > > > In this example, 'chardev' would accept a list of chardev IDs which
> > > > > have it enabled, 'nbd' would accept a list of block node IDs which
> > > > > have it enabled, and migration is a singleton on/off.
> > > > 
> > > > Of course, it also means that the yank code needs to know about every
> > > > single object that supports the operation, whereas if you only have
> > > > strings, the objects could keep registering their connection with a
> > > > generic function like yank_register_function() in this version.
> > > > 
> > > > I'm not sure if the additional complexity is worth the benefits.
> > > 
> > > I tend to agree; although we do have to ensure we either use an existing
> > > naming scheme (e.g. QOM object names?) or make sure we've got a well
> > > defined list of prefixes.
> > 
> > Not everything that has a network connection is a QOM object (in fact,
> > neither migration nor chardev nor nbd are QOM objects).
> > 
> > I guess it would be nice to have a single namespace for everything in
> > QEMU, but the reality is that we have a few separate ones. As long as we
> > consistently add a prefix that identifies the namespace in question, I
> > think that would work.
> > 
> > This means that if we're using node-name to identify the NBD connection,
> > the namespace should be 'block' rather than 'nbd'.
> > 
> > One more thing to consider is, what if a single object has multiple
> > connections? In the case of node-names, we have a limited set of allowed
> > characters, so we can use one of the remaining characters as a separator
> > and then suffix a counter. In other places, the identifier isn't
> > restricted, so suffixing doesn't work. Maybe prefixing does, but it
> > would have to be there from the beginning then.
> > 
> > And another thing: Do we really want to document this as limited to
> > network connections? Another common cause of hangs is when you have
> > image files on an NFS mount and the connection goes away. Of course, in
> > the end this is still networking, but inside of QEMU it looks like
> > accessing any other file. I'm not sure that we'll allow yanking access
> > to image files anytime soon, but it might not hurt to keep it at the
> > back of our mind as a potential option we might want the design to
> > allow.
> 
> Are you referring to the in-kernel NFS client hangs here ?  AFAIK, it is
> impossible to do anything to get out of those hangs from userspace, because
> the thread is stuck in an uninterruptable sleep in kernel space.

I'm not sure if it's always uninterruptible, but yes, in general you
would be sacrificing the block node and some of its worker threads to
get the VM unstuck.

The next thing you should probably do is migrate off to another QEMU
instance without a hanging thread. But at least it would let you migrate
off instead of getting stuck while trying to wait for all running
requests to complete.

> If using the in-QEMU NFS client, then there is a network connection
> that can be yanked just like the NBD client.

Yes, that's the same case as NBD.

Kevin
Dr. David Alan Gilbert May 13, 2020, 12:25 p.m. UTC | #21
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> On 13/05/20 13:26, Daniel P. Berrangé wrote:
> > Are you referring to the in-kernel NFS client hangs here ?  AFAIK, it is
> > impossible to do anything to get out of those hangs from userspace, because
> > the thread is stuck in an uninterruptable sleep in kernel space.
> > 
> > If using the in-QEMU NFS client, then there is a network connection that
> > can be yanked just like the NBD client.
> 
> But it's a bad idea to yank it (and also the NBD client) because you're
> not sure which wites have made it to the server (and to the medium) and
> which haven't.

No, that's OK - if you look at the COLO case, and some other cases,
you've got a dead storage device but your redundant pair might be OK;
so it's OK to yank it.
Other similar storage cases are trying to migrate a VM that has one dead
disk, even if you know and accept it's dead and unresponsive, you often
can't kill it off if the device is hung.

> Effectively, the in-QEMU NFS client and NBD client are always operating
> in "soft" mode, but we should always treat that as a bug (which cannot
> be fixed) and not a feature for read-write images.
Dave

> 
> Paolo
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Kevin Wolf May 13, 2020, 12:32 p.m. UTC | #22
Am 13.05.2020 um 13:58 hat Paolo Bonzini geschrieben:
> On 13/05/20 13:26, Daniel P. Berrangé wrote:
> > Are you referring to the in-kernel NFS client hangs here ?  AFAIK, it is
> > impossible to do anything to get out of those hangs from userspace, because
> > the thread is stuck in an uninterruptable sleep in kernel space.
> > 
> > If using the in-QEMU NFS client, then there is a network connection that
> > can be yanked just like the NBD client.
> 
> But it's a bad idea to yank it (and also the NBD client) because you're
> not sure which writes have made it to the server (and to the medium) and
> which haven't.

This (undefined content) is generally what guests should expect when a
write request returns an error.

> Effectively, the in-QEMU NFS client and NBD client are always operating
> in "soft" mode, but we should always treat that as a bug (which cannot
> be fixed) and not a feature for read-write images.

It certainly means that you can't continue as if nothing had happened.

However, writing to the same area again (and successfully) restores a
valid state, so with werror=stop you would get into a state where you
can later reconnect and continue where you stopped, but the monitor
would stay responsive during that time.

Or if the disk isn't critical for the guest, you could replace the NBD
block node with a node that always returns I/O errors, so the guest
would see a broken disk (actually, just reporting the errors from
yanking to the guest might be enough to make it give up on the disk and
stop sending new requests). Then you can resume the VM right away.

Kevin
Dr. David Alan Gilbert May 13, 2020, 12:56 p.m. UTC | #23
* Kevin Wolf (kwolf@redhat.com) wrote:
> Am 13.05.2020 um 12:53 hat Dr. David Alan Gilbert geschrieben:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > Am 12.05.2020 um 11:43 hat Daniel P. Berrangé geschrieben:
> > > > On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
> > > > > On Mon, 11 May 2020 16:46:45 +0100
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > 
> > > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: 
> > > > > > > ...
> > > > > > > That way if QEMU does get stuck, you can start by tearing down the
> > > > > > > least disruptive channel. e.g. try tearing down the migration connection
> > > > > > > first (which shouldn't negatively impact the guest), and only if that
> > > > > > > doesn't work then, move on to tear down the NBD connection (which risks
> > > > > > > data loss)  
> > > > > > 
> > > > > > I wonder if a different way would be to make all network connections
> > > > > > register with yank, but then make yank take a list of connections to
> > > > > > shutdown(2).
> > > > > 
> > > > > Good Idea. We could name the connections (/yank callbacks) in the
> > > > > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
> > > > > (and add "netdev:...", etc. in the future). Then make yank take a
> > > > > list of connection names as you suggest and silently ignore connections
> > > > > that don't exist. And maybe even add a 'query-yank' oob command returning
> > > > > a list of registered connections so the management application can do
> > > > > pattern matching if it wants.
> > > 
> > > I'm generally not a big fan of silently ignoring things. Is there a
> > > specific requirement to do it in this case, or can management
> > > applications be expected to know which connections exist?
> > > 
> > > > Yes, that would make the yank command much more flexible in how it can
> > > > be used.
> > > > 
> > > > As an alternative to using formatted strings like this, it could be
> > > > modelled more explicitly in QAPI
> > > > 
> > > >   { 'struct':  'YankChannels',
> > > >     'data': { 'chardev': [ 'string' ],
> > > >               'nbd': ['string'],
> > > >               'migration': 'bool' } }
> > > > 
> > > > In this example, 'chardev' would accept a list of chardev IDs which
> > > > have it enabled, 'nbd' would accept a list of block node IDs which
> > > > have it enabled, and migration is a singleton on/off.
> > > 
> > > Of course, it also means that the yank code needs to know about every
> > > single object that supports the operation, whereas if you only have
> > > strings, the objects could keep registering their connection with a
> > > generic function like yank_register_function() in this version.
> > > 
> > > I'm not sure if the additional complexity is worth the benefits.
> > 
> > I tend to agree; although we do have to ensure we either use an existing
> > naming scheme (e.g. QOM object names?) or make sure we've got a well
> > defined list of prefixes.
> 
> Not everything that has a network connection is a QOM object (in fact,
> neither migration nor chardev nor nbd are QOM objects).

Hmm, migrationstate is a qdev object.

> I guess it would be nice to have a single namespace for everything in
> QEMU, but the reality is that we have a few separate ones. As long as we
> consistently add a prefix that identifies the namespace in question, I
> think that would work.

> This means that if we're using node-name to identify the NBD connection,
> the namespace should be 'block' rather than 'nbd'.
> 
> One more thing to consider is, what if a single object has multiple
> connections? In the case of node-names, we have a limited set of allowed
> characters, so we can use one of the remaining characters as a separator
> and then suffix a counter. In other places, the identifier isn't
> restricted, so suffixing doesn't work. Maybe prefixing does, but it
> would have to be there from the beginning then.

Yeh I worry about whether on nbd if you can have multiple nbd
connections to one block device.

> And another thing: Do we really want to document this as limited to
> network connections? Another common cause of hangs is when you have
> image files on an NFS mount and the connection goes away. Of course, in
> the end this is still networking, but inside of QEMU it looks like
> accessing any other file. I'm not sure that we'll allow yanking access
> to image files anytime soon, but it might not hurt to keep it at the
> back of our mind as a potential option we might want the design to
> allow.

Yep.

Dave

> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Daniel P. Berrangé May 13, 2020, 1:08 p.m. UTC | #24
On Wed, May 13, 2020 at 01:56:24PM +0100, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > I guess it would be nice to have a single namespace for everything in
> > QEMU, but the reality is that we have a few separate ones. As long as we
> > consistently add a prefix that identifies the namespace in question, I
> > think that would work.
> 
> > This means that if we're using node-name to identify the NBD connection,
> > the namespace should be 'block' rather than 'nbd'.
> > 
> > One more thing to consider is, what if a single object has multiple
> > connections? In the case of node-names, we have a limited set of allowed
> > characters, so we can use one of the remaining characters as a separator
> > and then suffix a counter. In other places, the identifier isn't
> > restricted, so suffixing doesn't work. Maybe prefixing does, but it
> > would have to be there from the beginning then.
> 
> Yeh I worry about whether on nbd if you can have multiple nbd
> connections to one block device.

The kernel NBD driver now supports multiple parallel connections.
QEMU hasn't implemented this in its NBD code yet, but I certainly
see that being in scope for future.


Regards,
Daniel
Dr. David Alan Gilbert May 13, 2020, 1:48 p.m. UTC | #25
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Wed, May 13, 2020 at 01:56:24PM +0100, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > I guess it would be nice to have a single namespace for everything in
> > > QEMU, but the reality is that we have a few separate ones. As long as we
> > > consistently add a prefix that identifies the namespace in question, I
> > > think that would work.
> > 
> > > This means that if we're using node-name to identify the NBD connection,
> > > the namespace should be 'block' rather than 'nbd'.
> > > 
> > > One more thing to consider is, what if a single object has multiple
> > > connections? In the case of node-names, we have a limited set of allowed
> > > characters, so we can use one of the remaining characters as a separator
> > > and then suffix a counter. In other places, the identifier isn't
> > > restricted, so suffixing doesn't work. Maybe prefixing does, but it
> > > would have to be there from the beginning then.
> > 
> > Yeh I worry about whether on nbd if you can have multiple nbd
> > connections to one block device.
> 
> The kernel NBD driver now supports multiple parallel connections.
> QEMU hasn't implemented this in its NBD code yet, but I certainly
> see that being in scope for future.

It's not parallel for performance that worries me, it's more about
separate connections for separate uses - e.g. if we're serving the same
read-only disk to multiple separate things.

Dave
 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Eric Blake May 13, 2020, 1:57 p.m. UTC | #26
On 5/13/20 8:48 AM, Dr. David Alan Gilbert wrote:

>>> Yeh I worry about whether on nbd if you can have multiple nbd
>>> connections to one block device.
>>
>> The kernel NBD driver now supports multiple parallel connections.
>> QEMU hasn't implemented this in its NBD code yet, but I certainly
>> see that being in scope for future.
> 
> It's not parallel for performance that worries me, it's more about
> separate connections for separate uses - e.g. if we're serving the same
> read-only disk to multiple separate things.

qemu allows multiple simultaneous NBD clients, but does not promise 
cache consistency between them.  As long as all the clients are 
read-only, you can't tell the difference (and thus, for read-only 
connections, qemu even advertises NBD_FLAG_CAN_MULTI_CONN per the NBD 
spec).  See also 'qemu-nbd -e' for controlling how many clients qemu-nbd 
will permit at once (when qemu proper is serving NBD, we don't currently 
expose a knob over QMP to control client count, but instead just blindly 
allow multiple connections).  But as soon as one of the clients wants to 
write, that is where we would need to improve code before making it 
safe, so there, we do not advertise MULTI_CONN support.

As for cases with multiple simultaneous clients: we already have those. 
In the case of incremental backups with an NBD export, my demo code for 
copying out the incremental backup involved two parallel NBD clients - 
one using 'qemu-img map' to probe which portions of the image were 
dirty, and a second client actually performing the reads.  But 
incremental backups are read-only situations (the clients are not 
modifying qemu state).
Kevin Wolf May 13, 2020, 2:06 p.m. UTC | #27
Am 13.05.2020 um 15:48 hat Dr. David Alan Gilbert geschrieben:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Wed, May 13, 2020 at 01:56:24PM +0100, Dr. David Alan Gilbert wrote:
> > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > I guess it would be nice to have a single namespace for everything in
> > > > QEMU, but the reality is that we have a few separate ones. As long as we
> > > > consistently add a prefix that identifies the namespace in question, I
> > > > think that would work.
> > > 
> > > > This means that if we're using node-name to identify the NBD connection,
> > > > the namespace should be 'block' rather than 'nbd'.
> > > > 
> > > > One more thing to consider is, what if a single object has multiple
> > > > connections? In the case of node-names, we have a limited set of allowed
> > > > characters, so we can use one of the remaining characters as a separator
> > > > and then suffix a counter. In other places, the identifier isn't
> > > > restricted, so suffixing doesn't work. Maybe prefixing does, but it
> > > > would have to be there from the beginning then.
> > > 
> > > Yeh I worry about whether on nbd if you can have multiple nbd
> > > connections to one block device.
> > 
> > The kernel NBD driver now supports multiple parallel connections.
> > QEMU hasn't implemented this in its NBD code yet, but I certainly
> > see that being in scope for future.
> 
> It's not parallel for performance that worries me, it's more about
> separate connections for separate uses - e.g. if we're serving the same
> read-only disk to multiple separate things.

That would be a concern for the NBD server. I'm not sure if anything in
QEMU ever waits for NBD servers (except for the client on the other side
of the connection, of course), so there might be no use case for yanking
their connections.

Anyway, here we were talking about the NBD client, which always accesses
one disk. If you access a second disk, you have a second NBD block node.

Kevin
Eric Blake May 13, 2020, 2:18 p.m. UTC | #28
On 5/13/20 9:06 AM, Kevin Wolf wrote:

>>>>> One more thing to consider is, what if a single object has multiple
>>>>> connections? In the case of node-names, we have a limited set of allowed
>>>>> characters, so we can use one of the remaining characters as a separator
>>>>> and then suffix a counter. In other places, the identifier isn't
>>>>> restricted, so suffixing doesn't work. Maybe prefixing does, but it
>>>>> would have to be there from the beginning then.
>>>>
>>>> Yeh I worry about whether on nbd if you can have multiple nbd
>>>> connections to one block device.
>>>
>>> The kernel NBD driver now supports multiple parallel connections.
>>> QEMU hasn't implemented this in its NBD code yet, but I certainly
>>> see that being in scope for future.
>>
>> It's not parallel for performance that worries me, it's more about
>> separate connections for separate uses - e.g. if we're serving the same
>> read-only disk to multiple separate things.
> 
> That would be a concern for the NBD server. I'm not sure if anything in
> QEMU ever waits for NBD servers (except for the client on the other side
> of the connection, of course), so there might be no use case for yanking
> their connections.
> 
> Anyway, here we were talking about the NBD client, which always accesses
> one disk. If you access a second disk, you have a second NBD block node.

Ah, right, that's the other direction.  No, we do not currently support 
a single qemu block node backed by multiple NBD clients to one (or more, 
if they are serving identical content in a failover scenario) NBD 
servers.  Performance could indeed be potentially improved by doing 
that, but for now, every time qemu behaves as a new NBD client, it is 
via an additional block node.
Lukas Straub May 13, 2020, 7:12 p.m. UTC | #29
Terminology:
instance = one (nbd) blockdev/one chardev/the single migrationstate
connection = one TCP connection

Hello Everyone,
Having read all the comments, here is proposal v2:
Every instance registers itself with a unique name in the form "blockdev:<node-name>", "chardev:<chardev-name>" and "migration" using yank_register_instance which will do some sanity checks like checking if the same name exists already. Then (multiple) yank functions can be registered as needed with that single name. When the instance exits/is removed, it unregisters all yank functions and unregisters its name with yank_unregister_instance which will check if all yank functions were unregistered.
Every instance that supports the yank feature will register itself and the yank functions unconditionally (No extra 'yank' option per instance).
The 'query-yank' oob qmp command lists the names of all registered instances.
The 'yank' oob qmp command takes a list of names and for every name calls all yank functions registered with that name. Before doing anything, it will check that all names exist.
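As a rough illustration only (this is not the actual patches; the type name, the fixed-size arrays, and the return conventions are all assumptions), the registry described above could be sketched in C like this:

```c
#include <stdbool.h>
#include <string.h>

#define MAX_INSTANCES 16
#define MAX_FUNCS 4

typedef void (*YankFn)(void *opaque);

/* One yankable instance: a unique name plus its yank functions. */
typedef struct {
    const char *name;
    YankFn funcs[MAX_FUNCS];
    void *opaques[MAX_FUNCS];
    int nfuncs;
    bool used;
} YankEntry;

static YankEntry instances[MAX_INSTANCES];

static YankEntry *yank_find(const char *name)
{
    for (int i = 0; i < MAX_INSTANCES; i++) {
        if (instances[i].used && !strcmp(instances[i].name, name)) {
            return &instances[i];
        }
    }
    return NULL;
}

/* Sanity check: refuse duplicate names. */
bool yank_register_instance(const char *name)
{
    if (yank_find(name)) {
        return false;
    }
    for (int i = 0; i < MAX_INSTANCES; i++) {
        if (!instances[i].used) {
            instances[i] = (YankEntry){ .name = name, .used = true };
            return true;
        }
    }
    return false;
}

/* Attach a yank function to an already registered instance name. */
bool yank_register_function(const char *name, YankFn fn, void *opaque)
{
    YankEntry *e = yank_find(name);

    if (!e || e->nfuncs == MAX_FUNCS) {
        return false;
    }
    e->funcs[e->nfuncs] = fn;
    e->opaques[e->nfuncs] = opaque;
    e->nfuncs++;
    return true;
}

/* The 'yank' command: call every function registered for the name. */
bool yank_instance(const char *name)
{
    YankEntry *e = yank_find(name);

    if (!e) {
        return false;
    }
    for (int i = 0; i < e->nfuncs; i++) {
        e->funcs[i](e->opaques[i]);
    }
    return true;
}
```

Note that yank_register_function() deliberately fails when the name was never registered, matching the sanity checks described above; a real implementation would also need locking so the yank path stays safe for OOB execution.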

If the instance has multiple connections (say, migration with multifd), I don't think it makes much sense to just shut down one connection. Calling 'yank' on an instance will always shut down all connections of that instance.

Yank functions are generic and in no way limited to connections. Say, if migration is started to an 'exec:' address, migration could register a yank function to kill that external command on yank (Won't be implemented yet though).
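A hedged sketch of what such a non-connection yank function might look like; migration_exec_yank() and the tracked child pid are hypothetical names, since this is explicitly not implemented yet:

```c
#include <signal.h>
#include <sys/types.h>

/* Hypothetical yank function for an 'exec:' migration address: the
 * opaque pointer is assumed to hold the pid of the external command
 * started for the migration. */
static void migration_exec_yank(void *opaque)
{
    pid_t pid = *(pid_t *)opaque;

    if (pid > 0) {
        /* Forcefully terminate the hung external command; the normal
         * migration error paths can then run to completion. */
        kill(pid, SIGKILL);
    }
}
```

This would be registered with the "migration" instance name just like a socket-shutdown callback.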

Regards,
Lukas Straub
Kevin Wolf May 14, 2020, 10:41 a.m. UTC | #30
Am 13.05.2020 um 21:12 hat Lukas Straub geschrieben:
> Terminology:
> instance = one (nbd) blockdev/one chardev/the single migrationstate
> connection = one TCP connection
> 
> Hello Everyone,
> Having read all the comments, here is proposal v2:

Looks quite good to me.

> Every instance registers itself with a unique name in the form
> "blockdev:<node-name>", "chardev:<chardev-name>" and "migration" using
> yank_register_instance which will do some sanity checks like checking
> if the same name exists already. Then (multiple) yank functions can be
> registered as needed with that single name. When the instance exits/is
> removed, it unregisters all yank functions and unregisters its name
> with yank_unregister_instance which will check if all yank functions
> were unregistered.

Feels like nitpicking, but you say that functions are registered with a
name that you have previously registered. If you mean literally passing
a string, could this lead to callers forgetting to register it first?

Maybe yank_register_instance() should return something like a
YankInstance object that must be passed to yank_register_function().
Then you would probably also have a list of all functions registered for
a single instance so that yank_unregister_instance() could automatically
remove all of them instead of requiring the instance to do so.

> Every instance that supports the yank feature will register itself and
> the yank functions unconditionally (No extra 'yank' option per
> instance).
> The 'query-yank' oob qmp command lists the names of all registered
> instances.
> The 'yank' oob qmp command takes a list of names and for every name
> calls all yank functions registered with that name. Before doing
> anything, it will check that all names exist.
> 
> If the instance has multiple connections (say, migration with
> multifd), I don't think it makes much sense to just shut down one
> connection. Calling 'yank' on an instance will always shut down all
> connections of that instance.

I think it depends. If shutting down one connection basically kills the
functionality of the whole instance, there is no reason not to kill all
connections. But if the instance could continue working on the second
connection, maybe it shouldn't be killed.

Anyway, we can extend things as needed later. I already mentioned
elsewhere in this thread that block node-names have a restricted set of
allowed character, so having a suffix to distinguish multiple yankable
things is possible. I just checked chardev and it also restricts the
allowed set of characters, so the same applies. 'migration' is only a
fixed string, so it's trivially extensible.

So we know a path forward if we ever need to yank individual
connections, which is good enough for me.

> Yank functions are generic and in no way limited to connections. Say,
> if migration is started to an 'exec:' address, migration could
> register a yank function to kill that external command on yank (Won't
> be implemented yet though).

Yes, this makes sense as a potential future use case.

Kevin
Dr. David Alan Gilbert May 14, 2020, 11:01 a.m. UTC | #31
* Lukas Straub (lukasstraub2@web.de) wrote:
> Terminology:
> instance = one (nbd) blockdev/one chardev/the single migrationstate
> connection = one TCP connection
> 
> Hello Everyone,
> Having read all the comments, here is proposal v2:
> Every instance registers itself with a unique name in the form "blockdev:<node-name>", "chardev:<chardev-name>" and "migration" using yank_register_instance which will do some sanity checks like checking if the same name exists already. Then (multiple) yank functions can be registered as needed with that single name. When the instance exits/is removed, it unregisters all yank functions and unregisters its name with yank_unregister_instance which will check if all yank functions were unregistered.
> Every instance that supports the yank feature will register itself and the yank functions unconditionally (No extra 'yank' option per instance).
> The 'query-yank' oob qmp command lists the names of all registered instances.
> The 'yank' oob qmp command takes a list of names and for every name calls all yank functions registered with that name. Before doing anything, it will check that all names exist.
> 
> If the instance has multiple connections (say, migration with multifd), I don't think it makes much sense to just shut down one connection. Calling 'yank' on an instance will always shut down all connections of that instance.

Agreed.

> Yank functions are generic and in no way limited to connections. Say, if migration is started to an 'exec:' address, migration could register a yank function to kill that external command on yank (Won't be implemented yet though).

One thing we need to be careful about is that the yank functions stay suitable
for OOB calling.

Dave

> Regards,
> Lukas Straub


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Markus Armbruster Sept. 1, 2020, 10:35 a.m. UTC | #32
Kevin Wolf <kwolf@redhat.com> writes:

> Am 13.05.2020 um 12:53 hat Dr. David Alan Gilbert geschrieben:
>> * Kevin Wolf (kwolf@redhat.com) wrote:
>> > Am 12.05.2020 um 11:43 hat Daniel P. Berrangé geschrieben:
>> > > On Tue, May 12, 2020 at 11:32:06AM +0200, Lukas Straub wrote:
>> > > > On Mon, 11 May 2020 16:46:45 +0100
>> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>> > > > 
>> > > > > * Daniel P. Berrangé (berrange@redhat.com) wrote: 
>> > > > > > ...
>> > > > > > That way if QEMU does get stuck, you can start by tearing down the
>> > > > > > least disruptive channel. e.g. try tearing down the migration connection
>> > > > > > first (which shouldn't negatively impact the guest), and only if that
>> > > > > > doesn't work then, move on to tear down the NBD connection (which risks
>> > > > > > data loss)  
>> > > > > 
>> > > > > I wonder if a different way would be to make all network connections
>> > > > > register with yank, but then make yank take a list of connections to
>> > > > > shutdown(2).
>> > > > 
>> > > > Good Idea. We could name the connections (/yank callbacks) in the
>> > > > form "nbd:<node-name>", "chardev:<chardev-name>" and "migration"
>> > > > (and add "netdev:...", etc. in the future). Then make yank take a
>> > > > list of connection names as you suggest and silently ignore connections
>> > > > that don't exist. And maybe even add a 'query-yank' oob command returning
>> > > > a list of registered connections so the management application can do
>> > > > pattern matching if it wants.
[...]
>> > > Yes, that would make the yank command much more flexible in how it can
>> > > be used.
>> > > 
>> > > As an alternative to using formatted strings like this, it could be
>> > > modelled more explicitly in QAPI
>> > > 
>> > >   { 'struct':  'YankChannels',
>> > >     'data': { 'chardev': [ 'string' ],
>> > >               'nbd': ['string'],
>> > >               'migration': 'bool' } }
>> > > 
>> > > In this example, 'chardev' would accept a list of chardev IDs which
>> > > have it enabled, 'nbd' would accept a list of block node IDs which
>> > > have it enabled, and migration is a singleton on/off.
>> > 
>> > Of course, it also means that the yank code needs to know about every
>> > single object that supports the operation, whereas if you only have
>> > strings, the objects could keep registering their connection with a
>> > generic function like yank_register_function() in this version.
>> > 
>> > I'm not sure if the additional complexity is worth the benefits.
>> 
>> I tend to agree; although we do have to ensure we either use an existing
>> naming scheme (e.g. QOM object names?) or make sure we've got a well
>> defined list of prefixes.
>
> Not everything that has a network connection is a QOM object (in fact,
> neither migration nor chardev nor nbd are QOM objects).
>
> I guess it would be nice to have a single namespace for everything in
> QEMU, but the reality is that we have a few separate ones. As long as we
> consistently add a prefix that identifies the namespace in question, I
> think that would work.
>
> This means that if we're using node-name to identify the NBD connection,
> the namespace should be 'block' rather than 'nbd'.
>
> One more thing to consider is, what if a single object has multiple
> connections? In the case of node-names, we have a limited set of allowed
> characters, so we can use one of the remaining characters as a separator
> and then suffix a counter. In other places, the identifier isn't
> restricted, so suffixing doesn't work. Maybe prefixing does, but it
> would have to be there from the beginning then.

If I understand it correctly, the argument for encoding the structured
"what to yank" information into a string is ease of implementation.  The
yank core sees only strings, and doesn't care about their contents.
Code registering "yankables" can build strings however it likes, as long
as they're all distinct.  QMP clients need to understand how the strings
are built to be able to yank specific "yankables" (as opposed to blindly
yanking stuff until QEMU gets unstuck).

I disagree with this argument for a number of reasons.

1. Use of strings doesn't reduce complexity, it moves it.

   String:

   * QMP clients may need to parse the strings they get back from
     command query-yank into structured data.

   * QMP clients may need to format structured data into a string for
     command yank.

   * How structured data is parsed from / formatted to a string is
     part of the external interface, and needs to be documented, in
     addition to the data structure.

   * The strings need to be kept backward compatible.  We could
     inadvertently create problems like the one you described above,
     where a format like UNIQUE-PREFIX:ID with an unrestricted ID is not
     extensible.

   * Code providing "yankables" needs to somehow ensure their strings
     are distinct.

   Complex type:

   * The result of query-yank is already structured data, no need for
     QMP clients to parse.

   * The argument of yank is structured data, no need for QMP clients
     to format it into a string first.

   * Documenting just the data structure suffices.

   * Backward compatibility of complex QMP types is a well-understood
     problem.

   * Defining the structure in the QAPI schema as a union trivially
     ensures the union branches are distinct.
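For illustration only, such a union could be written along these lines in the QAPI schema (type and member names are guesses, mirroring Daniel's earlier YankChannels sketch):

```json
{ 'enum': 'YankInstanceType',
  'data': [ 'block-node', 'chardev', 'migration' ] }

{ 'struct': 'YankInstanceBlockNode',
  'data': { 'node-name': 'str' } }

{ 'struct': 'YankInstanceChardev',
  'data': { 'id': 'str' } }

{ 'union': 'YankInstance',
  'base': { 'type': 'YankInstanceType' },
  'discriminator': 'type',
  'data': {
      'block-node': 'YankInstanceBlockNode',
      'chardev': 'YankInstanceChardev' } }
```

The 'migration' branch carries no extra members, so it needs no variant struct; the discriminator alone keeps the branches distinct.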

2. Even with structured yank arguments, the yank core doesn't *need* to
   understand the structure.

   The yank core manages a set of instances.  Each instance associates a
   key with a list of YankFuncAndParam.  This is basically a dictionary.
   All the yank core needs to do with the dictionary keys is looking
   them up.

   The proposed implementation uses a list of instances (which is just
   fine).  All the lookup needs from the keys is an "is equal" relation.

   Checking two QAPI objects for structural equality is admittedly not
   as simple as comparing strings.  It does not, however, require
   understanding their structure.  Two stupid solutions come to mind:

   * Hand-written compare that keeps specifics out of the yank core

     Whatever uses a variant of YankInstance gets to provide a
     comparison function of the variant members.

     Compare the common members (initially just @type, which is the
     discriminator).  Return false if any of them differs.

     Else both instances use the same variant.  Return the value of the
     variant comparison function.

   * Visitors exist

     Convert both YankInstances to QObject, compare with
     qobject_is_equal(), done.
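A sketch of the first solution, with hypothetical type names: the core compares only the common member (the discriminator) and delegates variant members to per-variant compare functions, so it never needs to understand them.

```c
#include <stdbool.h>
#include <string.h>

typedef enum { YANK_BLOCK_NODE, YANK_CHARDEV, YANK_MIGRATION } YankType;

/* Simplified stand-in for the generated QAPI type. */
typedef struct YankInstance {
    YankType type;      /* common member / discriminator */
    const char *id;     /* variant member (node-name or chardev id) */
} YankInstance;

static bool yank_block_node_equal(const YankInstance *a, const YankInstance *b)
{
    return !strcmp(a->id, b->id);
}

static bool yank_chardev_equal(const YankInstance *a, const YankInstance *b)
{
    return !strcmp(a->id, b->id);
}

bool yank_instance_equal(const YankInstance *a, const YankInstance *b)
{
    if (a->type != b->type) {
        return false;   /* common members differ */
    }
    switch (a->type) {  /* same variant: ask its compare function */
    case YANK_BLOCK_NODE:
        return yank_block_node_equal(a, b);
    case YANK_CHARDEV:
        return yank_chardev_equal(a, b);
    case YANK_MIGRATION:
        return true;    /* singleton, no variant members */
    }
    return false;
}
```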

3. In QMP, encoding structured data in strings is anathema :)

[...]