| Message ID | 1524295325-18136-1-git-send-email-wangxinxin.wang@huawei.com (mailing list archive) |
| --- | --- |
| State | New, archived |
* Wang Xin (wangxinxin.wang@huawei.com) wrote:
> If the fd socket peer closed shortly, ppoll may receive a POLLHUP
> event before the expected POLLIN event, and qemu will do nothing
> but goes into an infinite loop of the POLLHUP event.
>
> So, abort the migration if we receive a POLLHUP event.

Hi Wang Xin,
Can you explain how you manage to trigger this case; I've not hit it.

> Signed-off-by: Wang Xin <wangxinxin.wang@huawei.com>
>
> diff --git a/migration/fd.c b/migration/fd.c
> index cd06182..5932c87 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -15,6 +15,7 @@
>   */
>
>  #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>  #include "channel.h"
>  #include "fd.h"
>  #include "monitor/monitor.h"
> @@ -46,6 +47,11 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
>                                               GIOCondition condition,
>                                               gpointer opaque)
>  {
> +    if (condition & G_IO_HUP) {
> +        error_report("The migration peer closed, job abort");
> +        exit(EXIT_FAILURE);
> +    }
> +

OK, I wish we had a nicer way for failing; especially for the
multifd/postcopy recovery worlds where one failed connection might not
be fatal; but I don't see how to do that here.

>      migration_channel_process_incoming(ioc);
>      object_unref(OBJECT(ioc));
>      return G_SOURCE_REMOVE;
> @@ -67,7 +73,7 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
>
>      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-incoming");
>      qio_channel_add_watch(ioc,
> -                          G_IO_IN,
> +                          G_IO_IN | G_IO_HUP,
>                            fd_accept_incoming_migration,
>                            NULL,
>                            NULL);

Dave

> --
> 2.8.1.windows.1
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, Apr 24, 2018 at 06:16:31PM +0100, Dr. David Alan Gilbert wrote:
> * Wang Xin (wangxinxin.wang@huawei.com) wrote:
> > If the fd socket peer closed shortly, ppoll may receive a POLLHUP
> > event before the expected POLLIN event, and qemu will do nothing
> > but goes into an infinite loop of the POLLHUP event.
> >
> > So, abort the migration if we receive a POLLHUP event.
>
> Hi Wang Xin,
> Can you explain how you manage to trigger this case; I've not hit it.
>
> > Signed-off-by: Wang Xin <wangxinxin.wang@huawei.com>
> >
> > diff --git a/migration/fd.c b/migration/fd.c
> > index cd06182..5932c87 100644
> > --- a/migration/fd.c
> > +++ b/migration/fd.c
> > @@ -15,6 +15,7 @@
> >   */
> >
> >  #include "qemu/osdep.h"
> > +#include "qemu/error-report.h"
> >  #include "channel.h"
> >  #include "fd.h"
> >  #include "monitor/monitor.h"
> > @@ -46,6 +47,11 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
> >                                               GIOCondition condition,
> >                                               gpointer opaque)
> >  {
> > +    if (condition & G_IO_HUP) {
> > +        error_report("The migration peer closed, job abort");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
>
> OK, I wish we had a nicer way for failing; especially for the
> multifd/postcopy recovery worlds where one failed connection might not
> be fatal; but I don't see how to do that here.

This doesn't feel right to me.

We have passed in a pre-opened FD to QEMU, and we registered a watch
on it to detect when there is data from the src QEMU that is available
to read. Normally the src will have sent something so we'll get G_IO_IN,
but you're suggesting the client has quit immediately, so we're getting
G_IO_HUP due to end of file.

The migration_channel_process_incoming() method that we pass the ioc
object to will be calling qio_channel_read(ioc) somewhere to try to
read that data.

For QEMU to spin in infinite loop there must be code in the
migration_channel_process_incoming() that is ignoring the return
value of qio_channel_read() in some manner causing it to retry
the read again & again I presume.

Putting this check for G_IO_HUP fixes your immediate problem scenario,
but whatever code was spinning in infinite loop is still broken and
I'd guess it was possible to still trigger the loop. eg by writing
1 single byte and then closing the socket.

So, IMHO this fix is wrong - we need to find the root cause and fix
that, not try to avoid calling the buggy code.

> >      migration_channel_process_incoming(ioc);
> >      object_unref(OBJECT(ioc));
> >      return G_SOURCE_REMOVE;
> > @@ -67,7 +73,7 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
> >
> >      qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-incoming");
> >      qio_channel_add_watch(ioc,
> > -                          G_IO_IN,
> > +                          G_IO_IN | G_IO_HUP,
> >                            fd_accept_incoming_migration,
> >                            NULL,
> >                            NULL);
>
> Dave
>
> > --
> > 2.8.1.windows.1
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Regards,
Daniel
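For illustration only (this is not QEMU code): the pattern Daniel is hypothesising about is a reader that treats a 0/EOF return from a read the same as "no progress, try again". A minimal sketch of that buggy shape, next to a fixed shape that treats EOF as terminal:

/* Illustration only -- not QEMU code.  If a 0 return from read() (EOF,
 * i.e. the peer hung up) is treated the same as "no progress, retry",
 * the loop never terminates once the peer has closed the connection. */
#include <sys/types.h>
#include <unistd.h>

/* Buggy shape: only a negative return stops the loop. */
ssize_t read_exactly_buggy(int fd, char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0) {
            return -1;      /* real error */
        }
        done += n;          /* n == 0 (EOF) never advances: infinite loop */
    }
    return done;
}

/* Fixed shape: EOF is a distinct, terminal condition. */
ssize_t read_exactly(int fd, char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0) {
            return -1;      /* error */
        }
        if (n == 0) {
            break;          /* peer closed: report the short read */
        }
        done += n;
    }
    return done;
}

Later messages in the thread show the actual loop was in the poll dispatch rather than in a read, so this only illustrates the failure mode being discussed at this point.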
On Tue, Apr 24, 2018 at 07:24:05PM +0100, Daniel P. Berrangé wrote:
> On Tue, Apr 24, 2018 at 06:16:31PM +0100, Dr. David Alan Gilbert wrote:
> > * Wang Xin (wangxinxin.wang@huawei.com) wrote:
> > > If the fd socket peer closed shortly, ppoll may receive a POLLHUP
> > > event before the expected POLLIN event, and qemu will do nothing
> > > but goes into an infinite loop of the POLLHUP event.
> > >
> > > So, abort the migration if we receive a POLLHUP event.
> >
> > Hi Wang Xin,
> > Can you explain how you manage to trigger this case; I've not hit it.
> >
> > > Signed-off-by: Wang Xin <wangxinxin.wang@huawei.com>
> > >
> > > diff --git a/migration/fd.c b/migration/fd.c
> > > index cd06182..5932c87 100644
> > > --- a/migration/fd.c
> > > +++ b/migration/fd.c
> > > @@ -15,6 +15,7 @@
> > >   */
> > >
> > >  #include "qemu/osdep.h"
> > > +#include "qemu/error-report.h"
> > >  #include "channel.h"
> > >  #include "fd.h"
> > >  #include "monitor/monitor.h"
> > > @@ -46,6 +47,11 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
> > >                                               GIOCondition condition,
> > >                                               gpointer opaque)
> > >  {
> > > +    if (condition & G_IO_HUP) {
> > > +        error_report("The migration peer closed, job abort");
> > > +        exit(EXIT_FAILURE);
> > > +    }
> > > +
> >
> > OK, I wish we had a nicer way for failing; especially for the
> > multifd/postcopy recovery worlds where one failed connection might not
> > be fatal; but I don't see how to do that here.
>
> This doesn't feel right to me.
>
> We have passed in a pre-opened FD to QEMU, and we registered a watch
> on it to detect when there is data from the src QEMU that is available
> to read. Normally the src will have sent something so we'll get G_IO_IN,
> but you're suggesting the client has quit immediately, so we're getting
> G_IO_HUP due to end of file.
>
> The migration_channel_process_incoming() method that we pass the ioc
> object to will be calling qio_channel_read(ioc) somewhere to try to
> read that data.
>
> For QEMU to spin in infinite loop there must be code in the
> migration_channel_process_incoming() that is ignoring the return
> value of qio_channel_read() in some manner causing it to retry
> the read again & again I presume.
>
> Putting this check for G_IO_HUP fixes your immediate problem scenario,
> but whatever code was spinning in infinite loop is still broken and
> I'd guess it was possible to still trigger the loop. eg by writing
> 1 single byte and then closing the socket.
>
> So, IMHO this fix is wrong - we need to find the root cause and fix
> that, not try to avoid calling the buggy code.

I agree. AFAIU the first read should be in qemu_loadvm_state():

    v = qemu_get_be32(f);
    if (v != QEMU_VM_FILE_MAGIC) {
        error_report("Not a migration stream");
        return -EINVAL;
    }

So I would be curious more about how that infinite loop happened.
On Wed, Apr 25, 2018 at 11:14:23AM +0800, Peter Xu wrote:
> On Tue, Apr 24, 2018 at 07:24:05PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Apr 24, 2018 at 06:16:31PM +0100, Dr. David Alan Gilbert wrote:
> > > * Wang Xin (wangxinxin.wang@huawei.com) wrote:
> > > > If the fd socket peer closed shortly, ppoll may receive a POLLHUP
> > > > event before the expected POLLIN event, and qemu will do nothing
> > > > but goes into an infinite loop of the POLLHUP event.
> > > >
> > > > So, abort the migration if we receive a POLLHUP event.
> > >
> > > Hi Wang Xin,
> > > Can you explain how you manage to trigger this case; I've not hit it.
> > >
> > > > Signed-off-by: Wang Xin <wangxinxin.wang@huawei.com>
> > > >
> > > > diff --git a/migration/fd.c b/migration/fd.c
> > > > index cd06182..5932c87 100644
> > > > --- a/migration/fd.c
> > > > +++ b/migration/fd.c
> > > > @@ -15,6 +15,7 @@
> > > >   */
> > > >
> > > >  #include "qemu/osdep.h"
> > > > +#include "qemu/error-report.h"
> > > >  #include "channel.h"
> > > >  #include "fd.h"
> > > >  #include "monitor/monitor.h"
> > > > @@ -46,6 +47,11 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
> > > >                                               GIOCondition condition,
> > > >                                               gpointer opaque)
> > > >  {
> > > > +    if (condition & G_IO_HUP) {
> > > > +        error_report("The migration peer closed, job abort");
> > > > +        exit(EXIT_FAILURE);
> > > > +    }
> > > > +
> > >
> > > OK, I wish we had a nicer way for failing; especially for the
> > > multifd/postcopy recovery worlds where one failed connection might not
> > > be fatal; but I don't see how to do that here.
> >
> > This doesn't feel right to me.
> >
> > We have passed in a pre-opened FD to QEMU, and we registered a watch
> > on it to detect when there is data from the src QEMU that is available
> > to read. Normally the src will have sent something so we'll get G_IO_IN,
> > but you're suggesting the client has quit immediately, so we're getting
> > G_IO_HUP due to end of file.
> >
> > The migration_channel_process_incoming() method that we pass the ioc
> > object to will be calling qio_channel_read(ioc) somewhere to try to
> > read that data.
> >
> > For QEMU to spin in infinite loop there must be code in the
> > migration_channel_process_incoming() that is ignoring the return
> > value of qio_channel_read() in some manner causing it to retry
> > the read again & again I presume.
> >
> > Putting this check for G_IO_HUP fixes your immediate problem scenario,
> > but whatever code was spinning in infinite loop is still broken and
> > I'd guess it was possible to still trigger the loop. eg by writing
> > 1 single byte and then closing the socket.
> >
> > So, IMHO this fix is wrong - we need to find the root cause and fix
> > that, not try to avoid calling the buggy code.
>
> I agree. AFAIU the first read should be in qemu_loadvm_state():
>
>     v = qemu_get_be32(f);
>     if (v != QEMU_VM_FILE_MAGIC) {
>         error_report("Not a migration stream");
>         return -EINVAL;
>     }
>
> So I would be curious more about how that infinite loop happened.

Ah, wait. I just noticed that Xin mentioned about the loop already -
it's an infinite loop of SIGHUP. I suppose it means that we'll just
never go into fd_accept_incoming_migration() at all?

If so, I'm not sure whether we should just always watch on G_IO_HUP
(and possibly G_IO_ERR too) in qio_channel_create_watch():

    GSource *ret = klass->io_create_watch(ioc, condition | G_IO_HUP |
                                          G_IO_ERR);

Otherwise I'm not sure the same loop will happen for other users of
qio_channel_add_watch().

Regards,
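A rough sketch of the shape Peter is suggesting; the function body below is a simplified stand-in built around the line he quotes, not the actual io/channel.c implementation:

/* Simplified stand-in for qio_channel_create_watch(), showing only the
 * condition handling Peter quotes above: the channel layer ORs hang-up
 * and error into whatever the caller requested, so a closed peer always
 * gets the callback dispatched instead of leaving the source permanently
 * ready but never run. */
GSource *qio_channel_create_watch(QIOChannel *ioc,
                                  GIOCondition condition)
{
    QIOChannelClass *klass = QIO_CHANNEL_GET_CLASS(ioc);

    return klass->io_create_watch(ioc, condition | G_IO_HUP | G_IO_ERR);
}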
> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Wednesday, April 25, 2018 11:32 AM
> To: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>; wangxin (U)
> <wangxinxin.wang@huawei.com>; qemu-devel@nongnu.org;
> quintela@redhat.com; Gonglei (Arei) <arei.gonglei@huawei.com>
> Subject: Re: [PATCH] migration/fd: abort migration if receive POLLHUP event
>
> On Wed, Apr 25, 2018 at 11:14:23AM +0800, Peter Xu wrote:
> > On Tue, Apr 24, 2018 at 07:24:05PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Apr 24, 2018 at 06:16:31PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Wang Xin (wangxinxin.wang@huawei.com) wrote:
> > > > > If the fd socket peer closed shortly, ppoll may receive a POLLHUP
> > > > > event before the expected POLLIN event, and qemu will do nothing
> > > > > but goes into an infinite loop of the POLLHUP event.
> > > > >
> > > > > So, abort the migration if we receive a POLLHUP event.
> > > >
> > > > Hi Wang Xin,
> > > > Can you explain how you manage to trigger this case; I've not hit it.

Hi David,

I haven't found a way of stable reproduction. It happened when I tried to use
a libvirt command to restore a VM from a remote image, and the network broke
unexpectedly.

virsh restore -------> libvirtd
                     run /      \ pass fd
Remote image <----> libvirt_iohelper <--pipe--> qemu -migration

Here are some key logs:

1) virsh restore /opt/galax/mnt/hibernating/db898219-1cde-45bf-b46e-4e439aa2d573/db898219-1cde-45bf-b46e-4e439aa2d573.img --bypass-cache

2018-04-17T09:45:04.128979+08:00|info|libvirtd[4834]|[5098]|virDomainRestoreFlags[1140]|: enter virDomainRestoreFlags from=/opt/galax/mnt/hibernating/db898219-1cde-45bf-b46e-4e439aa2d573/db898219-1cde-45bf-b46e-4e439aa2d573.img, flags=3

2) Some qemu log we caught:

2018-04-17T09:45:06.111682+08:00|info|qemu[1628]|[1628]|do_qmp_dispatch[109]|: qmp_cmd_name: migrate-incoming, arguments: {"uri": "fd:37"}
2018-04-17T09:45:06.111735+08:00|info|qemu[1628]|[1628]|monitor_qapi_event_emit[423]|: {"timestamp": {"seconds": 1523929506, "microseconds": 111706}, "event": "MIGRATION", "data": {"status": "setup"}}
2018-04-17T09:45:06.112353+08:00|info|qemu[1628]|[1628]|monitor_qapi_event_emit[423]|: {"timestamp": {"seconds": 1523929506, "microseconds": 112317}, "event": "MIGRATION", "data": {"status": "active"}}
... then nothing print

3) Remote storage broken, process libvirt_iohelper do error exit.
The system key messages as below:

2018-04-17T09:45:04.298085+08:00|info|systemd-machined[-]|New machine qemu-1-i-0000023F.
2018-04-17T09:45:04.298547+08:00|info|systemd[-]|Started Virtual Machine qemu-1-i-0000023F.
2018-04-17T09:45:04.298918+08:00|info|systemd[-]|Starting Virtual Machine qemu-1-i-0000023F.
2018-04-17T09:45:06.188259+08:00|info|kernel[-]|[55369.414640] connection6:0: detected conn error (1020)  // Remote storage broken, libvirt_iohelper do error exit
2018-04-17T09:45:06.198229+08:00|info|kernel[-]|[55369.423977] connection7:0: detected conn error (1020)
2018-04-17T09:45:06.776992+08:00|warning|iscsid[-]|Kernel reported iSCSI connection 6:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
2018-04-17T09:45:06.777200+08:00|warning|iscsid[-]|Kernel reported iSCSI connection 7:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)

4) Qemu's ppoll() exits frequently because of the POLLHUP event

(gdb) p ((GPollFD *)gpollfds->data)[8]
$117 = {fd = 37, events = 1, revents = 16}

> > > > >
> > > > > Signed-off-by: Wang Xin <wangxinxin.wang@huawei.com>
> > > > >
> > > > > diff --git a/migration/fd.c b/migration/fd.c
> > > > > index cd06182..5932c87 100644
> > > > > --- a/migration/fd.c
> > > > > +++ b/migration/fd.c
> > > > > @@ -15,6 +15,7 @@
> > > > >   */
> > > > >
> > > > >  #include "qemu/osdep.h"
> > > > > +#include "qemu/error-report.h"
> > > > >  #include "channel.h"
> > > > >  #include "fd.h"
> > > > >  #include "monitor/monitor.h"
> > > > > @@ -46,6 +47,11 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
> > > > >                                               GIOCondition condition,
> > > > >                                               gpointer opaque)
> > > > >  {
> > > > > +    if (condition & G_IO_HUP) {
> > > > > +        error_report("The migration peer closed, job abort");
> > > > > +        exit(EXIT_FAILURE);
> > > > > +    }
> > > > > +
> > > >
> > > > OK, I wish we had a nicer way for failing; especially for the
> > > > multifd/postcopy recovery worlds where one failed connection might not
> > > > be fatal; but I don't see how to do that here.

Any suggestion? Closing the fd is the normal way to handle the G_IO_HUP event.

> > >
> > > This doesn't feel right to me.
> > >
> > > We have passed in a pre-opened FD to QEMU, and we registered a watch
> > > on it to detect when there is data from the src QEMU that is available
> > > to read. Normally the src will have sent something so we'll get G_IO_IN,
> > > but you're suggesting the client has quit immediately, so we're getting
> > > G_IO_HUP due to end of file.
> > >
> > > The migration_channel_process_incoming() method that we pass the ioc
> > > object to will be calling qio_channel_read(ioc) somewhere to try to
> > > read that data.
> > >
> > > For QEMU to spin in infinite loop there must be code in the
> > > migration_channel_process_incoming() that is ignoring the return
> > > value of qio_channel_read() in some manner causing it to retry
> > > the read again & again I presume.
> > >
> > > Putting this check for G_IO_HUP fixes your immediate problem scenario,
> > > but whatever code was spinning in infinite loop is still broken and
> > > I'd guess it was possible to still trigger the loop. eg by writing
> > > 1 single byte and then closing the socket.
> > >
> > > So, IMHO this fix is wrong - we need to find the root cause and fix
> > > that, not try to avoid calling the buggy code.
> >
> > I agree. AFAIU the first read should be in qemu_loadvm_state():
> >
> >     v = qemu_get_be32(f);
> >     if (v != QEMU_VM_FILE_MAGIC) {
> >         error_report("Not a migration stream");
> >         return -EINVAL;
> >     }
> >
> > So I would be curious more about how that infinite loop happened.
>
> Ah, wait. I just noticed that Xin mentioned about the loop already -
> it's an infinite loop of SIGHUP. I suppose it means that we'll just
> never go into fd_accept_incoming_migration() at all?

Yeah, that's what I want to fix.
>
> If so, I'm not sure whether we should just always watch on G_IO_HUP
> (and possibly G_IO_ERR too) in qio_channel_create_watch():
>
>     GSource *ret = klass->io_create_watch(ioc, condition | G_IO_HUP |
>                                           G_IO_ERR);
>
> Otherwise I'm not sure the same loop will happen for other users of
> qio_channel_add_watch().
>

Hi Daniel and Peter,

In my scenario, it's clear the client quit immediately and we
never got a POLLIN event, otherwise,
the watch should be unregistered when POLLIN event coming.
As Daniel said, normally G_IO_IN will be the first event, we
need to find why POLLIN event never happened, I'll try it.

> Regards,
> --
> Peter Xu
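A minimal standalone reproducer consistent with the gdb dump above (events = 1 is POLLIN, revents = 16 is POLLHUP), assuming the incoming fd is the read end of a pipe, as in the libvirt_iohelper <--pipe--> qemu setup. On Linux, an empty pipe whose write end has been closed reports POLLHUP without POLLIN, so a watch registered only for G_IO_IN never fires while the level-triggered poll keeps returning immediately:

/* Illustration only, not QEMU code: poll() a pipe read end whose writer
 * has already closed without writing anything.  The kernel reports
 * POLLHUP even though only POLLIN was requested, and it does so on every
 * call, so an event loop that dispatches handlers only for requested
 * events wakes up continuously without ever running the callback. */
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2];

    if (pipe(fds) < 0) {
        perror("pipe");
        return 1;
    }
    close(fds[1]);                      /* the peer "quits" without sending data */

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int n = poll(&pfd, 1, -1);          /* returns immediately */

    printf("poll() = %d, revents = 0x%x (POLLIN = 0x%x, POLLHUP = 0x%x)\n",
           n, pfd.revents, POLLIN, POLLHUP);
    /* Typical output on Linux: poll() = 1, revents = 0x10, i.e. POLLHUP
     * only -- matching the gdb dump (events = 1, revents = 16). */
    close(fds[0]);
    return 0;
}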
On Wed, Apr 25, 2018 at 07:29:05AM +0000, wangxin (U) wrote:
>
> > -----Original Message-----
> > From: Peter Xu [mailto:peterx@redhat.com]
> > Sent: Wednesday, April 25, 2018 11:32 AM
> > To: Daniel P. Berrangé <berrange@redhat.com>
> > Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>; wangxin (U)
> > <wangxinxin.wang@huawei.com>; qemu-devel@nongnu.org;
> > quintela@redhat.com; Gonglei (Arei) <arei.gonglei@huawei.com>
> > Subject: Re: [PATCH] migration/fd: abort migration if receive POLLHUP event
> >
> > Ah, wait. I just noticed that Xin mentioned about the loop already -
> > it's an infinite loop of SIGHUP. I suppose it means that we'll just
> > never go into fd_accept_incoming_migration() at all?
>
> Yeah, that's what I want to fix.
>
> >
> > If so, I'm not sure whether we should just always watch on G_IO_HUP
> > (and possibly G_IO_ERR too) in qio_channel_create_watch():
> >
> >     GSource *ret = klass->io_create_watch(ioc, condition | G_IO_HUP |
> >                                           G_IO_ERR);
> >
> > Otherwise I'm not sure the same loop will happen for other users of
> > qio_channel_add_watch().
>
> In my scenario, it's clear the client quit immediately and we
> never got a POLLIN event, otherwise,
> the watch should be unregistered when POLLIN event coming.
> As Daniel said, normally G_IO_IN will be the first event, we
> need to find why POLLIN event never happened, I'll try it.

Yes, I have always been under the belief that you're guaranteed to get
a POLLIN event when the client closes the connection, in addition to the
POLLHUP event, but it sounds like that is not happening so I guess I'm
mistaken in that.

In this case, it should be sufficient to just add the G_IO_HUP|G_IO_ERR
events when creating the watch - probably no need to add extra code to
the fd_accept_incoming_migration() method.

Regards,
Daniel
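A sketch of the narrower fix Daniel describes here (hypothetical, not an applied patch): widen the watch condition so the callback still fires on hang-up or error, and let the normal stream-read error handling fail the migration instead of exit()ing from the callback. The patch as originally posted follows below.

--- a/migration/fd.c
+++ b/migration/fd.c
@@ -67,7 +67,7 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
 
     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-incoming");
     qio_channel_add_watch(ioc,
-                          G_IO_IN,
+                          G_IO_IN | G_IO_HUP | G_IO_ERR,
                           fd_accept_incoming_migration,
                           NULL,
                           NULL);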
diff --git a/migration/fd.c b/migration/fd.c
index cd06182..5932c87 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -15,6 +15,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/error-report.h"
 #include "channel.h"
 #include "fd.h"
 #include "monitor/monitor.h"
@@ -46,6 +47,11 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
                                              GIOCondition condition,
                                              gpointer opaque)
 {
+    if (condition & G_IO_HUP) {
+        error_report("The migration peer closed, job abort");
+        exit(EXIT_FAILURE);
+    }
+
     migration_channel_process_incoming(ioc);
     object_unref(OBJECT(ioc));
     return G_SOURCE_REMOVE;
@@ -67,7 +73,7 @@ void fd_start_incoming_migration(const char *infd, Error **errp)
 
     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-incoming");
     qio_channel_add_watch(ioc,
-                          G_IO_IN,
+                          G_IO_IN | G_IO_HUP,
                           fd_accept_incoming_migration,
                           NULL,
                           NULL);
If the fd socket peer closes early, ppoll may receive a POLLHUP event
before the expected POLLIN event, and qemu then does nothing but spin
in an infinite loop on the POLLHUP event.

So, abort the migration if we receive a POLLHUP event.

Signed-off-by: Wang Xin <wangxinxin.wang@huawei.com>