diff mbox series

[v2] libxl: fix migration of PV and PVH domUs with and without qemu

Message ID 20190503094251.16148-1-olaf@aepfle.de (mailing list archive)
State Superseded
Headers show
Series [v2] libxl: fix migration of PV and PVH domUs with and without qemu | expand

Commit Message

Olaf Hering May 3, 2019, 9:42 a.m. UTC
If a domU has a qemu-xen instance attached, it is required to call qemus
"xen-save-devices-state" method. Without it, the receiving side of a PV or
PVH migration may be unable to lock the image:

xen be: qdisk-51712: xen be: qdisk-51712: error: Failed to get "write" lock
error: Failed to get "write" lock
xen be: qdisk-51712: xen be: qdisk-51712: initialise() failed
initialise() failed

To fix this bug, libxl__domain_suspend_device_model() and
libxl__domain_resume_device_model() have to be called not only for HVM,
but also if the active device_model is QEMU_XEN.

Unfortunately, libxl__domain_build_info_setdefault() hardcodes
b_info->device_model_version to QEMU_XEN if it does not know it any
better. This breaks domUs without a device_model. libxl__qmp_stop() would
wait 10 seconds in qmp_open() for a qemu that will never appear.  During
this long timeframe the domU remains in state paused on the sending side.
As a result network connections may be dropped. Once this bug is fixed as
well, by just removing that assumption, there is no code to actually
initialise b_info->device_model_version.

There is a helper function libxl__need_xenpv_qemu(), which is used in
various places to decide if any device_model has to be spawned. This
function can not be used as is, just to fill b_info->device_model_version,
because store_libxl_entry() was already called earlier. Update this
function to receive a domid to work with, instead of reading xenstore.

Rearrange the code and initialize b_info->device_model_version in
libxl__domain_build_info_setdefault() per DOMAIN_TYPE.

Update initiate_domain_create() to set b_info->device_model_version if it
was not set earlier, using the updated libxl__need_xenpv_qemu().

Introduce LIBXL_DEVICE_MODEL_VERSION_NONE_REQUIRED for PV and PVH that
have no need for a device_model.

Update existing users of libxl__need_xenpv_qemu() to use
b_info->device_model_version for their check if a device_model is needed.

v02:
- update wording in a comment
- remove stale goto in domcreate_launch_dm
- initialize ret in libxl__need_xenpv_qemu

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Roger Pau Monné <roger.pau@citrix.com>
Cc: Anthony PERARD <anthony.perard@citrix.com>
---
 tools/libxl/libxl_create.c      | 39 +++++++++++++++++++++++++++++++--------
 tools/libxl/libxl_dm.c          | 40 +++++++++++++++++++++++-----------------
 tools/libxl/libxl_dom_suspend.c |  8 ++++++--
 tools/libxl/libxl_internal.h    |  3 ++-
 tools/libxl/libxl_types.idl     |  1 +
 5 files changed, 63 insertions(+), 28 deletions(-)

Comments

Roger Pau Monné May 3, 2019, 11:04 a.m. UTC | #1
On Fri, May 03, 2019 at 11:42:51AM +0200, Olaf Hering wrote:
> If a domU has a qemu-xen instance attached, it is required to call qemus
> "xen-save-devices-state" method. Without it, the receiving side of a PV or
> PVH migration may be unable to lock the image:
> 
> xen be: qdisk-51712: xen be: qdisk-51712: error: Failed to get "write" lock
> error: Failed to get "write" lock
> xen be: qdisk-51712: xen be: qdisk-51712: initialise() failed
> initialise() failed
> 
> To fix this bug, libxl__domain_suspend_device_model() and
> libxl__domain_resume_device_model() have to be called not only for HVM,
> but also if the active device_model is QEMU_XEN.
> 
> Unfortunately, libxl__domain_build_info_setdefault() hardcodes
> b_info->device_model_version to QEMU_XEN if it does not know it any
> better. This breaks domUs without a device_model. libxl__qmp_stop() would
> wait 10 seconds in qmp_open() for a qemu that will never appear.  During
> this long timeframe the domU remains in state paused on the sending side.
> As a result network connections may be dropped. Once this bug is fixed as
> well, by just removing that assumption, there is no code to actually
> initialise b_info->device_model_version.
> 
> There is a helper function libxl__need_xenpv_qemu(), which is used in
> various places to decide if any device_model has to be spawned. This
> function can not be used as is, just to fill b_info->device_model_version,
> because store_libxl_entry() was already called earlier. Update this
> function to receive a domid to work with, instead of reading xenstore.

I have to admit I'm not that familiar with the migration code, and
it's fairly complex, so take this with a pinch of salt.

Wouldn't it be easier to leave libxl__need_xenpv_qemu alone and just
use the contents of the migration stream to decide whether to launch a
QEMU for the PV backends or not? ie: just parsing the domain config on
the migration stream should be enough for the destination side to
decide whether a QEMU is needed in order to handle the PV backends?

> Rearrange the code and initialize b_info->device_model_version in
> libxl__domain_build_info_setdefault() per DOMAIN_TYPE.
> 
> Update initiate_domain_create() to set b_info->device_model_version if it
> was not set earlier, using the updated libxl__need_xenpv_qemu().
> 
> Introduce LIBXL_DEVICE_MODEL_VERSION_NONE_REQUIRED for PV and PVH that
> have no need for a device_model.
> 
> Update existing users of libxl__need_xenpv_qemu() to use
> b_info->device_model_version for their check if a device_model is needed.
> 
> v02:
> - update wording in a comment
> - remove stale goto in domcreate_launch_dm
> - initialize ret in libxl__need_xenpv_qemu
> 
> Signed-off-by: Olaf Hering <olaf@aepfle.de>
> Cc: Roger Pau Monné <roger.pau@citrix.com>
> Cc: Anthony PERARD <anthony.perard@citrix.com>
> ---
>  tools/libxl/libxl_create.c      | 39 +++++++++++++++++++++++++++++++--------
>  tools/libxl/libxl_dm.c          | 40 +++++++++++++++++++++++-----------------
>  tools/libxl/libxl_dom_suspend.c |  8 ++++++--
>  tools/libxl/libxl_internal.h    |  3 ++-
>  tools/libxl/libxl_types.idl     |  1 +
>  5 files changed, 63 insertions(+), 28 deletions(-)
> 
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 89fe80fc9c..150ab02354 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -87,16 +87,20 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
>          b_info->device_model_ssidref = SECINITSID_DOMDM;
>  
>      if (!b_info->device_model_version) {
> -        if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
> +        switch (b_info->type) {
> +        case LIBXL_DOMAIN_TYPE_HVM:
>              if (libxl_defbool_val(b_info->device_model_stubdomain)) {
>                  b_info->device_model_version =
>                      LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL;
>              } else {
>                  b_info->device_model_version = libxl__default_device_model(gc);
>              }
> -        } else {
> -            b_info->device_model_version =
> -                LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
> +            break;
> +        case LIBXL_DOMAIN_TYPE_PV:
> +        case LIBXL_DOMAIN_TYPE_PVH:
> +        default:
> +            /* may be set later */
> +            break;

Is it really worth it to change the if into a switch? AFAICT it would
be simpler to just remove the else branch from the existing if.

>          }
>          if (b_info->device_model_version
>                  == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
> @@ -978,6 +982,17 @@ static void initiate_domain_create(libxl__egc *egc,
>          goto error_out;
>      }
>  
> +    if (d_config->b_info.device_model_version
> +        == LIBXL_DEVICE_MODEL_VERSION_UNKNOWN) {
> +        ret = libxl__need_xenpv_qemu(gc, d_config, domid);

I think the above call is wrong, libxl__need_xenpv_qemu expects to get
the domid of the toolstack domain (ie: the domain running this code),
not the domain being created.

> +        if (ret)
> +            d_config->b_info.device_model_version =
> +                LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
> +        else
> +            d_config->b_info.device_model_version =
> +                LIBXL_DEVICE_MODEL_VERSION_NONE_REQUIRED;
> +    }
> +
>      dcs->guest_domid = domid;
>      dcs->sdss.dm.guest_domid = 0; /* means we haven't spawned */
>  
> @@ -1312,6 +1327,7 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
>      libxl__domain_create_state *dcs = CONTAINER_OF(multidev, *dcs, multidev);
>      STATE_AO_GC(dcs->ao);
>      int i;
> +    bool need_qemu;
>  
>      /* convenience aliases */
>      const uint32_t domid = dcs->guest_domid;
> @@ -1464,10 +1480,17 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
>          libxl__device_console_add(gc, domid, &console, state, &device);
>          libxl__device_console_dispose(&console);
>  
> -        ret = libxl__need_xenpv_qemu(gc, d_config);
> -        if (ret < 0)
> -            goto error_out;
> -        if (ret) {
> +        switch (d_config->b_info.device_model_version) {
> +            case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> +            case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
> +                need_qemu = true;
> +                break;
> +            default:
> +                need_qemu = false;
> +                break;

AFAICT at this point device_model_version will either be set to one of
the versions (either trad or upstream) or to none. So you
could initialize need_qemu to false and set if to true if
device_model_version != LIBXL_DEVICE_MODEL_VERSION_NONE_REQUIRED.

> +        }
> +
> +        if (need_qemu) {
>              dcs->sdss.dm.guest_domid = domid;
>              libxl__spawn_local_dm(egc, &dcs->sdss.dm);
>              return;
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index 2f19786bdd..bab04ab196 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -2268,7 +2268,7 @@ static void spawn_stub_launch_dm(libxl__egc *egc,
>      libxl__domain_build_state *const d_state = sdss->dm.build_state;
>      libxl__domain_build_state *const stubdom_state = &sdss->dm_state;
>      uint32_t dm_domid = sdss->pvqemu.guest_domid;
> -    int need_qemu;
> +    bool need_qemu;
>  
>      if (ret) {
>          LOGD(ERROR, guest_domid, "error connecting disk devices");
> @@ -2337,7 +2337,15 @@ static void spawn_stub_launch_dm(libxl__egc *egc,
>          }
>      }
>  
> -    need_qemu = libxl__need_xenpv_qemu(gc, dm_config);
> +    switch (dm_config->b_info.device_model_version) {
> +        case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> +        case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
> +            need_qemu = true;
> +            break;
> +        default:
> +            need_qemu = false;
> +            break;
> +    }
>  
>      for (i = 0; i < num_console; i++) {
>          libxl__device device;
> @@ -3175,18 +3183,11 @@ static void kill_device_model_uid_cb(libxl__egc *egc,
>  }
>  
>  /* Return 0 if no dm needed, 1 if needed and <0 if error. */
> -int libxl__need_xenpv_qemu(libxl__gc *gc, libxl_domain_config *d_config)
> +int libxl__need_xenpv_qemu(libxl__gc *gc, libxl_domain_config *d_config, uint32_t domid)
>  {
> -    int idx, i, ret, num;
> -    uint32_t domid;
> +    int idx, i, ret = 0, num;
>      const struct libxl_device_type *dt;
>  
> -    ret = libxl__get_domid(gc, &domid);
> -    if (ret) {
> -        LOG(ERROR, "unable to get domain id");
> -        goto out;
> -    }

The above calls gets the domid of the current domain running this code
(the toolstack domain), not the domain being created.

> -
>      if (d_config->num_vfbs > 0 || d_config->num_p9s > 0) {
>          ret = 1;
>          goto out;
> @@ -3238,21 +3239,26 @@ int libxl__dm_check_start(libxl__gc *gc, libxl_domain_config *d_config,
>                            uint32_t domid)
>  {
>      int rc;
> +    bool need_qemu;
>  
>      if (libxl__dm_active(gc, domid))
>          return 0;
>  
> -    rc = libxl__need_xenpv_qemu(gc, d_config);
> -    if (rc < 0)
> -        goto out;
> -
> -    if (!rc)
> +    switch (d_config->b_info.device_model_version) {
> +        case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
> +        case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
> +            need_qemu = true;
> +            break;
> +        default:
> +            need_qemu = false;
> +            break;
> +    }
> +    if (need_qemu == false)
>          return 0;
>  
>      LOGD(ERROR, domid, "device model required but not running");
>      rc = ERROR_FAIL;
>  
> -out:
>      return rc;
>  }
>  
> diff --git a/tools/libxl/libxl_dom_suspend.c b/tools/libxl/libxl_dom_suspend.c
> index d1af3a6573..c492fe5dd1 100644
> --- a/tools/libxl/libxl_dom_suspend.c
> +++ b/tools/libxl/libxl_dom_suspend.c
> @@ -379,7 +379,9 @@ static void domain_suspend_common_guest_suspended(libxl__egc *egc,
>      libxl__ev_xswatch_deregister(gc, &dsps->guest_watch);
>      libxl__ev_time_deregister(gc, &dsps->guest_timeout);
>  
> -    if (dsps->type == LIBXL_DOMAIN_TYPE_HVM) {
> +    if (dsps->type == LIBXL_DOMAIN_TYPE_HVM ||
> +        libxl__device_model_version_running(gc, dsps->domid) ==
> +        LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
>          dsps->callback_device_model_done = domain_suspend_common_done;
>          libxl__domain_suspend_device_model(egc, dsps); /* must be last */
>          return;
> @@ -459,7 +461,9 @@ int libxl__domain_resume(libxl__gc *gc, uint32_t domid, int suspend_cancel)
>          goto out;
>      }
>  
> -    if (type == LIBXL_DOMAIN_TYPE_HVM) {
> +    if (type == LIBXL_DOMAIN_TYPE_HVM ||
> +        libxl__device_model_version_running(gc, domid) ==
> +        LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
>          rc = libxl__domain_resume_device_model(gc, domid);
>          if (rc) {
>              LOGD(ERROR, domid, "failed to resume device model:%d", rc);
> diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
> index 44e0221284..9eb4211d85 100644
> --- a/tools/libxl/libxl_internal.h
> +++ b/tools/libxl/libxl_internal.h
> @@ -1817,7 +1817,8 @@ _hidden int libxl__domain_build(libxl__gc *gc,
>  _hidden const char *libxl__domain_device_model(libxl__gc *gc,
>                                          const libxl_domain_build_info *info);
>  _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
> -                                   libxl_domain_config *d_config);
> +                                   libxl_domain_config *d_config,
> +                                   uint32_t domid);
>  _hidden bool libxl__query_qemu_backend(libxl__gc *gc,
>                                         uint32_t domid,
>                                         uint32_t backend_id,
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index cb4702fd7a..7d75bd3850 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -106,6 +106,7 @@ libxl_device_model_version = Enumeration("device_model_version", [
>      (0, "UNKNOWN"),
>      (1, "QEMU_XEN_TRADITIONAL"), # Historical qemu-xen device model (qemu-dm)
>      (2, "QEMU_XEN"),             # Upstream based qemu-xen device model
> +    (3, "NONE_REQUIRED"),
>      ])
>  
>  libxl_console_type = Enumeration("console_type", [
Olaf Hering May 3, 2019, 2:11 p.m. UTC | #2
Am Fri, 3 May 2019 13:04:11 +0200
schrieb Roger Pau Monné <roger.pau@citrix.com>:

> Wouldn't it be easier to leave libxl__need_xenpv_qemu alone and just
> use the contents of the migration stream to decide whether to launch a
> QEMU for the PV backends or not? ie: just parsing the domain config on
> the migration stream should be enough for the destination side to
> decide whether a QEMU is needed in order to handle the PV backends?

I think that is done anyway. How would the receiving side know what to do?

I will see how to handle the stubdom case.

Olaf
Roger Pau Monné May 3, 2019, 3:29 p.m. UTC | #3
On Fri, May 03, 2019 at 04:11:32PM +0200, Olaf Hering wrote:
> Am Fri, 3 May 2019 13:04:11 +0200
> schrieb Roger Pau Monné <roger.pau@citrix.com>:
> 
> > Wouldn't it be easier to leave libxl__need_xenpv_qemu alone and just
> > use the contents of the migration stream to decide whether to launch a
> > QEMU for the PV backends or not? ie: just parsing the domain config on
> > the migration stream should be enough for the destination side to
> > decide whether a QEMU is needed in order to handle the PV backends?
> 
> I think that is done anyway. How would the receiving side know what to do?

Hm, OK. I will wait for v3 with the domid stuff fixed.

Thanks, Roger.
Olaf Hering May 9, 2019, 1:56 p.m. UTC | #4
Am Fri, 3 May 2019 13:04:11 +0200
schrieb Roger Pau Monné <roger.pau@citrix.com>:

> I think the above call is wrong, libxl__need_xenpv_qemu expects to get
> the domid of the toolstack domain (ie: the domain running this code),
> not the domain being created.

So, how do I actually test such setups? It seems a driver domain is
required. According to xl-disk-configuration(5) I may need to specify
backend=$domU. Is there some guide how to configure such thing?


Olaf
Olaf Hering May 9, 2019, 2:29 p.m. UTC | #5
Am Thu, 9 May 2019 15:56:21 +0200
schrieb Olaf Hering <olaf@aepfle.de>:

> Am Fri, 3 May 2019 13:04:11 +0200
> schrieb Roger Pau Monné <roger.pau@citrix.com>:
> 
> > I think the above call is wrong, libxl__need_xenpv_qemu expects to get
> > the domid of the toolstack domain (ie: the domain running this code),
> > not the domain being created.  
> 
> So, how do I actually test such setups? It seems a driver domain is
> required. According to xl-disk-configuration(5) I may need to specify
> backend=$domU. Is there some guide how to configure such thing?

While my question still stands, I wonder if the statement regarding
libxl__need_xenpv_qemu is correct.

Are you saying the current users of libxl__need_xenpv_qemu (libxl__dm_check_start,
spawn_stub_launch_dm and domcreate_launch_dm) do not only run in dom0, but
also somewhere else?


Olaf
Olaf Hering May 9, 2019, 3:14 p.m. UTC | #6
Am Thu, 9 May 2019 16:29:56 +0200
schrieb Olaf Hering <olaf@aepfle.de>:

> Are you saying the current users of libxl__need_xenpv_qemu (libxl__dm_check_start,
> spawn_stub_launch_dm and domcreate_launch_dm) do not only run in dom0, but
> also somewhere else?

And if it is indeed running somewhere else, would libxl__get_domain_configuration
provide the content of b_info.device_model_version?
At least in my testing /var/lib/xen/userdata-d.domid.uuid.libxl-json has no such
info in the "b_info" field.

Olaf
Roger Pau Monné May 9, 2019, 3:52 p.m. UTC | #7
On Thu, May 09, 2019 at 04:29:56PM +0200, Olaf Hering wrote:
> Am Thu, 9 May 2019 15:56:21 +0200
> schrieb Olaf Hering <olaf@aepfle.de>:
> 
> > Am Fri, 3 May 2019 13:04:11 +0200
> > schrieb Roger Pau Monné <roger.pau@citrix.com>:
> > 
> > > I think the above call is wrong, libxl__need_xenpv_qemu expects to get
> > > the domid of the toolstack domain (ie: the domain running this code),
> > > not the domain being created.  
> > 
> > So, how do I actually test such setups? It seems a driver domain is
> > required. According to xl-disk-configuration(5) I may need to specify
> > backend=$domU. Is there some guide how to configure such thing?

Hm, I'm afraid I'm not able to find anything, I could swear I've
added something to the wiki when this was implemented.

Anyway, you need to create a domU and add driver_domain=1 to the
domain xl.cfg. Inside this domU you need `xl devd` running, there's an
init script for it. Note you will have to install the Xen toolstack
inside this domU. That would be the domU serving the backends.

Then you create another domU that would have a disk line with
backend=<backend domU> and have the disk image available in the
backend domU.

That should be enough to get it working as a test.

> While my question still stands, I wonder if the statement regarding
> libxl__need_xenpv_qemu is correct.

Hm, TBH I'm not sure libxl__need_xenpv_qemu is still used by driver
domains (devd). AFAICT the decision whether to use qdisk or not should
be set on the disk configuration line manually, and then the driver
domain just launches a qemu instance when required (ie: when using
qdisk).

> Are you saying the current users of libxl__need_xenpv_qemu (libxl__dm_check_start,
> spawn_stub_launch_dm and domcreate_launch_dm) do not only run in dom0, but
> also somewhere else?

libxl__spawn_qdisk_backend can indeed be called from driver domains
(!= dom0). See add_device which gets called by backend_watch_callback
from the flow of `xl devd`.

Thanks, Roger.
Olaf Hering May 9, 2019, 3:54 p.m. UTC | #8
Am Fri, 3 May 2019 17:29:53 +0200
schrieb Roger Pau Monné <roger.pau@citrix.com>:

> On Fri, May 03, 2019 at 04:11:32PM +0200, Olaf Hering wrote:
> > Am Fri, 3 May 2019 13:04:11 +0200
> > schrieb Roger Pau Monné <roger.pau@citrix.com>:
> >   
> > > Wouldn't it be easier to leave libxl__need_xenpv_qemu alone and just
> > > use the contents of the migration stream to decide whether to launch a
> > > QEMU for the PV backends or not? ie: just parsing the domain config on
> > > the migration stream should be enough for the destination side to
> > > decide whether a QEMU is needed in order to handle the PV backends?  
> > 
> > I think that is done anyway. How would the receiving side know what to do?  
> 
> Hm, OK. I will wait for v3 with the domid stuff fixed.

I think it will be as simple as this. In the end b_info.device_model_version
must be set prior the call to store_libxl_entry.

--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -92,9 +92,6 @@ int libxl__domain_build_info_setdefault(
             } else {
                 b_info->device_model_version = libxl__default_device_model(gc);
             }
-        } else {
-            b_info->device_model_version =
-                LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
         }
         if (b_info->device_model_version
                 == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
@@ -969,6 +966,18 @@ static void initiate_domain_create(libxl
     dcs->sdss.dm.guest_domid = 0; /* means we haven't spawned */
 
     /*
+     * libxl__domain_build_info_setdefault sets this only for HVM
+     * set it before store_libxl_entry()
+     */
+    if (d_config->b_info.device_model_version
+        == LIBXL_DEVICE_MODEL_VERSION_UNKNOWN) {
+        ret = libxl__need_xenpv_qemu(gc, d_config);
+        if (ret > 0)
+            d_config->b_info.device_model_version =
+                LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
+    }
+
+    /*
      * Set the dm version quite early so that libxl doesn't have to pass the
      * build info around just to know if the domain has a device model or not.
      */
--- a/tools/libxl/libxl_dom_suspend.c
+++ b/tools/libxl/libxl_dom_suspend.c
@@ -377,7 +377,9 @@ static void domain_suspend_common_guest_
     libxl__ev_xswatch_deregister(gc, &dsps->guest_watch);
     libxl__ev_time_deregister(gc, &dsps->guest_timeout);
 
-    if (dsps->type == LIBXL_DOMAIN_TYPE_HVM) {
+    if (dsps->type == LIBXL_DOMAIN_TYPE_HVM ||
+        libxl__device_model_version_running(gc, dsps->domid) ==
+        LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
         rc = libxl__domain_suspend_device_model(gc, dsps);
         if (rc) {
             LOGD(ERROR, dsps->domid,
@@ -460,7 +462,9 @@ int libxl__domain_resume(libxl__gc *gc,
         goto out;
     }
 
-    if (type == LIBXL_DOMAIN_TYPE_HVM) {
+    if (type == LIBXL_DOMAIN_TYPE_HVM ||
+        libxl__device_model_version_running(gc, domid) ==
+        LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
         rc = libxl__domain_resume_device_model(gc, domid);
         if (rc) {
             LOGD(ERROR, domid, "failed to resume device model:%d", rc);

Olaf
Roger Pau Monné May 9, 2019, 4:14 p.m. UTC | #9
On Thu, May 09, 2019 at 05:54:38PM +0200, Olaf Hering wrote:
> Am Fri, 3 May 2019 17:29:53 +0200
> schrieb Roger Pau Monné <roger.pau@citrix.com>:
> 
> > On Fri, May 03, 2019 at 04:11:32PM +0200, Olaf Hering wrote:
> > > Am Fri, 3 May 2019 13:04:11 +0200
> > > schrieb Roger Pau Monné <roger.pau@citrix.com>:
> > >   
> > > > Wouldn't it be easier to leave libxl__need_xenpv_qemu alone and just
> > > > use the contents of the migration stream to decide whether to launch a
> > > > QEMU for the PV backends or not? ie: just parsing the domain config on
> > > > the migration stream should be enough for the destination side to
> > > > decide whether a QEMU is needed in order to handle the PV backends?  
> > > 
> > > I think that is done anyway. How would the receiving side know what to do?  
> > 
> > Hm, OK. I will wait for v3 with the domid stuff fixed.
> 
> I think it will be as simple as this. In the end b_info.device_model_version
> must be set prior the call to store_libxl_entry.

The patch below looks sensible to me, and it's more like what I was
expecting would be needed to solve this issue.

> 
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -92,9 +92,6 @@ int libxl__domain_build_info_setdefault(
>              } else {
>                  b_info->device_model_version = libxl__default_device_model(gc);
>              }
> -        } else {
> -            b_info->device_model_version =
> -                LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
>          }
>          if (b_info->device_model_version
>                  == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
> @@ -969,6 +966,18 @@ static void initiate_domain_create(libxl
>      dcs->sdss.dm.guest_domid = 0; /* means we haven't spawned */
>  
>      /*
> +     * libxl__domain_build_info_setdefault sets this only for HVM
> +     * set it before store_libxl_entry()
> +     */
> +    if (d_config->b_info.device_model_version
> +        == LIBXL_DEVICE_MODEL_VERSION_UNKNOWN) {
> +        ret = libxl__need_xenpv_qemu(gc, d_config);
> +        if (ret > 0)

Since libxl__need_xenpv_qemu can fail, what about doing something
like:

switch ( (ret = libxl__need_xenpv_qemu(gc, d_config)) ) {
case 1:
    d_config->b_info.device_model_version =
    LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
    break;

case 0:
    break;

default:
    LOGEVD(ERROR, ret, domid, "Unable to determine QEMU requisite");
    goto error_out;
}

Note the error message wording is not very good, probably an English
speaker would come up with a better message.

Thanks for looking into this!
Olaf Hering May 9, 2019, 4:19 p.m. UTC | #10
Am Thu, 9 May 2019 18:14:10 +0200
schrieb Roger Pau Monné <roger.pau@citrix.com>:

> The patch below looks sensible to me, and it's more like what I was
> expecting would be needed to solve this issue.

The remaining question is if a new state should be introduced,
or if LIBXL_DEVICE_MODEL_VERSION_UNKNOWN would be good to state the fact
that no device model is required.

Olaf
diff mbox series

Patch

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 89fe80fc9c..150ab02354 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -87,16 +87,20 @@  int libxl__domain_build_info_setdefault(libxl__gc *gc,
         b_info->device_model_ssidref = SECINITSID_DOMDM;
 
     if (!b_info->device_model_version) {
-        if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
+        switch (b_info->type) {
+        case LIBXL_DOMAIN_TYPE_HVM:
             if (libxl_defbool_val(b_info->device_model_stubdomain)) {
                 b_info->device_model_version =
                     LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL;
             } else {
                 b_info->device_model_version = libxl__default_device_model(gc);
             }
-        } else {
-            b_info->device_model_version =
-                LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
+            break;
+        case LIBXL_DOMAIN_TYPE_PV:
+        case LIBXL_DOMAIN_TYPE_PVH:
+        default:
+            /* may be set later */
+            break;
         }
         if (b_info->device_model_version
                 == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
@@ -978,6 +982,17 @@  static void initiate_domain_create(libxl__egc *egc,
         goto error_out;
     }
 
+    if (d_config->b_info.device_model_version
+        == LIBXL_DEVICE_MODEL_VERSION_UNKNOWN) {
+        ret = libxl__need_xenpv_qemu(gc, d_config, domid);
+        if (ret)
+            d_config->b_info.device_model_version =
+                LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN;
+        else
+            d_config->b_info.device_model_version =
+                LIBXL_DEVICE_MODEL_VERSION_NONE_REQUIRED;
+    }
+
     dcs->guest_domid = domid;
     dcs->sdss.dm.guest_domid = 0; /* means we haven't spawned */
 
@@ -1312,6 +1327,7 @@  static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
     libxl__domain_create_state *dcs = CONTAINER_OF(multidev, *dcs, multidev);
     STATE_AO_GC(dcs->ao);
     int i;
+    bool need_qemu;
 
     /* convenience aliases */
     const uint32_t domid = dcs->guest_domid;
@@ -1464,10 +1480,17 @@  static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
         libxl__device_console_add(gc, domid, &console, state, &device);
         libxl__device_console_dispose(&console);
 
-        ret = libxl__need_xenpv_qemu(gc, d_config);
-        if (ret < 0)
-            goto error_out;
-        if (ret) {
+        switch (d_config->b_info.device_model_version) {
+            case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
+            case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+                need_qemu = true;
+                break;
+            default:
+                need_qemu = false;
+                break;
+        }
+
+        if (need_qemu) {
             dcs->sdss.dm.guest_domid = domid;
             libxl__spawn_local_dm(egc, &dcs->sdss.dm);
             return;
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 2f19786bdd..bab04ab196 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -2268,7 +2268,7 @@  static void spawn_stub_launch_dm(libxl__egc *egc,
     libxl__domain_build_state *const d_state = sdss->dm.build_state;
     libxl__domain_build_state *const stubdom_state = &sdss->dm_state;
     uint32_t dm_domid = sdss->pvqemu.guest_domid;
-    int need_qemu;
+    bool need_qemu;
 
     if (ret) {
         LOGD(ERROR, guest_domid, "error connecting disk devices");
@@ -2337,7 +2337,15 @@  static void spawn_stub_launch_dm(libxl__egc *egc,
         }
     }
 
-    need_qemu = libxl__need_xenpv_qemu(gc, dm_config);
+    switch (dm_config->b_info.device_model_version) {
+        case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
+        case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+            need_qemu = true;
+            break;
+        default:
+            need_qemu = false;
+            break;
+    }
 
     for (i = 0; i < num_console; i++) {
         libxl__device device;
@@ -3175,18 +3183,11 @@  static void kill_device_model_uid_cb(libxl__egc *egc,
 }
 
 /* Return 0 if no dm needed, 1 if needed and <0 if error. */
-int libxl__need_xenpv_qemu(libxl__gc *gc, libxl_domain_config *d_config)
+int libxl__need_xenpv_qemu(libxl__gc *gc, libxl_domain_config *d_config, uint32_t domid)
 {
-    int idx, i, ret, num;
-    uint32_t domid;
+    int idx, i, ret = 0, num;
     const struct libxl_device_type *dt;
 
-    ret = libxl__get_domid(gc, &domid);
-    if (ret) {
-        LOG(ERROR, "unable to get domain id");
-        goto out;
-    }
-
     if (d_config->num_vfbs > 0 || d_config->num_p9s > 0) {
         ret = 1;
         goto out;
@@ -3238,21 +3239,26 @@  int libxl__dm_check_start(libxl__gc *gc, libxl_domain_config *d_config,
                           uint32_t domid)
 {
     int rc;
+    bool need_qemu;
 
     if (libxl__dm_active(gc, domid))
         return 0;
 
-    rc = libxl__need_xenpv_qemu(gc, d_config);
-    if (rc < 0)
-        goto out;
-
-    if (!rc)
+    switch (d_config->b_info.device_model_version) {
+        case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL:
+        case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN:
+            need_qemu = true;
+            break;
+        default:
+            need_qemu = false;
+            break;
+    }
+    if (need_qemu == false)
         return 0;
 
     LOGD(ERROR, domid, "device model required but not running");
     rc = ERROR_FAIL;
 
-out:
     return rc;
 }
 
diff --git a/tools/libxl/libxl_dom_suspend.c b/tools/libxl/libxl_dom_suspend.c
index d1af3a6573..c492fe5dd1 100644
--- a/tools/libxl/libxl_dom_suspend.c
+++ b/tools/libxl/libxl_dom_suspend.c
@@ -379,7 +379,9 @@  static void domain_suspend_common_guest_suspended(libxl__egc *egc,
     libxl__ev_xswatch_deregister(gc, &dsps->guest_watch);
     libxl__ev_time_deregister(gc, &dsps->guest_timeout);
 
-    if (dsps->type == LIBXL_DOMAIN_TYPE_HVM) {
+    if (dsps->type == LIBXL_DOMAIN_TYPE_HVM ||
+        libxl__device_model_version_running(gc, dsps->domid) ==
+        LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
         dsps->callback_device_model_done = domain_suspend_common_done;
         libxl__domain_suspend_device_model(egc, dsps); /* must be last */
         return;
@@ -459,7 +461,9 @@  int libxl__domain_resume(libxl__gc *gc, uint32_t domid, int suspend_cancel)
         goto out;
     }
 
-    if (type == LIBXL_DOMAIN_TYPE_HVM) {
+    if (type == LIBXL_DOMAIN_TYPE_HVM ||
+        libxl__device_model_version_running(gc, domid) ==
+        LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
         rc = libxl__domain_resume_device_model(gc, domid);
         if (rc) {
             LOGD(ERROR, domid, "failed to resume device model:%d", rc);
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 44e0221284..9eb4211d85 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1817,7 +1817,8 @@  _hidden int libxl__domain_build(libxl__gc *gc,
 _hidden const char *libxl__domain_device_model(libxl__gc *gc,
                                         const libxl_domain_build_info *info);
 _hidden int libxl__need_xenpv_qemu(libxl__gc *gc,
-                                   libxl_domain_config *d_config);
+                                   libxl_domain_config *d_config,
+                                   uint32_t domid);
 _hidden bool libxl__query_qemu_backend(libxl__gc *gc,
                                        uint32_t domid,
                                        uint32_t backend_id,
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index cb4702fd7a..7d75bd3850 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -106,6 +106,7 @@  libxl_device_model_version = Enumeration("device_model_version", [
     (0, "UNKNOWN"),
     (1, "QEMU_XEN_TRADITIONAL"), # Historical qemu-xen device model (qemu-dm)
     (2, "QEMU_XEN"),             # Upstream based qemu-xen device model
+    (3, "NONE_REQUIRED"),
     ])
 
 libxl_console_type = Enumeration("console_type", [