Message ID | 20200903100537.1337-3-paul@xen.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | fix 'xl block-detach' | expand |
On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote: > From: Paul Durrant <pdurrant@amazon.com> > > The manpage for 'xl' documents that guest co-operation is required for a (non- > forced) block-detach operation and that it may consequently fail. Currently, > however, the implementation of generic device removal means that a time-out > of a block-detach is being automatically re-tried with the force flag set > rather than failing. This patch stops such behaviour. Won't this break cleanup on domain shutdown if the guest doesn't close the devices itself? I think we need some special-casing on shutdown that keeps the current behavior on that case. > > Signed-off-by: Paul Durrant <pdurrant@amazon.com> > --- > Cc: Ian Jackson <iwj@xenproject.org> > Cc: Wei Liu <wl@xen.org> > Cc: Anthony PERARD <anthony.perard@citrix.com> > --- > tools/libxl/libxl_device.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c > index 0381c5d509..d17ca78848 100644 > --- a/tools/libxl/libxl_device.c > +++ b/tools/libxl/libxl_device.c > @@ -1092,7 +1092,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds, > > if (rc == ERROR_TIMEDOUT && > aodev->action == LIBXL__DEVICE_ACTION_REMOVE && > - !aodev->force) { > + !aodev->force && > + aodev->dev->kind != LIBXL__DEVICE_KIND_VBD) { Doing this differentiation for block only seems weird, we should treat all devices equally. Thanks, Roger.
> -----Original Message----- > From: Roger Pau Monné <roger.pau@citrix.com> > Sent: 04 September 2020 09:53 > To: Paul Durrant <paul@xen.org> > Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Ian Jackson > <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Anthony PERARD <anthony.perard@citrix.com> > Subject: Re: [PATCH 2/2] libxl: do not automatically force detach of block devices > > On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote: > > From: Paul Durrant <pdurrant@amazon.com> > > > > The manpage for 'xl' documents that guest co-operation is required for a (non- > > forced) block-detach operation and that it may consequently fail. Currently, > > however, the implementation of generic device removal means that a time-out > > of a block-detach is being automatically re-tried with the force flag set > > rather than failing. This patch stops such behaviour. > > Won't this break cleanup on domain shutdown if the guest doesn't close > the devices itself? > I don't think so... AFAIK that's a destroy i.e. it's always forced (since there's no way the guest can co-operate at the point). > I think we need some special-casing on shutdown that keeps the current > behavior on that case. > > > > > Signed-off-by: Paul Durrant <pdurrant@amazon.com> > > --- > > Cc: Ian Jackson <iwj@xenproject.org> > > Cc: Wei Liu <wl@xen.org> > > Cc: Anthony PERARD <anthony.perard@citrix.com> > > --- > > tools/libxl/libxl_device.c | 9 ++++++--- > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c > > index 0381c5d509..d17ca78848 100644 > > --- a/tools/libxl/libxl_device.c > > +++ b/tools/libxl/libxl_device.c > > @@ -1092,7 +1092,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds, > > > > if (rc == ERROR_TIMEDOUT && > > aodev->action == LIBXL__DEVICE_ACTION_REMOVE && > > - !aodev->force) { > > + !aodev->force && > > + aodev->dev->kind != LIBXL__DEVICE_KIND_VBD) { > > Doing this differentiation for block only seems weird, we should treat > all devices equally. > That is not how things are documented in the xl manpage though; block-detach is the only one to have a force option. I assume that was deliberate. Paul > Thanks, Roger.
On Fri, Sep 04, 2020 at 10:47:37AM +0100, Paul Durrant wrote: > > -----Original Message----- > > From: Roger Pau Monné <roger.pau@citrix.com> > > Sent: 04 September 2020 09:53 > > To: Paul Durrant <paul@xen.org> > > Cc: xen-devel@lists.xenproject.org; Paul Durrant <pdurrant@amazon.com>; Ian Jackson > > <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Anthony PERARD <anthony.perard@citrix.com> > > Subject: Re: [PATCH 2/2] libxl: do not automatically force detach of block devices > > > > On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote: > > > From: Paul Durrant <pdurrant@amazon.com> > > > > > > The manpage for 'xl' documents that guest co-operation is required for a (non- > > > forced) block-detach operation and that it may consequently fail. Currently, > > > however, the implementation of generic device removal means that a time-out > > > of a block-detach is being automatically re-tried with the force flag set > > > rather than failing. This patch stops such behaviour. > > > > Won't this break cleanup on domain shutdown if the guest doesn't close > > the devices itself? > > > > I don't think so... AFAIK that's a destroy i.e. it's always forced (since there's no way the guest can co-operate at the point). Urgh, yes, sorry. > > I think we need some special-casing on shutdown that keeps the current > > behavior on that case. > > > > > > > > Signed-off-by: Paul Durrant <pdurrant@amazon.com> > > > --- > > > Cc: Ian Jackson <iwj@xenproject.org> > > > Cc: Wei Liu <wl@xen.org> > > > Cc: Anthony PERARD <anthony.perard@citrix.com> > > > --- > > > tools/libxl/libxl_device.c | 9 ++++++--- > > > 1 file changed, 6 insertions(+), 3 deletions(-) > > > > > > diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c > > > index 0381c5d509..d17ca78848 100644 > > > --- a/tools/libxl/libxl_device.c > > > +++ b/tools/libxl/libxl_device.c > > > @@ -1092,7 +1092,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds, > > > > > > if (rc == ERROR_TIMEDOUT && > > > aodev->action == LIBXL__DEVICE_ACTION_REMOVE && > > > - !aodev->force) { > > > + !aodev->force && > > > + aodev->dev->kind != LIBXL__DEVICE_KIND_VBD) { > > > > Doing this differentiation for block only seems weird, we should treat > > all devices equally. > > > > That is not how things are documented in the xl manpage though; block-detach is the only one to have a force option. I assume that was deliberate. Oh right, that's weird. I guess removing a block device without guest cooperation will likely result in a guest crash, while the same it's not true for other devices. Thanks, Roger.
On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote: > From: Paul Durrant <pdurrant@amazon.com> > > The manpage for 'xl' documents that guest co-operation is required for a (non- > forced) block-detach operation and that it may consequently fail. Currently, > however, the implementation of generic device removal means that a time-out > of a block-detach is being automatically re-tried with the force flag set > rather than failing. This patch stops such behaviour. > > Signed-off-by: Paul Durrant <pdurrant@amazon.com> I'm two-minded here. On the one hand, special-casing VBD in libxl to conform to xl's behaviour seems wrong to me. On the other hand, if we want to treat all devices the same in libxl, libxl should drop its force-removal behaviour all together, and the retry behaviour would need to be implemented in xl for all devices excepts for VBD. This requires a bit of code churn and, more importantly, changes how those device removal APIs behave. In the end this effort may not worth it. If we go with the patch here, we should document this special case on libxl level somehow. Anthony and Ian, do you have any opinion on this? Wei. > --- > Cc: Ian Jackson <iwj@xenproject.org> > Cc: Wei Liu <wl@xen.org> > Cc: Anthony PERARD <anthony.perard@citrix.com> > --- > tools/libxl/libxl_device.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c > index 0381c5d509..d17ca78848 100644 > --- a/tools/libxl/libxl_device.c > +++ b/tools/libxl/libxl_device.c > @@ -1092,7 +1092,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds, > > if (rc == ERROR_TIMEDOUT && > aodev->action == LIBXL__DEVICE_ACTION_REMOVE && > - !aodev->force) { > + !aodev->force && > + aodev->dev->kind != LIBXL__DEVICE_KIND_VBD) { > LOGD(DEBUG, aodev->dev->domid, "Timeout reached, initiating forced remove"); > aodev->force = 1; > libxl__initiate_device_generic_remove(egc, aodev); > @@ -1103,7 +1104,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds, > LOGD(ERROR, aodev->dev->domid, "unable to %s device with path %s", > libxl__device_action_to_string(aodev->action), > libxl__device_backend_path(gc, aodev->dev)); > - goto out; > + if (!aodev->force) > + goto out; > } > > device_hotplug(egc, aodev); > @@ -1319,7 +1321,8 @@ static void device_hotplug_done(libxl__egc *egc, libxl__ao_device *aodev) > device_hotplug_clean(gc, aodev); > > /* Clean xenstore if it's a disconnection */ > - if (aodev->action == LIBXL__DEVICE_ACTION_REMOVE) { > + if (aodev->action == LIBXL__DEVICE_ACTION_REMOVE && > + (aodev->force || !aodev->rc)) { > rc = libxl__device_destroy(gc, aodev->dev); > if (!aodev->rc) > aodev->rc = rc; > -- > 2.20.1 >
On Tue, Sep 08, 2020 at 02:19:03PM +0000, Wei Liu wrote: > On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote: > > From: Paul Durrant <pdurrant@amazon.com> > > > > The manpage for 'xl' documents that guest co-operation is required for a (non- > > forced) block-detach operation and that it may consequently fail. Currently, > > however, the implementation of generic device removal means that a time-out > > of a block-detach is being automatically re-tried with the force flag set > > rather than failing. This patch stops such behaviour. > > > > Signed-off-by: Paul Durrant <pdurrant@amazon.com> > > I'm two-minded here. > > On the one hand, special-casing VBD in libxl to conform to xl's > behaviour seems wrong to me. > > On the other hand, if we want to treat all devices the same in libxl, > libxl should drop its force-removal behaviour all together, and the > retry behaviour would need to be implemented in xl for all devices > excepts for VBD. This requires a bit of code churn and, more > importantly, changes how those device removal APIs behave. In the end > this effort may not worth it. I would be worried about those changes, as we would likely have to also change libvirt or any other downstreams? > If we go with the patch here, we should document this special case on > libxl level somehow. > > Anthony and Ian, do you have any opinion on this? Maybe a new function should be introduced instead, that attempts to remove a device gracefully and fail otherwise? Then none of the current APIs would change, and xl could use this new function to handle VBD removal? Roger.
On Mon, Sep 14, 2020 at 12:41:09PM +0200, Roger Pau Monné wrote: > On Tue, Sep 08, 2020 at 02:19:03PM +0000, Wei Liu wrote: > > On Thu, Sep 03, 2020 at 11:05:37AM +0100, Paul Durrant wrote: > > > From: Paul Durrant <pdurrant@amazon.com> > > > > > > The manpage for 'xl' documents that guest co-operation is required for a (non- > > > forced) block-detach operation and that it may consequently fail. Currently, > > > however, the implementation of generic device removal means that a time-out > > > of a block-detach is being automatically re-tried with the force flag set > > > rather than failing. This patch stops such behaviour. > > > > > > Signed-off-by: Paul Durrant <pdurrant@amazon.com> > > > > I'm two-minded here. > > > > On the one hand, special-casing VBD in libxl to conform to xl's > > behaviour seems wrong to me. > > > > On the other hand, if we want to treat all devices the same in libxl, > > libxl should drop its force-removal behaviour all together, and the > > retry behaviour would need to be implemented in xl for all devices > > excepts for VBD. This requires a bit of code churn and, more > > importantly, changes how those device removal APIs behave. In the end > > this effort may not worth it. > > I would be worried about those changes, as we would likely have to > also change libvirt or any other downstreams? > > > If we go with the patch here, we should document this special case on > > libxl level somehow. > > > > Anthony and Ian, do you have any opinion on this? > > Maybe a new function should be introduced instead, that attempts to > remove a device gracefully and fail otherwise? > > Then none of the current APIs would change, and xl could use this new > function to handle VBD removal? This sounds fine to me. Wei. > > Roger.
Wei Liu writes ("Re: [PATCH 2/2] libxl: do not automatically force detach of block devices"): > On Mon, Sep 14, 2020 at 12:41:09PM +0200, Roger Pau Monné wrote: > > Maybe a new function should be introduced instead, that attempts to > > remove a device gracefully and fail otherwise? > > > > Then none of the current APIs would change, and xl could use this new > > function to handle VBD removal? > > This sounds fine to me. I agree. If there is going to be different default policy for different devices it ought to be in xl, not libxl, but frankly I think this is an anomaly. I suggest we have a new entrypoint that specifies the fallback behaviour explicitly. ISTM that there are three possible behaviours: - fail if the guest does not cooperate - fall back to force remove - rip the device out immediately The last of these would be useful only in rare situations. IDK if the length of the timeout (for the first two cases) ought to be a parameter too. Ian.
diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c index 0381c5d509..d17ca78848 100644 --- a/tools/libxl/libxl_device.c +++ b/tools/libxl/libxl_device.c @@ -1092,7 +1092,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds, if (rc == ERROR_TIMEDOUT && aodev->action == LIBXL__DEVICE_ACTION_REMOVE && - !aodev->force) { + !aodev->force && + aodev->dev->kind != LIBXL__DEVICE_KIND_VBD) { LOGD(DEBUG, aodev->dev->domid, "Timeout reached, initiating forced remove"); aodev->force = 1; libxl__initiate_device_generic_remove(egc, aodev); @@ -1103,7 +1104,8 @@ static void device_backend_callback(libxl__egc *egc, libxl__ev_devstate *ds, LOGD(ERROR, aodev->dev->domid, "unable to %s device with path %s", libxl__device_action_to_string(aodev->action), libxl__device_backend_path(gc, aodev->dev)); - goto out; + if (!aodev->force) + goto out; } device_hotplug(egc, aodev); @@ -1319,7 +1321,8 @@ static void device_hotplug_done(libxl__egc *egc, libxl__ao_device *aodev) device_hotplug_clean(gc, aodev); /* Clean xenstore if it's a disconnection */ - if (aodev->action == LIBXL__DEVICE_ACTION_REMOVE) { + if (aodev->action == LIBXL__DEVICE_ACTION_REMOVE && + (aodev->force || !aodev->rc)) { rc = libxl__device_destroy(gc, aodev->dev); if (!aodev->rc) aodev->rc = rc;