Message ID | 1562053600-32685-1-git-send-email-chao.gao@intel.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | libxl_qmp: wait for completion of device removal | expand |
Hi, Thanks for the patch. I've got some comments. On Tue, Jul 02, 2019 at 03:46:40PM +0800, Chao Gao wrote: > To remove a device from a domain, a qmp command is sent to qemu. But it is > handled by qemu asychronously. Even the qmp command is claimed to be done, > the actual handling in qemu side may happen later. > This behavior brings two questions: > 1. Attaching a device back to a domain right after detaching the device from > that domain would fail with error: > > libxl: error: libxl_qmp.c:341:qmp_handle_error_response: Domain 1:received an > error message from QMP server: Duplicate ID 'pci-pt-60_00.0' for device > > 2. Accesses to PCI configuration space in Qemu may overlap with later device > reset issued by 'xl' or by pciback. > > In order to avoid mentioned questions, wait for the completion of device > removal by querying all pci devices using qmp command and ensuring the target > device isn't listed. Only retry 5 times to avoid 'xl' potentially being blocked > by qemu. > > Signed-off-by: Chao Gao <chao.gao@intel.com> > --- > tools/libxl/libxl_qmp.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 56 insertions(+), 1 deletion(-) > > diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c > index 42c8ab8..18f6126 100644 > --- a/tools/libxl/libxl_qmp.c > +++ b/tools/libxl/libxl_qmp.c > @@ -1000,9 +1032,32 @@ int libxl__qmp_pci_add(libxl__gc *gc, int domid, libxl_device_pci *pcidev) > static int qmp_device_del(libxl__gc *gc, int domid, char *id) > { > libxl__json_object *args = NULL; > + libxl__qmp_handler *qmp = NULL; > + int rc = 0; > + > + qmp = libxl__qmp_initialize(gc, domid); > + if (!qmp) > + return ERROR_FAIL; > > qmp_parameters_add_string(gc, &args, "id", id); > - return qmp_run_command(gc, domid, "device_del", args, NULL, NULL); > + rc = qmp_synchronous_send(qmp, "device_del", args, > + NULL, NULL, qmp->timeout); > + if (rc == 0) { > + unsigned int retry = 0; > + > + do { > + if (qmp_synchronous_send(qmp, "query-pci", NULL, > + pci_del_callback, id, qmp->timeout) == 0) { Could you assign the return value of qmp_synchronous_send into a variable, then check for success/error? qmp_synchronous_send returns several possible values: - errors, when rc < 0 - rc of the callback, which is 0 or 1 here. If there's an error, we don't want to keep trying we probably want to return that error. Thanks. > + break; > + } > + LOGD(DEBUG, qmp->domid, > + "Waiting for completion of deleting device %s", id); > + sleep(1); > + } while (retry++ < 5); > + }
On Tue, Jul 02, 2019 at 02:42:33PM +0100, Anthony PERARD wrote: >Hi, > >Thanks for the patch. I've got some comments. > >On Tue, Jul 02, 2019 at 03:46:40PM +0800, Chao Gao wrote: >> To remove a device from a domain, a qmp command is sent to qemu. But it is >> handled by qemu asychronously. Even the qmp command is claimed to be done, >> the actual handling in qemu side may happen later. >> This behavior brings two questions: >> 1. Attaching a device back to a domain right after detaching the device from >> that domain would fail with error: >> >> libxl: error: libxl_qmp.c:341:qmp_handle_error_response: Domain 1:received an >> error message from QMP server: Duplicate ID 'pci-pt-60_00.0' for device >> >> 2. Accesses to PCI configuration space in Qemu may overlap with later device >> reset issued by 'xl' or by pciback. >> >> In order to avoid mentioned questions, wait for the completion of device >> removal by querying all pci devices using qmp command and ensuring the target >> device isn't listed. Only retry 5 times to avoid 'xl' potentially being blocked >> by qemu. >> >> Signed-off-by: Chao Gao <chao.gao@intel.com> >> --- >> tools/libxl/libxl_qmp.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 56 insertions(+), 1 deletion(-) >> >> diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c >> index 42c8ab8..18f6126 100644 >> --- a/tools/libxl/libxl_qmp.c >> +++ b/tools/libxl/libxl_qmp.c >> @@ -1000,9 +1032,32 @@ int libxl__qmp_pci_add(libxl__gc *gc, int domid, libxl_device_pci *pcidev) >> static int qmp_device_del(libxl__gc *gc, int domid, char *id) >> { >> libxl__json_object *args = NULL; >> + libxl__qmp_handler *qmp = NULL; >> + int rc = 0; >> + >> + qmp = libxl__qmp_initialize(gc, domid); >> + if (!qmp) >> + return ERROR_FAIL; >> >> qmp_parameters_add_string(gc, &args, "id", id); >> - return qmp_run_command(gc, domid, "device_del", args, NULL, NULL); >> + rc = qmp_synchronous_send(qmp, "device_del", args, >> + NULL, NULL, qmp->timeout); >> + if (rc == 0) { >> + unsigned int retry = 0; >> + >> + do { >> + if (qmp_synchronous_send(qmp, "query-pci", NULL, >> + pci_del_callback, id, qmp->timeout) == 0) { > >Could you assign the return value of qmp_synchronous_send into a >variable, then check for success/error? > >qmp_synchronous_send returns several possible values: >- errors, when rc < 0 >- rc of the callback, which is 0 or 1 here. > >If there's an error, we don't want to keep trying we probably want to >return that error. Sure. Will do. But I prefer to continue device removal to clean up related status and mapping set up for the device in Xen (VT-d) and pciback. It at least makes the device available to other domains. Thanks Chao
diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index 42c8ab8..18f6126 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -916,6 +916,38 @@ out: return rc; } +static int pci_del_callback(libxl__qmp_handler *qmp, + const libxl__json_object *response, void *opaque) +{ + const char *asked_id = opaque; + const libxl__json_object *bus = NULL; + GC_INIT(qmp->ctx); + int i, j, rc = 0; + + for (i = 0; (bus = libxl__json_array_get(response, i)); i++) { + const libxl__json_object *devices = NULL; + const libxl__json_object *device = NULL; + const libxl__json_object *o = NULL; + const char *id = NULL; + + devices = libxl__json_map_get("devices", bus, JSON_ARRAY); + + for (j = 0; (device = libxl__json_array_get(devices, j)); j++) { + o = libxl__json_map_get("qdev_id", device, JSON_STRING); + id = libxl__json_object_get_string(o); + + if (id && strcmp(asked_id, id) == 0) { + rc = 1; + goto out; + } + } + } + +out: + GC_FREE; + return rc; +} + static int qmp_run_command(libxl__gc *gc, int domid, const char *cmd, libxl__json_object *args, qmp_callback_t callback, void *opaque) @@ -1000,9 +1032,32 @@ int libxl__qmp_pci_add(libxl__gc *gc, int domid, libxl_device_pci *pcidev) static int qmp_device_del(libxl__gc *gc, int domid, char *id) { libxl__json_object *args = NULL; + libxl__qmp_handler *qmp = NULL; + int rc = 0; + + qmp = libxl__qmp_initialize(gc, domid); + if (!qmp) + return ERROR_FAIL; qmp_parameters_add_string(gc, &args, "id", id); - return qmp_run_command(gc, domid, "device_del", args, NULL, NULL); + rc = qmp_synchronous_send(qmp, "device_del", args, + NULL, NULL, qmp->timeout); + if (rc == 0) { + unsigned int retry = 0; + + do { + if (qmp_synchronous_send(qmp, "query-pci", NULL, + pci_del_callback, id, qmp->timeout) == 0) { + break; + } + LOGD(DEBUG, qmp->domid, + "Waiting for completion of deleting device %s", id); + sleep(1); + } while (retry++ < 5); + } + + libxl__qmp_close(qmp); + return rc; } int libxl__qmp_pci_del(libxl__gc *gc, int domid, libxl_device_pci *pcidev)
To remove a device from a domain, a qmp command is sent to qemu. But it is handled by qemu asychronously. Even the qmp command is claimed to be done, the actual handling in qemu side may happen later. This behavior brings two questions: 1. Attaching a device back to a domain right after detaching the device from that domain would fail with error: libxl: error: libxl_qmp.c:341:qmp_handle_error_response: Domain 1:received an error message from QMP server: Duplicate ID 'pci-pt-60_00.0' for device 2. Accesses to PCI configuration space in Qemu may overlap with later device reset issued by 'xl' or by pciback. In order to avoid mentioned questions, wait for the completion of device removal by querying all pci devices using qmp command and ensuring the target device isn't listed. Only retry 5 times to avoid 'xl' potentially being blocked by qemu. Signed-off-by: Chao Gao <chao.gao@intel.com> --- tools/libxl/libxl_qmp.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 56 insertions(+), 1 deletion(-)