Message ID: 20190409164542.30274-5-anthony.perard@citrix.com (mailing list archive)
State: Superseded
Series: libxl: New slow lock + fix libxl_cdrom_insert with QEMU depriv
Anthony PERARD writes ("[PATCH 4/9] libxl_internal: Create new lock for devices hotplug via QMP"):
> The current lock `domain_userdata_lock' can't be used when modification
> to a guest is done by sending command to QEMU, this is a slow process
> and requires to call CTX_UNLOCK, which is not possible while holding
> the `domain_userdata_lock'.
...
> +struct libxl__domain_qmp_lock {
> +    libxl__generic_lock lock;
> +};
> +
> +libxl__domain_qmp_lock *libxl__lock_domain_qmp(libxl__gc *gc,
> +                                               libxl_domid domid)
> +{
> +    libxl__domain_qmp_lock *lock = NULL;
> +    int rc;
> +
> +    lock = libxl__zalloc(NOGC, sizeof(*lock));
> +    rc = libxl__lock_generic(gc, domid, &lock->lock,
> +                             "libxl-device-changes-lock");
> +    if (rc) {
> +        libxl__unlock_domain_qmp(lock);
> +        return NULL;
> +    }
> +
> +    return lock;

This is almost identical to libxl__lock_domain_userdata, which
appeared in the previous patch.  That suggests that the factorisation
here is at the wrong layer.

> +void libxl__unlock_domain_qmp(libxl__domain_qmp_lock *lock)
> +{
> +    libxl__unlock_generic(&lock->lock);
> +    free(lock);
> +}

This is completely identical to libxl__unlock_domain_userdata.  The
only reason we have two of these is so that the two locks are
distinguished by the type of the lock struct.  That means that you
have to call
    libxl__unlock_domain_qmp(foo->qmp_lock)
but
    libxl__unlock_domain_userdata(foo->userdata_lock)
or some such, and the compiler will spot it if you write
    libxl__unlock_domain_userdata(foo->qmp_lock)
or something.  But is it really useful to have to write the `qmp' vs
`userdata' thing twice?

> + * It is to be acquired by an ao event callback.

I think there is a worse problem here, though.  This lock is probably
taken *inside* some lock from the caller.  So the usual libxl rules
apply and you may not block to acquire it.

Ie, libxl__lock_domain_qmp must be one of these things that takes a
little ev state struct and makes a callback.

At the syscall level, acquiring an fcntl lock cannot be selected on.
The options are to poll, or to spawn a thread, or to fork.

Currently libxl does not call pthread_create, although maybe it could
do.  To safely create a thread you have to do a dance with
sigprocmask, to avoid signal delivery onto your thread.

Maybe it would be better to try once with LOCK_NB, and to fork if that
is not successful.  But it would be simpler to always fork...

> + * Or in case QEMU is the primary config, this pattern can be use:
> + * lock qmp (qmp_lock)
> + * lock json config (json_lock)
> + * read json config
> + * update in-memory json config with new entry, replacing
> + * any stale entry
> + * unlock json config
> + * apply new config to primary config
> + * lock json config (json_lock)
> + * read json config
> + * update in-memory json config with new entry, replacing
> + * any stale entry
> + * write in-memory json config to disk
> + * unlock json config
> + * unlock qmp

This doesn't show the ctx locks, but I think that is fine.  So I think
this description is correct.

Ian.
On Tue, Jun 04, 2019 at 06:11:23PM +0100, Ian Jackson wrote:
> Anthony PERARD writes ("[PATCH 4/9] libxl_internal: Create new lock for devices hotplug via QMP"):
> > The current lock `domain_userdata_lock' can't be used when modification
> > to a guest is done by sending command to QEMU, this is a slow process
> > and requires to call CTX_UNLOCK, which is not possible while holding
> > the `domain_userdata_lock'.
> ...
> > +struct libxl__domain_qmp_lock {
> > +    libxl__generic_lock lock;
> > +};
> > +
> > +libxl__domain_qmp_lock *libxl__lock_domain_qmp(libxl__gc *gc,
> > +                                               libxl_domid domid)
> > +{
> > +    libxl__domain_qmp_lock *lock = NULL;
> > +    int rc;
> > +
> > +    lock = libxl__zalloc(NOGC, sizeof(*lock));
> > +    rc = libxl__lock_generic(gc, domid, &lock->lock,
> > +                             "libxl-device-changes-lock");
> > +    if (rc) {
> > +        libxl__unlock_domain_qmp(lock);
> > +        return NULL;
> > +    }
> > +
> > +    return lock;
>
> This is almost identical to libxl__lock_domain_userdata, which
> appeared in the previous patch.  That suggests that the factorisation
> here is at the wrong layer.
>
> > +void libxl__unlock_domain_qmp(libxl__domain_qmp_lock *lock)
> > +{
> > +    libxl__unlock_generic(&lock->lock);
> > +    free(lock);
> > +}
>
> This is completely identical to libxl__unlock_domain_userdata.  The
> only reason we have two of these is so that the two locks are
> distinguished by the type of the lock struct.  That means that you
> have to call
>     libxl__unlock_domain_qmp(foo->qmp_lock)
> but
>     libxl__unlock_domain_userdata(foo->userdata_lock)
> or some such, and the compiler will spot it if you write
>     libxl__unlock_domain_userdata(foo->qmp_lock)
> or something.  But is it really useful to have to write the `qmp' vs
> `userdata' thing twice?

So, instead of that interface, how about a different one that uses the
same C type for both kinds of lock?

    libxl__lock *libxl__lock_domain_userdata(libxl__gc *, libxl_domid);
    libxl__lock *libxl__lock_domain_qmp(libxl__gc *, libxl_domid);
    void libxl__unlock(libxl__lock *);

Or maybe avoid having two functions for locking and use a #define/enum
instead:

    libxl__lock_domain(gc, domid, LOCK_USERDATA);
    libxl__lock_domain(gc, domid, LOCK_QMP);

I'm not sure what the last parameter should be, either a string or an
enum.  An enum would be better because the lock filenames would then
all be in a single location in the source code.

What do you think?  Would the first proposal be enough to avoid having
to write `userdata' or `qmp' twice on unlock?

> > + * It is to be acquired by an ao event callback.
>
> I think there is a worse problem here, though.  This lock is probably
> taken *inside* some lock from the caller.  So the usual libxl rules
> apply and you may not block to acquire it.
>
> Ie, libxl__lock_domain_qmp must be one of these things that takes a
> little ev state struct and makes a callback.
>
> At the syscall level, acquiring an fcntl lock cannot be selected on.
> The options are to poll, or to spawn a thread, or to fork.
>
> Currently libxl does not call pthread_create, although maybe it could
> do.  To safely create a thread you have to do a dance with
> sigprocmask, to avoid signal delivery onto your thread.
>
> Maybe it would be better to try once with LOCK_NB, and to fork if that
> is not successful.  But it would be simpler to always fork...

After our talk IRL, I'll go the fork route.

Also, I'm thinking of always forking when libxl is built with
"debug=y", and allowing the optimisation of trying first with LOCK_NB
only when built with "debug=n", so the forked code will actually be
tested regularly.

Thanks,
Anthony PERARD writes ("Re: [PATCH 4/9] libxl_internal: Create new lock for devices hotplug via QMP"):
> So, instead of that interface, how about a different one that uses the
> same C type for both kinds of lock?
>
>     libxl__lock *libxl__lock_domain_userdata(libxl__gc *, libxl_domid);
>     libxl__lock *libxl__lock_domain_qmp(libxl__gc *, libxl_domid);
>     void libxl__unlock(libxl__lock *);

I think that would be fine.

> Or maybe avoid having two functions for locking and use a #define/enum
> instead:
>
>     libxl__lock_domain(gc, domid, LOCK_USERDATA);
>     libxl__lock_domain(gc, domid, LOCK_QMP);

Or this.  But I think maybe this conversation will be superseded by the
need to redo the implementation, which will result in a totally
different API for the slow lock, and probably a different state struct.

> What do you think?  Would the first proposal be enough to avoid having
> to write `userdata' or `qmp' twice on unlock?

I don't *mind* writing `userdata' or `qmp' twice; I was just
discounting it as a *benefit*.  What I mind is the duplicated
implementation code.

> > Maybe it would be better to try once with LOCK_NB, and to fork if that
> > is not successful.  But it would be simpler to always fork...
>
> After our talk IRL, I'll go the fork route.
>
> Also, I'm thinking of always forking when libxl is built with
> "debug=y", and allowing the optimisation of trying first with LOCK_NB
> only when built with "debug=n", so the forked code will actually be
> tested regularly.

Good plan.

Ian.
diff --git a/tools/libxl/libxl_internal.c b/tools/libxl/libxl_internal.c
index fa0bbc3960..db281ac259 100644
--- a/tools/libxl/libxl_internal.c
+++ b/tools/libxl/libxl_internal.c
@@ -607,6 +607,33 @@ void libxl__update_domain_configuration(libxl__gc *gc,
     dst->b_info.video_memkb = src->b_info.video_memkb;
 }
 
+struct libxl__domain_qmp_lock {
+    libxl__generic_lock lock;
+};
+
+libxl__domain_qmp_lock *libxl__lock_domain_qmp(libxl__gc *gc,
+                                               libxl_domid domid)
+{
+    libxl__domain_qmp_lock *lock = NULL;
+    int rc;
+
+    lock = libxl__zalloc(NOGC, sizeof(*lock));
+    rc = libxl__lock_generic(gc, domid, &lock->lock,
+                             "libxl-device-changes-lock");
+    if (rc) {
+        libxl__unlock_domain_qmp(lock);
+        return NULL;
+    }
+
+    return lock;
+}
+
+void libxl__unlock_domain_qmp(libxl__domain_qmp_lock *lock)
+{
+    libxl__unlock_generic(&lock->lock);
+    free(lock);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index f1aefaf98a..43b44f2c30 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -2561,6 +2561,21 @@ typedef struct libxl__ao_device libxl__ao_device;
 typedef struct libxl__multidev libxl__multidev;
 typedef void libxl__device_callback(libxl__egc*, libxl__ao_device*);
 
+/*
+ * Lock for device hotplug, qmp_lock.
+ *
+ * This lock is outside the json_lock lock in lock hierarchy.
+ * It is also outside the libxl_ctx lock.
+ * It is to be acquired by an ao event callback.
+ *
+ * It is to be acquired when adding/removing devices or making changes
+ * to them via QMP, when this is a slow operation.
+ */
+typedef struct libxl__domain_qmp_lock libxl__domain_qmp_lock;
+_hidden libxl__domain_qmp_lock *libxl__lock_domain_qmp(libxl__gc *gc,
+                                                       libxl_domid domid);
+_hidden void libxl__unlock_domain_qmp(libxl__domain_qmp_lock *lock);
+
 /* This functions sets the necessary libxl__ao_device struct values to use
  * safely inside functions. It marks the operation as "active"
  * since we need to be sure that all device status structs are set
@@ -2749,11 +2764,11 @@ struct libxl__multidev {
  * device information, in JSON files, so that we can use this JSON
  * file as a template to reconstruct domain configuration.
  *
- * In essense there are now two views of device state, one is xenstore,
- * the other is JSON file. We use xenstore as primary reference.
+ * In essense there are now two views of device state, one is the
+ * primary config (xenstore or QEMU), the other is JSON file.
  *
- * Here we maintain one invariant: every device in xenstore must have
- * an entry in JSON file.
+ * Here we maintain one invariant: every device in the primary config
+ * must have an entry in JSON file.
  *
  * All device hotplug routines should comply to following pattern:
  * lock json config (json_lock)
@@ -2768,6 +2783,22 @@ struct libxl__multidev {
  * end for loop
  * unlock json config
  *
+ * Or in case QEMU is the primary config, this pattern can be use:
+ * lock qmp (qmp_lock)
+ * lock json config (json_lock)
+ * read json config
+ * update in-memory json config with new entry, replacing
+ * any stale entry
+ * unlock json config
+ * apply new config to primary config
+ * lock json config (json_lock)
+ * read json config
+ * update in-memory json config with new entry, replacing
+ * any stale entry
+ * write in-memory json config to disk
+ * unlock json config
+ * unlock qmp
+ *
  * Device removal routines are not touched.
  *
  * Here is the proof that we always maintain that invariant and we