Message ID | 20240425233517.2125142-3-bmarzins@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Delegated to: | christophe varoqui |
Headers | show |
Series | multipath: fix hang in flush_map_nopaths | expand |
On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote: > > 1. create a multipath device with a kpartx partition on top of it and > no_path_retry set to either "queue" or something long enough to run > all > the commands in the reproducer before it disables queueing. > 2. disable all the paths to the device with something like: > # echo offline > /sys/block/<path_dev>/device/state > 3. Write directly to the multipath device with something like: > # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1 > 4. delete all the paths to the device with something like: > # echo 1 > /sys/block/<path_dev>/device/delete I've tried to reproduce the issue with these commands. Test system was using a LIO iSCSI target with 2 paths. I created a test script (attached) to try the offline / IO / delete procedure repeatedly. I haven't been able to make multipathd hang even once. I also played around with dd options. If I use oflag=sync or oflag=direct, the dd command itself hangs. Did I set up anything wrongly, or does the behavior perhaps depend on the kernel, or something else perhaps? Mine was a 6.4 kernel. This is not to say there's something wrong with your patch, but I'd like to understand the error situation better, as it doesn't seem to be trigger-able on my test system. multipath.conf: defaults { verbosity 3 flush_on_last_del yes } blacklist { wwid QEMU } overrides { no_path_retry queue } Regards, Martin
On Tue, Apr 30, 2024 at 07:06:24PM +0200, Martin Wilck wrote: > On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote: > > > > 1. create a multipath device with a kpartx partition on top of it and > > no_path_retry set to either "queue" or something long enough to run > > all > > the commands in the reproducer before it disables queueing. > > 2. disable all the paths to the device with something like: > > # echo offline > /sys/block/<path_dev>/device/state > > 3. Write directly to the multipath device with something like: > > # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1 > > 4. delete all the paths to the device with something like: > > # echo 1 > /sys/block/<path_dev>/device/delete > > I've tried to reproduce the issue with these commands. Test system was > using a LIO iSCSI target with 2 paths. I created a test script > (attached) to try the offline / IO / delete procedure repeatedly. > I haven't been able to make multipathd hang even once. > > I also played around with dd options. If I use oflag=sync or > oflag=direct, the dd command itself hangs. > > Did I set up anything wrongly, or does the behavior perhaps depend on > the kernel, or something else perhaps? Mine was a 6.4 kernel. This is > not to say there's something wrong with your patch, but I'd like to > understand the error situation better, as it doesn't seem to be > trigger-able on my test system. > > multipath.conf: > > defaults { > verbosity 3 > flush_on_last_del yes If you set flush_on_last_del to "yes", then you won't be able to hit this, because you will never be queueing when multipathd tries to autoremove the device. The goal of my patch was to make sure multipathd never hung on an autoremove, regardless of the no_path_retry setting and the flush_on_last_del setting. With "always", the device will always have queueing disabled, so the device can be safely removed. With "unused", if the device is unused, queuing is disabled. Otherwise, multipathd will skip the autoremove if the device is queueing. With "never", multipathd will skip the autoremove if the device is queueing. Your script looks fine, but with a system set up to hit it, the bug should occur every time. -Ben > } > > blacklist { > wwid QEMU > } > > overrides { > no_path_retry queue > } > > Regards, > Martin > > >
On Tue, 2024-04-30 at 17:29 -0400, Benjamin Marzinski wrote: > On Tue, Apr 30, 2024 at 07:06:24PM +0200, Martin Wilck wrote: > > On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote: > > > > > > 1. create a multipath device with a kpartx partition on top of it > > > and > > > no_path_retry set to either "queue" or something long enough to > > > run > > > all > > > the commands in the reproducer before it disables queueing. > > > 2. disable all the paths to the device with something like: > > > # echo offline > /sys/block/<path_dev>/device/state > > > 3. Write directly to the multipath device with something like: > > > # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1 > > > 4. delete all the paths to the device with something like: > > > # echo 1 > /sys/block/<path_dev>/device/delete > > > > I've tried to reproduce the issue with these commands. Test system > > was > > using a LIO iSCSI target with 2 paths. I created a test script > > (attached) to try the offline / IO / delete procedure repeatedly. > > I haven't been able to make multipathd hang even once. > > > > I also played around with dd options. If I use oflag=sync or > > oflag=direct, the dd command itself hangs. > > > > Did I set up anything wrongly, or does the behavior perhaps depend > > on > > the kernel, or something else perhaps? Mine was a 6.4 kernel. This > > is > > not to say there's something wrong with your patch, but I'd like to > > understand the error situation better, as it doesn't seem to be > > trigger-able on my test system. > > > > multipath.conf: > > > > defaults { > > verbosity 3 > > flush_on_last_del yes > > If you set flush_on_last_del to "yes", then you won't be able to hit > this, because you will never be queueing when multipathd tries to > autoremove the device. The goal of my patch was to make sure > multipathd > never hung on an autoremove, regardless of the no_path_retry setting > and > the flush_on_last_del setting Stupid me. To my excuse, I'd set "flush_on_last_del yes" because I previously had been unable to reproduce the multipathd hang with the default setting "flush_on_last_del no", and thought I'd misunderstood something about flush_on_last_del. But I'd made some other mistake at that point, apparently, which caused the issue not to reproduce. I just set "flush_on_last_del no" and indeed reproduced the issue with my script, immediately. Thanks, Martin
On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote: > Commit 9bd3482e ("multipathd: make flush_map() delete maps like the > multipath command") fixed an issue where deleting a queueing > multipath > device through multipathd could hang because the multipath device had > outstanding IO, even though the only openers of it at the time of > deletion were the kpartx partition devices. However it is still > possible > to hang multipathd, because autoremoving the device when all paths > have > been deleted doesn't disable queueing. To reproduce this hang: > > 1. create a multipath device with a kpartx partition on top of it and > no_path_retry set to either "queue" or something long enough to run > all > the commands in the reproducer before it disables queueing. > 2. disable all the paths to the device with something like: > # echo offline > /sys/block/<path_dev>/device/state > 3. Write directly to the multipath device with something like: > # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1 > 4. delete all the paths to the device with something like: > # echo 1 > /sys/block/<path_dev>/device/delete > > Multipathd will hang trying to delete the kpartx device because, as > the > last opener, it must flush the multipath device before closing it. > Because it hangs holding the vecs_lock, multipathd will never disable > queueing on the device, so it will hang forever, even if > no_path_retry > is set to a number. > > This hang can occur, even if deferred_remove is set. Since nothing > has > the kpartx device opened, device-mapper does an immediate remove, > which > will still hang. This means that even if deferred_remove is set, > multipathd still cannot delete a map while queueing is enabled. It > must > either disable queueing or skip the autoremove. > > Mulitpath can currently be configured to avoid this hang by setting > > flush_on_last_del yes > > However there are good reasons why users wouldn't want to set that. > They > may need to be able to survive having all paths getting temporarily > deleted. I should note that this is a pretty rare corner case, since > multipath automatically sets dev_loss_tmo so that it should not > trigger > before queueing is disabled. > > This commit avoids the hang by changing the possible values for > flush_on_last_del to "never", "unused", and "always", and sets the > default to "unused". "always" works like "yes" did, "never" will not > disable queueing, and "unused" will only disable queueing if nothing > has > the kpartx devices or the multipath device open. In order to be safe, > if > the device has queue_if_no_paths set (and in case of "unused", the > device is in-use) the autoremove will be skipped. Also, instead of > just > trusting the lack of "queue_if_no_paths" in the current mpp- > >features, > multipathd will tell the kernel to disable queueing, just to be sure > it > actually is. > > I chose "unused" as the default because this should generally only > cause > multipathd to work differently from the users perspective when > nothing > has the multipath device open but it is queueing and there is > outstanding IO. Without this change, it would have hung instead of > failing the outstanding IO. However, I do understand that an argument > could be made that "never" makes more sense as default, even though > it > will cause multipathd to skip autoremoves in cases where it wouldn't > before. The change to the behavior of deffered_remove will be > noticeable, but skipping an autoremove is much better than hanging. > > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> > --- > libmultipath/defaults.h | 2 +- > libmultipath/dict.c | 72 +++++++++++++++++++++++++++++++-- > -- > libmultipath/dict.h | 1 + > libmultipath/hwtable.c | 6 +-- > libmultipath/propsel.c | 4 +- > libmultipath/structs.h | 7 ++-- > multipath/multipath.conf.5.in | 20 +++++++--- > multipathd/main.c | 39 +++++++++++++++---- > 8 files changed, 122 insertions(+), 29 deletions(-) > > diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h > index 64b633f2..ed08c251 100644 The spell checker noted a typo ("parition") which I'm going to amend. Reviewed-by: Martin Wilck <mwilck@suse.com>
diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h index 64b633f2..ed08c251 100644 --- a/libmultipath/defaults.h +++ b/libmultipath/defaults.h @@ -43,7 +43,7 @@ #define DEFAULT_PRIO PRIO_CONST #define DEFAULT_PRIO_ARGS "" #define DEFAULT_CHECKER TUR -#define DEFAULT_FLUSH FLUSH_DISABLED +#define DEFAULT_FLUSH FLUSH_UNUSED #define DEFAULT_USER_FRIENDLY_NAMES USER_FRIENDLY_NAMES_OFF #define DEFAULT_FORCE_SYNC 0 #define UNSET_PARTITION_DELIM "/UNSET/" diff --git a/libmultipath/dict.c b/libmultipath/dict.c index 5af036b7..546103f2 100644 --- a/libmultipath/dict.c +++ b/libmultipath/dict.c @@ -791,14 +791,70 @@ declare_def_snprint(checker_timeout, print_nonzero) declare_def_handler(allow_usb_devices, set_yes_no) declare_def_snprint(allow_usb_devices, print_yes_no) -declare_def_handler(flush_on_last_del, set_yes_no_undef) -declare_def_snprint_defint(flush_on_last_del, print_yes_no_undef, DEFAULT_FLUSH) -declare_ovr_handler(flush_on_last_del, set_yes_no_undef) -declare_ovr_snprint(flush_on_last_del, print_yes_no_undef) -declare_hw_handler(flush_on_last_del, set_yes_no_undef) -declare_hw_snprint(flush_on_last_del, print_yes_no_undef) -declare_mp_handler(flush_on_last_del, set_yes_no_undef) -declare_mp_snprint(flush_on_last_del, print_yes_no_undef) + +static const char * const flush_on_last_del_optvals[] = { + [FLUSH_NEVER] = "never", + [FLUSH_ALWAYS] = "always", + [FLUSH_UNUSED] = "unused", +}; + +static int +set_flush_on_last_del(vector strvec, void *ptr, const char *file, int line_nr) +{ + int i; + int *flush_val_ptr = (int *)ptr; + char *buff; + + buff = set_value(strvec); + if (!buff) + return 1; + + for (i = FLUSH_NEVER; i <= FLUSH_UNUSED; i++) { + if (flush_on_last_del_optvals[i] != NULL && + !strcmp(buff, flush_on_last_del_optvals[i])) { + *flush_val_ptr = i; + break; + } + } + + if (i > FLUSH_UNUSED) { + bool deprecated = true; + if (strcmp(buff, "no") == 0 || strcmp(buff, "0") == 0) + *flush_val_ptr = FLUSH_UNUSED; + else if (strcmp(buff, "yes") == 0 || strcmp(buff, "1") == 0) + *flush_val_ptr = FLUSH_ALWAYS; + else { + deprecated = false; + condlog(1, "%s line %d, invalid value for flush_on_last_del: \"%s\"", + file, line_nr, buff); + } + if (deprecated) + condlog(2, "%s line %d, \"%s\" is a deprecated value for flush_on_last_del and is treated as \"%s\"", + file, line_nr, buff, + flush_on_last_del_optvals[*flush_val_ptr]); + } + + free(buff); + return 0; +} + +int +print_flush_on_last_del(struct strbuf *buff, long v) +{ + if (v == FLUSH_UNDEF) + return 0; + return append_strbuf_quoted(buff, flush_on_last_del_optvals[(int)v]); +} + +declare_def_handler(flush_on_last_del, set_flush_on_last_del) +declare_def_snprint_defint(flush_on_last_del, print_flush_on_last_del, + DEFAULT_FLUSH) +declare_ovr_handler(flush_on_last_del, set_flush_on_last_del) +declare_ovr_snprint(flush_on_last_del, print_flush_on_last_del) +declare_hw_handler(flush_on_last_del, set_flush_on_last_del) +declare_hw_snprint(flush_on_last_del, print_flush_on_last_del) +declare_mp_handler(flush_on_last_del, set_flush_on_last_del) +declare_mp_snprint(flush_on_last_del, print_flush_on_last_del) declare_def_handler(user_friendly_names, set_yes_no_undef) declare_def_snprint_defint(user_friendly_names, print_yes_no_undef, diff --git a/libmultipath/dict.h b/libmultipath/dict.h index 7e2dfbe0..e1794537 100644 --- a/libmultipath/dict.h +++ b/libmultipath/dict.h @@ -18,4 +18,5 @@ int print_undef_off_zero(struct strbuf *buff, long v); int print_dev_loss(struct strbuf *buff, unsigned long v); int print_off_int_undef(struct strbuf *buff, long v); int print_auto_resize(struct strbuf *buff, long v); +int print_flush_on_last_del(struct strbuf *buff, long v); #endif /* _DICT_H */ diff --git a/libmultipath/hwtable.c b/libmultipath/hwtable.c index 640bf347..ce600030 100644 --- a/libmultipath/hwtable.c +++ b/libmultipath/hwtable.c @@ -60,7 +60,7 @@ .no_path_retry = NO_PATH_RETRY_UNDEF, .minio = 1000, .minio_rq = 1, - .flush_on_last_del = FLUSH_DISABLED, + .flush_on_last_del = FLUSH_UNUSED, .user_friendly_names = USER_FRIENDLY_NAMES_OFF, .fast_io_fail = 5, .dev_loss = 600, @@ -829,7 +829,7 @@ static struct hwentry default_hw[] = { .no_path_retry = NO_PATH_RETRY_QUEUE, .pgpolicy = GROUP_BY_PRIO, .pgfailback = -FAILBACK_IMMEDIATE, - .flush_on_last_del = FLUSH_ENABLED, + .flush_on_last_del = FLUSH_ALWAYS, .dev_loss = MAX_DEV_LOSS_TMO, .prio_name = PRIO_ONTAP, .user_friendly_names = USER_FRIENDLY_NAMES_OFF, @@ -1160,7 +1160,7 @@ static struct hwentry default_hw[] = { .no_path_retry = NO_PATH_RETRY_FAIL, .minio = 1, .minio_rq = 1, - .flush_on_last_del = FLUSH_ENABLED, + .flush_on_last_del = FLUSH_ALWAYS, .fast_io_fail = 15, }, /* diff --git a/libmultipath/propsel.c b/libmultipath/propsel.c index 44241e2a..e2dcb316 100644 --- a/libmultipath/propsel.c +++ b/libmultipath/propsel.c @@ -963,6 +963,7 @@ out: int select_flush_on_last_del(struct config *conf, struct multipath *mp) { const char *origin; + STRBUF_ON_STACK(buff); mp_set_mpe(flush_on_last_del); mp_set_ovr(flush_on_last_del); @@ -970,8 +971,9 @@ int select_flush_on_last_del(struct config *conf, struct multipath *mp) mp_set_conf(flush_on_last_del); mp_set_default(flush_on_last_del, DEFAULT_FLUSH); out: + print_flush_on_last_del(&buff, mp->flush_on_last_del); condlog(3, "%s: flush_on_last_del = %s %s", mp->alias, - (mp->flush_on_last_del == FLUSH_ENABLED)? "yes" : "no", origin); + get_strbuf_str(&buff), origin); return 0; } diff --git a/libmultipath/structs.h b/libmultipath/structs.h index a25eb9d5..dbaf4d43 100644 --- a/libmultipath/structs.h +++ b/libmultipath/structs.h @@ -109,9 +109,10 @@ enum marginal_pathgroups_mode { }; enum flush_states { - FLUSH_UNDEF = YNU_UNDEF, - FLUSH_DISABLED = YNU_NO, - FLUSH_ENABLED = YNU_YES, + FLUSH_UNDEF, + FLUSH_NEVER, + FLUSH_ALWAYS, + FLUSH_UNUSED, }; enum log_checker_err_states { diff --git a/multipath/multipath.conf.5.in b/multipath/multipath.conf.5.in index b29a75fe..8cdb763f 100644 --- a/multipath/multipath.conf.5.in +++ b/multipath/multipath.conf.5.in @@ -707,12 +707,22 @@ The default is: \fBno\fR .TP .B flush_on_last_del If set to -.I yes +.I always , multipathd will disable queueing when the last path to a device has been -deleted. -.RS -.TP -The default is: \fBno\fR +deleted. If set to +.I never +, multipathd will not disable queueing when the last path to a device has +been deleted. Since multipath cannot safely remove a device while queueing +is enabled, setting this to \fInever\fR means that multipathd will not +automatically remove an unused multipath device whose paths are all deleted if +it is currently set to queue_if_no_path. If set to +.I unused +, multipathd will only disable queueing when the last path is removed if +nothing currently has the multipath device or any of the kpartx parition +devices on top of it open. +.RS +.TP +The default is: \fBunused\fR .RE . . diff --git a/multipathd/main.c b/multipathd/main.c index d8518a92..09286dd0 100644 --- a/multipathd/main.c +++ b/multipathd/main.c @@ -584,19 +584,42 @@ int update_multipath (struct vectors *vecs, char *mapname) static bool flush_map_nopaths(struct multipath *mpp, struct vectors *vecs) { int r; + bool is_queueing = true; + if (mpp->features) + is_queueing = strstr(mpp->features, "queue_if_no_path"); + + /* It's not safe to do a remove of a map that has "queue_if_no_path" + * set, since there could be outstanding IO which will cause + * multipathd to hang while attempting the remove */ + if (mpp->flush_on_last_del == FLUSH_NEVER && is_queueing) { + condlog(2, "%s: map is queueing, can't remove", mpp->alias); + return false; + } + if (mpp->flush_on_last_del == FLUSH_UNUSED && + partmap_in_use(mpp->alias, NULL) && is_queueing) { + condlog(2, "%s: map in use and queueing, can't remove", + mpp->alias); + return false; + } /* - * flush_map will fail if the device is open + * This will flush FLUSH_NEVER devices and FLUSH_UNUSED devices + * that are in use, but only if they are already marked as not + * queueing. That is just to make absolutely certain that they + * really are not queueing, like they claim. */ - if (mpp->flush_on_last_del == FLUSH_ENABLED) { - condlog(2, "%s Last path deleted, disabling queueing", + condlog(is_queueing ? 2 : 3, "%s Last path deleted, disabling queueing", + mpp->alias); + mpp->retry_tick = 0; + mpp->no_path_retry = NO_PATH_RETRY_FAIL; + mpp->disable_queueing = 1; + mpp->stat_map_failures++; + if (dm_queue_if_no_path(mpp, 0) != 0) { + condlog(0, "%s: failed to disable queueing. Not removing", mpp->alias); - mpp->retry_tick = 0; - mpp->no_path_retry = NO_PATH_RETRY_FAIL; - mpp->disable_queueing = 1; - mpp->stat_map_failures++; - dm_queue_if_no_path(mpp, 0); + return false; } + r = dm_flush_map_nopaths(mpp->alias, mpp->deferred_remove); if (r != DM_FLUSH_OK) { if (r == DM_FLUSH_DEFERRED) {
Commit 9bd3482e ("multipathd: make flush_map() delete maps like the multipath command") fixed an issue where deleting a queueing multipath device through multipathd could hang because the multipath device had outstanding IO, even though the only openers of it at the time of deletion were the kpartx partition devices. However it is still possible to hang multipathd, because autoremoving the device when all paths have been deleted doesn't disable queueing. To reproduce this hang: 1. create a multipath device with a kpartx partition on top of it and no_path_retry set to either "queue" or something long enough to run all the commands in the reproducer before it disables queueing. 2. disable all the paths to the device with something like: # echo offline > /sys/block/<path_dev>/device/state 3. Write directly to the multipath device with something like: # dd if=/dev/zero of=/dev/mapper/<mpath_dev> bs=4K count=1 4. delete all the paths to the device with something like: # echo 1 > /sys/block/<path_dev>/device/delete Multipathd will hang trying to delete the kpartx device because, as the last opener, it must flush the multipath device before closing it. Because it hangs holding the vecs_lock, multipathd will never disable queueing on the device, so it will hang forever, even if no_path_retry is set to a number. This hang can occur, even if deferred_remove is set. Since nothing has the kpartx device opened, device-mapper does an immediate remove, which will still hang. This means that even if deferred_remove is set, multipathd still cannot delete a map while queueing is enabled. It must either disable queueing or skip the autoremove. Mulitpath can currently be configured to avoid this hang by setting flush_on_last_del yes However there are good reasons why users wouldn't want to set that. They may need to be able to survive having all paths getting temporarily deleted. I should note that this is a pretty rare corner case, since multipath automatically sets dev_loss_tmo so that it should not trigger before queueing is disabled. This commit avoids the hang by changing the possible values for flush_on_last_del to "never", "unused", and "always", and sets the default to "unused". "always" works like "yes" did, "never" will not disable queueing, and "unused" will only disable queueing if nothing has the kpartx devices or the multipath device open. In order to be safe, if the device has queue_if_no_paths set (and in case of "unused", the device is in-use) the autoremove will be skipped. Also, instead of just trusting the lack of "queue_if_no_paths" in the current mpp->features, multipathd will tell the kernel to disable queueing, just to be sure it actually is. I chose "unused" as the default because this should generally only cause multipathd to work differently from the users perspective when nothing has the multipath device open but it is queueing and there is outstanding IO. Without this change, it would have hung instead of failing the outstanding IO. However, I do understand that an argument could be made that "never" makes more sense as default, even though it will cause multipathd to skip autoremoves in cases where it wouldn't before. The change to the behavior of deffered_remove will be noticeable, but skipping an autoremove is much better than hanging. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> --- libmultipath/defaults.h | 2 +- libmultipath/dict.c | 72 +++++++++++++++++++++++++++++++---- libmultipath/dict.h | 1 + libmultipath/hwtable.c | 6 +-- libmultipath/propsel.c | 4 +- libmultipath/structs.h | 7 ++-- multipath/multipath.conf.5.in | 20 +++++++--- multipathd/main.c | 39 +++++++++++++++---- 8 files changed, 122 insertions(+), 29 deletions(-)