Message ID | 20240425233517.2125142-5-bmarzins@redhat.com (mailing list archive)
---|---
State | Not Applicable, archived
Delegated to | christophe varoqui
Series | multipath: fix hang in flush_map_nopaths
On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> Currently multipath makes sure that dev_loss_tmo is at least as long as
> the configured no path queueing time. The goal is to make sure that path
> devices aren't removed while the multipath device is still queueing in
> hopes that they will become usable again.
>
> This is racy. Multipathd may take longer to check the paths than
> configured. If strict_timing isn't set, it will definitely take longer.
> To account for this, pad the minimum dev_loss_tmo value by five seconds
> (one default checker interval) plus one second per minute of no path
> queueing time.

What's the rationale for the "one second per minute" part? It feels kind
of arbitrary to me, and I don't find it obvious that we need larger
padding for larger timeouts.

Regards
Martin
On Thu, May 02, 2024 at 05:14:56PM +0200, Martin Wilck wrote:
> On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > Currently multipath makes sure that dev_loss_tmo is at least as long as
> > the configured no path queueing time. The goal is to make sure that path
> > devices aren't removed while the multipath device is still queueing in
> > hopes that they will become usable again.
> >
> > This is racy. Multipathd may take longer to check the paths than
> > configured. If strict_timing isn't set, it will definitely take longer.
> > To account for this, pad the minimum dev_loss_tmo value by five seconds
> > (one default checker interval) plus one second per minute of no path
> > queueing time.
>
> What's the rationale for the "one second per minute" part?
> It feels kind of arbitrary to me, and I don't find it obvious that we
> need larger padding for larger timeouts.

The way the checker works, without strict_timing, we do all the checking
work, and then we wait for a second. Obviously, the checking work takes
time, so every nominal one second checker loop will in reality take
longer than a second. The larger you have configured (no_path_retry *
checkint), the further away the actual time when we disable queueing
will be from (no_path_retry * checkint) seconds.

With strict_timing on, things work better. As long as the checking work
doesn't take more than a second, you will stay fairly close to one
second per loop, but even then, it can be longer. nanosleep() will sleep
until at least the time specified. If multipathd is not running as a
realtime process, it will usually sleep for longer.

Also, if the checker work takes longer than a second, then strict_timing
will make sure that the loop takes (close to) exactly 2 (or whatever the
next number is) seconds. However, retry_count_tick() still only ticks
down by 1 for each loop. This means that the real time before queueing
is disabled can diverge from (no_path_retry * checkint). And again, the
longer you've configured it to wait, the more it diverges.

I thought about passing ticks to retry_count_tick(), so that it would
correctly deal with these multi-tick loops, but this can cause it to
diverge from the actual number of retries attempted, in a way that would
mean we disable queueing before the desired number of retries have
occurred, which is a bigger problem, IMHO. No matter how long the
checker work takes, you will only ever do one path check per loop. In
the worst case, say that the checker work (or anything else in
multipathd) gets stuck for a couple of minutes. You would only do one
path check, but hundreds of ticks would be passed to retry_count_tick(),
meaning that you could disable queueing after only one retry. Now you
could limit the number of ticks you passed to retry_count_tick(), but
even then, if a path was 1 tick away from being checked, and the checker
work took 3 seconds, you would still pass 3 ticks to retry_count_tick(),
which would cause it to start to diverge from the actual time it takes
for the path to do the configured number of retries.

We could improve retry_count_tick(), so that it actually tracks the
number of retries you did, but that still leaves the problems when
either strict_timing isn't on or multipathd isn't a realtime process.

It seemed easier to just say: the longer you have configured multipathd
to wait to disable queueing, the more the time when it actually disables
queueing can diverge from the configured time.
The number I picked for the rate of divergence is just a guess, and if
you don't like it, I'm fine with removing it. After all, the whole point
of this patchset is to make sure that the worst thing that can happen if
paths disappear while we are still queueing is that a multipath device
isn't autoremoved.

-Ben

> Regards
> Martin
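For readers unfamiliar with the loop structure Ben describes, here is a
minimal standalone sketch of the two timing styles. It is not
multipathd's actual checkerloop code: do_checker_work() is a
hypothetical stand-in for the per-loop path checking, and the tick count
and 200 ms work time are arbitrary. It shows why the non-strict loop
necessarily overruns its nominal one second per tick, while the
strict-timing loop holds an absolute deadline and stays close to
schedule as long as the work fits inside one second.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* hypothetical stand-in for the per-loop path checking work */
static void do_checker_work(void)
{
	struct timespec work = { 0, 200 * 1000 * 1000 };	/* 200 ms */

	nanosleep(&work, NULL);
}

int main(void)
{
	struct timespec start, now, deadline;
	int tick;

	/* non-strict style: do the work, then sleep a fixed second,
	 * so every nominal 1 s tick actually takes work + 1 s */
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (tick = 0; tick < 5; tick++) {
		struct timespec one = { 1, 0 };

		do_checker_work();
		nanosleep(&one, NULL);
	}
	clock_gettime(CLOCK_MONOTONIC, &now);
	printf("non-strict: 5 ticks took ~%ld s\n",
	       (long)(now.tv_sec - start.tv_sec));

	/* strict style: sleep until an absolute deadline that advances
	 * by one second per loop, so short work doesn't accumulate.
	 * nanosleep()/clock_nanosleep() still only guarantee sleeping
	 * *at least* that long, and a loop whose work overruns its
	 * deadline still counts as a single tick. */
	clock_gettime(CLOCK_MONOTONIC, &deadline);
	start = deadline;
	for (tick = 0; tick < 5; tick++) {
		do_checker_work();
		deadline.tv_sec += 1;
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
				&deadline, NULL);
	}
	clock_gettime(CLOCK_MONOTONIC, &now);
	printf("strict: 5 ticks took ~%ld s\n",
	       (long)(now.tv_sec - start.tv_sec));
	return 0;
}
```

On a typical run the non-strict loop takes about 6 s for 5 nominal
ticks, while the strict loop stays near 5 s, which is the drift Ben
describes accumulating over (no_path_retry * checkint) ticks.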
On Thu, 2024-05-02 at 12:05 -0400, Benjamin Marzinski wrote:
> On Thu, May 02, 2024 at 05:14:56PM +0200, Martin Wilck wrote:
> > On Thu, 2024-04-25 at 19:35 -0400, Benjamin Marzinski wrote:
> > > Currently multipath makes sure that dev_loss_tmo is at least as
> > > long as the configured no path queueing time. The goal is to make
> > > sure that path devices aren't removed while the multipath device is
> > > still queueing in hopes that they will become usable again.
> > >
> > > This is racy. Multipathd may take longer to check the paths than
> > > configured. If strict_timing isn't set, it will definitely take
> > > longer. To account for this, pad the minimum dev_loss_tmo value by
> > > five seconds (one default checker interval) plus one second per
> > > minute of no path queueing time.
> >
> > What's the rationale for the "one second per minute" part?
> > It feels kind of arbitrary to me, and I don't find it obvious that we
> > need larger padding for larger timeouts.
>
> The way the checker works, without strict_timing, we do all the
> checking work, and then we wait for a second. Obviously, the checking
> work takes time, so every nominal one second checker loop will in
> reality take longer than a second. The larger you have configured
> (no_path_retry * checkint), the further away the actual time when we
> disable queueing will be from (no_path_retry * checkint) seconds.
>
> With strict_timing on, things work better. As long as the checking work
> doesn't take more than a second, you will stay fairly close to one
> second per loop, but even then, it can be longer. nanosleep() will
> sleep until at least the time specified. If multipathd is not running
> as a realtime process, it will usually sleep for longer.
>
> Also, if the checker work takes longer than a second, then
> strict_timing will make sure that the loop takes (close to) exactly 2
> (or whatever the next number is) seconds. However, retry_count_tick()
> still only ticks down by 1 for each loop. This means that the real time
> before queueing is disabled can diverge from (no_path_retry *
> checkint). And again, the longer you've configured it to wait, the more
> it diverges.
>
> I thought about passing ticks to retry_count_tick(), so that it would
> correctly deal with these multi-tick loops, but this can cause it to
> diverge from the actual number of retries attempted, in a way that
> would mean we disable queueing before the desired number of retries
> have occurred, which is a bigger problem, IMHO. No matter how long the
> checker work takes, you will only ever do one path check per loop. In
> the worst case, say that the checker work (or anything else in
> multipathd) gets stuck for a couple of minutes. You would only do one
> path check, but hundreds of ticks would be passed to
> retry_count_tick(), meaning that you could disable queueing after only
> one retry. Now you could limit the number of ticks you passed to
> retry_count_tick(), but even then, if a path was 1 tick away from being
> checked, and the checker work took 3 seconds, you would still pass 3
> ticks to retry_count_tick(), which would cause it to start to diverge
> from the actual time it takes for the path to do the configured number
> of retries.
>
> We could improve retry_count_tick(), so that it actually tracks the
> number of retries you did, but that still leaves the problems when
> either strict_timing isn't on or multipathd isn't a realtime process.
>
> It seemed easier to just say: the longer you have configured multipathd
> to wait to disable queueing, the more the time when it actually
> disables queueing can diverge from the configured time.
>
> The number I picked for the rate of divergence is just a guess, and if
> you don't like it, I'm fine with removing it. After all, the whole
> point of this patchset is to make sure that the worst thing that can
> happen if paths disappear while we are still queueing is that a
> multipath device isn't autoremoved.

I was just asking for a rationale. This is fine.

I think we can, and should, improve multipathd's timing. Unfortunately
this has been on my agenda for a long time already.

For 4/5, too:

Reviewed-by: Martin Wilck <mwilck@suse.com>
diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c
index 6fd4dabb..e2052422 100644
--- a/libmultipath/discovery.c
+++ b/libmultipath/discovery.c
@@ -901,6 +901,11 @@ sysfs_set_scsi_tmo (struct config *conf, struct multipath *mpp)
 	uint64_t no_path_retry_tmo =
 		(uint64_t)mpp->no_path_retry * conf->checkint;
 
+	/* pad no_path_retry_tmo by one standard check interval
+	 * plus one second per minute to account for timing
+	 * issues with the rechecks */
+	no_path_retry_tmo += no_path_retry_tmo / 60 + DEFAULT_CHECKINT;
+
 	if (no_path_retry_tmo > MAX_DEV_LOSS_TMO)
 		min_dev_loss = MAX_DEV_LOSS_TMO;
 	else
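To make the effect of the new padding concrete, here is a small
standalone check of the formula. It is not part of the patch; the
no_path_retry and checkint values are made-up samples, and
DEFAULT_CHECKINT is taken to be 5 seconds, per the commit message's "one
default checker interval".

```c
#include <stdint.h>
#include <stdio.h>

#define DEFAULT_CHECKINT 5	/* five seconds, per the commit message */

int main(void)
{
	/* sample configuration: 12 retries at a 5-second check interval,
	 * i.e. a nominal 60 s of no-path queueing */
	unsigned int no_path_retry = 12, checkint = 5;
	uint64_t no_path_retry_tmo = (uint64_t)no_path_retry * checkint;

	/* the padding added by this patch: one second per minute of
	 * queueing time plus one default checker interval */
	no_path_retry_tmo += no_path_retry_tmo / 60 + DEFAULT_CHECKINT;

	/* prints 66: 60 + 60/60 + 5 */
	printf("padded minimum dev_loss_tmo: %llu s\n",
	       (unsigned long long)no_path_retry_tmo);
	return 0;
}
```

So a nominal 60 s of queueing yields a minimum dev_loss_tmo of 66 s,
giving the checker loop 6 s of slack before path devices may be removed.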
Currently multipath makes sure that dev_loss_tmo is at least as long as
the configured no path queueing time. The goal is to make sure that path
devices aren't removed while the multipath device is still queueing in
hopes that they will become usable again.

This is racy. Multipathd may take longer to check the paths than
configured. If strict_timing isn't set, it will definitely take longer.
To account for this, pad the minimum dev_loss_tmo value by five seconds
(one default checker interval) plus one second per minute of no path
queueing time.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
 libmultipath/discovery.c | 5 +++++
 1 file changed, 5 insertions(+)