Message ID | 20220528061949.28519-2-colyli@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | bcache fix for Linux v5.19 (3rd wave) | expand |
On 5/28/22 12:19 AM, Coly Li wrote: > The kworker routine update_writeback_rate() is schedued to update the > writeback rate in every 5 seconds by default. Before calling > __update_writeback_rate() to do real job, semaphore dc->writeback_lock > should be held by the kworker routine. > > At the same time, bcache writeback thread routine bch_writeback_thread() > also needs to hold dc->writeback_lock before flushing dirty data back > into the backing device. If the dirty data set is large, it might be > very long time for bch_writeback_thread() to scan all dirty buckets and > releases dc->writeback_lock. In such case update_writeback_rate() can be > starved for long enough time so that kernel reports a soft lockup warn- > ing started like: > watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713] > > Such soft lockup condition is unnecessary, because after the writeback > thread finishes its job and releases dc->writeback_lock, the kworker > update_writeback_rate() may continue to work and everything is fine > indeed. > > This patch avoids the unnecessary soft lockup by the following method, > - Add new member to struct cached_dev > - dc->rate_update_retry (0 by default) > - In update_writeback_rate() call down_read_trylock(&dc->writeback_lock) > firstly, if it fails then lock contention happens. > - If dc->rate_update_retry <= BCH_WBRATE_UPDATE_RETRY_MAX (15), doesn't > acquire the lock and reschedules the kworker for next try. > - If dc->rate_update_retry > BCH_WBRATE_UPDATE_RETRY_MAX, no retry > anymore and call down_read(&dc->writeback_lock) to wait for the lock. > > By the above method, at worst case update_writeback_rate() may retry for > 1+ minutes before blocking on dc->writeback_lock by calling down_read(). > For a 4TB cache device with 1TB dirty data, 90%+ of the unnecessary soft > lockup warning message can be avoided. > > When retrying to acquire dc->writeback_lock in update_writeback_rate(), > of course the writeback rate cannot be updated. It is fair, because when > the kworker is blocked on the lock contention of dc->writeback_lock, the > writeback rate cannot be updated neither. > > This change follows Jens Axboe's suggestion to a more clear and simple > version. This looks fine, but it doesn't apply to my current for-5.19/drivers branch which the previous ones did. Did you spin this one without the other patches, perhaps? One minor thing we might want to change if you're respinning it - BCH_WBRATE_UPDATE_RETRY_MAX isn't really named for what it does, since it doesn't retry anything, it simply allows updates to be skipped. Why not call it BCH_WBRATE_UPDATE_MAX_SKIPS instead? I think that'd be better convey what it does.
> 2022年5月28日 20:20,Jens Axboe <axboe@kernel.dk> 写道: > > On 5/28/22 12:19 AM, Coly Li wrote: >> The kworker routine update_writeback_rate() is schedued to update the >> writeback rate in every 5 seconds by default. Before calling >> __update_writeback_rate() to do real job, semaphore dc->writeback_lock >> should be held by the kworker routine. >> >> At the same time, bcache writeback thread routine bch_writeback_thread() >> also needs to hold dc->writeback_lock before flushing dirty data back >> into the backing device. If the dirty data set is large, it might be >> very long time for bch_writeback_thread() to scan all dirty buckets and >> releases dc->writeback_lock. In such case update_writeback_rate() can be >> starved for long enough time so that kernel reports a soft lockup warn- >> ing started like: >> watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713] >> >> Such soft lockup condition is unnecessary, because after the writeback >> thread finishes its job and releases dc->writeback_lock, the kworker >> update_writeback_rate() may continue to work and everything is fine >> indeed. >> >> This patch avoids the unnecessary soft lockup by the following method, >> - Add new member to struct cached_dev >> - dc->rate_update_retry (0 by default) >> - In update_writeback_rate() call down_read_trylock(&dc->writeback_lock) >> firstly, if it fails then lock contention happens. >> - If dc->rate_update_retry <= BCH_WBRATE_UPDATE_RETRY_MAX (15), doesn't >> acquire the lock and reschedules the kworker for next try. >> - If dc->rate_update_retry > BCH_WBRATE_UPDATE_RETRY_MAX, no retry >> anymore and call down_read(&dc->writeback_lock) to wait for the lock. >> >> By the above method, at worst case update_writeback_rate() may retry for >> 1+ minutes before blocking on dc->writeback_lock by calling down_read(). >> For a 4TB cache device with 1TB dirty data, 90%+ of the unnecessary soft >> lockup warning message can be avoided. >> >> When retrying to acquire dc->writeback_lock in update_writeback_rate(), >> of course the writeback rate cannot be updated. It is fair, because when >> the kworker is blocked on the lock contention of dc->writeback_lock, the >> writeback rate cannot be updated neither. >> >> This change follows Jens Axboe's suggestion to a more clear and simple >> version. > > This looks fine, but it doesn't apply to my current for-5.19/drivers > branch which the previous ones did. Did you spin this one without the > other patches, perhaps? > > One minor thing we might want to change if you're respinning it - > BCH_WBRATE_UPDATE_RETRY_MAX isn't really named for what it does, since > it doesn't retry anything, it simply allows updates to be skipped. Why > not call it BCH_WBRATE_UPDATE_MAX_SKIPS instead? I think that'd be > better convey what it does. Naming is often challenge for me. Sure, _MAX_SKIPS is better. I will post another modified version. Thanks for the suggestion. Coly Li
On 5/28/22 6:22 AM, Coly Li wrote: > > >> 2022?5?28? 20:20?Jens Axboe <axboe@kernel.dk> ??? >> >> On 5/28/22 12:19 AM, Coly Li wrote: >>> The kworker routine update_writeback_rate() is schedued to update the >>> writeback rate in every 5 seconds by default. Before calling >>> __update_writeback_rate() to do real job, semaphore dc->writeback_lock >>> should be held by the kworker routine. >>> >>> At the same time, bcache writeback thread routine bch_writeback_thread() >>> also needs to hold dc->writeback_lock before flushing dirty data back >>> into the backing device. If the dirty data set is large, it might be >>> very long time for bch_writeback_thread() to scan all dirty buckets and >>> releases dc->writeback_lock. In such case update_writeback_rate() can be >>> starved for long enough time so that kernel reports a soft lockup warn- >>> ing started like: >>> watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713] >>> >>> Such soft lockup condition is unnecessary, because after the writeback >>> thread finishes its job and releases dc->writeback_lock, the kworker >>> update_writeback_rate() may continue to work and everything is fine >>> indeed. >>> >>> This patch avoids the unnecessary soft lockup by the following method, >>> - Add new member to struct cached_dev >>> - dc->rate_update_retry (0 by default) >>> - In update_writeback_rate() call down_read_trylock(&dc->writeback_lock) >>> firstly, if it fails then lock contention happens. >>> - If dc->rate_update_retry <= BCH_WBRATE_UPDATE_RETRY_MAX (15), doesn't >>> acquire the lock and reschedules the kworker for next try. >>> - If dc->rate_update_retry > BCH_WBRATE_UPDATE_RETRY_MAX, no retry >>> anymore and call down_read(&dc->writeback_lock) to wait for the lock. >>> >>> By the above method, at worst case update_writeback_rate() may retry for >>> 1+ minutes before blocking on dc->writeback_lock by calling down_read(). >>> For a 4TB cache device with 1TB dirty data, 90%+ of the unnecessary soft >>> lockup warning message can be avoided. >>> >>> When retrying to acquire dc->writeback_lock in update_writeback_rate(), >>> of course the writeback rate cannot be updated. It is fair, because when >>> the kworker is blocked on the lock contention of dc->writeback_lock, the >>> writeback rate cannot be updated neither. >>> >>> This change follows Jens Axboe's suggestion to a more clear and simple >>> version. >> >> This looks fine, but it doesn't apply to my current for-5.19/drivers >> branch which the previous ones did. Did you spin this one without the >> other patches, perhaps? >> >> One minor thing we might want to change if you're respinning it - >> BCH_WBRATE_UPDATE_RETRY_MAX isn't really named for what it does, since >> it doesn't retry anything, it simply allows updates to be skipped. Why >> not call it BCH_WBRATE_UPDATE_MAX_SKIPS instead? I think that'd be >> better convey what it does. > > Naming is often challenge for me. Sure, _MAX_SKIPS is better. I will > post another modified version. It's hard for everyone :-) Sounds good, I'll get it applied when it shows up.
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index 9ed9c955add7..24735a5e839f 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -395,6 +395,13 @@ struct cached_dev { atomic_t io_errors; unsigned int error_limit; unsigned int offline_seconds; + + /* + * Retry to update writeback_rate if contention happens for + * down_read(dc->writeback_lock) in update_writeback_rate() + */ +#define BCH_WBRATE_UPDATE_RETRY_MAX 15 + unsigned int rate_update_retry; }; enum alloc_reserve { diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index d138a2d73240..e8df6e5fe012 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -235,19 +235,27 @@ static void update_writeback_rate(struct work_struct *work) return; } - if (atomic_read(&dc->has_dirty) && dc->writeback_percent) { - /* - * If the whole cache set is idle, set_at_max_writeback_rate() - * will set writeback rate to a max number. Then it is - * unnecessary to update writeback rate for an idle cache set - * in maximum writeback rate number(s). - */ - if (!set_at_max_writeback_rate(c, dc)) { - down_read(&dc->writeback_lock); + /* + * If the whole cache set is idle, set_at_max_writeback_rate() + * will set writeback rate to a max number. Then it is + * unnecessary to update writeback rate for an idle cache set + * in maximum writeback rate number(s). + */ + if (atomic_read(&dc->has_dirty) && dc->writeback_percent && + !set_at_max_writeback_rate(c, dc)) { + do { + if (!down_read_trylock((&dc->writeback_lock))) { + dc->rate_update_retry++; + if (dc->rate_update_retry <= + BCH_WBRATE_UPDATE_RETRY_MAX) + break; + down_read(&dc->writeback_lock); + dc->rate_update_retry = 0; + } __update_writeback_rate(dc); update_gc_after_writeback(c); up_read(&dc->writeback_lock); - } + } while (0); } @@ -1006,6 +1014,9 @@ void bch_cached_dev_writeback_init(struct cached_dev *dc) dc->writeback_rate_fp_term_high = 1000; dc->writeback_rate_i_term_inverse = 10000; + /* For dc->writeback_lock contention in update_writeback_rate() */ + dc->rate_update_retry = 0; + WARN_ON(test_and_clear_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags)); INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate); }
The kworker routine update_writeback_rate() is schedued to update the writeback rate in every 5 seconds by default. Before calling __update_writeback_rate() to do real job, semaphore dc->writeback_lock should be held by the kworker routine. At the same time, bcache writeback thread routine bch_writeback_thread() also needs to hold dc->writeback_lock before flushing dirty data back into the backing device. If the dirty data set is large, it might be very long time for bch_writeback_thread() to scan all dirty buckets and releases dc->writeback_lock. In such case update_writeback_rate() can be starved for long enough time so that kernel reports a soft lockup warn- ing started like: watchdog: BUG: soft lockup - CPU#246 stuck for 23s! [kworker/246:31:179713] Such soft lockup condition is unnecessary, because after the writeback thread finishes its job and releases dc->writeback_lock, the kworker update_writeback_rate() may continue to work and everything is fine indeed. This patch avoids the unnecessary soft lockup by the following method, - Add new member to struct cached_dev - dc->rate_update_retry (0 by default) - In update_writeback_rate() call down_read_trylock(&dc->writeback_lock) firstly, if it fails then lock contention happens. - If dc->rate_update_retry <= BCH_WBRATE_UPDATE_RETRY_MAX (15), doesn't acquire the lock and reschedules the kworker for next try. - If dc->rate_update_retry > BCH_WBRATE_UPDATE_RETRY_MAX, no retry anymore and call down_read(&dc->writeback_lock) to wait for the lock. By the above method, at worst case update_writeback_rate() may retry for 1+ minutes before blocking on dc->writeback_lock by calling down_read(). For a 4TB cache device with 1TB dirty data, 90%+ of the unnecessary soft lockup warning message can be avoided. When retrying to acquire dc->writeback_lock in update_writeback_rate(), of course the writeback rate cannot be updated. It is fair, because when the kworker is blocked on the lock contention of dc->writeback_lock, the writeback rate cannot be updated neither. This change follows Jens Axboe's suggestion to a more clear and simple version. Signed-off-by: Coly Li <colyli@suse.de> --- drivers/md/bcache/bcache.h | 7 +++++++ drivers/md/bcache/writeback.c | 31 +++++++++++++++++++++---------- 2 files changed, 28 insertions(+), 10 deletions(-)