Message ID | 20250403105402.1334206-1-ming.lei@redhat.com
---|---
State | New
Series | block: don't grab elevator lock during queue initialization
On 4/3/25 4:24 PM, Ming Lei wrote:
> ->elevator_lock depends on queue freeze lock, see block/blk-sysfs.c.
>
> queue freeze lock depends on fs_reclaim.
>
> So don't grab elevator lock during queue initialization which needs to
> call kmalloc(GFP_KERNEL), and we can cut the dependency between
> ->elevator_lock and fs_reclaim, then the lockdep warning can be killed.
>
> This way is safe because elevator setting isn't ready to run during
> queue initialization.
>
> There isn't such issue in __blk_mq_update_nr_hw_queues() because
> memalloc_noio_save() is called before acquiring elevator lock.
>
> Fixes the following lockdep warning:
>
> https://lore.kernel.org/linux-block/67e6b425.050a0220.2f068f.007b.GAE@google.com/
>
> Reported-by: syzbot+4c7e0f9b94ad65811efb@syzkaller.appspotmail.com
> Cc: Nilay Shroff <nilay@linux.ibm.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq.c | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)

I think you earlier posted this same patch here:
https://lore.kernel.org/linux-block/Z-dUCLvf06SfTOHy@fedora/

I tested that earlier patch and ran into another lockdep splat, as reported here:
https://lore.kernel.org/linux-block/462d4e8a-dd95-48fe-b9fe-a558057f9595@linux.ibm.com/

I don't know why we think your earlier fix, which cut the dependency between
->elevator_lock and ->freeze_lock, doesn't work. In any case, my view is that
we get into these lock chains from two different code paths:

path1: elevator_lock
         -> fs_reclaim (GFP_KERNEL)
            -> freeze_lock

path2: freeze_lock (memalloc_noio)
         -> elevator_lock
            -> fs_reclaim (a NOP in this case due to memalloc_noio)

As you can see above, we have a circular dependency between ->elevator_lock
and ->freeze_lock. I thought that with your earlier patch we were able to cut
that dependency using blk_mq_enter_no_io and blk_mq_exit_no_io, but I'm not
sure why we think that won't work?

Thanks,
--Nilay
On Thu, Apr 03, 2025 at 06:49:05PM +0530, Nilay Shroff wrote:
> On 4/3/25 4:24 PM, Ming Lei wrote:
> > ->elevator_lock depends on queue freeze lock, see block/blk-sysfs.c.
> > [...]
>
> I think you earlier posted this same patch here:
> https://lore.kernel.org/linux-block/Z-dUCLvf06SfTOHy@fedora/
>
> I tested that earlier patch and got into another lockdep splat as reported here:
> https://lore.kernel.org/linux-block/462d4e8a-dd95-48fe-b9fe-a558057f9595@linux.ibm.com/

That is a different one; let's fix this one first.

The ->elevator_lock in blk_register_queue() should be for avoiding a race
with updating nr_hw_queues, right? That is why I think the dependency
between the elevator lock and the freeze lock is the trouble.

> I don't know why we think your earlier fix which cut dependency between
> ->elevator_lock and ->freeze_lock doesn't work. But anyways, my view
> is that we've got into these lock chains from two different code paths:

As I explained, blk_mq_enter_no_io() is the same as freezing the queue;
it is just that the lock in __bio_queue_enter() isn't modeled by lockdep.
If it were modeled, every such lockdep warning would be re-triggered too.

> path1: elevator_lock
>          -> fs_reclaim (GFP_KERNEL)
>             -> freeze_lock
>
> path2: freeze_lock (memalloc_noio)
>          -> elevator_lock
>             -> fs_reclaim (this becomes NOP in this case due to memalloc_noio)

No, there isn't fs_reclaim in path2; memalloc_noio() avoids it.

Thanks,
Ming
On Thu, 03 Apr 2025 18:54:02 +0800, Ming Lei wrote:
> ->elevator_lock depends on queue freeze lock, see block/blk-sysfs.c.
>
> queue freeze lock depends on fs_reclaim.
>
> So don't grab elevator lock during queue initialization which needs to
> call kmalloc(GFP_KERNEL), and we can cut the dependency between
> ->elevator_lock and fs_reclaim, then the lockdep warning can be killed.
>
> [...]

Applied, thanks!

[1/1] block: don't grab elevator lock during queue initialization
      commit: 01b91bf14f6d4893e03e357006e7af3a20c03fee

Best regards,
On Thu, Apr 03, 2025 at 06:54:02PM +0800, Ming Lei wrote:
> Fixes the following lockdep warning:

Please spell the actual dependency out here; links are not permanent and
also not readable for any offline reading of the commit logs.

> +static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
> +		struct request_queue *q, bool lock)
> +{
> +	if (lock) {

bool lock(ed) arguments are an anti-pattern, and regularly get Linus
screaming at you (in this case even for the right reason :))

> +		/* protect against switching io scheduler */
> +		mutex_lock(&q->elevator_lock);
> +		__blk_mq_realloc_hw_ctxs(set, q);
> +		mutex_unlock(&q->elevator_lock);
> +	} else {
> +		__blk_mq_realloc_hw_ctxs(set, q);
> +	}

I think the problem here is again that, because of all the other
dependencies, elevator_lock really needs to be per-set instead of
per-queue, which will allow us to have much saner locking hierarchies.
On Fri, Apr 04, 2025 at 11:10:37AM +0200, Christoph Hellwig wrote:
> On Thu, Apr 03, 2025 at 06:54:02PM +0800, Ming Lei wrote:
> > Fixes the following lockdep warning:
>
> Please spell the actual dependency out here, links are not permanent
> and also not readable for any offline reading of the commit logs.
> [...]
> I think the problem here is again that because of all the other
> dependencies elevator_lock really needs to be per-set instead of
> per-queue which will allows us to have much saner locking hierarchies.

A per-set lock is required in the error-handling code path; anywhere it
is used together with freezing is a deadlock risk if a hardware error
happens.

Or can you explain the idea in more detail?

Thanks,
Ming
diff --git a/block/blk-mq.c b/block/blk-mq.c
index ae8494d88897..d7a103dc258b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4465,14 +4465,12 @@ static struct blk_mq_hw_ctx *blk_mq_alloc_and_init_hctx(
 	return NULL;
 }
 
-static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
-		struct request_queue *q)
+static void __blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
+		struct request_queue *q)
 {
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i, j;
 
-	/* protect against switching io scheduler */
-	mutex_lock(&q->elevator_lock);
 	for (i = 0; i < set->nr_hw_queues; i++) {
 		int old_node;
 		int node = blk_mq_get_hctx_node(set, i);
@@ -4505,7 +4503,19 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 
 	xa_for_each_start(&q->hctx_table, j, hctx, j)
 		blk_mq_exit_hctx(q, set, hctx, j);
-	mutex_unlock(&q->elevator_lock);
+}
+
+static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
+		struct request_queue *q, bool lock)
+{
+	if (lock) {
+		/* protect against switching io scheduler */
+		mutex_lock(&q->elevator_lock);
+		__blk_mq_realloc_hw_ctxs(set, q);
+		mutex_unlock(&q->elevator_lock);
+	} else {
+		__blk_mq_realloc_hw_ctxs(set, q);
+	}
 
 	/* unregister cpuhp callbacks for exited hctxs */
 	blk_mq_remove_hw_queues_cpuhp(q);
@@ -4537,7 +4547,7 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 
 	xa_init(&q->hctx_table);
 
-	blk_mq_realloc_hw_ctxs(set, q);
+	blk_mq_realloc_hw_ctxs(set, q, false);
 	if (!q->nr_hw_queues)
 		goto err_hctxs;
 
@@ -5033,7 +5043,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 fallback:
 	blk_mq_update_queue_map(set);
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
-		blk_mq_realloc_hw_ctxs(set, q);
+		blk_mq_realloc_hw_ctxs(set, q, true);
 		if (q->nr_hw_queues != set->nr_hw_queues) {
 			int i = prev_nr_hw_queues;
->elevator_lock depends on queue freeze lock, see block/blk-sysfs.c.

queue freeze lock depends on fs_reclaim.

So don't grab elevator lock during queue initialization which needs to
call kmalloc(GFP_KERNEL), and we can cut the dependency between
->elevator_lock and fs_reclaim, then the lockdep warning can be killed.

This way is safe because elevator setting isn't ready to run during
queue initialization.

There isn't such issue in __blk_mq_update_nr_hw_queues() because
memalloc_noio_save() is called before acquiring elevator lock.

Fixes the following lockdep warning:

https://lore.kernel.org/linux-block/67e6b425.050a0220.2f068f.007b.GAE@google.com/

Reported-by: syzbot+4c7e0f9b94ad65811efb@syzkaller.appspotmail.com
Cc: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)