Message ID | 9fe1b3a9-2a80-8891-1044-f83558e28d15@fb.com (mailing list archive) |
---|---|
State | New, archived |
> On 15 Feb 2017, at 19:04, Jens Axboe <axboe@fb.com> wrote:
>
> On 02/15/2017 10:58 AM, Jens Axboe wrote:
>> On 02/15/2017 10:24 AM, Paolo Valente wrote:
>>>
>>>> On 10 Feb 2017, at 19:32, Omar Sandoval <osandov@osandov.com> wrote:
>>>>
>>>> From: Omar Sandoval <osandov@fb.com>
>>>>
>>>> None of the other blk-mq elevator hooks are called with this lock held.
>>>> Additionally, it can lead to circular locking dependencies between
>>>> queue_lock and the private scheduler lock.
>>>>
>>>
>>> Hi Omar,
>>> I'm sorry, but it seems that a new potential deadlock has shown up.
>>> See the lockdep splat below.
>>>
>>> I've tried to think of solutions other than going back to
>>> deferring the body of exit_icq, but to no avail.
>>
>> Looks like an interaction between bfqd->lock and q->queue_lock. Since the
>> core has no notion of your bfqd->lock, the natural dependency here
>> would be to nest bfqd->lock inside q->queue_lock. Is that possible for
>> you?
>>
>> Looking at the code a bit, maybe it'd just be simpler to get rid of
>> holding the queue lock for that spot. For the mq scheduler, we really
>> don't want places where we invoke with that held already. Does the below
>> work for you?
>
> Would need to remove one more lockdep assert. And only test this for
> the mq parts; we'd need to spread a bit of love on the classic
> scheduling icq exit path for this to work on that side.

Sorry Jens, same splat. What confuses me is the second column
in the possible scenario:

[  139.368477]        CPU0                    CPU1
[  139.369129]        ----                    ----
[  139.369774]   lock(&(&ioc->lock)->rlock);
[  139.370339]                                lock(&(&q->__queue_lock)->rlock);
[  139.390579]                                lock(&(&ioc->lock)->rlock);
[  139.391522]   lock(&(&bfqd->lock)->rlock);

I could not find any code path, related to the reported call traces,
that takes first q->queue_lock and then ioc->lock.

Any suggestion on how to go on, and hopefully help with this problem, is
welcome.

Thanks,
Paolo

> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index b12f9c87b4c3..546ff8f81ede 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
>      icq->flags |= ICQ_EXITED;
>  }
>
> -/* Release an icq. Called with both ioc and q locked. */
> +/* Release an icq. Called with ioc locked. */
>  static void ioc_destroy_icq(struct io_cq *icq)
>  {
>      struct io_context *ioc = icq->ioc;
> @@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
>      struct elevator_type *et = q->elevator->type;
>
>      lockdep_assert_held(&ioc->lock);
> -    lockdep_assert_held(q->queue_lock);
>
>      radix_tree_delete(&ioc->icq_tree, icq->q->id);
>      hlist_del_init(&icq->ioc_node);
> @@ -222,25 +221,34 @@ void exit_io_context(struct task_struct *task)
>      put_io_context_active(ioc);
>  }
>
> +static void __ioc_clear_queue(struct list_head *icq_list)
> +{
> +    while (!list_empty(icq_list)) {
> +        struct io_cq *icq = list_entry(icq_list->next,
> +                                       struct io_cq, q_node);
> +        struct io_context *ioc = icq->ioc;
> +
> +        spin_lock_irq(&ioc->lock);
> +        ioc_destroy_icq(icq);
> +        spin_unlock_irq(&ioc->lock);
> +    }
> +}
> +
>  /**
>   * ioc_clear_queue - break any ioc association with the specified queue
>   * @q: request_queue being cleared
>   *
> - * Walk @q->icq_list and exit all io_cq's. Must be called with @q locked.
> + * Walk @q->icq_list and exit all io_cq's.
>   */
>  void ioc_clear_queue(struct request_queue *q)
>  {
> -    lockdep_assert_held(q->queue_lock);
> +    LIST_HEAD(icq_list);
>
> -    while (!list_empty(&q->icq_list)) {
> -        struct io_cq *icq = list_entry(q->icq_list.next,
> -                                       struct io_cq, q_node);
> -        struct io_context *ioc = icq->ioc;
> +    spin_lock_irq(q->queue_lock);
> +    list_splice_init(&q->icq_list, &icq_list);
> +    spin_unlock_irq(q->queue_lock);
>
> -        spin_lock(&ioc->lock);
> -        ioc_destroy_icq(icq);
> -        spin_unlock(&ioc->lock);
> -    }
> +    __ioc_clear_queue(&icq_list);
>  }
>
>  int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 070d81bae1d5..1944aa1cb899 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
>      blkcg_exit_queue(q);
>
>      if (q->elevator) {
> -        spin_lock_irq(q->queue_lock);
>          ioc_clear_queue(q);
> -        spin_unlock_irq(q->queue_lock);
>          elevator_exit(q->elevator);
>      }
>
> diff --git a/block/elevator.c b/block/elevator.c
> index a25bdd90b270..aaa1e9836512 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -985,9 +985,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
>          if (old_registered)
>              elv_unregister_queue(q);
>
> -        spin_lock_irq(q->queue_lock);
>          ioc_clear_queue(q);
> -        spin_unlock_irq(q->queue_lock);
>      }
>
>      /* allocate, init and register new elevator */
>
> --
> Jens Axboe
> On 16 Feb 2017, at 11:31, Paolo Valente <paolo.valente@linaro.org> wrote:
>
>> On 15 Feb 2017, at 19:04, Jens Axboe <axboe@fb.com> wrote:
>>
>> [...]
>>
>> Would need to remove one more lockdep assert. And only test this for
>> the mq parts; we'd need to spread a bit of love on the classic
>> scheduling icq exit path for this to work on that side.
>
> Sorry Jens, same splat. What confuses me is the second column
> in the possible scenario:
>
> [...]
>
> I could not find any code path, related to the reported call traces,
> that takes first q->queue_lock and then ioc->lock.
>
> Any suggestion on how to go on, and hopefully help with this problem, is
> welcome.

Jens,
this is just to tell you that I have found the link that still causes
the circular dependency: an ioc->lock nested inside a queue_lock in
ioc_create_icq. I'll try to come back with a solution proposal.

Thanks,
Paolo
> On 17 Feb 2017, at 11:30, Paolo Valente <paolo.valente@linaro.org> wrote:
>
>> On 16 Feb 2017, at 11:31, Paolo Valente <paolo.valente@linaro.org> wrote:
>>
>> [...]
>
> Jens,
> this is just to tell you that I have found the link that still causes
> the circular dependency: an ioc->lock nested inside a queue_lock in
> ioc_create_icq. I'll try to come back with a solution proposal.

Solution implemented and apparently working. If anyone wants to have a
look at or test it, it is in the usual branch [1]. I have found other
similar circular dependencies in the meanwhile, and this solution
covers them too. I'm pasting the message of the last commit below.

Actually, the branch now also contains a fix for another minor bug (if
useful to know, to avoid wasting further time preparing several micro
fixes, I have just merged some minor fixes into existing commits). I'm
starting to work on cgroups support.

    Unnest request-queue and ioc locks from scheduler locks

    In some bio-merging functions, the request-queue lock needs to be
    taken, to look up the bic associated with the process that issued
    the bio that may need to be merged. In addition, put_io_context
    must be invoked in some other functions, and put_io_context may
    cause the lock of the involved ioc to be taken. In both cases,
    these extra request-queue or ioc locks are taken, or might be
    taken, while the scheduler lock is being held. In this respect,
    there are other code paths, in part external to bfq-mq, in which
    the same locks are taken (nested) in the opposite order, i.e., it
    is the scheduler lock that is taken while the request-queue or the
    ioc lock is being held. This leads to circular deadlocks.

    This commit addresses this issue by modifying the logic of the
    above functions, so as to let the lookup and put_io_context be
    performed, and thus the extra locks be taken, outside the critical
    sections protected by the scheduler lock.

Thanks,
Paolo

[1] https://github.com/Algodev-github/bfq-mq
> On 15 Feb 2017, at 19:04, Jens Axboe <axboe@fb.com> wrote:
>
> [...]
>
> Would need to remove one more lockdep assert. And only test this for
> the mq parts; we'd need to spread a bit of love on the classic
> scheduling icq exit path for this to work on that side.

Jens,
here is the reply I anticipated in my previous email: after rebasing
against master, I'm getting again the deadlock that this patch of
yours solved (together with some changes in bfq-mq). I thought you had
added an equivalent commit to the mainline branch by now. Am I wrong?

Thanks,
Paolo
On 03/02/2017 03:28 AM, Paolo Valente wrote:
> Jens,
> here is the reply I anticipated in my previous email: after rebasing
> against master, I'm getting again the deadlock that this patch of
> yours solved (together with some changes in bfq-mq). I thought you had
> added an equivalent commit to the mainline branch by now. Am I wrong?

The patch I posted was never pulled to completion; it wasn't clear to
me whether it fixed your issue or not. Maybe I missed a reply on that?

Let me take another stab at it today. I'll send you a version to test
on top of my for-linus branch.
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index b12f9c87b4c3..546ff8f81ede 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -54,7 +54,7 @@ static void ioc_exit_icq(struct io_cq *icq)
     icq->flags |= ICQ_EXITED;
 }

-/* Release an icq. Called with both ioc and q locked. */
+/* Release an icq. Called with ioc locked. */
 static void ioc_destroy_icq(struct io_cq *icq)
 {
     struct io_context *ioc = icq->ioc;
@@ -62,7 +62,6 @@ static void ioc_destroy_icq(struct io_cq *icq)
     struct elevator_type *et = q->elevator->type;

     lockdep_assert_held(&ioc->lock);
-    lockdep_assert_held(q->queue_lock);

     radix_tree_delete(&ioc->icq_tree, icq->q->id);
     hlist_del_init(&icq->ioc_node);
@@ -222,25 +221,34 @@ void exit_io_context(struct task_struct *task)
     put_io_context_active(ioc);
 }

+static void __ioc_clear_queue(struct list_head *icq_list)
+{
+    while (!list_empty(icq_list)) {
+        struct io_cq *icq = list_entry(icq_list->next,
+                                       struct io_cq, q_node);
+        struct io_context *ioc = icq->ioc;
+
+        spin_lock_irq(&ioc->lock);
+        ioc_destroy_icq(icq);
+        spin_unlock_irq(&ioc->lock);
+    }
+}
+
 /**
  * ioc_clear_queue - break any ioc association with the specified queue
  * @q: request_queue being cleared
  *
- * Walk @q->icq_list and exit all io_cq's. Must be called with @q locked.
+ * Walk @q->icq_list and exit all io_cq's.
  */
 void ioc_clear_queue(struct request_queue *q)
 {
-    lockdep_assert_held(q->queue_lock);
+    LIST_HEAD(icq_list);

-    while (!list_empty(&q->icq_list)) {
-        struct io_cq *icq = list_entry(q->icq_list.next,
-                                       struct io_cq, q_node);
-        struct io_context *ioc = icq->ioc;
+    spin_lock_irq(q->queue_lock);
+    list_splice_init(&q->icq_list, &icq_list);
+    spin_unlock_irq(q->queue_lock);

-        spin_lock(&ioc->lock);
-        ioc_destroy_icq(icq);
-        spin_unlock(&ioc->lock);
-    }
+    __ioc_clear_queue(&icq_list);
 }

 int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 070d81bae1d5..1944aa1cb899 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -815,9 +815,7 @@ static void blk_release_queue(struct kobject *kobj)
     blkcg_exit_queue(q);

     if (q->elevator) {
-        spin_lock_irq(q->queue_lock);
         ioc_clear_queue(q);
-        spin_unlock_irq(q->queue_lock);
         elevator_exit(q->elevator);
     }

diff --git a/block/elevator.c b/block/elevator.c
index a25bdd90b270..aaa1e9836512 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -985,9 +985,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e)
         if (old_registered)
             elv_unregister_queue(q);

-        spin_lock_irq(q->queue_lock);
         ioc_clear_queue(q);
-        spin_unlock_irq(q->queue_lock);
     }

     /* allocate, init and register new elevator */