Message ID | 20220215182650.19839-4-bvanassche@acm.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Fix a deadlock in the ib_srp driver | expand |
On Tue, Feb 15, 2022 at 10:26:50AM -0800, Bart Van Assche wrote: > Wait on tl_err_work instead of flushing system_long_wq since flushing > system_long_wq is deadlock-prone. > > Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> > Fixes: ef6c49d87c34 ("IB/srp: Eliminate state SRP_TARGET_DEAD") > Reported-by: syzbot+831661966588c802aae9@syzkaller.appspotmail.com > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/infiniband/ulp/srp/ib_srp.c | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c > index 2db7429b42e1..8e1561a6d325 100644 > --- a/drivers/infiniband/ulp/srp/ib_srp.c > +++ b/drivers/infiniband/ulp/srp/ib_srp.c > @@ -4044,12 +4044,10 @@ static void srp_remove_one(struct ib_device *device, void *client_data) > mutex_lock(&host->target_mutex); > list_for_each_entry(target, &host->target_list, list) > srp_queue_remove_work(target); > + list_for_each_entry(target, &host->target_list, list) > + flush_work(&target->tl_err_work); Sorry for my silly question, but why do you do flush and not cancel here? You anyway remove SRP device, so the result of flush is not really important, am I right? Thanks > mutex_unlock(&host->target_mutex); > > - /* > - * Wait for tl_err and target port removal tasks. > - */ > - flush_workqueue(system_long_wq); > flush_workqueue(srp_remove_wq); > > kfree(host);
On 2/15/22 10:51, Leon Romanovsky wrote: > On Tue, Feb 15, 2022 at 10:26:50AM -0800, Bart Van Assche wrote: >> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c >> index 2db7429b42e1..8e1561a6d325 100644 >> --- a/drivers/infiniband/ulp/srp/ib_srp.c >> +++ b/drivers/infiniband/ulp/srp/ib_srp.c >> @@ -4044,12 +4044,10 @@ static void srp_remove_one(struct ib_device *device, void *client_data) >> mutex_lock(&host->target_mutex); >> list_for_each_entry(target, &host->target_list, list) >> srp_queue_remove_work(target); >> + list_for_each_entry(target, &host->target_list, list) >> + flush_work(&target->tl_err_work); > > Sorry for my silly question, but why do you do flush and not cancel > here? You anyway remove SRP device, so the result of flush is not > really important, am I right? That's a great question. It probably doesn't matter much whether flush_work() or cancel_work_sync() is called in this context since srp_queue_remove_work() indirectly cancels tl_err_work. See also the following code in srp_remove_target(): cancel_work_sync(&target->tl_err_work); Thanks, Bart.
On Tue, Feb 15, 2022 at 12:37:43PM -0800, Bart Van Assche wrote: > On 2/15/22 10:51, Leon Romanovsky wrote: > > On Tue, Feb 15, 2022 at 10:26:50AM -0800, Bart Van Assche wrote: > > > diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c > > > index 2db7429b42e1..8e1561a6d325 100644 > > > +++ b/drivers/infiniband/ulp/srp/ib_srp.c > > > @@ -4044,12 +4044,10 @@ static void srp_remove_one(struct ib_device *device, void *client_data) > > > mutex_lock(&host->target_mutex); > > > list_for_each_entry(target, &host->target_list, list) > > > srp_queue_remove_work(target); > > > + list_for_each_entry(target, &host->target_list, list) > > > + flush_work(&target->tl_err_work); > > > > Sorry for my silly question, but why do you do flush and not cancel > > here? You anyway remove SRP device, so the result of flush is not > > really important, am I right? > > That's a great question. It probably doesn't matter much whether > flush_work() or cancel_work_sync() is called in this context since > srp_queue_remove_work() indirectly cancels tl_err_work. See also the > following code in srp_remove_target(): > cancel_work_sync(&target->tl_err_work); If it is already canceled then why call flush? Jason
On 2/15/22 12:41, Jason Gunthorpe wrote: > On Tue, Feb 15, 2022 at 12:37:43PM -0800, Bart Van Assche wrote: >> On 2/15/22 10:51, Leon Romanovsky wrote: >>> On Tue, Feb 15, 2022 at 10:26:50AM -0800, Bart Van Assche wrote: >>>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c >>>> index 2db7429b42e1..8e1561a6d325 100644 >>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c >>>> @@ -4044,12 +4044,10 @@ static void srp_remove_one(struct ib_device *device, void *client_data) >>>> mutex_lock(&host->target_mutex); >>>> list_for_each_entry(target, &host->target_list, list) >>>> srp_queue_remove_work(target); >>>> + list_for_each_entry(target, &host->target_list, list) >>>> + flush_work(&target->tl_err_work); >>> >>> Sorry for my silly question, but why do you do flush and not cancel >>> here? You anyway remove SRP device, so the result of flush is not >>> really important, am I right? >> >> That's a great question. It probably doesn't matter much whether >> flush_work() or cancel_work_sync() is called in this context since >> srp_queue_remove_work() indirectly cancels tl_err_work. See also the >> following code in srp_remove_target(): >> cancel_work_sync(&target->tl_err_work); > > If it is already canceled then why call flush? I'll see whether I can leave the flush_work(&target->tl_err_work) calls out. Thanks, Bart.
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 2db7429b42e1..8e1561a6d325 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -4044,12 +4044,10 @@ static void srp_remove_one(struct ib_device *device, void *client_data) mutex_lock(&host->target_mutex); list_for_each_entry(target, &host->target_list, list) srp_queue_remove_work(target); + list_for_each_entry(target, &host->target_list, list) + flush_work(&target->tl_err_work); mutex_unlock(&host->target_mutex); - /* - * Wait for tl_err and target port removal tasks. - */ - flush_workqueue(system_long_wq); flush_workqueue(srp_remove_wq); kfree(host);
Wait on tl_err_work instead of flushing system_long_wq since flushing system_long_wq is deadlock-prone. Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Fixes: ef6c49d87c34 ("IB/srp: Eliminate state SRP_TARGET_DEAD") Reported-by: syzbot+831661966588c802aae9@syzkaller.appspotmail.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/infiniband/ulp/srp/ib_srp.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)