[1/1] RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency

Message ID 20240817084244.536397-1-yanjun.zhu@linux.dev (mailing list archive)
State Superseded
Delegated to: Jason Gunthorpe

Commit Message

Zhu Yanjun Aug. 17, 2024, 8:42 a.m. UTC
When workqueue_flush is invoked, WQ_MEM_RECLAIM is checked to avoid
errors.

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202408151633.fc01893c-oliver.sang@intel.com
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/core/iwcm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jason Gunthorpe Aug. 19, 2024, 6:38 p.m. UTC | #1
On Sat, Aug 17, 2024 at 10:42:44AM +0200, Zhu Yanjun wrote:
> When workqueue_flush is invoked, WQ_MEM_RECLAIM is checked to avoid
> errors.

Include backtraces please, these things are tricky and we often have
to go back and figure out why.

Explain exactly why it is needed with traces in the commit message
and summarize in a comment.

Jason
Bart Van Assche Aug. 19, 2024, 7:28 p.m. UTC | #2
On 8/17/24 1:42 AM, Zhu Yanjun wrote:
> When workqueue_flush is invoked, WQ_MEM_RECLAIM is checked to avoid
> errors.

This description is too brief and not entirely correct. In the
description of this patch it should be explained that WQ_MEM_RECLAIM 
must be set for workqueues that are flushed from a work item queued on
a WQ_MEM_RECLAIM workqueue or from a memory reclaim context. Otherwise a
deadlock can occur. From kernel/workqueue.c:

/**
  * check_flush_dependency - check for flush dependency sanity
  * @target_wq: workqueue being flushed
  * @target_work: work item being flushed (NULL for workqueue flushes)
  *
  * %current is trying to flush the whole @target_wq or @target_work on it.
  * If @target_wq doesn't have %WQ_MEM_RECLAIM, verify that %current is not
  * reclaiming memory or running on a workqueue which doesn't have
  * %WQ_MEM_RECLAIM as that can break forward-progress guarantee leading to
  * a deadlock.
  */
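
The rule stated in that comment can be modelled in user space. The following is a minimal sketch, not the kernel implementation: the `model_*` names and `MODEL_WQ_MEM_RECLAIM` are hypothetical stand-ins invented for illustration, and the model leaves out details of the real check such as the exemption for legacy workqueues.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's WQ_MEM_RECLAIM flag. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Hypothetical stand-in for a workqueue: just a name and its flags. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/* Hypothetical description of the context that calls flush. */
struct model_ctx {
	bool reclaiming_memory;            /* caller is in memory reclaim */
	const struct model_wq *running_on; /* NULL if not a workqueue worker */
};

/*
 * Returns true when flushing @target from @ctx violates the rule in the
 * kernel-doc comment, i.e. when the kernel would be expected to warn.
 */
static bool model_flush_would_warn(const struct model_ctx *ctx,
				   const struct model_wq *target)
{
	/* A WQ_MEM_RECLAIM target can always make forward progress. */
	if (target->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	/* Memory reclaim must not wait on a workqueue without the flag. */
	if (ctx->reclaiming_memory)
		return true;

	/* Neither may a work item running on a WQ_MEM_RECLAIM workqueue. */
	if (ctx->running_on &&
	    (ctx->running_on->flags & MODEL_WQ_MEM_RECLAIM))
		return true;

	return false;
}
```

In terms of this model, allocating iw_cm_wq with WQ_MEM_RECLAIM moves it to the first branch, so a flush of it no longer counts as a violation regardless of the caller's context.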

> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202408151633.fc01893c-oliver.sang@intel.com
> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>

Since this is a fix, please include a "Fixes:" tag.

Thanks,

Bart.
Zhu Yanjun Aug. 19, 2024, 10:38 p.m. UTC | #3
On 2024/8/20 2:38, Jason Gunthorpe wrote:
> On Sat, Aug 17, 2024 at 10:42:44AM +0200, Zhu Yanjun wrote:
>> When workqueue_flush is invoked, WQ_MEM_RECLAIM is checked to avoid
>> errors.
> Include backtraces please, these things are tricky and we often have
> to go back and figure out why.
>
> Explain exactly why it is needed with traces in the commit message
> and summarize in a comment.

Got it. I will make changes in the commit log and add backtraces.

Zhu Yanjun

>
> Jason
Zhu Yanjun Aug. 19, 2024, 10:42 p.m. UTC | #4
On 2024/8/20 3:28, Bart Van Assche wrote:
> On 8/17/24 1:42 AM, Zhu Yanjun wrote:
>> When workqueue_flush is invoked, WQ_MEM_RECLAIM is checked to avoid
>> errors.
>
> This description is too brief and not entirely correct. In the
> description of this patch it should be explained that WQ_MEM_RECLAIM 
> must be set for workqueues that are flushed from a work item queued on
> a WQ_MEM_RECLAIM workqueue or from a memory reclaim context. Otherwise a
> deadlock can occur. From kernel/workqueue.c:

Yeah. I will make changes to the commit logs based on the above comments.

Thanks for your advice.

>
> /**
>  * check_flush_dependency - check for flush dependency sanity
>  * @target_wq: workqueue being flushed
>  * @target_work: work item being flushed (NULL for workqueue flushes)
>  *
>  * %current is trying to flush the whole @target_wq or @target_work on it.
>  * If @target_wq doesn't have %WQ_MEM_RECLAIM, verify that %current is not
>  * reclaiming memory or running on a workqueue which doesn't have
>  * %WQ_MEM_RECLAIM as that can break forward-progress guarantee leading to
>  * a deadlock.
>  */
>
>> Reported-by: kernel test robot <oliver.sang@intel.com>
>> Closes: https://lore.kernel.org/oe-lkp/202408151633.fc01893c-oliver.sang@intel.com
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
>
> Since this is a fix, please include a "Fixes:" tag.

Got it. I will add a "Fixes:" tag.

Best Regards,

Zhu Yanjun

>
> Thanks,
>
> Bart.
>

Patch

diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c
index 1a6339f3a63f..7e3a55349e10 100644
--- a/drivers/infiniband/core/iwcm.c
+++ b/drivers/infiniband/core/iwcm.c
@@ -1182,7 +1182,7 @@  static int __init iw_cm_init(void)
 	if (ret)
 		return ret;
 
-	iwcm_wq = alloc_ordered_workqueue("iw_cm_wq", 0);
+	iwcm_wq = alloc_ordered_workqueue("iw_cm_wq", WQ_MEM_RECLAIM);
 	if (!iwcm_wq)
 		goto err_alloc;