[v2] ufs: core: fix ufshcd_abort_all racing issue

Message ID	20240624121158.21354-1-peter.wang@mediatek.com (mailing list archive)
State	Superseded
Headers	show Received: from mailgw01.mediatek.com (unknown [60.244.123.138]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7679713B58D; Mon, 24 Jun 2024 12:12:08 +0000 (UTC) From: <peter.wang@mediatek.com> To: <linux-scsi@vger.kernel.org>, <martin.petersen@oracle.com>, <avri.altman@wdc.com>, <quic_nguyenb@quicinc.com>, <alim.akhtar@samsung.com>, <jejb@linux.ibm.com> CC: <wsd_upstream@mediatek.com>, <linux-mediatek@lists.infradead.org>, <peter.wang@mediatek.com>, <chun-hung.wu@mediatek.com>, <alice.chao@mediatek.com>, <cc.chou@mediatek.com>, <chaotian.jing@mediatek.com>, <jiajie.hao@mediatek.com>, <powen.kao@mediatek.com>, <qilin.tan@mediatek.com>, <lin.gui@mediatek.com>, <tun-yu.yu@mediatek.com>, <eddie.huang@mediatek.com>, <naomi.chu@mediatek.com>, <chu.stanley@gmail.com>, <stable@vger.kernel.org> Subject: [PATCH v2] ufs: core: fix ufshcd_abort_all racing issue Date: Mon, 24 Jun 2024 20:11:58 +0800 Message-ID: <20240624121158.21354-1-peter.wang@mediatek.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain
Series	[v2] ufs: core: fix ufshcd_abort_all racing issue \| expand [v2] ufs: core: fix ufshcd_abort_all racing issue

Peter Wang (王信友) June 24, 2024, 12:11 p.m. UTC

From: Peter Wang <peter.wang@mediatek.com>

When ufshcd_abort_all racing with complete ISR,
the completed tag of request will be release by ISR.
And ufshca_abort_all call ufshcd_mcq_req_to_hwq will
get NULL pointer KE.
Also change the return value success when request is
completed by ISR beacuse sq dosen't need cleanup.

Below is KE back trace.
  ufshcd_try_to_abort_task: cmd at tag 41 not pending in the device.
  ufshcd_try_to_abort_task: cmd at tag=41 is cleared.
  Aborting tag 41 / CDB 0x28 succeeded
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
  pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14
  lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 [ufs_mediatek_mod_ise]
   do_mem_abort+0x58/0x118
   el1_abort+0x3c/0x5c
   el1h_64_sync_handler+0x54/0x90
   el1h_64_sync+0x68/0x6c
   blk_mq_unique_tag+0x8/0x14
   ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise]
   process_one_work+0x208/0x4fc
   worker_thread+0x228/0x438
   kthread+0x104/0x1d4
   ret_from_fork+0x10/0x20

  ufshcd_try_to_abort_task: cmd pending in the device. tag = 6
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
   pc : [0xffffffd589679bf8] blk_mq_unique_tag+0x8/0x14
   lr : [0xffffffd5862f95b4] ufshcd_mcq_sq_cleanup+0x6c/0x1cc [ufs_mediatek_mod_ise]
   Workqueue: ufs_eh_wq_0 ufshcd_err_handler [ufs_mediatek_mod_ise]
   Call trace:
    dump_backtrace+0xf8/0x148
    show_stack+0x18/0x24
    dump_stack_lvl+0x60/0x7c
    dump_stack+0x18/0x3c
    mrdump_common_die+0x24c/0x398 [mrdump]
    ipanic_die+0x20/0x34 [mrdump]
    notify_die+0x80/0xd8
    die+0x94/0x2b8
    __do_kernel_fault+0x264/0x298
    do_page_fault+0xa4/0x4b8
    do_translation_fault+0x38/0x54
    do_mem_abort+0x58/0x118
    el1_abort+0x3c/0x5c
    el1h_64_sync_handler+0x54/0x90
    el1h_64_sync+0x68/0x6c
    blk_mq_unique_tag+0x8/0x14
    ufshcd_clear_cmd+0x34/0x118 [ufs_mediatek_mod_ise]
    ufshcd_try_to_abort_task+0x2c8/0x5b4 [ufs_mediatek_mod_ise]
    ufshcd_err_handler+0xa7c/0xfa8 [ufs_mediatek_mod_ise]
    process_one_work+0x208/0x4fc
    worker_thread+0x228/0x438
    kthread+0x104/0x1d4
    ret_from_fork+0x10/0x20

Fixes: 8d7290348992 ("scsi: ufs: mcq: Add supporting functions for MCQ abort")
Fixes: 93e6c0e19d5b ("scsi: ufs: core: Clear cmd if abort succeeds in MCQ mode")
Cc: <stable@vger.kernel.org> 6.6.x
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
---
 drivers/ufs/core/ufs-mcq.c | 5 +++--
 drivers/ufs/core/ufshcd.c  | 9 +++++----
 2 files changed, 8 insertions(+), 6 deletions(-)

Bart Van Assche June 24, 2024, 6:01 p.m. UTC | #1

On 6/24/24 5:11 AM, peter.wang@mediatek.com wrote:
 > [ ... ]
In this patch there are two call traces, two fixes tags and two code
changes. Please split this patch into two patches with each one call
trace, one Fixes: tag and one code change. Additionally, please include
a changelog when posting a second or later version.

> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
> index 8944548c30fa..3b2e5bcb08a7 100644
> --- a/drivers/ufs/core/ufs-mcq.c
> +++ b/drivers/ufs/core/ufs-mcq.c
> @@ -512,8 +512,9 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag)
>   		return -ETIMEDOUT;
>   
>   	if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
> -		if (!cmd)
> -			return -EINVAL;
> +		/* Should return 0 if cmd is already complete by irq */
> +		if (!cmd || !ufshcd_cmd_inflight(cmd))
> +			return 0;
>   		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
>   	} else {
>   		hwq = hba->dev_cmd_queue;

Does the call trace show that blk_mq_unique_tag() tries to dereference 
address 0x194? If so, how is this possible? There are
only two lrbp->cmd assignments in the UFS driver. These assignments
either assign a valid SCSI command pointer or NULL. Even after a SCSI
command has been completed, the SCSI command pointer remains valid. So
how can an invalid pointer be passed to blk_mq_unique_tag()? Please
root-cause this issue instead of posting a code change that reduces a
race window without closing the race window completely.

Thanks,

Bart.

Peter Wang (王信友) June 25, 2024, 8:29 a.m. UTC | #2

On Mon, 2024-06-24 at 11:01 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 6/24/24 5:11 AM, peter.wang@mediatek.com wrote:
>  > [ ... ]
> In this patch there are two call traces, two fixes tags and two code
> changes. Please split this patch into two patches with each one call
> trace, one Fixes: tag and one code change. Additionally, please
> include
> a changelog when posting a second or later version.
> 

Hi Bart,

Will split this patch into two patches and add changelog next version,
thanks.

> > diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-
> mcq.c
> > index 8944548c30fa..3b2e5bcb08a7 100644
> > --- a/drivers/ufs/core/ufs-mcq.c
> > +++ b/drivers/ufs/core/ufs-mcq.c
> > @@ -512,8 +512,9 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba,
> int task_tag)
> >   return -ETIMEDOUT;
> >   
> >   if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
> > -if (!cmd)
> > -return -EINVAL;
> > +/* Should return 0 if cmd is already complete by irq */
> > +if (!cmd || !ufshcd_cmd_inflight(cmd))
> > +return 0;
> >   hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
> >   } else {
> >   hwq = hba->dev_cmd_queue;
> 
> Does the call trace show that blk_mq_unique_tag() tries to
> dereference 
> address 0x194? If so, how is this possible? There are
> only two lrbp->cmd assignments in the UFS driver. These assignments
> either assign a valid SCSI command pointer or NULL. Even after a SCSI
> command has been completed, the SCSI command pointer remains valid.
> So
> how can an invalid pointer be passed to blk_mq_unique_tag()? Please
> root-cause this issue instead of posting a code change that reduces a
> race window without closing the race window completely.
> 
> Thanks,
> 
> Bart.
> 

blk_mq_unique_tag() tries to dereference address 0x194, and it is null.
Beacuse ISR end this IO by scsi_done, free request will be called and
set mq_hctx null.
The call path is
scsi_done -> scsi_done_internal -> blk_mq_complete_request ->
scsi_complete -> 
scsi_finish_command -> scsi_io_completion -> scsi_end_request ->
__blk_mq_end_request -> 
blk_mq_free_request -> __blk_mq_free_request

And blk_mq_unique_tag will access mq_hctx then get null pointer error.
Please reference
https://elixir.bootlin.com/linux/latest/source/block/blk-mq.c#L713
https://elixir.bootlin.com/linux/latest/source/block/blk-mq-tag.c#L680

So, the root-casue is very simple, free request then get hwq.
This patch only check if reqesut not free(inflight) then get hwq.
Thought it still have racing winodw, but it is better then do nothing,
right?
Or, maybe we get all cq_lock before get hwq to close the racing window.
But the code may ugly, how do you think?

Thanks.
Peter

Bart Van Assche June 25, 2024, 4:42 p.m. UTC | #3

On 6/25/24 1:29 AM, Peter Wang (王信友) wrote:
> On Mon, 2024-06-24 at 11:01 -0700, Bart Van Assche wrote:
>>   On 6/24/24 5:11 AM, peter.wang@mediatek.com wrote:
>>> diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-
>> mcq.c
>>> index 8944548c30fa..3b2e5bcb08a7 100644
>>> --- a/drivers/ufs/core/ufs-mcq.c
>>> +++ b/drivers/ufs/core/ufs-mcq.c
>>> @@ -512,8 +512,9 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba,
>> int task_tag)
>>>    return -ETIMEDOUT;
>>>    
>>>    if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
>>> -if (!cmd)
>>> -return -EINVAL;
>>> +/* Should return 0 if cmd is already complete by irq */
>>> +if (!cmd || !ufshcd_cmd_inflight(cmd))
>>> +return 0;
>>>    hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
>>>    } else {
>>>    hwq = hba->dev_cmd_queue;
>>
>> Does the call trace show that blk_mq_unique_tag() tries to
>> dereference
>> address 0x194? If so, how is this possible? There are
>> only two lrbp->cmd assignments in the UFS driver. These assignments
>> either assign a valid SCSI command pointer or NULL. Even after a SCSI
>> command has been completed, the SCSI command pointer remains valid.
>> So
>> how can an invalid pointer be passed to blk_mq_unique_tag()? Please
>> root-cause this issue instead of posting a code change that reduces a
>> race window without closing the race window completely.
> 
> blk_mq_unique_tag() tries to dereference address 0x194, and it is null.
> Beacuse ISR end this IO by scsi_done, free request will be called and
> set mq_hctx null.
> The call path is
> scsi_done -> scsi_done_internal -> blk_mq_complete_request ->
> scsi_complete ->
> scsi_finish_command -> scsi_io_completion -> scsi_end_request ->
> __blk_mq_end_request ->
> blk_mq_free_request -> __blk_mq_free_request
> 
> And blk_mq_unique_tag will access mq_hctx then get null pointer error.
> Please reference
> https://elixir.bootlin.com/linux/latest/source/block/blk-mq.c#L713
> https://elixir.bootlin.com/linux/latest/source/block/blk-mq-tag.c#L680
> 
> So, the root-casue is very simple, free request then get hwq.
> This patch only check if reqesut not free(inflight) then get hwq.
> Thought it still have racing winodw, but it is better then do nothing,
> right?
> Or, maybe we get all cq_lock before get hwq to close the racing window.
> But the code may ugly, how do you think?

Please include a full root cause analysis when reposting fixes for the
reported crashes. It is not clear to me how it is possible that an
invalid pointer is passed to blk_mq_unique_tag() (0x194). As I mentioned
in my previous email, freeing a request does not modify the request
pointer and does not modify the SCSI command pointer either. As one can
derive from the blk_mq_alloc_rqs() call stack, memory for struct request
and struct scsi_cmnd is allocated at request queue allocation time and
is not freed until the request queue is freed. Hence, for a given tag,
neither the request pointer nor the SCSI command pointer changes as long
as a request queue exists. Hence my request for an explanation how it is
possible that an invalid pointer was passed to blk_mq_unique_tag().

Thanks,

Bart.

Peter Wang (王信友) June 26, 2024, 3:56 a.m. UTC | #4

On Tue, 2024-06-25 at 09:42 -0700, Bart Van Assche wrote:
> 
> 
> Please include a full root cause analysis when reposting fixes for
> the
> reported crashes. It is not clear to me how it is possible that an
> invalid pointer is passed to blk_mq_unique_tag() (0x194). As I
> mentioned
> in my previous email, freeing a request does not modify the request
> pointer and does not modify the SCSI command pointer either. As one
> can
> derive from the blk_mq_alloc_rqs() call stack, memory for struct
> request
> and struct scsi_cmnd is allocated at request queue allocation time
> and
> is not freed until the request queue is freed. Hence, for a given
> tag,
> neither the request pointer nor the SCSI command pointer changes as
> long
> as a request queue exists. Hence my request for an explanation how it
> is
> possible that an invalid pointer was passed to blk_mq_unique_tag().
> 
> Thanks,
> 
> Bart.
> 

Hi Bart,

Sorry I have not explain root-cause clearly.
I will add more clear root-cause analyze next version.

And it is not an invalid pointer is passed to blk_mq_unique_tag(),
I means blk_mq_unique_tag function try access null pointer.
It is differnt and cause misunderstanding.

The null pinter blk_mq_unique_tag try access is:
rq->mq_hctx(NULL)->queue_num.

The racing flow is:

Thread A
ufshcd_err_handler					step 1
	ufshcd_cmd_inflight(true)			step 3
	ufshcd_mcq_req_to_hwq
		blk_mq_unique_tag
			rq->mq_hctx->queue_num		step 5

Thread B				
ufs_mtk_mcq_intr(cq complete ISR)			step 2
	scsi_done						
		...
		__blk_mq_free_request
			rq->mq_hctx = NULL;		step 4

Thanks.
Peter

Bart Van Assche June 26, 2024, 5:13 p.m. UTC | #5

On 6/25/24 8:56 PM, Peter Wang (王信友) wrote:
> Sorry I have not explain root-cause clearly.
> I will add more clear root-cause analyze next version.
> 
> And it is not an invalid pointer is passed to blk_mq_unique_tag(),
> I means blk_mq_unique_tag function try access null pointer.
> It is differnt and cause misunderstanding.
> 
> The null pinter blk_mq_unique_tag try access is:
> rq->mq_hctx(NULL)->queue_num.
> 
> The racing flow is:
> 
> Thread A
> ufshcd_err_handler					step 1
> 	ufshcd_cmd_inflight(true)			step 3
> 	ufshcd_mcq_req_to_hwq
> 		blk_mq_unique_tag
> 			rq->mq_hctx->queue_num		step 5
> 
> Thread B				
> ufs_mtk_mcq_intr(cq complete ISR)			step 2
> 	scsi_done						
> 		...
> 		__blk_mq_free_request
> 			rq->mq_hctx = NULL;		step 4

How about surrounding the blk_mq_unique_tag() call with
atomic_inc_not_zero(&req->ref) / atomic_dec(&req->ref)?

Thanks,

Bart.

Wenchao Hao June 27, 2024, 7:59 a.m. UTC | #6

On 2024/6/26 11:56, Peter Wang (王信友) wrote:
> On Tue, 2024-06-25 at 09:42 -0700, Bart Van Assche wrote:
>>
>>
>> Please include a full root cause analysis when reposting fixes for
>> the
>> reported crashes. It is not clear to me how it is possible that an
>> invalid pointer is passed to blk_mq_unique_tag() (0x194). As I
>> mentioned
>> in my previous email, freeing a request does not modify the request
>> pointer and does not modify the SCSI command pointer either. As one
>> can
>> derive from the blk_mq_alloc_rqs() call stack, memory for struct
>> request
>> and struct scsi_cmnd is allocated at request queue allocation time
>> and
>> is not freed until the request queue is freed. Hence, for a given
>> tag,
>> neither the request pointer nor the SCSI command pointer changes as
>> long
>> as a request queue exists. Hence my request for an explanation how it
>> is
>> possible that an invalid pointer was passed to blk_mq_unique_tag().
>>
>> Thanks,
>>
>> Bart.
>>
> 
> Hi Bart,
> 
> Sorry I have not explain root-cause clearly.
> I will add more clear root-cause analyze next version.
> 
> And it is not an invalid pointer is passed to blk_mq_unique_tag(),
> I means blk_mq_unique_tag function try access null pointer.
> It is differnt and cause misunderstanding.
> 
> The null pinter blk_mq_unique_tag try access is:
> rq->mq_hctx(NULL)->queue_num.
> 

Hi Peter, 

What is queue_num's offset of blk_mq_hw_ctx in your machine?

gdb vmlinux

(gdb) print /x (int)&((struct blk_mq_hw_ctx *)0)->queue_num
$5 = 0x164

I read your descriptions and wondered a same race flow as you described
following. But I found the offset mismatch, if the racing flow is correct,
then the address accessed in blk_mq_unique_tag() should be 0x164, not 0x194.
Maybe the offset is different between our machine?

What's more, if the racing flow is correct, I did not get how your changes
can address this racing flow.

> The racing flow is:
> 
> Thread A
> ufshcd_err_handler					step 1
> 	ufshcd_cmd_inflight(true)			step 3
> 	ufshcd_mcq_req_to_hwq
> 		blk_mq_unique_tag
> 			rq->mq_hctx->queue_num		step 5
> 
> Thread B				
> ufs_mtk_mcq_intr(cq complete ISR)			step 2
> 	scsi_done						
> 		...
> 		__blk_mq_free_request
> 			rq->mq_hctx = NULL;		step 4
> 
> Thanks.
> Peter
> 
> 
> 
>

Wenchao Hao June 27, 2024, 9:19 a.m. UTC | #7

On 2024/6/27 1:13, Bart Van Assche wrote:
> On 6/25/24 8:56 PM, Peter Wang (王信友) wrote:
>> Sorry I have not explain root-cause clearly.
>> I will add more clear root-cause analyze next version.
>>
>> And it is not an invalid pointer is passed to blk_mq_unique_tag(),
>> I means blk_mq_unique_tag function try access null pointer.
>> It is differnt and cause misunderstanding.
>>
>> The null pinter blk_mq_unique_tag try access is:
>> rq->mq_hctx(NULL)->queue_num.
>>
>> The racing flow is:
>>
>> Thread A
>> ufshcd_err_handler                    step 1
>>     ufshcd_cmd_inflight(true)            step 3
>>     ufshcd_mcq_req_to_hwq
>>         blk_mq_unique_tag
>>             rq->mq_hctx->queue_num        step 5
>>
>> Thread B               
>> ufs_mtk_mcq_intr(cq complete ISR)            step 2
>>     scsi_done                       
>>         ...
>>         __blk_mq_free_request
>>             rq->mq_hctx = NULL;        step 4
> 
> How about surrounding the blk_mq_unique_tag() call with
> atomic_inc_not_zero(&req->ref) / atomic_dec(&req->ref)?
> 

Hi Bart,

A small wonder, then should we append __blk_mq_free_request() if
req->ref decreased to 0 like following?

        if (req_ref_put_and_test(rq))
                __blk_mq_free_request(rq);
 

> Thanks,
> 
> Bart.
>

Peter Wang (王信友) June 27, 2024, 10:58 a.m. UTC | #8

On Thu, 2024-06-27 at 15:59 +0800, Wenchao Hao wrote:
> 
> Hi Peter, 
> 
> What is queue_num's offset of blk_mq_hw_ctx in your machine?
> 
> gdb vmlinux
> 
> (gdb) print /x (int)&((struct blk_mq_hw_ctx *)0)->queue_num
> $5 = 0x164
> 
> I read your descriptions and wondered a same race flow as you
> described
> following. But I found the offset mismatch, if the racing flow is
> correct,
> then the address accessed in blk_mq_unique_tag() should be 0x164, not
> 0x194.
> Maybe the offset is different between our machine?
> 
> What's more, if the racing flow is correct, I did not get how your
> changes
> can address this racing flow.
> 
> 

Hi Wenchao Hao,

Yes, our queue_num's offset of blk_mq_hw_ctx is 0x194.
Our kernel version is: Linux version 6.1.68
I think the offset is different by kernel version.

(gdb) print /x (int)&((struct blk_mq_hw_ctx *)0)->queue_num
$1 = 0x194

And yes, it only shorten the race window of step3 and step5.
Reduce the probability of step 4 appearing between step 3 and step 5.


Thanks.
Peter

Peter Wang (王信友) June 27, 2024, 10:59 a.m. UTC | #9

On Thu, 2024-06-27 at 17:19 +0800, Wenchao Hao wrote:
> 
> 
> Hi Bart,
> 
> A small wonder, then should we append __blk_mq_free_request() if
> req->ref decreased to 0 like following?
> 
>         if (req_ref_put_and_test(rq))
>                 __blk_mq_free_request(rq);
>  
> 
> > Thanks,
> > 
> > Bart.
> > 
> 

Hi Bart and Wenchao Hao,

It could be have side effect if we surrounding the blk_mq_unique_tag()
call with
atomic_inc_not_zero(&req->ref) / atomic_dec(&req->ref)?
Beacuse __blk_mq_free_request stll have other finish job to.

This is a chicken-and-egg problem. We need to acquire a lock to know
which hwq it is, 
but we need to know which hwq it is to acquire the lock. 
Therefore, to resolve this dilemma, perhaps we should just take all the
hwq locks 
indiscriminately?

Thanks.
Peter

Bart Van Assche June 27, 2024, 8:13 p.m. UTC | #10

On 6/27/24 3:59 AM, Peter Wang (王信友) wrote:
> This is a chicken-and-egg problem. We need to acquire a lock to know 
> which hwq it is, but we need to know which hwq it is to acquire the
> lock. Therefore, to resolve this dilemma, perhaps we should just take
> all the hwq locks indiscriminately?
How about the (untested) patch below?

Thanks,

Bart.

diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
index aa119746ee92..c5d327ba253f 100644
--- a/drivers/ufs/core/ufs-mcq.c
+++ b/drivers/ufs/core/ufs-mcq.c
@@ -105,16 +105,15 @@ EXPORT_SYMBOL_GPL(ufshcd_mcq_config_mac);
   * @hba: per adapter instance
   * @req: pointer to the request to be issued
   *
- * Return: the hardware queue instance on which the request would
- * be queued.
+ * Return: the hardware queue instance on which the request will be or 
has been
+ * queued. %NULL if the request has already been freed.
   */
  struct ufs_hw_queue *ufshcd_mcq_req_to_hwq(struct ufs_hba *hba,
  					 struct request *req)
  {
-	u32 utag = blk_mq_unique_tag(req);
-	u32 hwq = blk_mq_unique_tag_to_hwq(utag);
+	struct blk_mq_hw_ctx *hctx = READ_ONCE(rq->mq_hctx);

-	return &hba->uhq[hwq];
+	return hctx ? &hba->uhq[hctx->queue_num] : NULL;
  }

  /**
@@ -547,6 +546,8 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int 
task_tag)
  		if (!cmd)
  			return -EINVAL;
  		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
+		if (!hwq)
+			return -EINVAL;
  	} else {
  		hwq = hba->dev_cmd_queue;
  	}

Wenchao Hao June 28, 2024, 1:44 a.m. UTC | #11

On 2024/6/27 18:58, Peter Wang (王信友) wrote:
> On Thu, 2024-06-27 at 15:59 +0800, Wenchao Hao wrote:
>>
>> Hi Peter, 
>>
>> What is queue_num's offset of blk_mq_hw_ctx in your machine?
>>
>> gdb vmlinux
>>
>> (gdb) print /x (int)&((struct blk_mq_hw_ctx *)0)->queue_num
>> $5 = 0x164
>>
>> I read your descriptions and wondered a same race flow as you
>> described
>> following. But I found the offset mismatch, if the racing flow is
>> correct,
>> then the address accessed in blk_mq_unique_tag() should be 0x164, not
>> 0x194.
>> Maybe the offset is different between our machine?
>>
>> What's more, if the racing flow is correct, I did not get how your
>> changes
>> can address this racing flow.
>>
>>
> 
> Hi Wenchao Hao,
> 
> Yes, our queue_num's offset of blk_mq_hw_ctx is 0x194.
> Our kernel version is: Linux version 6.1.68
> I think the offset is different by kernel version.
> 
> (gdb) print /x (int)&((struct blk_mq_hw_ctx *)0)->queue_num
> $1 = 0x194
> 
> And yes, it only shorten the race window of step3 and step5.
> Reduce the probability of step 4 appearing between step 3 and step 5.
> 
> 

Hi Peter,

Thanks for your reply, I understand the issue now.

> Thanks.
> Peter

Peter Wang (王信友) June 28, 2024, 3:13 a.m. UTC | #12

>   struct ufs_hw_queue *ufshcd_mcq_req_to_hwq(struct ufs_hba *hba,
>    struct request *req)
>   {
> -u32 utag = blk_mq_unique_tag(req);
> -u32 hwq = blk_mq_unique_tag_to_hwq(utag);
> +struct blk_mq_hw_ctx *hctx = READ_ONCE(rq->mq_hctx);
> 
> -return &hba->uhq[hwq];
> +return hctx ? &hba->uhq[hctx->queue_num] : NULL;
>   }
> 

Hi Bart,

Yes, it could be work, thanks.

>   /**
> @@ -547,6 +546,8 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba,
> int 
> task_tag)
>   if (!cmd)
>   return -EINVAL;
>   hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
> +if (!hwq)
> +return -EINVAL;
> 

Should retrun 0, beacuse hwq null means tag is done by ISR.
We don't need cleanup sq now.


Thanks.
Peter



>   } else {
>   hwq = hba->dev_cmd_queue;
>   }
>

[v2] ufs: core: fix ufshcd_abort_all racing issue

Commit Message

Comments

Patch