Message ID | 20230922090925.16339-1-peter.wang@mediatek.com
---|---
State | New, archived
Series | [v4] ufs: core: wlun send SSU timeout recovery
On 9/22/23 02:09, peter.wang@mediatek.com wrote:
> When runtime pm send SSU times out, the SCSI core invokes
> eh_host_reset_handler, which hooks function ufshcd_eh_host_reset_handler
> schedule eh_work and stuck at wait flush_work(&hba->eh_work).
> However, ufshcd_err_handler hangs in wait rpm resume.
> Do link recovery only in this case.
> Below is IO hang stack dump in kernel-6.1

What does kernel-6.1 mean? Has commit 7029e2151a7c ("scsi: ufs: Fix a
deadlock between PM and the SCSI error handler") been backported to that
kernel?

> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index c2df07545f96..7608d75bb4fe 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -7713,9 +7713,29 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
>  	int err = SUCCESS;
>  	unsigned long flags;
>  	struct ufs_hba *hba;
> +	struct device *dev;
>  
>  	hba = shost_priv(cmd->device->host);
>  
> +	/*
> +	 * If runtime pm send SSU and got timeout, scsi_error_handler
> +	 * stuck at this function to wait flush_work(&hba->eh_work).
> +	 * And ufshcd_err_handler(eh_work) stuck at wait runtime pm active.
> +	 * Do ufshcd_link_recovery instead shedule eh_work can prevent
> +	 * dead lock happen.
> +	 */
> +	dev = &hba->ufs_device_wlun->sdev_gendev;
> +	if ((dev->power.runtime_status == RPM_RESUMING) ||
> +	    (dev->power.runtime_status == RPM_SUSPENDING)) {
> +		err = ufshcd_link_recovery(hba);
> +		if (err) {
> +			dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
> +				dev->power.runtime_status,
> +				dev->power.runtime_error);
> +		}
> +		return err;
> +	}
> +
>  	spin_lock_irqsave(hba->host->host_lock, flags);
>  	hba->force_reset = true;
>  	ufshcd_schedule_eh_work(hba);

I think this change is racy because the runtime power management status
may change after the above checks have been performed and before
ufshcd_err_handling_prepare() is called. If commit 7029e2151a7c is
included in your kernel, does applying the untested patch below help?

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d45a7dd80ab8..656dabea678e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -123,8 +123,7 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd, unsigned long msecs)
 	}
 
 	blk_mq_requeue_request(rq, false);
-	if (!scsi_host_in_recovery(cmd->device->host))
-		blk_mq_delay_kick_requeue_list(rq->q, msecs);
+	blk_mq_delay_kick_requeue_list(rq->q, msecs);
 }
 
 /**
@@ -163,8 +162,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, bool unbusy)
 	 */
 	cmd->result = 0;
 
-	blk_mq_requeue_request(scsi_cmd_to_rq(cmd),
-			       !scsi_host_in_recovery(cmd->device->host));
+	blk_mq_requeue_request(scsi_cmd_to_rq(cmd), true);
 }
 
 /**
@@ -495,9 +493,6 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
 
 static void scsi_run_queue_async(struct scsi_device *sdev)
 {
-	if (scsi_host_in_recovery(sdev->host))
-		return;
-
 	if (scsi_target(sdev)->single_lun ||
 	    !list_empty(&sdev->host->starved_list)) {
 		kblockd_schedule_work(&sdev->requeue_work);

Thanks,

Bart.
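For context on the race Bart describes: the runtime PM status is guarded by
dev->power.lock inside the PM core, so any value read by a driver is only a
snapshot. Below is a hypothetical, untested sketch of a locked read; even
this would not close the window, which is the point of the objection:

	struct device *dev = &hba->ufs_device_wlun->sdev_gendev;
	unsigned long flags;
	enum rpm_status status;

	/* dev->power.lock is the PM-core lock that guards runtime_status. */
	spin_lock_irqsave(&dev->power.lock, flags);
	status = dev->power.runtime_status;
	spin_unlock_irqrestore(&dev->power.lock, flags);

	/*
	 * 'status' is stale the moment the lock is dropped: the PM core may
	 * switch the device to RPM_SUSPENDING or RPM_RESUMING right after
	 * the unlock, which is exactly the window Bart points at.
	 */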
On 9/25/23 01:51, Peter Wang (王信友) wrote:
> On Fri, 2023-09-22 at 09:06 -0700, Bart Van Assche wrote:
>> On 9/22/23 02:09, peter.wang@mediatek.com wrote:
>> > When runtime pm send SSU times out, the SCSI core invokes
>> > eh_host_reset_handler, which hooks function ufshcd_eh_host_reset_handler
>> > schedule eh_work and stuck at wait flush_work(&hba->eh_work).
>> > However, ufshcd_err_handler hangs in wait rpm resume.
>> > Do link recovery only in this case.
>> > Below is IO hang stack dump in kernel-6.1
>>
>> What does kernel-6.1 mean? Has commit 7029e2151a7c ("scsi: ufs: Fix a
>> deadlock between PM and the SCSI error handler") been backported to
>> that kernel?
>
> Yes, our kernel-6.1 have backport 7029e2151a7c ("scsi: ufs: Fix a
> deadlock between PM and the SCSI error handler")

Hi Peter,

Thanks for having confirmed this.

Please use text mode instead of HTML mode when replying. As one can see
here, your reply did not make it to the linux-scsi mailing list:
https://lore.kernel.org/linux-scsi/20230922090925.16339-1-peter.wang@mediatek.com/

> And this IO hang issue still happen.
>
> This issue is easy trigger if we drop the SSU response to let SSU
> timeout.
>
>> > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
>> > index c2df07545f96..7608d75bb4fe 100644
>> > --- a/drivers/ufs/core/ufshcd.c
>> > +++ b/drivers/ufs/core/ufshcd.c
>> > @@ -7713,9 +7713,29 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
>> >  	int err = SUCCESS;
>> >  	unsigned long flags;
>> >  	struct ufs_hba *hba;
>> > +	struct device *dev;
>> >  
>> >  	hba = shost_priv(cmd->device->host);
>> >  
>> > +	/*
>> > +	 * If runtime pm send SSU and got timeout, scsi_error_handler
>> > +	 * stuck at this function to wait flush_work(&hba->eh_work).
>> > +	 * And ufshcd_err_handler(eh_work) stuck at wait runtime pm active.
>> > +	 * Do ufshcd_link_recovery instead shedule eh_work can prevent
>> > +	 * dead lock happen.
>> > +	 */
>> > +	dev = &hba->ufs_device_wlun->sdev_gendev;
>> > +	if ((dev->power.runtime_status == RPM_RESUMING) ||
>> > +	    (dev->power.runtime_status == RPM_SUSPENDING)) {
>> > +		err = ufshcd_link_recovery(hba);
>> > +		if (err) {
>> > +			dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
>> > +				dev->power.runtime_status,
>> > +				dev->power.runtime_error);
>> > +		}
>> > +		return err;
>> > +	}
>> > +
>> >  	spin_lock_irqsave(hba->host->host_lock, flags);
>> >  	hba->force_reset = true;
>> >  	ufshcd_schedule_eh_work(hba);
>>
>> I think this change is racy because the runtime power management status
>> may change after the above checks have been performed and before
>> ufshcd_err_handling_prepare() is called.
>
> If this function invoke and RPM state is in the RPM_RESUMING or
> RPM_SUSPENDING state, it means error happen in ufs driver resume/suspend
> callback function and no chance move to next state if callback function
> is not finished, just like backtrace stuck in send SSU cmd.
> Or we can make it simple by check pm_op_in_progress, what do you think?

I'm concerned that this approach will fix some cases but not all cases.
The UFS error handler (ufshcd_err_handler()) and runtime suspend or
resume code may be called concurrently. No matter which checks are added
at the start of ufshcd_eh_host_reset_handler(), I think these checks
won't protect against concurrent execution of runtime power management
code just after these checks have finished.

> Since the hang db backtrace is not involde below function, it is not
> help.

I think that conclusion is wrong. Can you please test the patch that I
provided?

Thanks,

Bart.
On Mon, 2023-09-25 at 15:07 -0700, Bart Van Assche wrote:
> I'm concerned that this approach will fix some cases but not all cases.
> The UFS error handler (ufshcd_err_handler()) and runtime suspend or
> resume code may be called concurrently. No matter which checks are added
> at the start of ufshcd_eh_host_reset_handler(), I think these checks
> won't protect against concurrent execution of runtime power management
> code just after these checks have finished.

Hi Bart,

Yes, but ufshcd_err_handler will wait for runtime PM resume to become
active if it runs concurrently with runtime suspend or resume
(ufshcd_err_handler -> ufshcd_err_handling_prepare -> ufshcd_rpm_get_sync).
So, if runtime suspend or resume sends SSU and it times out, the scsi
error handler gets stuck in this function and a deadlock happens.
The idea is: if the runtime PM flow gets an error, doing
ufshcd_link_recovery() is enough.

> I think that conclusion is wrong. Can you please test the patch that I
> provided?

Yes, I have checked that this patch is included in our code base, and the
IO hang still happens. The stack dump of this deadlock shows how it
causes the IO hang.

Thanks.
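Laid out end to end, the wait cycle Peter describes looks like this
(reconstructed from the discussion; the original stack dump is not
reproduced in this thread):

	1. Runtime PM (WLUN suspend/resume) issues the SSU command and
	   blocks until that SCSI command completes.
	2. The SSU command times out, so the scsi_eh thread invokes
	   ufshcd_eh_host_reset_handler(), which schedules eh_work and
	   blocks in flush_work(&hba->eh_work).
	3. eh_work (ufshcd_err_handler()) runs ufshcd_err_handling_prepare()
	   -> ufshcd_rpm_get_sync(), which blocks until the WLUN reaches
	   RPM_ACTIVE.
	4. The WLUN cannot reach RPM_ACTIVE until the SSU command from
	   step 1 completes, and that command only completes once error
	   handling finishes: a three-way deadlock.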
On 9/26/23 03:23, Peter Wang (王信友) wrote:
> Yes, but ufshcd_err_handler will wait for runtime PM resume to become
> active if it runs concurrently with runtime suspend or resume
> (ufshcd_err_handler -> ufshcd_err_handling_prepare -> ufshcd_rpm_get_sync).
> So, if runtime suspend or resume sends SSU and it times out, the scsi
> error handler gets stuck in this function and a deadlock happens.
> The idea is: if the runtime PM flow gets an error, doing
> ufshcd_link_recovery() is enough.

Thank you for having provided this clarification.

Bart.
On 9/22/23 02:09, peter.wang@mediatek.com wrote:
> +	/*
> +	 * If runtime pm send SSU and got timeout, scsi_error_handler
> +	 * stuck at this function to wait flush_work(&hba->eh_work).
> +	 * And ufshcd_err_handler(eh_work) stuck at wait runtime pm active.
> +	 * Do ufshcd_link_recovery instead shedule eh_work can prevent
> +	 * dead lock happen.
> +	 */

The above comment is hard to understand because of grammatical issues.
Please try to improve this comment. A few examples: I think that "wait"
should be changed into "wait for" and also that "happen" should be
changed into "to happen".

> +	dev = &hba->ufs_device_wlun->sdev_gendev;
> +	if ((dev->power.runtime_status == RPM_RESUMING) ||
> +	    (dev->power.runtime_status == RPM_SUSPENDING)) {
> +		err = ufshcd_link_recovery(hba);
> +		if (err) {
> +			dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
> +				dev->power.runtime_status,
> +				dev->power.runtime_error);
> +		}
> +		return err;
> +	}

ufshcd_link_recovery() returns a Unix error code (e.g. -ETIMEDOUT) while
ufshcd_eh_host_reset_handler() should return one of the following
values: SUCCESS or FAILED. Please fix this.

Thanks,

Bart.
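Concretely, the translation Bart asks for could be as small as this (an
untested sketch, not the posted fix):

	/* Map the errno from ufshcd_link_recovery() onto SCSI EH values. */
	return ufshcd_link_recovery(hba) ? FAILED : SUCCESS;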
On Tue, 2023-09-26 at 11:29 -0700, Bart Van Assche wrote:
> The above comment is hard to understand because of grammatical issues.
> Please try to improve this comment.
>
> ufshcd_link_recovery() returns a Unix error code (e.g. -ETIMEDOUT) while
> ufshcd_eh_host_reset_handler() should return one of the following
> values: SUCCESS or FAILED. Please fix this.

Hi Bart,

Thanks for the review. I will correct the comment and change the return
value, and also make it simpler by checking pm_op_in_progress.

Thanks.
Peter
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index c2df07545f96..7608d75bb4fe 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -7713,9 +7713,29 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
 	int err = SUCCESS;
 	unsigned long flags;
 	struct ufs_hba *hba;
+	struct device *dev;
 
 	hba = shost_priv(cmd->device->host);
 
+	/*
+	 * If runtime pm send SSU and got timeout, scsi_error_handler
+	 * stuck at this function to wait flush_work(&hba->eh_work).
+	 * And ufshcd_err_handler(eh_work) stuck at wait runtime pm active.
+	 * Do ufshcd_link_recovery instead shedule eh_work can prevent
+	 * dead lock happen.
+	 */
+	dev = &hba->ufs_device_wlun->sdev_gendev;
+	if ((dev->power.runtime_status == RPM_RESUMING) ||
+	    (dev->power.runtime_status == RPM_SUSPENDING)) {
+		err = ufshcd_link_recovery(hba);
+		if (err) {
+			dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
+				dev->power.runtime_status,
+				dev->power.runtime_error);
+		}
+		return err;
+	}
+
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	hba->force_reset = true;
 	ufshcd_schedule_eh_work(hba);
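Folding in the review feedback above (the pm_op_in_progress check Peter
mentions, a reworded comment, and SUCCESS/FAILED return values), the
revised hunk might look like the following. This is a hypothetical sketch
of the next revision, not the patch as posted:

	hba = shost_priv(cmd->device->host);

	/*
	 * If runtime PM sent SSU and got a timeout, scsi_error_handler is
	 * stuck in this function waiting for flush_work(&hba->eh_work),
	 * while ufshcd_err_handler (eh_work) waits for runtime PM to
	 * become active again. Doing link recovery here instead of
	 * scheduling eh_work prevents the deadlock from happening.
	 */
	if (hba->pm_op_in_progress) {
		if (ufshcd_link_recovery(hba))
			err = FAILED;

		return err;
	}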