[v6] ufs: core: wlun suspend SSU/enter hibern8 fail recovery

Message ID 20221208072520.26210-1-peter.wang@mediatek.com (mailing list archive)
State New, archived

Commit Message

Peter Wang (王信友) Dec. 8, 2022, 7:25 a.m. UTC
From: Peter Wang <peter.wang@mediatek.com>

When SSU or hibern8 entry fails in the wlun suspend flow, trigger the
error handler and return -EBUSY to break the suspend.
Otherwise the wlun runtime PM status becomes an error and the consumer
gets stuck in the runtime-suspended state.

Fixes: b294ff3e3449 ("scsi: ufs: core: Enable power management for wlun")
Cc: stable@vger.kernel.org
Signed-off-by: Peter Wang <peter.wang@mediatek.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/ufs/core/ufshcd.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
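
The reasoning in the commit message can be illustrated with a small user-space model. This is only a sketch, not kernel code: `model_dev`, `model_wl_suspend_step` and `model_rpm_suspend` are hypothetical names invented for illustration. It assumes the documented runtime PM behaviour that a suspend callback returning -EBUSY or -EAGAIN leaves the device active with no sticky error, while any other error is recorded and later requests are refused until the status is set again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's enum ufs_pm_op values. */
enum model_pm_op { MODEL_RUNTIME_PM, MODEL_SYSTEM_PM, MODEL_SHUTDOWN_PM };

/* Hypothetical model of the state touched by the patch and the PM core. */
struct model_dev {
	bool force_reset;   /* set when error recovery is requested */
	int runtime_error;  /* sticky error kept by the modelled PM core */
};

/*
 * Models the patched step: a failed SSU or hibern8 entry outside of
 * shutdown requests error recovery and is reported as -EBUSY.
 */
int model_wl_suspend_step(struct model_dev *dev, int step_ret,
			  enum model_pm_op pm_op)
{
	assert(dev != NULL);
	if (step_ret && pm_op != MODEL_SHUTDOWN_PM) {
		dev->force_reset = true;
		return -EBUSY;
	}
	return step_ret;
}

/*
 * Models how the runtime PM core reacts to a suspend callback result:
 * -EBUSY and -EAGAIN are transient, any other error becomes sticky.
 */
int model_rpm_suspend(struct model_dev *dev, int cb_ret)
{
	assert(dev != NULL);
	if (dev->runtime_error)
		return -EINVAL;  /* further requests are refused */
	if (cb_ret && cb_ret != -EBUSY && cb_ret != -EAGAIN)
		dev->runtime_error = cb_ret;
	return cb_ret;
}
```

In this model, a failure remapped to -EBUSY leaves `runtime_error` at 0, so a later suspend attempt can still succeed once recovery has run. A raw error such as -EIO makes every later request fail, which corresponds to the stuck consumer described above.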

Comments

Greg Kroah-Hartman Dec. 8, 2022, 8:02 a.m. UTC | #1
On Thu, Dec 08, 2022 at 03:25:20PM +0800, peter.wang@mediatek.com wrote:
> From: Peter Wang <peter.wang@mediatek.com>
> 
> When SSU or hibern8 entry fails in the wlun suspend flow, trigger the
> error handler and return -EBUSY to break the suspend.
> Otherwise the wlun runtime PM status becomes an error and the consumer
> gets stuck in the runtime-suspended state.
> 
> Fixes: b294ff3e3449 ("scsi: ufs: core: Enable power management for wlun")
> Cc: stable@vger.kernel.org
> Signed-off-by: Peter Wang <peter.wang@mediatek.com>
> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  drivers/ufs/core/ufshcd.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index b1f59a5fe632..31ed3fdb5266 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -6070,6 +6070,14 @@ void ufshcd_schedule_eh_work(struct ufs_hba *hba)
>  	}
>  }
>  
> +static void ufshcd_force_error_recovery(struct ufs_hba *hba) 
> +{
> +	spin_lock_irq(hba->host->host_lock);
> +	hba->force_reset = true;
> +	ufshcd_schedule_eh_work(hba);
> +	spin_unlock_irq(hba->host->host_lock);
> +}
> +
>  static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool allow)
>  {
>  	down_write(&hba->clk_scaling_lock);
> @@ -9049,6 +9057,15 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>  
>  		if (!hba->dev_info.b_rpm_dev_flush_capable) {
>  			ret = ufshcd_set_dev_pwr_mode(hba, req_dev_pwr_mode);
> +			if (ret && pm_op != UFS_SHUTDOWN_PM) {
> +				/*
> +				 * If return err in suspend flow, IO will hang.
> +				 * Trigger error handler and break suspend for
> +				 * error recovery.
> +				 */
> +				ufshcd_force_error_recovery(hba);
> +				ret = -EBUSY;
> +			}
>  			if (ret)
>  				goto enable_scaling;
>  		}
> @@ -9060,6 +9077,15 @@ static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>  	 */
>  	check_for_bkops = !ufshcd_is_ufs_dev_deepsleep(hba);
>  	ret = ufshcd_link_state_transition(hba, req_link_state, check_for_bkops);
> +	if (ret && pm_op != UFS_SHUTDOWN_PM) {
> +		/*
> +		 * If return err in suspend flow, IO will hang.
> +		 * Trigger error handler and break suspend for
> +		 * error recovery.
> +		 */
> +		ufshcd_force_error_recovery(hba);
> +		ret = -EBUSY;
> +	}
>  	if (ret)
>  		goto set_dev_active;
>  
> -- 
> 2.18.0
> 

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- This looks like a new version of a previously submitted patch, but you
  did not list below the --- line any changes from the previous version.
  Please read the section entitled "The canonical patch format" in the
  kernel file, Documentation/SubmittingPatches for what needs to be done
  here to properly describe this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot
Martin K. Petersen Dec. 14, 2022, 3:16 a.m. UTC | #2
Peter,

> When SSU or hibern8 entry fails in the wlun suspend flow, trigger the
> error handler and return -EBUSY to break the suspend.  Otherwise the
> wlun runtime PM status becomes an error and the consumer gets stuck in
> the runtime-suspended state.

Applied to 6.2/scsi-staging, thanks!
Daniil Lunev Dec. 20, 2022, 9 p.m. UTC | #3
> Applied to 6.2/scsi-staging, thanks!

There is an interesting side effect of the patch in this iteration
(which I am not sure was present in the previous iteration I tried):
if the device auto-suspends while running a purge, the controller is
seemingly reset and thus the purge is aborted (with no patch at all
it hangs).
That might be OK behaviour though - it will just make it an explicit
requirement to disable runtime suspend during the management
operation.

localhost ~ # ufs-utils fl -t 6 -e -p /dev/bsg/ufs-bsg0
localhost ~ # ufs-utils attr -a -p /dev/bsg/ufs-bsg0 | grep bPurgeStatus
bPurgeStatus               := 0x00

[   25.801980] ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 2, result 2
[   25.802002] ufs_device_wlun 0:0:0:49488: Sense Key : Not Ready [current]
[   25.802009] ufs_device_wlun 0:0:0:49488: Add. Sense: No additional sense information
[   25.802020] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend failed: -16
Peter Wang (王信友) Dec. 21, 2022, 5:59 a.m. UTC | #4
On Wed, 2022-12-21 at 08:00 +1100, Daniil Lunev wrote:
> > Applied to 6.2/scsi-staging, thanks!
> 
> There is an interesting side effect of the patch in this iteration
> (which I am not sure was present in the previous iteration I tried):
> if the device auto-suspends while running a purge, the controller is
> seemingly reset and thus the purge is aborted (with no patch at all
> it hangs).
> That might be OK behaviour though - it will just make it an explicit
> requirement to disable runtime suspend during the management
> operation.
> 

Hi Daniil,

I am not sure whether this has a similar cause to the SSU (sleep)
failure we get.
But without this patch, system IO hangs while a purge is ongoing,
which is no better.
I also have another idea about rpm and purge.

To disable runtime suspend while a purge operation is ongoing:
1. Disable rpm when fPurgeEnable is set, poll until bPurgeStatus becomes
   0, then enable rpm again.
   But polling bPurgeStatus will extend the rpm timer, so we don't
   really need to disable rpm, right?
2. Check bPurgeStatus when entering runtime suspend, and return EBUSY if
   bPurgeStatus is not 0 to break the suspend.
   This is the correct design to tell the rpm framework that the driver
   is busy with a purge and that suspend is inappropriate.
   But it should be similar to the current flow, which returns EBUSY
   when SSU fails?

So, with the current design, if the purge initiator does not want to see
rpm EBUSY, then they should poll bPurgeStatus.
What do you think?
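
Option 2 above could be sketched roughly as follows. This is a user-space model under stated assumptions, not driver code: `model_runtime_suspend_check`, the `read_purge_status_fn` callback type and `model_read_status_from_byte` are hypothetical names, and the check follows the wording above literally (any non-zero bPurgeStatus is treated as busy).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical callback that reads bPurgeStatus from the device.
 * Returns 0 on success and stores the attribute value in *status.
 */
typedef int (*read_purge_status_fn)(void *ctx, unsigned char *status);

/*
 * Models the check at runtime suspend entry: refuse to suspend with
 * -EBUSY while bPurgeStatus is non-zero, otherwise allow the suspend.
 */
int model_runtime_suspend_check(read_purge_status_fn read_status, void *ctx)
{
	unsigned char status = 0;
	int ret;

	assert(read_status != NULL);
	ret = read_status(ctx, &status);
	if (ret)
		return ret;          /* attribute query itself failed */
	return status ? -EBUSY : 0;  /* non-zero means purge activity */
}

/* Hypothetical test reader: ctx points at a byte holding the status. */
int model_read_status_from_byte(void *ctx, unsigned char *status)
{
	*status = *(unsigned char *)ctx;
	return 0;
}
```

With a status byte of 0x01 the check returns -EBUSY, and with 0x00 it returns 0, so the caller would see the same -EBUSY result as in the current SSU failure flow.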


Thanks.
BR
Peter



> localhost ~ # ufs-utils fl -t 6 -e -p /dev/bsg/ufs-bsg0
> localhost ~ # ufs-utils attr -a -p /dev/bsg/ufs-bsg0 | grep bPurgeStatus
> bPurgeStatus               := 0x00
> 
> [   25.801980] ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 2, result 2
> [   25.802002] ufs_device_wlun 0:0:0:49488: Sense Key : Not Ready [current]
> [   25.802009] ufs_device_wlun 0:0:0:49488: Add. Sense: No additional sense information
> [   25.802020] ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_suspend failed: -16
Daniil Lunev Jan. 2, 2023, 10:05 p.m. UTC | #5
On Wed, Dec 21, 2022 at 4:59 PM Peter Wang (王信友)
<peter.wang@mediatek.com> wrote:
> But without this patch, system IO hangs while a purge is ongoing,
> which is no better.
Yes, that is why I am just pointing this out as a matter of fact, not as a bug.
It is arguable whether resetting the controller in the deadlock situation is the
proper thing to do, but it might be the next best thing, so I don't argue
against that either.

> So, with the current design, if the purge initiator does not want to see
> rpm EBUSY, then they should poll bPurgeStatus.
> What do you think?

I am actually not sure whether management operations extend the timeout - they
go through the bsg interface, and I am not sure it properly resets the timeouts
on all possible nexus interfaces; I need to check that.
But even if it does, there are two problems:
* If you make the kernel poll that parameter, it will actually make the
  application level miss the completion code (since after the completion is
  queried once, it returns Not Started afterwards).
* And application polling is race-prone. We set runtime suspend to 100ms, so
  depending on scheduling quirks it may miss the event.

--Daniil
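
The first problem above (a kernel poll consuming the completion code) can be illustrated with a small model. This is a hedged sketch: `model_purge`, `model_query_purge_status` and the `MODEL_PURGE_*` values are hypothetical and chosen only for illustration. The one behaviour taken from the thread is that a completion value is returned once and the attribute reads as Not Started (0x00) afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values; only 0x00 (Not Started) appears in the thread. */
#define MODEL_PURGE_NOT_STARTED 0x00
#define MODEL_PURGE_IN_PROGRESS 0x01  /* assumed value */
#define MODEL_PURGE_COMPLETED   0x03  /* assumed value */

/* Hypothetical model of the device-side purge status attribute. */
struct model_purge {
	unsigned char status;
};

/*
 * Models a read-once completion code: a terminal value is reported to
 * the first reader only, then the attribute falls back to Not Started.
 */
unsigned char model_query_purge_status(struct model_purge *p)
{
	unsigned char seen;

	assert(p != NULL);
	seen = p->status;
	if (seen != MODEL_PURGE_NOT_STARTED && seen != MODEL_PURGE_IN_PROGRESS)
		p->status = MODEL_PURGE_NOT_STARTED;
	return seen;
}
```

If a kernel-side poll calls `model_query_purge_status()` first after completion, it sees the completion value, and the application's next query sees only Not Started. That is the missed completion code described in the first bullet.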

Patch

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index b1f59a5fe632..31ed3fdb5266 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -6070,6 +6070,14 @@  void ufshcd_schedule_eh_work(struct ufs_hba *hba)
 	}
 }
 
+static void ufshcd_force_error_recovery(struct ufs_hba *hba) 
+{
+	spin_lock_irq(hba->host->host_lock);
+	hba->force_reset = true;
+	ufshcd_schedule_eh_work(hba);
+	spin_unlock_irq(hba->host->host_lock);
+}
+
 static void ufshcd_clk_scaling_allow(struct ufs_hba *hba, bool allow)
 {
 	down_write(&hba->clk_scaling_lock);
@@ -9049,6 +9057,15 @@  static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 
 		if (!hba->dev_info.b_rpm_dev_flush_capable) {
 			ret = ufshcd_set_dev_pwr_mode(hba, req_dev_pwr_mode);
+			if (ret && pm_op != UFS_SHUTDOWN_PM) {
+				/*
+				 * If return err in suspend flow, IO will hang.
+				 * Trigger error handler and break suspend for
+				 * error recovery.
+				 */
+				ufshcd_force_error_recovery(hba);
+				ret = -EBUSY;
+			}
 			if (ret)
 				goto enable_scaling;
 		}
@@ -9060,6 +9077,15 @@  static int __ufshcd_wl_suspend(struct ufs_hba *hba, enum ufs_pm_op pm_op)
 	 */
 	check_for_bkops = !ufshcd_is_ufs_dev_deepsleep(hba);
 	ret = ufshcd_link_state_transition(hba, req_link_state, check_for_bkops);
+	if (ret && pm_op != UFS_SHUTDOWN_PM) {
+		/*
+		 * If return err in suspend flow, IO will hang.
+		 * Trigger error handler and break suspend for
+		 * error recovery.
+		 */
+		ufshcd_force_error_recovery(hba);
+		ret = -EBUSY;
+	}
 	if (ret)
 		goto set_dev_active;