Message ID | 20221228060157.24852-1-peter.wang@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] ufs: core: wlun resume SSU(Acitve) fail recovery | expand |
On 12/27/22 22:01, peter.wang@mediatek.com wrote: > When wlun resume SSU(Active) timeout, scsi try eh_host_reset_handler. ^^^^^^^^^^^ Please use the same spelling in the patch subject (Acitve -> Active). timeout -> times out scsi try -> the SCSI core invokes > But ufshcd_eh_host_reset_handler hang at wait flush_work(&hba->eh_work). hang at -> hangs in > And ufshcd_err_handler hang at wait rpm resume. hang at wait rpm resume -> hangs in rpm_resume(). > <ffffffdd78e02b34> schedule+0x110/0x204 > <ffffffdd78e0be60> schedule_timeout+0x98/0x138 > <ffffffdd78e040e8> wait_for_common_io+0x130/0x2d0 > <ffffffdd77d6a000> blk_execute_rq+0x10c/0x16c > <ffffffdd78126d90> __scsi_execute+0xfc/0x278 > <ffffffdd7813891c> ufshcd_set_dev_pwr_mode+0x1c8/0x40c > <ffffffdd78137d1c> __ufshcd_wl_resume+0xf0/0x5cc > <ffffffdd78137ae0> ufshcd_wl_runtime_resume+0x40/0x18c > <ffffffdd78136108> scsi_runtime_resume+0x88/0x104 > <ffffffdd7809a4f8> __rpm_callback+0x1a0/0xaec > <ffffffdd7809b624> rpm_resume+0x7e0/0xcd0 > <ffffffdd7809a788> __rpm_callback+0x430/0xaec > <ffffffdd7809b644> rpm_resume+0x800/0xcd0 > <ffffffdd780a0778> pm_runtime_work+0x148/0x198 > > <ffffffdd78e02b34> schedule+0x110/0x204 > <ffffffdd78e0be10> schedule_timeout+0x48/0x138 > <ffffffdd78e03d9c> wait_for_common+0x144/0x2dc > <ffffffdd7758bba4> __flush_work+0x3d0/0x508 > <ffffffdd7815572c> ufshcd_eh_host_reset_handler+0x134/0x3a8 > <ffffffdd781216f4> scsi_try_host_reset+0x54/0x204 > <ffffffdd78120594> scsi_eh_ready_devs+0xb30/0xd48 > <ffffffdd7812373c> scsi_error_handler+0x260/0x874 > > <ffffffdd78e02b34> schedule+0x110/0x204 > <ffffffdd7809af64> rpm_resume+0x120/0xcd0 > <ffffffdd7809fde8> __pm_runtime_resume+0xa0/0x17c > <ffffffdd7815193c> ufshcd_err_handling_prepare+0x40/0x430 > <ffffffdd7814cce8> ufshcd_err_handler+0x1c4/0xd4c On top of which kernel version has this patch been developed? I think this deadlock has already been fixed by commit 7029e2151a7c ("scsi: ufs: Fix a deadlock between PM and the SCSI error handler"). Please check whether that commit by itself (without this patch) is sufficient to fix the reported deadlock. > --- > drivers/ufs/core/ufshcd.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) The changelog is missing. Please include a changelog when posting v2 or a later version of a patch. > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index e18c9f4463ec..0dfb9a35bf66 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -7366,6 +7366,23 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd) > > hba = shost_priv(cmd->device->host); > > + /* > + * If pm op resume fail and wait err recovery, do link recovery only. > + * Because schedule eh work will get dead lock in ufshcd_rpm_get_sync > + * and wait wlun resume, but wlun resume error wait eh work finish. > + */ The above comment has grammar issues and some parts are incomprehensible. What does e.g. "wait err recovery" mean? Please improve this source code comment. Thanks, Bart.
On Mon, 2023-01-02 at 16:29 -0800, Bart Van Assche wrote: > On 12/27/22 22:01, peter.wang@mediatek.com wrote: > > When wlun resume SSU(Active) timeout, scsi try > > eh_host_reset_handler. > > ^^^^^^^^^^^ > Please use the same spelling in the patch subject (Acitve -> Active). > > timeout -> times out > scsi try -> the SCSI core invokes > > > But ufshcd_eh_host_reset_handler hang at wait flush_work(&hba- > > >eh_work). > > hang at -> hangs in > > > And ufshcd_err_handler hang at wait rpm resume. > > hang at wait rpm resume -> hangs in rpm_resume(). > > > <ffffffdd78e02b34> schedule+0x110/0x204 > > <ffffffdd78e0be60> schedule_timeout+0x98/0x138 > > <ffffffdd78e040e8> wait_for_common_io+0x130/0x2d0 > > <ffffffdd77d6a000> blk_execute_rq+0x10c/0x16c > > <ffffffdd78126d90> __scsi_execute+0xfc/0x278 > > <ffffffdd7813891c> ufshcd_set_dev_pwr_mode+0x1c8/0x40c > > <ffffffdd78137d1c> __ufshcd_wl_resume+0xf0/0x5cc > > <ffffffdd78137ae0> ufshcd_wl_runtime_resume+0x40/0x18c > > <ffffffdd78136108> scsi_runtime_resume+0x88/0x104 > > <ffffffdd7809a4f8> __rpm_callback+0x1a0/0xaec > > <ffffffdd7809b624> rpm_resume+0x7e0/0xcd0 > > <ffffffdd7809a788> __rpm_callback+0x430/0xaec > > <ffffffdd7809b644> rpm_resume+0x800/0xcd0 > > <ffffffdd780a0778> pm_runtime_work+0x148/0x198 > > > > <ffffffdd78e02b34> schedule+0x110/0x204 > > <ffffffdd78e0be10> schedule_timeout+0x48/0x138 > > <ffffffdd78e03d9c> wait_for_common+0x144/0x2dc > > <ffffffdd7758bba4> __flush_work+0x3d0/0x508 > > <ffffffdd7815572c> ufshcd_eh_host_reset_handler+0x134/0x3a8 > > <ffffffdd781216f4> scsi_try_host_reset+0x54/0x204 > > <ffffffdd78120594> scsi_eh_ready_devs+0xb30/0xd48 > > <ffffffdd7812373c> scsi_error_handler+0x260/0x874 > > > > <ffffffdd78e02b34> schedule+0x110/0x204 > > <ffffffdd7809af64> rpm_resume+0x120/0xcd0 > > <ffffffdd7809fde8> __pm_runtime_resume+0xa0/0x17c > > <ffffffdd7815193c> ufshcd_err_handling_prepare+0x40/0x430 > > <ffffffdd7814cce8> ufshcd_err_handler+0x1c4/0xd4c > > On top of which kernel version has this patch been developed? > I think this deadlock has already been fixed by commit 7029e2151a7c > ("scsi: ufs: Fix a deadlock between PM and the SCSI error handler"). > Please check whether that commit by itself (without this patch) is > sufficient to fix the reported deadlock. > Hi Bart, 7029e2151a7c is not fix this dead lock, we still found this issue happen in kernel-6.1 I will update this patch and correct comment. Please have a look again. Thanks. Peter > > --- > > drivers/ufs/core/ufshcd.c | 17 +++++++++++++++++ > > 1 file changed, 17 insertions(+) > > The changelog is missing. Please include a changelog when posting v2 > or > a later version of a patch. > > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > > index e18c9f4463ec..0dfb9a35bf66 100644 > > --- a/drivers/ufs/core/ufshcd.c > > +++ b/drivers/ufs/core/ufshcd.c > > @@ -7366,6 +7366,23 @@ static int > > ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd) > > > > hba = shost_priv(cmd->device->host); > > > > + /* > > + * If pm op resume fail and wait err recovery, do link recovery > > only. > > + * Because schedule eh work will get dead lock in > > ufshcd_rpm_get_sync > > + * and wait wlun resume, but wlun resume error wait eh work > > finish. > > + */ > > The above comment has grammar issues and some parts are > incomprehensible. What does e.g. "wait err recovery" mean? Please > improve this source code comment. > > Thanks, > > Bart.
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index e18c9f4463ec..0dfb9a35bf66 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -7366,6 +7366,23 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd) hba = shost_priv(cmd->device->host); + /* + * If pm op resume fail and wait err recovery, do link recovery only. + * Because schedule eh work will get dead lock in ufshcd_rpm_get_sync + * and wait wlun resume, but wlun resume error wait eh work finish. + */ + if (hba->pm_op_in_progress && + hba->curr_dev_pwr_mode != UFS_ACTIVE_PWR_MODE) { + err = ufshcd_link_recovery(hba); + if (err) { + dev_err(hba->dev, "%s: power state %d pm_progress %d", + __func__, + hba->curr_dev_pwr_mode, + hba->pm_op_in_progress); + } + return err; + } + spin_lock_irqsave(hba->host->host_lock, flags); hba->force_reset = true; ufshcd_schedule_eh_work(hba);