Message ID | 20240712094355.21572-1-peter.wang@mediatek.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v1] ufs: core: fix deadlock when rtc update | expand |
On Fri, 2024-07-12 at 17:43 +0800, peter.wang@mediatek.com wrote: > From: Peter Wang <peter.wang@mediatek.com> > > Three have deadlock when runtime suspend wait flush rtc work, > and rtc work call ufshcd_rpm_put_sync to wait runtime resume. Reviewed-by: Bean Huo <beanhuo@micron.com>
> @@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct work_struct > *work) > > hba = container_of(to_delayed_work(work), struct ufs_hba, > ufs_rtc_update_work); Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works? And remove it in the 2nd if clause? Thanks, Avri > > - /* Update RTC only when there are no requests in progress and UFSHCI is > operational */ > - if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == > UFSHCD_STATE_OPERATIONAL) > + /* > + * Update RTC only when > + * 1. there are no requests in progress > + * 2. UFSHCI is operational > + * 3. pm operation is not in progress > + */ > + if (!ufshcd_is_ufs_dev_busy(hba) && > + hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL && > + !hba->pm_op_in_progress) > ufshcd_update_rtc(hba); > > if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period) > -- > 2.18.0
On Fri, 2024-07-12 at 10:33 +0000, Avri Altman wrote: > > @@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct > > work_struct > > *work) > > > > hba = container_of(to_delayed_work(work), struct ufs_hba, > > ufs_rtc_update_work); > Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works? > And remove it in the 2nd if clause? Avri, we need to reschedule next time work in the below code. if return, cannot. whatelse I missed? kind regards, Bean
> > On Fri, 2024-07-12 at 10:33 +0000, Avri Altman wrote: > > > @@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct > > > work_struct > > > *work) > > > > > > hba = container_of(to_delayed_work(work), struct ufs_hba, > > > ufs_rtc_update_work); > > Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works? > > And remove it in the 2nd if clause? > > Avri, > > we need to reschedule next time work in the below code. if return, cannot. > > whatelse I missed? a) If (!ufshcd_is_ufs_dev_active(hba)) - will not schedule ? b) schedule on next __ufshcd_wl_resume? > > kind regards, > Bean
On 7/12/24 2:43 AM, peter.wang@mediatek.com wrote: > Three have deadlock when runtime suspend wait flush rtc work, > and rtc work call ufshcd_rpm_put_sync to wait runtime resume. "Three have"? The above description is very hard to understand. Please improve it. > - /* Update RTC only when there are no requests in progress and UFSHCI is operational */ > - if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) > + /* > + * Update RTC only when > + * 1. there are no requests in progress > + * 2. UFSHCI is operational > + * 3. pm operation is not in progress > + */ > + if (!ufshcd_is_ufs_dev_busy(hba) && > + hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL && > + !hba->pm_op_in_progress) > ufshcd_update_rtc(hba); > > if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period) The above seems racy to me. I don't think there is any mechanism that prevents that hba->pm_op_in_progress is set after it has been checked and before ufshcd_update_rtc() is called. Has it been considered to add an ufshcd_rpm_get_sync_nowait() call before the hba->pm_op_in_progress check and a ufshcd_rpm_put_sync() call after the ufshcd_update_rtc() call? Thanks, Bart.
On Fri, 2024-07-12 at 12:31 +0000, Avri Altman wrote: > > > > hba = container_of(to_delayed_work(work), struct > > > > ufs_hba, > > > > ufs_rtc_update_work); > > > Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works? > > > And remove it in the 2nd if clause? > > > > Avri, > > > > we need to reschedule next time work in the below code. if return, > > cannot. > > > > whatelse I missed? > a) If (!ufshcd_is_ufs_dev_active(hba)) - will not schedule ? > b) schedule on next __ufshcd_wl_resume? hba->pm_op_in_progress is true during __ufshcd_wl_resume(), will not schedule update work.
On Fri, 2024-07-12 at 10:34 -0700, Bart Van Assche wrote: > > - /* Update RTC only when there are no requests in progress > > and UFSHCI is operational */ > > - if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == > > UFSHCD_STATE_OPERATIONAL) > > + /* > > + * Update RTC only when > > + * 1. there are no requests in progress > > + * 2. UFSHCI is operational > > + * 3. pm operation is not in progress > > + */ > > + if (!ufshcd_is_ufs_dev_busy(hba) && > > + hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL && > > + !hba->pm_op_in_progress) > > ufshcd_update_rtc(hba); > > > > if (ufshcd_is_ufs_dev_active(hba) && hba- > > >dev_info.rtc_update_period) > > The above seems racy to me. I don't think there is any mechanism that > prevents that hba->pm_op_in_progress is set after it has been checked > and before ufshcd_update_rtc() is called. Has it been considered to > add > an ufshcd_rpm_get_sync_nowait() call before the hba- > >pm_op_in_progress > check and a ufshcd_rpm_put_sync() call after the ufshcd_update_rtc() > call? > > Thanks, > > Bart. Bart, do you want this: diff --git a/drivers/ufs/core/ufshcd-priv.h b/drivers/ufs/core/ufshcd- priv.h index ce36154ce963..2b74d6329b9d 100644 --- a/drivers/ufs/core/ufshcd-priv.h +++ b/drivers/ufs/core/ufshcd-priv.h @@ -311,6 +311,25 @@ static inline int ufshcd_update_ee_usr_mask(struct ufs_hba *hba, &hba->ee_drv_mask, set, clr); } +static inline int ufshcd_rpm_get_sync_nowait(struct ufs_hba *hba) +{ + int ret = 0; + struct device *dev = &hba->ufs_device_wlun->sdev_gendev; + + pm_runtime_get_noresume(dev); + + /* Check if the device is already active */ + if (pm_runtime_active(dev)) + return 0; + + /* Attempt to resume the device without blocking */ + ret = pm_request_resume(dev); + if (ret < 0) + pm_runtime_put_noidle(dev); + + return ret; +} + static inline int ufshcd_rpm_get_sync(struct ufs_hba *hba) { return pm_runtime_get_sync(&hba->ufs_device_wlun->sdev_gendev); diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index bea00e069e9a..1b7fc4ce9e5c 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -8209,10 +8209,8 @@ static void ufshcd_update_rtc(struct ufs_hba *hba) */ val = ts64.tv_sec - hba->dev_info.rtc_time_baseline; - ufshcd_rpm_get_sync(hba); err = ufshcd_query_attr(hba, UPIU_QUERY_OPCODE_WRITE_ATTR, QUERY_ATTR_IDN_SECONDS_PASSED, 0, 0, &val); - ufshcd_rpm_put_sync(hba); if (err) dev_err(hba->dev, "%s: Failed to update rtc %d\n", __func__, err); @@ -8226,10 +8224,14 @@ static void ufshcd_rtc_work(struct work_struct *work) hba = container_of(to_delayed_work(work), struct ufs_hba, ufs_rtc_update_work); + if (ufshcd_rpm_get_sync_nowait(hba)) + goto out; + /* Update RTC only when there are no requests in progress and UFSHCI is operational */ if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) ufshcd_update_rtc(hba); - + ufshcd_rpm_put_sync(hba); +out: if (ufshcd_is_ufs_dev_active(hba) && hba- >dev_info.rtc_update_period) schedule_delayed_work(&hba->ufs_rtc_update_work, msecs_to_jiffies(hba- >dev_info.rtc_update_period)); (END) or can we change cancel_delayed_work_sync(&hba->ufs_rtc_update_work); to cancel_delayed_work(&hba->ufs_rtc_update_work); ?? kind regards, Bean
On Fri, 2024-07-12 at 12:31 +0000, Avri Altman wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > > > > On Fri, 2024-07-12 at 10:33 +0000, Avri Altman wrote: > > > > @@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct > > > > work_struct > > > > *work) > > > > > > > > hba = container_of(to_delayed_work(work), struct > ufs_hba, > > > > ufs_rtc_update_work); > > > Will returning here If (!ufshcd_is_ufs_dev_active(hba)) works? > > > And remove it in the 2nd if clause? > > > > Avri, > > > > we need to reschedule next time work in the below code. if return, > cannot. > > > > whatelse I missed? > a) If (!ufshcd_is_ufs_dev_active(hba)) - will not schedule ? > b) schedule on next __ufshcd_wl_resume? > Hi Avri, Yes, if dev is not active (RPM state is not RPM_ACTIVE), will not schedule rtc work and schedule on next __ufshcd_wl_resume. Thanks. Peter > > > > kind regards, > > Bean >
On Fri, 2024-07-12 at 10:34 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 7/12/24 2:43 AM, peter.wang@mediatek.com wrote: > > Three have deadlock when runtime suspend wait flush rtc work, > > and rtc work call ufshcd_rpm_put_sync to wait runtime resume. > > "Three have"? The above description is very hard to understand. > Please > improve it. Hi Bart, Sorry, will improve the description next version. > > > - /* Update RTC only when there are no requests in progress and > UFSHCI is operational */ > > -if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == > UFSHCD_STATE_OPERATIONAL) > > + /* > > + * Update RTC only when > > + * 1. there are no requests in progress > > + * 2. UFSHCI is operational > > + * 3. pm operation is not in progress > > + */ > > +if (!ufshcd_is_ufs_dev_busy(hba) && > > + hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL && > > + !hba->pm_op_in_progress) > > ufshcd_update_rtc(hba); > > > > if (ufshcd_is_ufs_dev_active(hba) && hba- > >dev_info.rtc_update_period) > > The above seems racy to me. I don't think there is any mechanism that > prevents that hba->pm_op_in_progress is set after it has been checked > and before ufshcd_update_rtc() is called. Has it been considered to > add > an ufshcd_rpm_get_sync_nowait() call before the hba- > >pm_op_in_progress > check and a ufshcd_rpm_put_sync() call after the ufshcd_update_rtc() > call? > > Thanks, > > Bart. Yes, check pm_op_in_progress still cannot guarantee the absence of race conditions. But use ufshcd_rpm_get_sync_nowait might be a bit complicated. How about use ufshcd_rpm_get_if_active? I will update next version. Thanks. Peter
On Mon, 2024-07-15 at 00:37 +0200, Bean Huo wrote: > > or can we change cancel_delayed_work_sync(&hba->ufs_rtc_update_work); > to cancel_delayed_work(&hba->ufs_rtc_update_work); ?? > > > > kind regards, > Bean Hi Bean, Using cancel_delayed_work instead of cancel_delayed_work_sync could work, but it may lead to wasted time resume, RTC update, and suspend again. It has increased the system's power consumption. Thanks. Peter
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 46433ecf0c4d..bfcf2d468b5e 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -8188,8 +8188,15 @@ static void ufshcd_rtc_work(struct work_struct *work) hba = container_of(to_delayed_work(work), struct ufs_hba, ufs_rtc_update_work); - /* Update RTC only when there are no requests in progress and UFSHCI is operational */ - if (!ufshcd_is_ufs_dev_busy(hba) && hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL) + /* + * Update RTC only when + * 1. there are no requests in progress + * 2. UFSHCI is operational + * 3. pm operation is not in progress + */ + if (!ufshcd_is_ufs_dev_busy(hba) && + hba->ufshcd_state == UFSHCD_STATE_OPERATIONAL && + !hba->pm_op_in_progress) ufshcd_update_rtc(hba); if (ufshcd_is_ufs_dev_active(hba) && hba->dev_info.rtc_update_period)