Message ID | 20220721065833.26887-1-peter.wang@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] scsi: ufs: correct ufshcd_shutdown flow | expand |
On 7/20/22 23:58, peter.wang@mediatek.com wrote: > Both ufshcd_shtdown and ufshcd_wl_shutdown could run concurrently. > And it have race condition when ufshcd_wl_shutdown on-going and > clock/power turn off by ufshcd_shutdown. > > The normal case: > ufshcd_wl_shutdown -> ufshcd_shtdown > ufshcd_shtdown should turn off clock/power. > > The abnormal case: > ufshcd_shtdown -> ufshcd_wl_shutdown How can this happen since device_shutdown() iterates over devices in the opposite order in which these have been created? Thanks, Bart.
On Sat, Jul 23, 2022 at 5:15 AM Bart Van Assche <bvanassche@acm.org> wrote: > > On 7/20/22 23:58, peter.wang@mediatek.com wrote: > > Both ufshcd_shtdown and ufshcd_wl_shutdown could run concurrently. > > And it have race condition when ufshcd_wl_shutdown on-going and > > clock/power turn off by ufshcd_shutdown. > > > > The normal case: > > ufshcd_wl_shutdown -> ufshcd_shtdown > > ufshcd_shtdown should turn off clock/power. > > > > The abnormal case: > > ufshcd_shtdown -> ufshcd_wl_shutdown > > How can this happen since device_shutdown() iterates over devices in the > opposite order in which these have been created? > Is it possible for more than one initiator to invoke kernel_power_off() at the same time? In this case, the order of device shutdown is not promised anymore because devices_kset is manipulated simultaneously by multiple initiators in device_shutdown(). > Thanks, > > Bart.
On 7/24/22 20:47, Peter Wang wrote: > Because kernel_restart is export, and mediatek may call kernel_restart ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Is this code upstream? > while shutdown is on going. > EXPORT_SYMBOL_GPL(kernel_restart); > kernel_restart -> kernel_restart_prepare -> device_shutdown > > So, there may have two thread execute device_shutdown concurrently. > And the list_lock in device_shutdown function is not protect shutdown > callback function, > caused two callback function(ufshcd_shutdown/ufshcd_wl_shutdown) could > run concurrently. > > Here is the error log that two thread in device_shutdown. > [37684.002227] [T1500641] platform +platform:112b0000.ufshci > kpoc_charger: ufshcd-mtk 112b0000.ufshci: [name:core&]shutdown > [37684.002264] [T1600339] scsi +scsi:0:0:0:49488 charger_thread: > ufs_device_wlun 0:0:0:49488: [name:core&]shutdown Hi Peter, I had not yet taken a look at the kernel_restart() function. Now that I have taken a look, it seems to me that kernel_restart() calls must be serialized via the system_transition_mutex. Please make sure that the kernel_restart() calls are serialized. Thanks, Bart.
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index c7b337480e3e..47b639fd28b9 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -58,6 +58,9 @@ /* Task management command timeout */ #define TM_CMD_TIMEOUT 100 /* msecs */ +/* Shutdown wait devcie into power off timeout */ +#define UFS_SHUTDOWN_TIMEOUT 500 /* msecs */ + /* maximum number of retries for a general UIC command */ #define UFS_UIC_COMMAND_RETRIES 3 @@ -9461,10 +9464,15 @@ EXPORT_SYMBOL(ufshcd_runtime_resume); */ int ufshcd_shutdown(struct ufs_hba *hba) { - if (ufshcd_is_ufs_dev_poweroff(hba) && ufshcd_is_link_off(hba)) - goto out; + unsigned long timeout; - pm_runtime_get_sync(hba->dev); + /* Wait ufshcd_wl_shutdown clear ufs state */ + timeout = jiffies + msecs_to_jiffies(UFS_SHUTDOWN_TIMEOUT); + while (!ufshcd_is_ufs_dev_poweroff(hba) || !ufshcd_is_link_off(hba)) { + if (time_after(jiffies, timeout)) + goto out; + msleep(1); + } ufshcd_suspend(hba); out: