Message ID | 20231127162022.518834-2-kvalo@kernel.org (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Kalle Valo |
Headers | show |
Series | wifi: ath11k: hibernation support | expand |
On Mon, Nov 27, 2023 at 06:20:15PM +0200, Kalle Valo wrote: > From: Baochen Qiang <quic_bqiang@quicinc.com> > > If ath11k tries to call mhi_power_up() during resume it fails: > > ath11k_pci 0000:06:00.0: timeout while waiting for restart complete > > This happens because when calling mhi_power_up() the MHI subsystem eventually > calls device_add() from mhi_create_devices() but the device creation is > deferred: > > mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral > > The reason for deferring device creation is explained in dpm_prepare(): > > /* > * It is unsafe if probing of devices will happen during suspend or > * hibernation and system behavior will be unpredictable in this case. > * So, let's prohibit device's probing here and defer their probes > * instead. The normal behavior will be restored in dpm_complete(). > */ > > Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not called and > qcom_mhi_qrtr_dl_callback() fails silently as qdev is zero: > > static void qcom_mhi_qrtr_dl_callback(struct mhi_device *mhi_dev, > struct mhi_result *mhi_res) > { > struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev); > int rc; > > if (!qdev || mhi_res->transaction_status) > return; > > So what this means that QRTR is not delivering messages and the QMI connection > is not working between ath11k and the firmware, resulting a failure in firmware > initialisation. > > To fix this add new function mhi_power_down_no_destroy() which does not destroy > the devices during power down. This way mhi_power_up() can be called during > resume and we can get ath11k hibernation working with the following patches. > > Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 > > Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> > Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Any reason for reposting this series without discussing the suggestion from Mayank? As I said in the internal thread, this patch breaks the Linux device driver model by not destroying the "struct device" when the actual device gets removed. We should try to explore alternate options instead of persisting with this solution. - Mani > --- > drivers/bus/mhi/host/init.c | 1 + > drivers/bus/mhi/host/internal.h | 1 + > drivers/bus/mhi/host/pm.c | 26 +++++++++++++++++++------- > include/linux/mhi.h | 29 +++++++++++++++++++++++++++-- > 4 files changed, 48 insertions(+), 9 deletions(-) > > diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c > index 65ceac1837f9..e626b03ffafa 100644 > --- a/drivers/bus/mhi/host/init.c > +++ b/drivers/bus/mhi/host/init.c > @@ -43,6 +43,7 @@ const char * const dev_state_tran_str[DEV_ST_TRANSITION_MAX] = { > [DEV_ST_TRANSITION_FP] = "FLASH PROGRAMMER", > [DEV_ST_TRANSITION_SYS_ERR] = "SYS ERROR", > [DEV_ST_TRANSITION_DISABLE] = "DISABLE", > + [DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE] = "DISABLE (DESTROY DEVICE)", > }; > > const char * const mhi_ch_state_type_str[MHI_CH_STATE_TYPE_MAX] = { > diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h > index 30ac415a3000..3f45c9c447bd 100644 > --- a/drivers/bus/mhi/host/internal.h > +++ b/drivers/bus/mhi/host/internal.h > @@ -69,6 +69,7 @@ enum dev_st_transition { > DEV_ST_TRANSITION_FP, > DEV_ST_TRANSITION_SYS_ERR, > DEV_ST_TRANSITION_DISABLE, > + DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE, > DEV_ST_TRANSITION_MAX, > }; > > diff --git a/drivers/bus/mhi/host/pm.c b/drivers/bus/mhi/host/pm.c > index a2f2feef1476..8833b0248393 100644 > --- a/drivers/bus/mhi/host/pm.c > +++ b/drivers/bus/mhi/host/pm.c > @@ -458,7 +458,8 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl) > } > > /* Handle shutdown transitions */ > -static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl) > +static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl, > + bool destroy_device) > { > enum mhi_pm_state cur_state; > struct mhi_event *mhi_event; > @@ -520,8 +521,10 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl) > dev_dbg(dev, "Waiting for all pending threads to complete\n"); > wake_up_all(&mhi_cntrl->state_event); > > - dev_dbg(dev, "Reset all active channels and remove MHI devices\n"); > - device_for_each_child(&mhi_cntrl->mhi_dev->dev, NULL, mhi_destroy_device); > + if (destroy_device) { > + dev_dbg(dev, "Reset all active channels and remove MHI devices\n"); > + device_for_each_child(&mhi_cntrl->mhi_dev->dev, NULL, mhi_destroy_device); > + } > > mutex_lock(&mhi_cntrl->pm_mutex); > > @@ -806,7 +809,10 @@ void mhi_pm_st_worker(struct work_struct *work) > mhi_pm_sys_error_transition(mhi_cntrl); > break; > case DEV_ST_TRANSITION_DISABLE: > - mhi_pm_disable_transition(mhi_cntrl); > + mhi_pm_disable_transition(mhi_cntrl, false); > + break; > + case DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE: > + mhi_pm_disable_transition(mhi_cntrl, true); > break; > default: > break; > @@ -1160,7 +1166,8 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) > } > EXPORT_SYMBOL_GPL(mhi_async_power_up); > > -void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) > +void __mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful, > + bool destroy_device) > { > enum mhi_pm_state cur_state, transition_state; > struct device *dev = &mhi_cntrl->mhi_dev->dev; > @@ -1196,14 +1203,19 @@ void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) > write_unlock_irq(&mhi_cntrl->pm_lock); > mutex_unlock(&mhi_cntrl->pm_mutex); > > - mhi_queue_state_transition(mhi_cntrl, DEV_ST_TRANSITION_DISABLE); > + if (destroy_device) > + mhi_queue_state_transition(mhi_cntrl, > + DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE); > + else > + mhi_queue_state_transition(mhi_cntrl, > + DEV_ST_TRANSITION_DISABLE); > > /* Wait for shutdown to complete */ > flush_work(&mhi_cntrl->st_worker); > > disable_irq(mhi_cntrl->irq[0]); > } > -EXPORT_SYMBOL_GPL(mhi_power_down); > +EXPORT_SYMBOL_GPL(__mhi_power_down); > > int mhi_sync_power_up(struct mhi_controller *mhi_cntrl) > { > diff --git a/include/linux/mhi.h b/include/linux/mhi.h > index d0f9b522f328..ae092bc8b97e 100644 > --- a/include/linux/mhi.h > +++ b/include/linux/mhi.h > @@ -648,12 +648,37 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl); > */ > int mhi_sync_power_up(struct mhi_controller *mhi_cntrl); > > +void __mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful, > + bool destroy_device); > + > /** > - * mhi_power_down - Start MHI power down sequence > + * mhi_power_down - Start MHI power down sequence. See also > + * mhi_power_down_no_destroy() which is a variant of this for suspend. > + * > * @mhi_cntrl: MHI controller > * @graceful: Link is still accessible, so do a graceful shutdown process > */ > -void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful); > +static inline void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) > +{ > + __mhi_power_down(mhi_cntrl, graceful, true); > +} > + > +/** > + * mhi_power_down_no_destroy - Start MHI power down sequence but don't > + * destroy struct devices. This is a variant for mhi_power_down() and is a > + * workaround to make it possible to use mhi_power_up() in a resume > + * handler. When using this variant the caller must also call > + * mhi_prepare_all_for_transfer_autoqueue() and > + * mhi_unprepare_all_from_transfer(). > + * > + * @mhi_cntrl: MHI controller > + * @graceful: Link is still accessible, so do a graceful shutdown process > + */ > +static inline void mhi_power_down_no_destroy(struct mhi_controller *mhi_cntrl, > + bool graceful) > +{ > + __mhi_power_down(mhi_cntrl, graceful, false); > +} > > /** > * mhi_unprepare_after_power_down - Free any allocated memory after power down > -- > 2.39.2 > >
On 11/30/2023 1:42 PM, Manivannan Sadhasivam wrote: > On Mon, Nov 27, 2023 at 06:20:15PM +0200, Kalle Valo wrote: >> From: Baochen Qiang <quic_bqiang@quicinc.com> >> >> If ath11k tries to call mhi_power_up() during resume it fails: >> >> ath11k_pci 0000:06:00.0: timeout while waiting for restart complete >> >> This happens because when calling mhi_power_up() the MHI subsystem eventually >> calls device_add() from mhi_create_devices() but the device creation is >> deferred: >> >> mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral >> >> The reason for deferring device creation is explained in dpm_prepare(): >> >> /* >> * It is unsafe if probing of devices will happen during suspend or >> * hibernation and system behavior will be unpredictable in this case. >> * So, let's prohibit device's probing here and defer their probes >> * instead. The normal behavior will be restored in dpm_complete(). >> */ >> >> Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not called and >> qcom_mhi_qrtr_dl_callback() fails silently as qdev is zero: >> >> static void qcom_mhi_qrtr_dl_callback(struct mhi_device *mhi_dev, >> struct mhi_result *mhi_res) >> { >> struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev); >> int rc; >> >> if (!qdev || mhi_res->transaction_status) >> return; >> >> So what this means that QRTR is not delivering messages and the QMI connection >> is not working between ath11k and the firmware, resulting a failure in firmware >> initialisation. >> >> To fix this add new function mhi_power_down_no_destroy() which does not destroy >> the devices during power down. This way mhi_power_up() can be called during >> resume and we can get ath11k hibernation working with the following patches. >> >> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 >> >> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> >> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> > > Any reason for reposting this series without discussing the suggestion from > Mayank? As I said in the internal thread, this patch breaks the Linux device > driver model by not destroying the "struct device" when the actual device gets > removed. We should try to explore alternate options instead of persisting with > this solution. > > - Mani I shared my opinion on Mayank's proposal, but not noticed it is in internal thead. I will repost here: Mayank's proposal assumes that at the time PM_POST_HIBERNATION is posted, all deferred probe is unblocked and completed, however which is not true. Yes, kernel's sequence is first unblock probe defer and trigger probe again, then post PM_POST_HIBERNATION message. But the problem here is that the probe-trigger is delegated to an work item and returns immediately, kernel does not wait for all probe to complete before posting PM_POST_HIBERNATION. So it is not guaranteed that an MHI device is probed at the time we get that message. > >> --- >> drivers/bus/mhi/host/init.c | 1 + >> drivers/bus/mhi/host/internal.h | 1 + >> drivers/bus/mhi/host/pm.c | 26 +++++++++++++++++++------- >> include/linux/mhi.h | 29 +++++++++++++++++++++++++++-- >> 4 files changed, 48 insertions(+), 9 deletions(-) >> >> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c >> index 65ceac1837f9..e626b03ffafa 100644 >> --- a/drivers/bus/mhi/host/init.c >> +++ b/drivers/bus/mhi/host/init.c >> @@ -43,6 +43,7 @@ const char * const dev_state_tran_str[DEV_ST_TRANSITION_MAX] = { >> [DEV_ST_TRANSITION_FP] = "FLASH PROGRAMMER", >> [DEV_ST_TRANSITION_SYS_ERR] = "SYS ERROR", >> [DEV_ST_TRANSITION_DISABLE] = "DISABLE", >> + [DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE] = "DISABLE (DESTROY DEVICE)", >> }; >> >> const char * const mhi_ch_state_type_str[MHI_CH_STATE_TYPE_MAX] = { >> diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h >> index 30ac415a3000..3f45c9c447bd 100644 >> --- a/drivers/bus/mhi/host/internal.h >> +++ b/drivers/bus/mhi/host/internal.h >> @@ -69,6 +69,7 @@ enum dev_st_transition { >> DEV_ST_TRANSITION_FP, >> DEV_ST_TRANSITION_SYS_ERR, >> DEV_ST_TRANSITION_DISABLE, >> + DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE, >> DEV_ST_TRANSITION_MAX, >> }; >> >> diff --git a/drivers/bus/mhi/host/pm.c b/drivers/bus/mhi/host/pm.c >> index a2f2feef1476..8833b0248393 100644 >> --- a/drivers/bus/mhi/host/pm.c >> +++ b/drivers/bus/mhi/host/pm.c >> @@ -458,7 +458,8 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl) >> } >> >> /* Handle shutdown transitions */ >> -static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl) >> +static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl, >> + bool destroy_device) >> { >> enum mhi_pm_state cur_state; >> struct mhi_event *mhi_event; >> @@ -520,8 +521,10 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl) >> dev_dbg(dev, "Waiting for all pending threads to complete\n"); >> wake_up_all(&mhi_cntrl->state_event); >> >> - dev_dbg(dev, "Reset all active channels and remove MHI devices\n"); >> - device_for_each_child(&mhi_cntrl->mhi_dev->dev, NULL, mhi_destroy_device); >> + if (destroy_device) { >> + dev_dbg(dev, "Reset all active channels and remove MHI devices\n"); >> + device_for_each_child(&mhi_cntrl->mhi_dev->dev, NULL, mhi_destroy_device); >> + } >> >> mutex_lock(&mhi_cntrl->pm_mutex); >> >> @@ -806,7 +809,10 @@ void mhi_pm_st_worker(struct work_struct *work) >> mhi_pm_sys_error_transition(mhi_cntrl); >> break; >> case DEV_ST_TRANSITION_DISABLE: >> - mhi_pm_disable_transition(mhi_cntrl); >> + mhi_pm_disable_transition(mhi_cntrl, false); >> + break; >> + case DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE: >> + mhi_pm_disable_transition(mhi_cntrl, true); >> break; >> default: >> break; >> @@ -1160,7 +1166,8 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) >> } >> EXPORT_SYMBOL_GPL(mhi_async_power_up); >> >> -void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) >> +void __mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful, >> + bool destroy_device) >> { >> enum mhi_pm_state cur_state, transition_state; >> struct device *dev = &mhi_cntrl->mhi_dev->dev; >> @@ -1196,14 +1203,19 @@ void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) >> write_unlock_irq(&mhi_cntrl->pm_lock); >> mutex_unlock(&mhi_cntrl->pm_mutex); >> >> - mhi_queue_state_transition(mhi_cntrl, DEV_ST_TRANSITION_DISABLE); >> + if (destroy_device) >> + mhi_queue_state_transition(mhi_cntrl, >> + DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE); >> + else >> + mhi_queue_state_transition(mhi_cntrl, >> + DEV_ST_TRANSITION_DISABLE); >> >> /* Wait for shutdown to complete */ >> flush_work(&mhi_cntrl->st_worker); >> >> disable_irq(mhi_cntrl->irq[0]); >> } >> -EXPORT_SYMBOL_GPL(mhi_power_down); >> +EXPORT_SYMBOL_GPL(__mhi_power_down); >> >> int mhi_sync_power_up(struct mhi_controller *mhi_cntrl) >> { >> diff --git a/include/linux/mhi.h b/include/linux/mhi.h >> index d0f9b522f328..ae092bc8b97e 100644 >> --- a/include/linux/mhi.h >> +++ b/include/linux/mhi.h >> @@ -648,12 +648,37 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl); >> */ >> int mhi_sync_power_up(struct mhi_controller *mhi_cntrl); >> >> +void __mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful, >> + bool destroy_device); >> + >> /** >> - * mhi_power_down - Start MHI power down sequence >> + * mhi_power_down - Start MHI power down sequence. See also >> + * mhi_power_down_no_destroy() which is a variant of this for suspend. >> + * >> * @mhi_cntrl: MHI controller >> * @graceful: Link is still accessible, so do a graceful shutdown process >> */ >> -void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful); >> +static inline void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) >> +{ >> + __mhi_power_down(mhi_cntrl, graceful, true); >> +} >> + >> +/** >> + * mhi_power_down_no_destroy - Start MHI power down sequence but don't >> + * destroy struct devices. This is a variant for mhi_power_down() and is a >> + * workaround to make it possible to use mhi_power_up() in a resume >> + * handler. When using this variant the caller must also call >> + * mhi_prepare_all_for_transfer_autoqueue() and >> + * mhi_unprepare_all_from_transfer(). >> + * >> + * @mhi_cntrl: MHI controller >> + * @graceful: Link is still accessible, so do a graceful shutdown process >> + */ >> +static inline void mhi_power_down_no_destroy(struct mhi_controller *mhi_cntrl, >> + bool graceful) >> +{ >> + __mhi_power_down(mhi_cntrl, graceful, false); >> +} >> >> /** >> * mhi_unprepare_after_power_down - Free any allocated memory after power down >> -- >> 2.39.2 >> >> >
Manivannan Sadhasivam <mani@kernel.org> writes: > On Mon, Nov 27, 2023 at 06:20:15PM +0200, Kalle Valo wrote: > >> From: Baochen Qiang <quic_bqiang@quicinc.com> >> >> If ath11k tries to call mhi_power_up() during resume it fails: >> >> ath11k_pci 0000:06:00.0: timeout while waiting for restart complete >> >> This happens because when calling mhi_power_up() the MHI subsystem eventually >> calls device_add() from mhi_create_devices() but the device creation is >> deferred: >> >> mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral >> >> The reason for deferring device creation is explained in dpm_prepare(): >> >> /* >> * It is unsafe if probing of devices will happen during suspend or >> * hibernation and system behavior will be unpredictable in this case. >> * So, let's prohibit device's probing here and defer their probes >> * instead. The normal behavior will be restored in dpm_complete(). >> */ >> >> Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not called and >> qcom_mhi_qrtr_dl_callback() fails silently as qdev is zero: >> >> static void qcom_mhi_qrtr_dl_callback(struct mhi_device *mhi_dev, >> struct mhi_result *mhi_res) >> { >> struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev); >> int rc; >> >> if (!qdev || mhi_res->transaction_status) >> return; >> >> So what this means that QRTR is not delivering messages and the QMI connection >> is not working between ath11k and the firmware, resulting a failure in firmware >> initialisation. >> >> To fix this add new function mhi_power_down_no_destroy() which does not destroy >> the devices during power down. This way mhi_power_up() can be called during >> resume and we can get ath11k hibernation working with the following patches. >> >> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 >> >> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> >> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> > > Any reason for reposting this series without discussing the suggestion from > Mayank? Baochen quickly sent me fixes for the v1 review comments, as I have been out of office for some time I didn't want to sit on Baochen's fixes for too long. Better to get them out of the door as soon as possible. I will definitely look at Mayank's proposal but that will take longer. > As I said in the internal thread, this patch breaks the Linux device > driver model by not destroying the "struct device" when the actual > device gets removed. This patchset has been tested by several people, I'm even using this patchset on main laptop every day, and we haven't noticed any issues. Can you elaborate more about this driver model? We are not removing any ath11k devices, we just want to power down the ath11k (and in the future ath12k) devices for suspend and power up during resume. > We should try to explore alternate options instead of persisting with > this solution. What other options we have here? At least Baochen is not optimistic that using PM_POST_HIBERNATION as a workaround would work. The issue we have here is that mhi_power_up() doesn't work in the resume handler and that's what we should try to fix, not make workarounds.
On 12/5/2023 4:29 AM, Kalle Valo wrote: > Manivannan Sadhasivam <mani@kernel.org> writes: > >> On Mon, Nov 27, 2023 at 06:20:15PM +0200, Kalle Valo wrote: >> >>> From: Baochen Qiang <quic_bqiang@quicinc.com> >>> >>> If ath11k tries to call mhi_power_up() during resume it fails: >>> >>> ath11k_pci 0000:06:00.0: timeout while waiting for restart complete >>> >>> This happens because when calling mhi_power_up() the MHI subsystem eventually >>> calls device_add() from mhi_create_devices() but the device creation is >>> deferred: >>> >>> mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral >>> >>> The reason for deferring device creation is explained in dpm_prepare(): >>> >>> /* >>> * It is unsafe if probing of devices will happen during suspend or >>> * hibernation and system behavior will be unpredictable in this case. >>> * So, let's prohibit device's probing here and defer their probes >>> * instead. The normal behavior will be restored in dpm_complete(). >>> */ >>> >>> Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not called and >>> qcom_mhi_qrtr_dl_callback() fails silently as qdev is zero: >>> >>> static void qcom_mhi_qrtr_dl_callback(struct mhi_device *mhi_dev, >>> struct mhi_result *mhi_res) >>> { >>> struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev); >>> int rc; >>> >>> if (!qdev || mhi_res->transaction_status) >>> return; >>> >>> So what this means that QRTR is not delivering messages and the QMI connection >>> is not working between ath11k and the firmware, resulting a failure in firmware >>> initialisation. >>> >>> To fix this add new function mhi_power_down_no_destroy() which does not destroy >>> the devices during power down. This way mhi_power_up() can be called during >>> resume and we can get ath11k hibernation working with the following patches. >>> >>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 >>> >>> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> >>> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> >> >> Any reason for reposting this series without discussing the suggestion from >> Mayank? > > Baochen quickly sent me fixes for the v1 review comments, as I have been > out of office for some time I didn't want to sit on Baochen's fixes for > too long. Better to get them out of the door as soon as possible. I will > definitely look at Mayank's proposal but that will take longer. > >> As I said in the internal thread, this patch breaks the Linux device >> driver model by not destroying the "struct device" when the actual >> device gets removed. > > This patchset has been tested by several people, I'm even using this > patchset on main laptop every day, and we haven't noticed any issues. > > Can you elaborate more about this driver model? We are not removing any > ath11k devices, we just want to power down the ath11k (and in the future > ath12k) devices for suspend and power up during resume. > >> We should try to explore alternate options instead of persisting with >> this solution. > > What other options we have here? At least Baochen is not optimistic that > using PM_POST_HIBERNATION as a workaround would work. The issue we have > here is that mhi_power_up() doesn't work in the resume handler and > that's what we should try to fix, not make workarounds. > Adding Mayank directly plus others to this discussion since we need a solution to have proper hibernation support for devices containing ath11k (and in the near future ath12k). /jeff /jeff
On Tue, Dec 05, 2023 at 02:29:33PM +0200, Kalle Valo wrote: > Manivannan Sadhasivam <mani@kernel.org> writes: > > > On Mon, Nov 27, 2023 at 06:20:15PM +0200, Kalle Valo wrote: > > > >> From: Baochen Qiang <quic_bqiang@quicinc.com> > >> > >> If ath11k tries to call mhi_power_up() during resume it fails: > >> > >> ath11k_pci 0000:06:00.0: timeout while waiting for restart complete > >> > >> This happens because when calling mhi_power_up() the MHI subsystem eventually > >> calls device_add() from mhi_create_devices() but the device creation is > >> deferred: > >> > >> mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral > >> > >> The reason for deferring device creation is explained in dpm_prepare(): > >> > >> /* > >> * It is unsafe if probing of devices will happen during suspend or > >> * hibernation and system behavior will be unpredictable in this case. > >> * So, let's prohibit device's probing here and defer their probes > >> * instead. The normal behavior will be restored in dpm_complete(). > >> */ > >> > >> Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not called and > >> qcom_mhi_qrtr_dl_callback() fails silently as qdev is zero: > >> > >> static void qcom_mhi_qrtr_dl_callback(struct mhi_device *mhi_dev, > >> struct mhi_result *mhi_res) > >> { > >> struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev); > >> int rc; > >> > >> if (!qdev || mhi_res->transaction_status) > >> return; > >> > >> So what this means that QRTR is not delivering messages and the QMI connection > >> is not working between ath11k and the firmware, resulting a failure in firmware > >> initialisation. > >> > >> To fix this add new function mhi_power_down_no_destroy() which does not destroy > >> the devices during power down. This way mhi_power_up() can be called during > >> resume and we can get ath11k hibernation working with the following patches. > >> > >> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 > >> > >> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> > >> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> > > > > Any reason for reposting this series without discussing the suggestion from > > Mayank? > > Baochen quickly sent me fixes for the v1 review comments, as I have been > out of office for some time I didn't want to sit on Baochen's fixes for > too long. Better to get them out of the door as soon as possible. I will > definitely look at Mayank's proposal but that will take longer. > > > As I said in the internal thread, this patch breaks the Linux device > > driver model by not destroying the "struct device" when the actual > > device gets removed. > > This patchset has been tested by several people, I'm even using this > patchset on main laptop every day, and we haven't noticed any issues. > > Can you elaborate more about this driver model? We are not removing any > ath11k devices, we just want to power down the ath11k (and in the future > ath12k) devices for suspend and power up during resume. > Devices (struct dev) for each channels are created once the device (WLAN) enters runtime mode such as (MISSION, SBL etc...). During hibernation, ath11k stack calls mhi_power_down() which essentially resets the device to POR and also the stack powers down the device properly. In that case, MHI channels do not exist as the device (WLAN) itself is powered down. As per kernel driver model, each struct device is tied to its reference count. And the reference count should be decremented whenever the actual device is not in use. Once the actual device is removed from the system, then the respective struct device has to be destroyed altogether. So in this case, even though the channels are not active (present) in the device, the device itself gets powered off, you want MHI stack to keep the struct device active, which is against the model I referenced above. To fix this issue properly, we need to investigate on how other subsystems are handling this situation (device getting powered down during hibernation), like USB. - Mani > > We should try to explore alternate options instead of persisting with > > this solution. > > What other options we have here? At least Baochen is not optimistic that > using PM_POST_HIBERNATION as a workaround would work. The issue we have > here is that mhi_power_up() doesn't work in the resume handler and > that's what we should try to fix, not make workarounds. > > -- > https://patchwork.kernel.org/project/linux-wireless/list/ > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
On Wed, Dec 20, 2023 at 10:02:25PM +0530, Manivannan Sadhasivam wrote: > On Tue, Dec 05, 2023 at 02:29:33PM +0200, Kalle Valo wrote: > > Manivannan Sadhasivam <mani@kernel.org> writes: > > > > > On Mon, Nov 27, 2023 at 06:20:15PM +0200, Kalle Valo wrote: > > > > > >> From: Baochen Qiang <quic_bqiang@quicinc.com> > > >> > > >> If ath11k tries to call mhi_power_up() during resume it fails: > > >> > > >> ath11k_pci 0000:06:00.0: timeout while waiting for restart complete > > >> > > >> This happens because when calling mhi_power_up() the MHI subsystem eventually > > >> calls device_add() from mhi_create_devices() but the device creation is > > >> deferred: > > >> > > >> mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral > > >> > > >> The reason for deferring device creation is explained in dpm_prepare(): > > >> > > >> /* > > >> * It is unsafe if probing of devices will happen during suspend or > > >> * hibernation and system behavior will be unpredictable in this case. > > >> * So, let's prohibit device's probing here and defer their probes > > >> * instead. The normal behavior will be restored in dpm_complete(). > > >> */ > > >> > > >> Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not called and > > >> qcom_mhi_qrtr_dl_callback() fails silently as qdev is zero: > > >> > > >> static void qcom_mhi_qrtr_dl_callback(struct mhi_device *mhi_dev, > > >> struct mhi_result *mhi_res) > > >> { > > >> struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev); > > >> int rc; > > >> > > >> if (!qdev || mhi_res->transaction_status) > > >> return; > > >> > > >> So what this means that QRTR is not delivering messages and the QMI connection > > >> is not working between ath11k and the firmware, resulting a failure in firmware > > >> initialisation. > > >> > > >> To fix this add new function mhi_power_down_no_destroy() which does not destroy > > >> the devices during power down. This way mhi_power_up() can be called during > > >> resume and we can get ath11k hibernation working with the following patches. > > >> > > >> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 > > >> > > >> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> > > >> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> > > > > > > Any reason for reposting this series without discussing the suggestion from > > > Mayank? > > > > Baochen quickly sent me fixes for the v1 review comments, as I have been > > out of office for some time I didn't want to sit on Baochen's fixes for > > too long. Better to get them out of the door as soon as possible. I will > > definitely look at Mayank's proposal but that will take longer. > > > > > As I said in the internal thread, this patch breaks the Linux device > > > driver model by not destroying the "struct device" when the actual > > > device gets removed. > > > > This patchset has been tested by several people, I'm even using this > > patchset on main laptop every day, and we haven't noticed any issues. > > > > Can you elaborate more about this driver model? We are not removing any > > ath11k devices, we just want to power down the ath11k (and in the future > > ath12k) devices for suspend and power up during resume. > > > > Devices (struct dev) for each channels are created once the device (WLAN) enters > runtime mode such as (MISSION, SBL etc...). During hibernation, ath11k stack > calls mhi_power_down() which essentially resets the device to POR and also the > stack powers down the device properly. > > In that case, MHI channels do not exist as the device (WLAN) itself is powered > down. As per kernel driver model, each struct device is tied to its reference > count. And the reference count should be decremented whenever the actual device > is not in use. Once the actual device is removed from the system, then the > respective struct device has to be destroyed altogether. > > So in this case, even though the channels are not active (present) in the > device, the device itself gets powered off, you want MHI stack to keep the > struct device active, which is against the model I referenced above. > > To fix this issue properly, we need to investigate on how other subsystems are > handling this situation (device getting powered down during hibernation), like > USB. > To me it all sounds like the probe deferral is not handled properly in mac80211 stack. As you mentioned in the commit message that the dpm_prepare() blocks probing of devices. It gets unblocked and trigerred in dpm_complete(): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/power/main.c#n1131 So if mac80211/ath11k cannot probe the devices at the dpm_complete() stage, then it is definitely an issue that needs to be fixed properly. - Mani > - Mani > > > > We should try to explore alternate options instead of persisting with > > > this solution. > > > > What other options we have here? At least Baochen is not optimistic that > > using PM_POST_HIBERNATION as a workaround would work. The issue we have > > here is that mhi_power_up() doesn't work in the resume handler and > > that's what we should try to fix, not make workarounds. > > > > -- > > https://patchwork.kernel.org/project/linux-wireless/list/ > > > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches > > -- > மணிவண்ணன் சதாசிவம்
On 12/21/2023 12:51 AM, Manivannan Sadhasivam wrote: > On Wed, Dec 20, 2023 at 10:02:25PM +0530, Manivannan Sadhasivam wrote: >> On Tue, Dec 05, 2023 at 02:29:33PM +0200, Kalle Valo wrote: >>> Manivannan Sadhasivam <mani@kernel.org> writes: >>> >>>> On Mon, Nov 27, 2023 at 06:20:15PM +0200, Kalle Valo wrote: >>>> >>>>> From: Baochen Qiang <quic_bqiang@quicinc.com> >>>>> >>>>> If ath11k tries to call mhi_power_up() during resume it fails: >>>>> >>>>> ath11k_pci 0000:06:00.0: timeout while waiting for restart complete >>>>> >>>>> This happens because when calling mhi_power_up() the MHI subsystem eventually >>>>> calls device_add() from mhi_create_devices() but the device creation is >>>>> deferred: >>>>> >>>>> mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral >>>>> >>>>> The reason for deferring device creation is explained in dpm_prepare(): >>>>> >>>>> /* >>>>> * It is unsafe if probing of devices will happen during suspend or >>>>> * hibernation and system behavior will be unpredictable in this case. >>>>> * So, let's prohibit device's probing here and defer their probes >>>>> * instead. The normal behavior will be restored in dpm_complete(). >>>>> */ >>>>> >>>>> Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not called and >>>>> qcom_mhi_qrtr_dl_callback() fails silently as qdev is zero: >>>>> >>>>> static void qcom_mhi_qrtr_dl_callback(struct mhi_device *mhi_dev, >>>>> struct mhi_result *mhi_res) >>>>> { >>>>> struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev); >>>>> int rc; >>>>> >>>>> if (!qdev || mhi_res->transaction_status) >>>>> return; >>>>> >>>>> So what this means that QRTR is not delivering messages and the QMI connection >>>>> is not working between ath11k and the firmware, resulting a failure in firmware >>>>> initialisation. >>>>> >>>>> To fix this add new function mhi_power_down_no_destroy() which does not destroy >>>>> the devices during power down. This way mhi_power_up() can be called during >>>>> resume and we can get ath11k hibernation working with the following patches. >>>>> >>>>> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 >>>>> >>>>> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> >>>>> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> >>>> >>>> Any reason for reposting this series without discussing the suggestion from >>>> Mayank? >>> >>> Baochen quickly sent me fixes for the v1 review comments, as I have been >>> out of office for some time I didn't want to sit on Baochen's fixes for >>> too long. Better to get them out of the door as soon as possible. I will >>> definitely look at Mayank's proposal but that will take longer. >>> >>>> As I said in the internal thread, this patch breaks the Linux device >>>> driver model by not destroying the "struct device" when the actual >>>> device gets removed. >>> >>> This patchset has been tested by several people, I'm even using this >>> patchset on main laptop every day, and we haven't noticed any issues. >>> >>> Can you elaborate more about this driver model? We are not removing any >>> ath11k devices, we just want to power down the ath11k (and in the future >>> ath12k) devices for suspend and power up during resume. >>> >> >> Devices (struct dev) for each channels are created once the device (WLAN) enters >> runtime mode such as (MISSION, SBL etc...). During hibernation, ath11k stack >> calls mhi_power_down() which essentially resets the device to POR and also the >> stack powers down the device properly. >> >> In that case, MHI channels do not exist as the device (WLAN) itself is powered >> down. As per kernel driver model, each struct device is tied to its reference >> count. And the reference count should be decremented whenever the actual device >> is not in use. Once the actual device is removed from the system, then the >> respective struct device has to be destroyed altogether. >> >> So in this case, even though the channels are not active (present) in the >> device, the device itself gets powered off, you want MHI stack to keep the >> struct device active, which is against the model I referenced above. >> >> To fix this issue properly, we need to investigate on how other subsystems are >> handling this situation (device getting powered down during hibernation), like >> USB. >> > > To me it all sounds like the probe deferral is not handled properly in mac80211 > stack. As you mentioned in the commit message that the dpm_prepare() blocks > probing of devices. It gets unblocked and trigerred in dpm_complete(): > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/power/main.c#n1131 > > So if mac80211/ath11k cannot probe the devices at the dpm_complete() stage, then > it is definitely an issue that needs to be fixed properly. To clarify, ath11k CAN probe the devices at dpm_complete() stage. The problem is kernel does not wait for all probes to finish, and in that way we will face the issue that user space applications are likely to fail because they get thawed BEFORE WLAN is ready. > > - Mani > >> - Mani >> >>>> We should try to explore alternate options instead of persisting with >>>> this solution. >>> >>> What other options we have here? At least Baochen is not optimistic that >>> using PM_POST_HIBERNATION as a workaround would work. The issue we have >>> here is that mhi_power_up() doesn't work in the resume handler and >>> that's what we should try to fix, not make workarounds. >>> >>> -- >>> https://patchwork.kernel.org/project/linux-wireless/list/ >>> >>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches >> >> -- >> மணிவண்ணன் சதாசிவம் >
On Thu, Dec 21, 2023 at 07:05:09PM +0800, Baochen Qiang wrote: > > > On 12/21/2023 12:51 AM, Manivannan Sadhasivam wrote: > > On Wed, Dec 20, 2023 at 10:02:25PM +0530, Manivannan Sadhasivam wrote: > > > On Tue, Dec 05, 2023 at 02:29:33PM +0200, Kalle Valo wrote: > > > > Manivannan Sadhasivam <mani@kernel.org> writes: > > > > > > > > > On Mon, Nov 27, 2023 at 06:20:15PM +0200, Kalle Valo wrote: > > > > > > > > > > > From: Baochen Qiang <quic_bqiang@quicinc.com> > > > > > > > > > > > > If ath11k tries to call mhi_power_up() during resume it fails: > > > > > > > > > > > > ath11k_pci 0000:06:00.0: timeout while waiting for restart complete > > > > > > > > > > > > This happens because when calling mhi_power_up() the MHI subsystem eventually > > > > > > calls device_add() from mhi_create_devices() but the device creation is > > > > > > deferred: > > > > > > > > > > > > mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral > > > > > > > > > > > > The reason for deferring device creation is explained in dpm_prepare(): > > > > > > > > > > > > /* > > > > > > * It is unsafe if probing of devices will happen during suspend or > > > > > > * hibernation and system behavior will be unpredictable in this case. > > > > > > * So, let's prohibit device's probing here and defer their probes > > > > > > * instead. The normal behavior will be restored in dpm_complete(). > > > > > > */ > > > > > > > > > > > > Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not called and > > > > > > qcom_mhi_qrtr_dl_callback() fails silently as qdev is zero: > > > > > > > > > > > > static void qcom_mhi_qrtr_dl_callback(struct mhi_device *mhi_dev, > > > > > > struct mhi_result *mhi_res) > > > > > > { > > > > > > struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev); > > > > > > int rc; > > > > > > > > > > > > if (!qdev || mhi_res->transaction_status) > > > > > > return; > > > > > > > > > > > > So what this means that QRTR is not delivering messages and the QMI connection > > > > > > is not working between ath11k and the firmware, resulting a failure in firmware > > > > > > initialisation. > > > > > > > > > > > > To fix this add new function mhi_power_down_no_destroy() which does not destroy > > > > > > the devices during power down. This way mhi_power_up() can be called during > > > > > > resume and we can get ath11k hibernation working with the following patches. > > > > > > > > > > > > Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 > > > > > > > > > > > > Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> > > > > > > Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> > > > > > > > > > > Any reason for reposting this series without discussing the suggestion from > > > > > Mayank? > > > > > > > > Baochen quickly sent me fixes for the v1 review comments, as I have been > > > > out of office for some time I didn't want to sit on Baochen's fixes for > > > > too long. Better to get them out of the door as soon as possible. I will > > > > definitely look at Mayank's proposal but that will take longer. > > > > > > > > > As I said in the internal thread, this patch breaks the Linux device > > > > > driver model by not destroying the "struct device" when the actual > > > > > device gets removed. > > > > > > > > This patchset has been tested by several people, I'm even using this > > > > patchset on main laptop every day, and we haven't noticed any issues. > > > > > > > > Can you elaborate more about this driver model? We are not removing any > > > > ath11k devices, we just want to power down the ath11k (and in the future > > > > ath12k) devices for suspend and power up during resume. > > > > > > > > > > Devices (struct dev) for each channels are created once the device (WLAN) enters > > > runtime mode such as (MISSION, SBL etc...). During hibernation, ath11k stack > > > calls mhi_power_down() which essentially resets the device to POR and also the > > > stack powers down the device properly. > > > > > > In that case, MHI channels do not exist as the device (WLAN) itself is powered > > > down. As per kernel driver model, each struct device is tied to its reference > > > count. And the reference count should be decremented whenever the actual device > > > is not in use. Once the actual device is removed from the system, then the > > > respective struct device has to be destroyed altogether. > > > > > > So in this case, even though the channels are not active (present) in the > > > device, the device itself gets powered off, you want MHI stack to keep the > > > struct device active, which is against the model I referenced above. > > > > > > To fix this issue properly, we need to investigate on how other subsystems are > > > handling this situation (device getting powered down during hibernation), like > > > USB. > > > > > > > To me it all sounds like the probe deferral is not handled properly in mac80211 > > stack. As you mentioned in the commit message that the dpm_prepare() blocks > > probing of devices. It gets unblocked and trigerred in dpm_complete(): > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/power/main.c#n1131 > > > > So if mac80211/ath11k cannot probe the devices at the dpm_complete() stage, then > > it is definitely an issue that needs to be fixed properly. > To clarify, ath11k CAN probe the devices at dpm_complete() stage. The > problem is kernel does not wait for all probes to finish, and in that way we > will face the issue that user space applications are likely to fail because > they get thawed BEFORE WLAN is ready. > Hmm. Please give me some time to reproduce this issue locally. I will get back to this thread with my analysis. - Mani > > > > - Mani > > > > > - Mani > > > > > > > > We should try to explore alternate options instead of persisting with > > > > > this solution. > > > > > > > > What other options we have here? At least Baochen is not optimistic that > > > > using PM_POST_HIBERNATION as a workaround would work. The issue we have > > > > here is that mhi_power_up() doesn't work in the resume handler and > > > > that's what we should try to fix, not make workarounds. > > > > > > > > -- > > > > https://patchwork.kernel.org/project/linux-wireless/list/ > > > > > > > > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches > > > > > > -- > > > மணிவண்ணன் சதாசிவம் > > >
On Thu, Jan 04, 2024 at 11:39:12AM +0530, Manivannan Sadhasivam wrote: + Can, Qiang [...] > > > To me it all sounds like the probe deferral is not handled properly in mac80211 > > > stack. As you mentioned in the commit message that the dpm_prepare() blocks > > > probing of devices. It gets unblocked and trigerred in dpm_complete(): > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/power/main.c#n1131 > > > > > > So if mac80211/ath11k cannot probe the devices at the dpm_complete() stage, then > > > it is definitely an issue that needs to be fixed properly. > > To clarify, ath11k CAN probe the devices at dpm_complete() stage. The > > problem is kernel does not wait for all probes to finish, and in that way we > > will face the issue that user space applications are likely to fail because > > they get thawed BEFORE WLAN is ready. > > > > Hmm. Please give me some time to reproduce this issue locally. I will get back > to this thread with my analysis. > We reproduced the issue with the help of PCIe team (thanks Can). What we found out was, during the resume from hibernation the faliure happens in ath11k_core_resume(). Precisely here: https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/tree/drivers/net/wireless/ath/ath11k/core.c?h=ath11k-hibernation-support#n850 This code waits for the QMI messages to arrive and eventually timesout. But the impression I got from the start was that the mhi_power_up() always fails during resume. In our investigation, we confirmed that the failure is not happening at the MHI level. I'm not pointing fingers here, but trying to understand why can't you fix ath11k_core_resume() to not timeout? IMO this timeout should be handled as a deferral case. - Mani
On 1/22/2024 2:24 PM, Manivannan Sadhasivam wrote: > On Thu, Jan 04, 2024 at 11:39:12AM +0530, Manivannan Sadhasivam wrote: > > + Can, Qiang > > [...] > >>>> To me it all sounds like the probe deferral is not handled properly in mac80211 >>>> stack. As you mentioned in the commit message that the dpm_prepare() blocks >>>> probing of devices. It gets unblocked and trigerred in dpm_complete(): >>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/power/main.c#n1131 >>>> >>>> So if mac80211/ath11k cannot probe the devices at the dpm_complete() stage, then >>>> it is definitely an issue that needs to be fixed properly. >>> To clarify, ath11k CAN probe the devices at dpm_complete() stage. The >>> problem is kernel does not wait for all probes to finish, and in that way we >>> will face the issue that user space applications are likely to fail because >>> they get thawed BEFORE WLAN is ready. >>> >> >> Hmm. Please give me some time to reproduce this issue locally. I will get back >> to this thread with my analysis. >> > > We reproduced the issue with the help of PCIe team (thanks Can). What we found > out was, during the resume from hibernation the faliure happens in > ath11k_core_resume(). Precisely here: > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/tree/drivers/net/wireless/ath/ath11k/core.c?h=ath11k-hibernation-support#n850 > > This code waits for the QMI messages to arrive and eventually timesout. But the > impression I got from the start was that the mhi_power_up() always fails during > resume. In our investigation, we confirmed that the failure is not happening at > the MHI level.No, mhi_power_up() never fails as it only downloads PBL, SBL and waits for mission mode, no MHI device created hence not affected by the deferred probe. However in addition to PBL/SBL, ath11k also needs to download m3.bin, borad.bin and regdb.bin. Those files are part of WLAN firmware and are downloaded via QMI messages. After mhi_power_up() succeeds ath11k_core_resume() waits for QMI downloading those files. As you know QMI relies on MHI channels, these channels are managed by qcom_mhi_qrtr_driver. Since device probing is deferred, qcom_mhi_qrtr_driver has no chance to run at this stage. As a result ath11k_core_resume() times out. > > I'm not pointing fingers here, but trying to understand why can't you fix > ath11k_core_resume() to not timeout? IMO this timeout should be handled as a > deferral case. Let's see what happens if we do it in a deferral way: 1. In ath11k_core_resume() we returns success directly without waiting for QMI downloading other firmware files. 2. Kernel unblocks device probe and schedules a work item to trigger all deferred probing. As a result MHI devices are probed by qcom_mhi_qrtr_driver and finally QMI is online. 3. kernel continues to resume and wake up userspace applications. 4. ath11k gets the message, either by kernel PM notification or something else, that QMI is ready and then downloads other firmware files. What happens if userspace applications or network stack immediately initiate some WLAN request after resume back? Can ath11k handle such request? The answer is, most likely, no. Because there is no guarantee that QMI finishes downloading before those request. > > - Mani >
On Mon, Jan 22, 2024 at 04:09:53PM +0800, Baochen Qiang wrote: > > > On 1/22/2024 2:24 PM, Manivannan Sadhasivam wrote: > > On Thu, Jan 04, 2024 at 11:39:12AM +0530, Manivannan Sadhasivam wrote: > > > > + Can, Qiang > > > > [...] > > > > > > > To me it all sounds like the probe deferral is not handled properly in mac80211 > > > > > stack. As you mentioned in the commit message that the dpm_prepare() blocks > > > > > probing of devices. It gets unblocked and trigerred in dpm_complete(): > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/power/main.c#n1131 > > > > > > > > > > So if mac80211/ath11k cannot probe the devices at the dpm_complete() stage, then > > > > > it is definitely an issue that needs to be fixed properly. > > > > To clarify, ath11k CAN probe the devices at dpm_complete() stage. The > > > > problem is kernel does not wait for all probes to finish, and in that way we > > > > will face the issue that user space applications are likely to fail because > > > > they get thawed BEFORE WLAN is ready. > > > > > > > > > > Hmm. Please give me some time to reproduce this issue locally. I will get back > > > to this thread with my analysis. > > > > > > > We reproduced the issue with the help of PCIe team (thanks Can). What we found > > out was, during the resume from hibernation the faliure happens in > > ath11k_core_resume(). Precisely here: > > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/tree/drivers/net/wireless/ath/ath11k/core.c?h=ath11k-hibernation-support#n850 > > > > This code waits for the QMI messages to arrive and eventually timesout. But the > > impression I got from the start was that the mhi_power_up() always fails during > > resume. In our investigation, we confirmed that the failure is not happening at > > the MHI level.No, mhi_power_up() never fails as it only downloads PBL, > > SBL and waits > for mission mode, no MHI device created hence not affected by the deferred > probe. However in addition to PBL/SBL, ath11k also needs to download m3.bin, > borad.bin and regdb.bin. Those files are part of WLAN firmware and are > downloaded via QMI messages. After mhi_power_up() succeeds > ath11k_core_resume() waits for QMI downloading those files. As you know QMI > relies on MHI channels, these channels are managed by qcom_mhi_qrtr_driver. > Since device probing is deferred, qcom_mhi_qrtr_driver has no chance to run > at this stage. As a result ath11k_core_resume() times out. > Thanks for the info, this clarifies the issue in detail. > > > > I'm not pointing fingers here, but trying to understand why can't you fix > > ath11k_core_resume() to not timeout? IMO this timeout should be handled as a > > deferral case. > Let's see what happens if we do it in a deferral way: > 1. In ath11k_core_resume() we returns success directly without waiting for > QMI downloading other firmware files. > 2. Kernel unblocks device probe and schedules a work item to trigger all > deferred probing. As a result MHI devices are probed by qcom_mhi_qrtr_driver > and finally QMI is online. > 3. kernel continues to resume and wake up userspace applications. > 4. ath11k gets the message, either by kernel PM notification or something > else, that QMI is ready and then downloads other firmware files. > > What happens if userspace applications or network stack immediately initiate > some WLAN request after resume back? Can ath11k handle such request? The > answer is, most likely, no. Because there is no guarantee that QMI finishes > downloading before those request. > What will happen to userspace if ath11k returns an error like -EBUSY or something? Will the netdev completely go away? - Mani > > > > - Mani > >
On 1/22/2024 9:09 PM, Manivannan Sadhasivam wrote: > On Mon, Jan 22, 2024 at 04:09:53PM +0800, Baochen Qiang wrote: >> >> >> On 1/22/2024 2:24 PM, Manivannan Sadhasivam wrote: >>> On Thu, Jan 04, 2024 at 11:39:12AM +0530, Manivannan Sadhasivam wrote: >>> >>> + Can, Qiang >>> >>> [...] >>> >>>>>> To me it all sounds like the probe deferral is not handled properly in mac80211 >>>>>> stack. As you mentioned in the commit message that the dpm_prepare() blocks >>>>>> probing of devices. It gets unblocked and trigerred in dpm_complete(): >>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/power/main.c#n1131 >>>>>> >>>>>> So if mac80211/ath11k cannot probe the devices at the dpm_complete() stage, then >>>>>> it is definitely an issue that needs to be fixed properly. >>>>> To clarify, ath11k CAN probe the devices at dpm_complete() stage. The >>>>> problem is kernel does not wait for all probes to finish, and in that way we >>>>> will face the issue that user space applications are likely to fail because >>>>> they get thawed BEFORE WLAN is ready. >>>>> >>>> >>>> Hmm. Please give me some time to reproduce this issue locally. I will get back >>>> to this thread with my analysis. >>>> >>> >>> We reproduced the issue with the help of PCIe team (thanks Can). What we found >>> out was, during the resume from hibernation the faliure happens in >>> ath11k_core_resume(). Precisely here: >>> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/tree/drivers/net/wireless/ath/ath11k/core.c?h=ath11k-hibernation-support#n850 >>> >>> This code waits for the QMI messages to arrive and eventually timesout. But the >>> impression I got from the start was that the mhi_power_up() always fails during >>> resume. In our investigation, we confirmed that the failure is not happening at >>> the MHI level.No, mhi_power_up() never fails as it only downloads PBL, >>> SBL and waits >> for mission mode, no MHI device created hence not affected by the deferred >> probe. However in addition to PBL/SBL, ath11k also needs to download m3.bin, >> borad.bin and regdb.bin. Those files are part of WLAN firmware and are >> downloaded via QMI messages. After mhi_power_up() succeeds >> ath11k_core_resume() waits for QMI downloading those files. As you know QMI >> relies on MHI channels, these channels are managed by qcom_mhi_qrtr_driver. >> Since device probing is deferred, qcom_mhi_qrtr_driver has no chance to run >> at this stage. As a result ath11k_core_resume() times out. >> > > Thanks for the info, this clarifies the issue in detail. > >>> >>> I'm not pointing fingers here, but trying to understand why can't you fix >>> ath11k_core_resume() to not timeout? IMO this timeout should be handled as a >>> deferral case. >> Let's see what happens if we do it in a deferral way: >> 1. In ath11k_core_resume() we returns success directly without waiting for >> QMI downloading other firmware files. >> 2. Kernel unblocks device probe and schedules a work item to trigger all >> deferred probing. As a result MHI devices are probed by qcom_mhi_qrtr_driver >> and finally QMI is online. >> 3. kernel continues to resume and wake up userspace applications. >> 4. ath11k gets the message, either by kernel PM notification or something >> else, that QMI is ready and then downloads other firmware files. >> >> What happens if userspace applications or network stack immediately initiate >> some WLAN request after resume back? Can ath11k handle such request? The >> answer is, most likely, no. Because there is no guarantee that QMI finishes >> downloading before those request. >> > > What will happen to userspace if ath11k returns an error like -EBUSY or > something? Will the netdev completely go away? It depends, and varies from application to application, we can't make the assumption. Besides, it doesn't make sense to return -EBUSY or something like that, if ath11k returns success during resume. A WLAN driver is supposed to finish everything, at least get back to the state before suspend, in the resume callback. If it couldn't, report the error. > > - Mani > >>> >>> - Mani >>> >
On Tue, Jan 23, 2024 at 09:44:11AM +0800, Baochen Qiang wrote: > > > On 1/22/2024 9:09 PM, Manivannan Sadhasivam wrote: > > On Mon, Jan 22, 2024 at 04:09:53PM +0800, Baochen Qiang wrote: > > > > > > > > > On 1/22/2024 2:24 PM, Manivannan Sadhasivam wrote: > > > > On Thu, Jan 04, 2024 at 11:39:12AM +0530, Manivannan Sadhasivam wrote: > > > > > > > > + Can, Qiang > > > > > > > > [...] > > > > > > > > > > > To me it all sounds like the probe deferral is not handled properly in mac80211 > > > > > > > stack. As you mentioned in the commit message that the dpm_prepare() blocks > > > > > > > probing of devices. It gets unblocked and trigerred in dpm_complete(): > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/power/main.c#n1131 > > > > > > > > > > > > > > So if mac80211/ath11k cannot probe the devices at the dpm_complete() stage, then > > > > > > > it is definitely an issue that needs to be fixed properly. > > > > > > To clarify, ath11k CAN probe the devices at dpm_complete() stage. The > > > > > > problem is kernel does not wait for all probes to finish, and in that way we > > > > > > will face the issue that user space applications are likely to fail because > > > > > > they get thawed BEFORE WLAN is ready. > > > > > > > > > > > > > > > > Hmm. Please give me some time to reproduce this issue locally. I will get back > > > > > to this thread with my analysis. > > > > > > > > > > > > > We reproduced the issue with the help of PCIe team (thanks Can). What we found > > > > out was, during the resume from hibernation the faliure happens in > > > > ath11k_core_resume(). Precisely here: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/tree/drivers/net/wireless/ath/ath11k/core.c?h=ath11k-hibernation-support#n850 > > > > > > > > This code waits for the QMI messages to arrive and eventually timesout. But the > > > > impression I got from the start was that the mhi_power_up() always fails during > > > > resume. In our investigation, we confirmed that the failure is not happening at > > > > the MHI level.No, mhi_power_up() never fails as it only downloads PBL, > > > > SBL and waits > > > for mission mode, no MHI device created hence not affected by the deferred > > > probe. However in addition to PBL/SBL, ath11k also needs to download m3.bin, > > > borad.bin and regdb.bin. Those files are part of WLAN firmware and are > > > downloaded via QMI messages. After mhi_power_up() succeeds > > > ath11k_core_resume() waits for QMI downloading those files. As you know QMI > > > relies on MHI channels, these channels are managed by qcom_mhi_qrtr_driver. > > > Since device probing is deferred, qcom_mhi_qrtr_driver has no chance to run > > > at this stage. As a result ath11k_core_resume() times out. > > > > > > > Thanks for the info, this clarifies the issue in detail. > > > > > > > > > > I'm not pointing fingers here, but trying to understand why can't you fix > > > > ath11k_core_resume() to not timeout? IMO this timeout should be handled as a > > > > deferral case. > > > Let's see what happens if we do it in a deferral way: > > > 1. In ath11k_core_resume() we returns success directly without waiting for > > > QMI downloading other firmware files. > > > 2. Kernel unblocks device probe and schedules a work item to trigger all > > > deferred probing. As a result MHI devices are probed by qcom_mhi_qrtr_driver > > > and finally QMI is online. > > > 3. kernel continues to resume and wake up userspace applications. > > > 4. ath11k gets the message, either by kernel PM notification or something > > > else, that QMI is ready and then downloads other firmware files. > > > > > > What happens if userspace applications or network stack immediately initiate > > > some WLAN request after resume back? Can ath11k handle such request? The > > > answer is, most likely, no. Because there is no guarantee that QMI finishes > > > downloading before those request. > > > > > > > What will happen to userspace if ath11k returns an error like -EBUSY or > > something? Will the netdev completely go away? > It depends, and varies from application to application, we can't make the > assumption. > > Besides, it doesn't make sense to return -EBUSY or something like that, if > ath11k returns success during resume. A WLAN driver is supposed to finish > everything, at least get back to the state before suspend, in the resume > callback. If it couldn't, report the error. > Ok. So I am getting the feeling that we need to talk to the PM people to get a proper solution. Clearly fixing the MHI code is not the right thing to do. We might need a separate callback that gets registered by the drivers like ath11k to wait for the dependency drivers to get probed. Can you initiate such a discussion? You can write to linux-pm@vger.kernel.org, "Rafael J. Wysocki" <rafael@kernel.org> and Pavel Machek <pavel@ucw.cz>. - Mani
On 1/23/2024 7:36 AM, Manivannan Sadhasivam wrote: > Ok. So I am getting the feeling that we need to talk to the PM people to get a > proper solution. Clearly fixing the MHI code is not the right thing to do. We > might need a separate callback that gets registered by the drivers like ath11k > to wait for the dependency drivers to get probed. > > Can you initiate such a discussion? You can write to linux-pm@vger.kernel.org, > "Rafael J. Wysocki" <rafael@kernel.org> and Pavel Machek <pavel@ucw.cz>. Please also include kernel@quicinc.com since this topic may have ramifications across other Qualcomm technologies that use MHI. /jeff
On Mon, Nov 27, 2023 at 06:20:15PM +0200, Kalle Valo wrote: > From: Baochen Qiang <quic_bqiang@quicinc.com> > > If ath11k tries to call mhi_power_up() during resume it fails: > This is confusing! Maybe this is what confused me initially. mhi_sync_power_up() never fails, but ath11k timesout waiting for QMI. You also confirmed the same [1]. > ath11k_pci 0000:06:00.0: timeout while waiting for restart complete > > This happens because when calling mhi_power_up() the MHI subsystem eventually > calls device_add() from mhi_create_devices() but the device creation is > deferred: > > mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral > > The reason for deferring device creation is explained in dpm_prepare(): > > /* > * It is unsafe if probing of devices will happen during suspend or > * hibernation and system behavior will be unpredictable in this case. > * So, let's prohibit device's probing here and defer their probes > * instead. The normal behavior will be restored in dpm_complete(). > */ > > Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not called and > qcom_mhi_qrtr_dl_callback() fails silently as qdev is zero: > > static void qcom_mhi_qrtr_dl_callback(struct mhi_device *mhi_dev, > struct mhi_result *mhi_res) > { > struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev); > int rc; > > if (!qdev || mhi_res->transaction_status) > return; > > So what this means that QRTR is not delivering messages and the QMI connection > is not working between ath11k and the firmware, resulting a failure in firmware > initialisation. > > To fix this add new function mhi_power_down_no_destroy() which does not destroy > the devices during power down. This way mhi_power_up() can be called during > resume and we can get ath11k hibernation working with the following patches. > > Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 > > Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> > Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> > --- > drivers/bus/mhi/host/init.c | 1 + > drivers/bus/mhi/host/internal.h | 1 + > drivers/bus/mhi/host/pm.c | 26 +++++++++++++++++++------- > include/linux/mhi.h | 29 +++++++++++++++++++++++++++-- > 4 files changed, 48 insertions(+), 9 deletions(-) > > diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c > index 65ceac1837f9..e626b03ffafa 100644 > --- a/drivers/bus/mhi/host/init.c > +++ b/drivers/bus/mhi/host/init.c > @@ -43,6 +43,7 @@ const char * const dev_state_tran_str[DEV_ST_TRANSITION_MAX] = { > [DEV_ST_TRANSITION_FP] = "FLASH PROGRAMMER", > [DEV_ST_TRANSITION_SYS_ERR] = "SYS ERROR", > [DEV_ST_TRANSITION_DISABLE] = "DISABLE", > + [DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE] = "DISABLE (DESTROY DEVICE)", > }; > > const char * const mhi_ch_state_type_str[MHI_CH_STATE_TYPE_MAX] = { > diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h > index 30ac415a3000..3f45c9c447bd 100644 > --- a/drivers/bus/mhi/host/internal.h > +++ b/drivers/bus/mhi/host/internal.h > @@ -69,6 +69,7 @@ enum dev_st_transition { > DEV_ST_TRANSITION_FP, > DEV_ST_TRANSITION_SYS_ERR, > DEV_ST_TRANSITION_DISABLE, > + DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE, > DEV_ST_TRANSITION_MAX, > }; > > diff --git a/drivers/bus/mhi/host/pm.c b/drivers/bus/mhi/host/pm.c > index a2f2feef1476..8833b0248393 100644 > --- a/drivers/bus/mhi/host/pm.c > +++ b/drivers/bus/mhi/host/pm.c > @@ -458,7 +458,8 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl) > } > > /* Handle shutdown transitions */ > -static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl) > +static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl, > + bool destroy_device) > { > enum mhi_pm_state cur_state; > struct mhi_event *mhi_event; > @@ -520,8 +521,10 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl) > dev_dbg(dev, "Waiting for all pending threads to complete\n"); > wake_up_all(&mhi_cntrl->state_event); > > - dev_dbg(dev, "Reset all active channels and remove MHI devices\n"); > - device_for_each_child(&mhi_cntrl->mhi_dev->dev, NULL, mhi_destroy_device); > + if (destroy_device) { > + dev_dbg(dev, "Reset all active channels and remove MHI devices\n"); > + device_for_each_child(&mhi_cntrl->mhi_dev->dev, NULL, mhi_destroy_device); > + } > > mutex_lock(&mhi_cntrl->pm_mutex); > > @@ -806,7 +809,10 @@ void mhi_pm_st_worker(struct work_struct *work) > mhi_pm_sys_error_transition(mhi_cntrl); > break; > case DEV_ST_TRANSITION_DISABLE: > - mhi_pm_disable_transition(mhi_cntrl); > + mhi_pm_disable_transition(mhi_cntrl, false); > + break; > + case DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE: > + mhi_pm_disable_transition(mhi_cntrl, true); > break; > default: > break; > @@ -1160,7 +1166,8 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) > } > EXPORT_SYMBOL_GPL(mhi_async_power_up); > > -void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) > +void __mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful, > + bool destroy_device) > { > enum mhi_pm_state cur_state, transition_state; > struct device *dev = &mhi_cntrl->mhi_dev->dev; > @@ -1196,14 +1203,19 @@ void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) > write_unlock_irq(&mhi_cntrl->pm_lock); > mutex_unlock(&mhi_cntrl->pm_mutex); > > - mhi_queue_state_transition(mhi_cntrl, DEV_ST_TRANSITION_DISABLE); > + if (destroy_device) > + mhi_queue_state_transition(mhi_cntrl, > + DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE); > + else > + mhi_queue_state_transition(mhi_cntrl, > + DEV_ST_TRANSITION_DISABLE); > > /* Wait for shutdown to complete */ > flush_work(&mhi_cntrl->st_worker); > > disable_irq(mhi_cntrl->irq[0]); > } > -EXPORT_SYMBOL_GPL(mhi_power_down); > +EXPORT_SYMBOL_GPL(__mhi_power_down); This is a helper, so should not be exported. You should export the API instead. > > int mhi_sync_power_up(struct mhi_controller *mhi_cntrl) > { > diff --git a/include/linux/mhi.h b/include/linux/mhi.h > index d0f9b522f328..ae092bc8b97e 100644 > --- a/include/linux/mhi.h > +++ b/include/linux/mhi.h > @@ -648,12 +648,37 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl); > */ > int mhi_sync_power_up(struct mhi_controller *mhi_cntrl); > > +void __mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful, > + bool destroy_device); This is a helper, so make it static. > + > /** > - * mhi_power_down - Start MHI power down sequence > + * mhi_power_down - Start MHI power down sequence. See also > + * mhi_power_down_no_destroy() which is a variant of this for suspend. suspend/hibernation > + * > * @mhi_cntrl: MHI controller > * @graceful: Link is still accessible, so do a graceful shutdown process > */ > -void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful); > +static inline void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) No, API cannot be static inline. Make it global. > +{ > + __mhi_power_down(mhi_cntrl, graceful, true); > +} > + > +/** > + * mhi_power_down_no_destroy - Start MHI power down sequence but don't > + * destroy struct devices. This is a variant for mhi_power_down() and is a "struct devices for channels" > + * workaround to make it possible to use mhi_power_up() in a resume You should mention that the devices are not destroyed and this would be useful in suspend/hibernation. > + * handler. When using this variant the caller must also call > + * mhi_prepare_all_for_transfer_autoqueue() and mhi_prepare_all_for_transfer*() > + * mhi_unprepare_all_from_transfer(). > + * > + * @mhi_cntrl: MHI controller > + * @graceful: Link is still accessible, so do a graceful shutdown process > + */ > +static inline void mhi_power_down_no_destroy(struct mhi_controller *mhi_cntrl, > + bool graceful) Same as above, make it global. - Mani
On 1/31/2024 2:04 AM, Manivannan Sadhasivam wrote: > On Mon, Nov 27, 2023 at 06:20:15PM +0200, Kalle Valo wrote: >> From: Baochen Qiang <quic_bqiang@quicinc.com> >> >> If ath11k tries to call mhi_power_up() during resume it fails: >> > > This is confusing! Maybe this is what confused me initially. mhi_sync_power_up() > never fails, but ath11k timesout waiting for QMI. You also confirmed the same > [1]. mhi_sync_power_up() creates MHI devices which fails to get probed. So from the view of ath11k it fails, while from the sense of code execuation it succeeds. Will rephrase to avoid confusion. > >> ath11k_pci 0000:06:00.0: timeout while waiting for restart complete >> >> This happens because when calling mhi_power_up() the MHI subsystem eventually >> calls device_add() from mhi_create_devices() but the device creation is >> deferred: >> >> mhi mhi0_IPCR: Driver qcom_mhi_qrtr force probe deferral >> >> The reason for deferring device creation is explained in dpm_prepare(): >> >> /* >> * It is unsafe if probing of devices will happen during suspend or >> * hibernation and system behavior will be unpredictable in this case. >> * So, let's prohibit device's probing here and defer their probes >> * instead. The normal behavior will be restored in dpm_complete(). >> */ >> >> Because the device probe is deferred, the qcom_mhi_qrtr_probe() is not called and >> qcom_mhi_qrtr_dl_callback() fails silently as qdev is zero: >> >> static void qcom_mhi_qrtr_dl_callback(struct mhi_device *mhi_dev, >> struct mhi_result *mhi_res) >> { >> struct qrtr_mhi_dev *qdev = dev_get_drvdata(&mhi_dev->dev); >> int rc; >> >> if (!qdev || mhi_res->transaction_status) >> return; >> >> So what this means that QRTR is not delivering messages and the QMI connection >> is not working between ath11k and the firmware, resulting a failure in firmware >> initialisation. >> >> To fix this add new function mhi_power_down_no_destroy() which does not destroy >> the devices during power down. This way mhi_power_up() can be called during >> resume and we can get ath11k hibernation working with the following patches. >> >> Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30 >> >> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> >> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> >> --- >> drivers/bus/mhi/host/init.c | 1 + >> drivers/bus/mhi/host/internal.h | 1 + >> drivers/bus/mhi/host/pm.c | 26 +++++++++++++++++++------- >> include/linux/mhi.h | 29 +++++++++++++++++++++++++++-- >> 4 files changed, 48 insertions(+), 9 deletions(-) >> >> diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c >> index 65ceac1837f9..e626b03ffafa 100644 >> --- a/drivers/bus/mhi/host/init.c >> +++ b/drivers/bus/mhi/host/init.c >> @@ -43,6 +43,7 @@ const char * const dev_state_tran_str[DEV_ST_TRANSITION_MAX] = { >> [DEV_ST_TRANSITION_FP] = "FLASH PROGRAMMER", >> [DEV_ST_TRANSITION_SYS_ERR] = "SYS ERROR", >> [DEV_ST_TRANSITION_DISABLE] = "DISABLE", >> + [DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE] = "DISABLE (DESTROY DEVICE)", >> }; >> >> const char * const mhi_ch_state_type_str[MHI_CH_STATE_TYPE_MAX] = { >> diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h >> index 30ac415a3000..3f45c9c447bd 100644 >> --- a/drivers/bus/mhi/host/internal.h >> +++ b/drivers/bus/mhi/host/internal.h >> @@ -69,6 +69,7 @@ enum dev_st_transition { >> DEV_ST_TRANSITION_FP, >> DEV_ST_TRANSITION_SYS_ERR, >> DEV_ST_TRANSITION_DISABLE, >> + DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE, >> DEV_ST_TRANSITION_MAX, >> }; >> >> diff --git a/drivers/bus/mhi/host/pm.c b/drivers/bus/mhi/host/pm.c >> index a2f2feef1476..8833b0248393 100644 >> --- a/drivers/bus/mhi/host/pm.c >> +++ b/drivers/bus/mhi/host/pm.c >> @@ -458,7 +458,8 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl) >> } >> >> /* Handle shutdown transitions */ >> -static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl) >> +static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl, >> + bool destroy_device) >> { >> enum mhi_pm_state cur_state; >> struct mhi_event *mhi_event; >> @@ -520,8 +521,10 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl) >> dev_dbg(dev, "Waiting for all pending threads to complete\n"); >> wake_up_all(&mhi_cntrl->state_event); >> >> - dev_dbg(dev, "Reset all active channels and remove MHI devices\n"); >> - device_for_each_child(&mhi_cntrl->mhi_dev->dev, NULL, mhi_destroy_device); >> + if (destroy_device) { >> + dev_dbg(dev, "Reset all active channels and remove MHI devices\n"); >> + device_for_each_child(&mhi_cntrl->mhi_dev->dev, NULL, mhi_destroy_device); >> + } >> >> mutex_lock(&mhi_cntrl->pm_mutex); >> >> @@ -806,7 +809,10 @@ void mhi_pm_st_worker(struct work_struct *work) >> mhi_pm_sys_error_transition(mhi_cntrl); >> break; >> case DEV_ST_TRANSITION_DISABLE: >> - mhi_pm_disable_transition(mhi_cntrl); >> + mhi_pm_disable_transition(mhi_cntrl, false); >> + break; >> + case DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE: >> + mhi_pm_disable_transition(mhi_cntrl, true); >> break; >> default: >> break; >> @@ -1160,7 +1166,8 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) >> } >> EXPORT_SYMBOL_GPL(mhi_async_power_up); >> >> -void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) >> +void __mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful, >> + bool destroy_device) >> { >> enum mhi_pm_state cur_state, transition_state; >> struct device *dev = &mhi_cntrl->mhi_dev->dev; >> @@ -1196,14 +1203,19 @@ void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) >> write_unlock_irq(&mhi_cntrl->pm_lock); >> mutex_unlock(&mhi_cntrl->pm_mutex); >> >> - mhi_queue_state_transition(mhi_cntrl, DEV_ST_TRANSITION_DISABLE); >> + if (destroy_device) >> + mhi_queue_state_transition(mhi_cntrl, >> + DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE); >> + else >> + mhi_queue_state_transition(mhi_cntrl, >> + DEV_ST_TRANSITION_DISABLE); >> >> /* Wait for shutdown to complete */ >> flush_work(&mhi_cntrl->st_worker); >> >> disable_irq(mhi_cntrl->irq[0]); >> } >> -EXPORT_SYMBOL_GPL(mhi_power_down); >> +EXPORT_SYMBOL_GPL(__mhi_power_down); > > This is a helper, so should not be exported. You should export the API instead. > >> >> int mhi_sync_power_up(struct mhi_controller *mhi_cntrl) >> { >> diff --git a/include/linux/mhi.h b/include/linux/mhi.h >> index d0f9b522f328..ae092bc8b97e 100644 >> --- a/include/linux/mhi.h >> +++ b/include/linux/mhi.h >> @@ -648,12 +648,37 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl); >> */ >> int mhi_sync_power_up(struct mhi_controller *mhi_cntrl); >> >> +void __mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful, >> + bool destroy_device); > > This is a helper, so make it static. > >> + >> /** >> - * mhi_power_down - Start MHI power down sequence >> + * mhi_power_down - Start MHI power down sequence. See also >> + * mhi_power_down_no_destroy() which is a variant of this for suspend. > > suspend/hibernation > >> + * >> * @mhi_cntrl: MHI controller >> * @graceful: Link is still accessible, so do a graceful shutdown process >> */ >> -void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful); >> +static inline void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) > > No, API cannot be static inline. Make it global. > >> +{ >> + __mhi_power_down(mhi_cntrl, graceful, true); >> +} >> + >> +/** >> + * mhi_power_down_no_destroy - Start MHI power down sequence but don't >> + * destroy struct devices. This is a variant for mhi_power_down() and is a > > "struct devices for channels" > >> + * workaround to make it possible to use mhi_power_up() in a resume > > You should mention that the devices are not destroyed and this would be useful > in suspend/hibernation. > >> + * handler. When using this variant the caller must also call >> + * mhi_prepare_all_for_transfer_autoqueue() and > > mhi_prepare_all_for_transfer*() > >> + * mhi_unprepare_all_from_transfer(). >> + * >> + * @mhi_cntrl: MHI controller >> + * @graceful: Link is still accessible, so do a graceful shutdown process >> + */ >> +static inline void mhi_power_down_no_destroy(struct mhi_controller *mhi_cntrl, >> + bool graceful) > > Same as above, make it global. > > - Mani >
diff --git a/drivers/bus/mhi/host/init.c b/drivers/bus/mhi/host/init.c index 65ceac1837f9..e626b03ffafa 100644 --- a/drivers/bus/mhi/host/init.c +++ b/drivers/bus/mhi/host/init.c @@ -43,6 +43,7 @@ const char * const dev_state_tran_str[DEV_ST_TRANSITION_MAX] = { [DEV_ST_TRANSITION_FP] = "FLASH PROGRAMMER", [DEV_ST_TRANSITION_SYS_ERR] = "SYS ERROR", [DEV_ST_TRANSITION_DISABLE] = "DISABLE", + [DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE] = "DISABLE (DESTROY DEVICE)", }; const char * const mhi_ch_state_type_str[MHI_CH_STATE_TYPE_MAX] = { diff --git a/drivers/bus/mhi/host/internal.h b/drivers/bus/mhi/host/internal.h index 30ac415a3000..3f45c9c447bd 100644 --- a/drivers/bus/mhi/host/internal.h +++ b/drivers/bus/mhi/host/internal.h @@ -69,6 +69,7 @@ enum dev_st_transition { DEV_ST_TRANSITION_FP, DEV_ST_TRANSITION_SYS_ERR, DEV_ST_TRANSITION_DISABLE, + DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE, DEV_ST_TRANSITION_MAX, }; diff --git a/drivers/bus/mhi/host/pm.c b/drivers/bus/mhi/host/pm.c index a2f2feef1476..8833b0248393 100644 --- a/drivers/bus/mhi/host/pm.c +++ b/drivers/bus/mhi/host/pm.c @@ -458,7 +458,8 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl) } /* Handle shutdown transitions */ -static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl) +static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl, + bool destroy_device) { enum mhi_pm_state cur_state; struct mhi_event *mhi_event; @@ -520,8 +521,10 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl) dev_dbg(dev, "Waiting for all pending threads to complete\n"); wake_up_all(&mhi_cntrl->state_event); - dev_dbg(dev, "Reset all active channels and remove MHI devices\n"); - device_for_each_child(&mhi_cntrl->mhi_dev->dev, NULL, mhi_destroy_device); + if (destroy_device) { + dev_dbg(dev, "Reset all active channels and remove MHI devices\n"); + device_for_each_child(&mhi_cntrl->mhi_dev->dev, NULL, mhi_destroy_device); + } mutex_lock(&mhi_cntrl->pm_mutex); @@ -806,7 +809,10 @@ void mhi_pm_st_worker(struct work_struct *work) mhi_pm_sys_error_transition(mhi_cntrl); break; case DEV_ST_TRANSITION_DISABLE: - mhi_pm_disable_transition(mhi_cntrl); + mhi_pm_disable_transition(mhi_cntrl, false); + break; + case DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE: + mhi_pm_disable_transition(mhi_cntrl, true); break; default: break; @@ -1160,7 +1166,8 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl) } EXPORT_SYMBOL_GPL(mhi_async_power_up); -void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) +void __mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful, + bool destroy_device) { enum mhi_pm_state cur_state, transition_state; struct device *dev = &mhi_cntrl->mhi_dev->dev; @@ -1196,14 +1203,19 @@ void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) write_unlock_irq(&mhi_cntrl->pm_lock); mutex_unlock(&mhi_cntrl->pm_mutex); - mhi_queue_state_transition(mhi_cntrl, DEV_ST_TRANSITION_DISABLE); + if (destroy_device) + mhi_queue_state_transition(mhi_cntrl, + DEV_ST_TRANSITION_DISABLE_DESTROY_DEVICE); + else + mhi_queue_state_transition(mhi_cntrl, + DEV_ST_TRANSITION_DISABLE); /* Wait for shutdown to complete */ flush_work(&mhi_cntrl->st_worker); disable_irq(mhi_cntrl->irq[0]); } -EXPORT_SYMBOL_GPL(mhi_power_down); +EXPORT_SYMBOL_GPL(__mhi_power_down); int mhi_sync_power_up(struct mhi_controller *mhi_cntrl) { diff --git a/include/linux/mhi.h b/include/linux/mhi.h index d0f9b522f328..ae092bc8b97e 100644 --- a/include/linux/mhi.h +++ b/include/linux/mhi.h @@ -648,12 +648,37 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl); */ int mhi_sync_power_up(struct mhi_controller *mhi_cntrl); +void __mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful, + bool destroy_device); + /** - * mhi_power_down - Start MHI power down sequence + * mhi_power_down - Start MHI power down sequence. See also + * mhi_power_down_no_destroy() which is a variant of this for suspend. + * * @mhi_cntrl: MHI controller * @graceful: Link is still accessible, so do a graceful shutdown process */ -void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful); +static inline void mhi_power_down(struct mhi_controller *mhi_cntrl, bool graceful) +{ + __mhi_power_down(mhi_cntrl, graceful, true); +} + +/** + * mhi_power_down_no_destroy - Start MHI power down sequence but don't + * destroy struct devices. This is a variant for mhi_power_down() and is a + * workaround to make it possible to use mhi_power_up() in a resume + * handler. When using this variant the caller must also call + * mhi_prepare_all_for_transfer_autoqueue() and + * mhi_unprepare_all_from_transfer(). + * + * @mhi_cntrl: MHI controller + * @graceful: Link is still accessible, so do a graceful shutdown process + */ +static inline void mhi_power_down_no_destroy(struct mhi_controller *mhi_cntrl, + bool graceful) +{ + __mhi_power_down(mhi_cntrl, graceful, false); +} /** * mhi_unprepare_after_power_down - Free any allocated memory after power down