Message ID | 20250108-mhi_recovery_fix-v1-1-a0a00a17da46@linaro.org (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | bus: mhi: host: pci_generic: Couple of recovery fixes | expand |
On Wed, 8 Jan 2025 at 14:39, Manivannan Sadhasivam via B4 Relay <devnull+manivannan.sadhasivam.linaro.org@kernel.org> wrote: > > From: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > > There are multiple places from where the recovery work gets scheduled > asynchronously. Also, there are multiple places where the caller waits > synchronously for the recovery to be completed. One such place is during > the PM shutdown() callback. > > If the device is not alive during recovery_work, it will try to reset the > device using pci_reset_function(). This function internally will take the > device_lock() first before resetting the device. By this time, if the lock > has already been acquired, then recovery_work will get stalled while > waiting for the lock. And if the lock was already acquired by the caller > which waits for the recovery_work to be completed, it will lead to > deadlock. > > This is what happened on the X1E80100 CRD device when the device died > before shutdown() callback. Driver core calls the driver's shutdown() > callback while holding the device_lock() leading to deadlock. > > And this deadlock scenario can occur on other paths as well, like during > the PM suspend() callback, where the driver core would hold the > device_lock() before calling driver's suspend() callback. And if the > recovery_work was already started, it could lead to deadlock. This is also > observed on the X1E80100 CRD. > > So to fix both issues, use pci_try_reset_function() in recovery_work. This > function first checks for the availability of the device_lock() before > trying to reset the device. If the lock is available, it will acquire it > and reset the device. Otherwise, it will return -EAGAIN. If that happens, > recovery_work will fail with the error message "Recovery failed" as not > much could be done. > > Cc: stable@vger.kernel.org # 5.12 > Reported-by: Johan Hovold <johan@kernel.org> > Closes: https://lore.kernel.org/mhi/Z1me8iaK7cwgjL92@hovoldconsulting.com > Fixes: 7389337f0a78 ("mhi: pci_generic: Add suspend/resume/recovery procedure") > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
diff --git a/drivers/bus/mhi/host/pci_generic.c b/drivers/bus/mhi/host/pci_generic.c index 07645ce2119a..e92df380c785 100644 --- a/drivers/bus/mhi/host/pci_generic.c +++ b/drivers/bus/mhi/host/pci_generic.c @@ -1040,7 +1040,7 @@ static void mhi_pci_recovery_work(struct work_struct *work) err_unprepare: mhi_unprepare_after_power_down(mhi_cntrl); err_try_reset: - if (pci_reset_function(pdev)) + if (pci_try_reset_function(pdev)) dev_err(&pdev->dev, "Recovery failed\n"); }