diff mbox

mmc: dw_mmc-exynos: fix potential external abort in resume_noirq()

Message ID CAPDyKFom7NWyvQ9insmT5AmNvY8ON=wHK6KZHUC+OJx5jMec7A@mail.gmail.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Ulf Hansson June 12, 2018, 9:20 a.m. UTC
Hi Marek,

On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Hi Ulf,
>
> On 2018-06-11 14:24, Ulf Hansson wrote:
>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>> On 2018-06-11 11:35, Ulf Hansson wrote:
>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without
>>>>> ensuring that respective clocks are enabled. This might cause external
>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this
>>>>> by adding needed prepare_enable/disable_unprepare calls.
>>>>>
>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>> ---
>>>>>    drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++
>>>>>    1 file changed, 6 insertions(+)
>>>>>
>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
>>>>> index 3164681108ae..6125b68726b0 100644
>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c
>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c
>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>           struct dw_mci_exynos_priv_data *priv = host->priv;
>>>>>           u32 clksel;
>>>>>
>>>>> +       clk_prepare_enable(host->biu_clk);
>>>>> +       clk_prepare_enable(host->ciu_clk);
>>>>> +
>>>>>           if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>>>>>                   priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>>>>>                   clksel = mci_readl(host, CLKSEL64);
>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>                           mci_writel(host, CLKSEL, clksel);
>>>>>           }
>>>>>
>>>>> +       clk_disable_unprepare(host->biu_clk);
>>>>> +       clk_disable_unprepare(host->ciu_clk);
>>>>> +
>>>>>           return 0;
>>>>>    }
>>>>>    #else
>>>> I looked a little closer and I am wondering if it wouldn't be possible
>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of
>>>> SET_SYSTEM_SLEEP_PM_OPS()?
>>>>
>>>> Somelike this:
>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>>>> dw_mci_exynos_resume_noirq)
>>>>
>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().
>>>>
>>>> I think it would simplify the code a bit, as you can rely on the
>>>> runtime PM callbacks to deal with clk_prepare_enable() and
>>>> clk_disable_unprepare(), unless I am mistaken.
>>> This will not fix the problem, because mci_writel() calls in
>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the
>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling
>>> pm_runtime_force_resume() there is no guarantee that device is in
>>> runtime active state if it was runtime suspended state.
>> Yes, because the runtime PM usage count is greater than 1.
>> (pm_runtime_get_noresume() is called during probe).
>>
>> If you want to make this explicit (not relying on ->probe()), one can
>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in
>> it.
>
> Sorry, but I don't get how this would work. Exactly the same pattern as
> you have proposed was already used in s3c-64xx SPI driver and it didn't
> work properly (tested on the same SoC as this DW-MMC change). I had to
> move register access to runtime resume callback to fix external abort
> issue:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4

Yep, that is a correct solution.

>
> Here in DW-MMC driver such approach (moving all the code to runtime
> resume callback) is not possible because of the potential interrupt storm
> caused by the hw bug (that's the reason of using noirq resume callback).

I understand. What you need is to run the runtime resume/suspend
callbacks in the resume/suspend noirq phase. Moreover, you need to
make sure that the runtime resume callback, really becomes invoked
during the resume noirq phase, because of the HW bug.

I think the below should work. Can you give it a try?

It relies on the call pm_runtime_get_noresume(), done during
->probe(). Note that, the driver always keeps the RPM usage count
increased, thus preventing runtime suspend during normal execution.

Anyway, if this doesn't work, your suggested approach works fine as well.

---
 drivers/mmc/host/dw_mmc-exynos.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

Comments

Marek Szyprowski June 12, 2018, 9:52 a.m. UTC | #1
Hi Ulf,

On 2018-06-12 11:20, Ulf Hansson wrote:
> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> On 2018-06-11 14:24, Ulf Hansson wrote:
>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>> On 2018-06-11 11:35, Ulf Hansson wrote:
>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without
>>>>>> ensuring that respective clocks are enabled. This might cause external
>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this
>>>>>> by adding needed prepare_enable/disable_unprepare calls.
>>>>>>
>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>> ---
>>>>>>     drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++
>>>>>>     1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>> index 3164681108ae..6125b68726b0 100644
>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c
>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>            struct dw_mci_exynos_priv_data *priv = host->priv;
>>>>>>            u32 clksel;
>>>>>>
>>>>>> +       clk_prepare_enable(host->biu_clk);
>>>>>> +       clk_prepare_enable(host->ciu_clk);
>>>>>> +
>>>>>>            if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>>>>>>                    priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>>>>>>                    clksel = mci_readl(host, CLKSEL64);
>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>                            mci_writel(host, CLKSEL, clksel);
>>>>>>            }
>>>>>>
>>>>>> +       clk_disable_unprepare(host->biu_clk);
>>>>>> +       clk_disable_unprepare(host->ciu_clk);
>>>>>> +
>>>>>>            return 0;
>>>>>>     }
>>>>>>     #else
>>>>> I looked a little closer and I am wondering if it wouldn't be possible
>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of
>>>>> SET_SYSTEM_SLEEP_PM_OPS()?
>>>>>
>>>>> Somelike this:
>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>>>>> dw_mci_exynos_resume_noirq)
>>>>>
>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().
>>>>>
>>>>> I think it would simplify the code a bit, as you can rely on the
>>>>> runtime PM callbacks to deal with clk_prepare_enable() and
>>>>> clk_disable_unprepare(), unless I am mistaken.
>>>> This will not fix the problem, because mci_writel() calls in
>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the
>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling
>>>> pm_runtime_force_resume() there is no guarantee that device is in
>>>> runtime active state if it was runtime suspended state.
>>> Yes, because the runtime PM usage count is greater than 1.
>>> (pm_runtime_get_noresume() is called during probe).
>>>
>>> If you want to make this explicit (not relying on ->probe()), one can
>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in
>>> it.
>> Sorry, but I don't get how this would work. Exactly the same pattern as
>> you have proposed was already used in s3c-64xx SPI driver and it didn't
>> work properly (tested on the same SoC as this DW-MMC change). I had to
>> move register access to runtime resume callback to fix external abort
>> issue:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4
> Yep, that is a correct solution.
>
>> Here in DW-MMC driver such approach (moving all the code to runtime
>> resume callback) is not possible because of the potential interrupt storm
>> caused by the hw bug (that's the reason of using noirq resume callback).
> I understand. What you need is to run the runtime resume/suspend
> callbacks in the resume/suspend noirq phase. Moreover, you need to
> make sure that the runtime resume callback, really becomes invoked
> during the resume noirq phase, because of the HW bug.
>
> I think the below should work. Can you give it a try?
>
> It relies on the call pm_runtime_get_noresume(), done during
> ->probe(). Note that, the driver always keeps the RPM usage count
> increased, thus preventing runtime suspend during normal execution.
>
> Anyway, if this doesn't work, your suggested approach works fine as well.

Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device
runtime active all the time between the driver probe() and remove().
Right, this will fix this specific case, but it isn't a generic solution,
so I will also add a comment on that, so one would not need to debug it
again if he decides to change runtime pm usage scheme in dw_mmc-exynos
in the future.

>
> ---
>   drivers/mmc/host/dw_mmc-exynos.c | 15 +++++++--------
>   1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
> index a84aa3f1ae85..66132f7fceed 100644
> --- a/drivers/mmc/host/dw_mmc-exynos.c
> +++ b/drivers/mmc/host/dw_mmc-exynos.c
> @@ -175,7 +175,9 @@ static int dw_mci_exynos_runtime_resume(struct device *dev)
>
>          return ret;
>   }
> +#endif /* CONFIG_PM */
>
> +#ifdef CONFIG_PM_SLEEP
>   /**
>    * dw_mci_exynos_resume_noirq - Exynos-specific resume code
>    *
> @@ -193,6 +195,8 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>          struct dw_mci_exynos_priv_data *priv = host->priv;
>          u32 clksel;
>
> +       pm_runtime_force_resume(dev);
> +
>          if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>                  priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>                  clksel = mci_readl(host, CLKSEL64);
> @@ -209,9 +213,7 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>
>          return 0;
>   }
> -#else
> -#define dw_mci_exynos_resume_noirq     NULL
> -#endif /* CONFIG_PM */
> +#endif /* CONFIG_PM_SLEEP */
>
>   static void dw_mci_exynos_config_hs400(struct dw_mci *host, u32 timing)
>   {
> @@ -553,14 +555,11 @@ static int dw_mci_exynos_remove(struct
> platform_device *pdev)
>   }
>   static const struct dev_pm_ops dw_mci_exynos_pmops = {
> -       SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
> -                               pm_runtime_force_resume)
> +       SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
> +                               dw_mci_exynos_resume_noirq)
>          SET_RUNTIME_PM_OPS(dw_mci_runtime_suspend,
>                             dw_mci_exynos_runtime_resume,
>                             NULL)
> -       .resume_noirq = dw_mci_exynos_resume_noirq,
> -       .thaw_noirq = dw_mci_exynos_resume_noirq,
> -       .restore_noirq = dw_mci_exynos_resume_noirq,
>   };
>
>   static struct platform_driver dw_mci_exynos_pltfm_driver = {

Best regards
Ulf Hansson June 12, 2018, 10:07 a.m. UTC | #2
On 12 June 2018 at 11:52, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Hi Ulf,
>
> On 2018-06-12 11:20, Ulf Hansson wrote:
>> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>> On 2018-06-11 14:24, Ulf Hansson wrote:
>>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>> On 2018-06-11 11:35, Ulf Hansson wrote:
>>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without
>>>>>>> ensuring that respective clocks are enabled. This might cause external
>>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this
>>>>>>> by adding needed prepare_enable/disable_unprepare calls.
>>>>>>>
>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>>> ---
>>>>>>>     drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++
>>>>>>>     1 file changed, 6 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>> index 3164681108ae..6125b68726b0 100644
>>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>            struct dw_mci_exynos_priv_data *priv = host->priv;
>>>>>>>            u32 clksel;
>>>>>>>
>>>>>>> +       clk_prepare_enable(host->biu_clk);
>>>>>>> +       clk_prepare_enable(host->ciu_clk);
>>>>>>> +
>>>>>>>            if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>>>>>>>                    priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>>>>>>>                    clksel = mci_readl(host, CLKSEL64);
>>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>                            mci_writel(host, CLKSEL, clksel);
>>>>>>>            }
>>>>>>>
>>>>>>> +       clk_disable_unprepare(host->biu_clk);
>>>>>>> +       clk_disable_unprepare(host->ciu_clk);
>>>>>>> +
>>>>>>>            return 0;
>>>>>>>     }
>>>>>>>     #else
>>>>>> I looked a little closer and I am wondering if it wouldn't be possible
>>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of
>>>>>> SET_SYSTEM_SLEEP_PM_OPS()?
>>>>>>
>>>>>> Somelike this:
>>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>>>>>> dw_mci_exynos_resume_noirq)
>>>>>>
>>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().
>>>>>>
>>>>>> I think it would simplify the code a bit, as you can rely on the
>>>>>> runtime PM callbacks to deal with clk_prepare_enable() and
>>>>>> clk_disable_unprepare(), unless I am mistaken.
>>>>> This will not fix the problem, because mci_writel() calls in
>>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the
>>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling
>>>>> pm_runtime_force_resume() there is no guarantee that device is in
>>>>> runtime active state if it was runtime suspended state.
>>>> Yes, because the runtime PM usage count is greater than 1.
>>>> (pm_runtime_get_noresume() is called during probe).
>>>>
>>>> If you want to make this explicit (not relying on ->probe()), one can
>>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in
>>>> it.
>>> Sorry, but I don't get how this would work. Exactly the same pattern as
>>> you have proposed was already used in s3c-64xx SPI driver and it didn't
>>> work properly (tested on the same SoC as this DW-MMC change). I had to
>>> move register access to runtime resume callback to fix external abort
>>> issue:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4
>> Yep, that is a correct solution.
>>
>>> Here in DW-MMC driver such approach (moving all the code to runtime
>>> resume callback) is not possible because of the potential interrupt storm
>>> caused by the hw bug (that's the reason of using noirq resume callback).
>> I understand. What you need is to run the runtime resume/suspend
>> callbacks in the resume/suspend noirq phase. Moreover, you need to
>> make sure that the runtime resume callback, really becomes invoked
>> during the resume noirq phase, because of the HW bug.
>>
>> I think the below should work. Can you give it a try?
>>
>> It relies on the call pm_runtime_get_noresume(), done during
>> ->probe(). Note that, the driver always keeps the RPM usage count
>> increased, thus preventing runtime suspend during normal execution.
>>
>> Anyway, if this doesn't work, your suggested approach works fine as well.
>
> Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device
> runtime active all the time between the driver probe() and remove().
> Right, this will fix this specific case, but it isn't a generic solution,
> so I will also add a comment on that, so one would not need to debug it
> again if he decides to change runtime pm usage scheme in dw_mmc-exynos
> in the future.

Seems reasonable!

If you want the more generic solution, I would add a exynos specific
suspend_noirq() callback, let it call pm_runtime_get_noresume() and
them pm_runtime_force_suspend().

In the corresponding resume_noirq() callback, extend my suggested
changes, with a call to pm_runtime_put_noidle() after all actions has
been done in it.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marek Szyprowski June 12, 2018, 10:25 a.m. UTC | #3
Hi Ulf,

On 2018-06-12 12:07, Ulf Hansson wrote:
> On 12 June 2018 at 11:52, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> On 2018-06-12 11:20, Ulf Hansson wrote:
>>> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>> On 2018-06-11 14:24, Ulf Hansson wrote:
>>>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>>> On 2018-06-11 11:35, Ulf Hansson wrote:
>>>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without
>>>>>>>> ensuring that respective clocks are enabled. This might cause external
>>>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this
>>>>>>>> by adding needed prepare_enable/disable_unprepare calls.
>>>>>>>>
>>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>>>> ---
>>>>>>>>      drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++
>>>>>>>>      1 file changed, 6 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>> index 3164681108ae..6125b68726b0 100644
>>>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>>             struct dw_mci_exynos_priv_data *priv = host->priv;
>>>>>>>>             u32 clksel;
>>>>>>>>
>>>>>>>> +       clk_prepare_enable(host->biu_clk);
>>>>>>>> +       clk_prepare_enable(host->ciu_clk);
>>>>>>>> +
>>>>>>>>             if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>>>>>>>>                     priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>>>>>>>>                     clksel = mci_readl(host, CLKSEL64);
>>>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>>                             mci_writel(host, CLKSEL, clksel);
>>>>>>>>             }
>>>>>>>>
>>>>>>>> +       clk_disable_unprepare(host->biu_clk);
>>>>>>>> +       clk_disable_unprepare(host->ciu_clk);
>>>>>>>> +
>>>>>>>>             return 0;
>>>>>>>>      }
>>>>>>>>      #else
>>>>>>> I looked a little closer and I am wondering if it wouldn't be possible
>>>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of
>>>>>>> SET_SYSTEM_SLEEP_PM_OPS()?
>>>>>>>
>>>>>>> Somelike this:
>>>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>>>>>>> dw_mci_exynos_resume_noirq)
>>>>>>>
>>>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().
>>>>>>>
>>>>>>> I think it would simplify the code a bit, as you can rely on the
>>>>>>> runtime PM callbacks to deal with clk_prepare_enable() and
>>>>>>> clk_disable_unprepare(), unless I am mistaken.
>>>>>> This will not fix the problem, because mci_writel() calls in
>>>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the
>>>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling
>>>>>> pm_runtime_force_resume() there is no guarantee that device is in
>>>>>> runtime active state if it was runtime suspended state.
>>>>> Yes, because the runtime PM usage count is greater than 1.
>>>>> (pm_runtime_get_noresume() is called during probe).
>>>>>
>>>>> If you want to make this explicit (not relying on ->probe()), one can
>>>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in
>>>>> it.
>>>> Sorry, but I don't get how this would work. Exactly the same pattern as
>>>> you have proposed was already used in s3c-64xx SPI driver and it didn't
>>>> work properly (tested on the same SoC as this DW-MMC change). I had to
>>>> move register access to runtime resume callback to fix external abort
>>>> issue:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4
>>> Yep, that is a correct solution.
>>>
>>>> Here in DW-MMC driver such approach (moving all the code to runtime
>>>> resume callback) is not possible because of the potential interrupt storm
>>>> caused by the hw bug (that's the reason of using noirq resume callback).
>>> I understand. What you need is to run the runtime resume/suspend
>>> callbacks in the resume/suspend noirq phase. Moreover, you need to
>>> make sure that the runtime resume callback, really becomes invoked
>>> during the resume noirq phase, because of the HW bug.
>>>
>>> I think the below should work. Can you give it a try?
>>>
>>> It relies on the call pm_runtime_get_noresume(), done during
>>> ->probe(). Note that, the driver always keeps the RPM usage count
>>> increased, thus preventing runtime suspend during normal execution.
>>>
>>> Anyway, if this doesn't work, your suggested approach works fine as well.
>> Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device
>> runtime active all the time between the driver probe() and remove().
>> Right, this will fix this specific case, but it isn't a generic solution,
>> so I will also add a comment on that, so one would not need to debug it
>> again if he decides to change runtime pm usage scheme in dw_mmc-exynos
>> in the future.
> Seems reasonable!
>
> If you want the more generic solution, I would add a exynos specific
> suspend_noirq() callback, let it call pm_runtime_get_noresume() and
> them pm_runtime_force_suspend().
>
> In the corresponding resume_noirq() callback, extend my suggested
> changes, with a call to pm_runtime_put_noidle() after all actions has
> been done in it.

Okay, this finally looks like a proper and future-proof solution. Just
one more question: why pm_runtime_put_noidle()? Hypothetically, when the
dw_mmc-exynos driver gets proper runtime PM management and system suspend
will happen when the device is in runtime suspended state, this will leave
it in runtime active state with refcount = 0 after the system resume cycle.
Imho simple pm_runtime_put() will be better in this case.

Best regards
Ulf Hansson June 12, 2018, 10:38 a.m. UTC | #4
On 12 June 2018 at 12:25, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Hi Ulf,
>
> On 2018-06-12 12:07, Ulf Hansson wrote:
>> On 12 June 2018 at 11:52, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>> On 2018-06-12 11:20, Ulf Hansson wrote:
>>>> On 12 June 2018 at 10:28, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>> On 2018-06-11 14:24, Ulf Hansson wrote:
>>>>>> On 11 June 2018 at 11:50, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>>>> On 2018-06-11 11:35, Ulf Hansson wrote:
>>>>>>>> On 11 June 2018 at 08:48, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>>>>>>>>> dw_mci_exynos_resume_noirq() performs DWMMC register access without
>>>>>>>>> ensuring that respective clocks are enabled. This might cause external
>>>>>>>>> abort on some systems (observed on Exynos5433 based boards). Fix this
>>>>>>>>> by adding needed prepare_enable/disable_unprepare calls.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>>>>>>>>> ---
>>>>>>>>>      drivers/mmc/host/dw_mmc-exynos.c | 6 ++++++
>>>>>>>>>      1 file changed, 6 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>>> index 3164681108ae..6125b68726b0 100644
>>>>>>>>> --- a/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>>> +++ b/drivers/mmc/host/dw_mmc-exynos.c
>>>>>>>>> @@ -193,6 +193,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>>>             struct dw_mci_exynos_priv_data *priv = host->priv;
>>>>>>>>>             u32 clksel;
>>>>>>>>>
>>>>>>>>> +       clk_prepare_enable(host->biu_clk);
>>>>>>>>> +       clk_prepare_enable(host->ciu_clk);
>>>>>>>>> +
>>>>>>>>>             if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
>>>>>>>>>                     priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
>>>>>>>>>                     clksel = mci_readl(host, CLKSEL64);
>>>>>>>>> @@ -207,6 +210,9 @@ static int dw_mci_exynos_resume_noirq(struct device *dev)
>>>>>>>>>                             mci_writel(host, CLKSEL, clksel);
>>>>>>>>>             }
>>>>>>>>>
>>>>>>>>> +       clk_disable_unprepare(host->biu_clk);
>>>>>>>>> +       clk_disable_unprepare(host->ciu_clk);
>>>>>>>>> +
>>>>>>>>>             return 0;
>>>>>>>>>      }
>>>>>>>>>      #else
>>>>>>>> I looked a little closer and I am wondering if it wouldn't be possible
>>>>>>>> to use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() instead of
>>>>>>>> SET_SYSTEM_SLEEP_PM_OPS()?
>>>>>>>>
>>>>>>>> Somelike this:
>>>>>>>> SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>>>>>>>> dw_mci_exynos_resume_noirq)
>>>>>>>>
>>>>>>>> Then from dw_mci_exynos_resume_noirq() call pm_runtime_force_resume().
>>>>>>>>
>>>>>>>> I think it would simplify the code a bit, as you can rely on the
>>>>>>>> runtime PM callbacks to deal with clk_prepare_enable() and
>>>>>>>> clk_disable_unprepare(), unless I am mistaken.
>>>>>>> This will not fix the problem, because mci_writel() calls in
>>>>>>> dw_mci_exynos_resume_noirq are done unconditionally, regardless of the
>>>>>>> controller's runtime pm state. Since commit 1d9174fbc55e after calling
>>>>>>> pm_runtime_force_resume() there is no guarantee that device is in
>>>>>>> runtime active state if it was runtime suspended state.
>>>>>> Yes, because the runtime PM usage count is greater than 1.
>>>>>> (pm_runtime_get_noresume() is called during probe).
>>>>>>
>>>>>> If you want to make this explicit (not relying on ->probe()), one can
>>>>>> add a ->suspend_noirq() callback and call pm_runtime_get_noresume() in
>>>>>> it.
>>>>> Sorry, but I don't get how this would work. Exactly the same pattern as
>>>>> you have proposed was already used in s3c-64xx SPI driver and it didn't
>>>>> work properly (tested on the same SoC as this DW-MMC change). I had to
>>>>> move register access to runtime resume callback to fix external abort
>>>>> issue:
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e935dba111621bd6a0c5d48e6511a4d9885103b4
>>>> Yep, that is a correct solution.
>>>>
>>>>> Here in DW-MMC driver such approach (moving all the code to runtime
>>>>> resume callback) is not possible because of the potential interrupt storm
>>>>> caused by the hw bug (that's the reason of using noirq resume callback).
>>>> I understand. What you need is to run the runtime resume/suspend
>>>> callbacks in the resume/suspend noirq phase. Moreover, you need to
>>>> make sure that the runtime resume callback, really becomes invoked
>>>> during the resume noirq phase, because of the HW bug.
>>>>
>>>> I think the below should work. Can you give it a try?
>>>>
>>>> It relies on the call pm_runtime_get_noresume(), done during
>>>> ->probe(). Note that, the driver always keeps the RPM usage count
>>>> increased, thus preventing runtime suspend during normal execution.
>>>>
>>>> Anyway, if this doesn't work, your suggested approach works fine as well.
>>> Okay, finally I got it. I wasn't aware that dw_mmc-exynos keeps device
>>> runtime active all the time between the driver probe() and remove().
>>> Right, this will fix this specific case, but it isn't a generic solution,
>>> so I will also add a comment on that, so one would not need to debug it
>>> again if he decides to change runtime pm usage scheme in dw_mmc-exynos
>>> in the future.
>> Seems reasonable!
>>
>> If you want the more generic solution, I would add a exynos specific
>> suspend_noirq() callback, let it call pm_runtime_get_noresume() and
>> them pm_runtime_force_suspend().
>>
>> In the corresponding resume_noirq() callback, extend my suggested
>> changes, with a call to pm_runtime_put_noidle() after all actions has
>> been done in it.
>
> Okay, this finally looks like a proper and future-proof solution. Just
> one more question: why pm_runtime_put_noidle()? Hypothetically, when the
> dw_mmc-exynos driver gets proper runtime PM management and system suspend
> will happen when the device is in runtime suspended state, this will leave
> it in runtime active state with refcount = 0 after the system resume cycle.
> Imho simple pm_runtime_put() will be better in this case.

It doesn't really matter.

Runtime PM is disabled for the device (done by driver core) and the PM
workqueue is suspended, which means pm_runtime_put() will not
immediately runtime suspend the device.

The important part is to restore the usage count, such that the driver
core can potentially runtime suspend the device when device_complete()
is called for it.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/mmc/host/dw_mmc-exynos.c b/drivers/mmc/host/dw_mmc-exynos.c
index a84aa3f1ae85..66132f7fceed 100644
--- a/drivers/mmc/host/dw_mmc-exynos.c
+++ b/drivers/mmc/host/dw_mmc-exynos.c
@@ -175,7 +175,9 @@  static int dw_mci_exynos_runtime_resume(struct device *dev)

        return ret;
 }
+#endif /* CONFIG_PM */

+#ifdef CONFIG_PM_SLEEP
 /**
  * dw_mci_exynos_resume_noirq - Exynos-specific resume code
  *
@@ -193,6 +195,8 @@  static int dw_mci_exynos_resume_noirq(struct device *dev)
        struct dw_mci_exynos_priv_data *priv = host->priv;
        u32 clksel;

+       pm_runtime_force_resume(dev);
+
        if (priv->ctrl_type == DW_MCI_TYPE_EXYNOS7 ||
                priv->ctrl_type == DW_MCI_TYPE_EXYNOS7_SMU)
                clksel = mci_readl(host, CLKSEL64);
@@ -209,9 +213,7 @@  static int dw_mci_exynos_resume_noirq(struct device *dev)

        return 0;
 }
-#else
-#define dw_mci_exynos_resume_noirq     NULL
-#endif /* CONFIG_PM */
+#endif /* CONFIG_PM_SLEEP */

 static void dw_mci_exynos_config_hs400(struct dw_mci *host, u32 timing)
 {
@@ -553,14 +555,11 @@  static int dw_mci_exynos_remove(struct
platform_device *pdev)
 }
 static const struct dev_pm_ops dw_mci_exynos_pmops = {
-       SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
-                               pm_runtime_force_resume)
+       SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+                               dw_mci_exynos_resume_noirq)
        SET_RUNTIME_PM_OPS(dw_mci_runtime_suspend,
                           dw_mci_exynos_runtime_resume,
                           NULL)
-       .resume_noirq = dw_mci_exynos_resume_noirq,
-       .thaw_noirq = dw_mci_exynos_resume_noirq,
-       .restore_noirq = dw_mci_exynos_resume_noirq,
 };

 static struct platform_driver dw_mci_exynos_pltfm_driver = {