diff mbox

[v2] clk: Properly update prepare/enable count on orphan clock reparent

Message ID 20180126091845.11902-1-m.szyprowski@samsung.com (mailing list archive)
State New, archived
Headers show

Commit Message

Marek Szyprowski Jan. 26, 2018, 9:18 a.m. UTC
If orphaned clock has been already prepared/enabled (for example if it or
one of its children has CLK_IS_CRITICAL flag), then the prepare/enable
counters of the newly assigned parent are not updated correctly. This
might later cause warnings during changing clock parents.

This patch fixes following warning on Exynos5422-based boards:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 59 at drivers/clk/clk.c:811 clk_core_disable_lock+0x18/0x24
Modules linked in:
CPU: 0 PID: 59 Comm: kworker/0:1 Not tainted 4.15.0-rc7-next-20180115 #106
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Workqueue: pm genpd_power_off_work_fn
[<c0111504>] (unwind_backtrace) from [<c010dbec>] (show_stack+0x10/0x14)
[<c010dbec>] (show_stack) from [<c09b3f34>] (dump_stack+0x90/0xc8)
[<c09b3f34>] (dump_stack) from [<c0125060>] (__warn+0xe4/0x110)
[<c0125060>] (__warn) from [<c01250cc>] (warn_slowpath_null+0x40/0x48)
[<c01250cc>] (warn_slowpath_null) from [<c048d584>] (clk_core_disable_lock+0x18/0x24)
[<c048d584>] (clk_core_disable_lock) from [<c048e490>] (clk_core_disable_unprepare+0xc/0x20)
[<c048e490>] (clk_core_disable_unprepare) from [<c048ea60>] (__clk_set_parent_after+0x48/0x4c)
[<c048ea60>] (__clk_set_parent_after) from [<c048ec88>] (clk_core_set_parent_nolock+0x224/0x5b4)
[<c048ec88>] (clk_core_set_parent_nolock) from [<c048f23c>] (clk_set_parent+0x38/0x6c)
[<c048f23c>] (clk_set_parent) from [<c049c41c>] (exynos_pd_power+0x74/0x1cc)
[<c049c41c>] (exynos_pd_power) from [<c0550768>] (genpd_power_off+0x164/0x264)
[<c0550768>] (genpd_power_off) from [<c0550b34>] (genpd_power_off_work_fn+0x2c/0x40)
[<c0550b34>] (genpd_power_off_work_fn) from [<c01432e4>] (process_one_work+0x1d0/0x7bc)
[<c01432e4>] (process_one_work) from [<c014393c>] (worker_thread+0x34/0x4dc)
[<c014393c>] (worker_thread) from [<c014a0d0>] (kthread+0x128/0x164)
[<c014a0d0>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0xeeb01fb0 to 0xeeb01ff8)
1fa0:                                     00000000 00000000 00000000 00000000
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace 503c239fb760f17a ]---
------------[ cut here ]------------
WARNING: CPU: 0 PID: 59 at drivers/clk/clk.c:684 clk_core_disable_unprepare+0x18/0x20
Modules linked in:
CPU: 0 PID: 59 Comm: kworker/0:1 Tainted: G        W        4.15.0-rc7-next-20180115 #106
Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
Workqueue: pm genpd_power_off_work_fn
[<c0111504>] (unwind_backtrace) from [<c010dbec>] (show_stack+0x10/0x14)
[<c010dbec>] (show_stack) from [<c09b3f34>] (dump_stack+0x90/0xc8)
[<c09b3f34>] (dump_stack) from [<c0125060>] (__warn+0xe4/0x110)
[<c0125060>] (__warn) from [<c01250cc>] (warn_slowpath_null+0x40/0x48)
[<c01250cc>] (warn_slowpath_null) from [<c048e49c>] (clk_core_disable_unprepare+0x18/0x20)
[<c048e49c>] (clk_core_disable_unprepare) from [<c048ea60>] (__clk_set_parent_after+0x48/0x4c)
[<c048ea60>] (__clk_set_parent_after) from [<c048ec88>] (clk_core_set_parent_nolock+0x224/0x5b4)
[<c048ec88>] (clk_core_set_parent_nolock) from [<c048f23c>] (clk_set_parent+0x38/0x6c)
[<c048f23c>] (clk_set_parent) from [<c049c41c>] (exynos_pd_power+0x74/0x1cc)
[<c049c41c>] (exynos_pd_power) from [<c0550768>] (genpd_power_off+0x164/0x264)
[<c0550768>] (genpd_power_off) from [<c0550b34>] (genpd_power_off_work_fn+0x2c/0x40)
[<c0550b34>] (genpd_power_off_work_fn) from [<c01432e4>] (process_one_work+0x1d0/0x7bc)
[<c01432e4>] (process_one_work) from [<c014393c>] (worker_thread+0x34/0x4dc)
[<c014393c>] (worker_thread) from [<c014a0d0>] (kthread+0x128/0x164)
[<c014a0d0>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0xeeb01fb0 to 0xeeb01ff8)
1fa0:                                     00000000 00000000 00000000 00000000
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace 503c239fb760f17b ]---
------------[ cut here ]------------

Fixes: f8f8f1d04494 ("clk: Don't touch hardware when reparenting during registration")
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
v2:
- moved clk_core_prepare(parent) before clk_enable_lock() to avoid calling
  it under a spinlock
---
 drivers/clk/clk.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Stephen Boyd Jan. 27, 2018, 12:49 a.m. UTC | #1
On 01/26, Marek Szyprowski wrote:
> If orphaned clock has been already prepared/enabled (for example if it or
> one of its children has CLK_IS_CRITICAL flag), then the prepare/enable
> counters of the newly assigned parent are not updated correctly. This
> might later cause warnings during changing clock parents.

This doesn't feel right. Perhaps we should delay enabling a clk
if it's CRITICAL until we adopt an orphaned clk. Good news is we
have orphan status tracking now so this should be pretty simple.
Otherwise migrating the count up is complicated and requires us
to call the prepare/enable ops on a critical clk and then keep
doing that each time it gets re-parented. Do you have this case,
where some clk is marked as CRITICAL, and then we need to migrate
that enable/prepare count to the parent?

Hopefully it isn't the worser case, where the clk is handed out
to some consumer but it's still orphaned at that point, and then
we have little control over the migration of state to the parent.
Marek Szyprowski Jan. 30, 2018, 9:01 a.m. UTC | #2
Hi Stephen

On 2018-01-27 01:49, Stephen Boyd wrote:
> On 01/26, Marek Szyprowski wrote:
>> If orphaned clock has been already prepared/enabled (for example if it or
>> one of its children has CLK_IS_CRITICAL flag), then the prepare/enable
>> counters of the newly assigned parent are not updated correctly. This
>> might later cause warnings during changing clock parents.
> This doesn't feel right. Perhaps we should delay enabling a clk
> if it's CRITICAL until we adopt an orphaned clk. Good news is we
> have orphan status tracking now so this should be pretty simple.

Not really. It won't be so simple, because we would need to track the
status of the whole clock sub-tree for enabling critical clocks. Even
then we would need to delay enabling them until the real top clock is
registered.

> Otherwise migrating the count up is complicated and requires us
> to call the prepare/enable ops on a critical clk and then keep
> doing that each time it gets re-parented. Do you have this case,
> where some clk is marked as CRITICAL, and then we need to migrate
> that enable/prepare count to the parent?

Yes, this is the case of Exynos5422-based boards, where we have a
bunch of critical clocks, which are reparented from internal PLL to
external oscillator when respective power domain is suspended.

It took me a while to gather all the needed information from the
debug logs. Here is one example of such clock tree:

fin_pll (clk1, top clock, external pll)
     fout_dpll (clk2)
         mout_sclk_dpll (clk3)
             mout_aclk300_disp1 (clk4)
                 dout_aclk300_disp1 (clk5)
                     mout_sw_aclk300_disp1 (clk6)
                         mout_user_aclk300_disp1 (clk7)
                             aclk300_disp1 (clk8, critical)

Before turning off power domains, aclk300_disp1 is reparented
directly to fin_pll (this is a hardware requirement). When power
domain is turned on again, aclk300_disp1 is reparented back to
mout_user_aclk300_disp1.

Clocks are registered in the following order:
1, 2, 7, 3, 6, 5, 8, 4.

In this case when critical clock is registered, clk4 is not yet
available, so clocks 1-3 won't get proper prepare/enable count.
Then this causes a warning during reparenting clk8 to clk1:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 59 at drivers/clk/clk.c:811 
clk_core_disable_lock+0x18/0x24
  Modules linked in:
  CPU: 0 PID: 59 Comm: kworker/0:1 Not tainted 
4.15.0-next-20180130-dirty #131
  Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
  Workqueue: pm genpd_power_off_work_fn
  [<c01114e4>] (unwind_backtrace) from [<c010dbc0>] (show_stack+0x10/0x14)
  [<c010dbc0>] (show_stack) from [<c09b7b94>] (dump_stack+0x90/0xc8)
  [<c09b7b94>] (dump_stack) from [<c0124ff8>] (__warn+0xe4/0x110)
  [<c0124ff8>] (__warn) from [<c0125064>] (warn_slowpath_null+0x40/0x48)
  [<c0125064>] (warn_slowpath_null) from [<c048e650>] 
(clk_core_disable_lock+0x18/0x24)
  [<c048e650>] (clk_core_disable_lock) from [<c048f51c>] 
(clk_core_disable_unprepare+0xc/0x20)
  [<c048f51c>] (clk_core_disable_unprepare) from [<c048faec>] 
(__clk_set_parent_after+0x48/0x4c)
  [<c048faec>] (__clk_set_parent_after) from [<c048fd28>] 
(clk_core_set_parent_nolock+0x238/0x5d0)
  [<c048fd28>] (clk_core_set_parent_nolock) from [<c04902e4>] 
(clk_set_parent+0x38/0x6c)
  [<c04902e4>] (clk_set_parent) from [<c049d760>] 
(exynos_pd_power+0x74/0x1cc)
  [<c049d760>] (exynos_pd_power) from [<c0552064>] 
(genpd_power_off+0x164/0x264)
  [<c0552064>] (genpd_power_off) from [<c0552430>] 
(genpd_power_off_work_fn+0x2c/0x40)
  [<c0552430>] (genpd_power_off_work_fn) from [<c014355c>] 
(process_one_work+0x1d0/0x7bc)
  [<c014355c>] (process_one_work) from [<c0143bb4>] 
(worker_thread+0x34/0x4dc)
  [<c0143bb4>] (worker_thread) from [<c014a348>] (kthread+0x128/0x164)
  [<c014a348>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
  Exception stack(0xeeb01fb0 to 0xeeb01ff8)
  1fa0:                                     00000000 00000000 00000000 
00000000
  1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
  1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
  ---[ end trace 48eea511a44c78ef ]---
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 59 at drivers/clk/clk.c:684 
clk_core_disable_unprepare+0x18/0x20
  Modules linked in:
  CPU: 0 PID: 59 Comm: kworker/0:1 Tainted: G        W 
4.15.0-next-20180130-dirty #131
  Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
  Workqueue: pm genpd_power_off_work_fn
  [<c01114e4>] (unwind_backtrace) from [<c010dbc0>] (show_stack+0x10/0x14)
  [<c010dbc0>] (show_stack) from [<c09b7b94>] (dump_stack+0x90/0xc8)
  [<c09b7b94>] (dump_stack) from [<c0124ff8>] (__warn+0xe4/0x110)
  [<c0124ff8>] (__warn) from [<c0125064>] (warn_slowpath_null+0x40/0x48)
  [<c0125064>] (warn_slowpath_null) from [<c048f528>] 
(clk_core_disable_unprepare+0x18/0x20)
  [<c048f528>] (clk_core_disable_unprepare) from [<c048faec>] 
(__clk_set_parent_after+0x48/0x4c)
  [<c048faec>] (__clk_set_parent_after) from [<c048fd28>] 
(clk_core_set_parent_nolock+0x238/0x5d0)
  [<c048fd28>] (clk_core_set_parent_nolock) from [<c04902e4>] 
(clk_set_parent+0x38/0x6c)
  [<c04902e4>] (clk_set_parent) from [<c049d760>] 
(exynos_pd_power+0x74/0x1cc)
  [<c049d760>] (exynos_pd_power) from [<c0552064>] 
(genpd_power_off+0x164/0x264)
  [<c0552064>] (genpd_power_off) from [<c0552430>] 
(genpd_power_off_work_fn+0x2c/0x40)
  [<c0552430>] (genpd_power_off_work_fn) from [<c014355c>] 
(process_one_work+0x1d0/0x7bc)
  [<c014355c>] (process_one_work) from [<c0143bb4>] 
(worker_thread+0x34/0x4dc)
  [<c0143bb4>] (worker_thread) from [<c014a348>] (kthread+0x128/0x164)
  [<c014a348>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
  Exception stack(0xeeb01fb0 to 0xeeb01ff8)
  1fa0:                                     00000000 00000000 00000000 
00000000
  1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
  1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
  ---[ end trace 48eea511a44c78f0 ]---

Please note that this pattern happens for other critical clocks too.

> Hopefully it isn't the worser case, where the clk is handed out
> to some consumer but it's still orphaned at that point, and then
> we have little control over the migration of state to the parent.

Well, in my opinion my patch is simplest fix for the regression
introduced by commit f8f8f1d04494 ("clk: Don't touch hardware when
reparenting during registration"). It also doesn't have any side-effects
as it affects only the situation when orphaned clock has been already
prepared/enabled, what in practice means it or one of its children has
critical flag. Reworking critical clock handling will be much more
complicated.

Best regards
Marek Szyprowski Feb. 14, 2018, 1:37 p.m. UTC | #3
Hi Stephen,

On 2018-01-30 10:01, Marek Szyprowski wrote:
> Hi Stephen
>
> On 2018-01-27 01:49, Stephen Boyd wrote:
>> On 01/26, Marek Szyprowski wrote:
>>> If orphaned clock has been already prepared/enabled (for example if 
>>> it or
>>> one of its children has CLK_IS_CRITICAL flag), then the prepare/enable
>>> counters of the newly assigned parent are not updated correctly. This
>>> might later cause warnings during changing clock parents.
>> This doesn't feel right. Perhaps we should delay enabling a clk
>> if it's CRITICAL until we adopt an orphaned clk. Good news is we
>> have orphan status tracking now so this should be pretty simple.
>
> Not really. It won't be so simple, because we would need to track the
> status of the whole clock sub-tree for enabling critical clocks. Even
> then we would need to delay enabling them until the real top clock is
> registered.
>
>> Otherwise migrating the count up is complicated and requires us
>> to call the prepare/enable ops on a critical clk and then keep
>> doing that each time it gets re-parented. Do you have this case,
>> where some clk is marked as CRITICAL, and then we need to migrate
>> that enable/prepare count to the parent?
>
> Yes, this is the case of Exynos5422-based boards, where we have a
> bunch of critical clocks, which are reparented from internal PLL to
> external oscillator when respective power domain is suspended.
>
> It took me a while to gather all the needed information from the
> debug logs. Here is one example of such clock tree:
>
> fin_pll (clk1, top clock, external pll)
>     fout_dpll (clk2)
>         mout_sclk_dpll (clk3)
>             mout_aclk300_disp1 (clk4)
>                 dout_aclk300_disp1 (clk5)
>                     mout_sw_aclk300_disp1 (clk6)
>                         mout_user_aclk300_disp1 (clk7)
>                             aclk300_disp1 (clk8, critical)
>
> Before turning off power domains, aclk300_disp1 is reparented
> directly to fin_pll (this is a hardware requirement). When power
> domain is turned on again, aclk300_disp1 is reparented back to
> mout_user_aclk300_disp1.
>
> Clocks are registered in the following order:
> 1, 2, 7, 3, 6, 5, 8, 4.
>
> In this case when critical clock is registered, clk4 is not yet
> available, so clocks 1-3 won't get proper prepare/enable count.
> Then this causes a warning during reparenting clk8 to clk1:
>
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 59 at drivers/clk/clk.c:811 
> clk_core_disable_lock+0x18/0x24
>  Modules linked in:
>  CPU: 0 PID: 59 Comm: kworker/0:1 Not tainted 
> 4.15.0-next-20180130-dirty #131
>  Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>  Workqueue: pm genpd_power_off_work_fn
>  [<c01114e4>] (unwind_backtrace) from [<c010dbc0>] (show_stack+0x10/0x14)
>  [<c010dbc0>] (show_stack) from [<c09b7b94>] (dump_stack+0x90/0xc8)
>  [<c09b7b94>] (dump_stack) from [<c0124ff8>] (__warn+0xe4/0x110)
>  [<c0124ff8>] (__warn) from [<c0125064>] (warn_slowpath_null+0x40/0x48)
>  [<c0125064>] (warn_slowpath_null) from [<c048e650>] 
> (clk_core_disable_lock+0x18/0x24)
>  [<c048e650>] (clk_core_disable_lock) from [<c048f51c>] 
> (clk_core_disable_unprepare+0xc/0x20)
>  [<c048f51c>] (clk_core_disable_unprepare) from [<c048faec>] 
> (__clk_set_parent_after+0x48/0x4c)
>  [<c048faec>] (__clk_set_parent_after) from [<c048fd28>] 
> (clk_core_set_parent_nolock+0x238/0x5d0)
>  [<c048fd28>] (clk_core_set_parent_nolock) from [<c04902e4>] 
> (clk_set_parent+0x38/0x6c)
>  [<c04902e4>] (clk_set_parent) from [<c049d760>] 
> (exynos_pd_power+0x74/0x1cc)
>  [<c049d760>] (exynos_pd_power) from [<c0552064>] 
> (genpd_power_off+0x164/0x264)
>  [<c0552064>] (genpd_power_off) from [<c0552430>] 
> (genpd_power_off_work_fn+0x2c/0x40)
>  [<c0552430>] (genpd_power_off_work_fn) from [<c014355c>] 
> (process_one_work+0x1d0/0x7bc)
>  [<c014355c>] (process_one_work) from [<c0143bb4>] 
> (worker_thread+0x34/0x4dc)
>  [<c0143bb4>] (worker_thread) from [<c014a348>] (kthread+0x128/0x164)
>  [<c014a348>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
>  Exception stack(0xeeb01fb0 to 0xeeb01ff8)
>  1fa0:                                     00000000 00000000 00000000 
> 00000000
>  1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
> 00000000
>  1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>  ---[ end trace 48eea511a44c78ef ]---
>  ------------[ cut here ]------------
>  WARNING: CPU: 0 PID: 59 at drivers/clk/clk.c:684 
> clk_core_disable_unprepare+0x18/0x20
>  Modules linked in:
>  CPU: 0 PID: 59 Comm: kworker/0:1 Tainted: G        W 
> 4.15.0-next-20180130-dirty #131
>  Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>  Workqueue: pm genpd_power_off_work_fn
>  [<c01114e4>] (unwind_backtrace) from [<c010dbc0>] (show_stack+0x10/0x14)
>  [<c010dbc0>] (show_stack) from [<c09b7b94>] (dump_stack+0x90/0xc8)
>  [<c09b7b94>] (dump_stack) from [<c0124ff8>] (__warn+0xe4/0x110)
>  [<c0124ff8>] (__warn) from [<c0125064>] (warn_slowpath_null+0x40/0x48)
>  [<c0125064>] (warn_slowpath_null) from [<c048f528>] 
> (clk_core_disable_unprepare+0x18/0x20)
>  [<c048f528>] (clk_core_disable_unprepare) from [<c048faec>] 
> (__clk_set_parent_after+0x48/0x4c)
>  [<c048faec>] (__clk_set_parent_after) from [<c048fd28>] 
> (clk_core_set_parent_nolock+0x238/0x5d0)
>  [<c048fd28>] (clk_core_set_parent_nolock) from [<c04902e4>] 
> (clk_set_parent+0x38/0x6c)
>  [<c04902e4>] (clk_set_parent) from [<c049d760>] 
> (exynos_pd_power+0x74/0x1cc)
>  [<c049d760>] (exynos_pd_power) from [<c0552064>] 
> (genpd_power_off+0x164/0x264)
>  [<c0552064>] (genpd_power_off) from [<c0552430>] 
> (genpd_power_off_work_fn+0x2c/0x40)
>  [<c0552430>] (genpd_power_off_work_fn) from [<c014355c>] 
> (process_one_work+0x1d0/0x7bc)
>  [<c014355c>] (process_one_work) from [<c0143bb4>] 
> (worker_thread+0x34/0x4dc)
>  [<c0143bb4>] (worker_thread) from [<c014a348>] (kthread+0x128/0x164)
>  [<c014a348>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
>  Exception stack(0xeeb01fb0 to 0xeeb01ff8)
>  1fa0:                                     00000000 00000000 00000000 
> 00000000
>  1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
> 00000000
>  1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>  ---[ end trace 48eea511a44c78f0 ]---
>
> Please note that this pattern happens for other critical clocks too.
>
>> Hopefully it isn't the worser case, where the clk is handed out
>> to some consumer but it's still orphaned at that point, and then
>> we have little control over the migration of state to the parent.
>
> Well, in my opinion my patch is simplest fix for the regression
> introduced by commit f8f8f1d04494 ("clk: Don't touch hardware when
> reparenting during registration"). It also doesn't have any side-effects
> as it affects only the situation when orphaned clock has been already
> prepared/enabled, what in practice means it or one of its children has
> critical flag. Reworking critical clock handling will be much more
> complicated.

Stephen, do you have any comments?

The issue is there for almost 2 months... v4.16-rc1 is out and its
a good time to get a fix in (this or any other if you propose such).

Best regards
Jerome Brunet Feb. 15, 2018, 11:44 a.m. UTC | #4
On Wed, 2018-02-14 at 14:37 +0100, Marek Szyprowski wrote:
> Hi Stephen,
> 
> On 2018-01-30 10:01, Marek Szyprowski wrote:
> > Hi Stephen
> > 
> > On 2018-01-27 01:49, Stephen Boyd wrote:
> > > On 01/26, Marek Szyprowski wrote:
> > > > If orphaned clock has been already prepared/enabled (for example if 
> > > > it or
> > > > one of its children has CLK_IS_CRITICAL flag), then the prepare/enable
> > > > counters of the newly assigned parent are not updated correctly. This
> > > > might later cause warnings during changing clock parents.
> > > 
> > > This doesn't feel right. Perhaps we should delay enabling a clk
> > > if it's CRITICAL until we adopt an orphaned clk. Good news is we
> > > have orphan status tracking now so this should be pretty simple.
> > 
> > Not really. It won't be so simple, because we would need to track the
> > status of the whole clock sub-tree for enabling critical clocks. Even
> > then we would need to delay enabling them until the real top clock is
> > registered.
> > 
> > > Otherwise migrating the count up is complicated and requires us
> > > to call the prepare/enable ops on a critical clk and then keep
> > > doing that each time it gets re-parented. Do you have this case,
> > > where some clk is marked as CRITICAL, and then we need to migrate
> > > that enable/prepare count to the parent?
> > 
> > Yes, this is the case of Exynos5422-based boards, where we have a
> > bunch of critical clocks, which are reparented from internal PLL to
> > external oscillator when respective power domain is suspended.
> > 
> > It took me a while to gather all the needed information from the
> > debug logs. Here is one example of such clock tree:
> > 
> > fin_pll (clk1, top clock, external pll)
> >     fout_dpll (clk2)
> >         mout_sclk_dpll (clk3)
> >             mout_aclk300_disp1 (clk4)
> >                 dout_aclk300_disp1 (clk5)
> >                     mout_sw_aclk300_disp1 (clk6)
> >                         mout_user_aclk300_disp1 (clk7)
> >                             aclk300_disp1 (clk8, critical)
> > 
> > Before turning off power domains, aclk300_disp1 is reparented
> > directly to fin_pll (this is a hardware requirement). When power
> > domain is turned on again, aclk300_disp1 is reparented back to
> > mout_user_aclk300_disp1.
> > 
> > Clocks are registered in the following order:
> > 1, 2, 7, 3, 6, 5, 8, 4.
> > 
> > In this case when critical clock is registered, clk4 is not yet
> > available, so clocks 1-3 won't get proper prepare/enable count.
> > Then this causes a warning during reparenting clk8 to clk1:
> > 
> >  ------------[ cut here ]------------
> >  WARNING: CPU: 0 PID: 59 at drivers/clk/clk.c:811 
> > clk_core_disable_lock+0x18/0x24
> >  Modules linked in:
> >  CPU: 0 PID: 59 Comm: kworker/0:1 Not tainted 
> > 4.15.0-next-20180130-dirty #131
> >  Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> >  Workqueue: pm genpd_power_off_work_fn
> >  [<c01114e4>] (unwind_backtrace) from [<c010dbc0>] (show_stack+0x10/0x14)
> >  [<c010dbc0>] (show_stack) from [<c09b7b94>] (dump_stack+0x90/0xc8)
> >  [<c09b7b94>] (dump_stack) from [<c0124ff8>] (__warn+0xe4/0x110)
> >  [<c0124ff8>] (__warn) from [<c0125064>] (warn_slowpath_null+0x40/0x48)
> >  [<c0125064>] (warn_slowpath_null) from [<c048e650>] 
> > (clk_core_disable_lock+0x18/0x24)
> >  [<c048e650>] (clk_core_disable_lock) from [<c048f51c>] 
> > (clk_core_disable_unprepare+0xc/0x20)
> >  [<c048f51c>] (clk_core_disable_unprepare) from [<c048faec>] 
> > (__clk_set_parent_after+0x48/0x4c)
> >  [<c048faec>] (__clk_set_parent_after) from [<c048fd28>] 
> > (clk_core_set_parent_nolock+0x238/0x5d0)
> >  [<c048fd28>] (clk_core_set_parent_nolock) from [<c04902e4>] 
> > (clk_set_parent+0x38/0x6c)
> >  [<c04902e4>] (clk_set_parent) from [<c049d760>] 
> > (exynos_pd_power+0x74/0x1cc)
> >  [<c049d760>] (exynos_pd_power) from [<c0552064>] 
> > (genpd_power_off+0x164/0x264)
> >  [<c0552064>] (genpd_power_off) from [<c0552430>] 
> > (genpd_power_off_work_fn+0x2c/0x40)
> >  [<c0552430>] (genpd_power_off_work_fn) from [<c014355c>] 
> > (process_one_work+0x1d0/0x7bc)
> >  [<c014355c>] (process_one_work) from [<c0143bb4>] 
> > (worker_thread+0x34/0x4dc)
> >  [<c0143bb4>] (worker_thread) from [<c014a348>] (kthread+0x128/0x164)
> >  [<c014a348>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> >  Exception stack(0xeeb01fb0 to 0xeeb01ff8)
> >  1fa0:                                     00000000 00000000 00000000 
> > 00000000
> >  1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
> > 00000000
> >  1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> >  ---[ end trace 48eea511a44c78ef ]---
> >  ------------[ cut here ]------------
> >  WARNING: CPU: 0 PID: 59 at drivers/clk/clk.c:684 
> > clk_core_disable_unprepare+0x18/0x20
> >  Modules linked in:
> >  CPU: 0 PID: 59 Comm: kworker/0:1 Tainted: G        W 
> > 4.15.0-next-20180130-dirty #131
> >  Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
> >  Workqueue: pm genpd_power_off_work_fn
> >  [<c01114e4>] (unwind_backtrace) from [<c010dbc0>] (show_stack+0x10/0x14)
> >  [<c010dbc0>] (show_stack) from [<c09b7b94>] (dump_stack+0x90/0xc8)
> >  [<c09b7b94>] (dump_stack) from [<c0124ff8>] (__warn+0xe4/0x110)
> >  [<c0124ff8>] (__warn) from [<c0125064>] (warn_slowpath_null+0x40/0x48)
> >  [<c0125064>] (warn_slowpath_null) from [<c048f528>] 
> > (clk_core_disable_unprepare+0x18/0x20)
> >  [<c048f528>] (clk_core_disable_unprepare) from [<c048faec>] 
> > (__clk_set_parent_after+0x48/0x4c)
> >  [<c048faec>] (__clk_set_parent_after) from [<c048fd28>] 
> > (clk_core_set_parent_nolock+0x238/0x5d0)
> >  [<c048fd28>] (clk_core_set_parent_nolock) from [<c04902e4>] 
> > (clk_set_parent+0x38/0x6c)
> >  [<c04902e4>] (clk_set_parent) from [<c049d760>] 
> > (exynos_pd_power+0x74/0x1cc)
> >  [<c049d760>] (exynos_pd_power) from [<c0552064>] 
> > (genpd_power_off+0x164/0x264)
> >  [<c0552064>] (genpd_power_off) from [<c0552430>] 
> > (genpd_power_off_work_fn+0x2c/0x40)
> >  [<c0552430>] (genpd_power_off_work_fn) from [<c014355c>] 
> > (process_one_work+0x1d0/0x7bc)
> >  [<c014355c>] (process_one_work) from [<c0143bb4>] 
> > (worker_thread+0x34/0x4dc)
> >  [<c0143bb4>] (worker_thread) from [<c014a348>] (kthread+0x128/0x164)
> >  [<c014a348>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
> >  Exception stack(0xeeb01fb0 to 0xeeb01ff8)
> >  1fa0:                                     00000000 00000000 00000000 
> > 00000000
> >  1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
> > 00000000
> >  1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> >  ---[ end trace 48eea511a44c78f0 ]---
> > 
> > Please note that this pattern happens for other critical clocks too.
> > 
> > > Hopefully it isn't the worser case, where the clk is handed out
> > > to some consumer but it's still orphaned at that point, and then
> > > we have little control over the migration of state to the parent.
> > 
> > Well, in my opinion my patch is simplest fix for the regression
> > introduced by commit f8f8f1d04494 ("clk: Don't touch hardware when
> > reparenting during registration"). It also doesn't have any side-effects
> > as it affects only the situation when orphaned clock has been already
> > prepared/enabled, what in practice means it or one of its children has
> > critical flag. Reworking critical clock handling will be much more
> > complicated.
> 
> Stephen, do you have any comments?
> 
> The issue is there for almost 2 months... v4.16-rc1 is out and its
> a good time to get a fix in (this or any other if you propose such).
> 
> Best regards

Hi Marek,

I've sent another patch addressing the same issue (sorry, I did not see your
thread earlier)

While I agree parent adopting orphan need to take care of any existing count,
your solution could be missing a few things, like dealing with
CLK_OPS_PARENT_ENABLE clock types

Fortunately, __clk_set_parent_before() and __clk_set_parent_after() already deal
with those annoying stuff and does count migration well.

Would you mind trying this change [0] and let me know if works for you as well ?

[0] : https://lkml.kernel.org/r/20180214134340.17242-5-jbrunet@baylibre.com

Cheers
Jerome
Jerome Brunet Feb. 15, 2018, 12:19 p.m. UTC | #5
On Fri, 2018-01-26 at 16:49 -0800, Stephen Boyd wrote:
> On 01/26, Marek Szyprowski wrote:
> > If orphaned clock has been already prepared/enabled (for example if it or
> > one of its children has CLK_IS_CRITICAL flag), then the prepare/enable
> > counters of the newly assigned parent are not updated correctly. This
> > might later cause warnings during changing clock parents.
> 
> This doesn't feel right. Perhaps we should delay enabling a clk
> if it's CRITICAL until we adopt an orphaned clk.

It does not sounds right. A critical clock should be enabled as soon as
possible, regardless of current knowledge of the clock tree.

How would you decide when is right time anyway ?
The orphaned critical clock could be reparented to clock that is orphan itself,
you would get into the same count issue again.

> Good news is we
> have orphan status tracking now so this should be pretty simple.
> Otherwise migrating the count up is complicated and requires us
> to call the prepare/enable ops on a critical clk and then keep
> doing that each time it gets re-parented.

What is the problem with this ? We are going to have call those enable ops
anyway. Whether it is done one at a time or all at once, does it really matter ?

> Do you have this case,
> where some clk is marked as CRITICAL, and then we need to migrate
> that enable/prepare count to the parent?

yes.

> 
> Hopefully it isn't the worser case, where the clk is handed out
> to some consumer but it's still orphaned at that point, and then
> we have little control over the migration of state to the parent.
> 

This should not be a problem if the clock reparenting  and count migration is
done properly. To do  this, the __clk_set_parent_before() and
__clk_set_parent_after() function look good to me,  have I missed anything ?
Marek Szyprowski Feb. 15, 2018, 12:56 p.m. UTC | #6
Hi Jerome,

On 2018-02-15 12:44, Jerome Brunet wrote:
> On Wed, 2018-02-14 at 14:37 +0100, Marek Szyprowski wrote:
>> On 2018-01-30 10:01, Marek Szyprowski wrote:
>>> On 2018-01-27 01:49, Stephen Boyd wrote:
>>>> On 01/26, Marek Szyprowski wrote:
>>>>> If orphaned clock has been already prepared/enabled (for example if
>>>>> it or
>>>>> one of its children has CLK_IS_CRITICAL flag), then the prepare/enable
>>>>> counters of the newly assigned parent are not updated correctly. This
>>>>> might later cause warnings during changing clock parents.
>>>> This doesn't feel right. Perhaps we should delay enabling a clk
>>>> if it's CRITICAL until we adopt an orphaned clk. Good news is we
>>>> have orphan status tracking now so this should be pretty simple.
>>> Not really. It won't be so simple, because we would need to track the
>>> status of the whole clock sub-tree for enabling critical clocks. Even
>>> then we would need to delay enabling them until the real top clock is
>>> registered.
>>>
>>>> Otherwise migrating the count up is complicated and requires us
>>>> to call the prepare/enable ops on a critical clk and then keep
>>>> doing that each time it gets re-parented. Do you have this case,
>>>> where some clk is marked as CRITICAL, and then we need to migrate
>>>> that enable/prepare count to the parent?
>>> Yes, this is the case of Exynos5422-based boards, where we have a
>>> bunch of critical clocks, which are reparented from internal PLL to
>>> external oscillator when respective power domain is suspended.
>>>
>>> It took me a while to gather all the needed information from the
>>> debug logs. Here is one example of such clock tree:
>>>
>>> fin_pll (clk1, top clock, external pll)
>>>      fout_dpll (clk2)
>>>          mout_sclk_dpll (clk3)
>>>              mout_aclk300_disp1 (clk4)
>>>                  dout_aclk300_disp1 (clk5)
>>>                      mout_sw_aclk300_disp1 (clk6)
>>>                          mout_user_aclk300_disp1 (clk7)
>>>                              aclk300_disp1 (clk8, critical)
>>>
>>> Before turning off power domains, aclk300_disp1 is reparented
>>> directly to fin_pll (this is a hardware requirement). When power
>>> domain is turned on again, aclk300_disp1 is reparented back to
>>> mout_user_aclk300_disp1.
>>>
>>> Clocks are registered in the following order:
>>> 1, 2, 7, 3, 6, 5, 8, 4.
>>>
>>> In this case when critical clock is registered, clk4 is not yet
>>> available, so clocks 1-3 won't get proper prepare/enable count.
>>> Then this causes a warning during reparenting clk8 to clk1:
>>>
>>>   ------------[ cut here ]------------
>>>   WARNING: CPU: 0 PID: 59 at drivers/clk/clk.c:811
>>> clk_core_disable_lock+0x18/0x24
>>>   Modules linked in:
>>>   CPU: 0 PID: 59 Comm: kworker/0:1 Not tainted
>>> 4.15.0-next-20180130-dirty #131
>>>   Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>>>   Workqueue: pm genpd_power_off_work_fn
>>>   [<c01114e4>] (unwind_backtrace) from [<c010dbc0>] (show_stack+0x10/0x14)
>>>   [<c010dbc0>] (show_stack) from [<c09b7b94>] (dump_stack+0x90/0xc8)
>>>   [<c09b7b94>] (dump_stack) from [<c0124ff8>] (__warn+0xe4/0x110)
>>>   [<c0124ff8>] (__warn) from [<c0125064>] (warn_slowpath_null+0x40/0x48)
>>>   [<c0125064>] (warn_slowpath_null) from [<c048e650>]
>>> (clk_core_disable_lock+0x18/0x24)
>>>   [<c048e650>] (clk_core_disable_lock) from [<c048f51c>]
>>> (clk_core_disable_unprepare+0xc/0x20)
>>>   [<c048f51c>] (clk_core_disable_unprepare) from [<c048faec>]
>>> (__clk_set_parent_after+0x48/0x4c)
>>>   [<c048faec>] (__clk_set_parent_after) from [<c048fd28>]
>>> (clk_core_set_parent_nolock+0x238/0x5d0)
>>>   [<c048fd28>] (clk_core_set_parent_nolock) from [<c04902e4>]
>>> (clk_set_parent+0x38/0x6c)
>>>   [<c04902e4>] (clk_set_parent) from [<c049d760>]
>>> (exynos_pd_power+0x74/0x1cc)
>>>   [<c049d760>] (exynos_pd_power) from [<c0552064>]
>>> (genpd_power_off+0x164/0x264)
>>>   [<c0552064>] (genpd_power_off) from [<c0552430>]
>>> (genpd_power_off_work_fn+0x2c/0x40)
>>>   [<c0552430>] (genpd_power_off_work_fn) from [<c014355c>]
>>> (process_one_work+0x1d0/0x7bc)
>>>   [<c014355c>] (process_one_work) from [<c0143bb4>]
>>> (worker_thread+0x34/0x4dc)
>>>   [<c0143bb4>] (worker_thread) from [<c014a348>] (kthread+0x128/0x164)
>>>   [<c014a348>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
>>>   Exception stack(0xeeb01fb0 to 0xeeb01ff8)
>>>   1fa0:                                     00000000 00000000 00000000
>>> 00000000
>>>   1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> 00000000
>>>   1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>>   ---[ end trace 48eea511a44c78ef ]---
>>>   ------------[ cut here ]------------
>>>   WARNING: CPU: 0 PID: 59 at drivers/clk/clk.c:684
>>> clk_core_disable_unprepare+0x18/0x20
>>>   Modules linked in:
>>>   CPU: 0 PID: 59 Comm: kworker/0:1 Tainted: G        W
>>> 4.15.0-next-20180130-dirty #131
>>>   Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
>>>   Workqueue: pm genpd_power_off_work_fn
>>>   [<c01114e4>] (unwind_backtrace) from [<c010dbc0>] (show_stack+0x10/0x14)
>>>   [<c010dbc0>] (show_stack) from [<c09b7b94>] (dump_stack+0x90/0xc8)
>>>   [<c09b7b94>] (dump_stack) from [<c0124ff8>] (__warn+0xe4/0x110)
>>>   [<c0124ff8>] (__warn) from [<c0125064>] (warn_slowpath_null+0x40/0x48)
>>>   [<c0125064>] (warn_slowpath_null) from [<c048f528>]
>>> (clk_core_disable_unprepare+0x18/0x20)
>>>   [<c048f528>] (clk_core_disable_unprepare) from [<c048faec>]
>>> (__clk_set_parent_after+0x48/0x4c)
>>>   [<c048faec>] (__clk_set_parent_after) from [<c048fd28>]
>>> (clk_core_set_parent_nolock+0x238/0x5d0)
>>>   [<c048fd28>] (clk_core_set_parent_nolock) from [<c04902e4>]
>>> (clk_set_parent+0x38/0x6c)
>>>   [<c04902e4>] (clk_set_parent) from [<c049d760>]
>>> (exynos_pd_power+0x74/0x1cc)
>>>   [<c049d760>] (exynos_pd_power) from [<c0552064>]
>>> (genpd_power_off+0x164/0x264)
>>>   [<c0552064>] (genpd_power_off) from [<c0552430>]
>>> (genpd_power_off_work_fn+0x2c/0x40)
>>>   [<c0552430>] (genpd_power_off_work_fn) from [<c014355c>]
>>> (process_one_work+0x1d0/0x7bc)
>>>   [<c014355c>] (process_one_work) from [<c0143bb4>]
>>> (worker_thread+0x34/0x4dc)
>>>   [<c0143bb4>] (worker_thread) from [<c014a348>] (kthread+0x128/0x164)
>>>   [<c014a348>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
>>>   Exception stack(0xeeb01fb0 to 0xeeb01ff8)
>>>   1fa0:                                     00000000 00000000 00000000
>>> 00000000
>>>   1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> 00000000
>>>   1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>>   ---[ end trace 48eea511a44c78f0 ]---
>>>
>>> Please note that this pattern happens for other critical clocks too.
>>>
>>>> Hopefully it isn't the worser case, where the clk is handed out
>>>> to some consumer but it's still orphaned at that point, and then
>>>> we have little control over the migration of state to the parent.
>>> Well, in my opinion my patch is simplest fix for the regression
>>> introduced by commit f8f8f1d04494 ("clk: Don't touch hardware when
>>> reparenting during registration"). It also doesn't have any side-effects
>>> as it affects only the situation when orphaned clock has been already
>>> prepared/enabled, what in practice means it or one of its children has
>>> critical flag. Reworking critical clock handling will be much more
>>> complicated.
>> Stephen, do you have any comments?
>>
>> The issue is there for almost 2 months... v4.16-rc1 is out and its
>> a good time to get a fix in (this or any other if you propose such).
>>
>> Best regards
> Hi Marek,
>
> I've sent another patch addressing the same issue (sorry, I did not see your
> thread earlier)
>
> While I agree parent adopting orphan need to take care of any existing count,
> your solution could be missing a few things, like dealing with
> CLK_OPS_PARENT_ENABLE clock types
>
> Fortunately, __clk_set_parent_before() and __clk_set_parent_after() already deal
> with those annoying stuff and does count migration well.
>
> Would you mind trying this change [0] and let me know if works for you as well ?
>
> [0] : https://lkml.kernel.org/r/20180214134340.17242-5-jbrunet@baylibre.com

I'm fine with your approach. It fixes the issue observed on 
Exynos5422-based boards.

Best regards
diff mbox

Patch

diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 0f686a9dac3e..a14d64fa6a28 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -2981,7 +2981,11 @@  static int __clk_core_init(struct clk_core *core)
 		 */
 		if (parent) {
 			/* update the clk tree topology */
+			if (orphan->prepare_count)
+				clk_core_prepare(parent);
 			flags = clk_enable_lock();
+			if (orphan->enable_count)
+				clk_core_enable(parent);
 			clk_reparent(orphan, parent);
 			clk_enable_unlock(flags);
 			__clk_recalc_accuracies(orphan);