cpuidle: riscv-sbi: Stop using non-retentive suspend

Message ID	20221121205647.23343-1-palmer@rivosinc.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org> Subject: [PATCH] cpuidle: riscv-sbi: Stop using non-retentive suspend Date: Mon, 21 Nov 2022 12:56:47 -0800 Message-Id: <20221121205647.23343-1-palmer@rivosinc.com> MIME-Version: 1.0 Cc: anup@brainfault.org, rafael@kernel.org, daniel.lezcano@linaro.org, Paul Walmsley <paul.walmsley@sifive.com>, Palmer Dabbelt <palmer@dabbelt.com>, aou@eecs.berkeley.edu, linux-pm@vger.kernel.org, linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org, linux@rivosinc.com, Palmer Dabbelt <palmer@rivosinc.com> From: Palmer Dabbelt <palmer@rivosinc.com> To: Conor Dooley <conor@kernel.org> Precedence: list Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-riscv" <linux-riscv-bounces@lists.infradead.org> Errors-To: linux-riscv-bounces+linux-riscv=archiver.kernel.org@lists.infradead.org
Series	cpuidle: riscv-sbi: Stop using non-retentive suspend \| expand cpuidle: riscv-sbi: Stop using non-retentive suspend

Context	Check	Description
conchuod/patch_count	success	Link
conchuod/cover_letter	success	Single patches do not need cover letters
conchuod/tree_selection	success	Guessed tree name to be fixes
conchuod/fixes_present	success	Fixes tag present in non-next series
conchuod/verify_signedoff	success	Signed-off-by tag matches author and committer
conchuod/kdoc	success	Errors and warnings before: 0 this patch: 0
conchuod/module_param	success	Was 0 now: 0
conchuod/build_rv32_defconfig	success	Build OK
conchuod/build_warn_rv64	success	Errors and warnings before: 0 this patch: 0
conchuod/dtb_warn_rv64	success	Errors and warnings before: 0 this patch: 0
conchuod/header_inline	success	No static functions without inline keyword in header files
conchuod/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 17 lines checked
conchuod/source_inline	success	Was 0 now: 0
conchuod/build_rv64_nommu_k210_defconfig	success	Build OK
conchuod/verify_fixes	success	Fixes tag looks correct
conchuod/build_rv64_nommu_virt_defconfig	success	Build OK

Palmer Dabbelt Nov. 21, 2022, 8:56 p.m. UTC

From: Palmer Dabbelt <palmer@rivosinc.com>

As per [1], whether or not the core can wake up from non-retentive
suspend is a platform-specific detail.  We don't have any way to encode
that, so just stop using them until we've sorted that out.

Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

---

This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
Events are stopped during CPU suspend"), which fixes suspend on the D1
but breaks timers everywhere.
---
 drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Conor Dooley Nov. 21, 2022, 9:10 p.m. UTC | #1

On Mon, Nov 21, 2022 at 12:56:47PM -0800, Palmer Dabbelt wrote:
> From: Palmer Dabbelt <palmer@rivosinc.com>
> 
> As per [1], whether or not the core can wake up from non-retentive
> suspend is a platform-specific detail.  We don't have any way to encode
> that, so just stop using them until we've sorted that out.

For anyone playing along at home, Anup had a proposal for encoding this
information (yoinked from the GH issue below):
https://lore.kernel.org/all/20220727114302.302201-1-apatel@ventanamicro.com/> 

> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> 
> ---
> 
> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
> Events are stopped during CPU suspend"), which fixes suspend on the D1
> but breaks timers everywhere.

FWIW the revert is at:
https://lore.kernel.org/linux-riscv/20221023185444.678573-1-conor@kernel.org/

Commit message is probably a little lacking as I didn't understand the
problem when I wrote it. I'll respin the revert with a tidier message
tomorrow.

Acked-by: Conor Dooley <conor.dooley@microchip.com>

> ---
>  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
> index 05fe2902df9a..9d1063a54495 100644
> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
>  	if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
>  	    state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
>  		return false;
> +
> +	/*
> +	 * Whether or not RISC-V systems deliver interrupts to harts in a
> +	 * non-retentive suspend state is a platform-specific detail.  This can
> +	 * leave the hart unable to wake up, so just mark these states as
> +	 * unsupported until we have a mechanism to expose these
> +	 * platform-specific details to Linux.
> +	 */
> +	if (state & SBI_HSM_SUSP_NON_RET_BIT)
> +		return false;
> +
>  	return true;
>  }
>  
> -- 
> 2.38.1
>

Samuel Holland Nov. 22, 2022, 12:45 a.m. UTC | #2

Hi Palmer,

On 11/21/22 14:56, Palmer Dabbelt wrote:
> From: Palmer Dabbelt <palmer@rivosinc.com>
> 
> As per [1], whether or not the core can wake up from non-retentive
> suspend is a platform-specific detail.  We don't have any way to encode
> that, so just stop using them until we've sorted that out.

We do have *exactly* a way to encode that. Specifically, the existence
or non-existence of a non-retentive CPU suspend state in the DT.

If a hart has no way of resuming from non-retentive suspend (i.e. a
complete lack of interrupt delivery in non-retentive suspend), then it
makes zero sense to advertise such a suspend state in the DT. Therefore,
if the state exists in the DT, you can assume there is *some* interrupt
that can wake up the hart. And I would extend that to assume at least
one of those wakeup-capable interrupts is a timer interrupt, although
not necessarily the architectural timer interrupt.

> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687

This comment refers specifically to the architectural timer interrupt,
not interrupts more generally.

> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> 
> ---
> 
> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
> Events are stopped during CPU suspend"), which fixes suspend on the D1
> but breaks timers everywhere.

I understand that reverting 232ccac1bd9b is the easiest way to fix the
issues you and others are seeing. I do not have any functioning RISC-V
hardware with SMP, so it is hard for me to help debug the root issue in
the Linux timer code. I do not know why turning on the C3STOP flag
breaks RCU stall detection or clock_nanosleep(), but I agree that
breakage should not happen.

So while I still think 232ccac1bd9b is the right change to make from a
"following the spec" standpoint, I am okay with reverting it for
pragmatic reasons. Since the D1 has another timer driver that is
currently used in preference to the RISC-V/SBI timer triver, reverting
232ccac1bd9b does not break non-retentive suspend for the D1.

But please do not make the change below. It is unnecessarily broad, and
will break something that works fine right now. If the DT advertises a
CPU suspend state that cannot be woken up from at all, then that is a
bug in the DT. Linux should not try to work around that.

So revert 232ccac1bd9b for now, and we can figure out what to do about
the DT property, but please do not merge this.

Regards,
Samuel

> ---
>  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
> index 05fe2902df9a..9d1063a54495 100644
> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
>  	if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
>  	    state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
>  		return false;
> +
> +	/*
> +	 * Whether or not RISC-V systems deliver interrupts to harts in a
> +	 * non-retentive suspend state is a platform-specific detail.  This can
> +	 * leave the hart unable to wake up, so just mark these states as
> +	 * unsupported until we have a mechanism to expose these
> +	 * platform-specific details to Linux.
> +	 */
> +	if (state & SBI_HSM_SUSP_NON_RET_BIT)
> +		return false;
> +
>  	return true;
>  }
>

Anup Patel Nov. 22, 2022, 3:45 a.m. UTC | #3

On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> From: Palmer Dabbelt <palmer@rivosinc.com>
>
> As per [1], whether or not the core can wake up from non-retentive
> suspend is a platform-specific detail.  We don't have any way to encode
> that, so just stop using them until we've sorted that out.
>
> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

This is just unnecessary maintenance churn and it's not the
right way to go. Better to fix this the right way instead of having
a temporary fix.

I had already sent-out a patch series 5 months back to describe
this in DT:
https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/

No one has commented/suggested anything (except Samuel
Holland and Sudeep Holla).

Please review this series. I can quickly address comments to
make this available for Linux-6.2. Until this series is merged,
the affected platforms can simply remove non-retentive suspend
states from their DT.

With all due respect, NACK to this patch from my side.

Regards,
Anup

>
> ---
>
> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
> Events are stopped during CPU suspend"), which fixes suspend on the D1
> but breaks timers everywhere.
> ---
>  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
> index 05fe2902df9a..9d1063a54495 100644
> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
>         if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
>             state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
>                 return false;
> +
> +       /*
> +        * Whether or not RISC-V systems deliver interrupts to harts in a
> +        * non-retentive suspend state is a platform-specific detail.  This can
> +        * leave the hart unable to wake up, so just mark these states as
> +        * unsupported until we have a mechanism to expose these
> +        * platform-specific details to Linux.
> +        */
> +       if (state & SBI_HSM_SUSP_NON_RET_BIT)
> +               return false;
> +
>         return true;
>  }
>
> --
> 2.38.1
>

Palmer Dabbelt Nov. 22, 2022, 5:16 a.m. UTC | #4

On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote:
> On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>>
>> From: Palmer Dabbelt <palmer@rivosinc.com>
>>
>> As per [1], whether or not the core can wake up from non-retentive
>> suspend is a platform-specific detail.  We don't have any way to encode
>> that, so just stop using them until we've sorted that out.
>>
>> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
>> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
>> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
>
> This is just unnecessary maintenance churn and it's not the
> right way to go. Better to fix this the right way instead of having
> a temporary fix.
>
> I had already sent-out a patch series 5 months back to describe
> this in DT:
> https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/
>
> No one has commented/suggested anything (except Samuel
> Holland and Sudeep Holla).

I see some comments from Krzysztof here 
<https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/> 
as well.  Looks like everyone is pointing out that having our CPU nodes 
encode timers is a bad idea, my guess is that they're probably right.

> Please review this series. I can quickly address comments to
> make this available for Linux-6.2. Until this series is merged,
> the affected platforms can simply remove non-retentive suspend
> states from their DT.

That leaves us with a dependency between kernel versions and DT 
bindings: kernels with the current driver will result in broken systems 
with the non-retentive suspend states in the DT they boot with when 
those states can't wake up the CPU.

> With all due respect, NACK to this patch from my side.
>
> Regards,
> Anup
>
>>
>> ---
>>
>> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
>> Events are stopped during CPU suspend"), which fixes suspend on the D1
>> but breaks timers everywhere.
>> ---
>>  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
>> index 05fe2902df9a..9d1063a54495 100644
>> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
>> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
>> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
>>         if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
>>             state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
>>                 return false;
>> +
>> +       /*
>> +        * Whether or not RISC-V systems deliver interrupts to harts in a
>> +        * non-retentive suspend state is a platform-specific detail.  This can
>> +        * leave the hart unable to wake up, so just mark these states as
>> +        * unsupported until we have a mechanism to expose these
>> +        * platform-specific details to Linux.
>> +        */
>> +       if (state & SBI_HSM_SUSP_NON_RET_BIT)
>> +               return false;
>> +
>>         return true;
>>  }
>>
>> --
>> 2.38.1
>>

Anup Patel Nov. 22, 2022, 5:36 a.m. UTC | #5

On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote:
> > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> >>
> >> From: Palmer Dabbelt <palmer@rivosinc.com>
> >>
> >> As per [1], whether or not the core can wake up from non-retentive
> >> suspend is a platform-specific detail.  We don't have any way to encode
> >> that, so just stop using them until we've sorted that out.
> >>
> >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> >
> > This is just unnecessary maintenance churn and it's not the
> > right way to go. Better to fix this the right way instead of having
> > a temporary fix.
> >
> > I had already sent-out a patch series 5 months back to describe
> > this in DT:
> > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/
> >
> > No one has commented/suggested anything (except Samuel
> > Holland and Sudeep Holla).
>
> I see some comments from Krzysztof here
> <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/>
> as well.  Looks like everyone is pointing out that having our CPU nodes
> encode timers is a bad idea, my guess is that they're probably right.

Adding a separate timer DT node, creates a new set of compatibility
issues for existing platforms. I am fine updating my series to have
separate timer DT node but do we want to go in this direction ?

Even if ARM has a separate timer DT node, the timers are still part
of the CPU. It depends on how we see the DT bindings aligning with
actual HW.

>
> > Please review this series. I can quickly address comments to
> > make this available for Linux-6.2. Until this series is merged,
> > the affected platforms can simply remove non-retentive suspend
> > states from their DT.
>
> That leaves us with a dependency between kernel versions and DT
> bindings: kernels with the current driver will result in broken systems
> with the non-retentive suspend states in the DT they boot with when
> those states can't wake up the CPU.

This is not a new problem we are facing. Even in the ARM world,
the DT bindings grew organically over time based on newer platform
requirements.

Now that we have a platform which does not want the time
C3STOP feature, we need to first come-up with DT bindings
to support this platform instead of temporarily disabling
features which don't work on this platform.

Regards,
Anup

>
> > With all due respect, NACK to this patch from my side.
> >
> > Regards,
> > Anup
> >
> >>
> >> ---
> >>
> >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
> >> Events are stopped during CPU suspend"), which fixes suspend on the D1
> >> but breaks timers everywhere.
> >> ---
> >>  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
> >>  1 file changed, 11 insertions(+)
> >>
> >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
> >> index 05fe2902df9a..9d1063a54495 100644
> >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
> >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
> >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
> >>         if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
> >>             state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
> >>                 return false;
> >> +
> >> +       /*
> >> +        * Whether or not RISC-V systems deliver interrupts to harts in a
> >> +        * non-retentive suspend state is a platform-specific detail.  This can
> >> +        * leave the hart unable to wake up, so just mark these states as
> >> +        * unsupported until we have a mechanism to expose these
> >> +        * platform-specific details to Linux.
> >> +        */
> >> +       if (state & SBI_HSM_SUSP_NON_RET_BIT)
> >> +               return false;
> >> +
> >>         return true;
> >>  }
> >>
> >> --
> >> 2.38.1
> >>

Conor Dooley Nov. 22, 2022, 11:06 a.m. UTC | #6

Hey Samuel,

Thanks a lot for the extra context.

On Mon, Nov 21, 2022 at 06:45:25PM -0600, Samuel Holland wrote:
> Hi Palmer,
> 
> On 11/21/22 14:56, Palmer Dabbelt wrote:
> > From: Palmer Dabbelt <palmer@rivosinc.com>
> > 
> > As per [1], whether or not the core can wake up from non-retentive
> > suspend is a platform-specific detail.  We don't have any way to encode
> > that, so just stop using them until we've sorted that out.
> 
> We do have *exactly* a way to encode that. Specifically, the existence
> or non-existence of a non-retentive CPU suspend state in the DT.
> 
> If a hart has no way of resuming from non-retentive suspend (i.e. a
> complete lack of interrupt delivery in non-retentive suspend), then it
> makes zero sense to advertise such a suspend state in the DT.

I would have to agree with that. I think the sprawling conversation has
confused us all at this point, I (prior to reading this mail) thought
that suspend was borked on the D1. I don't think anyone is advertising
specific states in the DT at the moment though, I had a check in the D1
patchset and didn't see anything there - unless you're adding it
dynamically from the bootloader?

> Therefore,
> if the state exists in the DT, you can assume there is *some* interrupt
> that can wake up the hart. And I would extend that to assume at least
> one of those wakeup-capable interrupts is a timer interrupt, although
> not necessarily the architectural timer interrupt.
> 
> > Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> 
> This comment refers specifically to the architectural timer interrupt,
> not interrupts more generally.
> 
> > Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> > Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> > 
> > ---
> > 
> > This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
> > Events are stopped during CPU suspend"), which fixes suspend on the D1
> > but breaks timers everywhere.
> 
> I understand that reverting 232ccac1bd9b is the easiest way to fix the
> issues you and others are seeing.

I am going to submit another respin of that revert, hopefully with the
extra context from here and elsewhere mixed in.

> I do not have any functioning RISC-V
> hardware with SMP, so it is hard for me to help debug the root issue in
> the Linux timer code. I do not know why turning on the C3STOP flag
> breaks RCU stall detection or clock_nanosleep(), but I agree that
> breakage should not happen.
> 
> So while I still think 232ccac1bd9b is the right change to make from a
> "following the spec" standpoint

Right, so the spec says:
Request the SBI implementation to put the calling hart in a platform
specific suspend (or low power) state specified by the suspend_type
parameter. The hart will automatically come out of suspended state and
resume normal execution when it receives an interrupt or platform
specific hardware event.

That, as we have discussed a bunch of times, does not say whether a
given interrupt should actually arrive during suspend. The correct
behaviour, to me, is to assume that no events arrive during suspend.

We've got a regular old SiFive implementation so I assume (and will go
investigate at some point this week) that the other SiFive {,based}
implementations also have timer events during suspend.

> I am okay with reverting it for
> pragmatic reasons. Since the D1 has another timer driver that is
> currently used in preference to the RISC-V/SBI timer triver,

Once we have got something in place to actually make the determination
we can revert the revert. I'll go give some feedback on Anup's stuff,
I've been meaning to but unfortunately not had the chance to do so yet.

> reverting
> 232ccac1bd9b does not break non-retentive suspend for the D1.

Ah, I did not know this. Probably should have gone looking before I
acked this patch - sorry!
Since that's the case this patch seems un-needed.

> But please do not make the change below. It is unnecessarily broad, and
> will break something that works fine right now. If the DT advertises a
> CPU suspend state that cannot be woken up from at all, then that is a
> bug in the DT. Linux should not try to work around that.

Thanks again Samuel :)

> > ---
> >  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
> > index 05fe2902df9a..9d1063a54495 100644
> > --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
> > +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
> > @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
> >  	if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
> >  	    state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
> >  		return false;
> > +
> > +	/*
> > +	 * Whether or not RISC-V systems deliver interrupts to harts in a
> > +	 * non-retentive suspend state is a platform-specific detail.  This can
> > +	 * leave the hart unable to wake up, so just mark these states as
> > +	 * unsupported until we have a mechanism to expose these
> > +	 * platform-specific details to Linux.
> > +	 */
> > +	if (state & SBI_HSM_SUSP_NON_RET_BIT)
> > +		return false;
> > +
> >  	return true;
> >  }
> >  
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Conor Dooley Nov. 22, 2022, 11:19 a.m. UTC | #7

On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote:
> On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> >
> > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote:
> > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> > >>
> > >> From: Palmer Dabbelt <palmer@rivosinc.com>
> > >>
> > >> As per [1], whether or not the core can wake up from non-retentive
> > >> suspend is a platform-specific detail.  We don't have any way to encode
> > >> that, so just stop using them until we've sorted that out.
> > >>
> > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> > >
> > > This is just unnecessary maintenance churn and it's not the
> > > right way to go. Better to fix this the right way instead of having
> > > a temporary fix.
> > >
> > > I had already sent-out a patch series 5 months back to describe
> > > this in DT:
> > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/
> > >
> > > No one has commented/suggested anything (except Samuel
> > > Holland and Sudeep Holla).
> >
> > I see some comments from Krzysztof here
> > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/>
> > as well.  Looks like everyone is pointing out that having our CPU nodes
> > encode timers is a bad idea, my guess is that they're probably right.
> 
> Adding a separate timer DT node, creates a new set of compatibility
> issues for existing platforms. I am fine updating my series to have
> separate timer DT node but do we want to go in this direction ?

I don't really follow. How is there a compatibility issue created by
adding a new node that is not added for a new property? Both will
require changes to the device tree. (You need not reply here, I am going
to review the other thread, it's been on my todo list for too long. Been
caught up with non-coherent stuff & our sw release cycle..)

> Even if ARM has a separate timer DT node, the timers are still part
> of the CPU. It depends on how we see the DT bindings aligning with
> actual HW.
> 
> >
> > > Please review this series. I can quickly address comments to
> > > make this available for Linux-6.2. Until this series is merged,
> > > the affected platforms can simply remove non-retentive suspend
> > > states from their DT.
> >
> > That leaves us with a dependency between kernel versions and DT
> > bindings: kernels with the current driver will result in broken systems
> > with the non-retentive suspend states in the DT they boot with when
> > those states can't wake up the CPU.

Can someone point me at a (non D1 or virt) system that has suspend
states in the DT that would need fixing?

> This is not a new problem we are facing. Even in the ARM world,
> the DT bindings grew organically over time based on newer platform
> requirements.
> 
> Now that we have a platform which does not want the time
> C3STOP feature, we need to first come-up with DT bindings
> to support this platform instead of temporarily disabling
> features which don't work on this platform.

It's the opposite surely? It should be "now that we have a platform that
*does want* the C3STOP feature", right?

> > > With all due respect, NACK to this patch from my side.

As Samuel pointed out that the D1 doesn't actually use the timer in
question, I think we are okay here?

> > >>
> > >> ---
> > >>
> > >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
> > >> Events are stopped during CPU suspend"), which fixes suspend on the D1
> > >> but breaks timers everywhere.
> > >> ---
> > >>  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
> > >>  1 file changed, 11 insertions(+)
> > >>
> > >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
> > >> index 05fe2902df9a..9d1063a54495 100644
> > >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
> > >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
> > >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
> > >>         if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
> > >>             state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
> > >>                 return false;
> > >> +
> > >> +       /*
> > >> +        * Whether or not RISC-V systems deliver interrupts to harts in a
> > >> +        * non-retentive suspend state is a platform-specific detail.  This can
> > >> +        * leave the hart unable to wake up, so just mark these states as
> > >> +        * unsupported until we have a mechanism to expose these
> > >> +        * platform-specific details to Linux.
> > >> +        */
> > >> +       if (state & SBI_HSM_SUSP_NON_RET_BIT)
> > >> +               return false;
> > >> +
> > >>         return true;
> > >>  }
> > >>
> > >> --
> > >> 2.38.1
> > >>

Palmer Dabbelt Nov. 22, 2022, 3:28 p.m. UTC | #8

On Tue, 22 Nov 2022 03:19:43 PST (-0800), conor.dooley@microchip.com wrote:
> On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote:
>> On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>> >
>> > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote:
>> > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>> > >>
>> > >> From: Palmer Dabbelt <palmer@rivosinc.com>
>> > >>
>> > >> As per [1], whether or not the core can wake up from non-retentive
>> > >> suspend is a platform-specific detail.  We don't have any way to encode
>> > >> that, so just stop using them until we've sorted that out.
>> > >>
>> > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
>> > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
>> > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
>> > >
>> > > This is just unnecessary maintenance churn and it's not the
>> > > right way to go. Better to fix this the right way instead of having
>> > > a temporary fix.
>> > >
>> > > I had already sent-out a patch series 5 months back to describe
>> > > this in DT:
>> > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/
>> > >
>> > > No one has commented/suggested anything (except Samuel
>> > > Holland and Sudeep Holla).
>> >
>> > I see some comments from Krzysztof here
>> > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/>
>> > as well.  Looks like everyone is pointing out that having our CPU nodes
>> > encode timers is a bad idea, my guess is that they're probably right.
>>
>> Adding a separate timer DT node, creates a new set of compatibility
>> issues for existing platforms. I am fine updating my series to have
>> separate timer DT node but do we want to go in this direction ?
>
> I don't really follow. How is there a compatibility issue created by
> adding a new node that is not added for a new property? Both will
> require changes to the device tree. (You need not reply here, I am going
> to review the other thread, it's been on my todo list for too long. Been
> caught up with non-coherent stuff & our sw release cycle..)
>
>> Even if ARM has a separate timer DT node, the timers are still part
>> of the CPU. It depends on how we see the DT bindings aligning with
>> actual HW.
>>
>> >
>> > > Please review this series. I can quickly address comments to
>> > > make this available for Linux-6.2. Until this series is merged,
>> > > the affected platforms can simply remove non-retentive suspend
>> > > states from their DT.
>> >
>> > That leaves us with a dependency between kernel versions and DT
>> > bindings: kernels with the current driver will result in broken systems
>> > with the non-retentive suspend states in the DT they boot with when
>> > those states can't wake up the CPU.
>
> Can someone point me at a (non D1 or virt) system that has suspend
> states in the DT that would need fixing?
>
>> This is not a new problem we are facing. Even in the ARM world,
>> the DT bindings grew organically over time based on newer platform
>> requirements.
>>
>> Now that we have a platform which does not want the time
>> C3STOP feature, we need to first come-up with DT bindings
>> to support this platform instead of temporarily disabling
>> features which don't work on this platform.
>
> It's the opposite surely? It should be "now that we have a platform that
> *does want* the C3STOP feature", right?

IMO we just shouldn't be turning on C3STOP at all.  Sure it's making the 
problem here go away, but it's trying to emulate a bunch of Intel timer 
features we don't have on RISC-V and ending up in a bunch of fallback 
paths.

If we need some workaround in the timer subsystem to sort out the 
non-retentive states then we can sort that out, but my guess is that 
vendors are just going to 3

>> > > With all due respect, NACK to this patch from my side.
>
> As Samuel pointed out that the D1 doesn't actually use the timer in
> question, I think we are okay here?

IIUC it just should use the sunxi timer driver, but that requires some 
priority changes (and presumably breaks 

That said, I guess I'm confused about what's actually broken here.  My 
understanding of the problem is: The D1's firmware disables some 
interrupts during non-retentive suspend, which results in SBI timers 
failing to wake up the core.  The patch to add C3STOP makes the core 
come back from sleep, but that results in a bunch of other timer-related 
issues.

So IMO we should revert the C3STOP patch as it's causing regressions 
(ie, old workloads break in order to make new ones work).  Seems like 
folks mostly agree on that one?

I also think we should stop entering non-retentive suspend until we can 
sort out how reliably wake up from it, as the SBI makes that a 
platform-specific detail.  If the answer there is "non-retentive suspend 
is fine on the D1 as long as we don't use the SBI timers" then that 
seems fine, we just need some way to describe that in Linux -- that 
doesn't fix other platforms and other interrupts, but at least it's a 
start.

Sorry if I've just misunderstood something here?

>
>> > >>
>> > >> ---
>> > >>
>> > >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
>> > >> Events are stopped during CPU suspend"), which fixes suspend on the D1
>> > >> but breaks timers everywhere.
>> > >> ---
>> > >>  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
>> > >>  1 file changed, 11 insertions(+)
>> > >>
>> > >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
>> > >> index 05fe2902df9a..9d1063a54495 100644
>> > >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
>> > >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
>> > >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
>> > >>         if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
>> > >>             state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
>> > >>                 return false;
>> > >> +
>> > >> +       /*
>> > >> +        * Whether or not RISC-V systems deliver interrupts to harts in a
>> > >> +        * non-retentive suspend state is a platform-specific detail.  This can
>> > >> +        * leave the hart unable to wake up, so just mark these states as
>> > >> +        * unsupported until we have a mechanism to expose these
>> > >> +        * platform-specific details to Linux.
>> > >> +        */
>> > >> +       if (state & SBI_HSM_SUSP_NON_RET_BIT)
>> > >> +               return false;
>> > >> +
>> > >>         return true;
>> > >>  }
>> > >>
>> > >> --
>> > >> 2.38.1
>> > >>

Samuel Holland Nov. 23, 2022, 3:42 a.m. UTC | #9

Hi Conor,

On 11/22/22 05:06, Conor Dooley wrote:
> Hey Samuel,
> 
> Thanks a lot for the extra context.
> 
> On Mon, Nov 21, 2022 at 06:45:25PM -0600, Samuel Holland wrote:
>> Hi Palmer,
>>
>> On 11/21/22 14:56, Palmer Dabbelt wrote:
>>> From: Palmer Dabbelt <palmer@rivosinc.com>
>>>
>>> As per [1], whether or not the core can wake up from non-retentive
>>> suspend is a platform-specific detail.  We don't have any way to encode
>>> that, so just stop using them until we've sorted that out.
>>
>> We do have *exactly* a way to encode that. Specifically, the existence
>> or non-existence of a non-retentive CPU suspend state in the DT.
>>
>> If a hart has no way of resuming from non-retentive suspend (i.e. a
>> complete lack of interrupt delivery in non-retentive suspend), then it
>> makes zero sense to advertise such a suspend state in the DT.
> 
> I would have to agree with that. I think the sprawling conversation has
> confused us all at this point, I (prior to reading this mail) thought
> that suspend was borked on the D1. I don't think anyone is advertising
> specific states in the DT at the moment though, I had a check in the D1
> patchset and didn't see anything there - unless you're adding it
> dynamically from the bootloader?

The availability and latency properties of idle states depend on the SBI
implementation, so yes, they need to be added dynamically.

>> Therefore,
>> if the state exists in the DT, you can assume there is *some* interrupt
>> that can wake up the hart. And I would extend that to assume at least
>> one of those wakeup-capable interrupts is a timer interrupt, although
>> not necessarily the architectural timer interrupt.
>>
>>> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
>>
>> This comment refers specifically to the architectural timer interrupt,
>> not interrupts more generally.
>>
>>> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
>>> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
>>>
>>> ---
>>>
>>> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
>>> Events are stopped during CPU suspend"), which fixes suspend on the D1
>>> but breaks timers everywhere.
>>
>> I understand that reverting 232ccac1bd9b is the easiest way to fix the
>> issues you and others are seeing.
> 
> I am going to submit another respin of that revert, hopefully with the
> extra context from here and elsewhere mixed in.
> 
>> I do not have any functioning RISC-V
>> hardware with SMP, so it is hard for me to help debug the root issue in
>> the Linux timer code. I do not know why turning on the C3STOP flag
>> breaks RCU stall detection or clock_nanosleep(), but I agree that
>> breakage should not happen.
>>
>> So while I still think 232ccac1bd9b is the right change to make from a
>> "following the spec" standpoint
> 
> Right, so the spec says:
> Request the SBI implementation to put the calling hart in a platform
> specific suspend (or low power) state specified by the suspend_type
> parameter. The hart will automatically come out of suspended state and
> resume normal execution when it receives an interrupt or platform
> specific hardware event.
> 
> That, as we have discussed a bunch of times, does not say whether a
> given interrupt should actually arrive during suspend. The correct
> behaviour, to me, is to assume that no events arrive during suspend.

Are you suggesting that we need some property to declare the delivery of
each kind of interrupt (software, timer, external, PMU)? I assumed that
external interrupt delivery would be required to consider an idle state
"viable", but I suppose it would be _possible_ to have a state where
only timer interrupts are deliverable.

> We've got a regular old SiFive implementation so I assume (and will go
> investigate at some point this week) that the other SiFive {,based}
> implementations also have timer events during suspend.
> 
>> I am okay with reverting it for
>> pragmatic reasons. Since the D1 has another timer driver that is
>> currently used in preference to the RISC-V/SBI timer triver,
> 
> Once we have got something in place to actually make the determination
> we can revert the revert. I'll go give some feedback on Anup's stuff,
> I've been meaning to but unfortunately not had the chance to do so yet.

Thanks for following up on this.

Regards,
Samuel

>> reverting
>> 232ccac1bd9b does not break non-retentive suspend for the D1.
> 
> Ah, I did not know this. Probably should have gone looking before I
> acked this patch - sorry!
> Since that's the case this patch seems un-needed.
> 
>> But please do not make the change below. It is unnecessarily broad, and
>> will break something that works fine right now. If the DT advertises a
>> CPU suspend state that cannot be woken up from at all, then that is a
>> bug in the DT. Linux should not try to work around that.
> 
> Thanks again Samuel :)
> 
>>> ---
>>>  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
>>> index 05fe2902df9a..9d1063a54495 100644
>>> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
>>> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
>>> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
>>>  	if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
>>>  	    state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
>>>  		return false;
>>> +
>>> +	/*
>>> +	 * Whether or not RISC-V systems deliver interrupts to harts in a
>>> +	 * non-retentive suspend state is a platform-specific detail.  This can
>>> +	 * leave the hart unable to wake up, so just mark these states as
>>> +	 * unsupported until we have a mechanism to expose these
>>> +	 * platform-specific details to Linux.
>>> +	 */
>>> +	if (state & SBI_HSM_SUSP_NON_RET_BIT)
>>> +		return false;
>>> +
>>>  	return true;
>>>  }
>>>  
>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv

Anup Patel Nov. 23, 2022, 4:26 a.m. UTC | #10

On Tue, Nov 22, 2022 at 4:50 PM Conor Dooley <conor.dooley@microchip.com> wrote:
>
> On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote:
> > On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> > >
> > > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote:
> > > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> > > >>
> > > >> From: Palmer Dabbelt <palmer@rivosinc.com>
> > > >>
> > > >> As per [1], whether or not the core can wake up from non-retentive
> > > >> suspend is a platform-specific detail.  We don't have any way to encode
> > > >> that, so just stop using them until we've sorted that out.
> > > >>
> > > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> > > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> > > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> > > >
> > > > This is just unnecessary maintenance churn and it's not the
> > > > right way to go. Better to fix this the right way instead of having
> > > > a temporary fix.
> > > >
> > > > I had already sent-out a patch series 5 months back to describe
> > > > this in DT:
> > > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/
> > > >
> > > > No one has commented/suggested anything (except Samuel
> > > > Holland and Sudeep Holla).
> > >
> > > I see some comments from Krzysztof here
> > > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/>
> > > as well.  Looks like everyone is pointing out that having our CPU nodes
> > > encode timers is a bad idea, my guess is that they're probably right.
> >
> > Adding a separate timer DT node, creates a new set of compatibility
> > issues for existing platforms. I am fine updating my series to have
> > separate timer DT node but do we want to go in this direction ?
>
> I don't really follow. How is there a compatibility issue created by
> adding a new node that is not added for a new property? Both will
> require changes to the device tree. (You need not reply here, I am going
> to review the other thread, it's been on my todo list for too long. Been
> caught up with non-coherent stuff & our sw release cycle..)

Adding a new timer DT node would mean, the RISC-V timer driver
will now be probed using the compatible to the new DT node whereas
the RISC-V timer driver is currently probed using CPU DT nodes.

>
> > Even if ARM has a separate timer DT node, the timers are still part
> > of the CPU. It depends on how we see the DT bindings aligning with
> > actual HW.
> >
> > >
> > > > Please review this series. I can quickly address comments to
> > > > make this available for Linux-6.2. Until this series is merged,
> > > > the affected platforms can simply remove non-retentive suspend
> > > > states from their DT.
> > >
> > > That leaves us with a dependency between kernel versions and DT
> > > bindings: kernels with the current driver will result in broken systems
> > > with the non-retentive suspend states in the DT they boot with when
> > > those states can't wake up the CPU.
>
> Can someone point me at a (non D1 or virt) system that has suspend
> states in the DT that would need fixing?

For the QEMU virt machine, the default non-retentive suspend state was
tested using a temporary DTB provided separately via QEMU command
line. The QEMU virt machine does not have its own HART suspend
states so OpenSBI will functionally emulate default retentive/non-retentive
suspend states.

>
> > This is not a new problem we are facing. Even in the ARM world,
> > the DT bindings grew organically over time based on newer platform
> > requirements.
> >
> > Now that we have a platform which does not want the time
> > C3STOP feature, we need to first come-up with DT bindings
> > to support this platform instead of temporarily disabling
> > features which don't work on this platform.
>
> It's the opposite surely? It should be "now that we have a platform that
> *does want* the C3STOP feature", right?

Yes, we can think this way as well.

>
> > > > With all due respect, NACK to this patch from my side.
>
> As Samuel pointed out that the D1 doesn't actually use the timer in
> question, I think we are okay here?

Yes, that's why D1 needs the C3STOP flag.

>
> > > >>
> > > >> ---
> > > >>
> > > >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
> > > >> Events are stopped during CPU suspend"), which fixes suspend on the D1
> > > >> but breaks timers everywhere.
> > > >> ---
> > > >>  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
> > > >>  1 file changed, 11 insertions(+)
> > > >>
> > > >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
> > > >> index 05fe2902df9a..9d1063a54495 100644
> > > >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
> > > >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
> > > >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
> > > >>         if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
> > > >>             state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
> > > >>                 return false;
> > > >> +
> > > >> +       /*
> > > >> +        * Whether or not RISC-V systems deliver interrupts to harts in a
> > > >> +        * non-retentive suspend state is a platform-specific detail.  This can
> > > >> +        * leave the hart unable to wake up, so just mark these states as
> > > >> +        * unsupported until we have a mechanism to expose these
> > > >> +        * platform-specific details to Linux.
> > > >> +        */
> > > >> +       if (state & SBI_HSM_SUSP_NON_RET_BIT)
> > > >> +               return false;
> > > >> +
> > > >>         return true;
> > > >>  }
> > > >>
> > > >> --
> > > >> 2.38.1
> > > >>

Regards,
Anup

Anup Patel Nov. 23, 2022, 4:38 a.m. UTC | #11

On Tue, Nov 22, 2022 at 8:58 PM Palmer Dabbelt <palmer@rivosinc.com> wrote:
>
> On Tue, 22 Nov 2022 03:19:43 PST (-0800), conor.dooley@microchip.com wrote:
> > On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote:
> >> On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> >> >
> >> > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote:
> >> > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> >> > >>
> >> > >> From: Palmer Dabbelt <palmer@rivosinc.com>
> >> > >>
> >> > >> As per [1], whether or not the core can wake up from non-retentive
> >> > >> suspend is a platform-specific detail.  We don't have any way to encode
> >> > >> that, so just stop using them until we've sorted that out.
> >> > >>
> >> > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> >> > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> >> > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> >> > >
> >> > > This is just unnecessary maintenance churn and it's not the
> >> > > right way to go. Better to fix this the right way instead of having
> >> > > a temporary fix.
> >> > >
> >> > > I had already sent-out a patch series 5 months back to describe
> >> > > this in DT:
> >> > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/
> >> > >
> >> > > No one has commented/suggested anything (except Samuel
> >> > > Holland and Sudeep Holla).
> >> >
> >> > I see some comments from Krzysztof here
> >> > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/>
> >> > as well.  Looks like everyone is pointing out that having our CPU nodes
> >> > encode timers is a bad idea, my guess is that they're probably right.
> >>
> >> Adding a separate timer DT node, creates a new set of compatibility
> >> issues for existing platforms. I am fine updating my series to have
> >> separate timer DT node but do we want to go in this direction ?
> >
> > I don't really follow. How is there a compatibility issue created by
> > adding a new node that is not added for a new property? Both will
> > require changes to the device tree. (You need not reply here, I am going
> > to review the other thread, it's been on my todo list for too long. Been
> > caught up with non-coherent stuff & our sw release cycle..)
> >
> >> Even if ARM has a separate timer DT node, the timers are still part
> >> of the CPU. It depends on how we see the DT bindings aligning with
> >> actual HW.
> >>
> >> >
> >> > > Please review this series. I can quickly address comments to
> >> > > make this available for Linux-6.2. Until this series is merged,
> >> > > the affected platforms can simply remove non-retentive suspend
> >> > > states from their DT.
> >> >
> >> > That leaves us with a dependency between kernel versions and DT
> >> > bindings: kernels with the current driver will result in broken systems
> >> > with the non-retentive suspend states in the DT they boot with when
> >> > those states can't wake up the CPU.
> >
> > Can someone point me at a (non D1 or virt) system that has suspend
> > states in the DT that would need fixing?
> >
> >> This is not a new problem we are facing. Even in the ARM world,
> >> the DT bindings grew organically over time based on newer platform
> >> requirements.
> >>
> >> Now that we have a platform which does not want the time
> >> C3STOP feature, we need to first come-up with DT bindings
> >> to support this platform instead of temporarily disabling
> >> features which don't work on this platform.
> >
> > It's the opposite surely? It should be "now that we have a platform that
> > *does want* the C3STOP feature", right?
>
> IMO we just shouldn't be turning on C3STOP at all.  Sure it's making the
> problem here go away, but it's trying to emulate a bunch of Intel timer
> features we don't have on RISC-V and ending up in a bunch of fallback
> paths.
>
> If we need some workaround in the timer subsystem to sort out the
> non-retentive states then we can sort that out, but my guess is that
> vendors are just going to 3

Actually, it will be better if we set C3STOP in the RISC-V timer driver
only for D1 (and future platforms that might need it). We can easily
do this based on the D1 compatible string from the root DT node.

>
> >> > > With all due respect, NACK to this patch from my side.
> >
> > As Samuel pointed out that the D1 doesn't actually use the timer in
> > question, I think we are okay here?
>
> IIUC it just should use the sunxi timer driver, but that requires some
> priority changes (and presumably breaks
>
> That said, I guess I'm confused about what's actually broken here.  My
> understanding of the problem is: The D1's firmware disables some
> interrupts during non-retentive suspend, which results in SBI timers
> failing to wake up the core.  The patch to add C3STOP makes the core
> come back from sleep, but that results in a bunch of other timer-related
> issues.
>
> So IMO we should revert the C3STOP patch as it's causing regressions
> (ie, old workloads break in order to make new ones work).  Seems like
> folks mostly agree on that one?

Yes, I agree.

We can go a step further and add quirks in the RISC-V timer driver which
only sets C3STOP for certain platforms (e.g. D1).

>
> I also think we should stop entering non-retentive suspend until we can
> sort out how reliably wake up from it, as the SBI makes that a
> platform-specific detail.  If the answer there is "non-retentive suspend
> is fine on the D1 as long as we don't use the SBI timers" then that
> seems fine, we just need some way to describe that in Linux -- that
> doesn't fix other platforms and other interrupts, but at least it's a
> start.
>
> Sorry if I've just misunderstood something here?

Rather than "stop entering non-retentive suspend", we should improve
the relevant drivers.

>
> >
> >> > >>
> >> > >> ---
> >> > >>
> >> > >> This should allow us to revert 232ccac1bd9b ("clocksource/drivers/riscv:
> >> > >> Events are stopped during CPU suspend"), which fixes suspend on the D1
> >> > >> but breaks timers everywhere.
> >> > >> ---
> >> > >>  drivers/cpuidle/cpuidle-riscv-sbi.c | 11 +++++++++++
> >> > >>  1 file changed, 11 insertions(+)
> >> > >>
> >> > >> diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
> >> > >> index 05fe2902df9a..9d1063a54495 100644
> >> > >> --- a/drivers/cpuidle/cpuidle-riscv-sbi.c
> >> > >> +++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
> >> > >> @@ -214,6 +214,17 @@ static bool sbi_suspend_state_is_valid(u32 state)
> >> > >>         if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT &&
> >> > >>             state < SBI_HSM_SUSPEND_NON_RET_PLATFORM)
> >> > >>                 return false;
> >> > >> +
> >> > >> +       /*
> >> > >> +        * Whether or not RISC-V systems deliver interrupts to harts in a
> >> > >> +        * non-retentive suspend state is a platform-specific detail.  This can
> >> > >> +        * leave the hart unable to wake up, so just mark these states as
> >> > >> +        * unsupported until we have a mechanism to expose these
> >> > >> +        * platform-specific details to Linux.
> >> > >> +        */
> >> > >> +       if (state & SBI_HSM_SUSP_NON_RET_BIT)
> >> > >> +               return false;
> >> > >> +
> >> > >>         return true;
> >> > >>  }
> >> > >>
> >> > >> --
> >> > >> 2.38.1
> >> > >>

Regards,
Anup

Samuel Holland Nov. 23, 2022, 5:11 a.m. UTC | #12

Hi Palmer,

On 11/22/22 09:28, Palmer Dabbelt wrote:
> On Tue, 22 Nov 2022 03:19:43 PST (-0800), conor.dooley@microchip.com wrote:
>> On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote:
>>> On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com>
>>> wrote:
>>> >
>>> > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote:
>>> > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt
>>> <palmer@rivosinc.com> wrote:
>>> > >>
>>> > >> From: Palmer Dabbelt <palmer@rivosinc.com>
>>> > >>
>>> > >> As per [1], whether or not the core can wake up from non-retentive
>>> > >> suspend is a platform-specific detail.  We don't have any way to
>>> encode
>>> > >> that, so just stop using them until we've sorted that out.
>>> > >>
>>> > >> Link:
>>> https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
>>> > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
>>> > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
>>> > >
>>> > > This is just unnecessary maintenance churn and it's not the
>>> > > right way to go. Better to fix this the right way instead of having
>>> > > a temporary fix.
>>> > >
>>> > > I had already sent-out a patch series 5 months back to describe
>>> > > this in DT:
>>> > >
>>> https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/
>>> > >
>>> > > No one has commented/suggested anything (except Samuel
>>> > > Holland and Sudeep Holla).
>>> >
>>> > I see some comments from Krzysztof here
>>> >
>>> <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/>
>>> > as well.  Looks like everyone is pointing out that having our CPU
>>> nodes
>>> > encode timers is a bad idea, my guess is that they're probably right.
>>>
>>> Adding a separate timer DT node, creates a new set of compatibility
>>> issues for existing platforms. I am fine updating my series to have
>>> separate timer DT node but do we want to go in this direction ?
>>
>> I don't really follow. How is there a compatibility issue created by
>> adding a new node that is not added for a new property? Both will
>> require changes to the device tree. (You need not reply here, I am going
>> to review the other thread, it's been on my todo list for too long. Been
>> caught up with non-coherent stuff & our sw release cycle..)
>>
>>> Even if ARM has a separate timer DT node, the timers are still part
>>> of the CPU. It depends on how we see the DT bindings aligning with
>>> actual HW.
>>>
>>> >
>>> > > Please review this series. I can quickly address comments to
>>> > > make this available for Linux-6.2. Until this series is merged,
>>> > > the affected platforms can simply remove non-retentive suspend
>>> > > states from their DT.
>>> >
>>> > That leaves us with a dependency between kernel versions and DT
>>> > bindings: kernels with the current driver will result in broken
>>> systems
>>> > with the non-retentive suspend states in the DT they boot with when
>>> > those states can't wake up the CPU.
>>
>> Can someone point me at a (non D1 or virt) system that has suspend
>> states in the DT that would need fixing?
>>
>>> This is not a new problem we are facing. Even in the ARM world,
>>> the DT bindings grew organically over time based on newer platform
>>> requirements.
>>>
>>> Now that we have a platform which does not want the time
>>> C3STOP feature, we need to first come-up with DT bindings
>>> to support this platform instead of temporarily disabling
>>> features which don't work on this platform.
>>
>> It's the opposite surely? It should be "now that we have a platform that
>> *does want* the C3STOP feature", right?
> 
> IMO we just shouldn't be turning on C3STOP at all.  Sure it's making the
> problem here go away, but it's trying to emulate a bunch of Intel timer
> features we don't have on RISC-V and ending up in a bunch of fallback
> paths.

While the comment in include/linux/clockchips.h and the name of the flag
imply that C3STOP is Intel-specific, it really isn't. The flag is used
on ARM, MIPS, and PowerPC as well.

However we do it, the end goal here is making tick_broadcast_enter()
return nonzero (when called from inside cpuidle_enter_state()), thus
inhibiting Linux from using idle states with the "local-timer-stop"
property set in the DT.

Looking down inside tick_broadcast_oneshot_control(), it appears
CLOCK_EVT_FEAT_C3STOP really is required to make this happen.

What are you referring to by "fallback paths"?

We could add another CLOCK_EVT_FEAT_??? flag, but how should it differ
from CLOCK_EVT_FEAT_C3STOP?

> If we need some workaround in the timer subsystem to sort out the
> non-retentive states then we can sort that out, but my guess is that
> vendors are just going to 3

(your message got cut off here)

>>> > > With all due respect, NACK to this patch from my side.
>>
>> As Samuel pointed out that the D1 doesn't actually use the timer in
>> question, I think we are okay here?
> 
> IIUC it just should use the sunxi timer driver, but that requires some
> priority changes (and presumably breaks

(and here too)

D1 currently uses sun4i_tick (rating 350) over riscv_timer_clockevent
(rating 100). The ratings are fine as is.

> That said, I guess I'm confused about what's actually broken here.  My
> understanding of the problem is: The D1's firmware disables some
> interrupts during non-retentive suspend, which results in SBI timers
> failing to wake up the core.

The D1's hardware cannot deliver the RISC-V architectural timer
interrupt while the C906 core is powered off. It cannot deliver the
RISC-V architectural external interrupt either, but the SoC provides a
side channel for all of the PLIC inputs, so they _can_ wake up the CPU.
So the net result is that the CLINT is the only peripheral unable to
wake the CPU.

> The patch to add C3STOP makes the core
> come back from sleep, but that results in a bunch of other timer-related
> issues.

No, C3STOP _inhibits_ Linux from entering idle states that both:
 1) rely on that clockevent device to wake the local CPU, and
 2) cause that clockevent device to stop working, as signified by the
    CPUIDLE_FLAG_TIMER_STOP flag, which is set by the local-timer-stop
    property in the idle state DT node.

That means entering the idle state is allowed if Linux can solve that
first condition by finding another clockevent device on some other CPU
to wake the local CPU with an IPI. That is the purpose of the broadcast
timer mechanism.

In the case of the D1, since it is single core, there is no other CPU to
broadcast a timer event. So if riscv_timer_clockevent is the only
clockevent device, then tick_broadcast_enter() should return nonzero,
and find_deepest_state() will pick a retentive idle state instead.

This is the already-existing mechanism for only entering idle states we
can reliably wake up from. :)

> So IMO we should revert the C3STOP patch as it's causing regressions

I agree that clearly something is going wrong in the Linux code to cause
problems on SMP systems like PolarFire. I do not really understand the
broadcast timer internals, but what I do know is:
 1) By setting CLOCK_EVT_FEAT_C3STOP, we inhibit the RISC-V timer from
    being used as the broadcast timer (tick_check_broadcast_device()),
    and IIUC fall back to kernel/time/tick-broadcast-hrtimer.c. Maybe
    something goes wrong there?
 2) The broadcast timer wakes up CPUs via IPIs. Maybe something goes
    wrong with IPI delivery? (This raises the question of if we need
    another DT property for receiving IPIs in non-retentive suspend.)

But neither of these should affect behavior when not using broadcast timers.

> (ie, old workloads break in order to make new ones work).  Seems like
> folks mostly agree on that one?

Well, for the D1 specifically, there is no new workload that the C3STOP
patch makes work. Non-retentive suspend worked there already, as I have
explained. The patch was always about being compliant to the spec.

> I also think we should stop entering non-retentive suspend until we can
> sort out how reliably wake up from it, as the SBI makes that a
> platform-specific detail.  If the answer there is "non-retentive suspend
> is fine on the D1 as long as we don't use the SBI timers" then that
> seems fine, we just need some way to describe that in Linux -- that
> doesn't fix other platforms and other interrupts, but at least it's a
> start.

We need some way to describe the situation from the SBI implementation
to Linux.

Non-retentive suspend is fine on the D1 as long as either one of these
conditions is met:
 1) we don't use the SBI timers, or
 2) the SBI timer implementation does not use the CLINT

And it is up to the SBI implementation which timer hardware it uses, so
the SBI implementation needs to patch this information in to the DT at
runtime.

Regards,
Samuel

Anup Patel Nov. 23, 2022, 5:35 a.m. UTC | #13

On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote:
>
> Hi Palmer,
>
> On 11/22/22 09:28, Palmer Dabbelt wrote:
> > On Tue, 22 Nov 2022 03:19:43 PST (-0800), conor.dooley@microchip.com wrote:
> >> On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote:
> >>> On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com>
> >>> wrote:
> >>> >
> >>> > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote:
> >>> > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt
> >>> <palmer@rivosinc.com> wrote:
> >>> > >>
> >>> > >> From: Palmer Dabbelt <palmer@rivosinc.com>
> >>> > >>
> >>> > >> As per [1], whether or not the core can wake up from non-retentive
> >>> > >> suspend is a platform-specific detail.  We don't have any way to
> >>> encode
> >>> > >> that, so just stop using them until we've sorted that out.
> >>> > >>
> >>> > >> Link:
> >>> https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> >>> > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> >>> > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> >>> > >
> >>> > > This is just unnecessary maintenance churn and it's not the
> >>> > > right way to go. Better to fix this the right way instead of having
> >>> > > a temporary fix.
> >>> > >
> >>> > > I had already sent-out a patch series 5 months back to describe
> >>> > > this in DT:
> >>> > >
> >>> https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/
> >>> > >
> >>> > > No one has commented/suggested anything (except Samuel
> >>> > > Holland and Sudeep Holla).
> >>> >
> >>> > I see some comments from Krzysztof here
> >>> >
> >>> <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/>
> >>> > as well.  Looks like everyone is pointing out that having our CPU
> >>> nodes
> >>> > encode timers is a bad idea, my guess is that they're probably right.
> >>>
> >>> Adding a separate timer DT node, creates a new set of compatibility
> >>> issues for existing platforms. I am fine updating my series to have
> >>> separate timer DT node but do we want to go in this direction ?
> >>
> >> I don't really follow. How is there a compatibility issue created by
> >> adding a new node that is not added for a new property? Both will
> >> require changes to the device tree. (You need not reply here, I am going
> >> to review the other thread, it's been on my todo list for too long. Been
> >> caught up with non-coherent stuff & our sw release cycle..)
> >>
> >>> Even if ARM has a separate timer DT node, the timers are still part
> >>> of the CPU. It depends on how we see the DT bindings aligning with
> >>> actual HW.
> >>>
> >>> >
> >>> > > Please review this series. I can quickly address comments to
> >>> > > make this available for Linux-6.2. Until this series is merged,
> >>> > > the affected platforms can simply remove non-retentive suspend
> >>> > > states from their DT.
> >>> >
> >>> > That leaves us with a dependency between kernel versions and DT
> >>> > bindings: kernels with the current driver will result in broken
> >>> systems
> >>> > with the non-retentive suspend states in the DT they boot with when
> >>> > those states can't wake up the CPU.
> >>
> >> Can someone point me at a (non D1 or virt) system that has suspend
> >> states in the DT that would need fixing?
> >>
> >>> This is not a new problem we are facing. Even in the ARM world,
> >>> the DT bindings grew organically over time based on newer platform
> >>> requirements.
> >>>
> >>> Now that we have a platform which does not want the time
> >>> C3STOP feature, we need to first come-up with DT bindings
> >>> to support this platform instead of temporarily disabling
> >>> features which don't work on this platform.
> >>
> >> It's the opposite surely? It should be "now that we have a platform that
> >> *does want* the C3STOP feature", right?
> >
> > IMO we just shouldn't be turning on C3STOP at all.  Sure it's making the
> > problem here go away, but it's trying to emulate a bunch of Intel timer
> > features we don't have on RISC-V and ending up in a bunch of fallback
> > paths.
>
> While the comment in include/linux/clockchips.h and the name of the flag
> imply that C3STOP is Intel-specific, it really isn't. The flag is used
> on ARM, MIPS, and PowerPC as well.
>
> However we do it, the end goal here is making tick_broadcast_enter()
> return nonzero (when called from inside cpuidle_enter_state()), thus
> inhibiting Linux from using idle states with the "local-timer-stop"
> property set in the DT.
>
> Looking down inside tick_broadcast_oneshot_control(), it appears
> CLOCK_EVT_FEAT_C3STOP really is required to make this happen.
>
> What are you referring to by "fallback paths"?
>
> We could add another CLOCK_EVT_FEAT_??? flag, but how should it differ
> from CLOCK_EVT_FEAT_C3STOP?
>
> > If we need some workaround in the timer subsystem to sort out the
> > non-retentive states then we can sort that out, but my guess is that
> > vendors are just going to 3
>
> (your message got cut off here)
>
> >>> > > With all due respect, NACK to this patch from my side.
> >>
> >> As Samuel pointed out that the D1 doesn't actually use the timer in
> >> question, I think we are okay here?
> >
> > IIUC it just should use the sunxi timer driver, but that requires some
> > priority changes (and presumably breaks
>
> (and here too)
>
> D1 currently uses sun4i_tick (rating 350) over riscv_timer_clockevent
> (rating 100). The ratings are fine as is.
>
> > That said, I guess I'm confused about what's actually broken here.  My
> > understanding of the problem is: The D1's firmware disables some
> > interrupts during non-retentive suspend, which results in SBI timers
> > failing to wake up the core.
>
> The D1's hardware cannot deliver the RISC-V architectural timer
> interrupt while the C906 core is powered off. It cannot deliver the
> RISC-V architectural external interrupt either, but the SoC provides a
> side channel for all of the PLIC inputs, so they _can_ wake up the CPU.
> So the net result is that the CLINT is the only peripheral unable to
> wake the CPU.
>
> > The patch to add C3STOP makes the core
> > come back from sleep, but that results in a bunch of other timer-related
> > issues.
>
> No, C3STOP _inhibits_ Linux from entering idle states that both:
>  1) rely on that clockevent device to wake the local CPU, and
>  2) cause that clockevent device to stop working, as signified by the
>     CPUIDLE_FLAG_TIMER_STOP flag, which is set by the local-timer-stop
>     property in the idle state DT node.
>
> That means entering the idle state is allowed if Linux can solve that
> first condition by finding another clockevent device on some other CPU
> to wake the local CPU with an IPI. That is the purpose of the broadcast
> timer mechanism.
>
> In the case of the D1, since it is single core, there is no other CPU to
> broadcast a timer event. So if riscv_timer_clockevent is the only
> clockevent device, then tick_broadcast_enter() should return nonzero,
> and find_deepest_state() will pick a retentive idle state instead.
>
> This is the already-existing mechanism for only entering idle states we
> can reliably wake up from. :)
>
> > So IMO we should revert the C3STOP patch as it's causing regressions
>
> I agree that clearly something is going wrong in the Linux code to cause
> problems on SMP systems like PolarFire. I do not really understand the
> broadcast timer internals, but what I do know is:
>  1) By setting CLOCK_EVT_FEAT_C3STOP, we inhibit the RISC-V timer from
>     being used as the broadcast timer (tick_check_broadcast_device()),
>     and IIUC fall back to kernel/time/tick-broadcast-hrtimer.c. Maybe
>     something goes wrong there?
>  2) The broadcast timer wakes up CPUs via IPIs. Maybe something goes
>     wrong with IPI delivery? (This raises the question of if we need
>     another DT property for receiving IPIs in non-retentive suspend.)
>
> But neither of these should affect behavior when not using broadcast timers.
>
> > (ie, old workloads break in order to make new ones work).  Seems like
> > folks mostly agree on that one?
>
> Well, for the D1 specifically, there is no new workload that the C3STOP
> patch makes work. Non-retentive suspend worked there already, as I have
> explained. The patch was always about being compliant to the spec.
>
> > I also think we should stop entering non-retentive suspend until we can
> > sort out how reliably wake up from it, as the SBI makes that a
> > platform-specific detail.  If the answer there is "non-retentive suspend
> > is fine on the D1 as long as we don't use the SBI timers" then that
> > seems fine, we just need some way to describe that in Linux -- that
> > doesn't fix other platforms and other interrupts, but at least it's a
> > start.
>
> We need some way to describe the situation from the SBI implementation
> to Linux.
>
> Non-retentive suspend is fine on the D1 as long as either one of these
> conditions is met:
>  1) we don't use the SBI timers, or
>  2) the SBI timer implementation does not use the CLINT
>
> And it is up to the SBI implementation which timer hardware it uses, so
> the SBI implementation needs to patch this information in to the DT at
> runtime.

Rather than SBI implementation patching information in DT, it is much
simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based
on D1 compatible string in root node).

Regards,
Anup

Samuel Holland Nov. 23, 2022, 5:38 a.m. UTC | #14

Hi Anup,

On 11/22/22 23:35, Anup Patel wrote:
> On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote:
>> On 11/22/22 09:28, Palmer Dabbelt wrote:
>>> I also think we should stop entering non-retentive suspend until we can
>>> sort out how reliably wake up from it, as the SBI makes that a
>>> platform-specific detail.  If the answer there is "non-retentive suspend
>>> is fine on the D1 as long as we don't use the SBI timers" then that
>>> seems fine, we just need some way to describe that in Linux -- that
>>> doesn't fix other platforms and other interrupts, but at least it's a
>>> start.
>>
>> We need some way to describe the situation from the SBI implementation
>> to Linux.
>>
>> Non-retentive suspend is fine on the D1 as long as either one of these
>> conditions is met:
>>  1) we don't use the SBI timers, or
>>  2) the SBI timer implementation does not use the CLINT
>>
>> And it is up to the SBI implementation which timer hardware it uses, so
>> the SBI implementation needs to patch this information in to the DT at
>> runtime.
> 
> Rather than SBI implementation patching information in DT, it is much
> simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based
> on D1 compatible string in root node).

It would be simpler, but it would be wrong, as I just explained.

Only the SBI implementation knows if the SBI timer extension can wake
any given CPU from any given non-retentive suspend state.

Regards,
Samuel

Anup Patel Nov. 23, 2022, 6:10 a.m. UTC | #15

On Wed, Nov 23, 2022 at 11:08 AM Samuel Holland <samuel@sholland.org> wrote:
>
> Hi Anup,
>
> On 11/22/22 23:35, Anup Patel wrote:
> > On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote:
> >> On 11/22/22 09:28, Palmer Dabbelt wrote:
> >>> I also think we should stop entering non-retentive suspend until we can
> >>> sort out how reliably wake up from it, as the SBI makes that a
> >>> platform-specific detail.  If the answer there is "non-retentive suspend
> >>> is fine on the D1 as long as we don't use the SBI timers" then that
> >>> seems fine, we just need some way to describe that in Linux -- that
> >>> doesn't fix other platforms and other interrupts, but at least it's a
> >>> start.
> >>
> >> We need some way to describe the situation from the SBI implementation
> >> to Linux.
> >>
> >> Non-retentive suspend is fine on the D1 as long as either one of these
> >> conditions is met:
> >>  1) we don't use the SBI timers, or
> >>  2) the SBI timer implementation does not use the CLINT
> >>
> >> And it is up to the SBI implementation which timer hardware it uses, so
> >> the SBI implementation needs to patch this information in to the DT at
> >> runtime.
> >
> > Rather than SBI implementation patching information in DT, it is much
> > simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based
> > on D1 compatible string in root node).
>
> It would be simpler, but it would be wrong, as I just explained.
>
> Only the SBI implementation knows if the SBI timer extension can wake
> any given CPU from any given non-retentive suspend state.

The SBI implementation would derive this information from platform
compatible string which is already available to the Linux kernel so
why does SBI implementation have to patch the DTB and put the
same information in a different way ?

Regards,
Anup

Samuel Holland Nov. 23, 2022, 6:27 a.m. UTC | #16

On 11/23/22 00:10, Anup Patel wrote:
> On Wed, Nov 23, 2022 at 11:08 AM Samuel Holland <samuel@sholland.org> wrote:
>>
>> Hi Anup,
>>
>> On 11/22/22 23:35, Anup Patel wrote:
>>> On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote:
>>>> On 11/22/22 09:28, Palmer Dabbelt wrote:
>>>>> I also think we should stop entering non-retentive suspend until we can
>>>>> sort out how reliably wake up from it, as the SBI makes that a
>>>>> platform-specific detail.  If the answer there is "non-retentive suspend
>>>>> is fine on the D1 as long as we don't use the SBI timers" then that
>>>>> seems fine, we just need some way to describe that in Linux -- that
>>>>> doesn't fix other platforms and other interrupts, but at least it's a
>>>>> start.
>>>>
>>>> We need some way to describe the situation from the SBI implementation
>>>> to Linux.
>>>>
>>>> Non-retentive suspend is fine on the D1 as long as either one of these
>>>> conditions is met:
>>>>  1) we don't use the SBI timers, or
>>>>  2) the SBI timer implementation does not use the CLINT
>>>>
>>>> And it is up to the SBI implementation which timer hardware it uses, so
>>>> the SBI implementation needs to patch this information in to the DT at
>>>> runtime.
>>>
>>> Rather than SBI implementation patching information in DT, it is much
>>> simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based
>>> on D1 compatible string in root node).
>>
>> It would be simpler, but it would be wrong, as I just explained.
>>
>> Only the SBI implementation knows if the SBI timer extension can wake
>> any given CPU from any given non-retentive suspend state.
> 
> The SBI implementation would derive this information from platform
> compatible string which is already available to the Linux kernel so
> why does SBI implementation have to patch the DTB and put the
> same information in a different way ?

It is not the same information. The SBI implementation also chooses, at
runtime, which timer hardware (CLINT, platform-specific MMIO timer,
etc.) is used to implement the SBI timer extension. The value of the
sbi-timer-can-wake-cpu property depends on this choice.

Using D1 as an example, there are two MMIO timer peripherals ("sun4i"
TIMER and "sun5i" HSTIMER) where the sbi-timer-can-wake-cpu property
should be set. But the property should not be set if the CLINT is used
by SBI.

It would be perfectly reasonable for the SBI implementation to claim one
of the wakeup-capable MMIO timers for itself, mark it as "reserved" in
the DT passed to Linux, and thus force Linux to use the SBI timer or a
native CLINT driver (C906 CLINT has S-mode extensions). Then the SBI
timer _would_ be capable of waking the CPU from non-retentive suspend.

Regards,
Samuel

Anup Patel Nov. 23, 2022, 6:41 a.m. UTC | #17

On Wed, Nov 23, 2022 at 11:57 AM Samuel Holland <samuel@sholland.org> wrote:
>
> On 11/23/22 00:10, Anup Patel wrote:
> > On Wed, Nov 23, 2022 at 11:08 AM Samuel Holland <samuel@sholland.org> wrote:
> >>
> >> Hi Anup,
> >>
> >> On 11/22/22 23:35, Anup Patel wrote:
> >>> On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote:
> >>>> On 11/22/22 09:28, Palmer Dabbelt wrote:
> >>>>> I also think we should stop entering non-retentive suspend until we can
> >>>>> sort out how reliably wake up from it, as the SBI makes that a
> >>>>> platform-specific detail.  If the answer there is "non-retentive suspend
> >>>>> is fine on the D1 as long as we don't use the SBI timers" then that
> >>>>> seems fine, we just need some way to describe that in Linux -- that
> >>>>> doesn't fix other platforms and other interrupts, but at least it's a
> >>>>> start.
> >>>>
> >>>> We need some way to describe the situation from the SBI implementation
> >>>> to Linux.
> >>>>
> >>>> Non-retentive suspend is fine on the D1 as long as either one of these
> >>>> conditions is met:
> >>>>  1) we don't use the SBI timers, or
> >>>>  2) the SBI timer implementation does not use the CLINT
> >>>>
> >>>> And it is up to the SBI implementation which timer hardware it uses, so
> >>>> the SBI implementation needs to patch this information in to the DT at
> >>>> runtime.
> >>>
> >>> Rather than SBI implementation patching information in DT, it is much
> >>> simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based
> >>> on D1 compatible string in root node).
> >>
> >> It would be simpler, but it would be wrong, as I just explained.
> >>
> >> Only the SBI implementation knows if the SBI timer extension can wake
> >> any given CPU from any given non-retentive suspend state.
> >
> > The SBI implementation would derive this information from platform
> > compatible string which is already available to the Linux kernel so
> > why does SBI implementation have to patch the DTB and put the
> > same information in a different way ?
>
> It is not the same information. The SBI implementation also chooses, at
> runtime, which timer hardware (CLINT, platform-specific MMIO timer,
> etc.) is used to implement the SBI timer extension. The value of the
> sbi-timer-can-wake-cpu property depends on this choice.
>
> Using D1 as an example, there are two MMIO timer peripherals ("sun4i"
> TIMER and "sun5i" HSTIMER) where the sbi-timer-can-wake-cpu property
> should be set. But the property should not be set if the CLINT is used
> by SBI.
>
> It would be perfectly reasonable for the SBI implementation to claim one
> of the wakeup-capable MMIO timers for itself, mark it as "reserved" in
> the DT passed to Linux, and thus force Linux to use the SBI timer or a
> native CLINT driver (C906 CLINT has S-mode extensions). Then the SBI
> timer _would_ be capable of waking the CPU from non-retentive suspend.

Fair enough but the DT property should not be SBI specific because same
situation can happen with Sstc as well where a particular non-retentive state
does not preserve the state of stimecmp CSRs in the HARTs.

Better to keep the DT property name as "riscv,timer-can-wake-cpu".

Regards,
Anup

Samuel Holland Nov. 23, 2022, 6:49 a.m. UTC | #18

On 11/23/22 00:41, Anup Patel wrote:
> On Wed, Nov 23, 2022 at 11:57 AM Samuel Holland <samuel@sholland.org> wrote:
>>
>> On 11/23/22 00:10, Anup Patel wrote:
>>> On Wed, Nov 23, 2022 at 11:08 AM Samuel Holland <samuel@sholland.org> wrote:
>>>>
>>>> Hi Anup,
>>>>
>>>> On 11/22/22 23:35, Anup Patel wrote:
>>>>> On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote:
>>>>>> On 11/22/22 09:28, Palmer Dabbelt wrote:
>>>>>>> I also think we should stop entering non-retentive suspend until we can
>>>>>>> sort out how reliably wake up from it, as the SBI makes that a
>>>>>>> platform-specific detail.  If the answer there is "non-retentive suspend
>>>>>>> is fine on the D1 as long as we don't use the SBI timers" then that
>>>>>>> seems fine, we just need some way to describe that in Linux -- that
>>>>>>> doesn't fix other platforms and other interrupts, but at least it's a
>>>>>>> start.
>>>>>>
>>>>>> We need some way to describe the situation from the SBI implementation
>>>>>> to Linux.
>>>>>>
>>>>>> Non-retentive suspend is fine on the D1 as long as either one of these
>>>>>> conditions is met:
>>>>>>  1) we don't use the SBI timers, or
>>>>>>  2) the SBI timer implementation does not use the CLINT
>>>>>>
>>>>>> And it is up to the SBI implementation which timer hardware it uses, so
>>>>>> the SBI implementation needs to patch this information in to the DT at
>>>>>> runtime.
>>>>>
>>>>> Rather than SBI implementation patching information in DT, it is much
>>>>> simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based
>>>>> on D1 compatible string in root node).
>>>>
>>>> It would be simpler, but it would be wrong, as I just explained.
>>>>
>>>> Only the SBI implementation knows if the SBI timer extension can wake
>>>> any given CPU from any given non-retentive suspend state.
>>>
>>> The SBI implementation would derive this information from platform
>>> compatible string which is already available to the Linux kernel so
>>> why does SBI implementation have to patch the DTB and put the
>>> same information in a different way ?
>>
>> It is not the same information. The SBI implementation also chooses, at
>> runtime, which timer hardware (CLINT, platform-specific MMIO timer,
>> etc.) is used to implement the SBI timer extension. The value of the
>> sbi-timer-can-wake-cpu property depends on this choice.
>>
>> Using D1 as an example, there are two MMIO timer peripherals ("sun4i"
>> TIMER and "sun5i" HSTIMER) where the sbi-timer-can-wake-cpu property
>> should be set. But the property should not be set if the CLINT is used
>> by SBI.
>>
>> It would be perfectly reasonable for the SBI implementation to claim one
>> of the wakeup-capable MMIO timers for itself, mark it as "reserved" in
>> the DT passed to Linux, and thus force Linux to use the SBI timer or a
>> native CLINT driver (C906 CLINT has S-mode extensions). Then the SBI
>> timer _would_ be capable of waking the CPU from non-retentive suspend.
> 
> Fair enough but the DT property should not be SBI specific because same
> situation can happen with Sstc as well where a particular non-retentive state
> does not preserve the state of stimecmp CSRs in the HARTs.
> 
> Better to keep the DT property name as "riscv,timer-can-wake-cpu".

Consider a platform where the Sstc-based timer cannot wake the CPU, but
the SBI timer can, because it uses different timer hardware or a
different interrupt delivery method. It seems like we need two different
properties, one for Sstc and the other for the SBI timer. If both are
supported, firmware cannot know which one S-mode software will use.

Regards,
Samuel

Anup Patel Nov. 23, 2022, 7:13 a.m. UTC | #19

On Wed, Nov 23, 2022 at 12:20 PM Samuel Holland <samuel@sholland.org> wrote:
>
> On 11/23/22 00:41, Anup Patel wrote:
> > On Wed, Nov 23, 2022 at 11:57 AM Samuel Holland <samuel@sholland.org> wrote:
> >>
> >> On 11/23/22 00:10, Anup Patel wrote:
> >>> On Wed, Nov 23, 2022 at 11:08 AM Samuel Holland <samuel@sholland.org> wrote:
> >>>>
> >>>> Hi Anup,
> >>>>
> >>>> On 11/22/22 23:35, Anup Patel wrote:
> >>>>> On Wed, Nov 23, 2022 at 10:41 AM Samuel Holland <samuel@sholland.org> wrote:
> >>>>>> On 11/22/22 09:28, Palmer Dabbelt wrote:
> >>>>>>> I also think we should stop entering non-retentive suspend until we can
> >>>>>>> sort out how reliably wake up from it, as the SBI makes that a
> >>>>>>> platform-specific detail.  If the answer there is "non-retentive suspend
> >>>>>>> is fine on the D1 as long as we don't use the SBI timers" then that
> >>>>>>> seems fine, we just need some way to describe that in Linux -- that
> >>>>>>> doesn't fix other platforms and other interrupts, but at least it's a
> >>>>>>> start.
> >>>>>>
> >>>>>> We need some way to describe the situation from the SBI implementation
> >>>>>> to Linux.
> >>>>>>
> >>>>>> Non-retentive suspend is fine on the D1 as long as either one of these
> >>>>>> conditions is met:
> >>>>>>  1) we don't use the SBI timers, or
> >>>>>>  2) the SBI timer implementation does not use the CLINT
> >>>>>>
> >>>>>> And it is up to the SBI implementation which timer hardware it uses, so
> >>>>>> the SBI implementation needs to patch this information in to the DT at
> >>>>>> runtime.
> >>>>>
> >>>>> Rather than SBI implementation patching information in DT, it is much
> >>>>> simpler to add a quirk in RISC-V timer driver for D1 platform (i.e. based
> >>>>> on D1 compatible string in root node).
> >>>>
> >>>> It would be simpler, but it would be wrong, as I just explained.
> >>>>
> >>>> Only the SBI implementation knows if the SBI timer extension can wake
> >>>> any given CPU from any given non-retentive suspend state.
> >>>
> >>> The SBI implementation would derive this information from platform
> >>> compatible string which is already available to the Linux kernel so
> >>> why does SBI implementation have to patch the DTB and put the
> >>> same information in a different way ?
> >>
> >> It is not the same information. The SBI implementation also chooses, at
> >> runtime, which timer hardware (CLINT, platform-specific MMIO timer,
> >> etc.) is used to implement the SBI timer extension. The value of the
> >> sbi-timer-can-wake-cpu property depends on this choice.
> >>
> >> Using D1 as an example, there are two MMIO timer peripherals ("sun4i"
> >> TIMER and "sun5i" HSTIMER) where the sbi-timer-can-wake-cpu property
> >> should be set. But the property should not be set if the CLINT is used
> >> by SBI.
> >>
> >> It would be perfectly reasonable for the SBI implementation to claim one
> >> of the wakeup-capable MMIO timers for itself, mark it as "reserved" in
> >> the DT passed to Linux, and thus force Linux to use the SBI timer or a
> >> native CLINT driver (C906 CLINT has S-mode extensions). Then the SBI
> >> timer _would_ be capable of waking the CPU from non-retentive suspend.
> >
> > Fair enough but the DT property should not be SBI specific because same
> > situation can happen with Sstc as well where a particular non-retentive state
> > does not preserve the state of stimecmp CSRs in the HARTs.
> >
> > Better to keep the DT property name as "riscv,timer-can-wake-cpu".
>
> Consider a platform where the Sstc-based timer cannot wake the CPU, but
> the SBI timer can, because it uses different timer hardware or a
> different interrupt delivery method. It seems like we need two different
> properties, one for Sstc and the other for the SBI timer. If both are
> supported, firmware cannot know which one S-mode software will use.

On a platform with Sstc, the SBI set_timer() call will internally update
stimecmp CSR. In fact, this is what OpenSBI already does.

Here's the text from Sstc specification:
"In systems in which a supervisor execution environment (SEE) provides
timer facilities via an SBI function call, this SBI call will continue
to support
requests to schedule a timer interrupt. The SEE will simply make use of
stimecmp, changing its value as appropriate. This ensures compatibility
with existing S-mode software that uses this SEE facility, while new S-mode
software takes advantage of stimecmp directly.)"

Based on the above, we don't need separate DT property for SBI timer
and Sstc.

Regards,
Anup

Conor Dooley Nov. 23, 2022, 10:09 a.m. UTC | #20

Hey Anup,

On Wed, Nov 23, 2022 at 09:56:31AM +0530, Anup Patel wrote:
> On Tue, Nov 22, 2022 at 4:50 PM Conor Dooley <conor.dooley@microchip.com> wrote:
> >
> > On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote:
> > > On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> > > >
> > > > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote:
> > > > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> > > > >>
> > > > >> From: Palmer Dabbelt <palmer@rivosinc.com>
> > > > >>
> > > > >> As per [1], whether or not the core can wake up from non-retentive
> > > > >> suspend is a platform-specific detail.  We don't have any way to encode
> > > > >> that, so just stop using them until we've sorted that out.
> > > > >>
> > > > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> > > > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> > > > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> > > > >
> > > > > This is just unnecessary maintenance churn and it's not the
> > > > > right way to go. Better to fix this the right way instead of having
> > > > > a temporary fix.
> > > > >
> > > > > I had already sent-out a patch series 5 months back to describe
> > > > > this in DT:
> > > > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/
> > > > >
> > > > > No one has commented/suggested anything (except Samuel
> > > > > Holland and Sudeep Holla).
> > > >
> > > > I see some comments from Krzysztof here
> > > > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/>
> > > > as well.  Looks like everyone is pointing out that having our CPU nodes
> > > > encode timers is a bad idea, my guess is that they're probably right.
> > >
> > > Adding a separate timer DT node, creates a new set of compatibility
> > > issues for existing platforms. I am fine updating my series to have
> > > separate timer DT node but do we want to go in this direction ?
> >
> > I don't really follow. How is there a compatibility issue created by
> > adding a new node that is not added for a new property? Both will
> > require changes to the device tree. (You need not reply here, I am going
> > to review the other thread, it's been on my todo list for too long. Been
> > caught up with non-coherent stuff & our sw release cycle..)
> 
> Adding a new timer DT node would mean, the RISC-V timer driver
> will now be probed using the compatible to the new DT node whereas
> the RISC-V timer driver is currently probed using CPU DT nodes.

Ahh, that is what I have missed. I'll continue my thoughts on this in
the dt-binding thread.

> > > Even if ARM has a separate timer DT node, the timers are still part
> > > of the CPU. It depends on how we see the DT bindings aligning with
> > > actual HW.
> > >
> > > >
> > > > > Please review this series. I can quickly address comments to
> > > > > make this available for Linux-6.2. Until this series is merged,
> > > > > the affected platforms can simply remove non-retentive suspend
> > > > > states from their DT.
> > > >
> > > > That leaves us with a dependency between kernel versions and DT
> > > > bindings: kernels with the current driver will result in broken systems
> > > > with the non-retentive suspend states in the DT they boot with when
> > > > those states can't wake up the CPU.
> >
> > Can someone point me at a (non D1 or virt) system that has suspend
> > states in the DT that would need fixing?
> 
> For the QEMU virt machine, the default non-retentive suspend state was
> tested using a temporary DTB provided separately via QEMU command
> line. The QEMU virt machine does not have its own HART suspend
> states so OpenSBI will functionally emulate default retentive/non-retentive
> suspend states.

So since I asked for non D1 or virt systems, that's a no & no DTs
actually needs to be fixed :)

> > > This is not a new problem we are facing. Even in the ARM world,
> > > the DT bindings grew organically over time based on newer platform
> > > requirements.
> > >
> > > Now that we have a platform which does not want the time
> > > C3STOP feature, we need to first come-up with DT bindings
> > > to support this platform instead of temporarily disabling
> > > features which don't work on this platform.
> >
> > It's the opposite surely? It should be "now that we have a platform that
> > *does want* the C3STOP feature", right?
> 
> Yes, we can think this way as well.

No, there's no "thinking" involved here from what I can tell. Pre-D1
systems do not seem to need the flag and the D1 does want that flag for
its riscv,timer. We have to operate with respect to hardware timelines
& the corresponding software implementations, not specs in this context.

If it was the case that you proposed, there would be no chance for
regressions if someone updates their kernel but not their DT.

> > > > > With all due respect, NACK to this patch from my side.
> >
> > As Samuel pointed out that the D1 doesn't actually use the timer in
> > question, I think we are okay here?
> 
> Yes, that's why D1 needs the C3STOP flag.

I don't understand what you mean here, you don't appear to be replying
to what I said.

I was saying that the current D1 configuration does not actually use
the timer-riscv driver as there's another one that has a higher rating
& therefore we are okay to not apply this patch as my revert will not
cause it to be put into sleep states that it cannot return from.

Your reply makes no sense to me in that context.

Thanks,
Conor.

Sudeep Holla Nov. 23, 2022, 10:20 a.m. UTC | #21

On Tue, Nov 22, 2022 at 11:11:23PM -0600, Samuel Holland wrote:

[...]

> While the comment in include/linux/clockchips.h and the name of the flag
> imply that C3STOP is Intel-specific, it really isn't. The flag is used
> on ARM, MIPS, and PowerPC as well.
>

+1. I agree the name is confusing but it used to just indicate that the
timer is not always on and could stop in deeper CPU idle states. It probably
was introduced to be used on x86 for C3 state but it is used for other
purposes as well. May be one should have or even can now rename it to
something more appropriate, but I am sure it might trigger some bikeshedding
discussions

Anup Patel Nov. 23, 2022, 10:36 a.m. UTC | #22

On Wed, Nov 23, 2022 at 3:40 PM Conor Dooley <conor.dooley@microchip.com> wrote:
>
> Hey Anup,
>
> On Wed, Nov 23, 2022 at 09:56:31AM +0530, Anup Patel wrote:
> > On Tue, Nov 22, 2022 at 4:50 PM Conor Dooley <conor.dooley@microchip.com> wrote:
> > >
> > > On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote:
> > > > On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> > > > >
> > > > > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@brainfault.org wrote:
> > > > > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@rivosinc.com> wrote:
> > > > > >>
> > > > > >> From: Palmer Dabbelt <palmer@rivosinc.com>
> > > > > >>
> > > > > >> As per [1], whether or not the core can wake up from non-retentive
> > > > > >> suspend is a platform-specific detail.  We don't have any way to encode
> > > > > >> that, so just stop using them until we've sorted that out.
> > > > > >>
> > > > > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> > > > > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> > > > > >> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
> > > > > >
> > > > > > This is just unnecessary maintenance churn and it's not the
> > > > > > right way to go. Better to fix this the right way instead of having
> > > > > > a temporary fix.
> > > > > >
> > > > > > I had already sent-out a patch series 5 months back to describe
> > > > > > this in DT:
> > > > > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@ventanamicro.com/
> > > > > >
> > > > > > No one has commented/suggested anything (except Samuel
> > > > > > Holland and Sudeep Holla).
> > > > >
> > > > > I see some comments from Krzysztof here
> > > > > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@linaro.org/>
> > > > > as well.  Looks like everyone is pointing out that having our CPU nodes
> > > > > encode timers is a bad idea, my guess is that they're probably right.
> > > >
> > > > Adding a separate timer DT node, creates a new set of compatibility
> > > > issues for existing platforms. I am fine updating my series to have
> > > > separate timer DT node but do we want to go in this direction ?
> > >
> > > I don't really follow. How is there a compatibility issue created by
> > > adding a new node that is not added for a new property? Both will
> > > require changes to the device tree. (You need not reply here, I am going
> > > to review the other thread, it's been on my todo list for too long. Been
> > > caught up with non-coherent stuff & our sw release cycle..)
> >
> > Adding a new timer DT node would mean, the RISC-V timer driver
> > will now be probed using the compatible to the new DT node whereas
> > the RISC-V timer driver is currently probed using CPU DT nodes.
>
> Ahh, that is what I have missed. I'll continue my thoughts on this in
> the dt-binding thread.
>
> > > > Even if ARM has a separate timer DT node, the timers are still part
> > > > of the CPU. It depends on how we see the DT bindings aligning with
> > > > actual HW.
> > > >
> > > > >
> > > > > > Please review this series. I can quickly address comments to
> > > > > > make this available for Linux-6.2. Until this series is merged,
> > > > > > the affected platforms can simply remove non-retentive suspend
> > > > > > states from their DT.
> > > > >
> > > > > That leaves us with a dependency between kernel versions and DT
> > > > > bindings: kernels with the current driver will result in broken systems
> > > > > with the non-retentive suspend states in the DT they boot with when
> > > > > those states can't wake up the CPU.
> > >
> > > Can someone point me at a (non D1 or virt) system that has suspend
> > > states in the DT that would need fixing?
> >
> > For the QEMU virt machine, the default non-retentive suspend state was
> > tested using a temporary DTB provided separately via QEMU command
> > line. The QEMU virt machine does not have its own HART suspend
> > states so OpenSBI will functionally emulate default retentive/non-retentive
> > suspend states.
>
> So since I asked for non D1 or virt systems, that's a no & no DTs
> actually needs to be fixed :)
>
> > > > This is not a new problem we are facing. Even in the ARM world,
> > > > the DT bindings grew organically over time based on newer platform
> > > > requirements.
> > > >
> > > > Now that we have a platform which does not want the time
> > > > C3STOP feature, we need to first come-up with DT bindings
> > > > to support this platform instead of temporarily disabling
> > > > features which don't work on this platform.
> > >
> > > It's the opposite surely? It should be "now that we have a platform that
> > > *does want* the C3STOP feature", right?
> >
> > Yes, we can think this way as well.
>
> No, there's no "thinking" involved here from what I can tell. Pre-D1
> systems do not seem to need the flag and the D1 does want that flag for
> its riscv,timer. We have to operate with respect to hardware timelines
> & the corresponding software implementations, not specs in this context.
>
> If it was the case that you proposed, there would be no chance for
> regressions if someone updates their kernel but not their DT.
>
> > > > > > With all due respect, NACK to this patch from my side.
> > >
> > > As Samuel pointed out that the D1 doesn't actually use the timer in
> > > question, I think we are okay here?
> >
> > Yes, that's why D1 needs the C3STOP flag.
>
> I don't understand what you mean here, you don't appear to be replying
> to what I said.
>
> I was saying that the current D1 configuration does not actually use
> the timer-riscv driver as there's another one that has a higher rating
> & therefore we are okay to not apply this patch as my revert will not
> cause it to be put into sleep states that it cannot return from.
>
> Your reply makes no sense to me in that context.

Sorry for the confusion, I should have written a more complete sentence.

D1 does not use the RISC-V timer but it still needs to set the C3STOP
flag to inform the timer subsystem that the RISC-V timer will not work
in suspend state.

Regards,
Anup

Conor Dooley Nov. 23, 2022, 10:46 a.m. UTC | #23

Hey Samuel,

>On Tue, Nov 22, 2022 at 09:42:24PM -0600, Samuel Holland wrote:
> On 11/22/22 05:06, Conor Dooley wrote:
> > On Mon, Nov 21, 2022 at 06:45:25PM -0600, Samuel Holland wrote:
> > > On 11/21/22 14:56, Palmer Dabbelt wrote:

> > > > As per [1], whether or not the core can wake up from non-retentive
> > > > suspend is a platform-specific detail.  We don't have any way to encode
> > > > that, so just stop using them until we've sorted that out.
> > >
> > > We do have *exactly* a way to encode that. Specifically, the existence
> > > or non-existence of a non-retentive CPU suspend state in the DT.
> > >
> > > If a hart has no way of resuming from non-retentive suspend (i.e. a
> > > complete lack of interrupt delivery in non-retentive suspend), then it
> > > makes zero sense to advertise such a suspend state in the DT.
> > 
> > I would have to agree with that. I think the sprawling conversation has
> > confused us all at this point, I (prior to reading this mail) thought
> > that suspend was borked on the D1. I don't think anyone is advertising
> > specific states in the DT at the moment though, I had a check in the D1
> > patchset and didn't see anything there - unless you're adding it
> > dynamically from the bootloader?
> 
> The availability and latency properties of idle states depend on the SBI
> implementation, so yes, they need to be added dynamically.

Right, thanks for clarifying.

> > > I do not have any functioning RISC-V
> > > hardware with SMP, so it is hard for me to help debug the root issue in
> > > the Linux timer code. I do not know why turning on the C3STOP flag
> > > breaks RCU stall detection or clock_nanosleep(), but I agree that
> > > breakage should not happen.
> > >
> > > So while I still think 232ccac1bd9b is the right change to make from a
> > > "following the spec" standpoint
> > 
> > Right, so the spec says:
> > Request the SBI implementation to put the calling hart in a platform
> > specific suspend (or low power) state specified by the suspend_type
> > parameter. The hart will automatically come out of suspended state and
> > resume normal execution when it receives an interrupt or platform
> > specific hardware event.
> > 
> > That, as we have discussed a bunch of times, does not say whether a
> > given interrupt should actually arrive during suspend. The correct
> > behaviour, to me, is to assume that no events arrive during suspend.
> 
> Are you suggesting that we need some property to declare the delivery of
> each kind of interrupt (software, timer, external, PMU)? 

I'm possibly taking things to an extreme, since if we're having a
discussion that covers what the spec does and does not allow I see no
harm in going down the rabbit hole!

Obviously, some sort of event must get the CPU out of suspend - what I
meant was more like "The correct (software) behaviour, to me, is to
assume that, when looking at an individual source, its events may not
arrive during suspend."

I've not looked at the relevant specs to see if they specify whether
their interrupts *must* arrive, just the SBI one that the issue was
created against.

> I assumed that
> external interrupt delivery would be required to consider an idle state
> "viable", but I suppose it would be _possible_ to have a state where
> only timer interrupts are deliverable.

Who knows what some hardware folks will come up with! Maybe I am being
pretty <whatever the modern version of black & white> is here, but I
fear for a repeat whenever someone does something "creative".

I know I've not answered your question about other kinds of properties
but I am well outside my comfort zone here.

Thanks,
Conor.

cpuidle: riscv-sbi: Stop using non-retentive suspend

Checks

Commit Message

Comments

Patch