diff mbox series

clocksource/drivers/riscv: Events are stopped during CPU suspend

Message ID 20220509012121.40031-1-samuel@sholland.org (mailing list archive)
State New, archived
Headers show
Series clocksource/drivers/riscv: Events are stopped during CPU suspend | expand

Commit Message

Samuel Holland May 9, 2022, 1:21 a.m. UTC
Some implementations of the SBI time extension depend on hart-local
state (for example, CSRs) that are lost or hardware that is powered
down when a CPU is suspended. To be safe, the clockevents driver
cannot assume that timer IRQs will be received during CPU suspend.

Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
Signed-off-by: Samuel Holland <samuel@sholland.org>
---

 drivers/clocksource/timer-riscv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jessica Clarke May 9, 2022, 2:56 a.m. UTC | #1
On 9 May 2022, at 02:21, Samuel Holland <samuel@sholland.org> wrote:
> 
> Some implementations of the SBI time extension depend on hart-local
> state (for example, CSRs) that are lost or hardware that is powered
> down when a CPU is suspended. To be safe, the clockevents driver
> cannot assume that timer IRQs will be received during CPU suspend.
> 
> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")

Surely that’s not right? A commit from 2018 can’t have been expected to
predict the future, I would expect this to be one of the patches adding
suspend to RISC-V which came years later.

Jess

> Signed-off-by: Samuel Holland <samuel@sholland.org>
> ---
> 
> drivers/clocksource/timer-riscv.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
> index 1767f8bf2013..593d5a957b69 100644
> --- a/drivers/clocksource/timer-riscv.c
> +++ b/drivers/clocksource/timer-riscv.c
> @@ -34,7 +34,7 @@ static int riscv_clock_next_event(unsigned long delta,
> static unsigned int riscv_clock_event_irq;
> static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
> 	.name			= "riscv_timer_clockevent",
> -	.features		= CLOCK_EVT_FEAT_ONESHOT,
> +	.features		= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
> 	.rating			= 100,
> 	.set_next_event		= riscv_clock_next_event,
> };
> -- 
> 2.35.1
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Anup Patel May 9, 2022, 3:51 a.m. UTC | #2
On Mon, May 9, 2022 at 6:51 AM Samuel Holland <samuel@sholland.org> wrote:
>
> Some implementations of the SBI time extension depend on hart-local
> state (for example, CSRs) that are lost or hardware that is powered
> down when a CPU is suspended. To be safe, the clockevents driver
> cannot assume that timer IRQs will be received during CPU suspend.
>
> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
> Signed-off-by: Samuel Holland <samuel@sholland.org>

Looks good to me.

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>
>  drivers/clocksource/timer-riscv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
> index 1767f8bf2013..593d5a957b69 100644
> --- a/drivers/clocksource/timer-riscv.c
> +++ b/drivers/clocksource/timer-riscv.c
> @@ -34,7 +34,7 @@ static int riscv_clock_next_event(unsigned long delta,
>  static unsigned int riscv_clock_event_irq;
>  static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
>         .name                   = "riscv_timer_clockevent",
> -       .features               = CLOCK_EVT_FEAT_ONESHOT,
> +       .features               = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
>         .rating                 = 100,
>         .set_next_event         = riscv_clock_next_event,
>  };
> --
> 2.35.1
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Daniel Lezcano May 9, 2022, 11:44 a.m. UTC | #3
On 09/05/2022 03:21, Samuel Holland wrote:
> Some implementations of the SBI time extension depend on hart-local
> state (for example, CSRs) that are lost or hardware that is powered
> down when a CPU is suspended. To be safe, the clockevents driver
> cannot assume that timer IRQs will be received during CPU suspend.
> 
> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
> Signed-off-by: Samuel Holland <samuel@sholland.org>
> ---

Applied, thanks
Conor Dooley Sept. 29, 2022, 9:50 p.m. UTC | #4
On Sun, May 08, 2022 at 08:21:21PM -0500, Samuel Holland wrote:
> Some implementations of the SBI time extension depend on hart-local
> state (for example, CSRs) that are lost or hardware that is powered
> down when a CPU is suspended. To be safe, the clockevents driver
> cannot assume that timer IRQs will be received during CPU suspend.
> 
> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
> Signed-off-by: Samuel Holland <samuel@sholland.org>
> ---
> 
>  drivers/clocksource/timer-riscv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
> index 1767f8bf2013..593d5a957b69 100644
> --- a/drivers/clocksource/timer-riscv.c
> +++ b/drivers/clocksource/timer-riscv.c
> @@ -34,7 +34,7 @@ static int riscv_clock_next_event(unsigned long delta,
>  static unsigned int riscv_clock_event_irq;
>  static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
>  	.name			= "riscv_timer_clockevent",
> -	.features		= CLOCK_EVT_FEAT_ONESHOT,
> +	.features		= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
>  	.rating			= 100,
>  	.set_next_event		= riscv_clock_next_event,
>  };

After a bit of a painful bisection (with a misdirection into the v5.19
printk reverts along the way) I have arrived at this commit for causing
me some issues.

If an AXI read to the PCIe controller on PolarFire SoC times out, the
system will stall, with an expected:
	 io scheduler mq-deadline registered
	 io scheduler kyber registered
	 microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
	 microchip-pcie 2000000000.pcie:      MEM 0x2008000000..0x2087ffffff -> 0x0008000000
	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: axi read request error
	 microchip-pcie 2000000000.pcie: axi read timeout
	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	 Freeing initrd memory: 7336K
	 mc_event_handler: 667402 callbacks suppressed
	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	 mc_event_handler: 666588 callbacks suppressed
<truncated>
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	mc_event_handler: 666748 callbacks suppressed
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	rcu: 	0-...0: (1 GPs behind) idle=19f/1/0x4000000000000002 softirq=34/36 fqs=2626 
		(detected by 1, t=5256 jiffies, g=-1151, q=1143 ncpus=4)
	Task dump for CPU 0:
	task:swapper/0       state:R  running task     stack:    0 pid:    1 ppid:     0 flags:0x00000008
	Call Trace:
	mc_event_handler: 666648 callbacks suppressed

 With this patch applied, the system just locks up without RCU stalling:
	io scheduler mq-deadline registered
	io scheduler kyber registered
	microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
	microchip-pcie 2000000000.pcie:      MEM 0x2008000000..0x2087ffffff -> 0x0008000000
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: axi read request error
	microchip-pcie 2000000000.pcie: axi read timeout
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
	Freeing initrd memory: 7332K

As of yet, I have no idea if RCU stalls for other reasons would also be
lost.

Thanks,
Conor.

git bisect start
# status: waiting for both good and bad commits
# good: [7699f7aacf3ebfee51c670b6f796b2797f0f7487] RISC-V: Prepare dropping week attribute from arch_kexec_apply_relocations[_add]
git bisect good 7699f7aacf3ebfee51c670b6f796b2797f0f7487
# bad: [63d5172e148bcc174398040861d867bbd2770be4] HACK: jogness
git bisect bad 63d5172e148bcc174398040861d867bbd2770be4
# good: [2518f226c60d8e04d18ba4295500a5b0b8ac7659] Merge tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
git bisect good 2518f226c60d8e04d18ba4295500a5b0b8ac7659
# good: [907bb57aa7b471872aab2f2e83e9713a145673f9] Merge tag 'pinctrl-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect good 907bb57aa7b471872aab2f2e83e9713a145673f9
# good: [4ad680f083ec360e0991c453e18a38ed9ae500d7] Merge tag 'staging-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good 4ad680f083ec360e0991c453e18a38ed9ae500d7
# good: [23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1] Merge tag 'for-5.19/parisc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
git bisect good 23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1
# bad: [7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0] Merge tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
git bisect bad 7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0
# bad: [1f192b9e8d8a5c619b33a868fb1af063af65ce5d] Merge tag 'drm-misc-fixes-2022-06-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
git bisect bad 1f192b9e8d8a5c619b33a868fb1af063af65ce5d
# good: [b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect good b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8
# bad: [e17fee8976c3d2ccf9add6d6c8912a37b025d840] Merge tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad e17fee8976c3d2ccf9add6d6c8912a37b025d840
# bad: [c049ecc523171481accd2c83f79ffeecbf53a915] Merge tag 'timers-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad c049ecc523171481accd2c83f79ffeecbf53a915
# bad: [9c04a8ff03def4df3f81219ffbe1ec9b44ff5348] clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
git bisect bad 9c04a8ff03def4df3f81219ffbe1ec9b44ff5348
# bad: [7160d9c4cce94612d5f42a5db392cd606a38737a] clocksource/drivers/armada-370-xp: Convert to SPDX identifier
git bisect bad 7160d9c4cce94612d5f42a5db392cd606a38737a
# bad: [a98399cbc1e05f7b977419f03905501d566cf54e] clocksource/drivers/sp804: Avoid error on multiple instances
git bisect bad a98399cbc1e05f7b977419f03905501d566cf54e
# good: [41929c9f628b9990d33a200c54bb0c919e089aa8] clocksource/drivers/ixp4xx: Drop boardfile probe path
git bisect good 41929c9f628b9990d33a200c54bb0c919e089aa8
# bad: [232ccac1bd9b5bfe73895f527c08623e7fa0752d] clocksource/drivers/riscv: Events are stopped during CPU suspend
git bisect bad 232ccac1bd9b5bfe73895f527c08623e7fa0752d
# first bad commit: [232ccac1bd9b5bfe73895f527c08623e7fa0752d] clocksource/drivers/riscv: Events are stopped during CPU suspend
Palmer Dabbelt Oct. 9, 2022, 11:45 p.m. UTC | #5
On Thu, 29 Sep 2022 14:50:45 PDT (-0700), Conor Dooley wrote:
> On Sun, May 08, 2022 at 08:21:21PM -0500, Samuel Holland wrote:
>> Some implementations of the SBI time extension depend on hart-local
>> state (for example, CSRs) that are lost or hardware that is powered
>> down when a CPU is suspended. To be safe, the clockevents driver
>> cannot assume that timer IRQs will be received during CPU suspend.
>>
>> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
>> Signed-off-by: Samuel Holland <samuel@sholland.org>
>> ---
>>
>>  drivers/clocksource/timer-riscv.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
>> index 1767f8bf2013..593d5a957b69 100644
>> --- a/drivers/clocksource/timer-riscv.c
>> +++ b/drivers/clocksource/timer-riscv.c
>> @@ -34,7 +34,7 @@ static int riscv_clock_next_event(unsigned long delta,
>>  static unsigned int riscv_clock_event_irq;
>>  static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
>>  	.name			= "riscv_timer_clockevent",
>> -	.features		= CLOCK_EVT_FEAT_ONESHOT,
>> +	.features		= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,

This is listed as being x86-specific in the header, but there's a hanful 
of other ports that enable it for timers as well.  Looks like arm is 
setting this based on DT, which seems reasonable to me: we're working 
around a firmware bug, there should be some way to turn off that 
workaround for firmware that doesn't have the bug. Looks like Intel already
turns this off when ARAT is supported, which seems to be the case for anything
modern, so maybe we're just tripping up on some untested behavior here?  I'm
not sure exactly how we should probe this, but having it only enabled when we
need the workaround seems like the right way to go.

That said, I'm not actually sure this C3STOP feature does what we want 
given the commit description.  The timers on RISC-V are sort of in this 
odd middle-ground between being per-CPU timers and system timers: the 
time they produce is global (or at least close, due to the mtime 
synchronization rules) but the actual interrupts are only one-shot and 
only local.  From poking around the code I think this just tries to 
setup a periodic broadcast timer, but since we use software fallbacks 
to emulate those we'll still end up losing the interrupts/ticks if the 
CPU that was asked for an interrupt has gone to sleep and lost that 
state.

I'm not sure if I'm just misunderstanding what's going on here, though.  
Is there something that describes the behavior this fixes in more 
detail?

>>  	.rating			= 100,
>>  	.set_next_event		= riscv_clock_next_event,
>>  };
>
> After a bit of a painful bisection (with a misdirection into the v5.19
> printk reverts along the way) I have arrived at this commit for causing
> me some issues.
>
> If an AXI read to the PCIe controller on PolarFire SoC times out, the
> system will stall, with an expected:
> 	 io scheduler mq-deadline registered
> 	 io scheduler kyber registered
> 	 microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
> 	 microchip-pcie 2000000000.pcie:      MEM 0x2008000000..0x2087ffffff -> 0x0008000000
> 	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: axi read request error
> 	 microchip-pcie 2000000000.pcie: axi read timeout
> 	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	 Freeing initrd memory: 7336K
> 	 mc_event_handler: 667402 callbacks suppressed
> 	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	 microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	 mc_event_handler: 666588 callbacks suppressed
> <truncated>
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	mc_event_handler: 666748 callbacks suppressed
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	rcu: 	0-...0: (1 GPs behind) idle=19f/1/0x4000000000000002 softirq=34/36 fqs=2626
> 		(detected by 1, t=5256 jiffies, g=-1151, q=1143 ncpus=4)
> 	Task dump for CPU 0:
> 	task:swapper/0       state:R  running task     stack:    0 pid:    1 ppid:     0 flags:0x00000008
> 	Call Trace:
> 	mc_event_handler: 666648 callbacks suppressed
>
>  With this patch applied, the system just locks up without RCU stalling:
> 	io scheduler mq-deadline registered
> 	io scheduler kyber registered
> 	microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000 ranges:
> 	microchip-pcie 2000000000.pcie:      MEM 0x2008000000..0x2087ffffff -> 0x0008000000
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: axi read request error
> 	microchip-pcie 2000000000.pcie: axi read timeout
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> 	microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> 	Freeing initrd memory: 7332K
>
> As of yet, I have no idea if RCU stalls for other reasons would also be
> lost.

Sorry this broke stuff.  I'm not entirely sure why this would mask RCU 
stalls, but it seems like we're hitting some pretty odd paths here and 
I'm not sure this is expected to work at all for us.

If non-x86 architectures are meant to be able to set 
CLOCK_EVT_FEAT_C3STOP, maybe we should document what it's supposed to do 
in a more platform-agnostic fashion?

> Thanks,
> Conor.
>
> git bisect start
> # status: waiting for both good and bad commits
> # good: [7699f7aacf3ebfee51c670b6f796b2797f0f7487] RISC-V: Prepare dropping week attribute from arch_kexec_apply_relocations[_add]
> git bisect good 7699f7aacf3ebfee51c670b6f796b2797f0f7487
> # bad: [63d5172e148bcc174398040861d867bbd2770be4] HACK: jogness
> git bisect bad 63d5172e148bcc174398040861d867bbd2770be4
> # good: [2518f226c60d8e04d18ba4295500a5b0b8ac7659] Merge tag 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
> git bisect good 2518f226c60d8e04d18ba4295500a5b0b8ac7659
> # good: [907bb57aa7b471872aab2f2e83e9713a145673f9] Merge tag 'pinctrl-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
> git bisect good 907bb57aa7b471872aab2f2e83e9713a145673f9
> # good: [4ad680f083ec360e0991c453e18a38ed9ae500d7] Merge tag 'staging-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect good 4ad680f083ec360e0991c453e18a38ed9ae500d7
> # good: [23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1] Merge tag 'for-5.19/parisc-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> git bisect good 23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1
> # bad: [7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0] Merge tag 'gpio-fixes-for-v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
> git bisect bad 7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0
> # bad: [1f192b9e8d8a5c619b33a868fb1af063af65ce5d] Merge tag 'drm-misc-fixes-2022-06-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
> git bisect bad 1f192b9e8d8a5c619b33a868fb1af063af65ce5d
> # good: [b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> git bisect good b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8
> # bad: [e17fee8976c3d2ccf9add6d6c8912a37b025d840] Merge tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> git bisect bad e17fee8976c3d2ccf9add6d6c8912a37b025d840
> # bad: [c049ecc523171481accd2c83f79ffeecbf53a915] Merge tag 'timers-core-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad c049ecc523171481accd2c83f79ffeecbf53a915
> # bad: [9c04a8ff03def4df3f81219ffbe1ec9b44ff5348] clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
> git bisect bad 9c04a8ff03def4df3f81219ffbe1ec9b44ff5348
> # bad: [7160d9c4cce94612d5f42a5db392cd606a38737a] clocksource/drivers/armada-370-xp: Convert to SPDX identifier
> git bisect bad 7160d9c4cce94612d5f42a5db392cd606a38737a
> # bad: [a98399cbc1e05f7b977419f03905501d566cf54e] clocksource/drivers/sp804: Avoid error on multiple instances
> git bisect bad a98399cbc1e05f7b977419f03905501d566cf54e
> # good: [41929c9f628b9990d33a200c54bb0c919e089aa8] clocksource/drivers/ixp4xx: Drop boardfile probe path
> git bisect good 41929c9f628b9990d33a200c54bb0c919e089aa8
> # bad: [232ccac1bd9b5bfe73895f527c08623e7fa0752d] clocksource/drivers/riscv: Events are stopped during CPU suspend
> git bisect bad 232ccac1bd9b5bfe73895f527c08623e7fa0752d
> # first bad commit: [232ccac1bd9b5bfe73895f527c08623e7fa0752d] clocksource/drivers/riscv: Events are stopped during CPU suspend
Samuel Holland Oct. 24, 2022, 5:01 a.m. UTC | #6
On 10/9/22 18:45, Palmer Dabbelt wrote:
> On Thu, 29 Sep 2022 14:50:45 PDT (-0700), Conor Dooley wrote:
>> On Sun, May 08, 2022 at 08:21:21PM -0500, Samuel Holland wrote:
>>> Some implementations of the SBI time extension depend on hart-local
>>> state (for example, CSRs) that are lost or hardware that is powered
>>> down when a CPU is suspended. To be safe, the clockevents driver
>>> cannot assume that timer IRQs will be received during CPU suspend.
>>>
>>> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
>>> Signed-off-by: Samuel Holland <samuel@sholland.org>
>>> ---
>>>
>>>  drivers/clocksource/timer-riscv.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/clocksource/timer-riscv.c
>>> b/drivers/clocksource/timer-riscv.c
>>> index 1767f8bf2013..593d5a957b69 100644
>>> --- a/drivers/clocksource/timer-riscv.c
>>> +++ b/drivers/clocksource/timer-riscv.c
>>> @@ -34,7 +34,7 @@ static int riscv_clock_next_event(unsigned long delta,
>>>  static unsigned int riscv_clock_event_irq;
>>>  static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
>>>      .name            = "riscv_timer_clockevent",
>>> -    .features        = CLOCK_EVT_FEAT_ONESHOT,
>>> +    .features        = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
> 
> This is listed as being x86-specific in the header, but there's a hanful
> of other ports that enable it for timers as well.  Looks like arm is
> setting this based on DT, which seems reasonable to me: we're working
> around a firmware bug, there should be some way to turn off that
> workaround for firmware that doesn't have the bug. Looks like Intel already
> turns this off when ARAT is supported, which seems to be the case for
> anything modern, so maybe we're just tripping up on some untested behavior here? 
> I'm not sure exactly how we should probe this, but having it only enabled
> when we need the workaround seems like the right way to go.

I opened an issue against the SBI spec about what exactly it requires,
but I got no responses:

https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98

My interpretation of the SBI specification is that it does not require
maintaining any hart-local state across a non-retentive hart suspend.
Unless the SBI spec says the timer must fire during/after suspend, then
there is no firmware bug.

> That said, I'm not actually sure this C3STOP feature does what we want
> given the commit description.  The timers on RISC-V are sort of in this
> odd middle-ground between being per-CPU timers and system timers: the
> time they produce is global (or at least close, due to the mtime
> synchronization rules) but the actual interrupts are only one-shot and
> only local.

And if we cannot rely on the interrupt being delivered, we cannot rely
on the SBI time extension to work across cpuidle entry.

> From poking around the code I think this just tries to
> setup a periodic broadcast timer, but since we use software fallbacks to
> emulate those we'll still end up losing the interrupts/ticks if the CPU
> that was asked for an interrupt has gone to sleep and lost that state.

So by extension, non-retentive cpuidle states cannot be used if the SBI
timer is the only available timer, since there is no hardware broadcast
timer to use as a backup.

> I'm not sure if I'm just misunderstanding what's going on here, though. 
> Is there something that describes the behavior this fixes in more detail?

The motivating scenario for this patch is the C906, where the MTIMER is
in the same reset domain as the CPU, so the timer state is lost during
non-retentive suspend. Without this patch, if riscv_timer_clockevent is
the current clockevent driver, then the CPU fails to wake up from
suspend. However, this same problem would occur on any CPU where the
timer or interrupt delivery stops working during suspend.

>>>      .rating            = 100,
>>>      .set_next_event        = riscv_clock_next_event,
>>>  };
>>
>> After a bit of a painful bisection (with a misdirection into the v5.19
>> printk reverts along the way) I have arrived at this commit for causing
>> me some issues.
>>
>> If an AXI read to the PCIe controller on PolarFire SoC times out, the
>> system will stall, with an expected:
>>      io scheduler mq-deadline registered
>>      io scheduler kyber registered
>>      microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000
>> ranges:
>>      microchip-pcie 2000000000.pcie:      MEM
>> 0x2008000000..0x2087ffffff -> 0x0008000000
>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: axi read request error
>>      microchip-pcie 2000000000.pcie: axi read timeout
>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>      Freeing initrd memory: 7336K
>>      mc_event_handler: 667402 callbacks suppressed
>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>      mc_event_handler: 666588 callbacks suppressed
>> <truncated>
>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>     mc_event_handler: 666748 callbacks suppressed
>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>     rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>     rcu:     0-...0: (1 GPs behind) idle=19f/1/0x4000000000000002
>> softirq=34/36 fqs=2626
>>         (detected by 1, t=5256 jiffies, g=-1151, q=1143 ncpus=4)
>>     Task dump for CPU 0:
>>     task:swapper/0       state:R  running task     stack:    0 pid:   
>> 1 ppid:     0 flags:0x00000008
>>     Call Trace:
>>     mc_event_handler: 666648 callbacks suppressed
>>
>>  With this patch applied, the system just locks up without RCU stalling:
>>     io scheduler mq-deadline registered
>>     io scheduler kyber registered
>>     microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000
>> ranges:
>>     microchip-pcie 2000000000.pcie:      MEM
>> 0x2008000000..0x2087ffffff -> 0x0008000000
>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: axi read request error
>>     microchip-pcie 2000000000.pcie: axi read timeout
>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>     Freeing initrd memory: 7332K
>>
>> As of yet, I have no idea if RCU stalls for other reasons would also be
>> lost.
> 
> Sorry this broke stuff.  I'm not entirely sure why this would mask RCU
> stalls, but it seems like we're hitting some pretty odd paths here and
> I'm not sure this is expected to work at all for us.

I'm confused here. The RCU stall is itself a bug, right? Are you sure
this patch is wrongly masking the stall, or is it possibly just avoiding
some buggy code and not causing a stall in the first place?

Regards,
Samuel

> If non-x86 architectures are meant to be able to set
> CLOCK_EVT_FEAT_C3STOP, maybe we should document what it's supposed to do
> in a more platform-agnostic fashion?
> 
>> Thanks,
>> Conor.
>>
>> git bisect start
>> # status: waiting for both good and bad commits
>> # good: [7699f7aacf3ebfee51c670b6f796b2797f0f7487] RISC-V: Prepare
>> dropping week attribute from arch_kexec_apply_relocations[_add]
>> git bisect good 7699f7aacf3ebfee51c670b6f796b2797f0f7487
>> # bad: [63d5172e148bcc174398040861d867bbd2770be4] HACK: jogness
>> git bisect bad 63d5172e148bcc174398040861d867bbd2770be4
>> # good: [2518f226c60d8e04d18ba4295500a5b0b8ac7659] Merge tag
>> 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
>> git bisect good 2518f226c60d8e04d18ba4295500a5b0b8ac7659
>> # good: [907bb57aa7b471872aab2f2e83e9713a145673f9] Merge tag
>> 'pinctrl-v5.19-1' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>> git bisect good 907bb57aa7b471872aab2f2e83e9713a145673f9
>> # good: [4ad680f083ec360e0991c453e18a38ed9ae500d7] Merge tag
>> 'staging-5.19-rc1' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
>> git bisect good 4ad680f083ec360e0991c453e18a38ed9ae500d7
>> # good: [23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1] Merge tag
>> 'for-5.19/parisc-2' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
>> git bisect good 23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1
>> # bad: [7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0] Merge tag
>> 'gpio-fixes-for-v5.19-rc2' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
>> git bisect bad 7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0
>> # bad: [1f192b9e8d8a5c619b33a868fb1af063af65ce5d] Merge tag
>> 'drm-misc-fixes-2022-06-09' of
>> git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
>> git bisect bad 1f192b9e8d8a5c619b33a868fb1af063af65ce5d
>> # good: [b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8] Merge tag
>> 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>> git bisect good b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8
>> # bad: [e17fee8976c3d2ccf9add6d6c8912a37b025d840] Merge tag
>> 'mm-nonmm-stable-2022-06-05' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>> git bisect bad e17fee8976c3d2ccf9add6d6c8912a37b025d840
>> # bad: [c049ecc523171481accd2c83f79ffeecbf53a915] Merge tag
>> 'timers-core-2022-06-05' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> git bisect bad c049ecc523171481accd2c83f79ffeecbf53a915
>> # bad: [9c04a8ff03def4df3f81219ffbe1ec9b44ff5348]
>> clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
>> git bisect bad 9c04a8ff03def4df3f81219ffbe1ec9b44ff5348
>> # bad: [7160d9c4cce94612d5f42a5db392cd606a38737a]
>> clocksource/drivers/armada-370-xp: Convert to SPDX identifier
>> git bisect bad 7160d9c4cce94612d5f42a5db392cd606a38737a
>> # bad: [a98399cbc1e05f7b977419f03905501d566cf54e]
>> clocksource/drivers/sp804: Avoid error on multiple instances
>> git bisect bad a98399cbc1e05f7b977419f03905501d566cf54e
>> # good: [41929c9f628b9990d33a200c54bb0c919e089aa8]
>> clocksource/drivers/ixp4xx: Drop boardfile probe path
>> git bisect good 41929c9f628b9990d33a200c54bb0c919e089aa8
>> # bad: [232ccac1bd9b5bfe73895f527c08623e7fa0752d]
>> clocksource/drivers/riscv: Events are stopped during CPU suspend
>> git bisect bad 232ccac1bd9b5bfe73895f527c08623e7fa0752d
>> # first bad commit: [232ccac1bd9b5bfe73895f527c08623e7fa0752d]
>> clocksource/drivers/riscv: Events are stopped during CPU suspend
Conor Dooley Oct. 24, 2022, 7:04 a.m. UTC | #7
On 24 October 2022 06:01:24 IST, Samuel Holland <samuel@sholland.org> wrote:
>On 10/9/22 18:45, Palmer Dabbelt wrote:
>> On Thu, 29 Sep 2022 14:50:45 PDT (-0700), Conor Dooley wrote:
>>> On Sun, May 08, 2022 at 08:21:21PM -0500, Samuel Holland wrote:
>>>> Some implementations of the SBI time extension depend on hart-local
>>>> state (for example, CSRs) that are lost or hardware that is powered
>>>> down when a CPU is suspended. To be safe, the clockevents driver
>>>> cannot assume that timer IRQs will be received during CPU suspend.
>>>>
>>>> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
>>>> Signed-off-by: Samuel Holland <samuel@sholland.org>
>>>> ---
>>>>
>>>>  drivers/clocksource/timer-riscv.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/clocksource/timer-riscv.c
>>>> b/drivers/clocksource/timer-riscv.c
>>>> index 1767f8bf2013..593d5a957b69 100644
>>>> --- a/drivers/clocksource/timer-riscv.c
>>>> +++ b/drivers/clocksource/timer-riscv.c
>>>> @@ -34,7 +34,7 @@ static int riscv_clock_next_event(unsigned long delta,
>>>>  static unsigned int riscv_clock_event_irq;
>>>>  static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
>>>>      .name            = "riscv_timer_clockevent",
>>>> -    .features        = CLOCK_EVT_FEAT_ONESHOT,
>>>> +    .features        = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
>> 
>> This is listed as being x86-specific in the header, but there's a hanful
>> of other ports that enable it for timers as well.  Looks like arm is
>> setting this based on DT, which seems reasonable to me: we're working
>> around a firmware bug, there should be some way to turn off that
>> workaround for firmware that doesn't have the bug. Looks like Intel already
>> turns this off when ARAT is supported, which seems to be the case for
>> anything modern, so maybe we're just tripping up on some untested behavior here? 
>> I'm not sure exactly how we should probe this, but having it only enabled
>> when we need the workaround seems like the right way to go.
>
>I opened an issue against the SBI spec about what exactly it requires,
>but I got no responses:
>
>https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98
>
>My interpretation of the SBI specification is that it does not require
>maintaining any hart-local state across a non-retentive hart suspend.
>Unless the SBI spec says the timer must fire during/after suspend, then
>there is no firmware bug.
>
>> That said, I'm not actually sure this C3STOP feature does what we want
>> given the commit description.  The timers on RISC-V are sort of in this
>> odd middle-ground between being per-CPU timers and system timers: the
>> time they produce is global (or at least close, due to the mtime
>> synchronization rules) but the actual interrupts are only one-shot and
>> only local.
>
>And if we cannot rely on the interrupt being delivered, we cannot rely
>on the SBI time extension to work across cpuidle entry.
>
>> From poking around the code I think this just tries to
>> setup a periodic broadcast timer, but since we use software fallbacks to
>> emulate those we'll still end up losing the interrupts/ticks if the CPU
>> that was asked for an interrupt has gone to sleep and lost that state.
>
>So by extension, non-retentive cpuidle states cannot be used if the SBI
>timer is the only available timer, since there is no hardware broadcast
>timer to use as a backup.
>
>> I'm not sure if I'm just misunderstanding what's going on here, though. 
>> Is there something that describes the behavior this fixes in more detail?
>
>The motivating scenario for this patch is the C906, where the MTIMER is
>in the same reset domain as the CPU, so the timer state is lost during
>non-retentive suspend. Without this patch, if riscv_timer_clockevent is
>the current clockevent driver, then the CPU fails to wake up from
>suspend. However, this same problem would occur on any CPU where the
>timer or interrupt delivery stops working during suspend.
>
>>>>      .rating            = 100,
>>>>      .set_next_event        = riscv_clock_next_event,
>>>>  };
>>>
>>> After a bit of a painful bisection (with a misdirection into the v5.19
>>> printk reverts along the way) I have arrived at this commit for causing
>>> me some issues.
>>>
>>> If an AXI read to the PCIe controller on PolarFire SoC times out, the
>>> system will stall, with an expected:
>>>      io scheduler mq-deadline registered
>>>      io scheduler kyber registered
>>>      microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000
>>> ranges:
>>>      microchip-pcie 2000000000.pcie:      MEM
>>> 0x2008000000..0x2087ffffff -> 0x0008000000
>>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: axi read request error
>>>      microchip-pcie 2000000000.pcie: axi read timeout
>>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>      Freeing initrd memory: 7336K
>>>      mc_event_handler: 667402 callbacks suppressed
>>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>      mc_event_handler: 666588 callbacks suppressed
>>> <truncated>
>>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>     mc_event_handler: 666748 callbacks suppressed
>>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>     rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>>     rcu:     0-...0: (1 GPs behind) idle=19f/1/0x4000000000000002
>>> softirq=34/36 fqs=2626
>>>         (detected by 1, t=5256 jiffies, g=-1151, q=1143 ncpus=4)
>>>     Task dump for CPU 0:
>>>     task:swapper/0       state:R  running task     stack:    0 pid:   
>>> 1 ppid:     0 flags:0x00000008
>>>     Call Trace:
>>>     mc_event_handler: 666648 callbacks suppressed
>>>
>>>  With this patch applied, the system just locks up without RCU stalling:
>>>     io scheduler mq-deadline registered
>>>     io scheduler kyber registered
>>>     microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000
>>> ranges:
>>>     microchip-pcie 2000000000.pcie:      MEM
>>> 0x2008000000..0x2087ffffff -> 0x0008000000
>>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: axi read request error
>>>     microchip-pcie 2000000000.pcie: axi read timeout
>>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>>     Freeing initrd memory: 7332K
>>>
>>> As of yet, I have no idea if RCU stalls for other reasons would also be
>>> lost.
>> 
>> Sorry this broke stuff.  I'm not entirely sure why this would mask RCU
>> stalls, but it seems like we're hitting some pretty odd paths here and
>> I'm not sure this is expected to work at all for us.
>
>I'm confused here. The RCU stall is itself a bug, right? Are you sure
>this patch is wrongly masking the stall, or is it possibly just avoiding
>some buggy code and not causing a stall in the first place?

(Apologies for not truncating the message, phone...)
The lockup is caused by a bad PCI root port sending many error interrupts.
This patch doesn't cause the lockup, that's caused by a problem in the hardware.
Instead it causes the RCU stall detection to not trigger.
Without this patch, the stall detection kicks in and with this patch the system locks up and never progresses.

Hope that makes it clearer? 

>
>Regards,
>Samuel
>
>> If non-x86 architectures are meant to be able to set
>> CLOCK_EVT_FEAT_C3STOP, maybe we should document what it's supposed to do
>> in a more platform-agnostic fashion?
>> 
>>> Thanks,
>>> Conor.
>>>
>>> git bisect start
>>> # status: waiting for both good and bad commits
>>> # good: [7699f7aacf3ebfee51c670b6f796b2797f0f7487] RISC-V: Prepare
>>> dropping week attribute from arch_kexec_apply_relocations[_add]
>>> git bisect good 7699f7aacf3ebfee51c670b6f796b2797f0f7487
>>> # bad: [63d5172e148bcc174398040861d867bbd2770be4] HACK: jogness
>>> git bisect bad 63d5172e148bcc174398040861d867bbd2770be4
>>> # good: [2518f226c60d8e04d18ba4295500a5b0b8ac7659] Merge tag
>>> 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
>>> git bisect good 2518f226c60d8e04d18ba4295500a5b0b8ac7659
>>> # good: [907bb57aa7b471872aab2f2e83e9713a145673f9] Merge tag
>>> 'pinctrl-v5.19-1' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>>> git bisect good 907bb57aa7b471872aab2f2e83e9713a145673f9
>>> # good: [4ad680f083ec360e0991c453e18a38ed9ae500d7] Merge tag
>>> 'staging-5.19-rc1' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
>>> git bisect good 4ad680f083ec360e0991c453e18a38ed9ae500d7
>>> # good: [23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1] Merge tag
>>> 'for-5.19/parisc-2' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
>>> git bisect good 23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1
>>> # bad: [7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0] Merge tag
>>> 'gpio-fixes-for-v5.19-rc2' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
>>> git bisect bad 7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0
>>> # bad: [1f192b9e8d8a5c619b33a868fb1af063af65ce5d] Merge tag
>>> 'drm-misc-fixes-2022-06-09' of
>>> git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
>>> git bisect bad 1f192b9e8d8a5c619b33a868fb1af063af65ce5d
>>> # good: [b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8] Merge tag
>>> 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>>> git bisect good b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8
>>> # bad: [e17fee8976c3d2ccf9add6d6c8912a37b025d840] Merge tag
>>> 'mm-nonmm-stable-2022-06-05' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>>> git bisect bad e17fee8976c3d2ccf9add6d6c8912a37b025d840
>>> # bad: [c049ecc523171481accd2c83f79ffeecbf53a915] Merge tag
>>> 'timers-core-2022-06-05' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>> git bisect bad c049ecc523171481accd2c83f79ffeecbf53a915
>>> # bad: [9c04a8ff03def4df3f81219ffbe1ec9b44ff5348]
>>> clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
>>> git bisect bad 9c04a8ff03def4df3f81219ffbe1ec9b44ff5348
>>> # bad: [7160d9c4cce94612d5f42a5db392cd606a38737a]
>>> clocksource/drivers/armada-370-xp: Convert to SPDX identifier
>>> git bisect bad 7160d9c4cce94612d5f42a5db392cd606a38737a
>>> # bad: [a98399cbc1e05f7b977419f03905501d566cf54e]
>>> clocksource/drivers/sp804: Avoid error on multiple instances
>>> git bisect bad a98399cbc1e05f7b977419f03905501d566cf54e
>>> # good: [41929c9f628b9990d33a200c54bb0c919e089aa8]
>>> clocksource/drivers/ixp4xx: Drop boardfile probe path
>>> git bisect good 41929c9f628b9990d33a200c54bb0c919e089aa8
>>> # bad: [232ccac1bd9b5bfe73895f527c08623e7fa0752d]
>>> clocksource/drivers/riscv: Events are stopped during CPU suspend
>>> git bisect bad 232ccac1bd9b5bfe73895f527c08623e7fa0752d
>>> # first bad commit: [232ccac1bd9b5bfe73895f527c08623e7fa0752d]
>>> clocksource/drivers/riscv: Events are stopped during CPU suspend
>
Anup Patel Oct. 24, 2022, 8:04 a.m. UTC | #8
On Mon, Oct 24, 2022 at 10:31 AM Samuel Holland <samuel@sholland.org> wrote:
>
> On 10/9/22 18:45, Palmer Dabbelt wrote:
> > On Thu, 29 Sep 2022 14:50:45 PDT (-0700), Conor Dooley wrote:
> >> On Sun, May 08, 2022 at 08:21:21PM -0500, Samuel Holland wrote:
> >>> Some implementations of the SBI time extension depend on hart-local
> >>> state (for example, CSRs) that are lost or hardware that is powered
> >>> down when a CPU is suspended. To be safe, the clockevents driver
> >>> cannot assume that timer IRQs will be received during CPU suspend.
> >>>
> >>> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
> >>> Signed-off-by: Samuel Holland <samuel@sholland.org>
> >>> ---
> >>>
> >>>  drivers/clocksource/timer-riscv.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/clocksource/timer-riscv.c
> >>> b/drivers/clocksource/timer-riscv.c
> >>> index 1767f8bf2013..593d5a957b69 100644
> >>> --- a/drivers/clocksource/timer-riscv.c
> >>> +++ b/drivers/clocksource/timer-riscv.c
> >>> @@ -34,7 +34,7 @@ static int riscv_clock_next_event(unsigned long delta,
> >>>  static unsigned int riscv_clock_event_irq;
> >>>  static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
> >>>      .name            = "riscv_timer_clockevent",
> >>> -    .features        = CLOCK_EVT_FEAT_ONESHOT,
> >>> +    .features        = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
> >
> > This is listed as being x86-specific in the header, but there's a hanful
> > of other ports that enable it for timers as well.  Looks like arm is
> > setting this based on DT, which seems reasonable to me: we're working
> > around a firmware bug, there should be some way to turn off that
> > workaround for firmware that doesn't have the bug. Looks like Intel already
> > turns this off when ARAT is supported, which seems to be the case for
> > anything modern, so maybe we're just tripping up on some untested behavior here?
> > I'm not sure exactly how we should probe this, but having it only enabled
> > when we need the workaround seems like the right way to go.
>
> I opened an issue against the SBI spec about what exactly it requires,
> but I got no responses:
>
> https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98
>
> My interpretation of the SBI specification is that it does not require
> maintaining any hart-local state across a non-retentive hart suspend.
> Unless the SBI spec says the timer must fire during/after suspend, then
> there is no firmware bug.

SBI spec only defines the mechanism to enter HART suspend state. All
other details (such as timer interrupt firing during/after suspend) are platform
or implementation specific which needs to be discovered from DT/ACPI.

>
> > That said, I'm not actually sure this C3STOP feature does what we want
> > given the commit description.  The timers on RISC-V are sort of in this
> > odd middle-ground between being per-CPU timers and system timers: the
> > time they produce is global (or at least close, due to the mtime
> > synchronization rules) but the actual interrupts are only one-shot and
> > only local.
>
> And if we cannot rely on the interrupt being delivered, we cannot rely
> on the SBI time extension to work across cpuidle entry.

Just like ARM, we need a DT property to discover this platform specific
behaviour.

I had sent out a small series to do this for DT.
Refer, "[PATCH v2 0/2] Improve CLOCK_EVT_FEAT_C3STOP feature setting"
https://lore.kernel.org/all/20220727114302.302201-1-apatel@ventanamicro.com/

>
> > From poking around the code I think this just tries to
> > setup a periodic broadcast timer, but since we use software fallbacks to
> > emulate those we'll still end up losing the interrupts/ticks if the CPU
> > that was asked for an interrupt has gone to sleep and lost that state.
>
> So by extension, non-retentive cpuidle states cannot be used if the SBI
> timer is the only available timer, since there is no hardware broadcast
> timer to use as a backup.
>
> > I'm not sure if I'm just misunderstanding what's going on here, though.
> > Is there something that describes the behavior this fixes in more detail?
>
> The motivating scenario for this patch is the C906, where the MTIMER is
> in the same reset domain as the CPU, so the timer state is lost during
> non-retentive suspend. Without this patch, if riscv_timer_clockevent is
> the current clockevent driver, then the CPU fails to wake up from
> suspend. However, this same problem would occur on any CPU where the
> timer or interrupt delivery stops working during suspend.

Yes, I recall CLOCK_EVT_FEAT_C3STOP was added for C906 but
we should go the DT/ACPI route.

Regards,
Anup

>
> >>>      .rating            = 100,
> >>>      .set_next_event        = riscv_clock_next_event,
> >>>  };
> >>
> >> After a bit of a painful bisection (with a misdirection into the v5.19
> >> printk reverts along the way) I have arrived at this commit for causing
> >> me some issues.
> >>
> >> If an AXI read to the PCIe controller on PolarFire SoC times out, the
> >> system will stall, with an expected:
> >>      io scheduler mq-deadline registered
> >>      io scheduler kyber registered
> >>      microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000
> >> ranges:
> >>      microchip-pcie 2000000000.pcie:      MEM
> >> 0x2008000000..0x2087ffffff -> 0x0008000000
> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: axi read request error
> >>      microchip-pcie 2000000000.pcie: axi read timeout
> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>      Freeing initrd memory: 7336K
> >>      mc_event_handler: 667402 callbacks suppressed
> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>      mc_event_handler: 666588 callbacks suppressed
> >> <truncated>
> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>     mc_event_handler: 666748 callbacks suppressed
> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>     rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> >>     rcu:     0-...0: (1 GPs behind) idle=19f/1/0x4000000000000002
> >> softirq=34/36 fqs=2626
> >>         (detected by 1, t=5256 jiffies, g=-1151, q=1143 ncpus=4)
> >>     Task dump for CPU 0:
> >>     task:swapper/0       state:R  running task     stack:    0 pid:
> >> 1 ppid:     0 flags:0x00000008
> >>     Call Trace:
> >>     mc_event_handler: 666648 callbacks suppressed
> >>
> >>  With this patch applied, the system just locks up without RCU stalling:
> >>     io scheduler mq-deadline registered
> >>     io scheduler kyber registered
> >>     microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000
> >> ranges:
> >>     microchip-pcie 2000000000.pcie:      MEM
> >> 0x2008000000..0x2087ffffff -> 0x0008000000
> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: axi read request error
> >>     microchip-pcie 2000000000.pcie: axi read timeout
> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
> >>     Freeing initrd memory: 7332K
> >>
> >> As of yet, I have no idea if RCU stalls for other reasons would also be
> >> lost.
> >
> > Sorry this broke stuff.  I'm not entirely sure why this would mask RCU
> > stalls, but it seems like we're hitting some pretty odd paths here and
> > I'm not sure this is expected to work at all for us.
>
> I'm confused here. The RCU stall is itself a bug, right? Are you sure
> this patch is wrongly masking the stall, or is it possibly just avoiding
> some buggy code and not causing a stall in the first place?
>
> Regards,
> Samuel
>
> > If non-x86 architectures are meant to be able to set
> > CLOCK_EVT_FEAT_C3STOP, maybe we should document what it's supposed to do
> > in a more platform-agnostic fashion?
> >
> >> Thanks,
> >> Conor.
> >>
> >> git bisect start
> >> # status: waiting for both good and bad commits
> >> # good: [7699f7aacf3ebfee51c670b6f796b2797f0f7487] RISC-V: Prepare
> >> dropping week attribute from arch_kexec_apply_relocations[_add]
> >> git bisect good 7699f7aacf3ebfee51c670b6f796b2797f0f7487
> >> # bad: [63d5172e148bcc174398040861d867bbd2770be4] HACK: jogness
> >> git bisect bad 63d5172e148bcc174398040861d867bbd2770be4
> >> # good: [2518f226c60d8e04d18ba4295500a5b0b8ac7659] Merge tag
> >> 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
> >> git bisect good 2518f226c60d8e04d18ba4295500a5b0b8ac7659
> >> # good: [907bb57aa7b471872aab2f2e83e9713a145673f9] Merge tag
> >> 'pinctrl-v5.19-1' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
> >> git bisect good 907bb57aa7b471872aab2f2e83e9713a145673f9
> >> # good: [4ad680f083ec360e0991c453e18a38ed9ae500d7] Merge tag
> >> 'staging-5.19-rc1' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> >> git bisect good 4ad680f083ec360e0991c453e18a38ed9ae500d7
> >> # good: [23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1] Merge tag
> >> 'for-5.19/parisc-2' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
> >> git bisect good 23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1
> >> # bad: [7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0] Merge tag
> >> 'gpio-fixes-for-v5.19-rc2' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
> >> git bisect bad 7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0
> >> # bad: [1f192b9e8d8a5c619b33a868fb1af063af65ce5d] Merge tag
> >> 'drm-misc-fixes-2022-06-09' of
> >> git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
> >> git bisect bad 1f192b9e8d8a5c619b33a868fb1af063af65ce5d
> >> # good: [b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8] Merge tag
> >> 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
> >> git bisect good b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8
> >> # bad: [e17fee8976c3d2ccf9add6d6c8912a37b025d840] Merge tag
> >> 'mm-nonmm-stable-2022-06-05' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> >> git bisect bad e17fee8976c3d2ccf9add6d6c8912a37b025d840
> >> # bad: [c049ecc523171481accd2c83f79ffeecbf53a915] Merge tag
> >> 'timers-core-2022-06-05' of
> >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> >> git bisect bad c049ecc523171481accd2c83f79ffeecbf53a915
> >> # bad: [9c04a8ff03def4df3f81219ffbe1ec9b44ff5348]
> >> clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
> >> git bisect bad 9c04a8ff03def4df3f81219ffbe1ec9b44ff5348
> >> # bad: [7160d9c4cce94612d5f42a5db392cd606a38737a]
> >> clocksource/drivers/armada-370-xp: Convert to SPDX identifier
> >> git bisect bad 7160d9c4cce94612d5f42a5db392cd606a38737a
> >> # bad: [a98399cbc1e05f7b977419f03905501d566cf54e]
> >> clocksource/drivers/sp804: Avoid error on multiple instances
> >> git bisect bad a98399cbc1e05f7b977419f03905501d566cf54e
> >> # good: [41929c9f628b9990d33a200c54bb0c919e089aa8]
> >> clocksource/drivers/ixp4xx: Drop boardfile probe path
> >> git bisect good 41929c9f628b9990d33a200c54bb0c919e089aa8
> >> # bad: [232ccac1bd9b5bfe73895f527c08623e7fa0752d]
> >> clocksource/drivers/riscv: Events are stopped during CPU suspend
> >> git bisect bad 232ccac1bd9b5bfe73895f527c08623e7fa0752d
> >> # first bad commit: [232ccac1bd9b5bfe73895f527c08623e7fa0752d]
> >> clocksource/drivers/riscv: Events are stopped during CPU suspend
>
Palmer Dabbelt Oct. 27, 2022, 11:07 p.m. UTC | #9
On Mon, 24 Oct 2022 01:04:55 PDT (-0700), apatel@ventanamicro.com wrote:
> On Mon, Oct 24, 2022 at 10:31 AM Samuel Holland <samuel@sholland.org> wrote:
>>
>> On 10/9/22 18:45, Palmer Dabbelt wrote:
>> > On Thu, 29 Sep 2022 14:50:45 PDT (-0700), Conor Dooley wrote:
>> >> On Sun, May 08, 2022 at 08:21:21PM -0500, Samuel Holland wrote:
>> >>> Some implementations of the SBI time extension depend on hart-local
>> >>> state (for example, CSRs) that are lost or hardware that is powered
>> >>> down when a CPU is suspended. To be safe, the clockevents driver
>> >>> cannot assume that timer IRQs will be received during CPU suspend.
>> >>>
>> >>> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
>> >>> Signed-off-by: Samuel Holland <samuel@sholland.org>
>> >>> ---
>> >>>
>> >>>  drivers/clocksource/timer-riscv.c | 2 +-
>> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>>
>> >>> diff --git a/drivers/clocksource/timer-riscv.c
>> >>> b/drivers/clocksource/timer-riscv.c
>> >>> index 1767f8bf2013..593d5a957b69 100644
>> >>> --- a/drivers/clocksource/timer-riscv.c
>> >>> +++ b/drivers/clocksource/timer-riscv.c
>> >>> @@ -34,7 +34,7 @@ static int riscv_clock_next_event(unsigned long delta,
>> >>>  static unsigned int riscv_clock_event_irq;
>> >>>  static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
>> >>>      .name            = "riscv_timer_clockevent",
>> >>> -    .features        = CLOCK_EVT_FEAT_ONESHOT,
>> >>> +    .features        = CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
>> >
>> > This is listed as being x86-specific in the header, but there's a hanful
>> > of other ports that enable it for timers as well.  Looks like arm is
>> > setting this based on DT, which seems reasonable to me: we're working
>> > around a firmware bug, there should be some way to turn off that
>> > workaround for firmware that doesn't have the bug. Looks like Intel already
>> > turns this off when ARAT is supported, which seems to be the case for
>> > anything modern, so maybe we're just tripping up on some untested behavior here?
>> > I'm not sure exactly how we should probe this, but having it only enabled
>> > when we need the workaround seems like the right way to go.
>>
>> I opened an issue against the SBI spec about what exactly it requires,
>> but I got no responses:
>>
>> https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98
>>
>> My interpretation of the SBI specification is that it does not require
>> maintaining any hart-local state across a non-retentive hart suspend.
>> Unless the SBI spec says the timer must fire during/after suspend, then
>> there is no firmware bug.

IMO this is a grey area in the spec: it says both "The hart will 
automatically come out of suspended state and resume normal execution 
when it receives an interrupt or platform specific hardware event." but 
"sstatus.SIE = 0" on resume.  It's not clear _when_ "sstatus.SIE" must 
take the value 0 (ie, is it before sleeping or after taking up) and if 
"receives an interrupt" means _any_ interrupt, or just enabled 
interrupts.

I agree we can't say it's a firmware bug, though.  There's certainly 
some reading of the spec that allows for this -- even if there wasn't 
we'd still have to live with whatever the firmware does, but here I 
think it's just a Linux bug.

> SBI spec only defines the mechanism to enter HART suspend state. All
> other details (such as timer interrupt firing during/after suspend) are platform
> or implementation specific which needs to be discovered from DT/ACPI.

From that bug it sounds like it's really platform-specific whether or 
not it's possible to wake up from non-retentive suspend, so maybe we 
should just add some sort of DT node that says "non-retentive suspend 
works" and then only use it on those systems?

>> > That said, I'm not actually sure this C3STOP feature does what we want
>> > given the commit description.  The timers on RISC-V are sort of in this
>> > odd middle-ground between being per-CPU timers and system timers: the
>> > time they produce is global (or at least close, due to the mtime
>> > synchronization rules) but the actual interrupts are only one-shot and
>> > only local.
>>
>> And if we cannot rely on the interrupt being delivered, we cannot rely
>> on the SBI time extension to work across cpuidle entry.
>
> Just like ARM, we need a DT property to discover this platform specific
> behaviour.
>
> I had sent out a small series to do this for DT.
> Refer, "[PATCH v2 0/2] Improve CLOCK_EVT_FEAT_C3STOP feature setting"
> https://lore.kernel.org/all/20220727114302.302201-1-apatel@ventanamicro.com/

I broadly agree with the "we should split out the timer node" stuff 
there.  It used to be part of the core, but it's not any more.  Looks 
like that was the hangup, though I'm not sure setting C3STOP is even the 
right fix any more...

>> > From poking around the code I think this just tries to
>> > setup a periodic broadcast timer, but since we use software fallbacks to
>> > emulate those we'll still end up losing the interrupts/ticks if the CPU
>> > that was asked for an interrupt has gone to sleep and lost that state.
>>
>> So by extension, non-retentive cpuidle states cannot be used if the SBI
>> timer is the only available timer, since there is no hardware broadcast
>> timer to use as a backup.
>>
>> > I'm not sure if I'm just misunderstanding what's going on here, though.
>> > Is there something that describes the behavior this fixes in more detail?
>>
>> The motivating scenario for this patch is the C906, where the MTIMER is
>> in the same reset domain as the CPU, so the timer state is lost during
>> non-retentive suspend. Without this patch, if riscv_timer_clockevent is
>> the current clockevent driver, then the CPU fails to wake up from
>> suspend. However, this same problem would occur on any CPU where the
>> timer or interrupt delivery stops working during suspend.
>
> Yes, I recall CLOCK_EVT_FEAT_C3STOP was added for C906 but
> we should go the DT/ACPI route.

I agree, though I think this should disable non-retentive suspend as 
opposed to setting C3STOP.

> Regards,
> Anup
>
>>
>> >>>      .rating            = 100,
>> >>>      .set_next_event        = riscv_clock_next_event,
>> >>>  };
>> >>
>> >> After a bit of a painful bisection (with a misdirection into the v5.19
>> >> printk reverts along the way) I have arrived at this commit for causing
>> >> me some issues.
>> >>
>> >> If an AXI read to the PCIe controller on PolarFire SoC times out, the
>> >> system will stall, with an expected:
>> >>      io scheduler mq-deadline registered
>> >>      io scheduler kyber registered
>> >>      microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000
>> >> ranges:
>> >>      microchip-pcie 2000000000.pcie:      MEM
>> >> 0x2008000000..0x2087ffffff -> 0x0008000000
>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: axi read request error
>> >>      microchip-pcie 2000000000.pcie: axi read timeout
>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>      Freeing initrd memory: 7336K
>> >>      mc_event_handler: 667402 callbacks suppressed
>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>      mc_event_handler: 666588 callbacks suppressed
>> >> <truncated>
>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>     mc_event_handler: 666748 callbacks suppressed
>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>     rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> >>     rcu:     0-...0: (1 GPs behind) idle=19f/1/0x4000000000000002
>> >> softirq=34/36 fqs=2626
>> >>         (detected by 1, t=5256 jiffies, g=-1151, q=1143 ncpus=4)
>> >>     Task dump for CPU 0:
>> >>     task:swapper/0       state:R  running task     stack:    0 pid:
>> >> 1 ppid:     0 flags:0x00000008
>> >>     Call Trace:
>> >>     mc_event_handler: 666648 callbacks suppressed
>> >>
>> >>  With this patch applied, the system just locks up without RCU stalling:
>> >>     io scheduler mq-deadline registered
>> >>     io scheduler kyber registered
>> >>     microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000
>> >> ranges:
>> >>     microchip-pcie 2000000000.pcie:      MEM
>> >> 0x2008000000..0x2087ffffff -> 0x0008000000
>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: axi read request error
>> >>     microchip-pcie 2000000000.pcie: axi read timeout
>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>> >>     Freeing initrd memory: 7332K
>> >>
>> >> As of yet, I have no idea if RCU stalls for other reasons would also be
>> >> lost.
>> >
>> > Sorry this broke stuff.  I'm not entirely sure why this would mask RCU
>> > stalls, but it seems like we're hitting some pretty odd paths here and
>> > I'm not sure this is expected to work at all for us.
>>
>> I'm confused here. The RCU stall is itself a bug, right? Are you sure
>> this patch is wrongly masking the stall, or is it possibly just avoiding
>> some buggy code and not causing a stall in the first place?
>>
>> Regards,
>> Samuel
>>
>> > If non-x86 architectures are meant to be able to set
>> > CLOCK_EVT_FEAT_C3STOP, maybe we should document what it's supposed to do
>> > in a more platform-agnostic fashion?
>> >
>> >> Thanks,
>> >> Conor.
>> >>
>> >> git bisect start
>> >> # status: waiting for both good and bad commits
>> >> # good: [7699f7aacf3ebfee51c670b6f796b2797f0f7487] RISC-V: Prepare
>> >> dropping week attribute from arch_kexec_apply_relocations[_add]
>> >> git bisect good 7699f7aacf3ebfee51c670b6f796b2797f0f7487
>> >> # bad: [63d5172e148bcc174398040861d867bbd2770be4] HACK: jogness
>> >> git bisect bad 63d5172e148bcc174398040861d867bbd2770be4
>> >> # good: [2518f226c60d8e04d18ba4295500a5b0b8ac7659] Merge tag
>> >> 'drm-next-2022-05-25' of git://anongit.freedesktop.org/drm/drm
>> >> git bisect good 2518f226c60d8e04d18ba4295500a5b0b8ac7659
>> >> # good: [907bb57aa7b471872aab2f2e83e9713a145673f9] Merge tag
>> >> 'pinctrl-v5.19-1' of
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
>> >> git bisect good 907bb57aa7b471872aab2f2e83e9713a145673f9
>> >> # good: [4ad680f083ec360e0991c453e18a38ed9ae500d7] Merge tag
>> >> 'staging-5.19-rc1' of
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
>> >> git bisect good 4ad680f083ec360e0991c453e18a38ed9ae500d7
>> >> # good: [23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1] Merge tag
>> >> 'for-5.19/parisc-2' of
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
>> >> git bisect good 23df9ba64bb9e26cfee6b34f5c3ece49a8a61ee1
>> >> # bad: [7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0] Merge tag
>> >> 'gpio-fixes-for-v5.19-rc2' of
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
>> >> git bisect bad 7a68065eb9cd194cf03f135c9211eeb2d5c4c0a0
>> >> # bad: [1f192b9e8d8a5c619b33a868fb1af063af65ce5d] Merge tag
>> >> 'drm-misc-fixes-2022-06-09' of
>> >> git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
>> >> git bisect bad 1f192b9e8d8a5c619b33a868fb1af063af65ce5d
>> >> # good: [b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8] Merge tag
>> >> 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>> >> git bisect good b2c9a83d262a8feb022e24e9f9aadb66cb10a7a8
>> >> # bad: [e17fee8976c3d2ccf9add6d6c8912a37b025d840] Merge tag
>> >> 'mm-nonmm-stable-2022-06-05' of
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
>> >> git bisect bad e17fee8976c3d2ccf9add6d6c8912a37b025d840
>> >> # bad: [c049ecc523171481accd2c83f79ffeecbf53a915] Merge tag
>> >> 'timers-core-2022-06-05' of
>> >> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>> >> git bisect bad c049ecc523171481accd2c83f79ffeecbf53a915
>> >> # bad: [9c04a8ff03def4df3f81219ffbe1ec9b44ff5348]
>> >> clocksource/drivers/oxnas-rps: Fix irq_of_parse_and_map() return value
>> >> git bisect bad 9c04a8ff03def4df3f81219ffbe1ec9b44ff5348
>> >> # bad: [7160d9c4cce94612d5f42a5db392cd606a38737a]
>> >> clocksource/drivers/armada-370-xp: Convert to SPDX identifier
>> >> git bisect bad 7160d9c4cce94612d5f42a5db392cd606a38737a
>> >> # bad: [a98399cbc1e05f7b977419f03905501d566cf54e]
>> >> clocksource/drivers/sp804: Avoid error on multiple instances
>> >> git bisect bad a98399cbc1e05f7b977419f03905501d566cf54e
>> >> # good: [41929c9f628b9990d33a200c54bb0c919e089aa8]
>> >> clocksource/drivers/ixp4xx: Drop boardfile probe path
>> >> git bisect good 41929c9f628b9990d33a200c54bb0c919e089aa8
>> >> # bad: [232ccac1bd9b5bfe73895f527c08623e7fa0752d]
>> >> clocksource/drivers/riscv: Events are stopped during CPU suspend
>> >> git bisect bad 232ccac1bd9b5bfe73895f527c08623e7fa0752d
>> >> # first bad commit: [232ccac1bd9b5bfe73895f527c08623e7fa0752d]
>> >> clocksource/drivers/riscv: Events are stopped during CPU suspend
>>
Samuel Holland Oct. 27, 2022, 11:41 p.m. UTC | #10
On 10/27/22 18:07, Palmer Dabbelt wrote:
> On Mon, 24 Oct 2022 01:04:55 PDT (-0700), apatel@ventanamicro.com wrote:
>> On Mon, Oct 24, 2022 at 10:31 AM Samuel Holland <samuel@sholland.org>
>> wrote:
>>>
>>> On 10/9/22 18:45, Palmer Dabbelt wrote:
>>> > On Thu, 29 Sep 2022 14:50:45 PDT (-0700), Conor Dooley wrote:
>>> >> On Sun, May 08, 2022 at 08:21:21PM -0500, Samuel Holland wrote:
>>> >>> Some implementations of the SBI time extension depend on hart-local
>>> >>> state (for example, CSRs) that are lost or hardware that is powered
>>> >>> down when a CPU is suspended. To be safe, the clockevents driver
>>> >>> cannot assume that timer IRQs will be received during CPU suspend.
>>> >>>
>>> >>> Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
>>> >>> Signed-off-by: Samuel Holland <samuel@sholland.org>
>>> >>> ---
>>> >>>
>>> >>>  drivers/clocksource/timer-riscv.c | 2 +-
>>> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>> >>>
>>> >>> diff --git a/drivers/clocksource/timer-riscv.c
>>> >>> b/drivers/clocksource/timer-riscv.c
>>> >>> index 1767f8bf2013..593d5a957b69 100644
>>> >>> --- a/drivers/clocksource/timer-riscv.c
>>> >>> +++ b/drivers/clocksource/timer-riscv.c
>>> >>> @@ -34,7 +34,7 @@ static int riscv_clock_next_event(unsigned long
>>> delta,
>>> >>>  static unsigned int riscv_clock_event_irq;
>>> >>>  static DEFINE_PER_CPU(struct clock_event_device,
>>> riscv_clock_event) = {
>>> >>>      .name            = "riscv_timer_clockevent",
>>> >>> -    .features        = CLOCK_EVT_FEAT_ONESHOT,
>>> >>> +    .features        = CLOCK_EVT_FEAT_ONESHOT |
>>> CLOCK_EVT_FEAT_C3STOP,
>>> >
>>> > This is listed as being x86-specific in the header, but there's a
>>> hanful
>>> > of other ports that enable it for timers as well.  Looks like arm is
>>> > setting this based on DT, which seems reasonable to me: we're working
>>> > around a firmware bug, there should be some way to turn off that
>>> > workaround for firmware that doesn't have the bug. Looks like Intel
>>> already
>>> > turns this off when ARAT is supported, which seems to be the case for
>>> > anything modern, so maybe we're just tripping up on some untested
>>> behavior here?
>>> > I'm not sure exactly how we should probe this, but having it only
>>> enabled
>>> > when we need the workaround seems like the right way to go.
>>>
>>> I opened an issue against the SBI spec about what exactly it requires,
>>> but I got no responses:
>>>
>>> https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98
>>>
>>> My interpretation of the SBI specification is that it does not require
>>> maintaining any hart-local state across a non-retentive hart suspend.
>>> Unless the SBI spec says the timer must fire during/after suspend, then
>>> there is no firmware bug.
> 
> IMO this is a grey area in the spec: it says both "The hart will
> automatically come out of suspended state and resume normal execution
> when it receives an interrupt or platform specific hardware event." but
> "sstatus.SIE = 0" on resume.  It's not clear _when_ "sstatus.SIE" must
> take the value 0 (ie, is it before sleeping or after taking up) and if
> "receives an interrupt" means _any_ interrupt, or just enabled interrupts.
> 
> I agree we can't say it's a firmware bug, though.  There's certainly
> some reading of the spec that allows for this -- even if there wasn't
> we'd still have to live with whatever the firmware does, but here I
> think it's just a Linux bug.
> 
>> SBI spec only defines the mechanism to enter HART suspend state. All
>> other details (such as timer interrupt firing during/after suspend)
>> are platform
>> or implementation specific which needs to be discovered from DT/ACPI.
> 
> From that bug it sounds like it's really platform-specific whether or
> not it's possible to wake up from non-retentive suspend, so maybe we
> should just add some sort of DT node that says "non-retentive suspend
> works" and then only use it on those systems?

No, this isn't about whether non-retentive suspend works generally. The
only behavior in question is specifically whether or not the _SBI timer
extension_ can wake from non-retentive suspend.

For example, on Allwinner D1, all PLIC inputs can wake from
non-retentive suspend. This includes several MMIO timers. But notably it
does not include the CLINT. And the CLINT is what OpenSBI chooses as the
backend for the SBI timer extension.

You could make an argument that OpenSBI should use one of those other
MMIO timers, and then the C3STOP flag would not be needed, but the SBI
spec does not _require_ doing this.

And (in the absence of a DT/ACPI property declaring otherwise), the
driver can only rely on what the standard requires.

>>> > That said, I'm not actually sure this C3STOP feature does what we want
>>> > given the commit description.  The timers on RISC-V are sort of in
>>> this
>>> > odd middle-ground between being per-CPU timers and system timers: the
>>> > time they produce is global (or at least close, due to the mtime
>>> > synchronization rules) but the actual interrupts are only one-shot and
>>> > only local.
>>>
>>> And if we cannot rely on the interrupt being delivered, we cannot rely
>>> on the SBI time extension to work across cpuidle entry.
>>
>> Just like ARM, we need a DT property to discover this platform specific
>> behaviour.
>>
>> I had sent out a small series to do this for DT.
>> Refer, "[PATCH v2 0/2] Improve CLOCK_EVT_FEAT_C3STOP feature setting"
>> https://lore.kernel.org/all/20220727114302.302201-1-apatel@ventanamicro.com/

This is a reasonable solution for me, if we can agree on the binding.

> I broadly agree with the "we should split out the timer node" stuff
> there.  It used to be part of the core, but it's not any more.  Looks
> like that was the hangup, though I'm not sure setting C3STOP is even the
> right fix any more...
> 
>>> > From poking around the code I think this just tries to
>>> > setup a periodic broadcast timer, but since we use software
>>> fallbacks to
>>> > emulate those we'll still end up losing the interrupts/ticks if the
>>> CPU
>>> > that was asked for an interrupt has gone to sleep and lost that state.
>>>
>>> So by extension, non-retentive cpuidle states cannot be used if the SBI
>>> timer is the only available timer, since there is no hardware broadcast
>>> timer to use as a backup.
>>>
>>> > I'm not sure if I'm just misunderstanding what's going on here,
>>> though.
>>> > Is there something that describes the behavior this fixes in more
>>> detail?
>>>
>>> The motivating scenario for this patch is the C906, where the MTIMER is
>>> in the same reset domain as the CPU, so the timer state is lost during
>>> non-retentive suspend. Without this patch, if riscv_timer_clockevent is
>>> the current clockevent driver, then the CPU fails to wake up from
>>> suspend. However, this same problem would occur on any CPU where the
>>> timer or interrupt delivery stops working during suspend.
>>
>> Yes, I recall CLOCK_EVT_FEAT_C3STOP was added for C906 but
>> we should go the DT/ACPI route.
> 
> I agree, though I think this should disable non-retentive suspend as
> opposed to setting C3STOP.

Please see above. Disabling non-retentive suspend is a much bigger
hammer than needed.

>> Regards,
>> Anup
>>
>>>
>>> >>>      .rating            = 100,
>>> >>>      .set_next_event        = riscv_clock_next_event,
>>> >>>  };
>>> >>
>>> >> After a bit of a painful bisection (with a misdirection into the
>>> v5.19
>>> >> printk reverts along the way) I have arrived at this commit for
>>> causing
>>> >> me some issues.
>>> >>
>>> >> If an AXI read to the PCIe controller on PolarFire SoC times out, the
>>> >> system will stall, with an expected:
>>> >>      io scheduler mq-deadline registered
>>> >>      io scheduler kyber registered
>>> >>      microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000
>>> >> ranges:
>>> >>      microchip-pcie 2000000000.pcie:      MEM
>>> >> 0x2008000000..0x2087ffffff -> 0x0008000000
>>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: axi read request error
>>> >>      microchip-pcie 2000000000.pcie: axi read timeout
>>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>      Freeing initrd memory: 7336K
>>> >>      mc_event_handler: 667402 callbacks suppressed
>>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>      microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>      mc_event_handler: 666588 callbacks suppressed
>>> >> <truncated>
>>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>     mc_event_handler: 666748 callbacks suppressed
>>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>     rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>>> >>     rcu:     0-...0: (1 GPs behind) idle=19f/1/0x4000000000000002
>>> >> softirq=34/36 fqs=2626
>>> >>         (detected by 1, t=5256 jiffies, g=-1151, q=1143 ncpus=4)
>>> >>     Task dump for CPU 0:
>>> >>     task:swapper/0       state:R  running task     stack:    0 pid:
>>> >> 1 ppid:     0 flags:0x00000008
>>> >>     Call Trace:
>>> >>     mc_event_handler: 666648 callbacks suppressed
>>> >>
>>> >>  With this patch applied, the system just locks up without RCU
>>> stalling:
>>> >>     io scheduler mq-deadline registered
>>> >>     io scheduler kyber registered
>>> >>     microchip-pcie 2000000000.pcie: host bridge /soc/pcie@2000000000
>>> >> ranges:
>>> >>     microchip-pcie 2000000000.pcie:      MEM
>>> >> 0x2008000000..0x2087ffffff -> 0x0008000000
>>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: axi read request error
>>> >>     microchip-pcie 2000000000.pcie: axi read timeout
>>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: sec error in pcie2axi buffer
>>> >>     microchip-pcie 2000000000.pcie: ded error in pcie2axi buffer
>>> >>     Freeing initrd memory: 7332K
>>> >>
>>> >> As of yet, I have no idea if RCU stalls for other reasons would
>>> also be
>>> >> lost.
>>> >
>>> > Sorry this broke stuff.  I'm not entirely sure why this would mask RCU
>>> > stalls, but it seems like we're hitting some pretty odd paths here and
>>> > I'm not sure this is expected to work at all for us.
>>>
>>> I'm confused here. The RCU stall is itself a bug, right? Are you sure
>>> this patch is wrongly masking the stall, or is it possibly just avoiding
>>> some buggy code and not causing a stall in the first place?

I still don't see why you consider this patch to have broken anything.
Can anyone explain why getting an RCU stall is better than not getting one?

Regards,
Samuel
Conor Dooley Oct. 28, 2022, 12:30 a.m. UTC | #11
On 28 October 2022 00:41:45 IST, Samuel Holland <samuel@sholland.org> wrote:
>
>I still don't see why you consider this patch to have broken anything.
>Can anyone explain why getting an RCU stall is better than not getting one?

Taking that to the extreme, why even have RCU stall detection at all? It's not a case of with the patch the system works fine and without it there are stalls. With the patch, the system completely locks up during boot and never makes progress. I think I've done a bad job of getting across that the system just dies during boot.

(Apologies for formatting, mobile...)
diff mbox series

Patch

diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index 1767f8bf2013..593d5a957b69 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -34,7 +34,7 @@  static int riscv_clock_next_event(unsigned long delta,
 static unsigned int riscv_clock_event_irq;
 static DEFINE_PER_CPU(struct clock_event_device, riscv_clock_event) = {
 	.name			= "riscv_timer_clockevent",
-	.features		= CLOCK_EVT_FEAT_ONESHOT,
+	.features		= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_C3STOP,
 	.rating			= 100,
 	.set_next_event		= riscv_clock_next_event,
 };