Message ID | 20220209135535.29547-1-christianshewitt@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: dts: meson: remove CPU opps below 1GHz for G12B/SM1 | expand |
Christian Hewitt <christianshewitt@gmail.com> writes: > Amlogic G12B and SM1 devices experience CPU stalls and random board > wedges when the system idles and CPU cores clock down to lower opp > points. Recent vendor kernels include a change to remove 100-250MHz > (with no explanation) [0] but other downstream sources also remove > the 500/667MHz points (also with no explanation). Unless 100-667Mhz > opps are removed or the CPU governor forced to performance, stalls > are observed, so let's remove them an improve stability/uptime. > > [0] https://github.com/khadas/linux/commit/20e237a4fe9f0302370e24950cb1416e038eee03 hehe, not a very helpful changelog in that khadas kernel commit :( > Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> > --- > Numerous people have experienced this issue and I have tested with > only the low opp-points removed and numerous voltage tweaks: but it > makes no difference. With the opp points present an Odroid N2 or > Khadas VIM3 reliably drop off my network after being left idling > overnight with UART showing a CPU stall splat. With the opp points > removed I see weeks of uninterupted uptime. It's beyond my skills > to research what the cause of the stalls might be, but if anyone > ever figures it out we can always restore things. NB: This issue > is not too widely reported in forums, but that's largely because > most of the Amlogic supporting distros have been including this > change picked from my kernel patchset for some time. Very interesting. I've also noticed instability across suspend resume on VIM3/VIM3L and only got as far in debugging to noticing it was DVFS/OPP related, but didn't get much further yet. I'll give this a try to see if it helps. Thanks for finding & posting! Kevin
Christian Hewitt <christianshewitt@gmail.com> writes: > Amlogic G12B and SM1 devices experience CPU stalls and random board > wedges when the system idles and CPU cores clock down to lower opp > points. Recent vendor kernels include a change to remove 100-250MHz > (with no explanation) [0] but other downstream sources also remove > the 500/667MHz points (also with no explanation). Unless 100-667Mhz > opps are removed or the CPU governor forced to performance, stalls > are observed, so let's remove them an improve stability/uptime. Just curious: what CPUfreq governor do you use by default for the LibreELEC kernel? Your patch greatly improves the stability I'm seeing, but doesn't quite elimitate it. I'm testing suspend/resume in a loop on VIM3, and with schedutil (default) or ondemand, it eventually hangs. With either powersave or performance it's stable. Kevin
> On 10 Feb 2022, at 5:31 am, Kevin Hilman <khilman@baylibre.com> wrote: > > Christian Hewitt <christianshewitt@gmail.com> writes: > >> Amlogic G12B and SM1 devices experience CPU stalls and random board >> wedges when the system idles and CPU cores clock down to lower opp >> points. Recent vendor kernels include a change to remove 100-250MHz >> (with no explanation) [0] but other downstream sources also remove >> the 500/667MHz points (also with no explanation). Unless 100-667Mhz >> opps are removed or the CPU governor forced to performance, stalls >> are observed, so let's remove them an improve stability/uptime. > > Just curious: what CPUfreq governor do you use by default for the > LibreELEC kernel? LE uses ondemand. One of the original clues on the problem us that the issue isn’t seen in some of the retro-gaming forks on LE's codebase which use the performance governor (and overclocks, etc.) > Your patch greatly improves the stability I'm seeing, but doesn't quite > elimitate it. > > I'm testing suspend/resume in a loop on VIM3, and with schedutil > (default) or ondemand, it eventually hangs. With either powersave or > performance it's stable. > > Kevin
On 09/02/2022 14:55, Christian Hewitt wrote: > Amlogic G12B and SM1 devices experience CPU stalls and random board > wedges when the system idles and CPU cores clock down to lower opp > points. Recent vendor kernels include a change to remove 100-250MHz > (with no explanation) [0] but other downstream sources also remove > the 500/667MHz points (also with no explanation). Unless 100-667Mhz > opps are removed or the CPU governor forced to performance, stalls > are observed, so let's remove them an improve stability/uptime. > > [0] https://github.com/khadas/linux/commit/20e237a4fe9f0302370e24950cb1416e038eee03 > > Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> > --- > Numerous people have experienced this issue and I have tested with > only the low opp-points removed and numerous voltage tweaks: but it > makes no difference. With the opp points present an Odroid N2 or > Khadas VIM3 reliably drop off my network after being left idling > overnight with UART showing a CPU stall splat. With the opp points > removed I see weeks of uninterupted uptime. It's beyond my skills > to research what the cause of the stalls might be, but if anyone > ever figures it out we can always restore things. NB: This issue > is not too widely reported in forums, but that's largely because > most of the Amlogic supporting distros have been including this > change picked from my kernel patchset for some time. > > .../boot/dts/amlogic/meson-g12b-a311d.dtsi | 40 ------------------- > .../boot/dts/amlogic/meson-g12b-s922x.dtsi | 40 ------------------- > arch/arm64/boot/dts/amlogic/meson-sm1.dtsi | 20 ---------- > 3 files changed, 100 deletions(-) > > diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi > index d61f43052a34..8e9ad1e51d66 100644 > --- a/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi > +++ b/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi > @@ -11,26 +11,6 @@ > compatible = "operating-points-v2"; > opp-shared; > > - opp-100000000 { > - opp-hz = /bits/ 64 <100000000>; > - opp-microvolt = <731000>; > - }; > - > - opp-250000000 { > - opp-hz = /bits/ 64 <250000000>; > - opp-microvolt = <731000>; > - }; > - > - opp-500000000 { > - opp-hz = /bits/ 64 <500000000>; > - opp-microvolt = <731000>; > - }; > - > - opp-667000000 { > - opp-hz = /bits/ 64 <667000000>; > - opp-microvolt = <731000>; > - }; > - > opp-1000000000 { > opp-hz = /bits/ 64 <1000000000>; > opp-microvolt = <761000>; > @@ -71,26 +51,6 @@ > compatible = "operating-points-v2"; > opp-shared; > > - opp-100000000 { > - opp-hz = /bits/ 64 <100000000>; > - opp-microvolt = <731000>; > - }; > - > - opp-250000000 { > - opp-hz = /bits/ 64 <250000000>; > - opp-microvolt = <731000>; > - }; > - > - opp-500000000 { > - opp-hz = /bits/ 64 <500000000>; > - opp-microvolt = <731000>; > - }; > - > - opp-667000000 { > - opp-hz = /bits/ 64 <667000000>; > - opp-microvolt = <731000>; > - }; > - > opp-1000000000 { > opp-hz = /bits/ 64 <1000000000>; > opp-microvolt = <731000>; > diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi > index 1e5d0ee5d541..44c23c984034 100644 > --- a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi > +++ b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi > @@ -11,26 +11,6 @@ > compatible = "operating-points-v2"; > opp-shared; > > - opp-100000000 { > - opp-hz = /bits/ 64 <100000000>; > - opp-microvolt = <731000>; > - }; > - > - opp-250000000 { > - opp-hz = /bits/ 64 <250000000>; > - opp-microvolt = <731000>; > - }; > - > - opp-500000000 { > - opp-hz = /bits/ 64 <500000000>; > - opp-microvolt = <731000>; > - }; > - > - opp-667000000 { > - opp-hz = /bits/ 64 <667000000>; > - opp-microvolt = <731000>; > - }; > - > opp-1000000000 { > opp-hz = /bits/ 64 <1000000000>; > opp-microvolt = <731000>; > @@ -76,26 +56,6 @@ > compatible = "operating-points-v2"; > opp-shared; > > - opp-100000000 { > - opp-hz = /bits/ 64 <100000000>; > - opp-microvolt = <751000>; > - }; > - > - opp-250000000 { > - opp-hz = /bits/ 64 <250000000>; > - opp-microvolt = <751000>; > - }; > - > - opp-500000000 { > - opp-hz = /bits/ 64 <500000000>; > - opp-microvolt = <751000>; > - }; > - > - opp-667000000 { > - opp-hz = /bits/ 64 <667000000>; > - opp-microvolt = <751000>; > - }; > - > opp-1000000000 { > opp-hz = /bits/ 64 <1000000000>; > opp-microvolt = <771000>; > diff --git a/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi b/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi > index 3c07a89bfd27..80737731af3f 100644 > --- a/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi > +++ b/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi > @@ -95,26 +95,6 @@ > compatible = "operating-points-v2"; > opp-shared; > > - opp-100000000 { > - opp-hz = /bits/ 64 <100000000>; > - opp-microvolt = <730000>; > - }; > - > - opp-250000000 { > - opp-hz = /bits/ 64 <250000000>; > - opp-microvolt = <730000>; > - }; > - > - opp-500000000 { > - opp-hz = /bits/ 64 <500000000>; > - opp-microvolt = <730000>; > - }; > - > - opp-667000000 { > - opp-hz = /bits/ 64 <666666666>; > - opp-microvolt = <750000>; > - }; > - > opp-1000000000 { > opp-hz = /bits/ 64 <1000000000>; > opp-microvolt = <770000>; Can you find if an acceptable set of Fixes tag can be added to permit backporting to the current LTS kernels ? Neil
Christian Hewitt <christianshewitt@gmail.com> writes: >> On 10 Feb 2022, at 5:31 am, Kevin Hilman <khilman@baylibre.com> wrote: >> >> Christian Hewitt <christianshewitt@gmail.com> writes: >> >>> Amlogic G12B and SM1 devices experience CPU stalls and random board >>> wedges when the system idles and CPU cores clock down to lower opp >>> points. Recent vendor kernels include a change to remove 100-250MHz >>> (with no explanation) [0] but other downstream sources also remove >>> the 500/667MHz points (also with no explanation). Unless 100-667Mhz >>> opps are removed or the CPU governor forced to performance, stalls >>> are observed, so let's remove them an improve stability/uptime. >> >> Just curious: what CPUfreq governor do you use by default for the >> LibreELEC kernel? > > LE uses ondemand. One of the original clues on the problem us that the > issue isn’t seen in some of the retro-gaming forks on LE's codebase > which use the performance governor (and overclocks, etc.) OK, thanks. And does LE ever do full system suspend/resume? Are things stable for you across multiple suspend/resume cycles on G12B or SM1 devices? I'm seeing hat with either powersave or performance, repeated suspend/resume is stable, but with ondemand or schedultil it's not, even with $SUBJECT patch applied. If you have some time to test, seeing how long this loop[1] runs with ondemand vs performance or powersave would be instructive. Even more interesting... if I set the governor to performance, but set the suspend OPP to 1GHz[2] (which is what it would be for the powersave governor), it is also unstable. This suggests (to me) that any sort of OPP change during the suspend/resume process is going to be unstable. Now the challenge is to understand why so we can avoid it. Thanks, Kevin [1] while true; do echo "=== SUSPEND ===" cat /sys/devices/system/cpu/cpufreq/policy0/scaling_governor cat /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq cat /sys/devices/system/cpu/cpufreq/policy2/scaling_governor cat /sys/devices/system/cpu/cpufreq/policy2/scaling_cur_freq echo rtcwake -d rtc0 -m mem -s4 echo "=== RESUME ===" sleep 4 done [2] diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi index 1e5d0ee5d541..37da8be85288 100644 --- a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi @@ -14,6 +14,7 @@ cpu_opp_table_0: opp-table-0 { opp-100000000 { opp-hz = /bits/ 64 <100000000>; opp-microvolt = <731000>; + opp-suspend; }; opp-250000000 { @@ -79,6 +80,7 @@ cpub_opp_table_1: opp-table-1 { opp-100000000 { opp-hz = /bits/ 64 <100000000>; opp-microvolt = <751000>; + opp-suspend; }; opp-250000000 {
diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi index d61f43052a34..8e9ad1e51d66 100644 --- a/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-g12b-a311d.dtsi @@ -11,26 +11,6 @@ compatible = "operating-points-v2"; opp-shared; - opp-100000000 { - opp-hz = /bits/ 64 <100000000>; - opp-microvolt = <731000>; - }; - - opp-250000000 { - opp-hz = /bits/ 64 <250000000>; - opp-microvolt = <731000>; - }; - - opp-500000000 { - opp-hz = /bits/ 64 <500000000>; - opp-microvolt = <731000>; - }; - - opp-667000000 { - opp-hz = /bits/ 64 <667000000>; - opp-microvolt = <731000>; - }; - opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <761000>; @@ -71,26 +51,6 @@ compatible = "operating-points-v2"; opp-shared; - opp-100000000 { - opp-hz = /bits/ 64 <100000000>; - opp-microvolt = <731000>; - }; - - opp-250000000 { - opp-hz = /bits/ 64 <250000000>; - opp-microvolt = <731000>; - }; - - opp-500000000 { - opp-hz = /bits/ 64 <500000000>; - opp-microvolt = <731000>; - }; - - opp-667000000 { - opp-hz = /bits/ 64 <667000000>; - opp-microvolt = <731000>; - }; - opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <731000>; diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi index 1e5d0ee5d541..44c23c984034 100644 --- a/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-g12b-s922x.dtsi @@ -11,26 +11,6 @@ compatible = "operating-points-v2"; opp-shared; - opp-100000000 { - opp-hz = /bits/ 64 <100000000>; - opp-microvolt = <731000>; - }; - - opp-250000000 { - opp-hz = /bits/ 64 <250000000>; - opp-microvolt = <731000>; - }; - - opp-500000000 { - opp-hz = /bits/ 64 <500000000>; - opp-microvolt = <731000>; - }; - - opp-667000000 { - opp-hz = /bits/ 64 <667000000>; - opp-microvolt = <731000>; - }; - opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <731000>; @@ -76,26 +56,6 @@ compatible = "operating-points-v2"; opp-shared; - opp-100000000 { - opp-hz = /bits/ 64 <100000000>; - opp-microvolt = <751000>; - }; - - opp-250000000 { - opp-hz = /bits/ 64 <250000000>; - opp-microvolt = <751000>; - }; - - opp-500000000 { - opp-hz = /bits/ 64 <500000000>; - opp-microvolt = <751000>; - }; - - opp-667000000 { - opp-hz = /bits/ 64 <667000000>; - opp-microvolt = <751000>; - }; - opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <771000>; diff --git a/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi b/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi index 3c07a89bfd27..80737731af3f 100644 --- a/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi +++ b/arch/arm64/boot/dts/amlogic/meson-sm1.dtsi @@ -95,26 +95,6 @@ compatible = "operating-points-v2"; opp-shared; - opp-100000000 { - opp-hz = /bits/ 64 <100000000>; - opp-microvolt = <730000>; - }; - - opp-250000000 { - opp-hz = /bits/ 64 <250000000>; - opp-microvolt = <730000>; - }; - - opp-500000000 { - opp-hz = /bits/ 64 <500000000>; - opp-microvolt = <730000>; - }; - - opp-667000000 { - opp-hz = /bits/ 64 <666666666>; - opp-microvolt = <750000>; - }; - opp-1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <770000>;
Amlogic G12B and SM1 devices experience CPU stalls and random board wedges when the system idles and CPU cores clock down to lower opp points. Recent vendor kernels include a change to remove 100-250MHz (with no explanation) [0] but other downstream sources also remove the 500/667MHz points (also with no explanation). Unless 100-667Mhz opps are removed or the CPU governor forced to performance, stalls are observed, so let's remove them an improve stability/uptime. [0] https://github.com/khadas/linux/commit/20e237a4fe9f0302370e24950cb1416e038eee03 Signed-off-by: Christian Hewitt <christianshewitt@gmail.com> --- Numerous people have experienced this issue and I have tested with only the low opp-points removed and numerous voltage tweaks: but it makes no difference. With the opp points present an Odroid N2 or Khadas VIM3 reliably drop off my network after being left idling overnight with UART showing a CPU stall splat. With the opp points removed I see weeks of uninterupted uptime. It's beyond my skills to research what the cause of the stalls might be, but if anyone ever figures it out we can always restore things. NB: This issue is not too widely reported in forums, but that's largely because most of the Amlogic supporting distros have been including this change picked from my kernel patchset for some time. .../boot/dts/amlogic/meson-g12b-a311d.dtsi | 40 ------------------- .../boot/dts/amlogic/meson-g12b-s922x.dtsi | 40 ------------------- arch/arm64/boot/dts/amlogic/meson-sm1.dtsi | 20 ---------- 3 files changed, 100 deletions(-)