Message ID | 20240603012804.122215-2-ben@bens.haus (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | cpufreq: enable 1200Mhz clock speed for armada-37xx | expand |
On Sun, Jun 02, 2024 at 06:26:38PM -0700, Benjamin Schneider wrote: > This frequency was disabled because of unresolved stability problems. > However, based on several months of testing, the source of the > stability problems seems to be the bootloader, not the kernel. > Marvell has recently merged changes to their bootloader source that > addresses the stability issues when frequency scaling is enabled at > all frequencies including 1.2Ghz. The problem is, most systems don't have the new bootloader. And so if you enable 1.2GHz, they are going to be unstable. Rather than making this unconditional, i think it needs to be conditional on knowing the bootloader has been upgraded. Could you add code which looks in the DDRPHY and see if 0xC0001004 has the correct value. Only then enable the additional clock speed. Andrew
Jun 3, 2024, 05:46 by andrew@lunn.ch: > The problem is, most systems don't have the new bootloader. And so if > you enable 1.2GHz, they are going to be unstable. > Based on my testing, the A3720 was unstable using a bootloader built with Marvell's source without regard to clock speed or frequency scaling. That is, it didn't matter if 1.2Ghz was enabled or not, and it didn't matter if cpufreq-dt was loaded or not, my devices were reliably crashing when trying to use Marvell's source instead of Globalscale's for building the bootloader. When I dug in to find the difference, this DDRPHY setting was one of two that I found. I also found that setting it to the value in Globalscale's repos restored stability to the devices. I then tested 1.2Ghz bootloader speeds as well as frequency scaling and found that they worked fine. I've been keeping track here: https://github.com/bschnei/ebu-bootloader > Rather than making this unconditional, i think it needs to be > conditional on knowing the bootloader has been upgraded. Could you add > code which looks in the DDRPHY and see if 0xC0001004 has the correct > value. Only then enable the additional clock speed. > I think there are two potential issues with doing something like that. First, that DDRPHY value has been flipped back and forth. The change I submitted to Marvell just undoes this change from January 2021: https://github.com/MarvellEmbeddedProcessors/mv-ddr-marvell/commit/c3a8d3c7ff4bd460770b0cf601e57a6f70cb1871 Perhaps it would be OK if the older code is also stable, but I haven't tested it. Second, the value seems to be deliberately different for other memory configurations (DDR3) which makes the conditional logic more complex if it's meant to work for all A37XX variants, and I don't have other variants to test. Given the history of this setting getting flipped back and forth, and having read through a few old threads on this subject, it's my theory that some of the instability issues that were attributed to kernel frequency scaling and/or 1.2Ghz as a speed were more likely attributable to bad bootloaders all along. I've reached out to Armbian and Arch communities to let them know in the hope of finding other users of these devices that might be willing to test, but have not received any responses. It's also worth noting that my devices came from the factory with the bootloader clocked at 800Mhz. I'm pretty sure the OS cannot set a speed above the bootloader clock speed. As a result, at least for the ESPRESSObin Ultra, the only devices I would expect to break are those where users have put in the work to build (or taken the risk of flashing) a bootloader clocked at 1.2Ghz. When the kernel encounters one of those devices it currently disables frequency scaling entirely (cpufreq-dt will not load) leaving them to run at full speed constantly. If there are users who can't/won't update their bootloader and for which frequency scaling is unstable, it seems like it would make more sense and facilitate further testing to use kernel config or userspace tools as the place to disable scaling. Thanks! Ben
On Sunday 02 June 2024 18:26:38 Benjamin Schneider wrote: > This frequency was disabled because of unresolved stability problems. > However, based on several months of testing, the source of the > stability problems seems to be the bootloader, not the kernel. > Marvell has recently merged changes to their bootloader source that > addresses the stability issues when frequency scaling is enabled at > all frequencies including 1.2Ghz. > > Signed-off-by: Benjamin Schneider <ben@bens.haus> > --- > drivers/cpufreq/armada-37xx-cpufreq.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c > index bea41ccab..f28a4435f 100644 > --- a/drivers/cpufreq/armada-37xx-cpufreq.c > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c > @@ -102,11 +102,7 @@ struct armada_37xx_dvfs { > }; > > static struct armada_37xx_dvfs armada_37xx_dvfs[] = { > - /* > - * The cpufreq scaling for 1.2 GHz variant of the SOC is currently > - * unstable because we do not know how to configure it properly. > - */ > - /* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */ > + {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, > {.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} }, > {.cpu_freq_max = 800*1000*1000, .divider = {1, 2, 3, 4} }, > {.cpu_freq_max = 600*1000*1000, .divider = {2, 4, 5, 6} }, > -- > 2.45.1 As without the updated firmware on 1.2 GHz variant of the SoC is kernel already crashing, even with commented line for .cpu_freq_max = 1200, this change makes sense. There is no reason to have 1.2 GHz line disabled as it does not solve any issue (as was originally thought) and just prevent people with updated firmware to use non-performance governor on that SoC. (When cpufreq driver is not loaded then CPU frequency of the SoC is locked at the max speed, which has observed behavior same as performance cpufreq governor). So, go ahead with this change with my Reviewed-by: Pali Rohár <pali@kernel.org>
On Wed, Jun 05, 2024 at 09:44:22PM +0200, Pali Rohár wrote: > On Sunday 02 June 2024 18:26:38 Benjamin Schneider wrote: > > This frequency was disabled because of unresolved stability problems. > > However, based on several months of testing, the source of the > > stability problems seems to be the bootloader, not the kernel. > > Marvell has recently merged changes to their bootloader source that > > addresses the stability issues when frequency scaling is enabled at > > all frequencies including 1.2Ghz. > > > > Signed-off-by: Benjamin Schneider <ben@bens.haus> > > --- > > drivers/cpufreq/armada-37xx-cpufreq.c | 6 +----- > > 1 file changed, 1 insertion(+), 5 deletions(-) > > > > diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c > > index bea41ccab..f28a4435f 100644 > > --- a/drivers/cpufreq/armada-37xx-cpufreq.c > > +++ b/drivers/cpufreq/armada-37xx-cpufreq.c > > @@ -102,11 +102,7 @@ struct armada_37xx_dvfs { > > }; > > > > static struct armada_37xx_dvfs armada_37xx_dvfs[] = { > > - /* > > - * The cpufreq scaling for 1.2 GHz variant of the SOC is currently > > - * unstable because we do not know how to configure it properly. > > - */ > > - /* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */ > > + {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, > > {.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} }, > > {.cpu_freq_max = 800*1000*1000, .divider = {1, 2, 3, 4} }, > > {.cpu_freq_max = 600*1000*1000, .divider = {2, 4, 5, 6} }, > > -- > > 2.45.1 > > As without the updated firmware on 1.2 GHz variant of the SoC is kernel > already crashing, even with commented line for .cpu_freq_max = 1200, > this change makes sense. > > There is no reason to have 1.2 GHz line disabled as it does not solve > any issue (as was originally thought) and just prevent people with > updated firmware to use non-performance governor on that SoC. > (When cpufreq driver is not loaded then CPU frequency of the SoC is > locked at the max speed, which has observed behavior same as performance > cpufreq governor). > > So, go ahead with this change with my > > Reviewed-by: Pali Rohár <pali@kernel.org> I defer to your knowledge in this matter. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Andrew
Jun 5, 2024, 16:44 by andrew@lunn.ch: > On Wed, Jun 05, 2024 at 09:44:22PM +0200, Pali Rohár wrote: > >> So, go ahead with this change with my >> >> Reviewed-by: Pali Rohár <pali@kernel.org> >> > > I defer to your knowledge in this matter. > > Reviewed-by: Andrew Lunn <andrew@lunn.ch> > > Andrew > Hi all, just checking in to see if there are any outstanding concerns/issues outstanding with this patch? I am happy to report several more weeks of stability on two A3720 devices. Thanks! Ben
Hello Ben, > Jun 5, 2024, 16:44 by andrew@lunn.ch: > >> On Wed, Jun 05, 2024 at 09:44:22PM +0200, Pali Rohár wrote: >> >>> So, go ahead with this change with my >>> >>> Reviewed-by: Pali Rohár <pali@kernel.org> >>> >> >> I defer to your knowledge in this matter. >> >> Reviewed-by: Andrew Lunn <andrew@lunn.ch> >> >> Andrew >> > > Hi all, just checking in to see if there are any outstanding > concerns/issues outstanding with this patch? I am happy to report > several more weeks of stability on two A3720 devices. Thanks! > You received two "reviewed-by" approvals, and Marek Behún, who wrote the patch removing the 1.2GHz limitation, did not object. Therefore, from my point of view, there is no problem merging your change. As the original author of this driver, you can add my "Acked-by" to your patch if it can help: Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com> Gregory > Ben
Jul 16, 2024, 01:51 by gregory.clement@bootlin.com: > You received two "reviewed-by" approvals, and Marek Behún, who wrote the > patch removing the 1.2GHz limitation, did not object. Therefore, from my > point of view, there is no problem merging your change. > Acked-by: Gregory CLEMENT <> gregory.clement@bootlin.com> > > Hi all, please let me know if there is anything else needed from me before this can be merged. The two devices I have continue to be stable while using the ondemand scaling policy. Thanks! Ben
Jul 16, 2024, 01:51 by gregory.clement@bootlin.com: > Hello Ben, > >> Jun 5, 2024, 16:44 by >> andrew@lunn.ch>> : >> >>> On Wed, Jun 05, 2024 at 09:44:22PM +0200, Pali Rohár wrote: >>> >>>> So, go ahead with this change with my >>>> Reviewed-by: Pali Rohár <>>>> pali@kernel.org>>>> > >>>> >>> I defer to your knowledge in this matter. >>> Reviewed-by: Andrew Lunn <>>> andrew@lunn.ch>>> > >>> > Acked-by: Gregory CLEMENT <> gregory.clement@bootlin.com> > > Hi maintainers, I don't believe there are any outstanding concerns with this patch. Would one of you please merge or let me know next steps? Marvell accepted the patch to fix the bootloader instability in May, and I've since had two devices running stably with frequency scaling enabled up to 1.2Ghz. I've kept track of my work here: https://github.com/bschnei/ebu-bootloader Thank you! Ben
diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c index bea41ccab..f28a4435f 100644 --- a/drivers/cpufreq/armada-37xx-cpufreq.c +++ b/drivers/cpufreq/armada-37xx-cpufreq.c @@ -102,11 +102,7 @@ struct armada_37xx_dvfs { }; static struct armada_37xx_dvfs armada_37xx_dvfs[] = { - /* - * The cpufreq scaling for 1.2 GHz variant of the SOC is currently - * unstable because we do not know how to configure it properly. - */ - /* {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, */ + {.cpu_freq_max = 1200*1000*1000, .divider = {1, 2, 4, 6} }, {.cpu_freq_max = 1000*1000*1000, .divider = {1, 2, 4, 5} }, {.cpu_freq_max = 800*1000*1000, .divider = {1, 2, 3, 4} }, {.cpu_freq_max = 600*1000*1000, .divider = {2, 4, 5, 6} },
This frequency was disabled because of unresolved stability problems. However, based on several months of testing, the source of the stability problems seems to be the bootloader, not the kernel. Marvell has recently merged changes to their bootloader source that addresses the stability issues when frequency scaling is enabled at all frequencies including 1.2Ghz. Signed-off-by: Benjamin Schneider <ben@bens.haus> --- drivers/cpufreq/armada-37xx-cpufreq.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)