Message ID | 20220624165211.4318-1-r.stratiienko@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | clk: sunxi-ng: sun50i: h6: Modify GPU clock configuration to support DFS |
Hi Roman,

On Friday, 24 June 2022 at 18:52:11 CEST, Roman Stratiienko wrote:
> Using simple bash script it was discovered that not all CCU registers
> can be safely used for DFS, e.g.:
>
> while true
> do
> devmem 0x3001030 4 0xb0003e02
> devmem 0x3001030 4 0xb0001e02
> done
>
> Script above changes the GPU_PLL multiplier register value. While the
> script is running, the user should interact with the user interface.
>
> Using this method the following results were obtained:
> | Register | Name | Bits | Values | Result |
> | -- | -- | -- | -- | -- |
> | 0x3001030 | GPU_PLL.MULT | 15..8 | 20-62 | OK |
> | 0x3001030 | GPU_PLL.INDIV | 1 | 0-1 | OK |
> | 0x3001030 | GPU_PLL.OUTDIV | 0 | 0-1 | FAIL |
> | 0x3001670 | GPU_CLK.DIV | 3..0 | ANY | FAIL |
>
> Once bits that caused system failure disabled (kept default 0),
> it was discovered that GPU_CLK.MUX was used during DFS for some
> reason and was causing the failure too.
>
> After disabling GPU_PLL.OUTDIV the system started to fail during
> booting for some reason until the maximum frequency of GPU_PLL
> clock was limited to 756MHz.
>
> After all the changes made DVFS started to work seamlessly.

I appreciate the testing effort, but I don't think a userspace approach is a
good way to test DVFS. I see two issues:
- As the name already suggests, voltage also plays a crucial role in
  stability. You didn't say which board you tested this on, but I assume it
  has a PMIC. Did you make sure the GPU voltage regulator was always at
  1.04 V, which is needed for 756 MHz?
- The kernel clock driver always goes through the proper procedure for a
  clock rate change, which involves several steps. Bypassing them might also
  cause some stability problems.

I agree that the GPU PLL should be limited to 756 MHz max. This seems to be
the maximum operating point specified in the vendor DT. But I managed to
extract some more information from the vendor GPU driver.
More specifically, from this snippet, located in
modules/gpu/mali-midgard/kernel_mode/driver/drivers/gpu/arm/midgard/platform/sunxi/mali_kbase_config_sunxi.c:

    pll_freq = target->freq;
    while (pll_freq < 288000000)
        pll_freq *= 2;

    err = clk_set_rate(sunxi_mali->gpu_pll_clk, pll_freq);
    <...>
    err = clk_set_rate(kbdev->clock, target->freq);
    <...>

Apparently, the minimum stable PLL frequency is 288 MHz (this should be
added), and the divider in the peripheral clock can really be used, although
preferably not. The vendor GPU operating points include only two points below
288 MHz: 264 MHz and 216 MHz. I'm fully aware that they may not be really
stable. Given that these two and the next two all share the minimum voltage of
810 mV, the power and thermal savings are probably not that great. So we can
skip them and pin the peripheral divider to 1, as you already did.

Another discrepancy I see is that the vendor DT has two operating points, at
336 MHz and 384 MHz, which also use factor P (also known as d2 in the vendor
clock source). This could again be an oversight. Alternatively, the P factor
might actually be usable, but only at lower frequencies.

Can you please run another test with the GPU operating points specified in DT
and check whether it works with the P factor left in?
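The following is a minimal C sketch of what the vendor snippet above computes. It assumes that the second clk_set_rate() on the GPU clock is satisfied only by the peripheral divider. That is how the snippet reads, but the vendor code does not state it. Only the 288 MHz threshold and the doubling loop come from the snippet. The names `vendor_gpu_setting` and `struct gpu_clk_setting` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Threshold taken from the vendor snippet: the PLL is never asked for less. */
#define VENDOR_MIN_PLL_HZ 288000000ULL

/* Hypothetical result type: PLL rate plus the implied GPU_CLK divider. */
struct gpu_clk_setting {
	uint64_t pll_hz; /* rate passed to clk_set_rate(gpu_pll_clk, ...) */
	uint64_t div;    /* pll_hz / target, assumed left to the peripheral divider */
};

static struct gpu_clk_setting vendor_gpu_setting(uint64_t target_hz)
{
	struct gpu_clk_setting s;
	uint64_t pll_hz = target_hz;

	/* A zero target would make the doubling loop spin forever. */
	assert(target_hz > 0);

	/* Same loop as the vendor code: double until >= 288 MHz. */
	while (pll_hz < VENDOR_MIN_PLL_HZ)
		pll_hz *= 2;

	s.pll_hz = pll_hz;
	s.div = pll_hz / target_hz;
	return s;
}
```

Only two vendor OPPs fall below the threshold: 264 MHz and 216 MHz. In this model both run the PLL at twice the target (528 MHz and 432 MHz) with a peripheral divider of 2. Every other OPP keeps the divider at 1.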
For reference, the vendor DT has the following operating points (kHz, uV):

    756000 1040000
    624000  950000
    576000  930000
    540000  910000
    504000  890000
    456000  870000
    432000  860000
    420000  850000
    408000  840000
    384000  830000
    360000  820000
    336000  810000
    312000  810000
    264000  810000
    216000  810000

Best regards,
Jernej
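As an arithmetic check on whether dropping P loses any of these operating points, the sketch below searches for pll-gpu factors with the output divider fixed at 1. It makes three assumptions:
- rate = 24 MHz × N / M;
- N ranges from 12 to 256 and the input divider M is 1 or 2, as read from the driver's `pll_gpu_clk` definition;
- `find_nm` is a hypothetical helper.

The check says nothing about whether each point is stable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OSC24M_HZ 24000000ULL

/* Factors for pll-gpu with P pinned to 1; n == 0 means "not reachable". */
struct nm {
	unsigned n;
	unsigned m;
};

/* Try M = 1 first, then M = 2, keeping N within the assumed 12..256 range. */
static struct nm find_nm(uint64_t rate_hz)
{
	struct nm none = { 0, 0 };

	for (unsigned m = 1; m <= 2; m++) {
		if ((rate_hz * m) % OSC24M_HZ != 0)
			continue;
		uint64_t n = rate_hz * m / OSC24M_HZ;
		if (n >= 12 && n <= 256) {
			struct nm f = { (unsigned)n, m };
			return f;
		}
	}
	return none;
}

/* Vendor OPP frequencies in kHz, from the list above. */
static const uint64_t vendor_opp_khz[] = {
	756000, 624000, 576000, 540000, 504000, 456000, 432000, 420000,
	408000, 384000, 360000, 336000, 312000, 264000, 216000,
};

/* Returns 1 if every vendor OPP has an N/M pair with P fixed at 1. */
static int all_opps_reachable(void)
{
	size_t count = sizeof(vendor_opp_khz) / sizeof(vendor_opp_khz[0]);

	for (size_t i = 0; i < count; i++)
		if (find_nm(vendor_opp_khz[i] * 1000).n == 0)
			return 0;
	return 1;
}
```

Under these assumptions, every listed frequency has a P = 1 setting. 264 MHz and 216 MHz need M = 2 (N = 22 and N = 18), because N = 11 and N = 9 would fall below the minimum of 12. Those two rates also sit below the 288 MHz PLL floor suggested by the vendor driver.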
Hi,

DVFS was tested through the devfreq driver, not with the script.

The following OPP table was used:
https://github.com/clementperon/linux/commit/add3ef683238095d2721de03601d5b01f2d9ce22

As already mentioned in the commit message, P causes issues as well.

Regards,
Roman

On Sat, 25 Jun 2022 at 13:43, Jernej Škrabec <jernej.skrabec@gmail.com> wrote:
> [...]
PS: For better DFS resolution, P or the GPU_CLK divider can be preprogrammed,
and GPU_PLL can be marked as having a fixed output divider. That is safe; it
is only unsafe to touch these registers at runtime.
But do we really need that resolution?

On Sat, 25 Jun 2022 at 16:27, Roman Stratiienko <r.stratiienko@gmail.com> wrote:
> [...]
Hi Roman, Jernej,

On Sat, 25 Jun 2022 at 16:02, Roman Stratiienko <r.stratiienko@gmail.com> wrote:
>
> PS:
>
> For better DFS resolution P or GPU_CLK divider can be preprogrammed
> and GPU_PLL can be marked with fixed out divider. It's safe. Not safe
> to touch these regs during runtime.
> But do we really need that resolution?
>
> On Sat, 25 Jun 2022 at 16:27, Roman Stratiienko <r.stratiienko@gmail.com> wrote:
> >
> > Hi,
> >
> > DVFS was tested as DVFS using devfreq driver, not the script.
> >
> > The following OPP table was used:
> > https://github.com/clementperon/linux/commit/add3ef683238095d2721de03601d5b01f2d9ce22

I now remember that when I tried to enable GPU devfreq on H6 and noticed
instability, I wrote a recap of my findings:
https://lore.kernel.org/lkml/CAJiuCce58Gaxf_Qg2cnMwvOgUqYU__eKb3MDX1Fe_+47htg2bA@mail.gmail.com/

I also found that Megous had given a possible explanation on the linux-sunxi
IRC channel:

20:12 <megi> looks like gpu pll on H6 is NKMP clock, and those are implemented in such a way in mainline that they are prone to overshooting the frequency during output divider reduction
20:13 <megi> so disabling P divider may help
20:13 <megi> or fixing the dividers
20:14 <megi> and just allowing N to change
20:22 <megi> hmm, I haven't looked at this for quite some time, but H6 BSP way of setting PLL factors actually makes the most sense out of everything I've seen/tested so far
20:23 <megi> it waits for lock not after setting NK factors, but after reducing the M factor (pre-divider)
20:24 <megi> I might as well re-run my CPU PLL tester with this algorithm, to see if it fixes the lockups
20:26 <megi> it makes sense to wait for PLL to stabilize "after" changing all the factors that actually affect the VCO, and not just some of them
20:27 <megi> warpme_: ^
20:28 <megi> it may be the same thing that plagues the CPU PLL rate changes at runtime

Regards,
Clement
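A toy model can illustrate megi's point about overshoot during output-divider reduction. It assumes, purely for illustration, that the factors of one rate change take effect one at a time. That is not a description of how ccu_nkmp or the hardware actually behaves. It only shows why lowering P before N can briefly push the output above both the old and the new rate. The factor choices (336 MHz as N = 28, M = 1, P = 2, moving to 312 MHz as N = 13, M = 1, P = 1) are hypothetical, not the vendor's settings.

```c
#include <assert.h>
#include <stdint.h>

#define OSC24M_HZ 24000000ULL

/* pll-gpu factors; there is no K factor on this PLL. */
struct nmp {
	uint64_t n, m, p;
};

/* Assumed output formula: 24 MHz * N / M / P. */
static uint64_t nmp_rate(struct nmp f)
{
	return OSC24M_HZ * f.n / f.m / f.p;
}

/*
 * Toy model: apply the new factors one at a time in the order given
 * ('n', 'm', 'p') and return the highest output rate observed,
 * including the starting rate.
 */
static uint64_t peak_rate(struct nmp from, struct nmp to, const char *order)
{
	struct nmp cur = from;
	uint64_t peak = nmp_rate(cur);

	for (const char *c = order; *c != '\0'; c++) {
		if (*c == 'n')
			cur.n = to.n;
		else if (*c == 'm')
			cur.m = to.m;
		else if (*c == 'p')
			cur.p = to.p;

		if (nmp_rate(cur) > peak)
			peak = nmp_rate(cur);
	}
	return peak;
}
```

With the factors above, applying P before N peaks at 672 MHz in this model, roughly double either endpoint. Applying N first never exceeds the starting 336 MHz.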
On 6/24/22 11:52 AM, Roman Stratiienko wrote:
> Using simple bash script it was discovered that not all CCU registers
> can be safely used for DFS, e.g.:
>
> while true
> do
> devmem 0x3001030 4 0xb0003e02
> devmem 0x3001030 4 0xb0001e02
> done
>
> Script above changes the GPU_PLL multiplier register value. While the
> script is running, the user should interact with the user interface.
>
> Using this method the following results were obtained:
>
> | Register | Name | Bits | Values | Result |
> | -- | -- | -- | -- | -- |
> | 0x3001030 | GPU_PLL.MULT | 15..8 | 20-62 | OK |
> | 0x3001030 | GPU_PLL.INDIV | 1 | 0-1 | OK |
> | 0x3001030 | GPU_PLL.OUTDIV | 0 | 0-1 | FAIL |
> | 0x3001670 | GPU_CLK.DIV | 3..0 | ANY | FAIL |
>
> Once bits that caused system failure disabled (kept default 0),
> it was discovered that GPU_CLK.MUX was used during DFS for some
> reason and was causing the failure too.

The GPU module clock has only one parent declared, so it is surprising that the
mux would get set. Did this happen while the kernel driver was changing the
frequency?

> After disabling GPU_PLL.OUTDIV the system started to fail during
> booting for some reason until the maximum frequency of GPU_PLL
> clock was limited to 756MHz.

The manual lists PLL_GPU's maximum frequency as 800 MHz. I assume you chose 756
MHz because that is the highest OPP. That should be okay, too.

> After all the changes made DVFS started to work seamlessly.
>
> Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
> ---
>  drivers/clk/sunxi-ng/ccu-sun50i-h6.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> index 2ddf0a0da526f..d941238cd178a 100644
> --- a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> +++ b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
> @@ -95,13 +95,14 @@ static struct ccu_nkmp pll_periph1_clk = {
>  	},
>  };
>  
> +/* For GPU PLL, using an output divider for DFS causes system to fail */
>  #define SUN50I_H6_PLL_GPU_REG	0x030
>  static struct ccu_nkmp pll_gpu_clk = {
>  	.enable		= BIT(31),
>  	.lock		= BIT(28),
>  	.n		= _SUNXI_CCU_MULT_MIN(8, 8, 12),
>  	.m		= _SUNXI_CCU_DIV(1, 1), /* input divider */
> -	.p		= _SUNXI_CCU_DIV(0, 1), /* output divider */
> +	.max_rate	= 756000000UL,
>  	.common		= {
>  		.reg		= 0x030,
>  		.hw.init	= CLK_HW_INIT("pll-gpu", "osc24M",
> @@ -294,12 +295,9 @@ static SUNXI_CCU_M_WITH_MUX_GATE(deinterlace_clk, "deinterlace",
>  static SUNXI_CCU_GATE(bus_deinterlace_clk, "bus-deinterlace", "psi-ahb1-ahb2",
>  		      0x62c, BIT(0), 0);
>  
> -static const char * const gpu_parents[] = { "pll-gpu" };
> -static SUNXI_CCU_M_WITH_MUX_GATE(gpu_clk, "gpu", gpu_parents, 0x670,
> -				       0, 3,	/* M */
> -				       24, 1,	/* mux */
> -				       BIT(31),	/* gate */
> -				       CLK_SET_RATE_PARENT);
> +/* GPU_CLK divider kept disabled to avoid interferences with DFS */
> +static SUNXI_CCU_GATE(gpu_clk, "gpu", "pll-gpu", 0x670,
> +		      BIT(31), CLK_SET_RATE_PARENT);

These changes look fine to me. You also need to set the initial value for the
fixed fields in the driver's probe function.

Regards,
Samuel
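Samuel's request, like Roman's PS about preprogramming the dividers, amounts to clearing the no-longer-managed divider fields once at probe time. Below is a minimal sketch of that read-modify-write. It is written as pure functions on the register value so it can be checked outside the kernel. The helper names are hypothetical, and the field positions are assumptions read from the driver definitions quoted above:
- P at bit 0 of PLL_GPU (offset 0x030);
- M as the 3-bit field at bit 0 of GPU_CLK (offset 0x670), taken from the removed SUNXI_CCU_M_WITH_MUX_GATE entry.

The commit message's table lists GPU_CLK.DIV as bits 3..0, so the mask is worth checking against the manual.

```c
#include <assert.h>
#include <stdint.h>

/* PLL_GPU (offset 0x030): output divider P in bit 0; 0 means divide by 1. */
#define PLL_GPU_P_MASK UINT32_C(0x1)
/* GPU_CLK (offset 0x670): M divider, shift 0, width 3 (assumed from the removed entry). */
#define GPU_CLK_M_MASK UINT32_C(0x7)

/* Hypothetical helper: force P = 1, leaving N, M, enable and lock bits alone. */
static uint32_t pll_gpu_fix_p(uint32_t val)
{
	return val & ~PLL_GPU_P_MASK;
}

/* Hypothetical helper: force M = 1, leaving the gate (bit 31) and mux alone. */
static uint32_t gpu_clk_fix_m(uint32_t val)
{
	return val & ~GPU_CLK_M_MASK;
}
```

In the driver, each helper would sit between a readl() and a writel() of the corresponding register.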
Hello Samuel,

Thanks for having a look.

On Sun, 3 Jul 2022 at 09:50, Samuel Holland <samuel@sholland.org> wrote:
>
> On 6/24/22 11:52 AM, Roman Stratiienko wrote:
> > [...]
> > Once bits that caused system failure disabled (kept default 0),
> > it was discovered that GPU_CLK.MUX was used during DFS for some
> > reason and was causing the failure too.
>
> The GPU module clock has only one parent declared, so it is surprising that the
> mux would get set. Did this happen while the kernel driver was changing the
> frequency?

I looked through the CCU code and didn't see anything that might cause issues,
so I tested again, and DFS works with the mux this time. I'll drop this change
in v2.

>
> > After disabling GPU_PLL.OUTDIV the system started to fail during
> > booting for some reason until the maximum frequency of GPU_PLL
> > clock was limited to 756MHz.
>
> The manual lists PLL_GPU's maximum frequency as 800 MHz. I assume you chose 756
> MHz because that is the highest OPP. That should be okay, too.

In my earlier tests, setting the frequency higher than 756 MHz made the GPU
very unstable. I decided to validate it again, removed the frequency
limitation, and can't see any issues so far. I'll also drop this change in v2.

>
> > After all the changes made DVFS started to work seamlessly.
> >
> > Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
> > [...]
>
> These changes look fine to me. You also need to set the initial value for the
> fixed fields in the driver's probe function.

Will do that in v2.

I have no idea what was causing the additional issues in my previous test
session. Let's forget about them for now.

Regards,
Roman.
diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
index 2ddf0a0da526f..d941238cd178a 100644
--- a/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
+++ b/drivers/clk/sunxi-ng/ccu-sun50i-h6.c
@@ -95,13 +95,14 @@ static struct ccu_nkmp pll_periph1_clk = {
 	},
 };
 
+/* For GPU PLL, using an output divider for DFS causes system to fail */
 #define SUN50I_H6_PLL_GPU_REG	0x030
 static struct ccu_nkmp pll_gpu_clk = {
 	.enable		= BIT(31),
 	.lock		= BIT(28),
 	.n		= _SUNXI_CCU_MULT_MIN(8, 8, 12),
 	.m		= _SUNXI_CCU_DIV(1, 1), /* input divider */
-	.p		= _SUNXI_CCU_DIV(0, 1), /* output divider */
+	.max_rate	= 756000000UL,
 	.common		= {
 		.reg		= 0x030,
 		.hw.init	= CLK_HW_INIT("pll-gpu", "osc24M",
@@ -294,12 +295,9 @@ static SUNXI_CCU_M_WITH_MUX_GATE(deinterlace_clk, "deinterlace",
 static SUNXI_CCU_GATE(bus_deinterlace_clk, "bus-deinterlace", "psi-ahb1-ahb2",
 		      0x62c, BIT(0), 0);
 
-static const char * const gpu_parents[] = { "pll-gpu" };
-static SUNXI_CCU_M_WITH_MUX_GATE(gpu_clk, "gpu", gpu_parents, 0x670,
-				       0, 3,	/* M */
-				       24, 1,	/* mux */
-				       BIT(31),	/* gate */
-				       CLK_SET_RATE_PARENT);
+/* GPU_CLK divider kept disabled to avoid interferences with DFS */
+static SUNXI_CCU_GATE(gpu_clk, "gpu", "pll-gpu", 0x670,
+		      BIT(31), CLK_SET_RATE_PARENT);
 
 static SUNXI_CCU_GATE(bus_gpu_clk, "bus-gpu", "psi-ahb1-ahb2",
 		      0x67c, BIT(0), 0);
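The next sketch shows what the two changes to `pll_gpu_clk` mean for rate requests. It is a simplified model, not the ccu_nkmp code itself:
- requests are clamped to the new `.max_rate` of 756 MHz;
- the highest rate 24 MHz × N / M that does not exceed the request is then chosen;
- with P removed, only N (12 to 256) and M (1 or 2) are searched, as assumed from the driver's field definitions.

`model_round_rate` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define OSC24M_HZ      24000000ULL
#define PLL_GPU_MAX_HZ 756000000ULL /* .max_rate from the patch */

static uint64_t model_round_rate(uint64_t req_hz)
{
	uint64_t best = 0;

	/* The patch caps pll-gpu at 756 MHz via .max_rate. */
	if (req_hz > PLL_GPU_MAX_HZ)
		req_hz = PLL_GPU_MAX_HZ;

	/* With P removed, only N (12..256) and M (1..2) are searched. */
	for (uint64_t m = 1; m <= 2; m++) {
		for (uint64_t n = 12; n <= 256; n++) {
			uint64_t rate = OSC24M_HZ * n / m;

			if (rate <= req_hz && rate > best)
				best = rate;
		}
	}
	return best; /* 0 if no setting fits under the request */
}
```

In this model:
- an 800 MHz request settles at 756 MHz (N = 63, M = 2);
- a 500 MHz request settles at 492 MHz (N = 41, M = 2);
- anything below 144 MHz (N = 12, M = 2) has no valid setting.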
Using a simple bash script, it was discovered that not all CCU registers can
be safely used for DFS, e.g.:

    while true
    do
        devmem 0x3001030 4 0xb0003e02
        devmem 0x3001030 4 0xb0001e02
    done

The script above changes the GPU_PLL multiplier register value. While the
script is running, the user should interact with the user interface.

Using this method, the following results were obtained:

| Register  | Name           | Bits  | Values | Result |
| --------- | -------------- | ----- | ------ | ------ |
| 0x3001030 | GPU_PLL.MULT   | 15..8 | 20-62  | OK     |
| 0x3001030 | GPU_PLL.INDIV  | 1     | 0-1    | OK     |
| 0x3001030 | GPU_PLL.OUTDIV | 0     | 0-1    | FAIL   |
| 0x3001670 | GPU_CLK.DIV    | 3..0  | ANY    | FAIL   |

Once the bits that caused system failure were disabled (kept at the default
0), it was discovered that GPU_CLK.MUX was also being changed during DFS for
some reason and was causing failures too.

After GPU_PLL.OUTDIV was disabled, the system started to fail during boot for
some reason, until the maximum frequency of the GPU_PLL clock was limited to
756 MHz.

After all these changes, DVFS started to work seamlessly.

Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
---
 drivers/clk/sunxi-ng/ccu-sun50i-h6.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)
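The two values written by the script can be decoded with the field layout of `pll_gpu_clk` from the driver:
- N in bits 15..8;
- input divider M in bit 1;
- output divider P in bit 0;
- enable in bit 31.

Each factor is stored as its value minus one, and the parent is osc24M. The sketch assumes rate = 24 MHz × N / M / P, the usual NKMP formula without K. `pll_gpu_rate_hz` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* pll-gpu's parent is osc24M. */
#define OSC24M_HZ 24000000ULL

/*
 * Decode a PLL_GPU register value using the pll_gpu_clk field layout:
 *   bit 31      enable
 *   bits 15..8  N - 1   (_SUNXI_CCU_MULT_MIN(8, 8, 12))
 *   bit 1       M - 1   input divider  (_SUNXI_CCU_DIV(1, 1))
 *   bit 0       P - 1   output divider (_SUNXI_CCU_DIV(0, 1))
 * Assumed rate formula: 24 MHz * N / M / P. Returns 0 when disabled.
 */
static uint64_t pll_gpu_rate_hz(uint32_t reg)
{
	uint64_t n = ((reg >> 8) & 0xff) + 1;
	uint64_t m = ((reg >> 1) & 0x1) + 1;
	uint64_t p = (reg & 0x1) + 1;

	if (!(reg & (UINT32_C(1) << 31)))
		return 0;
	return OSC24M_HZ * n / m / p;
}
```

Under this assumption, the script toggles the PLL between 756 MHz (0xb0003e02: N = 63, M = 2, P = 1) and 372 MHz (0xb0001e02: N = 31). With the input divider bit set, as in both script values, the MULT range 20-62 from the table corresponds to 252-756 MHz.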