Message ID | 20230612053922.3284394-1-dmitry.baryshkov@linaro.org (mailing list archive) |
---|---|
Series | ARM: qcom: apq8064: support CPU frequency scaling |
On Mon, Jun 12, 2023 at 08:39:04AM +0300, Dmitry Baryshkov wrote:
> Implement CPUFreq support for one of the oldest supported Qualcomm
> platforms, APQ8064. Each core has independent power and frequency
> control. Additionally the L2 cache is scaled to follow the CPU
> frequencies (failure to do so results in strange semi-random crashes).

Hi, can we talk, maybe in private about this interconnect-cpu thing?

I see you follow the original implementation of the msm_bus where in
practice with the use of the kbps the correct clock and voltage was set.
(and this was also used to set the fabric clock from nominal to fast)

On ipq806x and I assume other SoC there isn't always a 1:1 map of CPU
freq and L2 freq. For example on ipq8064 we have max CPU freq of 1.4GHz
and L2 freq of 1.2GHz, on ipq8065 we have CPU 1.7GHz and L2 of 1.4GHz.
(and even that is curious since I used the debug regs and the cxo
crystal to measure the clock by hardware (yes i ported the very ancient
clk-debug to modern kernel and it works and discovered all sort of
things) the L2 (I assume due to limitation of the hfpll) actually can
never reach that frequency (1.4GHz in reality results to something like
1.2GHz from what I notice a stable clock is there only with frequency of
max 1GHz))

So my idea was to introduce a simple devfreq driver and use the PASSIVE
governor where it was added the possibility to link to a CPU frequency
and with interpolation select the L2 frequency (and voltage)

From some old comments in ancient qsdk code it was pointed out that due
to a hw limitation the secondary cpu can't stay at a high clock if L2
was at the idle clock. (no idea if this is specific to IPQ806x) So this
might be a cause of your crash? (I also have random crash with L2
scaling and we are planning to just force the L2 at max frequency)

But sorry for all of this (maybe) useless info. I checked the other
patch and I didn't understand how the different L2 frequency are
declared and even the voltage. Is this something that will come later?
I'm very interested in this implementation.

>
> Core voltage is controlled through the SAW2 devices, one for each core.
> The L2 has two regulators, vdd-mem and vdd-dig.
>
> Dependency: [1] for interconnect-clk implementation
>
> https://lore.kernel.org/linux-arm-msm/20230512001334.2983048-3-dmitry.baryshkov@linaro.org/
>

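The passive-governor idea described above (deriving the L2 frequency from the CPU frequency by interpolation) can be sketched as a plain proportional mapping. This is an illustration only: the function name `l2_interpolate_khz` and every range parameter are hypothetical, and the code is not taken from the devfreq governor or from any patch in this series.

```c
#include <stdint.h>

/*
 * Hypothetical helper: proportionally map a CPU frequency onto an L2
 * frequency range. All ranges are supplied by the caller; none of the
 * values come from a real driver or OPP table.
 */
static uint32_t l2_interpolate_khz(uint32_t cpu_khz,
				   uint32_t cpu_min_khz, uint32_t cpu_max_khz,
				   uint32_t l2_min_khz, uint32_t l2_max_khz)
{
	uint64_t scaled;

	/* Degenerate ranges or a CPU at/below its minimum: stay at L2 minimum. */
	if (cpu_max_khz <= cpu_min_khz || l2_max_khz <= l2_min_khz ||
	    cpu_khz <= cpu_min_khz)
		return l2_min_khz;

	/* Clamp at the top of the L2 range. */
	if (cpu_khz >= cpu_max_khz)
		return l2_max_khz;

	/* 64-bit intermediate so the multiplication cannot overflow. */
	scaled = (uint64_t)(cpu_khz - cpu_min_khz) * (l2_max_khz - l2_min_khz);

	return l2_min_khz + (uint32_t)(scaled / (cpu_max_khz - cpu_min_khz));
}
```

With an illustrative CPU range of 384-1400 MHz and an L2 range of 384-1200 MHz, a CPU at the midpoint (892 MHz) would land on 792 MHz for the L2. The minimum values here are placeholders, since the thread only gives the maximum frequencies for ipq8064.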
On 11/06/2023 19:27, Christian Marangi wrote:
> On Mon, Jun 12, 2023 at 08:39:04AM +0300, Dmitry Baryshkov wrote:
>> Implement CPUFreq support for one of the oldest supported Qualcomm
>> platforms, APQ8064. Each core has independent power and frequency
>> control. Additionally the L2 cache is scaled to follow the CPU
>> frequencies (failure to do so results in strange semi-random crashes).
>
> Hi, can we talk, maybe in private about this interconnect-cpu thing?

Hi, sure. Feel free to ping me on IRC (lumag) or via email. Or we can just
continue our discussion here, as it might be interesting to other people
too.

> I see you follow the original implementation of the msm_bus where in
> practice with the use of the kbps the correct clock and voltage was set.
> (and this was also used to set the fabric clock from nominal to fast)
>
> On ipq806x and I assume other SoC there isn't always a 1:1 map of CPU
> freq and L2 freq. For example on ipq8064 we have max CPU freq of 1.4GHz
> and L2 freq of 1.2GHz, on ipq8065 we have CPU 1.7GHz and L2 of 1.4GHz.

This is also the case for apq8064. The vendor kernel defines 15 frequencies
for L2 cache clock, but then for some reasons all PVS tables use just 3
entries from these 15.

> (and even that is curious since I used the debug regs and the cxo
> crystal to measure the clock by hardware (yes i ported the very ancient
> clk-debug to modern kernel and it works and discovered all sort of
> things) the L2 (I assume due to limitation of the hfpll) actually can
> never reach that frequency (1.4GHz in reality results to something like
> 1.2GHz from what I notice a stable clock is there only with frequency of
> max 1GHz))

I would like to point you to https://github.com/andersson/debugcc/, which is
a userspace reimplementation of clk-debug. We'd appreciate your patches
there.

> So my idea was to introduce a simple devfreq driver and use the PASSIVE
> governor where it was added the possibility to link to a CPU frequency
> and with interpolation select the L2 frequency (and voltage)

I stumbled upon this idea, when I was working on the msm8996 and its CBF
clock (CBF = interconnect between two core clusters). While it should be
possible to use DEVFREQ in simple cases (e.g. L2 clock >= max(CPU clock), if
possible), real configurations are slightly harder.
E.g. for the purpose of this patchset, the relationship for apq8064 is the
following (in MHz):

 CPU    L2
 384    384
 486    648
 594    648
 702    648
 ....   ...
 1026   648
 1134   1134
 ....   ....
 1512   1134
 ....   ....

It should be noted that msm8960 also used just three values for the L2 cache
frequencies. From what I can see, only msm8x60 made L2 freq tightly follow
the CPU frequency.

> From some old comments in ancient qsdk code it was pointed out that due
> to a hw limitation the secondary cpu can't stay at a high clock if L2
> was at the idle clock. (no idea if this is specific to IPQ806x) So this
> might be a cause of your crash? (I also have random crash with L2
> scaling and we are planning to just force the L2 at max frequency)

It might be related. It was more or less the same story with msm8996, which
was either 'maxcpus=2' or scaling the CBF clock.

> But sorry for all of this (maybe) useless info. I checked the other
> patch and I didn't understand how the different L2 frequency are
> declared and even the voltage. Is this something that will come later?
> I'm very interested in this implementation.

The L2 frequency (<&kraitcc 4>) is converted into bandwidth vote, which then
goes into the OPP tables. But please also see the discussion started at the
patch 15.

>
>>
>> Core voltage is controlled through the SAW2 devices, one for each core.
>> The L2 has two regulators, vdd-mem and vdd-dig.
>>
>> Dependency: [1] for interconnect-clk implementation
>>
>> https://lore.kernel.org/linux-arm-msm/20230512001334.2983048-3-dmitry.baryshkov@linaro.org/
>>
>

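The CPU to L2 relationship listed in the message above is stepped rather than proportional, so it can be sketched as a small threshold table. This is a sketch under stated assumptions, not code from the patchset: the names `l2_step`, `apq8064_l2_steps` and `l2_for_cpu_mhz` are hypothetical, the rows elided with `....` are assumed to stay on the same L2 level as their neighbours, and every CPU frequency above 1026 MHz is assumed to use 1134 MHz.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one L2 level and the highest CPU rate it serves. */
struct l2_step {
	uint32_t cpu_max_mhz;
	uint32_t l2_mhz;
};

/*
 * Three L2 levels, taken from the rows shown in the thread.
 * ASSUMPTION: the elided rows follow the same pattern, and the last
 * level covers everything above 1026 MHz.
 */
static const struct l2_step apq8064_l2_steps[] = {
	{ 384,        384 },
	{ 1026,       648 },
	{ UINT32_MAX, 1134 },
};

/* Return the L2 frequency (MHz) of the first level that covers cpu_mhz. */
static uint32_t l2_for_cpu_mhz(uint32_t cpu_mhz)
{
	size_t n = sizeof(apq8064_l2_steps) / sizeof(apq8064_l2_steps[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (cpu_mhz <= apq8064_l2_steps[i].cpu_max_mhz)
			return apq8064_l2_steps[i].l2_mhz;
	}

	/* Not reached while the last entry is UINT32_MAX; kept as a safe default. */
	return apq8064_l2_steps[n - 1].l2_mhz;
}
```

A lookup like this reproduces the rows quoted in the thread (for example 702 MHz maps to 648 MHz and 1512 MHz maps to 1134 MHz), which a simple proportional mapping between the two frequency ranges would not.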
On Mon, Jun 12, 2023 at 05:20:02PM +0300, Dmitry Baryshkov wrote:
> On 11/06/2023 19:27, Christian Marangi wrote:
> > On Mon, Jun 12, 2023 at 08:39:04AM +0300, Dmitry Baryshkov wrote:
> > > Implement CPUFreq support for one of the oldest supported Qualcomm
> > > platforms, APQ8064. Each core has independent power and frequency
> > > control. Additionally the L2 cache is scaled to follow the CPU
> > > frequencies (failure to do so results in strange semi-random crashes).
> >
> > Hi, can we talk, maybe in private about this interconnect-cpu thing?
>
> Hi, sure. Feel free to ping me on IRC (lumag) or via email. Or we can just
> continue our discussion here, as it might be interesting to other people
> too.
>

Don't know if here is the right place to discuss my concern and problem
with L2 scaling on ipq8064...

> > I see you follow the original implementation of the msm_bus where in
> > practice with the use of the kbps the correct clock and voltage was set.
> > (and this was also used to set the fabric clock from nominal to fast)
> >
> > On ipq806x and I assume other SoC there isn't always a 1:1 map of CPU
> > freq and L2 freq. For example on ipq8064 we have max CPU freq of 1.4GHz
> > and L2 freq of 1.2GHz, on ipq8065 we have CPU 1.7GHz and L2 of 1.4GHz.
>
> This is also the case for apq8064. The vendor kernel defines 15 frequencies
> for L2 cache clock, but then for some reasons all PVS tables use just 3
> entries from these 15.
>

Eh who knows why they did this... Probably the hfpll was limited or they
noticed no temp/power benefits were present with scaling with that much
of steps?

> > (and even that is curious since I used the debug regs and the cxo
> > crystal to measure the clock by hardware (yes i ported the very ancient
> > clk-debug to modern kernel and it works and discovered all sort of
> > things) the L2 (I assume due to limitation of the hfpll) actually can
> > never reach that frequency (1.4GHz in reality results to something like
> > 1.2GHz from what I notice a stable clock is there only with frequency of
> > max 1GHz))
>
> I would like to point you to https://github.com/andersson/debugcc/, which is
> a userspace reimplementation of clk-debug. We'd appreciate your patches
> there.
>

Hi, I wasted some good time on the implementation but managed to make it
work and proposed a pr! I assume the thing can be reused for apq8064 if
someone ever wants to have fun with that.

> > So my idea was to introduce a simple devfreq driver and use the PASSIVE
> > governor where it was added the possibility to link to a CPU frequency
> > and with interpolation select the L2 frequency (and voltage)
>
> I stumbled upon this idea, when I was working on the msm8996 and its CBF
> clock (CBF = interconnect between two core clusters). While it should be
> possible to use DEVFREQ in simple cases (e.g. L2 clock >= max(CPU clock), if
> possible), real configurations are slightly harder.
> E.g. for the purpose of this patchset, the relationship for apq8064 is the
> following (in MHz):
>
>  CPU    L2
>  384    384
>  486    648
>  594    648
>  702    648
>  ....   ...
>  1026   648
>  1134   1134
>  ....   ....
>  1512   1134
>  ....   ....
>
> It should be noted that msm8960 also used just three values for the L2 cache
> frequencies. From what I can see, only msm8x60 made L2 freq tightly follow
> the CPU frequency.
>

Happy to test and find a common path... With the merge of the cpu opp
and nvmem work, I was just about to send the L2 devfreq driver... And
also the fabric devfreq driver. But I wonder if I can use this
interconnect thing for the 2 tasks.

> > From some old comments in ancient qsdk code it was pointed out that due
> > to a hw limitation the secondary cpu can't stay at a high clock if L2
> > was at the idle clock. (no idea if this is specific to IPQ806x) So this
> > might be a cause of your crash? (I also have random crash with L2
> > scaling and we are planning to just force the L2 at max frequency)
>
> It might be related. It was more or less the same story with msm8996, which
> was either 'maxcpus=2' or scaling the CBF clock.
>

Might be a krait defect... and this is pretty bad...

> > But sorry for all of this (maybe) useless info. I checked the other
> > patch and I didn't understand how the different L2 frequency are
> > declared and even the voltage. Is this something that will come later?
> > I'm very interested in this implementation.
>
> The L2 frequency (<&kraitcc 4>) is converted into bandwidth vote, which then
> goes into the OPP tables. But please also see the discussion started at the
> patch 15.
>

I didn't notice you were defining multiple supply, scaling the voltage
under the hood with that trick. It's not a bad idea but as pointed out
it might be problematic, since it seems krait is very sensitive to L2
frequency and voltage so we should simulate the original implementation
as close as possible...

> >
> > >
> > > Core voltage is controlled through the SAW2 devices, one for each core.
> > > The L2 has two regulators, vdd-mem and vdd-dig.
> > >
> > > Dependency: [1] for interconnect-clk implementation
> > >
> > > https://lore.kernel.org/linux-arm-msm/20230512001334.2983048-3-dmitry.baryshkov@linaro.org/
> > >
> >
>
> --
> With best wishes
> Dmitry
>

On 13/06/2023 19:19, Christian Marangi wrote:
> On Mon, Jun 12, 2023 at 05:20:02PM +0300, Dmitry Baryshkov wrote:
>> On 11/06/2023 19:27, Christian Marangi wrote:
>>> On Mon, Jun 12, 2023 at 08:39:04AM +0300, Dmitry Baryshkov wrote:
>>>> Implement CPUFreq support for one of the oldest supported Qualcomm
>>>> platforms, APQ8064. Each core has independent power and frequency
>>>> control. Additionally the L2 cache is scaled to follow the CPU
>>>> frequencies (failure to do so results in strange semi-random crashes).
>>>
>>> Hi, can we talk, maybe in private about this interconnect-cpu thing?
>>
>> Hi, sure. Feel free to ping me on IRC (lumag) or via email. Or we can just
>> continue our discussion here, as it might be interesting to other people
>> too.
>>
>
> Don't know if here is the right place to discuss my concern and problem
> with L2 scaling on ipq8064...

I think I will try segregating L2 data to l2-cache device node (I saw
your comment that it is not populated by default. I'll have to fix this).

>
>>> I see you follow the original implementation of the msm_bus where in
>>> practice with the use of the kbps the correct clock and voltage was set.
>>> (and this was also used to set the fabric clock from nominal to fast)
>>>
>>> On ipq806x and I assume other SoC there isn't always a 1:1 map of CPU
>>> freq and L2 freq. For example on ipq8064 we have max CPU freq of 1.4GHz
>>> and L2 freq of 1.2GHz, on ipq8065 we have CPU 1.7GHz and L2 of 1.4GHz.
>>
>> This is also the case for apq8064. The vendor kernel defines 15 frequencies
>> for L2 cache clock, but then for some reasons all PVS tables use just 3
>> entries from these 15.
>>
>
> Eh who knows why they did this... Probably the hfpll was limited or they
> noticed no temp/power benefits were present with scaling with that much
> of steps?
>
>>> (and even that is curious since I used the debug regs and the cxo
>>> crystal to measure the clock by hardware (yes i ported the very ancient
>>> clk-debug to modern kernel and it works and discovered all sort of
>>> things) the L2 (I assume due to limitation of the hfpll) actually can
>>> never reach that frequency (1.4GHz in reality results to something like
>>> 1.2GHz from what I notice a stable clock is there only with frequency of
>>> max 1GHz))
>>
>> I would like to point you to https://github.com/andersson/debugcc/, which is
>> a userspace reimplementation of clk-debug. We'd appreciate your patches
>> there.
>>
>
> Hi, I wasted some good time on the implementation but managed to make it
> work and proposed a pr! I assume the thing can be reused for apq8064 if
> someone ever wants to have fun with that.

Thanks a lot! Generally I think that debugcc is a very valuable
debugging tool and it should be getting more attention from the
community. With the chips newer than 8064 it is easy enough to add new
platform data.

>
>>> So my idea was to introduce a simple devfreq driver and use the PASSIVE
>>> governor where it was added the possibility to link to a CPU frequency
>>> and with interpolation select the L2 frequency (and voltage)
>>
>> I stumbled upon this idea, when I was working on the msm8996 and its CBF
>> clock (CBF = interconnect between two core clusters). While it should be
>> possible to use DEVFREQ in simple cases (e.g. L2 clock >= max(CPU clock), if
>> possible), real configurations are slightly harder.
>> E.g. for the purpose of this patchset, the relationship for apq8064 is the
>> following (in MHz):
>>
>>  CPU    L2
>>  384    384
>>  486    648
>>  594    648
>>  702    648
>>  ....   ...
>>  1026   648
>>  1134   1134
>>  ....   ....
>>  1512   1134
>>  ....   ....
>>
>> It should be noted that msm8960 also used just three values for the L2 cache
>> frequencies. From what I can see, only msm8x60 made L2 freq tightly follow
>> the CPU frequency.
>>
>
> Happy to test and find a common path... With the merge of the cpu opp
> and nvmem work, I was just about to send the L2 devfreq driver... And
> also the fabric devfreq driver. But I wonder if I can use this
> interconnect thing for the 2 tasks.
>
>>> From some old comments in ancient qsdk code it was pointed out that due
>>> to a hw limitation the secondary cpu can't stay at a high clock if L2
>>> was at the idle clock. (no idea if this is specific to IPQ806x) So this
>>> might be a cause of your crash? (I also have random crash with L2
>>> scaling and we are planning to just force the L2 at max frequency)
>>
>> It might be related. It was more or less the same story with msm8996, which
>> was either 'maxcpus=2' or scaling the CBF clock.
>>
>
> Might be a krait defect... and this is pretty bad...

I don't know if it is a defect or just a misfeature. Anyway, we know
that L2 should be clocked high enough and we can cope with it.

>
>>> But sorry for all of this (maybe) useless info. I checked the other
>>> patch and I didn't understand how the different L2 frequency are
>>> declared and even the voltage. Is this something that will come later?
>>> I'm very interested in this implementation.
>>
>> The L2 frequency (<&kraitcc 4>) is converted into bandwidth vote, which then
>> goes into the OPP tables. But please also see the discussion started at the
>> patch 15.
>>
>
> I didn't notice you were defining multiple supply, scaling the voltage
> under the hood with that trick. It's not a bad idea but as pointed out
> it might be problematic, since it seems krait is very sensitive to L2
> frequency and voltage so we should simulate the original implementation
> as close as possible...

It was my original intention, as the vendor kernel does it in the
vdd-mem, vdd-dig, vdd-core, L2-freq, core freq order. I did not expect
that voltages are scaled after BW casts. (this describes freq-increase
case, in case of decreasing frequency the order is inverted).

>
>>>
>>>>
>>>> Core voltage is controlled through the SAW2 devices, one for each core.
>>>> The L2 has two regulators, vdd-mem and vdd-dig.
>>>>
>>>> Dependency: [1] for interconnect-clk implementation
>>>>
>>>> https://lore.kernel.org/linux-arm-msm/20230512001334.2983048-3-dmitry.baryshkov@linaro.org/
>>>>
>>>
>>
>> --
>> With best wishes
>> Dmitry
>>
>
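The scaling order described in the last reply (vdd-mem, vdd-dig, vdd-core, L2 frequency, core frequency when raising the frequency, and the reverse when lowering it) can be sketched as a small sequencing helper. This only illustrates the ordering: the enum `scale_step` and the function `scale_sequence` are hypothetical names, and no regulator or clock API is called.

```c
/* Hypothetical step identifiers, listed in the frequency-increase order. */
enum scale_step {
	STEP_VDD_MEM,
	STEP_VDD_DIG,
	STEP_VDD_CORE,
	STEP_L2_FREQ,
	STEP_CORE_FREQ,
	STEP_COUNT
};

/*
 * Fill out[] with the order in which the steps would be applied.
 * Raising the frequency: voltages first, then L2, then the core clock.
 * Lowering the frequency: the same list walked backwards.
 */
static void scale_sequence(int freq_increase, enum scale_step out[STEP_COUNT])
{
	int i;

	for (i = 0; i < STEP_COUNT; i++)
		out[i] = (enum scale_step)(freq_increase ? i : STEP_COUNT - 1 - i);
}
```

In the increase case the supplies are already at the higher level before either clock is raised; in the decrease case the clocks drop first and the supplies follow, which matches the order the message attributes to the vendor kernel.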