Message ID | fcca009e0e10840ff5ee31c88b618a212ab69201.1510735482.git.viresh.kumar@linaro.org (mailing list archive) |
---|---|
State | Deferred |
Delegated to: | Rafael Wysocki |
Headers | show |
On 15/11/2017 10:19, Viresh Kumar wrote: > No one has used it for the last two and half years (since it was > introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the > power cooling device API")), get rid of it. > > Cc: Javi Merino <javi.merino@arm.com> > Cc: Punit Agrawal <punit.agrawal@arm.com> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > --- Even if I agree that is not used to in the mainstream kernel, it is part of the EAS which is currently merged in Android.
On 15-11-17, 11:18, Daniel Lezcano wrote: > Even if I agree that is not used to in the mainstream kernel, it is part > of the EAS which is currently merged in Android. Okay, I had chat with Daniel offline after looking at Android sources and here are the key-points: - No one is using this feature currently in Mainline or Android kernel tree. - There may be some SoC vendors who are using it on their local repos, but we don't have much details on that. - For EAS (Energy aware scheduling), we want to take static power of the CPUs into account and that is a TODO thing right now. What worries me is that this code got merged almost 2.5 years back and no one used it. Yeah, it looks interesting stuff but what's the guarantee that anyone is going to use it in near future ? @Javi, @Puneet, @Lukasz: As it was added by Javi (from ARM), can someone from ARM write a patch for the arm big LITTLE cpufreq driver (Juno?) and use this thing? Else I am not sure how long we want to keep code in kernel that no one touches. And of course, we can revert this commit later on if required without much trouble hopefully.
On Wed, Nov 15, 2017 at 02:49:48PM +0530, Viresh Kumar wrote: > No one has used it for the last two and half years (since it was > introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the > power cooling device API")), get rid of it. Linaro used it in lsk 3.18 for the cpufreq driver for Juno. The cpufreq driver was converted to the generic one from dt in lsk 4.4, but the generic cpufreq driver does not support static power because everything has to come from device tree and we don't have a way to specify it there. Cheers, Javi
On 15/11/2017 12:31, Javi Merino wrote: > On Wed, Nov 15, 2017 at 02:49:48PM +0530, Viresh Kumar wrote: >> No one has used it for the last two and half years (since it was >> introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the >> power cooling device API")), get rid of it. > > Linaro used it in lsk 3.18 for the cpufreq driver for Juno. The cpufreq > driver was converted to the generic one from dt in lsk 4.4, but the > generic cpufreq driver does not support static power because everything > has to come from device tree and we don't have a way to specify it there. Are in favor of removing it or improving the code ?
On Wed, Nov 15, 2017 at 12:39:36PM +0100, Daniel Lezcano wrote: > On 15/11/2017 12:31, Javi Merino wrote: > > On Wed, Nov 15, 2017 at 02:49:48PM +0530, Viresh Kumar wrote: > >> No one has used it for the last two and half years (since it was > >> introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the > >> power cooling device API")), get rid of it. > > > > Linaro used it in lsk 3.18 for the cpufreq driver for Juno. The cpufreq > > driver was converted to the generic one from dt in lsk 4.4, but the > > generic cpufreq driver does not support static power because everything > > has to come from device tree and we don't have a way to specify it there. > > Are in favor of removing it or improving the code ? Yes, to what I can remember, Juno driver originally used this and it was supposed to be a reference on who this stuff gets to be used. So, this is more like "the code never made mainline" instead of no one is using. And to be frank, this is the API that represents static power, which is the component that differentiate things while using IPA. Maybe So yes, my suggestion is to put effort to get the juno code that uses the static power back to mainline. > > > -- > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs > > Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | > <http://twitter.com/#!/linaroorg> Twitter | > <http://www.linaro.org/linaro-blog/> Blog >
On Wed, Nov 15, 2017 at 11:18:03AM +0100, Daniel Lezcano wrote: > On 15/11/2017 10:19, Viresh Kumar wrote: > > No one has used it for the last two and half years (since it was > > introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the > > power cooling device API")), get rid of it. > > > > Cc: Javi Merino <javi.merino@arm.com> > > Cc: Punit Agrawal <punit.agrawal@arm.com> > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > > --- > > Even if I agree that is not used to in the mainstream kernel, it is part > of the EAS which is currently merged in Android. > Even though we really should care about stuff that is in mainline, this specific case is about a piece of code that never made mainline, or got lost on translation from one version to another. So, I am currently nacking this patch and asking ARM/linaro folks to upstream the juno implementation that uses static power. > -- > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs > > Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | > <http://twitter.com/#!/linaroorg> Twitter | > <http://www.linaro.org/linaro-blog/> Blog >
On Wed, Nov 15, 2017 at 4:43 PM, Eduardo Valentin <edubezval@gmail.com> wrote: > On Wed, Nov 15, 2017 at 11:18:03AM +0100, Daniel Lezcano wrote: >> On 15/11/2017 10:19, Viresh Kumar wrote: >> > No one has used it for the last two and half years (since it was >> > introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the >> > power cooling device API")), get rid of it. >> > >> > Cc: Javi Merino <javi.merino@arm.com> >> > Cc: Punit Agrawal <punit.agrawal@arm.com> >> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> >> > --- >> >> Even if I agree that is not used to in the mainstream kernel, it is part >> of the EAS which is currently merged in Android. >> > > Even though we really should care about stuff that is in mainline, this > specific case is about a piece of code that never made mainline, or got > lost on translation from one version to another. So, I am currently > nacking this patch and asking ARM/linaro folks to upstream the juno > implementation that uses static power. However, I would like to see a clear declaration from whoever is maintaining that code today that there is a plan in place to upstream it and that this plan will actually be acted on. And, better yet, *when* that can be expected to happen. Without such a declaration I'm afraid there is no point for the mainline to carry the unused code. Which apparently gets in the way somehow, or Viresh wouldn't have taken the time to attempt to remove it I suppose? Thanks, Rafael
On Wed, Nov 15, 2017 at 07:17:49PM +0100, Rafael J. Wysocki wrote: > On Wed, Nov 15, 2017 at 4:43 PM, Eduardo Valentin <edubezval@gmail.com> wrote: > > On Wed, Nov 15, 2017 at 11:18:03AM +0100, Daniel Lezcano wrote: > >> On 15/11/2017 10:19, Viresh Kumar wrote: > >> > No one has used it for the last two and half years (since it was > >> > introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the > >> > power cooling device API")), get rid of it. > >> > > >> > Cc: Javi Merino <javi.merino@arm.com> > >> > Cc: Punit Agrawal <punit.agrawal@arm.com> > >> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> > >> > --- > >> > >> Even if I agree that is not used to in the mainstream kernel, it is part > >> of the EAS which is currently merged in Android. > >> > > > > Even though we really should care about stuff that is in mainline, this > > specific case is about a piece of code that never made mainline, or got > > lost on translation from one version to another. So, I am currently > > nacking this patch and asking ARM/linaro folks to upstream the juno > > implementation that uses static power. > > However, I would like to see a clear declaration from whoever is > maintaining that code today that there is a plan in place to upstream > it and that this plan will actually be acted on. And, better yet, > *when* that can be expected to happen. > > Without such a declaration I'm afraid there is no point for the > mainline to carry the unused code. Which apparently gets in the way > somehow, or Viresh wouldn't have taken the time to attempt to remove > it I suppose? I agree here. This is mostly a code maintained by the linaro folks at this moment (daniel, please chime in if I am wrong). If no effort is done to get the code into mainline, there is no point in keeping the static component as a dead code in our tree. > > Thanks, > Rafael
On 15-11-17, 19:17, Rafael J. Wysocki wrote: > However, I would like to see a clear declaration from whoever is > maintaining that code today that there is a plan in place to upstream > it and that this plan will actually be acted on. And, better yet, > *when* that can be expected to happen. Exactly what I have been advocating. And there is bunch of other places where such examples can be seen. For example multiple regulator support in the OPP framework, which I added an year ago hasn't seen a user yet. And I am pushing the TI guys (who wanted it badly) to upstream their code before we remove that as well :) Again, someone has to come up and take responsibility of getting static power platform support upstream in a definite amount of time.
Hi guys, On 15/11/17 18:20, Eduardo Valentin wrote: > On Wed, Nov 15, 2017 at 07:17:49PM +0100, Rafael J. Wysocki wrote: >> On Wed, Nov 15, 2017 at 4:43 PM, Eduardo Valentin <edubezval@gmail.com> wrote: >>> On Wed, Nov 15, 2017 at 11:18:03AM +0100, Daniel Lezcano wrote: >>>> On 15/11/2017 10:19, Viresh Kumar wrote: >>>>> No one has used it for the last two and half years (since it was >>>>> introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the >>>>> power cooling device API")), get rid of it. >>>>> >>>>> Cc: Javi Merino <javi.merino@arm.com> >>>>> Cc: Punit Agrawal <punit.agrawal@arm.com> >>>>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> >>>>> --- >>>> >>>> Even if I agree that is not used to in the mainstream kernel, it is part >>>> of the EAS which is currently merged in Android. >>>> >>> >>> Even though we really should care about stuff that is in mainline, this >>> specific case is about a piece of code that never made mainline, or got >>> lost on translation from one version to another. So, I am currently >>> nacking this patch and asking ARM/linaro folks to upstream the juno >>> implementation that uses static power. >> >> However, I would like to see a clear declaration from whoever is >> maintaining that code today that there is a plan in place to upstream >> it and that this plan will actually be acted on. And, better yet, >> *when* that can be expected to happen. >> >> Without such a declaration I'm afraid there is no point for the >> mainline to carry the unused code. Which apparently gets in the way >> somehow, or Viresh wouldn't have taken the time to attempt to remove >> it I suppose? > > > I agree here. This is mostly a code maintained by the linaro folks at > this moment (daniel, please chime in if I am wrong). If no effort is > done to get the code into mainline, there is no point in keeping the > static component as a dead code in our tree. > When it was added in lsk 3.18 in what was then a thermal driver for Juno it was believed to have an effect in thermal mitigation, but that was not proven later as to justify posting it upstream, and that is why the code never made it in mainline. The code added there can be found at: https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247 As for removing this code now, I would not want to make that decision without spending more time to check if it impacts other customer codelines. I'll come back with a reply to this in the next couple of days. Thank you, Ionela. >> >> Thanks, >> Rafael >
On 16-11-17, 15:02, Ionela Voinescu wrote: > When it was added in lsk 3.18 in what was then a thermal driver for Juno > it was believed to have an effect in thermal mitigation, but that was > not proven later as to justify posting it upstream, and that is why the > code never made it in mainline. > The code added there can be found at: > https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247 > > As for removing this code now, I would not want to make that decision without > spending more time to check if it impacts other customer codelines. > I'll come back with a reply to this in the next couple of days. Sure, we can wait for few days definitely :)
On Thursday, November 16, 2017 4:20:58 PM CET Viresh Kumar wrote: > On 16-11-17, 15:02, Ionela Voinescu wrote: > > When it was added in lsk 3.18 in what was then a thermal driver for Juno > > it was believed to have an effect in thermal mitigation, but that was > > not proven later as to justify posting it upstream, and that is why the > > code never made it in mainline. > > The code added there can be found at: > > https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247 > > > > As for removing this code now, I would not want to make that decision without > > spending more time to check if it impacts other customer codelines. > > > > I'll come back with a reply to this in the next couple of days. > > Sure, we can wait for few days definitely :) While the above certainly is true, it doesn't matter whether or not any out-of-the-tree code will be affected by removing this from the mainline. What matters is *only* whether or not anyone is going to add anything depending on it to the mainline any time soon. If that's not the case, the stuff goes away (and may be added back in the future if need be). To avoid delaying this indefinitely, let's make a deal as follows. Unless anyone with some changes targeted at the mainline and depending on the code in question shows up before the end of the merge window currently under way, I will queue up the patches from Viresh for 4.16. Then, there will be 8 weeks (or more) of time before the 4.16 merge window opens to drop them if any new material depending on the code removed by them materializes in the meantime. Thanks, Rafael
Hello, On Fri, Nov 17, 2017 at 12:31:41AM +0100, Rafael J. Wysocki wrote: > On Thursday, November 16, 2017 4:20:58 PM CET Viresh Kumar wrote: > > On 16-11-17, 15:02, Ionela Voinescu wrote: > > > When it was added in lsk 3.18 in what was then a thermal driver for Juno > > > it was believed to have an effect in thermal mitigation, but that was > > > not proven later as to justify posting it upstream, and that is why the > > > code never made it in mainline. Really? Do you have data that can be shared to back up the above statement? Last time I checked, not only in Juno, static power does have a non-negligible contribution. Neglecting static power can essentially make IPA to undershoot in many cases to eventually overshoot. And that is what I recollect from the data that I was presented, even for getting this code reviewed. Just do not recollect to have the link to it. > > > The code added there can be found at: > > > https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247 > > > > > > As for removing this code now, I would not want to make that decision without > > > spending more time to check if it impacts other customer codelines. > > > Maybe this is a matter of Linaro/ARM failing to convince SoC vendors to really publish static power to mainline Linux instead of really having the benefit of modeling it? Even simple models based on temperature ranges would be better than only using dynamic power model. > > > I'll come back with a reply to this in the next couple of days. > > > > Sure, we can wait for few days definitely :) > > While the above certainly is true, it doesn't matter whether or not any > out-of-the-tree code will be affected by removing this from the mainline. > > What matters is *only* whether or not anyone is going to add anything > depending on it to the mainline any time soon. If that's not the case, > the stuff goes away (and may be added back in the future if need be). > > To avoid delaying this indefinitely, let's make a deal as follows. > > Unless anyone with some changes targeted at the mainline and depending on the > code in question shows up before the end of the merge window currently under > way, I will queue up the patches from Viresh for 4.16. Then, there will be > 8 weeks (or more) of time before the 4.16 merge window opens to drop them if > any new material depending on the code removed by them materializes in the > meantime. Sure, as I mentioned before, if we failed to convince SoC manufacturers to provide valid models, makes no sense to keep dead code in the tree, you have my support and you can add my: Acked-by: Eduardo Valentin <edubezval@gmail.com> I am just not convinced that this is really about the static power being negligible for IPA. > > Thanks, > Rafael >
On 16/11/2017 03:47, Viresh Kumar wrote: > On 15-11-17, 19:17, Rafael J. Wysocki wrote: >> However, I would like to see a clear declaration from whoever is >> maintaining that code today that there is a plan in place to upstream >> it and that this plan will actually be acted on. And, better yet, >> *when* that can be expected to happen. > > Exactly what I have been advocating. > > And there is bunch of other places where such examples can be seen. > For example multiple regulator support in the OPP framework, which I > added an year ago hasn't seen a user yet. And I am pushing the TI guys > (who wanted it badly) to upstream their code before we remove that as > well :) > > Again, someone has to come up and take responsibility of getting > static power platform support upstream in a definite amount of time. Instead of removing entirely the code, why not convert this to a DT based info and put the Juno values back ?
Hi Eduardo, Eduardo Valentin <edubezval@gmail.com> writes: > Hello, > > On Fri, Nov 17, 2017 at 12:31:41AM +0100, Rafael J. Wysocki wrote: >> On Thursday, November 16, 2017 4:20:58 PM CET Viresh Kumar wrote: >> > On 16-11-17, 15:02, Ionela Voinescu wrote: >> > > When it was added in lsk 3.18 in what was then a thermal driver for Juno >> > > it was believed to have an effect in thermal mitigation, but that was >> > > not proven later as to justify posting it upstream, and that is why the >> > > code never made it in mainline. > > > Really? Do you have data that can be shared to back up the above > statement? > > Last time I checked, not only in Juno, static power does have > a non-negligible contribution. Neglecting static power can essentially > make IPA to undershoot in many cases to eventually overshoot. And that > is what I recollect from the data that I was presented, even for getting > this code reviewed. Just do not recollect to have the link to it. Just to make sure we are on the same page - static power can be a significant portion of SoC power consumption. It varies widely across SoCs and from experience depends on things like fabrication process, ambient temperature, applied voltage/current drain, etc. There are many SoCs where static power is a significant part of power consumption and needs to be modelled for good thermal control. Specifically in the case of Juno - we'd done some thermal and performance benchmarking when working on IPA. This included implementing a static power model to understand and test it's impact. If memory serves me right static power was approximately in the 10-15% range. Unfortunately, I can't find any datasets to support or disprove this claim. My take away from the Juno experiment was that it is not a particularly thermally constrained system. At the least it took many 10s of seconds running at max frequency (both clusters and the GPU) with the case fan turned off for it to see a 10C increase. So the lack of a static power model wasn't affecting it's thermal control. But this situation is only true for Juno. More below... > >> > > The code added there can be found at: >> > > https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247 >> > > >> > > As for removing this code now, I would not want to make that decision without >> > > spending more time to check if it impacts other customer codelines. >> > > > > Maybe this is a matter of Linaro/ARM failing to convince SoC vendors to > really publish static power to mainline Linux instead of really having > the benefit of modeling it? Even simple models based on temperature > ranges would be better than only using dynamic power model. I know of a few SoCs implementing static power model in their kernel (not mainline). It would be great for that code to hit mainline. But as with all the out of tree code, I'm not sure have much influence we have in making that happen. Again, I don't think we are arguing about the importance of static power in a power model based thermal control strategy like IPA. > >> > > I'll come back with a reply to this in the next couple of days. >> > >> > Sure, we can wait for few days definitely :) >> >> While the above certainly is true, it doesn't matter whether or not any >> out-of-the-tree code will be affected by removing this from the mainline. >> >> What matters is *only* whether or not anyone is going to add anything >> depending on it to the mainline any time soon. If that's not the case, >> the stuff goes away (and may be added back in the future if need be). >> >> To avoid delaying this indefinitely, let's make a deal as follows. >> >> Unless anyone with some changes targeted at the mainline and depending on the >> code in question shows up before the end of the merge window currently under >> way, I will queue up the patches from Viresh for 4.16. Then, there will be >> 8 weeks (or more) of time before the 4.16 merge window opens to drop them if >> any new material depending on the code removed by them materializes in the >> meantime. > > > Sure, as I mentioned before, if we failed to convince SoC manufacturers > to provide valid models, makes no sense to keep dead code in the tree, > you have my support and you can add my: > > Acked-by: Eduardo Valentin <edubezval@gmail.com> > > I am just not convinced that this is really about the static power > being negligible for IPA. Just to reiterate once more, we are in agreement here. :) I'd like to think this patchset has come out of a plan to develop functionality on top but I don't know if that is the case. > >> >> Thanks, >> Rafael >>
On 17-11-17, 11:02, Punit Agrawal wrote: > I'd like to think this patchset has come out of a plan to develop > functionality on top but I don't know if that is the case. If you are talking about my patch series (?), then no, I don't have any plans on top of it. I just cleaned things up a bit. Nothing more :)
Hi guys, On 17/11/17 11:02, Punit Agrawal wrote: > Hi Eduardo, > > Eduardo Valentin <edubezval@gmail.com> writes: > >> Hello, >> >> On Fri, Nov 17, 2017 at 12:31:41AM +0100, Rafael J. Wysocki wrote: >>> On Thursday, November 16, 2017 4:20:58 PM CET Viresh Kumar wrote: >>>> On 16-11-17, 15:02, Ionela Voinescu wrote: >>>>> When it was added in lsk 3.18 in what was then a thermal driver for Juno >>>>> it was believed to have an effect in thermal mitigation, but that was >>>>> not proven later as to justify posting it upstream, and that is why the >>>>> code never made it in mainline. >> >> >> Really? Do you have data that can be shared to back up the above >> statement? >> >> Last time I checked, not only in Juno, static power does have >> a non-negligible contribution. Neglecting static power can essentially >> make IPA to undershoot in many cases to eventually overshoot. And that >> is what I recollect from the data that I was presented, even for getting >> this code reviewed. Just do not recollect to have the link to it. > > Just to make sure we are on the same page - static power can be a > significant portion of SoC power consumption. It varies widely across > SoCs and from experience depends on things like fabrication process, > ambient temperature, applied voltage/current drain, etc. There are many > SoCs where static power is a significant part of power consumption and > needs to be modelled for good thermal control. > > Specifically in the case of Juno - we'd done some thermal and > performance benchmarking when working on IPA. This included implementing > a static power model to understand and test it's impact. If memory > serves me right static power was approximately in the 10-15% > range. Unfortunately, I can't find any datasets to support or disprove > this claim. > > My take away from the Juno experiment was that it is not a particularly > thermally constrained system. At the least it took many 10s of seconds > running at max frequency (both clusters and the GPU) with the case fan > turned off for it to see a 10C increase. So the lack of a static power > model wasn't affecting it's thermal control. > > But this situation is only true for Juno. More below... > Sorry, I did not mean to say that static power is irrelevant. I only specified that for Juno, the values used in the lsk3.18 kernel did not have a significant effect in thermal mitigation, as Punit detailed here and below. Talking generically, IPA uses approximate values for dynamic power consumption (due to approximate values for the dynamic power coefficient, frequency at the end of the analysis window, an approximate value for utilization), approximate values for sustainable power in within a thermal zone, and approximate values for static power. A lot of my own experiments so far showed that IPA can compensate very well for inaccuracies in all of these due to the PID control loop but it pays the price in the stability of its signal when they are way off. As you also mention, when the accuracy of these values is neglected this can result in an inefficient ramp up and seesaw effects (undershoots and overshoots). That's why is very important for IPA to be properly tuned and I would not suggest that any of these should be neglected, but to be as accurate as possible. But there is only so much accuracy that you can achieve given the high cost of added complexity: static power and the dynamic power coefficient depend on PVT which sometimes cannot easily be factored in, the utilization that scales the dynamic power cannot be easily tracked especially for clusters of CPUs and given inaccurate estimations of idle states, etc. This being said, I believe there are platforms out there where the static power cost might be much too expensive to model for the gain it brings in the stability and accuracy of IPA power request estimation and allocation. Also, as you pointed out, there is reluctance in sharing these models. >> >>>>> The code added there can be found at: >>>>> https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247 >>>>> >>>>> As for removing this code now, I would not want to make that decision without >>>>> spending more time to check if it impacts other customer codelines. >>>>> >> >> Maybe this is a matter of Linaro/ARM failing to convince SoC vendors to >> really publish static power to mainline Linux instead of really having >> the benefit of modeling it? Even simple models based on temperature >> ranges would be better than only using dynamic power model. > > I know of a few SoCs implementing static power model in their kernel > (not mainline). It would be great for that code to hit mainline. But as > with all the out of tree code, I'm not sure have much influence we have > in making that happen. > > Again, I don't think we are arguing about the importance of static power > in a power model based thermal control strategy like IPA. > >> >>>>> I'll come back with a reply to this in the next couple of days. >>>> >>>> Sure, we can wait for few days definitely :) >>> >>> While the above certainly is true, it doesn't matter whether or not any >>> out-of-the-tree code will be affected by removing this from the mainline. >>> >>> What matters is *only* whether or not anyone is going to add anything >>> depending on it to the mainline any time soon. If that's not the case, >>> the stuff goes away (and may be added back in the future if need be). >>> >>> To avoid delaying this indefinitely, let's make a deal as follows. >>> >>> Unless anyone with some changes targeted at the mainline and depending on the >>> code in question shows up before the end of the merge window currently under >>> way, I will queue up the patches from Viresh for 4.16. Then, there will be >>> 8 weeks (or more) of time before the 4.16 merge window opens to drop them if >>> any new material depending on the code removed by them materializes in the >>> meantime. >> >> >> Sure, as I mentioned before, if we failed to convince SoC manufacturers >> to provide valid models, makes no sense to keep dead code in the tree, >> you have my support and you can add my: >> >> Acked-by: Eduardo Valentin <edubezval@gmail.com> >> >> I am just not convinced that this is really about the static power >> being negligible for IPA. > Static power is definitely significant for IPA as is the accuracy of all other values that contribute to the power request and allocation calculations. For me, part the decision or whether to remove or to keep this code is about how much use we can make of it now or in the future, as it stands. Information on static power, depending on platform and achievable accuracy, can be as simple as a DT value (I would definitely not recommend this to be the only supported source), a more complex callback or maybe a value provided by firmware where the mechanism to obtain that value is hidden. A DT model would be easy to support with the current code but it would be very inaccurate. More complex algorithms provided as callbacks should give the possibility for platform specific customization which lacks at the moment. But even if it was supported, SoC manufactures are usually reluctant to share that information. Passing information from firmware would allow for platform specific customization as well as hiding the mechanism through which is obtained, but there's no standard way to obtain this value at the moment (probably can be added to the OPP framework in the future). Another factor to consider is the imbalance between cpufreq and devfreq cooling devices (devfreq cooling devices are still able to provide static power information) that removing this code creates. Therefore, I would rather see this interface extended and not removed completely, in order to give users the possibility to link a source of information more appropriate for them. And it should all start with support for one platform. But I can't volunteer my time in short term to make these changes. So I can ack these patches for now, as the justification for cleaning this code is correct, but sooner or later better support for providing static power information should and will be added. Best regards, Ionela. > Just to reiterate once more, we are in agreement here. :) > > I'd like to think this patchset has come out of a plan to develop > functionality on top but I don't know if that is the case. > >> >>> >>> Thanks, >>> Rafael >>> >
On 21/11/2017 12:30, Ionela Voinescu wrote: [ ... ] > A DT model would be easy to support with the current code but it would > be very inaccurate. Why ? [ ... ]
On 21/11/17 14:06, Daniel Lezcano wrote: > On 21/11/2017 12:30, Ionela Voinescu wrote: > > [ ... ] > >> A DT model would be easy to support with the current code but it would >> be very inaccurate. > > Why ? > > [ ... ] > Hi all, The DT solution won't fly, the reason can be found below. I agree with Ionela and Punit that the Juno board is not the best platform to test the static power impact on IPA. In some other platforms the static power can be 50% or more of the total power, so it cannot be neglected. These are the issues. The static power equation is complicated, here is one known to me. The leakage function is exponentially influenced by current circuit supply voltage, body-bias and some constants K_{4,5}. P_{leak} = L_{g}*V_{dd}*K_{3}*e^{K_{4}*V_{dd}}*e^{K_{5}*V_{bs}}+| V_{bs}|*I_{Ju} It can also vary depending on technology (CMOS, FinFET, etc). It would be really hard to approximate by i.e. a polynomial function with inputs from DT. One size does not fit all. The equation can also tell you some interesting things about the manufacturing process. Exposing such information might be the last thing the vendors want to. That's why the vendors might want to implement whole thermal management in the firmware or skip static power and rely on IPA adaptation. They can also use a different api in IPA, when they have some mechanism to measure power in firmware, it can be feed into IPA. Anyway, I would recommend to keep it as is, to have a complete power model in the kernel. The code without static power routines looks awkward to me. From my side - NACK for the patch which removes static power. Regards, Lukasz Luba
Hi Lukasz On 21 November 2017 at 16:56, Lukasz Luba <llu.ker.dev@gmail.com> wrote: > On 21/11/17 14:06, Daniel Lezcano wrote: >> >> On 21/11/2017 12:30, Ionela Voinescu wrote: >> >> [ ... ] >> >>> A DT model would be easy to support with the current code but it would >>> be very inaccurate. >> >> >> Why ? >> >> [ ... ] >> > Hi all, > > The DT solution won't fly, the reason can be found below. > > I agree with Ionela and Punit that the Juno board is not > the best platform to test the static power impact on IPA. > In some other platforms the static power can be 50% or more > of the total power, so it cannot be neglected. > > These are the issues. > The static power equation is complicated, here is one known to me. > The leakage function is exponentially influenced by current circuit > supply voltage, body-bias and some constants K_{4,5}. > > P_{leak} = L_{g}*V_{dd}*K_{3}*e^{K_{4}*V_{dd}}*e^{K_{5}*V_{bs}}+| > V_{bs}|*I_{Ju} You forgot one main contributor of static leakage: the temperature > > It can also vary depending on technology (CMOS, FinFET, etc). > > It would be really hard to approximate by i.e. a polynomial > function with inputs from DT. One size does not fit all. But can't we linearized around the target temp ? that were we want to be accurate Regards, > > The equation can also tell you some interesting things about > the manufacturing process. Exposing such information might be the last > thing the vendors want to. > That's why the vendors might want to implement whole > thermal management in the firmware or skip static power and > rely on IPA adaptation. > They can also use a different api in IPA, when they have some mechanism > to measure power in firmware, it can be feed into IPA. > > Anyway, I would recommend to keep it as is, to have a complete > power model in the kernel. > The code without static power routines looks awkward to me. > From my side - NACK for the patch which removes static power. > > Regards, > Lukasz Luba
Hello folks, On Tue, Nov 21, 2017 at 05:08:34PM +0100, Vincent Guittot wrote: > Hi Lukasz > > On 21 November 2017 at 16:56, Lukasz Luba <llu.ker.dev@gmail.com> wrote: > > On 21/11/17 14:06, Daniel Lezcano wrote: > >> > >> On 21/11/2017 12:30, Ionela Voinescu wrote: > >> > >> [ ... ] > >> > >>> A DT model would be easy to support with the current code but it would > >>> be very inaccurate. > >> > >> > >> Why ? > >> > >> [ ... ] > >> > > Hi all, > > > > The DT solution won't fly, the reason can be found below. The APIs being removed by this patch is exactly to cover for the difficulty to model all static power cases. > > > > I agree with Ionela and Punit that the Juno board is not > > the best platform to test the static power impact on IPA. > > In some other platforms the static power can be 50% or more > > of the total power, so it cannot be neglected. > > > > These are the issues. > > The static power equation is complicated, here is one known to me. > > The leakage function is exponentially influenced by current circuit > > supply voltage, body-bias and some constants K_{4,5}. > > > > P_{leak} = L_{g}*V_{dd}*K_{3}*e^{K_{4}*V_{dd}}*e^{K_{5}*V_{bs}}+| > > V_{bs}|*I_{Ju} > > You forgot one main contributor of static leakage: the temperature > We all agree that it is hard to model static power, specially considering all variables. And that today, ARM/Linaro failed to convince vendors to expose this in mainline. So, ... > > > > It can also vary depending on technology (CMOS, FinFET, etc). > > > > It would be really hard to approximate by i.e. a polynomial > > function with inputs from DT. One size does not fit all. > > But can't we linearized around the target temp ? that were we want to > be accurate > > Regards, > > > > The equation can also tell you some interesting things about > > the manufacturing process. Exposing such information might be the last > > thing the vendors want to. > > That's why the vendors might want to implement whole > > thermal management in the firmware or skip static power and > > rely on IPA adaptation. > > They can also use a different api in IPA, when they have some mechanism > > to measure power in firmware, it can be feed into IPA. > > The lack of code in mainline, is not really because the API would not help IPA, but because 2.5y after this has been merged, vendors were not convinced to push a model, even if simple, to mainline. > > Anyway, I would recommend to keep it as is, to have a complete > > power model in the kernel. > > The code without static power routines looks awkward to me. > > From my side - NACK for the patch which removes static power. > > However, we cannot NACK just because we like the code :-). Nor we can NACK because vendors keep their code in Android tree somewhere, or in any other tree. Viresh has a point, if one looks at this code today in mainline, no one is using, it is a dead code, doesn't matter what is out of the tree using it. As I said before, the minimal you guys (ARM and Linaro) can do is to at least upstream the Juno code! as a reference. Come on guys? what is preventing you to upstream Juno model? As already discussed in this thread, we know Juno won't be the best platform to benefit for it, but it has a static power component, and for sure behaves better with the static power model than only with dynamic power. > > Regards, > > Lukasz Luba
Hi Vincent, On 21/11/17 17:08, Vincent Guittot wrote: > Hi Lukasz > > On 21 November 2017 at 16:56, Lukasz Luba <llu.ker.dev@gmail.com> wrote: >> On 21/11/17 14:06, Daniel Lezcano wrote: >>> >>> On 21/11/2017 12:30, Ionela Voinescu wrote: >>> >>> [ ... ] >>> >>>> A DT model would be easy to support with the current code but it would >>>> be very inaccurate. >>> >>> >>> Why ? >>> >>> [ ... ] >>> >> Hi all, >> >> The DT solution won't fly, the reason can be found below. >> >> I agree with Ionela and Punit that the Juno board is not >> the best platform to test the static power impact on IPA. >> In some other platforms the static power can be 50% or more >> of the total power, so it cannot be neglected. >> >> These are the issues. >> The static power equation is complicated, here is one known to me. >> The leakage function is exponentially influenced by current circuit >> supply voltage, body-bias and some constants K_{4,5}. >> >> P_{leak} = L_{g}*V_{dd}*K_{3}*e^{K_{4}*V_{dd}}*e^{K_{5}*V_{bs}}+| >> V_{bs}|*I_{Ju} > > You forgot one main contributor of static leakage: the temperature Yes, so basically, the relation between temperature and the power is exponential. The power doubles every 20deg. You can find this model for static power for Mali GPU: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/stabilize-8743.60.B-chromeos-4.4/drivers/gpu/arm/midgard/backend/gpu/mali_kbase_power_model_simple.c#37 There is a polynomial which approximates it for gpu (starting from line 56). > >> >> It can also vary depending on technology (CMOS, FinFET, etc). >> >> It would be really hard to approximate by i.e. a polynomial >> function with inputs from DT. One size does not fit all. > > But can't we linearized around the target temp ? that were we want to > be accurate I would also add: 'around the target temp' and starting at least from IPA enable trip point (so i.e. from 55degC to 75degC + margin) I would have to simulate it and see some results to see error values. Of course it would be better that having no static power at all, but the vendors would have create a tool which calculates the factors and put them to DT. Regards, Lukasz
Hi, On Tue, Nov 21, 2017 at 11:30:44AM +0000, Ionela Voinescu wrote: > Hi guys, > > On 17/11/17 11:02, Punit Agrawal wrote: > > Hi Eduardo, > > > > Eduardo Valentin <edubezval@gmail.com> writes: > > > >> Hello, > >> > >> On Fri, Nov 17, 2017 at 12:31:41AM +0100, Rafael J. Wysocki wrote: > >>> On Thursday, November 16, 2017 4:20:58 PM CET Viresh Kumar wrote: > >>>> On 16-11-17, 15:02, Ionela Voinescu wrote: > >>>>> When it was added in lsk 3.18 in what was then a thermal driver for Juno > >>>>> it was believed to have an effect in thermal mitigation, but that was > >>>>> not proven later as to justify posting it upstream, and that is why the > >>>>> code never made it in mainline. > >> > >> > >> Really? Do you have data that can be shared to back up the above > >> statement? > >> > >> Last time I checked, not only in Juno, static power does have > >> a non-negligible contribution. Neglecting static power can essentially > >> make IPA to undershoot in many cases to eventually overshoot. And that > >> is what I recollect from the data that I was presented, even for getting > >> this code reviewed. Just do not recollect to have the link to it. > > > > Just to make sure we are on the same page - static power can be a > > significant portion of SoC power consumption. It varies widely across > > SoCs and from experience depends on things like fabrication process, > > ambient temperature, applied voltage/current drain, etc. There are many > > SoCs where static power is a significant part of power consumption and > > needs to be modelled for good thermal control. > > > > Specifically in the case of Juno - we'd done some thermal and > > performance benchmarking when working on IPA. This included implementing > > a static power model to understand and test it's impact. If memory > > serves me right static power was approximately in the 10-15% > > range. Unfortunately, I can't find any datasets to support or disprove > > this claim. > > > > My take away from the Juno experiment was that it is not a particularly > > thermally constrained system. At the least it took many 10s of seconds > > running at max frequency (both clusters and the GPU) with the case fan > > turned off for it to see a 10C increase. So the lack of a static power > > model wasn't affecting it's thermal control. > > > > But this situation is only true for Juno. More below... > > > > Sorry, I did not mean to say that static power is irrelevant. I only > specified that for Juno, the values used in the lsk3.18 kernel did not > have a significant effect in thermal mitigation, as Punit detailed here > and below. > > Talking generically, IPA uses approximate values for dynamic power > consumption (due to approximate values for the dynamic power > coefficient, frequency at the end of the analysis window, an approximate > value for utilization), approximate values for sustainable power in > within a thermal zone, and approximate values for static power. A lot of > my own experiments so far showed that IPA can compensate very well for > inaccuracies in all of these due to the PID control loop but it pays the > price in the stability of its signal when they are way off. > > As you also mention, when the accuracy of these values is neglected this > can result in an inefficient ramp up and seesaw effects (undershoots and > overshoots). That's why is very important for IPA to be properly tuned > and I would not suggest that any of these should be neglected, but to be > as accurate as possible. > > But there is only so much accuracy that you can achieve given the high > cost of added complexity: static power and the dynamic power coefficient > depend on PVT which sometimes cannot easily be factored in, the > utilization that scales the dynamic power cannot be easily tracked > especially for clusters of CPUs and given inaccurate estimations of idle > states, etc. > > This being said, I believe there are platforms out there where the > static power cost might be much too expensive to model for the gain it > brings in the stability and accuracy of IPA power request estimation and > allocation. Also, as you pointed out, there is reluctance in sharing > these models. > > >> > >>>>> The code added there can be found at: > >>>>> https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247 > >>>>> > >>>>> As for removing this code now, I would not want to make that decision without > >>>>> spending more time to check if it impacts other customer codelines. > >>>>> > >> > >> Maybe this is a matter of Linaro/ARM failing to convince SoC vendors to > >> really publish static power to mainline Linux instead of really having > >> the benefit of modeling it? Even simple models based on temperature > >> ranges would be better than only using dynamic power model. > > > > I know of a few SoCs implementing static power model in their kernel > > (not mainline). It would be great for that code to hit mainline. But as > > with all the out of tree code, I'm not sure have much influence we have > > in making that happen. > > > > Again, I don't think we are arguing about the importance of static power > > in a power model based thermal control strategy like IPA. > > > >> > >>>>> I'll come back with a reply to this in the next couple of days. > >>>> > >>>> Sure, we can wait for few days definitely :) > >>> > >>> While the above certainly is true, it doesn't matter whether or not any > >>> out-of-the-tree code will be affected by removing this from the mainline. > >>> > >>> What matters is *only* whether or not anyone is going to add anything > >>> depending on it to the mainline any time soon. If that's not the case, > >>> the stuff goes away (and may be added back in the future if need be). > >>> > >>> To avoid delaying this indefinitely, let's make a deal as follows. > >>> > >>> Unless anyone with some changes targeted at the mainline and depending on the > >>> code in question shows up before the end of the merge window currently under > >>> way, I will queue up the patches from Viresh for 4.16. Then, there will be > >>> 8 weeks (or more) of time before the 4.16 merge window opens to drop them if > >>> any new material depending on the code removed by them materializes in the > >>> meantime. > >> > >> > >> Sure, as I mentioned before, if we failed to convince SoC manufacturers > >> to provide valid models, makes no sense to keep dead code in the tree, > >> you have my support and you can add my: > >> > >> Acked-by: Eduardo Valentin <edubezval@gmail.com> > >> > >> I am just not convinced that this is really about the static power > >> being negligible for IPA. > > > > Static power is definitely significant for IPA as is the accuracy of > all other values that contribute to the power request and allocation > calculations. > > For me, part the decision or whether to remove or to keep this code is > about how much use we can make of it now or in the future, as it stands. > > Information on static power, depending on platform and achievable > accuracy, can be as simple as a DT value (I would definitely not > recommend this to be the only supported source), a more complex callback > or maybe a value provided by firmware where the mechanism to obtain that > value is hidden. > A DT model would be easy to support with the current code but it would > be very inaccurate. > More complex algorithms provided as callbacks should give the possibility > for platform specific customization which lacks at the moment. But even > if it was supported, SoC manufactures are usually reluctant to share that > information. > Passing information from firmware would allow for platform specific > customization as well as hiding the mechanism through which is obtained, > but there's no standard way to obtain this value at the moment (probably > can be added to the OPP framework in the future). > > Another factor to consider is the imbalance between cpufreq and devfreq > cooling devices (devfreq cooling devices are still able to provide > static power information) that removing this code creates. > > Therefore, I would rather see this interface extended and not removed > completely, in order to give users the possibility to link a source of > information more appropriate for them. And it should all start with > support for one platform. But I can't volunteer my time in short term > to make these changes. So I can ack these patches for now, as the > justification for cleaning this code is correct, but sooner or later > better support for providing static power information should and will > be added. Well, I really do not see the point of a ask to extend the current API if have no single user of it. What are the current users problems with the API? What needs to be improved? What are the problems? We cannot tell, guess why? We have no users of it! Once again, do we have a reference platform? Oh, yes, that is Juno, and not even that is in mainline code. Folks, this can be a very nice discussion on how we can over engineer this API, but being pragmatic and avoiding wasting our time here, we all know what needs to be done. Dead code in mainline is hard to maintain. API to support out of the tree code is even harder. I am surprised to see this code was able to sustain itself in mainline with none challenging it for 2.5 y. So, we either get you guys to upstream at least one user of it or we just move on and remove the API, and keep mainline with only dynamic power, with periodic undershoots and overshoots. It is a simple decision: you either mainline it or keep IPA MORE inaccurate and take the burden to keep vendor own implementation of APIs for static power model on each product cycle, you choose. BR, Eduardo > > Best regards, > Ionela. > > > Just to reiterate once more, we are in agreement here. :) > > > > I'd like to think this patchset has come out of a plan to develop > > functionality on top but I don't know if that is the case. > > > >> > >>> > >>> Thanks, > >>> Rafael > >>> > >
On 21/11/2017 18:09, Eduardo Valentin wrote: [ ... ] > Well, I really do not see the point of a ask to extend the current API > if have no single user of it. What are the current users problems with > the API? What needs to be improved? What are the problems? We cannot > tell, guess why? We have no users of it! > > Once again, do we have a reference platform? Oh, yes, that is Juno, and > not even that is in mainline code. > > Folks, this can be a very nice discussion on how we can over engineer > this API, but being pragmatic and avoiding wasting our time here, we all > know what needs to be done. Dead code in mainline is hard to maintain. > API to support out of the tree code is even harder. I am surprised to > see this code was able to sustain itself in mainline with none > challenging it for 2.5 y. So, we either get you guys to upstream at > least one user of it or we just move on and remove the API, and keep > mainline with only dynamic power, with periodic undershoots and > overshoots. It is a simple decision: you either mainline it or keep IPA > MORE inaccurate and take the burden to keep vendor own implementation > of APIs for static power model on each product cycle, you choose. +1 to remove it.
Hi, On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote: [snip] > As I said before, the minimal you guys (ARM and Linaro) can do is to at > least upstream the Juno code! as a reference. Come on guys? what is > preventing you to upstream Juno model? As Ionela pointed out earlier in the thread, the cpufreq driver for Juno was not acceptable for mainline because it used platform specific code. When it was converted to cpufreq-dt, the static power was left behind because it can't be represented in device tree. This is because there isn't a function that works for every SoC, different process nodes (among other things) will need different functions. So it can't be just a bunch of coefficients in DT, we need a function. Hence the callback. In a nutshell, mainline does not want platform specific code, but we haven't figured out how to calculate static power without platform specific code. Cheers, Javi
On 21/11/2017 19:00, Javi Merino wrote: > Hi, > > On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote: > > [snip] > >> As I said before, the minimal you guys (ARM and Linaro) can do is to at >> least upstream the Juno code! as a reference. Come on guys? what is >> preventing you to upstream Juno model? > > As Ionela pointed out earlier in the thread, the cpufreq driver for Juno > was not acceptable for mainline because it used platform specific code. > When it was converted to cpufreq-dt, the static power was left behind > because it can't be represented in device tree. This is because there > isn't a function that works for every SoC, different process nodes > (among other things) will need different functions. So it can't be just > a bunch of coefficients in DT, we need a function. Hence the callback. The DT could contain the coef and a compatible string for a specific polynomial computation callback. I imagine we should not have a lot of different equations, no ?
On Tue, Nov 21, 2017 at 06:00:07PM +0000, Javi Merino wrote: > Hi, > > On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote: > > [snip] > > > As I said before, the minimal you guys (ARM and Linaro) can do is to at > > least upstream the Juno code! as a reference. Come on guys? what is > > preventing you to upstream Juno model? > > As Ionela pointed out earlier in the thread, the cpufreq driver for Juno > was not acceptable for mainline because it used platform specific code. > When it was converted to cpufreq-dt, the static power was left behind > because it can't be represented in device tree. This is because there Or we still haven't figure what to represent. Once decided what to represent, then we can claim if is doable in DT or not. > isn't a function that works for every SoC, different process nodes > (among other things) will need different functions. So it can't be just > a bunch of coefficients in DT, we need a function. Hence the callback. Well, DT is full of vendor specific stuff. > > In a nutshell, mainline does not want platform specific code, but we > haven't figured out how to calculate static power without platform > specific code. To, that is still fine to have it as a callback, as long as you have at least one user! I still do not understand why Juno static power cannot go as platform code that register the callback to implement the static power model. > > Cheers, > Javi
On Tue, Nov 21, 2017 at 07:05:46PM +0100, Daniel Lezcano wrote: > On 21/11/2017 19:00, Javi Merino wrote: > > Hi, > > > > On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote: > > > > [snip] > > > >> As I said before, the minimal you guys (ARM and Linaro) can do is to at > >> least upstream the Juno code! as a reference. Come on guys? what is > >> preventing you to upstream Juno model? > > > > As Ionela pointed out earlier in the thread, the cpufreq driver for Juno > > was not acceptable for mainline because it used platform specific code. > > When it was converted to cpufreq-dt, the static power was left behind > > because it can't be represented in device tree. This is because there > > isn't a function that works for every SoC, different process nodes > > (among other things) will need different functions. So it can't be just > > a bunch of coefficients in DT, we need a function. Hence the callback. > > The DT could contain the coef and a compatible string for a specific > polynomial computation callback. I imagine we should not have a lot of > different equations, no ? > Yeah, that would be another way of doing it. If there is no equation that correlates all processes, then we need a vendor specific entry, or a compatible string, as Daniel said. > > > -- > <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs > > Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | > <http://twitter.com/#!/linaroorg> Twitter | > <http://www.linaro.org/linaro-blog/> Blog >
On 21/11/17 19:13, Eduardo Valentin wrote: > On Tue, Nov 21, 2017 at 07:05:46PM +0100, Daniel Lezcano wrote: >> On 21/11/2017 19:00, Javi Merino wrote: >>> Hi, >>> >>> On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote: >>> >>> [snip] >>> >>>> As I said before, the minimal you guys (ARM and Linaro) can do is to at >>>> least upstream the Juno code! as a reference. Come on guys? what is >>>> preventing you to upstream Juno model? >>> >>> As Ionela pointed out earlier in the thread, the cpufreq driver for Juno >>> was not acceptable for mainline because it used platform specific code. >>> When it was converted to cpufreq-dt, the static power was left behind >>> because it can't be represented in device tree. This is because there >>> isn't a function that works for every SoC, different process nodes >>> (among other things) will need different functions. So it can't be just >>> a bunch of coefficients in DT, we need a function. Hence the callback. >> >> The DT could contain the coef and a compatible string for a specific >> polynomial computation callback. I imagine we should not have a lot of >> different equations, no ? >> > > Yeah, that would be another way of doing it. If there is no equation > that correlates all processes, then we need a vendor specific entry, or > a compatible string, as Daniel said. > So we have ~8 weeks (before it will vanish from mainline) to come up with ideas or to show that it is needed and used by some platform. Let's see... Regards, Lukasz
On 21-11-17, 18:00, Javi Merino wrote: > As Ionela pointed out earlier in the thread, the cpufreq driver for Juno > was not acceptable for mainline because it used platform specific code. Can we get a link to that thread? I don't remember what I have commented earlier but the above doesn't seem to be entirely true. The basic idea is to use as much common stuff as possible and so to use cpufreq-dt.c if possible. Its not that we are against platform specific bits, they are fine if they are really required. > When it was converted to cpufreq-dt, the static power was left behind > because it can't be represented in device tree. This is because there > isn't a function that works for every SoC, different process nodes > (among other things) will need different functions. So it can't be just > a bunch of coefficients in DT, we need a function. Hence the callback. Sure thing. And we can make this happen if we need. We aren't blocking it. > In a nutshell, mainline does not want platform specific code, but we Not really. We don't want platform specific code in arch/arm64, but we can have that in drivers/opp/ for example if required. Please start a discussion (in a separate thread if you want) and we can get cpufreq support updated for Juno very easily. And don't worry about this patch here. We can surely drop the patch if someone is serious enough to start using it. But there needs to be a commitment, nothing more.
(sorry for chiming in quite late) On 21/11/17 18:12, Eduardo Valentin wrote: > On Tue, Nov 21, 2017 at 06:00:07PM +0000, Javi Merino wrote: >> Hi, >> >> On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote: >> [...] >> >> In a nutshell, mainline does not want platform specific code, but we >> haven't figured out how to calculate static power without platform >> specific code.> > To, that is still fine to have it as a callback, as long as you have at > least one user! I still do not understand why Juno static power cannot > go as platform code that register the callback to implement the static > power model. > 1. It was proved not so useful(anyone can prove otherwise ?) 2. I am told static power is negligible compared to dynamic power with new fab processes. 3. It's very hard to even test IPA on Juno as it doesn't reach the required critical temperature easily. So as Juno platform maintainer I want a test case to test regression before we merge anything. IMO, if the $subject code is expected to be used on Juno, then my answer is no if one can't test it reliably and also prove that static power really matters on Juno. So far, I have heard both the above is not possible. So please delete the code if Juno is the only user in short and mid term. We can get the code back if we find any users in longer term.
On 22-11-17, 06:55, Viresh Kumar wrote: > On 21-11-17, 18:00, Javi Merino wrote: > > As Ionela pointed out earlier in the thread, the cpufreq driver for Juno > > was not acceptable for mainline because it used platform specific code. > > Can we get a link to that thread? I don't remember what I have commented earlier > but the above doesn't seem to be entirely true. > > The basic idea is to use as much common stuff as possible and so to use > cpufreq-dt.c if possible. Its not that we are against platform specific bits, > they are fine if they are really required. Just to correct everyone here, Juno doesn't use the cpufreq-dt driver but scpi-cpufreq.c :) -- viresh
On 22-11-17, 10:59, Sudeep Holla wrote: > IMO, if the $subject code is expected to be used on Juno, then my answer > is no if one can't test it reliably and also prove that static power > really matters on Juno. So far, I have heard both the above is not > possible. So please delete the code if Juno is the only user in > short and mid term. We can get the code back if we find any users in > longer term. Its funny that the $subject patch receives few Acks and Nacks every day now :)
On 22/11/17 11:10, Viresh Kumar wrote: > On 22-11-17, 10:59, Sudeep Holla wrote: >> IMO, if the $subject code is expected to be used on Juno, then my answer >> is no if one can't test it reliably and also prove that static power >> really matters on Juno. So far, I have heard both the above is not >> possible. So please delete the code if Juno is the only user in >> short and mid term. We can get the code back if we find any users in >> longer term. > > Its funny that the $subject patch receives few Acks and Nacks every day now :) > If Nacks are depending on Juno, then you can convert them to Ack on my behalf ;) as that won't happen unless someone disproves all the known facts so far and provides a solid test case we can depend on to test regressions.
On Wed, Nov 22, 2017 at 10:59:21AM +0000, Sudeep Holla wrote: > (sorry for chiming in quite late) > > On 21/11/17 18:12, Eduardo Valentin wrote: > > On Tue, Nov 21, 2017 at 06:00:07PM +0000, Javi Merino wrote: > >> Hi, > >> > >> On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote: > >> > > [...] > > >> > >> In a nutshell, mainline does not want platform specific code, but we > >> haven't figured out how to calculate static power without platform > >> specific code.> > > To, that is still fine to have it as a callback, as long as you have at > > least one user! I still do not understand why Juno static power cannot > > go as platform code that register the callback to implement the static > > power model. > > > > 1. It was proved not so useful(anyone can prove otherwise ?) Can anyone prove it does not have static power? > 2. I am told static power is negligible compared to dynamic power with > new fab processes. I am told quantum computer is out there :-), does it mean we should drop the maintenance of everything else? > 3. It's very hard to even test IPA on Juno as it doesn't reach the > required critical temperature easily. So as Juno platform maintainer > I want a test case to test regression before we merge anything. > > IMO, if the $subject code is expected to be used on Juno, then my answer > is no if one can't test it reliably and also prove that static power > really matters on Juno. So far, I have heard both the above is not > possible. So please delete the code if Juno is the only user in > short and mid term. We can get the code back if we find any users in > longer term. Yeah, the fact that Juno takes time to reach crit temperature does not necessarily imply it does not have static power consumption, or that its static power consumption is negligible. Now, if you want to ignore it, because it is not the best example to show usefulness of IPA, that is a different story. Eduardo > > -- > Regards, > Sudeep
On Wed, Nov 22, 2017 at 04:40:13PM +0530, Viresh Kumar wrote: > On 22-11-17, 10:59, Sudeep Holla wrote: > > IMO, if the $subject code is expected to be used on Juno, then my answer > > is no if one can't test it reliably and also prove that static power > > really matters on Juno. So far, I have heard both the above is not > > possible. So please delete the code if Juno is the only user in > > short and mid term. We can get the code back if we find any users in > > longer term. > > Its funny that the $subject patch receives few Acks and Nacks every day now :) I am still in favor to removed it, my ack still sustains, as my original motivation to keep this is pointless as people responsible for the code is not willing upstream it nor have the time nor will get further vendors to use it. > > -- > viresh
On 22/11/17 15:34, Eduardo Valentin wrote: > On Wed, Nov 22, 2017 at 10:59:21AM +0000, Sudeep Holla wrote: >> (sorry for chiming in quite late) >> >> On 21/11/17 18:12, Eduardo Valentin wrote: >>> On Tue, Nov 21, 2017 at 06:00:07PM +0000, Javi Merino wrote: >>>> Hi, >>>> >>>> On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote: >>>> >> >> [...] >> >>>> >>>> In a nutshell, mainline does not want platform specific code, but we >>>> haven't figured out how to calculate static power without platform >>>> specific code.> >>> To, that is still fine to have it as a callback, as long as you have at >>> least one user! I still do not understand why Juno static power cannot >>> go as platform code that register the callback to implement the static >>> power model. >>> >> >> 1. It was proved not so useful(anyone can prove otherwise ?) > > Can anyone prove it does not have static power? > I didn't claim that. I said it has been found that it's negligible based on experiments. >> 2. I am told static power is negligible compared to dynamic power with >> new fab processes. > > I am told quantum computer is out there :-), does it mean we should drop the > maintenance of everything else? > Good comparison :) Anyways all I told is if this code expects Juno to be user, then no as there's no reliable way to test it. It's entirely up to you if you want to support it or delete it as you are the maintainer. I am bit confused here as you seem to imply you want to continue supporting it here with you quantum comparison but say you are fine to delete in other thread. > >> 3. It's very hard to even test IPA on Juno as it doesn't reach the >> required critical temperature easily. So as Juno platform maintainer >> I want a test case to test regression before we merge anything. >> >> IMO, if the $subject code is expected to be used on Juno, then my answer >> is no if one can't test it reliably and also prove that static power >> really matters on Juno. So far, I have heard both the above is not >> possible. So please delete the code if Juno is the only user in >> short and mid term. We can get the code back if we find any users in >> longer term. > > Yeah, the fact that Juno takes time to reach crit temperature does not > necessarily imply it does not have static power consumption, or that its > static power consumption is negligible. Now, if you want to ignore it, > because it is not the best example to show usefulness of IPA, that is a > different story. > OK, I agree that I want to ignore the usefulness of static power on Juno as no one provides a reliable way to see that and test that regularly. I am open to change my mind if circumstances change.
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c index 55d6b9fb909d..f102ad6127a4 100644 --- a/drivers/thermal/cpu_cooling.c +++ b/drivers/thermal/cpu_cooling.c @@ -88,7 +88,6 @@ struct time_in_idle { * @policy: cpufreq policy. * @node: list_head to link all cpufreq_cooling_device together. * @idle_time: idle time stats - * @plat_get_static_power: callback to calculate the static power * * This structure is required for keeping information of each registered * cpufreq_cooling_device. @@ -104,7 +103,6 @@ struct cpufreq_cooling_device { struct cpufreq_policy *policy; struct list_head node; struct time_in_idle *idle_time; - get_static_t plat_get_static_power; }; static DEFINE_IDA(cpufreq_ida); @@ -318,60 +316,6 @@ static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu, return load; } -/** - * get_static_power() - calculate the static power consumed by the cpus - * @cpufreq_cdev: struct &cpufreq_cooling_device for this cpu cdev - * @tz: thermal zone device in which we're operating - * @freq: frequency in KHz - * @power: pointer in which to store the calculated static power - * - * Calculate the static power consumed by the cpus described by - * @cpu_actor running at frequency @freq. This function relies on a - * platform specific function that should have been provided when the - * actor was registered. If it wasn't, the static power is assumed to - * be negligible. The calculated static power is stored in @power. - * - * Return: 0 on success, -E* on failure. - */ -static int get_static_power(struct cpufreq_cooling_device *cpufreq_cdev, - struct thermal_zone_device *tz, unsigned long freq, - u32 *power) -{ - struct dev_pm_opp *opp; - unsigned long voltage; - struct cpufreq_policy *policy = cpufreq_cdev->policy; - struct cpumask *cpumask = policy->related_cpus; - unsigned long freq_hz = freq * 1000; - struct device *dev; - - if (!cpufreq_cdev->plat_get_static_power) { - *power = 0; - return 0; - } - - dev = get_cpu_device(policy->cpu); - WARN_ON(!dev); - - opp = dev_pm_opp_find_freq_exact(dev, freq_hz, true); - if (IS_ERR(opp)) { - dev_warn_ratelimited(dev, "Failed to find OPP for frequency %lu: %ld\n", - freq_hz, PTR_ERR(opp)); - return -EINVAL; - } - - voltage = dev_pm_opp_get_voltage(opp); - dev_pm_opp_put(opp); - - if (voltage == 0) { - dev_err_ratelimited(dev, "Failed to get voltage for frequency %lu\n", - freq_hz); - return -EINVAL; - } - - return cpufreq_cdev->plat_get_static_power(cpumask, tz->passive_delay, - voltage, power); -} - /** * get_dynamic_power() - calculate the dynamic power * @cpufreq_cdev: &cpufreq_cooling_device for this cdev @@ -491,8 +435,8 @@ static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev, u32 *power) { unsigned long freq; - int i = 0, cpu, ret; - u32 static_power, dynamic_power, total_load = 0; + int i = 0, cpu; + u32 total_load = 0; struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata; struct cpufreq_policy *policy = cpufreq_cdev->policy; u32 *load_cpu = NULL; @@ -522,22 +466,15 @@ static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev, cpufreq_cdev->last_load = total_load; - dynamic_power = get_dynamic_power(cpufreq_cdev, freq); - ret = get_static_power(cpufreq_cdev, tz, freq, &static_power); - if (ret) { - kfree(load_cpu); - return ret; - } + *power = get_dynamic_power(cpufreq_cdev, freq); if (load_cpu) { trace_thermal_power_cpu_get_power(policy->related_cpus, freq, - load_cpu, i, dynamic_power, - static_power); + load_cpu, i, *power); kfree(load_cpu); } - *power = static_power + dynamic_power; return 0; } @@ -561,8 +498,6 @@ static int cpufreq_state2power(struct thermal_cooling_device *cdev, unsigned long state, u32 *power) { unsigned int freq, num_cpus; - u32 static_power, dynamic_power; - int ret; struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata; /* Request state should be less than max_level */ @@ -572,13 +507,9 @@ static int cpufreq_state2power(struct thermal_cooling_device *cdev, num_cpus = cpumask_weight(cpufreq_cdev->policy->cpus); freq = cpufreq_cdev->freq_table[state].frequency; - dynamic_power = cpu_freq_to_power(cpufreq_cdev, freq) * num_cpus; - ret = get_static_power(cpufreq_cdev, tz, freq, &static_power); - if (ret) - return ret; + *power = cpu_freq_to_power(cpufreq_cdev, freq) * num_cpus; - *power = static_power + dynamic_power; - return ret; + return 0; } /** @@ -606,21 +537,14 @@ static int cpufreq_power2state(struct thermal_cooling_device *cdev, unsigned long *state) { unsigned int cur_freq, target_freq; - int ret; - s32 dyn_power; - u32 last_load, normalised_power, static_power; + u32 last_load, normalised_power; struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata; struct cpufreq_policy *policy = cpufreq_cdev->policy; cur_freq = cpufreq_quick_get(policy->cpu); - ret = get_static_power(cpufreq_cdev, tz, cur_freq, &static_power); - if (ret) - return ret; - - dyn_power = power - static_power; - dyn_power = dyn_power > 0 ? dyn_power : 0; + power = power > 0 ? power : 0; last_load = cpufreq_cdev->last_load ?: 1; - normalised_power = (dyn_power * 100) / last_load; + normalised_power = (power * 100) / last_load; target_freq = cpu_power_to_freq(cpufreq_cdev, normalised_power); *state = get_level(cpufreq_cdev, target_freq); @@ -671,8 +595,6 @@ static unsigned int find_next_max(struct cpufreq_frequency_table *table, * @policy: cpufreq policy * Normally this should be same as cpufreq policy->related_cpus. * @capacitance: dynamic power coefficient for these cpus - * @plat_static_func: function to calculate the static power consumed by these - * cpus (optional) * * This interface function registers the cpufreq cooling device with the name * "thermal-cpufreq-%x". This api can support multiple instances of cpufreq @@ -684,8 +606,7 @@ static unsigned int find_next_max(struct cpufreq_frequency_table *table, */ static struct thermal_cooling_device * __cpufreq_cooling_register(struct device_node *np, - struct cpufreq_policy *policy, u32 capacitance, - get_static_t plat_static_func) + struct cpufreq_policy *policy, u32 capacitance) { struct thermal_cooling_device *cdev; struct cpufreq_cooling_device *cpufreq_cdev; @@ -755,8 +676,6 @@ __cpufreq_cooling_register(struct device_node *np, } if (capacitance) { - cpufreq_cdev->plat_get_static_power = plat_static_func; - ret = update_freq_table(cpufreq_cdev, capacitance); if (ret) { cdev = ERR_PTR(ret); @@ -813,7 +732,7 @@ __cpufreq_cooling_register(struct device_node *np, struct thermal_cooling_device * cpufreq_cooling_register(struct cpufreq_policy *policy) { - return __cpufreq_cooling_register(NULL, policy, 0, NULL); + return __cpufreq_cooling_register(NULL, policy, 0); } EXPORT_SYMBOL_GPL(cpufreq_cooling_register); @@ -853,8 +772,7 @@ of_cpufreq_cooling_register(struct cpufreq_policy *policy) of_property_read_u32(np, "dynamic-power-coefficient", &capacitance); - cdev = __cpufreq_cooling_register(np, policy, capacitance, - NULL); + cdev = __cpufreq_cooling_register(np, policy, capacitance); if (IS_ERR(cdev)) { pr_err("cpu_cooling: cpu%d is not running as cooling device: %ld\n", policy->cpu, PTR_ERR(cdev)); diff --git a/include/linux/cpu_cooling.h b/include/linux/cpu_cooling.h index a0204c58d269..22d7364cbc16 100644 --- a/include/linux/cpu_cooling.h +++ b/include/linux/cpu_cooling.h @@ -30,9 +30,6 @@ struct cpufreq_policy; -typedef int (*get_static_t)(cpumask_t *cpumask, int interval, - unsigned long voltage, u32 *power); - #ifdef CONFIG_CPU_THERMAL /** * cpufreq_cooling_register - function to create cpufreq cooling device. diff --git a/include/trace/events/thermal.h b/include/trace/events/thermal.h index 466c09d882ad..52424cf13408 100644 --- a/include/trace/events/thermal.h +++ b/include/trace/events/thermal.h @@ -93,9 +93,9 @@ TRACE_EVENT(thermal_zone_trip, TRACE_EVENT(thermal_power_cpu_get_power, TP_PROTO(const struct cpumask *cpus, unsigned long freq, u32 *load, - size_t load_len, u32 dynamic_power, u32 static_power), + size_t load_len, u32 dynamic_power), - TP_ARGS(cpus, freq, load, load_len, dynamic_power, static_power), + TP_ARGS(cpus, freq, load, load_len, dynamic_power), TP_STRUCT__entry( __bitmask(cpumask, num_possible_cpus()) @@ -103,7 +103,6 @@ TRACE_EVENT(thermal_power_cpu_get_power, __dynamic_array(u32, load, load_len) __field(size_t, load_len ) __field(u32, dynamic_power ) - __field(u32, static_power ) ), TP_fast_assign( @@ -114,13 +113,12 @@ TRACE_EVENT(thermal_power_cpu_get_power, load_len * sizeof(*load)); __entry->load_len = load_len; __entry->dynamic_power = dynamic_power; - __entry->static_power = static_power; ), - TP_printk("cpus=%s freq=%lu load={%s} dynamic_power=%d static_power=%d", + TP_printk("cpus=%s freq=%lu load={%s} dynamic_power=%d", __get_bitmask(cpumask), __entry->freq, __print_array(__get_dynamic_array(load), __entry->load_len, 4), - __entry->dynamic_power, __entry->static_power) + __entry->dynamic_power) ); TRACE_EVENT(thermal_power_cpu_limit,
No one has used it for the last two and half years (since it was introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the power cooling device API")), get rid of it. Cc: Javi Merino <javi.merino@arm.com> Cc: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> --- drivers/thermal/cpu_cooling.c | 106 +++++------------------------------------ include/linux/cpu_cooling.h | 3 -- include/trace/events/thermal.h | 10 ++-- 3 files changed, 16 insertions(+), 103 deletions(-)