Message ID | 20200813113030.1.I89c33c4119eaffb986b1e8c1bc6f0e30267089cd@changeid (mailing list archive) |
---|---|
State | Accepted |
Commit | 5a4d9f3e18f5e2281f50e7f644e1f9a1589d900b |
Headers | show |
Series | arm64: dts: qcom: sc7180: Add 'sustainable_power' for CPU thermal zones | expand |
On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote: > The 'sustainable_power' attribute provides an estimate of the sustained > power that can be dissipated at the desired control temperature. One > could argue that this value is not necessarily the same for all devices > with the same SoC, which may have different form factors or thermal > designs. However there are reasons to specify a (default) value at SoC > level for SC7180: most importantly, if no value is specified at all the > power_allocator thermal governor (aka 'IPA') estimates a value, using the > minimum power of all cooling devices of the zone, which can result in > overly aggressive thermal throttling. For most devices an approximate > conservative value should be more useful than the minimum guesstimate > of power_allocator. Devices that need a different value can overwrite > it in their <device>.dts. Also the thermal zones for SC7180 have a high > level of granularity (essentially one for each function block), which > makes it more likely that the default value just works for many devices. > > The values correspond to 1901 MHz for the big cores, and 1804 MHz for > the small cores. The values were determined by limiting the CPU > frequencies to different max values and launching a bunch of processes > that cause high CPU load ('while true; do true; done &' is simple and > does a good job). A frequency is deemed sustainable if the CPU > temperatures don't rise (consistently) above the second trip point > ('control temperature', 95 degC in this case). Once the highest > sustainable frequency is found, the sustainable power can be calculated > by multiplying the energy consumption per core at this frequency (which > can be found in /sys/kernel/debug/energy_model/) with the number of > cores that are specified as cooling devices. > > The sustainable frequencies were determined at room temperature > on a device without heat sink or other passive cooling elements. > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org> > --- > If maintainers think 'sustainable_power' should be specified at > device level (with which I conceptually agree) I'm fine with > doing that, just seemed it could be useful to have a reasonable > 'default' at SoC level in this case. Any comments on this?
Hi, On Tue, Sep 1, 2020 at 10:07 AM Matthias Kaehlcke <mka@chromium.org> wrote: > > On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote: > > The 'sustainable_power' attribute provides an estimate of the sustained > > power that can be dissipated at the desired control temperature. One > > could argue that this value is not necessarily the same for all devices > > with the same SoC, which may have different form factors or thermal > > designs. However there are reasons to specify a (default) value at SoC > > level for SC7180: most importantly, if no value is specified at all the > > power_allocator thermal governor (aka 'IPA') estimates a value, using the > > minimum power of all cooling devices of the zone, which can result in > > overly aggressive thermal throttling. For most devices an approximate > > conservative value should be more useful than the minimum guesstimate > > of power_allocator. Devices that need a different value can overwrite > > it in their <device>.dts. Also the thermal zones for SC7180 have a high > > level of granularity (essentially one for each function block), which > > makes it more likely that the default value just works for many devices. > > > > The values correspond to 1901 MHz for the big cores, and 1804 MHz for > > the small cores. The values were determined by limiting the CPU > > frequencies to different max values and launching a bunch of processes > > that cause high CPU load ('while true; do true; done &' is simple and > > does a good job). A frequency is deemed sustainable if the CPU > > temperatures don't rise (consistently) above the second trip point > > ('control temperature', 95 degC in this case). Once the highest > > sustainable frequency is found, the sustainable power can be calculated > > by multiplying the energy consumption per core at this frequency (which > > can be found in /sys/kernel/debug/energy_model/) with the number of > > cores that are specified as cooling devices. > > > > The sustainable frequencies were determined at room temperature > > on a device without heat sink or other passive cooling elements. I'm curious: was this a bare board, or a device in a case? Hrm, I'm not sure which one would be worse at heat dissipation, but I would imagine that being inside a plastic case might be worse? > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org> > > --- > > If maintainers think 'sustainable_power' should be specified at > > device level (with which I conceptually agree) I'm fine with > > doing that, just seemed it could be useful to have a reasonable > > 'default' at SoC level in this case. > > Any comments on this? I'm not massively familiar with this area of the code, but I guess I shouldn't let that stop me from having an opinion! :-P * I would agree that it seems highly unlikely that someone would put one of these chips in a device that could only dissipate the heat from the lowest OPP, so having some higher estimate definitely makes sense. * In terms of the numbers here, I believe that you're claiming that we can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. My memory of how much power we could dissipate in previous laptops I worked on is a little fuzzy, but that doesn't seem insane for a passively-cooled laptop. However, I think someone could conceivably put this chip in a smaller form factor. In such a case, it seems like we'd want these things to sum up to ~2000 (if it would ever make sense for someone to put this chip in a phone) or ~4000 (if it would ever make sense for someone to put this chip in a small tablet). It seems possible that, to achieve this, we might have to tweak the "dynamic-power-coefficient". I don't know how much thought was put into those numbers, but the fact that the little cores have a super round 100 for their dynamic-power-coefficient makes me feel like they might have been more schwags than anything. Rajendra maybe knows? * I'm curious about the fact that there are two numbers here: one for littles and one for bigs. If I had to guess I'd say that since all the cores are in one package so the contributions kinda need to be thought of together, right? If we're sitting there thermally throttled then we'd want to pick the best perf-per-watt for the overall package. This is why your patch says we can sustain the little cores at max and the big cores get whatever is left over, right? * Should we be leaving some room in here for the GPU? ...or I guess once we list it as a cooling device we'll have to decrease the amount the CPUs can use? So I guess the tl; dr is: a) We should check "dynamic-power-coefficient" and possibly adjust. b) I don't think the "conservative" by-default numbers should add up to 7 Watts. I could be convinced that this chip is not intended for phones and thus we could have it add up to 4 Watts, but 7 Watts seems too much. -Doug
Hi Doug, On Tue, Sep 01, 2020 at 01:19:10PM -0700, Doug Anderson wrote: > Hi, > > On Tue, Sep 1, 2020 at 10:07 AM Matthias Kaehlcke <mka@chromium.org> wrote: > > > > On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote: > > > The 'sustainable_power' attribute provides an estimate of the sustained > > > power that can be dissipated at the desired control temperature. One > > > could argue that this value is not necessarily the same for all devices > > > with the same SoC, which may have different form factors or thermal > > > designs. However there are reasons to specify a (default) value at SoC > > > level for SC7180: most importantly, if no value is specified at all the > > > power_allocator thermal governor (aka 'IPA') estimates a value, using the > > > minimum power of all cooling devices of the zone, which can result in > > > overly aggressive thermal throttling. For most devices an approximate > > > conservative value should be more useful than the minimum guesstimate > > > of power_allocator. Devices that need a different value can overwrite > > > it in their <device>.dts. Also the thermal zones for SC7180 have a high > > > level of granularity (essentially one for each function block), which > > > makes it more likely that the default value just works for many devices. > > > > > > The values correspond to 1901 MHz for the big cores, and 1804 MHz for > > > the small cores. The values were determined by limiting the CPU > > > frequencies to different max values and launching a bunch of processes > > > that cause high CPU load ('while true; do true; done &' is simple and > > > does a good job). A frequency is deemed sustainable if the CPU > > > temperatures don't rise (consistently) above the second trip point > > > ('control temperature', 95 degC in this case). Once the highest > > > sustainable frequency is found, the sustainable power can be calculated > > > by multiplying the energy consumption per core at this frequency (which > > > can be found in /sys/kernel/debug/energy_model/) with the number of > > > cores that are specified as cooling devices. > > > > > > The sustainable frequencies were determined at room temperature > > > on a device without heat sink or other passive cooling elements. > > I'm curious: was this a bare board, or a device in a case? Hrm, I'm > not sure which one would be worse at heat dissipation, but I would > imagine that being inside a plastic case might be worse? This was with a device in a plastic case. > > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org> > > > --- > > > If maintainers think 'sustainable_power' should be specified at > > > device level (with which I conceptually agree) I'm fine with > > > doing that, just seemed it could be useful to have a reasonable > > > 'default' at SoC level in this case. > > > > Any comments on this? > > I'm not massively familiar with this area of the code, but I guess I > shouldn't let that stop me from having an opinion! :-P > > * I would agree that it seems highly unlikely that someone would put > one of these chips in a device that could only dissipate the heat from > the lowest OPP, so having some higher estimate definitely makes sense. > > * In terms of the numbers here, I believe that you're claiming that we > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. No, I'm claiming it's 768 mW + 1202 mW = ~2 W. SC7180 has a 6 thermal zones for the 6 little cores and 4 zones for the 2 big cores. Each of these thermal zones uses either all little or all big cores as cooling devices, hence the power sustainable power of the individual zones doesn't add up. 768 mW corresponds to 6x 128 mW (aka all little cores at 1.8 GHz), and 1202 mW to 2x 601 mW (both big cores at 1.9 GHz). > My memory > of how much power we could dissipate in previous laptops I worked on > is a little fuzzy, but that doesn't seem insane for a passively-cooled > laptop. However, I think someone could conceivably put this chip in a > smaller form factor. In such a case, it seems like we'd want these > things to sum up to ~2000 (if it would ever make sense for someone to > put this chip in a phone) or ~4000 (if it would ever make sense for > someone to put this chip in a small tablet). See above, the sustainable power with this patch only adds up to ~2000. It is possible though that it would be lower in a smaller form factor device. I'd be ok with posting something lower for SC7180 (it would be a guess though) and use the specific numbers in the device specific DT. > It seems possible that, > to achieve this, we might have to tweak the > "dynamic-power-coefficient". I don't know how much thought was put > into those numbers, but the fact that the little cores have a super > round 100 for their dynamic-power-coefficient makes me feel like they > might have been more schwags than anything. Rajendra maybe knows? Yeah, it's possible that that was just an approximation > * I'm curious about the fact that there are two numbers here: one for > littles and one for bigs. If I had to guess I'd say that since all > the cores are in one package so the contributions kinda need to be > thought of together, right? If we're sitting there thermally > throttled then we'd want to pick the best perf-per-watt for the > overall package. This is why your patch says we can sustain the > little cores at max and the big cores get whatever is left over, > right? It's derived from how Qualcomm specified the thermal zones and cooling devices. Any ("cpu") zone is either cooled by (all) big cores or by (all) little cores, but not a mix of them. In my tests I also saw that the big cores seemed to have little impact on the little ones. The little cores are at max because even running at max frequency the temperature in the 'little zones' wouldn't come close to the trip point. > * Should we be leaving some room in here for the GPU? ...or I guess > once we list it as a cooling device we'll have to decrease the amount > the CPUs can use? I don't know for sure, but judging from the CPU zones I wouldn't be surprised if the GPU was managed exclusively in the dedicated GPU thermal zones (I guess that's what 'gpuss0-thermal' and 'gpuss1-thermal' are). If that's not the case the values in the CPU zones can be adjusted when specific data is available. > So I guess the tl; dr is: > > a) We should check "dynamic-power-coefficient" and possibly adjust. ok, lets see if Rajendra can check if there is room for tweaking. > b) I don't think the "conservative" by-default numbers should add up > to 7 Watts. I could be convinced that this chip is not intended for > phones and thus we could have it add up to 4 Watts, but 7 Watts seems > too much. I suppose this is mostly addressed by my explications above, unless we think that 2 Watts in CPU power might still be too aggressive as a default. Thanks m.
Hi, On Tue, Sep 1, 2020 at 2:33 PM Matthias Kaehlcke <mka@chromium.org> wrote: > > Hi Doug, > > On Tue, Sep 01, 2020 at 01:19:10PM -0700, Doug Anderson wrote: > > Hi, > > > > On Tue, Sep 1, 2020 at 10:07 AM Matthias Kaehlcke <mka@chromium.org> wrote: > > > > > > On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote: > > > > The 'sustainable_power' attribute provides an estimate of the sustained > > > > power that can be dissipated at the desired control temperature. One > > > > could argue that this value is not necessarily the same for all devices > > > > with the same SoC, which may have different form factors or thermal > > > > designs. However there are reasons to specify a (default) value at SoC > > > > level for SC7180: most importantly, if no value is specified at all the > > > > power_allocator thermal governor (aka 'IPA') estimates a value, using the > > > > minimum power of all cooling devices of the zone, which can result in > > > > overly aggressive thermal throttling. For most devices an approximate > > > > conservative value should be more useful than the minimum guesstimate > > > > of power_allocator. Devices that need a different value can overwrite > > > > it in their <device>.dts. Also the thermal zones for SC7180 have a high > > > > level of granularity (essentially one for each function block), which > > > > makes it more likely that the default value just works for many devices. > > > > > > > > The values correspond to 1901 MHz for the big cores, and 1804 MHz for > > > > the small cores. The values were determined by limiting the CPU > > > > frequencies to different max values and launching a bunch of processes > > > > that cause high CPU load ('while true; do true; done &' is simple and > > > > does a good job). A frequency is deemed sustainable if the CPU > > > > temperatures don't rise (consistently) above the second trip point > > > > ('control temperature', 95 degC in this case). Once the highest > > > > sustainable frequency is found, the sustainable power can be calculated > > > > by multiplying the energy consumption per core at this frequency (which > > > > can be found in /sys/kernel/debug/energy_model/) with the number of > > > > cores that are specified as cooling devices. > > > > > > > > The sustainable frequencies were determined at room temperature > > > > on a device without heat sink or other passive cooling elements. > > > > I'm curious: was this a bare board, or a device in a case? Hrm, I'm > > not sure which one would be worse at heat dissipation, but I would > > imagine that being inside a plastic case might be worse? > > This was with a device in a plastic case. > > > > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org> > > > > --- > > > > If maintainers think 'sustainable_power' should be specified at > > > > device level (with which I conceptually agree) I'm fine with > > > > doing that, just seemed it could be useful to have a reasonable > > > > 'default' at SoC level in this case. > > > > > > Any comments on this? > > > > I'm not massively familiar with this area of the code, but I guess I > > shouldn't let that stop me from having an opinion! :-P > > > > * I would agree that it seems highly unlikely that someone would put > > one of these chips in a device that could only dissipate the heat from > > the lowest OPP, so having some higher estimate definitely makes sense. > > > > * In terms of the numbers here, I believe that you're claiming that we > > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. > > No, I'm claiming it's 768 mW + 1202 mW = ~2 W. > > SC7180 has a 6 thermal zones for the 6 little cores and 4 zones for the > 2 big cores. Each of these thermal zones uses either all little or all big > cores as cooling devices, hence the power sustainable power of the > individual zones doesn't add up. 768 mW corresponds to 6x 128 mW (aka all > little cores at 1.8 GHz), and 1202 mW to 2x 601 mW (both big cores at 1.9 GHz). Ah! Thanks for explaining. > > My memory > > of how much power we could dissipate in previous laptops I worked on > > is a little fuzzy, but that doesn't seem insane for a passively-cooled > > laptop. However, I think someone could conceivably put this chip in a > > smaller form factor. In such a case, it seems like we'd want these > > things to sum up to ~2000 (if it would ever make sense for someone to > > put this chip in a phone) or ~4000 (if it would ever make sense for > > someone to put this chip in a small tablet). > > See above, the sustainable power with this patch only adds up to ~2000. > It is possible though that it would be lower in a smaller form factor > device. > > I'd be ok with posting something lower for SC7180 (it would be a guess > though) and use the specific numbers in the device specific DT. Given the advice in the bindings it seems like 2W should be fine. > > It seems possible that, > > to achieve this, we might have to tweak the > > "dynamic-power-coefficient". I don't know how much thought was put > > into those numbers, but the fact that the little cores have a super > > round 100 for their dynamic-power-coefficient makes me feel like they > > might have been more schwags than anything. Rajendra maybe knows? > > Yeah, it's possible that that was just an approximation > > > * I'm curious about the fact that there are two numbers here: one for > > littles and one for bigs. If I had to guess I'd say that since all > > the cores are in one package so the contributions kinda need to be > > thought of together, right? If we're sitting there thermally > > throttled then we'd want to pick the best perf-per-watt for the > > overall package. This is why your patch says we can sustain the > > little cores at max and the big cores get whatever is left over, > > right? > > It's derived from how Qualcomm specified the thermal zones and cooling > devices. Any ("cpu") zone is either cooled by (all) big cores or by (all) > little cores, but not a mix of them. In my tests I also saw that the big > cores seemed to have little impact on the little ones. The little cores > are at max because even running at max frequency the temperature in the > 'little zones' wouldn't come close to the trip point. OK, crazy. I suppose that this makes sense,especially without a heatsink and over a short burst of time. I'd imagine that with a heatsink things might look different, but trying to model everything is impossible and seems like what't there works OK until someone can say why it doesn't. :-) > > * Should we be leaving some room in here for the GPU? ...or I guess > > once we list it as a cooling device we'll have to decrease the amount > > the CPUs can use? > > I don't know for sure, but judging from the CPU zones I wouldn't be > surprised if the GPU was managed exclusively in the dedicated GPU > thermal zones (I guess that's what 'gpuss0-thermal' and 'gpuss1-thermal' > are). If that's not the case the values in the CPU zones can be > adjusted when specific data is available. Sounds good. > > So I guess the tl; dr is: > > > > a) We should check "dynamic-power-coefficient" and possibly adjust. > > ok, lets see if Rajendra can check if there is room for tweaking. > > > b) I don't think the "conservative" by-default numbers should add up > > to 7 Watts. I could be convinced that this chip is not intended for > > phones and thus we could have it add up to 4 Watts, but 7 Watts seems > > too much. > > I suppose this is mostly addressed by my explications above, unless we > think that 2 Watts in CPU power might still be too aggressive as a > default. With all your explanations, I'm happy to add: Reviewed-by: Douglas Anderson <dianders@chromium.org> -Doug
On Tue, Sep 01, 2020 at 03:45:52PM -0700, Doug Anderson wrote: > Hi, > > On Tue, Sep 1, 2020 at 2:33 PM Matthias Kaehlcke <mka@chromium.org> wrote: > > > > Hi Doug, > > > > On Tue, Sep 01, 2020 at 01:19:10PM -0700, Doug Anderson wrote: > > > Hi, > > > > > > On Tue, Sep 1, 2020 at 10:07 AM Matthias Kaehlcke <mka@chromium.org> wrote: > > > > > > > > On Thu, Aug 13, 2020 at 11:30:33AM -0700, Matthias Kaehlcke wrote: > > > > > The 'sustainable_power' attribute provides an estimate of the sustained > > > > > power that can be dissipated at the desired control temperature. One > > > > > could argue that this value is not necessarily the same for all devices > > > > > with the same SoC, which may have different form factors or thermal > > > > > designs. However there are reasons to specify a (default) value at SoC > > > > > level for SC7180: most importantly, if no value is specified at all the > > > > > power_allocator thermal governor (aka 'IPA') estimates a value, using the > > > > > minimum power of all cooling devices of the zone, which can result in > > > > > overly aggressive thermal throttling. For most devices an approximate > > > > > conservative value should be more useful than the minimum guesstimate > > > > > of power_allocator. Devices that need a different value can overwrite > > > > > it in their <device>.dts. Also the thermal zones for SC7180 have a high > > > > > level of granularity (essentially one for each function block), which > > > > > makes it more likely that the default value just works for many devices. > > > > > > > > > > The values correspond to 1901 MHz for the big cores, and 1804 MHz for > > > > > the small cores. The values were determined by limiting the CPU > > > > > frequencies to different max values and launching a bunch of processes > > > > > that cause high CPU load ('while true; do true; done &' is simple and > > > > > does a good job). A frequency is deemed sustainable if the CPU > > > > > temperatures don't rise (consistently) above the second trip point > > > > > ('control temperature', 95 degC in this case). Once the highest > > > > > sustainable frequency is found, the sustainable power can be calculated > > > > > by multiplying the energy consumption per core at this frequency (which > > > > > can be found in /sys/kernel/debug/energy_model/) with the number of > > > > > cores that are specified as cooling devices. > > > > > > > > > > The sustainable frequencies were determined at room temperature > > > > > on a device without heat sink or other passive cooling elements. > > > > > > I'm curious: was this a bare board, or a device in a case? Hrm, I'm > > > not sure which one would be worse at heat dissipation, but I would > > > imagine that being inside a plastic case might be worse? > > > > This was with a device in a plastic case. > > > > > > > Signed-off-by: Matthias Kaehlcke <mka@chromium.org> > > > > > --- > > > > > If maintainers think 'sustainable_power' should be specified at > > > > > device level (with which I conceptually agree) I'm fine with > > > > > doing that, just seemed it could be useful to have a reasonable > > > > > 'default' at SoC level in this case. > > > > > > > > Any comments on this? > > > > > > I'm not massively familiar with this area of the code, but I guess I > > > shouldn't let that stop me from having an opinion! :-P > > > > > > * I would agree that it seems highly unlikely that someone would put > > > one of these chips in a device that could only dissipate the heat from > > > the lowest OPP, so having some higher estimate definitely makes sense. > > > > > > * In terms of the numbers here, I believe that you're claiming that we > > > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. > > > > No, I'm claiming it's 768 mW + 1202 mW = ~2 W. > > > > SC7180 has a 6 thermal zones for the 6 little cores and 4 zones for the > > 2 big cores. Each of these thermal zones uses either all little or all big > > cores as cooling devices, hence the power sustainable power of the > > individual zones doesn't add up. 768 mW corresponds to 6x 128 mW (aka all > > little cores at 1.8 GHz), and 1202 mW to 2x 601 mW (both big cores at 1.9 GHz). > > Ah! Thanks for explaining. > > > > > My memory > > > of how much power we could dissipate in previous laptops I worked on > > > is a little fuzzy, but that doesn't seem insane for a passively-cooled > > > laptop. However, I think someone could conceivably put this chip in a > > > smaller form factor. In such a case, it seems like we'd want these > > > things to sum up to ~2000 (if it would ever make sense for someone to > > > put this chip in a phone) or ~4000 (if it would ever make sense for > > > someone to put this chip in a small tablet). > > > > See above, the sustainable power with this patch only adds up to ~2000. > > It is possible though that it would be lower in a smaller form factor > > device. > > > > I'd be ok with posting something lower for SC7180 (it would be a guess > > though) and use the specific numbers in the device specific DT. > > Given the advice in the bindings it seems like 2W should be fine. > > > > > It seems possible that, > > > to achieve this, we might have to tweak the > > > "dynamic-power-coefficient". I don't know how much thought was put > > > into those numbers, but the fact that the little cores have a super > > > round 100 for their dynamic-power-coefficient makes me feel like they > > > might have been more schwags than anything. Rajendra maybe knows? > > > > Yeah, it's possible that that was just an approximation > > > > > * I'm curious about the fact that there are two numbers here: one for > > > littles and one for bigs. If I had to guess I'd say that since all > > > the cores are in one package so the contributions kinda need to be > > > thought of together, right? If we're sitting there thermally > > > throttled then we'd want to pick the best perf-per-watt for the > > > overall package. This is why your patch says we can sustain the > > > little cores at max and the big cores get whatever is left over, > > > right? > > > > It's derived from how Qualcomm specified the thermal zones and cooling > > devices. Any ("cpu") zone is either cooled by (all) big cores or by (all) > > little cores, but not a mix of them. In my tests I also saw that the big > > cores seemed to have little impact on the little ones. The little cores > > are at max because even running at max frequency the temperature in the > > 'little zones' wouldn't come close to the trip point. > > OK, crazy. I suppose that this makes sense,especially without a > heatsink and over a short burst of time. I'd imagine that with a > heatsink things might look different, but trying to model everything > is impossible and seems like what't there works OK until someone can > say why it doesn't. :-) Exactly, they will likely look different, but since we want conservative numbers for the generic case it's ok/good to use numbers from a configuration without heatsink. And 1.9 GHz doesn't even seem horribly slow for a device with a heatsink. When we have such a device we can evaluate whether there are significant gains from tweaking the values for the big cores in the device specific DT. > > > * Should we be leaving some room in here for the GPU? ...or I guess > > > once we list it as a cooling device we'll have to decrease the amount > > > the CPUs can use? > > > > I don't know for sure, but judging from the CPU zones I wouldn't be > > surprised if the GPU was managed exclusively in the dedicated GPU > > thermal zones (I guess that's what 'gpuss0-thermal' and 'gpuss1-thermal' > > are). If that's not the case the values in the CPU zones can be > > adjusted when specific data is available. > > Sounds good. > > > > > So I guess the tl; dr is: > > > > > > a) We should check "dynamic-power-coefficient" and possibly adjust. > > > > ok, lets see if Rajendra can check if there is room for tweaking. > > > > > b) I don't think the "conservative" by-default numbers should add up > > > to 7 Watts. I could be convinced that this chip is not intended for > > > phones and thus we could have it add up to 4 Watts, but 7 Watts seems > > > too much. > > > > I suppose this is mostly addressed by my explications above, unless we > > think that 2 Watts in CPU power might still be too aggressive as a > > default. > > With all your explanations, I'm happy to add: > > Reviewed-by: Douglas Anderson <dianders@chromium.org> Thanks!
> * In terms of the numbers here, I believe that you're claiming that we > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. My memory > of how much power we could dissipate in previous laptops I worked on > is a little fuzzy, but that doesn't seem insane for a passively-cooled > laptop. However, I think someone could conceivably put this chip in a > smaller form factor. In such a case, it seems like we'd want these > things to sum up to ~2000 (if it would ever make sense for someone to > put this chip in a phone) or ~4000 (if it would ever make sense for > someone to put this chip in a small tablet). It seems possible that, > to achieve this, we might have to tweak the > "dynamic-power-coefficient". DPC values are calculated (at a SoC) by actually measuring max power at various frequency/voltage combinations by running things like dhrystone. How would the max power a SoC can generate depend on form factors? How much it can dissipate sure is, but then I am not super familiar how thermal frameworks end up using DPC for calculating power dissipated, I am guessing they don't. > I don't know how much thought was put > into those numbers, but the fact that the little cores have a super > round 100 for their dynamic-power-coefficient makes me feel like they > might have been more schwags than anything. Rajendra maybe knows? FWIK, the values are always scaled and normalized to 100 for silver and then used to derive the relative DPC number for gold. If you see the DPC for silver cores even on sdm845 is a 100. Again these are not estimations but based on actual power measurements.
>> I'm not massively familiar with this area of the code, but I guess I >> shouldn't let that stop me from having an opinion! :-P >> >> * I would agree that it seems highly unlikely that someone would put >> one of these chips in a device that could only dissipate the heat from >> the lowest OPP, so having some higher estimate definitely makes sense. >> >> * In terms of the numbers here, I believe that you're claiming that we >> can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. > > No, I'm claiming it's 768 mW + 1202 mW = ~2 W. > > SC7180 has a 6 thermal zones for the 6 little cores and 4 zones for the > 2 big cores. Each of these thermal zones uses either all little or all big > cores as cooling devices, hence the power sustainable power of the > individual zones doesn't add up. 768 mW corresponds to 6x 128 mW (aka all > little cores at 1.8 GHz), and 1202 mW to 2x 601 mW (both big cores at 1.9 GHz). > >> My memory >> of how much power we could dissipate in previous laptops I worked on >> is a little fuzzy, but that doesn't seem insane for a passively-cooled >> laptop. However, I think someone could conceivably put this chip in a >> smaller form factor. In such a case, it seems like we'd want these >> things to sum up to ~2000 (if it would ever make sense for someone to >> put this chip in a phone) or ~4000 (if it would ever make sense for >> someone to put this chip in a small tablet). > > See above, the sustainable power with this patch only adds up to ~2000. > It is possible though that it would be lower in a smaller form factor > device. > > I'd be ok with posting something lower for SC7180 (it would be a guess > though) and use the specific numbers in the device specific DT. > >> It seems possible that, >> to achieve this, we might have to tweak the >> "dynamic-power-coefficient". I don't know how much thought was put >> into those numbers, but the fact that the little cores have a super >> round 100 for their dynamic-power-coefficient makes me feel like they >> might have been more schwags than anything. Rajendra maybe knows? > > Yeah, it's possible that that was just an approximation No, these are based on actual power measurements. > >> * I'm curious about the fact that there are two numbers here: one for >> littles and one for bigs. If I had to guess I'd say that since all >> the cores are in one package so the contributions kinda need to be >> thought of together, right? If we're sitting there thermally >> throttled then we'd want to pick the best perf-per-watt for the >> overall package. This is why your patch says we can sustain the >> little cores at max and the big cores get whatever is left over, >> right? > > It's derived from how Qualcomm specified the thermal zones and cooling > devices. Any ("cpu") zone is either cooled by (all) big cores or by (all) > little cores, but not a mix of them. In my tests I also saw that the big > cores seemed to have little impact on the little ones. The little cores > are at max because even running at max frequency the temperature in the > 'little zones' wouldn't come close to the trip point. > >> * Should we be leaving some room in here for the GPU? ...or I guess >> once we list it as a cooling device we'll have to decrease the amount >> the CPUs can use? > > I don't know for sure, but judging from the CPU zones I wouldn't be > surprised if the GPU was managed exclusively in the dedicated GPU > thermal zones (I guess that's what 'gpuss0-thermal' and 'gpuss1-thermal' > are). If that's not the case the values in the CPU zones can be > adjusted when specific data is available. > >> So I guess the tl; dr is: >> >> a) We should check "dynamic-power-coefficient" and possibly adjust. > > ok, lets see if Rajendra can check if there is room for tweaking. I suggest we don't :)
Hi, On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak <rnayak@codeaurora.org> wrote: > > > > * In terms of the numbers here, I believe that you're claiming that we > > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. My memory > > of how much power we could dissipate in previous laptops I worked on > > is a little fuzzy, but that doesn't seem insane for a passively-cooled > > laptop. However, I think someone could conceivably put this chip in a > > smaller form factor. In such a case, it seems like we'd want these > > things to sum up to ~2000 (if it would ever make sense for someone to > > put this chip in a phone) or ~4000 (if it would ever make sense for > > someone to put this chip in a small tablet). It seems possible that, > > to achieve this, we might have to tweak the > > "dynamic-power-coefficient". > > DPC values are calculated (at a SoC) by actually measuring max power at various > frequency/voltage combinations by running things like dhrystone. > How would the max power a SoC can generate depend on form factors? > How much it can dissipate sure is, but then I am not super familiar how > thermal frameworks end up using DPC for calculating power dissipated, > I am guessing they don't. > > > I don't know how much thought was put > > into those numbers, but the fact that the little cores have a super > > round 100 for their dynamic-power-coefficient makes me feel like they > > might have been more schwags than anything. Rajendra maybe knows? > > FWIK, the values are always scaled and normalized to 100 for silver and > then used to derive the relative DPC number for gold. If you see the DPC > for silver cores even on sdm845 is a 100. > Again these are not estimations but based on actual power measurements. The scaling to 100 doesn't seem to match how the thermal framework is using them. Take a look at of_cpufreq_cooling_register(). It takes the "dynamic-power-coefficient" and passes it as "capacitance" into __cpufreq_cooling_register(). That's eventually used to compute power, which is documented in the code to be in mW. power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv; do_div(power, 1000000000); /* power is stored in mW */ freq_table[i].power = power; That's used together with "sustainable-power", which is the attribute that Matthias is trying to set. That value is documented to be in mW as well. ...so if the silver cores are always scaled to 100 regardless of how much power they actually draw then it'll be impossible to actually think about "sustainable-power" as a mW value. Presumably we either need to accept that fact (and ideally document it) or we need to change the values for silver / gold cores (we could still keep the relative values the same and just scale them). -Doug
On 9/2/2020 9:02 PM, Doug Anderson wrote: > Hi, > > On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak <rnayak@codeaurora.org> wrote: >> >> >>> * In terms of the numbers here, I believe that you're claiming that we >>> can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. My memory >>> of how much power we could dissipate in previous laptops I worked on >>> is a little fuzzy, but that doesn't seem insane for a passively-cooled >>> laptop. However, I think someone could conceivably put this chip in a >>> smaller form factor. In such a case, it seems like we'd want these >>> things to sum up to ~2000 (if it would ever make sense for someone to >>> put this chip in a phone) or ~4000 (if it would ever make sense for >>> someone to put this chip in a small tablet). It seems possible that, >>> to achieve this, we might have to tweak the >>> "dynamic-power-coefficient". >> >> DPC values are calculated (at a SoC) by actually measuring max power at various >> frequency/voltage combinations by running things like dhrystone. >> How would the max power a SoC can generate depend on form factors? >> How much it can dissipate sure is, but then I am not super familiar how >> thermal frameworks end up using DPC for calculating power dissipated, >> I am guessing they don't. >> >>> I don't know how much thought was put >>> into those numbers, but the fact that the little cores have a super >>> round 100 for their dynamic-power-coefficient makes me feel like they >>> might have been more schwags than anything. Rajendra maybe knows? >> >> FWIK, the values are always scaled and normalized to 100 for silver and >> then used to derive the relative DPC number for gold. If you see the DPC >> for silver cores even on sdm845 is a 100. >> Again these are not estimations but based on actual power measurements. > > The scaling to 100 doesn't seem to match how the thermal framework is > using them. Take a look at of_cpufreq_cooling_register(). It takes > the "dynamic-power-coefficient" and passes it as "capacitance" into > __cpufreq_cooling_register(). That's eventually used to compute > power, which is documented in the code to be in mW. > > power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv; > do_div(power, 1000000000); > > /* power is stored in mW */ > freq_table[i].power = power; > > That's used together with "sustainable-power", which is the attribute > that Matthias is trying to set. That value is documented to be in mW > as well. > > ...so if the silver cores are always scaled to 100 regardless of how > much power they actually draw then it'll be impossible to actually > think about "sustainable-power" as a mW value. Presumably we either > need to accept that fact (and ideally document it) or we need to > change the values for silver / gold cores (we could still keep the > relative values the same and just scale them). That sounds reasonable (still keep the relative values and scale them) I'll get back on what those scaled numbers would look like, and try to get some sense of why this scaling to 100 was done (like you said I don't see any documentation on this), but I see atleast a few other non-qcom SoCs doing this too in mainline (like rockchip/rk3399)
On 9/3/2020 10:14 AM, Rajendra Nayak wrote: > > On 9/2/2020 9:02 PM, Doug Anderson wrote: >> Hi, >> >> On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak <rnayak@codeaurora.org> wrote: >>> >>> >>>> * In terms of the numbers here, I believe that you're claiming that we >>>> can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. My memory >>>> of how much power we could dissipate in previous laptops I worked on >>>> is a little fuzzy, but that doesn't seem insane for a passively-cooled >>>> laptop. However, I think someone could conceivably put this chip in a >>>> smaller form factor. In such a case, it seems like we'd want these >>>> things to sum up to ~2000 (if it would ever make sense for someone to >>>> put this chip in a phone) or ~4000 (if it would ever make sense for >>>> someone to put this chip in a small tablet). It seems possible that, >>>> to achieve this, we might have to tweak the >>>> "dynamic-power-coefficient". >>> >>> DPC values are calculated (at a SoC) by actually measuring max power at various >>> frequency/voltage combinations by running things like dhrystone. >>> How would the max power a SoC can generate depend on form factors? >>> How much it can dissipate sure is, but then I am not super familiar how >>> thermal frameworks end up using DPC for calculating power dissipated, >>> I am guessing they don't. >>> >>>> I don't know how much thought was put >>>> into those numbers, but the fact that the little cores have a super >>>> round 100 for their dynamic-power-coefficient makes me feel like they >>>> might have been more schwags than anything. Rajendra maybe knows? >>> >>> FWIK, the values are always scaled and normalized to 100 for silver and >>> then used to derive the relative DPC number for gold. If you see the DPC >>> for silver cores even on sdm845 is a 100. >>> Again these are not estimations but based on actual power measurements. >> >> The scaling to 100 doesn't seem to match how the thermal framework is >> using them. Take a look at of_cpufreq_cooling_register(). It takes >> the "dynamic-power-coefficient" and passes it as "capacitance" into >> __cpufreq_cooling_register(). That's eventually used to compute >> power, which is documented in the code to be in mW. >> >> power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv; >> do_div(power, 1000000000); >> >> /* power is stored in mW */ >> freq_table[i].power = power; >> >> That's used together with "sustainable-power", which is the attribute >> that Matthias is trying to set. That value is documented to be in mW >> as well. >> >> ...so if the silver cores are always scaled to 100 regardless of how >> much power they actually draw then it'll be impossible to actually >> think about "sustainable-power" as a mW value. Presumably we either >> need to accept that fact (and ideally document it) or we need to >> change the values for silver / gold cores (we could still keep the >> relative values the same and just scale them). > > That sounds reasonable (still keep the relative values and scale them) > I'll get back on what those scaled numbers would look like, and try to > get some sense of why this scaling to 100 was done (like you said > I don't see any documentation on this), but I see atleast a few other non-qcom > SoCs doing this too in mainline (like rockchip/rk3399) On second thoughts, why wouldn't a relative 'sustainable-power' value work? On every device, one would need to do the exercise that Matthias did to come up with the OPP at which we can sustain max CPU/GPU loads anyway. I mean even if we do change the DPC values to match actual power, Matthias would still observe that we can sustain at the very same OPP and not any different. Its just that the mW values that are passed to kernel are relative and not absolute. My worry is that perhaps no SoC vendor wants to put these absolute numbers out.
Hi Rajendra, On Thu, Sep 03, 2020 at 11:00:52AM +0530, Rajendra Nayak wrote: > > On 9/3/2020 10:14 AM, Rajendra Nayak wrote: > > > > On 9/2/2020 9:02 PM, Doug Anderson wrote: > > > Hi, > > > > > > On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak <rnayak@codeaurora.org> wrote: > > > > > > > > > > > > > * In terms of the numbers here, I believe that you're claiming that we > > > > > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. My memory > > > > > of how much power we could dissipate in previous laptops I worked on > > > > > is a little fuzzy, but that doesn't seem insane for a passively-cooled > > > > > laptop. However, I think someone could conceivably put this chip in a > > > > > smaller form factor. In such a case, it seems like we'd want these > > > > > things to sum up to ~2000 (if it would ever make sense for someone to > > > > > put this chip in a phone) or ~4000 (if it would ever make sense for > > > > > someone to put this chip in a small tablet). It seems possible that, > > > > > to achieve this, we might have to tweak the > > > > > "dynamic-power-coefficient". > > > > > > > > DPC values are calculated (at a SoC) by actually measuring max power at various > > > > frequency/voltage combinations by running things like dhrystone. > > > > How would the max power a SoC can generate depend on form factors? > > > > How much it can dissipate sure is, but then I am not super familiar how > > > > thermal frameworks end up using DPC for calculating power dissipated, > > > > I am guessing they don't. > > > > > > > > > I don't know how much thought was put > > > > > into those numbers, but the fact that the little cores have a super > > > > > round 100 for their dynamic-power-coefficient makes me feel like they > > > > > might have been more schwags than anything. Rajendra maybe knows? > > > > > > > > FWIK, the values are always scaled and normalized to 100 for silver and > > > > then used to derive the relative DPC number for gold. If you see the DPC > > > > for silver cores even on sdm845 is a 100. > > > > Again these are not estimations but based on actual power measurements. > > > > > > The scaling to 100 doesn't seem to match how the thermal framework is > > > using them. Take a look at of_cpufreq_cooling_register(). It takes > > > the "dynamic-power-coefficient" and passes it as "capacitance" into > > > __cpufreq_cooling_register(). That's eventually used to compute > > > power, which is documented in the code to be in mW. > > > > > > power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv; > > > do_div(power, 1000000000); > > > > > > /* power is stored in mW */ > > > freq_table[i].power = power; > > > > > > That's used together with "sustainable-power", which is the attribute > > > that Matthias is trying to set. That value is documented to be in mW > > > as well. > > > > > > ...so if the silver cores are always scaled to 100 regardless of how > > > much power they actually draw then it'll be impossible to actually > > > think about "sustainable-power" as a mW value. Presumably we either > > > need to accept that fact (and ideally document it) or we need to > > > change the values for silver / gold cores (we could still keep the > > > relative values the same and just scale them). > > > > That sounds reasonable (still keep the relative values and scale them) > > I'll get back on what those scaled numbers would look like, and try to > > get some sense of why this scaling to 100 was done (like you said > > I don't see any documentation on this), but I see atleast a few other non-qcom > > SoCs doing this too in mainline (like rockchip/rk3399) > > On second thoughts, why wouldn't a relative 'sustainable-power' value work? > On every device, one would need to do the exercise that Matthias did to come > up with the OPP at which we can sustain max CPU/GPU loads anyway. You assume that a thermal zone only has cooling devices of a the same type (or with the same fake unit for power consumption). This falls apart when multiple types are used, which is common. Also sustainable power is only a derived value, the lying already starts in the energy model, which is used by EAS, so a fake unit could cause further problems. > I mean even if we do change the DPC values to match actual power, Matthias would > still observe that we can sustain at the very same OPP and not any different. > Its just that the mW values that are passed to kernel are relative and not > absolute. My worry is that perhaps no SoC vendor wants to put these absolute numbers > out. This is pretty much 'security' by obscurity. It would be relatively easy to measure actual power consumption at different CPU speeds and derive the DPC values from that.
Hi, On Thu, Sep 3, 2020 at 5:17 AM Matthias Kaehlcke <mka@chromium.org> wrote: > > Hi Rajendra, > > On Thu, Sep 03, 2020 at 11:00:52AM +0530, Rajendra Nayak wrote: > > > > On 9/3/2020 10:14 AM, Rajendra Nayak wrote: > > > > > > On 9/2/2020 9:02 PM, Doug Anderson wrote: > > > > Hi, > > > > > > > > On Tue, Sep 1, 2020 at 10:36 PM Rajendra Nayak <rnayak@codeaurora.org> wrote: > > > > > > > > > > > > > > > > * In terms of the numbers here, I believe that you're claiming that we > > > > > > can dissipate 768 mW * 6 + 1202 mW * 2 = ~7 Watts of power. My memory > > > > > > of how much power we could dissipate in previous laptops I worked on > > > > > > is a little fuzzy, but that doesn't seem insane for a passively-cooled > > > > > > laptop. However, I think someone could conceivably put this chip in a > > > > > > smaller form factor. In such a case, it seems like we'd want these > > > > > > things to sum up to ~2000 (if it would ever make sense for someone to > > > > > > put this chip in a phone) or ~4000 (if it would ever make sense for > > > > > > someone to put this chip in a small tablet). It seems possible that, > > > > > > to achieve this, we might have to tweak the > > > > > > "dynamic-power-coefficient". > > > > > > > > > > DPC values are calculated (at a SoC) by actually measuring max power at various > > > > > frequency/voltage combinations by running things like dhrystone. > > > > > How would the max power a SoC can generate depend on form factors? > > > > > How much it can dissipate sure is, but then I am not super familiar how > > > > > thermal frameworks end up using DPC for calculating power dissipated, > > > > > I am guessing they don't. > > > > > > > > > > > I don't know how much thought was put > > > > > > into those numbers, but the fact that the little cores have a super > > > > > > round 100 for their dynamic-power-coefficient makes me feel like they > > > > > > might have been more schwags than anything. Rajendra maybe knows? > > > > > > > > > > FWIK, the values are always scaled and normalized to 100 for silver and > > > > > then used to derive the relative DPC number for gold. If you see the DPC > > > > > for silver cores even on sdm845 is a 100. > > > > > Again these are not estimations but based on actual power measurements. > > > > > > > > The scaling to 100 doesn't seem to match how the thermal framework is > > > > using them. Take a look at of_cpufreq_cooling_register(). It takes > > > > the "dynamic-power-coefficient" and passes it as "capacitance" into > > > > __cpufreq_cooling_register(). That's eventually used to compute > > > > power, which is documented in the code to be in mW. > > > > > > > > power = (u64)capacitance * freq_mhz * voltage_mv * voltage_mv; > > > > do_div(power, 1000000000); > > > > > > > > /* power is stored in mW */ > > > > freq_table[i].power = power; > > > > > > > > That's used together with "sustainable-power", which is the attribute > > > > that Matthias is trying to set. That value is documented to be in mW > > > > as well. > > > > > > > > ...so if the silver cores are always scaled to 100 regardless of how > > > > much power they actually draw then it'll be impossible to actually > > > > think about "sustainable-power" as a mW value. Presumably we either > > > > need to accept that fact (and ideally document it) or we need to > > > > change the values for silver / gold cores (we could still keep the > > > > relative values the same and just scale them). > > > > > > That sounds reasonable (still keep the relative values and scale them) > > > I'll get back on what those scaled numbers would look like, and try to > > > get some sense of why this scaling to 100 was done (like you said > > > I don't see any documentation on this), but I see atleast a few other non-qcom > > > SoCs doing this too in mainline (like rockchip/rk3399) I don't think I was too closely involved in these numbers on rk3399, but as far as I can tell the 100 number came from: https://crrev.com/c/364003 ...interestingly enough the number _wasn't_ scaled to 100 (but was a number close to 100) and then was changed to scale to 100. That makes it seem like 100, though awfully round, was at least based loosely on fact for rk3399. In any case, the devicetree bindings make it pretty clear that this value should be based in reality and not some bogus number. > > On second thoughts, why wouldn't a relative 'sustainable-power' value work? > > On every device, one would need to do the exercise that Matthias did to come > > up with the OPP at which we can sustain max CPU/GPU loads anyway. > > You assume that a thermal zone only has cooling devices of a the same type (or > with the same fake unit for power consumption). This falls apart when multiple > types are used, which is common. > > Also sustainable power is only a derived value, the lying already starts in > the energy model, which is used by EAS, so a fake unit could cause further > problems. > > > I mean even if we do change the DPC values to match actual power, Matthias would > > still observe that we can sustain at the very same OPP and not any different. > > Its just that the mW values that are passed to kernel are relative and not > > absolute. My worry is that perhaps no SoC vendor wants to put these absolute numbers > > out. > > This is pretty much 'security' by obscurity. It would be relatively easy to > measure actual power consumption at different CPU speeds and derive the DPC > values from that. Right, I was going to say that. Specifically: * Anyone that actually gets one of these chips can just measure it pretty trivially. Run the core at a certain speed and measure with the smart battery. Run at a different speed and measure again. * Presumably the power consumption of different types of cores in Qualcomm SoCs of the same generation is roughly equivalent. So I could go and grab a Pixel 4a and put AOSP on it and measure the power consumption and presumably get pretty close numbers for big and little power coefficients. I don't know for sure if Pixel 4a's SoC is officially the same generation but I'd bet it's close. * Presumably someone would be able to get a pretty good guess by figuring out the form factor and working backwards. It sounds as if thermal dissipation (in terms of Watts) for various form factor devices is somewhat standard. Maybe this is more so for phones / tablets than laptops which might have bigger heat pipes or active cooling, but still. Someone could do the math pretty easily. I guess if you're really worried about protecting this then you can delay posting it for brand new chipsets using a new type of technology until product is almost ready to ship, but for sc7180 it doesn't feel like this is something worth fighting about. -Doug
diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi index d46b3833e52f..23f84639d6b9 100644 --- a/arch/arm64/boot/dts/qcom/sc7180.dtsi +++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi @@ -3320,6 +3320,7 @@ cpu0-thermal { polling-delay = <0>; thermal-sensors = <&tsens0 1>; + sustainable-power = <768>; trips { cpu0_alert0: trip-point0 { @@ -3368,6 +3369,7 @@ cpu1-thermal { polling-delay = <0>; thermal-sensors = <&tsens0 2>; + sustainable-power = <768>; trips { cpu1_alert0: trip-point0 { @@ -3416,6 +3418,7 @@ cpu2-thermal { polling-delay = <0>; thermal-sensors = <&tsens0 3>; + sustainable-power = <768>; trips { cpu2_alert0: trip-point0 { @@ -3464,6 +3467,7 @@ cpu3-thermal { polling-delay = <0>; thermal-sensors = <&tsens0 4>; + sustainable-power = <768>; trips { cpu3_alert0: trip-point0 { @@ -3512,6 +3516,7 @@ cpu4-thermal { polling-delay = <0>; thermal-sensors = <&tsens0 5>; + sustainable-power = <768>; trips { cpu4_alert0: trip-point0 { @@ -3560,6 +3565,7 @@ cpu5-thermal { polling-delay = <0>; thermal-sensors = <&tsens0 6>; + sustainable-power = <768>; trips { cpu5_alert0: trip-point0 { @@ -3608,6 +3614,7 @@ cpu6-thermal { polling-delay = <0>; thermal-sensors = <&tsens0 9>; + sustainable-power = <1202>; trips { cpu6_alert0: trip-point0 { @@ -3648,6 +3655,7 @@ cpu7-thermal { polling-delay = <0>; thermal-sensors = <&tsens0 10>; + sustainable-power = <1202>; trips { cpu7_alert0: trip-point0 { @@ -3688,6 +3696,7 @@ cpu8-thermal { polling-delay = <0>; thermal-sensors = <&tsens0 11>; + sustainable-power = <1202>; trips { cpu8_alert0: trip-point0 { @@ -3728,6 +3737,7 @@ cpu9-thermal { polling-delay = <0>; thermal-sensors = <&tsens0 12>; + sustainable-power = <1202>; trips { cpu9_alert0: trip-point0 {
The 'sustainable_power' attribute provides an estimate of the sustained power that can be dissipated at the desired control temperature. One could argue that this value is not necessarily the same for all devices with the same SoC, which may have different form factors or thermal designs. However there are reasons to specify a (default) value at SoC level for SC7180: most importantly, if no value is specified at all the power_allocator thermal governor (aka 'IPA') estimates a value, using the minimum power of all cooling devices of the zone, which can result in overly aggressive thermal throttling. For most devices an approximate conservative value should be more useful than the minimum guesstimate of power_allocator. Devices that need a different value can overwrite it in their <device>.dts. Also the thermal zones for SC7180 have a high level of granularity (essentially one for each function block), which makes it more likely that the default value just works for many devices. The values correspond to 1901 MHz for the big cores, and 1804 MHz for the small cores. The values were determined by limiting the CPU frequencies to different max values and launching a bunch of processes that cause high CPU load ('while true; do true; done &' is simple and does a good job). A frequency is deemed sustainable if the CPU temperatures don't rise (consistently) above the second trip point ('control temperature', 95 degC in this case). Once the highest sustainable frequency is found, the sustainable power can be calculated by multiplying the energy consumption per core at this frequency (which can be found in /sys/kernel/debug/energy_model/) with the number of cores that are specified as cooling devices. The sustainable frequencies were determined at room temperature on a device without heat sink or other passive cooling elements. Signed-off-by: Matthias Kaehlcke <mka@chromium.org> --- If maintainers think 'sustainable_power' should be specified at device level (with which I conceptually agree) I'm fine with doing that, just seemed it could be useful to have a reasonable 'default' at SoC level in this case. arch/arm64/boot/dts/qcom/sc7180.dtsi | 10 ++++++++++ 1 file changed, 10 insertions(+)