Message ID | 20210115094744.21156-1-rui.zhang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Zhang Rui |
Series | thermal/intel: introduce tcc cooling driver |
On 2021.01.15 Zhang Rui wrote: > > On Intel processors, the core frequency can be reduced below OS request, > when the current temperature reaches the TCC (Thermal Control Circuit) > activation temperature. > > The default TCC activation temperature is specified by > MSR_IA32_TEMPERATURE_TARGET. However, it can be adjusted by specifying an > offset in degrees C, using the TCC Offset bits in the same MSR register. > > This patch introduces a cooling devices driver that utilizes the TCC > Offset feature. The bigger the current cooling state is, the lower the > effective TCC activation temperature is, so that the processors can be > throttled earlier before system critical overheats. Thank you for this useful patch. My systems don't need thermald or any other thermal control, but it is nice to have this extra margin to add to the critical stuff, as a backup. I also like to use the offset to test stuff. I use the internal power limit servo for power limiting, and that servo works very well indeed. Using this temperature offset as a way to servo the thermal operating limit does work, but tends to overshoot, oscillate, hold low excessively long (minutes). It also seems to limit CPU clock frequency reduction to the non-turbo limit, regardless of the desired maximum temperature. I am not familiar with the thermal stuff at all, and didn't know where to find the trip point knob. Anyway, found "cooling_devices11". I do not understand this: ~$ cat /sys/devices/virtual/thermal/cooling_device11/stats/trans_table cat: /sys/devices/virtual/thermal/cooling_device11/stats/trans_table: File too large Rather than enter the actual TCC offset, I would rather enter the desired trip point, and have the driver do the math to convert it to the offset. Example step function overshoot, trip point set to 55 degrees C. 
doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1 Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt 0.07 800 45 24 1.89 0.00 0.04 800 29 23 1.89 0.00 61.76 4546 4151 66 103.77 0.00 < step function load applied on 4 of 6 cores 67.76 4570 4476 66 120.42 0.00 68.03 4567 4488 66 120.73 0.00 67.98 4572 4492 67 121.00 0.00 < 19 degrees over trip point 68.10 4489 4493 58 109.19 0.00 < this throttling is either the power servo or the temp servo. 68.08 4262 4476 51 82.82 0.00 < this throttling is the temp servo. 68.13 4143 4513 48 75.16 0.00 68.03 4086 4488 46 71.87 0.00 < It actually undershoots often, I don't know why. 68.12 4000 4505 46 67.02 0.00 < often it doesn't undershoot. 68.44 4000 4502 45 67.16 0.00 68.06 4000 4483 45 66.95 0.00 68.02 3973 4490 44 65.20 0.00 67.94 3900 4489 43 60.51 0.00 67.88 3900 4501 44 60.55 0.00 67.85 3900 4472 43 60.52 0.00 67.96 3900 4481 43 60.59 0.00 68.26 3900 4501 44 60.70 0.00 67.93 3900 4498 43 60.58 0.00 68.03 3900 4476 43 60.68 0.00 67.83 3900 4481 44 60.54 0.00 35.06 3895 2412 25 32.13 0.00 < load removed. 0.04 800 25 24 1.89 0.00 0.04 800 22 23 1.89 0.00 0.06 800 35 23 1.90 0.00 0.03 800 18 23 1.89 0.00 0.04 800 26 22 1.90 0.00 0.30 1927 44 23 1.97 0.00 ^C0.10 800 25 23 1.91 0.00 Example long time to recover: (actually, this example never recovers, unusual): Note: 3.7 GHz is the limit. 
doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 30 Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt 67.58 3700 134812 42 52.15 0.00 <<< the trip point was changed from 37 to 57 degrees 67.90 3700 134964 42 52.08 0.00 68.07 3700 134424 42 52.06 0.00 68.01 3700 134415 41 50.76 0.00 68.14 3700 134521 41 50.78 0.00 68.11 3700 134424 42 50.75 0.00 68.03 3700 134329 42 50.70 0.00 68.11 3700 134321 42 50.76 0.00 68.05 3700 134456 42 51.09 0.00 68.12 3700 134549 42 52.21 0.00 68.12 3700 134482 42 52.19 0.00 68.10 3700 134301 42 52.20 0.00 68.11 3700 134444 42 52.14 0.00 68.08 3700 134422 42 52.17 0.00 68.07 3700 134430 42 52.23 0.00 68.00 3700 134723 42 52.12 0.00 67.96 3711 135207 44 52.53 0.00 <<< It takes 8 minutes until the frequency goes above 3.7 GHz 68.05 3765 134519 42 54.34 0.00 68.11 3771 134461 43 54.60 0.00 67.83 3763 134867 43 54.26 0.00 67.93 3773 134577 43 54.78 0.00 <<< But it never recovers, Why not? ... For unknown reason the processor seems to now think it is not heavily loaded. From my MSR decoder: 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 200020 AUTO AUTOL From the book: > Autonomous Utilization-Based Frequency Control > Status (R0) > When set, frequency is reduced below the operating > system request because the processor has detected > that utilization is low. Which is not true. Anyway, Acked-by: Doug Smythies <dsmythies@telus.net> ... Doug
On 2021.01.16 09:08 Doug Smythies wrote:
> On 2021.01.15 Zhang Rui wrote:

Added Len to the "To" list:

Turbostat has another issue with this stuff.
It will be more work than I want to do to submit a fix patch, so I am not,
but see further down for my hack fix.

...

> Example step function overshoot, trip point set to 55 degrees C.
>
> doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1
> Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 0.07 800 45 24 1.89 0.00
> 0.04 800 29 23 1.89 0.00
> 61.76 4546 4151 66 103.77 0.00 < step function load applied on 4 of 6 cores
> 67.76 4570 4476 66 120.42 0.00
> 68.03 4567 4488 66 120.73 0.00
> 67.98 4572 4492 67 121.00 0.00 < 19 degrees over trip point
> 68.10 4489 4493 58 109.19 0.00 < this throttling is either the power servo or the temp servo.
> 68.08 4262 4476 51 82.82 0.00 < this throttling is the temp servo.
> 68.13 4143 4513 48 75.16 0.00
> 68.03 4086 4488 46 71.87 0.00 < It actually undershoots often, I don't know why.
> 68.12 4000 4505 46 67.02 0.00 < often it doesn't undershoot.

It turns out that turbostat does not list the package temperature properly
if it is started with an active TCC offset.
It erroneously includes the offset in the temperature math.
In the above example turbostat had also not yet been fixed for the bit mask issue.
So the real temp above was 59 degrees C.

> 68.44 4000 4502 45 67.16 0.00
> 68.06 4000 4483 45 66.95 0.00
> 68.02 3973 4490 44 65.20 0.00
> 67.94 3900 4489 43 60.51 0.00
> 67.88 3900 4501 44 60.55 0.00
> 67.85 3900 4472 43 60.52 0.00

And it settled at about 56 degrees, close to what was asked for.
To proceed with my work, I did a hack fix to turbostat:

doug@s18:~/temp-k-git/linux/tools/power/x86/turbostat$ git diff
diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
index d7acdd4d16c4..7f0a22ab3a0d 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -4831,6 +4831,7 @@ int read_tcc_activation_temp()
        fprintf(outf, "cpu%d: MSR_IA32_TEMPERATURE_TARGET: 0x%08llx (%d C) (%d default - %d offset)\n",
                base_cpu, msr, tcc, target_c, offset_c);

+       tcc = target_c;
        return tcc;
 }

So this:

cpu4: MSR_IA32_TEMPERATURE_TARGET: 0x2b64100d (57 C) (100 default - 43 offset)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88420000 (-9 C)

becomes this:

cpu1: MSR_IA32_TEMPERATURE_TARGET: 0x2b64100d (57 C) (100 default - 43 offset)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88400000 (36 C)

and this:

Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
0.08 1079 928 -11 1.91 0.00

Becomes this:

Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
0.05 1046 846 32 1.94 0.00

So now back to my overshoot example:

This:

> 67.98 4572 4492 67 121.00 0.00 < 19 degrees over trip point

Was actually:

> 67.98 4572 4492 80 121.00 0.00 <<< 25 degrees over trip point

But let's just do it again:

doug@s18:~$ cat /sys/devices/virtual/thermal/cooling_device11/cur_state
43 <<< so 100 - 43 = 57 degrees trip point.

doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 0.25
Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
0.09 800 6 36 2.01 0.00
0.16 800 23 36 2.00 0.00
0.11 800 14 36 2.15 0.00
66.81 4461 1160 70 101.17 0.00 <<< load applied, temp up 34 degrees in less than 0.25 seconds. Normal.
68.06 4581 1126 74 117.36 0.00
67.69 4589 1119 76 119.60 0.00
67.80 4589 1125 77 120.94 0.00
67.83 4587 1132 78 120.75 0.00
67.68 4591 1125 78 121.63 0.00
68.07 4585 1139 77 121.25 0.00
67.80 4588 1121 79 121.41 0.00 <<< now 20 degrees over trip point.
68.57 4579 1139 79 121.71 0.00
...
68.03 4220 1130 63 80.28 0.00 <<< it takes quite a while (>7 seconds) to really throttle down.

... Doug
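The arithmetic behind the turbostat readings quoted above can be checked by hand. The following is a minimal Python sketch, not turbostat's actual code. It assumes the target temperature sits in bits 23:16 and the TCC offset in bits 29:24 of MSR_IA32_TEMPERATURE_TARGET, and that bits 22:16 of MSR_IA32_PACKAGE_THERM_STATUS hold a readout in degrees below the reference temperature. The function names are hypothetical.

```python
# Sketch only: the bit positions below are assumptions stated in the lead-in,
# and the function names are made up for illustration.

def decode_temperature_target(msr):
    """Return (target_c, offset_c) from an MSR_IA32_TEMPERATURE_TARGET value."""
    target_c = (msr >> 16) & 0xFF   # assumed: bits 23:16, default activation temp
    offset_c = (msr >> 24) & 0x3F   # assumed: bits 29:24, TCC offset
    return target_c, offset_c


def package_temp(therm_status, reference_c):
    """Temperature = reference minus the digital readout (assumed bits 22:16)."""
    readout = (therm_status >> 16) & 0x7F
    return reference_c - readout


target_c, offset_c = decode_temperature_target(0x2B64100D)
trip_c = target_c - offset_c
print(target_c, offset_c, trip_c)            # values quoted in the thread: 100 43 57

# Using the offset-adjusted trip point as the reference gives the bogus reading:
print(package_temp(0x88420000, trip_c))      # -9
# Using the unadjusted target as the reference gives the plausible one:
print(package_temp(0x88400000, target_c))    # 36
```

Under these assumptions the two MSR_IA32_PACKAGE_THERM_STATUS lines differ only in which reference temperature is subtracted from, which matches the one-line hack of returning `target_c`.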
> -----Original Message----- > From: Doug Smythies <dsmythies@telus.net> > Sent: Sunday, January 17, 2021 5:22 AM > To: Zhang, Rui <rui.zhang@intel.com>; Brown, Len <len.brown@intel.com> > Cc: daniel.lezcano@linaro.org; srinivas.pandruvada@linux.intel.com; linux- > pm@vger.kernel.org; 'Doug Smythies' <dsmythies@telus.net> > Subject: RE: [PATCH] thermal/intel: introduce tcc cooling driver > Importance: High > > On 2021.01.16 09:08 Doug Smythies wrote: > > On 2021.01.15 Zhang Rui wrote: > > Added Len to the "To" list: > > Turostat has another issue with this stuff. > It will be more work than I want to do to submit a fix patch, so I am not, but > see further down for my hack fix. > > ... > > > Example step function overshoot, trip point set to 55 degrees C. > > > > doug@s18:~$ sudo ~/turbostat --Summary --quiet --show > > Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ -- interval 1 > > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt > > 0.07 800 45 24 1.89 0.00 > > 0.04 800 29 23 1.89 0.00 > > 61.76 4546 4151 66 103.77 0.00 < step function load applied on 4 of 6 > cores > > 67.76 4570 4476 66 120.42 0.00 > > 68.03 4567 4488 66 120.73 0.00 > > 67.98 4572 4492 67 121.00 0.00 < 19 degrees over trip point > > 68.10 4489 4493 58 109.19 0.00 < this throttling is either the power > servo or the temp > > servo. > > 68.08 4262 4476 51 82.82 0.00 < this throttling is the temp servo. > > 68.13 4143 4513 48 75.16 0.00 > > 68.03 4086 4488 46 71.87 0.00 < It actually undershoots often, I don't > know why. > > 68.12 4000 4505 46 67.02 0.00 < often it doesn't undershoot. > > It turns out that tubostat does not list the package temperature properly if it > is started with an active TCC offset. > It erroneously includes the offset in the temperature math. > In the above example turbostat had also not yet been fixed for the bit mask > issue. So the real temp above was 59 degrees C. 
> > > 68.44 4000 4502 45 67.16 0.00 > > 68.06 4000 4483 45 66.95 0.00 > > 68.02 3973 4490 44 65.20 0.00 > > 67.94 3900 4489 43 60.51 0.00 > > 67.88 3900 4501 44 60.55 0.00 > > 67.85 3900 4472 43 60.52 0.00 > > And it settled at about 56 degrees, close to what was asked for. > > To proceed with my work, I did a hack fix to turbostat: > > doug@s18:~/temp-k-git/linux/tools/power/x86/turbostat$ git diff diff --git > a/tools/power/x86/turbostat/turbostat.c > b/tools/power/x86/turbostat/turbostat.c > index d7acdd4d16c4..7f0a22ab3a0d 100644 > --- a/tools/power/x86/turbostat/turbostat.c > +++ b/tools/power/x86/turbostat/turbostat.c > @@ -4831,6 +4831,7 @@ int read_tcc_activation_temp() > fprintf(outf, "cpu%d: MSR_IA32_TEMPERATURE_TARGET: 0x%08llx > (%d C) (%d default - %d offset)\n", > base_cpu, msr, tcc, target_c, offset_c); > > + tcc = target_c; > return tcc; > } > Yes, this is a right fix. I think Len already knows this breakage and he will propose some fix soon. > So this: > > cpu4: MSR_IA32_TEMPERATURE_TARGET: 0x2b64100d (57 C) (100 default - > 43 offset) > cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88420000 (-9 C) > > becomes this: > > cpu1: MSR_IA32_TEMPERATURE_TARGET: 0x2b64100d (57 C) (100 default - > 43 offset) > cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88400000 (36 C) > > and this: > > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt > 0.08 1079 928 -11 1.91 0.00 > > Becomes this: > > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt > 0.05 1046 846 32 1.94 0.00 > > So now back to my overshoot example: > > This: > > > 67.98 4572 4492 67 121.00 0.00 < 19 degrees over trip point > > Was actually: > > > 67.98 4572 4492 80 121.00 0.00 <<< 25 degrees over trip point > > But let's just do it again: > > doug@s18:~$ cat /sys/devices/virtual/thermal/cooling_device11/cur_state > 43 <<< so 100 - 43 = 57 degrees trip point. 
> doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 0.25
> Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt
> 0.09 800 6 36 2.01 0.00
> 0.16 800 23 36 2.00 0.00
> 0.11 800 14 36 2.15 0.00
> 66.81 4461 1160 70 101.17 0.00 <<< load applied, temp up 34 degrees in less than 0.25 seconds. Normal.
> 68.06 4581 1126 74 117.36 0.00
> 67.69 4589 1119 76 119.60 0.00
> 67.80 4589 1125 77 120.94 0.00
> 67.83 4587 1132 78 120.75 0.00
> 67.68 4591 1125 78 121.63 0.00
> 68.07 4585 1139 77 121.25 0.00
> 67.80 4588 1121 79 121.41 0.00 <<< now 20 degrees over trip point.
> 68.57 4579 1139 79 121.71 0.00
> ...
> 68.03 4220 1130 63 80.28 0.00 <<< it takes quite awhile (>7 seconds) to really throttle down.

What platform is this?

On a KBL platform I'm running right now, with the performance governor, TCC offset set to 30 (effective TCC is 70C), and turbostat fixed, I can observe that:
1. all CPUs run at max turbo frequency (3.9G) when idle, PkgTmp around 40C.
2. with load applied (I use the stress tool to get 100% CPU load), PkgTmp reports 70C and the frequency drops to around 3G, IMMEDIATELY.
3. when I change TCC Offset to 60, the CPU is throttled to around 200MHz, and the temperature is at around 50C, IMMEDIATELY.
4. when I change TCC Offset to 20, the CPU frequency rises to the turbo range, and PkgTmp reaches 80C, IMMEDIATELY.

So in your test, there is something I don't understand.
Hi, Doug, Thanks for testing this patch. > -----Original Message----- > From: Doug Smythies <dsmythies@telus.net> > Sent: Sunday, January 17, 2021 1:08 AM > To: Zhang, Rui <rui.zhang@intel.com> > Cc: daniel.lezcano@linaro.org; srinivas.pandruvada@linux.intel.com; linux- > pm@vger.kernel.org > Subject: RE: [PATCH] thermal/intel: introduce tcc cooling driver > Importance: High > > On 2021.01.15 Zhang Rui wrote: > > > > On Intel processors, the core frequency can be reduced below OS > > request, when the current temperature reaches the TCC (Thermal Control > > Circuit) activation temperature. > > > > The default TCC activation temperature is specified by > > MSR_IA32_TEMPERATURE_TARGET. However, it can be adjusted by > specifying > > an offset in degrees C, using the TCC Offset bits in the same MSR register. > > > > This patch introduces a cooling devices driver that utilizes the TCC > > Offset feature. The bigger the current cooling state is, the lower the > > effective TCC activation temperature is, so that the processors can be > > throttled earlier before system critical overheats. > > Thank you for this useful patch. > My systems don't need thermald or any other thermal control, but it is nice > to have this extra margin to add to the critical stuff, as a backup. > I also like to use the offset to test stuff. > > I use the internal power limit servo for power limiting, and that servo works > very well indeed. Using this temperature offset as a way to servo the > thermal operating limit does work, but tends to overshoot, oscillate, hold low > excessively long (minutes). Do you have a script to test and show the drawbacks of this feature? It seems that it behaves differently on different platforms. Maybe we can evaluate this on more platforms. > It also seems to limit CPU clock frequency > reduction to the non-turbo limit, regardless of the desired maximum > temperature. 
> > I am not familiar with the thermal stuff at all, and didn't know where to find > the trip point knob. Anyway, found "cooling_devices11". > > I do not understand this: > > ~$ cat /sys/devices/virtual/thermal/cooling_device11/stats/trans_table > cat: /sys/devices/virtual/thermal/cooling_device11/stats/trans_table: File > too large This is a known issue that stats table can not handle devices with too many cooling states, say, 127 cooling states for TCC Offset cooling device. We can ignore this for now. > > Rather than enter the actual TCC offset, I would rather enter the desired trip > point, and have the driver do the math to convert it to the offset. Hmmm, a writable trip point? I need to think about this. > > Example step function overshoot, trip point set to 55 degrees C. > > doug@s18:~$ sudo ~/turbostat --Summary --quiet --show > Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1 > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt > 0.07 800 45 24 1.89 0.00 > 0.04 800 29 23 1.89 0.00 > 61.76 4546 4151 66 103.77 0.00 < step function load applied on 4 of 6 > cores > 67.76 4570 4476 66 120.42 0.00 > 68.03 4567 4488 66 120.73 0.00 > 67.98 4572 4492 67 121.00 0.00 < 19 degrees over trip point > 68.10 4489 4493 58 109.19 0.00 < this throttling is either the power > servo or the temp servo. > 68.08 4262 4476 51 82.82 0.00 < this throttling is the temp servo. > 68.13 4143 4513 48 75.16 0.00 > 68.03 4086 4488 46 71.87 0.00 < It actually undershoots often, I don't > know why. > 68.12 4000 4505 46 67.02 0.00 < often it doesn't undershoot. > 68.44 4000 4502 45 67.16 0.00 > 68.06 4000 4483 45 66.95 0.00 > 68.02 3973 4490 44 65.20 0.00 > 67.94 3900 4489 43 60.51 0.00 > 67.88 3900 4501 44 60.55 0.00 > 67.85 3900 4472 43 60.52 0.00 > 67.96 3900 4481 43 60.59 0.00 > 68.26 3900 4501 44 60.70 0.00 > 67.93 3900 4498 43 60.58 0.00 > 68.03 3900 4476 43 60.68 0.00 > 67.83 3900 4481 44 60.54 0.00 > 35.06 3895 2412 25 32.13 0.00 < load removed. 
> 0.04 800 25 24 1.89 0.00 > 0.04 800 22 23 1.89 0.00 > 0.06 800 35 23 1.90 0.00 > 0.03 800 18 23 1.89 0.00 > 0.04 800 26 22 1.90 0.00 > 0.30 1927 44 23 1.97 0.00 > ^C0.10 800 25 23 1.91 0.00 > > Example long time to recover: > (actually, this example never recovers, unusual): > Note: 3.7 GHz is the limit. > > doug@s18:~$ sudo ~/turbostat --Summary --quiet --show > Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 30 > Busy% Bzy_MHz IRQ PkgTmp PkgWatt GFXWatt > 67.58 3700 134812 42 52.15 0.00 <<< the trip point was changed from 37 > to 57 degrees > 67.90 3700 134964 42 52.08 0.00 > 68.07 3700 134424 42 52.06 0.00 > 68.01 3700 134415 41 50.76 0.00 > 68.14 3700 134521 41 50.78 0.00 > 68.11 3700 134424 42 50.75 0.00 > 68.03 3700 134329 42 50.70 0.00 > 68.11 3700 134321 42 50.76 0.00 > 68.05 3700 134456 42 51.09 0.00 > 68.12 3700 134549 42 52.21 0.00 > 68.12 3700 134482 42 52.19 0.00 > 68.10 3700 134301 42 52.20 0.00 > 68.11 3700 134444 42 52.14 0.00 > 68.08 3700 134422 42 52.17 0.00 > 68.07 3700 134430 42 52.23 0.00 > 68.00 3700 134723 42 52.12 0.00 > 67.96 3711 135207 44 52.53 0.00 <<< It takes 8 minutes until the > frequency goes above 3.7 GHz > 68.05 3765 134519 42 54.34 0.00 > 68.11 3771 134461 43 54.60 0.00 > 67.83 3763 134867 43 54.26 0.00 > 67.93 3773 134577 43 54.78 0.00 <<< But it never recovers, Why not? > ... > > For unknown reason the processor seems to now think it is not heavily > loaded. From my MSR decoder: > > 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 200020 AUTO AUTOL > > From the book: > > > Autonomous Utilization-Based Frequency Control Status (R0) When set, > > frequency is reduced below the operating system request because the > > processor has detected that utilization is low. > > Which is not true. > > Anyway, > > Acked-by: Doug Smythies <dsmythies@telus.net> > thanks, rui
On 2021.01.18 01:32 Zhang, Rui wrote:
> On 2021.01.17 05:22 Doug Smythies wrote:
> > On 2021.01.16 09:08 Doug Smythies wrote:
> > > On 2021.01.15 Zhang Rui wrote:
...
>
> What platform this is?

My i5-9600K test server.
Intel(R) Core(TM) i5-9600K CPU @ 3.70GHz
6 CPUs and 6 cores.
Kernel: 5.11-rc3 + this patch.
Water cooled, with water pump always running full speed.

> On a KBL platform I'm running right now, with performance governor, and tcc offset set to 30
> (Effective TCC is 70c), and also turbostat fixed,
> I can observe that
> 1. all cpus running at max turbo freq (3.9G) when idle, PkgTmp around 40C
> 2. with load applied (I use stress tool to get 100% CPU load), the PkgTmp reports 70C and the
> frequency drops to around 3G, IMMEDIATELY.
> 3. when I change TCC Offset to 60, cpu is throttled to around 200MHz, and the temperature is at around
> 50C, IMMEDIATELY.
> 4. when I change TCC Offset to 20, cpu freq raises to turbo range, and PkgTmp reaches 80C,
> IMMEDIATELY.

O.K. You should be able to measure "IMMEDIATELY" and tell us what it is.

>
> So in your test, there is something I don't understand.
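One way to put a number on "IMMEDIATELY" is to count turbostat intervals between the offset change and the first clear frequency drop. The sketch below is a hypothetical helper, not part of turbostat. The 50 MHz drop threshold is an arbitrary assumption, and the sample frequencies are the Bzy_MHz values from one of the 1-second-interval captures posted in this thread.

```python
def throttle_latency(bzy_mhz, change_index, drop_mhz=50):
    """Count intervals from the offset change to the first sample whose
    frequency is at least drop_mhz below the sample taken at the change.

    bzy_mhz      -- list of Bzy_MHz readings, one per turbostat interval
    change_index -- index of the first reading taken with the new offset
    drop_mhz     -- assumed threshold for "really throttling" (arbitrary)
    Returns None if no such sample exists.
    """
    baseline = bzy_mhz[change_index]
    for i in range(change_index, len(bzy_mhz)):
        if baseline - bzy_mhz[i] >= drop_mhz:
            return i - change_index
    return None


# Bzy_MHz column from a capture in this thread (interval 1 second);
# the offset changed at the fourth row (index 3).
samples = [4500, 4501, 4501, 4500, 4502, 4503, 4502, 4501, 4503,
           4502, 4502, 4503, 4502, 4498, 4501, 4500, 4430, 4400]
intervals = throttle_latency(samples, change_index=3)
print(intervals)  # multiply by the turbostat interval to get seconds
```

The resolution is limited by the turbostat interval, so a shorter interval (for example 0.25 seconds) would be needed to separate a sub-second response from a one-second one.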
Hi,

Just a small follow up on this one:

On 2021.01.16 09:08 Doug Smythies wrote:
> On 2021.01.15 Zhang Rui wrote:
...
> Busy% Bzy_MHz IRQ PkgTmp PkgWatt
> 67.93 3773 134577 43 54.78
>
> For unknown reason the processor seems to now
> think it is not heavily loaded. From my MSR decoder:
>
> 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 200020 AUTO AUTOL
>
> From the book:
>
> > Autonomous Utilization-Based Frequency Control
> > Status (R0)
> > When set, frequency is reduced below the operating
> > system request because the processor has detected
> > that utilization is low.
>
> Which is not true.
>
> Anyway,
>
> Acked-by: Doug Smythies <dsmythies@telus.net>

O.K. there were 2 things wrong above:

1.) I used the wrong Intel SDM table for those bit definitions.
They should have been: RATL and RATLL.

From the proper page of the book:

> Running Average Thermal Limit Status (R0)
> When set, frequency is reduced below the operating
> system request due to Running Average Thermal Limit
> (RATL).

2.) Due to the already discussed turbostat issue, that was not
the actual temperature and so the RATL bit being set was actually
valid at that time.

I have not been able to find the time window knob for this, if there
even is one, similar to the time window knobs for the package power
limits.
I wanted to reduce the time constant, just as a test, in an attempt
to reduce the temperature overshoot that a step function load can cause.

One additional informational follow up note:

There always seems to be a significant turn-on transient when using the
TCC offset, appearing as temperature undershoot. I am saying that
an offset of 0 seems to also act as some sort of on/off switch to the
running average.
Example 1 - start with offset 0: $ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1 Busy% Bzy_MHz IRQ PkgTmp PkgWatt 51.17 4600 3531 71 93.57 51.37 4600 3543 71 93.60 51.37 4600 3590 71 93.63 <<< offset changed from 0 to 24 50.99 3737 3566 52 43.49 <<< trip point = 76 degrees 51.20 3700 3550 51 41.14 <<< TCC offset turn on transient 51.09 3700 3559 51 41.30 <<< There was no need to throttle 51.12 3779 3515 53 43.78 50.95 4064 3553 58 55.57 51.55 4271 3522 63 65.30 51.18 4424 3534 67 76.58 51.27 4500 3532 68 84.12 51.14 4500 3529 68 84.14 51.24 4599 3522 71 93.61 51.14 4600 3523 71 93.71 <<< Eventually it does return to not throttled. Example 2 - start with offset 1: Busy% Bzy_MHz IRQ PkgTmp PkgWatt 51.14 4600 3554 73 94.73 51.37 4600 3544 73 94.85 51.03 4600 3560 74 94.64 <<< offset changed from 1 to 24 51.33 4600 3508 73 94.88 <<< trip point = 76 degrees 51.14 4600 3526 73 94.69 <<< No TCC offset transient 51.22 4600 3614 73 94.85 51.24 4600 3531 73 94.84 51.50 4600 3578 73 94.92 51.15 4600 3571 73 94.77 51.20 4600 3521 73 94.91 51.19 4600 3550 73 94.76 51.27 4600 3522 74 94.81 51.27 4600 3530 74 94.98 ... Doug
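For readers without an MSR decoder at hand, the value 200020 (hex) quoted above for MSR_CORE_PERF_LIMIT_REASONS can be split into set bits with a few lines of Python. This is only a sketch: it assumes the usual layout of status bits in the low 16 bits with a matching log bit 16 positions higher, and it names only the one bit discussed in this thread (RATL, with its log counterpart RATLL). The helper name is hypothetical.

```python
# Only the bit named in the thread is labelled; which table applies depends
# on the processor, so treat the label as an assumption.
STATUS_NAMES = {5: "RATL"}


def decode_limit_reasons(msr):
    """Return (status_bits, log_bits) as lists of bit positions 0..15.

    Assumes status bits occupy bits 15:0 and each log bit sits 16 above
    its status bit.
    """
    status = [b for b in range(16) if msr & (1 << b)]
    log = [b for b in range(16) if msr & (1 << (b + 16))]
    return status, log


status, log = decode_limit_reasons(0x200020)
print([STATUS_NAMES.get(b, f"bit{b}") for b in status])        # status flags set
print([STATUS_NAMES.get(b, f"bit{b}") + "L" for b in log])     # log flags set
```

With that layout, 0x200020 has exactly one status bit (bit 5) and its log bit (bit 21) set, which is consistent with the decoder printing one flag and its "L" variant.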
Hi, Doug, On Tue, 2021-01-26 at 11:18 -0800, Doug Smythies wrote: > Hi, Just a small follow up on this one: > > On 2021.01.16 09:08 Doug Smythies wrote: > > On 2021.01.15 Zhang Rui wrote: > > ... > > Busy% Bzy_MHz IRQ PkgTmp PkgWatt > > 67.93 3773 134577 43 54.78 > > > > For unknown reason the processor seems to now > > think it is not heavily loaded. From my MSR decoder: > > > > 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 200020 AUTO AUTOL > > > > From the book: > > > > > Autonomous Utilization-Based Frequency Control > > > Status (R0) > > > When set, frequency is reduced below the operating > > > system request because the processor has detected > > > that utilization is low. > > > > Which is not true. > > > > Anyway, > > > > Acked-by: Doug Smythies <dsmythies@telus.net> > > O.K. there were 2 things wrong above: > > 1.) I used the wrong intel SDM table for those bit definitions. > They should have been: RATL and RATLL. > > From the proper page of the book: > > > Running Average Thermal Limit Status (R0) > > When set, frequency is reduced below the operating > > system request due to Running Average Thermal Limit > > (RATL). > > 2.) Due to the already discussed turbostat issue, that was not > the actual temperature and so the RATL bit being set was actually > valid at that time. > On my side, I got the "Thermal status bit" set. > I have not been able to find the time window knob for this, if there > even is one, similar to the time window knobs for the package power > limits. > I wanted to reduce the time constant, just as a test, in an attempt > to reduce the step function load potential temperature overshoot. > > One additional informational follow up note: > > There always seems to be a significant turn on transient to using the > TCC offset, appearing as temperature undershoot. I am saying that > an offset of 0 seems to also act as some sort of on/off switch to the > running average. 
> > Example 1 - start with offset 0: > > $ sudo ~/turbostat --Summary --quiet --show > Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1 > Busy% Bzy_MHz IRQ PkgTmp PkgWatt > 51.17 4600 3531 71 93.57 > 51.37 4600 3543 71 93.60 > 51.37 4600 3590 71 93.63 <<< offset changed from 0 to > 24 > 50.99 3737 3566 52 43.49 <<< trip point = 76 degrees > 51.20 3700 3550 51 41.14 <<< TCC offset turn on > transient > 51.09 3700 3559 51 41.30 <<< There was no need to > throttle > 51.12 3779 3515 53 43.78 > 50.95 4064 3553 58 55.57 > 51.55 4271 3522 63 65.30 > 51.18 4424 3534 67 76.58 > 51.27 4500 3532 68 84.12 > 51.14 4500 3529 68 84.14 > 51.24 4599 3522 71 93.61 > 51.14 4600 3523 71 93.71 <<< Eventually it does return > to not throttled. > > Example 2 - start with offset 1: > > Busy% Bzy_MHz IRQ PkgTmp PkgWatt > 51.14 4600 3554 73 94.73 > 51.37 4600 3544 73 94.85 > 51.03 4600 3560 74 94.64 <<< offset changed from 1 to 24 > 51.33 4600 3508 73 94.88 <<< trip point = 76 degrees > 51.14 4600 3526 73 94.69 <<< No TCC offset transient > 51.22 4600 3614 73 94.85 > 51.24 4600 3531 73 94.84 > 51.50 4600 3578 73 94.92 > 51.15 4600 3571 73 94.77 > 51.20 4600 3521 73 94.91 > 51.19 4600 3550 73 94.76 > 51.27 4600 3522 74 94.81 > 51.27 4600 3530 74 94.98 > > Thanks for your test. I'd prefer this is platform specific. Because it behaves really differently from what I observed. $sudo turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1 99.45 2216 10656 80 14.81 <<< start with offset=0 99.48 2234 10621 79 15.02 99.47 2233 10436 80 14.96 99.45 2236 10587 79 14.94 99.49 2216 10673 79 15.04 99.46 2226 10685 79 14.87 99.43 2233 10776 79 14.89 99.73 399 9139 66 4.51 <<< offset set to 50 99.76 212 8998 65 3.31 99.77 212 8902 64 3.27 ... 
<<< throttled for 20 seconds 99.76 212 8911 55 2.97 99.77 211 8851 55 2.95 99.76 211 8916 55 2.94 99.77 211 8844 55 3.05 99.77 211 8828 54 3.21 99.77 211 8911 54 3.05 99.74 212 8998 54 3.20 99.77 212 8802 54 2.90 99.77 211 8849 54 2.90 99.76 212 8942 53 2.98 99.76 211 9039 53 3.22 99.74 212 8977 53 2.89 99.77 211 8913 53 2.89 99.76 212 8900 53 2.89 99.77 211 8817 52 2.87 99.77 212 8923 52 2.88 99.77 212 8985 52 2.88 99.73 212 8877 52 2.86 99.58 575 9308 66 5.54 <<< offset set to 32 98.92 2460 13694 66 17.32 98.98 2298 13768 66 15.24 99.03 2244 14652 66 14.48 98.97 2198 14489 66 13.95 99.03 2148 14583 66 13.43 99.02 2107 14093 66 13.45 99.06 2060 13750 66 12.61 99.06 2036 14195 66 12.27 99.07 2007 14240 66 12.07 99.12 2888 12147 98 28.23 <<< offset cleared 99.03 3413 11503 98 37.21 98.96 3317 11698 98 34.64 99.07 3246 11410 98 32.89 98.95 3210 12107 98 32.13 98.94 3164 11790 98 31.08 99.00 3124 12106 98 30.84 99.00 3086 11876 98 29.60 98.94 3054 12482 98 29.00 98.89 3030 12629 98 28.54 99.39 2377 10764 82 17.62 <<< Didn't do anything, so it is probably thermald or something 99.49 2200 10679 81 14.44 99.52 2211 10267 80 14.66 99.49 2221 10318 80 14.71 99.45 2220 10289 81 14.74 99.43 2222 10326 81 14.76 I tried both tests, and the results are the same, in both cases, it starts throttling immediately (within a second), and no over-throttling observed. Do you have a script to do this? Say, run turbostat in background and then change tcc offset at certain timestamp? Maybe we can try exactly the same test on different machines. thanks, rui
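A repeatable version of this test could be driven by a small script that writes the offset at fixed times while turbostat runs in another terminal. The following Python sketch is one hypothetical way to do it. The sysfs path is the one quoted earlier in the thread (the cooling device index will differ between machines), the schedule values are made up, and the script would need root to write the file.

```python
import time

# Path as quoted in the thread; "cooling_device11" is machine specific.
CUR_STATE = "/sys/devices/virtual/thermal/cooling_device11/cur_state"


def write_offset(offset, path=CUR_STATE):
    """Write a TCC offset (cooling state) to the cooling device."""
    with open(path, "w") as f:
        f.write(f"{offset}\n")


def run_schedule(schedule, write=write_offset, sleep=time.sleep,
                 clock=time.monotonic):
    """Apply (seconds_from_start, offset) pairs in time order.

    write, sleep and clock are injectable so the logic can be dry-run
    without touching sysfs. Returns a log of (elapsed_seconds, offset).
    """
    start = clock()
    log = []
    for at, offset in sorted(schedule):
        delay = at - (clock() - start)
        if delay > 0:
            sleep(delay)
        write(offset)
        log.append((round(clock() - start, 3), offset))
    return log


# Example (hypothetical timings, run as root with turbostat already running):
#   run_schedule([(0, 0), (10, 50), (30, 32), (40, 0)])
```

The returned log gives the elapsed time of each write, which can be lined up against the turbostat rows afterwards to see how many intervals pass before the frequency changes.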
> > Rather than enter the actual TCC offset, I would rather enter the
> > desired trip point, and have the driver do the math to convert it
> > to the offset.
>
> Hmmm, a writable trip point? I need to think about this.

I think this is a better idea, and I will export this as a writable trip
point of the x86_pkg_temp_thermal driver later, thanks for the suggestion.

thanks,
rui
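The conversion being discussed is a single subtraction plus a range check. Here is a minimal Python sketch of that math, not the driver's implementation. The default of 100 degrees comes from the "100 default" reported by turbostat earlier in the thread, the maximum offset of 63 assumes a 6-bit offset field and may differ by platform, and the function names are hypothetical.

```python
def trip_to_offset(trip_c, tjmax_c=100, max_offset=63):
    """Convert a desired trip point (degrees C) to a TCC offset.

    tjmax_c    -- default TCC activation temperature (100 on the test system)
    max_offset -- assumed largest programmable offset (6-bit field)
    """
    offset = tjmax_c - trip_c
    if not 0 <= offset <= max_offset:
        raise ValueError(
            f"trip point {trip_c} C needs offset {offset}, "
            f"outside 0..{max_offset}")
    return offset


def offset_to_trip(offset, tjmax_c=100):
    """Inverse conversion: effective trip point for a given offset."""
    return tjmax_c - offset


print(trip_to_offset(57))   # offset for the 57 degree trip point used above
print(offset_to_trip(24))   # trip point for the offset of 24 used above
```

Rejecting out-of-range requests rather than silently clamping them is a design choice made here for the sketch; a driver might reasonably choose to clamp instead.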
On Thu, Jan 28, 2021 at 9:30 AM Zhang Rui <rui.zhang@intel.com> wrote: > On Tue, 2021-01-26 at 11:18 -0800, Doug Smythies wrote: > > On 2021.01.16 09:08 Doug Smythies wrote: > > > On 2021.01.15 Zhang Rui wrote: ... > > They should have been: RATL and RATLL. > > > > From the proper page of the book: > > > > > Running Average Thermal Limit Status (R0) > > > When set, frequency is reduced below the operating > > > system request due to Running Average Thermal Limit > > > (RATL). > > > > > 2.) Due to the already discussed turbostat issue, that was not > > the actual temperature and so the RATL bit being set was actually > > valid at that time. > > > On my side, I got the "Thermal status bit" set. Yes, and if I understand your comment correctly, you are referring to IA32_THERM_STATUS (0X19C) and/or IA32_PACKAGE_THERM_STATUS (0X1B1). I am referring to MSR_CORE_PERF_LIMIT_REASONS (0X64F). > > > I have not been able to find the time window knob for this, if there > > even is one, similar to the time window knobs for the package power > > limits. I just assume there is a time window, similar to the RAPL based power limits. But I haven't found it. > > I wanted to reduce the time constant, just as a test, in an attempt > > to reduce the step function load potential temperature overshoot. ... > > > Thanks for your test. > I'd prefer this is platform specific. > Because it behaves really differently from what I observed. O.K. These oddities aside, in the end it does do the expected job. > 99.06 2036 14195 66 12.27 > 99.07 2007 14240 66 12.07 > 99.12 2888 12147 98 28.23 <<< offset cleared > 99.03 3413 11503 98 37.21 > 98.96 3317 11698 98 34.64 very close to critical temp. I never knowingly allow my processor to go above 80 degrees. Although, I admit it hit 90 degrees a couple of times during this work. 
> 99.07 3246 11410 98 32.89
> 98.95 3210 12107 98 32.13
> 98.94 3164 11790 98 31.08
> 99.00 3124 12106 98 30.84
> 99.00 3086 11876 98 29.60
> 98.94 3054 12482 98 29.00
> 98.89 3030 12629 98 28.54
> 99.39 2377 10764 82 17.62 <<< Didn't do anything, so it is probably thermald or something

or critical temp hit.

>
> I tried both tests, and the results are the same, in both cases, it
> starts throttling immediately (within a second), and no over-throttling
> observed.
>
> Do you have a script to do this?

No, all of my tests were done manually, varying:
. placement of high loads on some cores for more heat over smaller surface area.
. balance between 100% CPU load at max heat versus 100% CPU load at less heat.
. balance between this TCC Offset throttling versus package power limits.
. using ambient (coolant temperature) as a heat removal capacity knob.

In summary: I played around until I found something interesting.

> Say, run turbostat in background and
> then change tcc offset at certain timestamp? Maybe we can try exactly
> the same test on different machines.

I had an idea, and wasted way way too much time trying to make it work.
I thought to just get turbostat to also show the offset, so then
we know for certain when it changed.

I tried virtually all combinations of:

turbostat --Summary --quiet --add /sys/devices/virtual/thermal/cooling_device11/cur_state,,,,TCC --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1
turbostat --Summary --quiet --add msr0x1a2,u32,package,raw,TCC --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1

and could never get it to work in "Summary" mode.
(note: about 95% of my use of turbostat is in "Summary" mode.)
Anyway, after too long, I did get this to work:

turbostat --quiet --cpu 0 --add /sys/devices/virtual/thermal/cooling_device11/cur_state,u32,,raw,TCC --show CPU,Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1 | grep "^ 0"

Example 1:

turbostat --quiet --cpu 0 --add /sys/devices/virtual/thermal/cooling_device11/cur_state,u32,,raw,TCC --show CPU,Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ --interval 1 | grep "^0"

CPU Busy%  Bzy_MHz IRQ  TCC        PkgTmp PkgWatt
0   100.26 4500    1002 0x00000001 78     99.88  <<< Offset = 1
0   100.26 4501    1002 0x00000001 77     99.90  <<< steady state power limit throttle
0   100.26 4501    1004 0x00000001 77     99.92
0   100.26 4500    1002 0x0000001e 78     99.91  <<< offset changed, trip point 70
0   100.25 4502    1003 0x0000001e 77     100.03
0   100.25 4503    1002 0x0000001e 77     99.85
0   100.25 4502    1002 0x0000001e 78     99.92
0   100.26 4501    1003 0x0000001e 78     99.95
0   100.25 4503    1002 0x0000001e 77     99.88
0   100.25 4502    1002 0x0000001e 78     99.86
0   100.25 4502    1004 0x0000001e 77     99.92
0   100.25 4503    1002 0x0000001e 77     99.98
0   100.25 4502    1002 0x0000001e 77     99.88
0   100.26 4498    1004 0x0000001e 77     100.06
0   100.26 4501    1002 0x0000001e 78     99.77
0   100.26 4500    1002 0x0000001e 78     99.53
0   100.26 4430    1002 0x0000001e 72     91.19  <<< Thermal throttling. 13 Seconds
0   100.26 4400    1002 0x0000001e 72     87.55
0   100.26 4400    1002 0x0000001e 71     87.52
0   100.26 4400    1005 0x0000001e 71     87.56
0   100.26 4400    1002 0x0000001e 72     87.53

Example 2:

0   100.26 4600    1002 0x00000000 83     113.26 <<< Offset = 0
0   100.26 4600    1002 0x00000000 84     113.43
0   100.25 4599    1002 0x00000000 83     113.42 <<< No power limit throttle yet.
0   100.26 4600    1004 0x00000000 83     113.40 <<< Not steady state.
0   100.26 4600    1002 0x00000000 83     113.25
0   100.25 3797    1003 0x00000018 56     54.11  <<< Overshoot is immediate.
0   100.26 3700    1002 0x00000018 56     47.09
0   100.26 3700    1002 0x00000018 55     47.08
0   100.26 3700    1002 0x00000018 54     46.98
0   100.26 3820    1002 0x00000018 58     51.62  <<< starts to recover.
0   100.26 4016    1002 0x00000018 62     61.55
0   100.26 4177    1002 0x00000018 64     69.91
0   100.26 4275    1004 0x00000018 68     75.81
0   100.26 4300    1002 0x00000018 68     77.36
0   100.26 4371    1002 0x00000018 71     84.53
0   100.26 4400    1002 0x00000018 72     87.52
0   100.26 4400    1003 0x00000018 72     87.62

Example 3:

This test is specifically an attempt to test the TCC Offset in the
exact way I intend to use it.

trip point = 75 degrees, and never changes.
Power limit 2 is 115 watts, timing window short.
Power limit 1 is 100 watts, timing window 8 seconds.
Note: all previous work was with the timing window at 28 seconds.
Note: typically temperature < 75 at 100 watts.
The load is 4 prime95 maximum heat threads, plus 0 weaker memory hammering threads.
The coolant had to be preheated for about an hour before this test
started, otherwise the processor would not get hot enough before
package power limit 1 took over the throttling duties.

Now, watching the TCC offset is useless for this test, so let's watch
MSR_CORE_PERF_LIMIT_REASONS instead:

turbostat --add msr0x64f,u32,,raw,TCC --show CPU,Busy%,Bzy_MHz,PkgTmp,PkgWatt,IRQ,RAMWatt --interval 1 | grep "^0"

(O.K., I should have changed the added column name.
I filter it anyhow, but manually added back, edited.)

CPU Busy% Bzy_MHz IRQ TCC        PkgTmp PkgWatt RAMWatt
0   0.07  1081    5   0x08200000 38     2.31    0.45 <<< Note high idle start temp.
0   0.16  824     11  0x08200000 38     2.12    0.45
0   1.74  3430    44  0x00000000 38     2.65    0.45 <<< clear last times log bits
0   0.16  851     6   0x00000000 37     2.27    0.45
0   4.32  3313    269 0x00000000 75     47.15   0.45 <<< load applied
0   4.24  4585    458 0x08000800 78     97.16   0.45 <<< package power limit 2
0   2.80  4588    482 0x08000000 77     97.49   0.45 <<< temperature just high
0   2.87  4593    463 0x08000000 78     97.95   0.45
0   3.39  4600    465 0x08000000 78     97.68   0.45
0   2.66  4600    462 0x08000000 78     97.55   0.45
0   2.28  4584    490 0x08000000 78     97.97   0.45
0   3.29  4583    478 0x08000000 78     97.72   0.45
0   3.24  4595    465 0x08000000 77     97.52   0.45
0   2.47  4600    465 0x08000000 78     97.50   0.45
0   4.18  4570    464 0x08000000 78     97.72   0.45
0   2.51  4600    470 0x08000000 78     97.40   0.45
0   1.77  4601    482 0x08000000 78     97.33   0.45
0   3.13  4584    462 0x08000000 78     97.57   0.45
0   3.06  4600    466 0x08000000 78     97.77   0.45
0   2.86  4592    461 0x08000000 78     97.56   0.45
0   2.85  4569    486 0x08000000 78     97.99   0.45
0   2.96  4600    465 0x08000000 78     97.91   0.45
0   3.00  4585    451 0x08000000 78     97.68   0.45
0   2.06  4600    475 0x08000000 78     97.50   0.45
0   3.05  4594    462 0x08000000 78     97.78   0.45
0   3.11  4592    461 0x08000000 78     97.68   0.45
0   2.31  4546    463 0x08200020 73     93.00   0.45 <<< RATL
0   2.80  4525    454 0x08200000 78     91.29   0.45 <<< Oscillates within
0   3.32  4538    445 0x08200020 73     91.61   0.45 <<< 1 pstate
0   3.27  4557    434 0x08200000 78     93.12   0.45
0   3.26  4523    470 0x08200020 73     89.85   0.45 <<< rough estimate is
0   2.48  4586    466 0x08200020 74     95.67   0.45 <<< oscillation costs 0.4%
0   1.95  4521    468 0x08200000 76     87.93   0.45 <<< performance loss versus
0   3.28  4569    449 0x08200020 73     94.67   0.45 <<< the power limit 2 servo.
0   0.44  4546    495 0x08200000 78     91.77   0.45 <<< (very crude, hard to defend
0   1.91  4518    487 0x08200020 73     91.24   0.45 <<< data.)
0   3.25  4539    460 0x08200000 78     91.63   0.45
0   2.51  4546    469 0x08200020 74     91.12   0.45
0   3.60  4540    453 0x08200000 77     91.43   0.45
0   3.06  4542    463 0x08200020 73     91.56   0.45

...

Doug

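The raw MSR_CORE_PERF_LIMIT_REASONS values in Example 3 are easier to follow once decoded. The following is a minimal userland sketch, not part of the patch or of turbostat. It assumes that each sticky "log" bit sits 16 bits above its status bit. The RATL (bit 5) and package power limit 2 (bit 11) positions are consistent with the annotations in the log above; the thermal status (bit 1) and package power limit 1 (bit 10) positions are assumptions that should be checked against the SDM for the processor in use. The function and table names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One throttle reason: position of its status bit and a short label. */
struct limit_reason {
	unsigned int bit;
	const char *name;
};

/* Assumption: the sticky log bit for each reason is at (bit + 16). */
static const struct limit_reason limit_reasons[] = {
	{ 1,  "Thermal" },	/* assumed position, verify in the SDM */
	{ 5,  "RATL" },		/* matches the 0x...20 rows marked RATL */
	{ 10, "PL1" },		/* assumed position, verify in the SDM */
	{ 11, "PL2" },		/* matches the 0x08000800 row */
};

/*
 * Write a space-separated list of the set status bits ("NAME ") and
 * set log bits ("NAME-log ") into buf. Output is truncated if buf is
 * too small.
 */
static void decode_limit_reasons(uint32_t val, char *buf, size_t len)
{
	size_t i, n = sizeof(limit_reasons) / sizeof(limit_reasons[0]);

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		unsigned int bit = limit_reasons[i].bit;
		size_t used = strlen(buf);

		if (val & (UINT32_C(1) << bit))
			snprintf(buf + used, len - used, "%s ",
				 limit_reasons[i].name);
		used = strlen(buf);
		if (val & (UINT32_C(1) << (bit + 16)))
			snprintf(buf + used, len - used, "%s-log ",
				 limit_reasons[i].name);
	}
}
```

Under these assumptions, 0x08000800 decodes to PL2 status plus PL2 log, 0x08200020 to RATL status plus the RATL and PL2 logs, and 0x08200000 to the two log bits only, which is the oscillation pattern visible at the end of Example 3.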
diff --git a/drivers/thermal/intel/Kconfig b/drivers/thermal/intel/Kconfig
index 8025b21f43fa..67de49cc9fb4 100644
--- a/drivers/thermal/intel/Kconfig
+++ b/drivers/thermal/intel/Kconfig
@@ -75,3 +75,11 @@ config INTEL_PCH_THERMAL
 	  Enable this to support thermal reporting on certain intel PCHs.
 	  Thermal reporting device will provide temperature reading,
 	  programmable trip points and other information.
+
+config INTEL_TCC_COOLING
+	tristate "Intel TCC offset cooling Driver"
+	depends on X86
+	help
+	  Enable this to support system cooling by adjusting the effective TCC
+	  activation temperature via the TCC Offset register, which is widely
+	  supported on modern Intel platforms.
diff --git a/drivers/thermal/intel/Makefile b/drivers/thermal/intel/Makefile
index 0d9736ced5d4..40e86973f88d 100644
--- a/drivers/thermal/intel/Makefile
+++ b/drivers/thermal/intel/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_INTEL_QUARK_DTS_THERMAL) += intel_quark_dts_thermal.o
 obj-$(CONFIG_INT340X_THERMAL) += int340x_thermal/
 obj-$(CONFIG_INTEL_BXT_PMIC_THERMAL) += intel_bxt_pmic_thermal.o
 obj-$(CONFIG_INTEL_PCH_THERMAL) += intel_pch_thermal.o
+obj-$(CONFIG_INTEL_TCC_COOLING) += intel_tcc_cooling.o
diff --git a/drivers/thermal/intel/intel_tcc_cooling.c b/drivers/thermal/intel/intel_tcc_cooling.c
new file mode 100644
index 000000000000..aa6bbb9ba898
--- /dev/null
+++ b/drivers/thermal/intel/intel_tcc_cooling.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * cooling device driver that activates the processor throttling by
+ * programming the TCC Offset register.
+ * Copyright (c) 2021, Intel Corporation.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/thermal.h>
+#include <asm/cpu_device_id.h>
+
+#define TCC_SHIFT 24
+#define TCC_MASK (0x3fULL<<24)
+#define TCC_PROGRAMMABLE BIT(30)
+
+static struct thermal_cooling_device *tcc_cdev;
+
+static int tcc_get_max_state(struct thermal_cooling_device *cdev, unsigned long
+			     *state)
+{
+	*state = TCC_MASK >> TCC_SHIFT;
+	return 0;
+}
+
+static int tcc_offset_update(int tcc)
+{
+	u64 val;
+	int err;
+
+	err = rdmsrl_safe(MSR_IA32_TEMPERATURE_TARGET, &val);
+	if (err)
+		return err;
+
+	val &= ~TCC_MASK;
+	val |= tcc << TCC_SHIFT;
+
+	err = wrmsrl_safe(MSR_IA32_TEMPERATURE_TARGET, val);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int tcc_get_cur_state(struct thermal_cooling_device *cdev, unsigned long
+			     *state)
+{
+	u64 val;
+	int err;
+
+	err = rdmsrl_safe(MSR_IA32_TEMPERATURE_TARGET, &val);
+	if (err)
+		return err;
+
+	*state = (val & TCC_MASK) >> TCC_SHIFT;
+	return 0;
+}
+
+static int tcc_set_cur_state(struct thermal_cooling_device *cdev, unsigned long
+			     state)
+{
+	return tcc_offset_update(state);
+}
+
+static const struct thermal_cooling_device_ops tcc_cooling_ops = {
+	.get_max_state = tcc_get_max_state,
+	.get_cur_state = tcc_get_cur_state,
+	.set_cur_state = tcc_set_cur_state,
+};
+
+static const struct x86_cpu_id tcc_ids[] __initconst = {
+	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_L, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(KABYLAKE_L, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_L, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(TIGERLAKE_L, NULL),
+	{}
+};
+
+MODULE_DEVICE_TABLE(x86cpu, tcc_ids);
+
+static int __init tcc_cooling_init(void)
+{
+	int ret;
+	u64 val;
+	const struct x86_cpu_id *id;
+
+	int err;
+
+	id = x86_match_cpu(tcc_ids);
+	if (!id)
+		return -ENODEV;
+
+	err = rdmsrl_safe(MSR_PLATFORM_INFO, &val);
+	if (err)
+		return err;
+
+	if (!(val & TCC_PROGRAMMABLE))
+		return -ENODEV;
+
+	pr_info("Programmable TCC Offset detected\n");
+
+	tcc_cdev =
+	    thermal_cooling_device_register("TCC Offset", NULL,
+					    &tcc_cooling_ops);
+	if (IS_ERR(tcc_cdev)) {
+		ret = PTR_ERR(tcc_cdev);
+		return ret;
+	}
+	return 0;
+}
+
+module_init(tcc_cooling_init)
+
+static void __exit tcc_cooling_exit(void)
+{
+	thermal_cooling_device_unregister(tcc_cdev);
+}
+
+module_exit(tcc_cooling_exit)
+
+MODULE_DESCRIPTION("TCC offset cooling device Driver");
+MODULE_AUTHOR("Zhang Rui <rui.zhang@intel.com>");
+MODULE_LICENSE("GPL v2");

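As posted, tcc_offset_update() takes an int and ORs "tcc << TCC_SHIFT" into the register image without a range check, so a state above 63 would set bits outside the 6-bit offset field. Whether the thermal core already rejects such states before calling set_cur_state is not shown in this thread. The following is a userland sketch of the same read-modify-write step with an explicit bound, under the assumption that the check belongs next to the field update. It is not the author's code: tcc_apply_state() and TCC_MAX_STATE are hypothetical names, and the MSR read and write are left out so the bit arithmetic can be compiled and checked on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the patch: 6-bit TCC offset at bits 29:24. */
#define TCC_SHIFT	24
#define TCC_MASK	(UINT64_C(0x3f) << TCC_SHIFT)
#define TCC_MAX_STATE	(TCC_MASK >> TCC_SHIFT)	/* hypothetical helper macro */

/*
 * Compute the new register image for a requested cooling state.
 * Returns 0 and stores the result in *new_val, or -EINVAL (leaving
 * *new_val untouched) when the state does not fit in the field.
 */
static int tcc_apply_state(uint64_t old, unsigned long state,
			   uint64_t *new_val)
{
	assert(new_val != NULL);

	if (state > TCC_MAX_STATE)
		return -EINVAL;

	/* Clear the old offset, then insert the new one as a 64-bit value. */
	*new_val = (old & ~TCC_MASK) | ((uint64_t)state << TCC_SHIFT);
	return 0;
}
```

In a driver, the value passed as old would come from rdmsrl_safe() and the result would go to wrmsrl_safe(), as in the patch. Casting the state to a 64-bit type before shifting also avoids relying on int arithmetic for the shift.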
On Intel processors, the core frequency can be reduced below OS request,
when the current temperature reaches the TCC (Thermal Control Circuit)
activation temperature.

The default TCC activation temperature is specified by
MSR_IA32_TEMPERATURE_TARGET. However, it can be adjusted by specifying an
offset in degrees C, using the TCC Offset bits in the same MSR register.

This patch introduces a cooling devices driver that utilizes the TCC
Offset feature. The bigger the current cooling state is, the lower the
effective TCC activation temperature is, so that the processors can be
throttled earlier before system critical overheats.

This patch has been tested on a KBL mobile platform.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/thermal/intel/Kconfig             |   8 ++
 drivers/thermal/intel/Makefile            |   1 +
 drivers/thermal/intel/intel_tcc_cooling.c | 128 ++++++++++++++++++++++
 3 files changed, 137 insertions(+)
 create mode 100644 drivers/thermal/intel/intel_tcc_cooling.c
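The relation between cooling state and effective activation temperature described above is a subtraction: effective trip = default activation temperature - offset. The following is a small userland sketch of that arithmetic, including the reverse conversion from a desired trip point to an offset that was asked for in the review. It assumes the default activation temperature (TjMax) is held in bits 23:16 of MSR_IA32_TEMPERATURE_TARGET, which is not stated in the patch and should be checked against the SDM. The 6-bit offset range comes from TCC_MASK in the patch. All function and macro names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TJMAX_SHIFT	16	/* assumption: TjMax lives in bits 23:16 */
#define TJMAX_MASK	0xff
#define TCC_OFFSET_MAX	0x3f	/* 6-bit field, as in the patch's TCC_MASK */

/* Extract the default TCC activation temperature in degrees C. */
static int tjmax_from_msr(uint64_t val)
{
	return (int)((val >> TJMAX_SHIFT) & TJMAX_MASK);
}

/* Effective activation temperature for a given offset (cooling state). */
static int effective_trip(int tjmax, int offset)
{
	return tjmax - offset;
}

/*
 * Offset needed to throttle at the desired trip point, clamped to the
 * range the field can hold. A trip point at or above TjMax gives 0.
 */
static int offset_for_trip(int tjmax, int trip)
{
	int offset;

	assert(tjmax >= 0 && tjmax <= TJMAX_MASK);

	offset = tjmax - trip;
	if (offset < 0)
		offset = 0;
	if (offset > TCC_OFFSET_MAX)
		offset = TCC_OFFSET_MAX;
	return offset;
}
```

With a TjMax of 100 degrees C, a trip point of 70 maps to offset 30 (0x1e) and a trip point of 75 maps to offset 25. Because of the clamp, trip points more than 63 degrees below TjMax cannot be reached through the offset alone.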