[v5,6/6] arm64: dts: qcom: Enable cpu cooling devices for QCS9075 platforms

Message ID 20241229152332.3068172-7-quic_wasimn@quicinc.com (mailing list archive)
State New
Series arm64: qcom: Add support for QCS9075 boards

Commit Message

Wasim Nazir Dec. 29, 2024, 3:23 p.m. UTC
From: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>

In the QCS9100 SoC, the safety subsystem monitors all thermal sensors
and takes corrective action for each subsystem on a sensor violation in
order to comply with safety standards. QCS9075 is a non-safe SoC, so it
requires conventional thermal mitigation to control the temperature of
the different subsystems.

CPU frequency throttling for the different cpu tsens is enabled in
hardware as the first line of defense for cpu thermal control. However,
the QCS9075 SoC has a higher ambient specification. Under high ambient
conditions, even the lowest frequency with multiple cores running can
slowly build up heat over time and lead to thermal runaway. Restricting
cpu cores in this scenario provides further thermal control and avoids
critical thermal violations.

Add cpu idle injection cooling bindings for the cpu tsens thermal zones
as a mitigation for the cpu subsystem prior to thermal shutdown.

Add cpu frequency cooling devices that will be used by the userspace
thermal governor for skin thermal management.

Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
---
 arch/arm64/boot/dts/qcom/qcs9075-rb8.dts      |   1 +
 arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts  |   1 +
 arch/arm64/boot/dts/qcom/qcs9075-ride.dts     |   1 +
 arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi | 287 ++++++++++++++++++
 4 files changed, 290 insertions(+)
 create mode 100644 arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi

--
2.47.0

Comments

Aiqun(Maria) Yu Dec. 30, 2024, 6:02 a.m. UTC | #1
On 12/29/2024 11:23 PM, Wasim Nazir wrote:
> From: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> 
> In QCS9100 SoC, the safety subsystem monitors all thermal sensors and
[...]
> Add cpu frequency cooling devices that will be used by userspace
> thermal governor to mitigate skin thermal management.
> 
> Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>

The SOB of the patch handler (Wasim) also needs to be added.

See [1] for reference. Relevant snippet:
 - Signed-off-by: ``Patch handler <handler@mail>``

   SOBs after the author SOB are from people handling and transporting
   the patch, but were not involved in development. SOB chains should
   reflect the **real** route a patch took as it was propagated to us,
   with the first SOB entry signalling primary authorship of a single
   author.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/maintainer-tip.rst

> ---
>  arch/arm64/boot/dts/qcom/qcs9075-rb8.dts      |   1 +
>  arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts  |   1 +
[...]
> 
>  #include "sa8775p-ride.dtsi"
> +#include "qcs9075-thermal.dtsi"

Thermal nodes are usually added in the SoC dtsi, e.g. sa8775p.dtsi.
From the description, it seems that this thermal configuration is
common to the qcs9075 SoC rather than specific to a board.

Would it be better to use the dts structure below instead?

1) Add a qcs9075.dtsi that includes sa8775p.dtsi and qcs9075-thermal.dtsi.
2) Have a qcs9075-ride.dtsi that includes sa8776p.dtsi and
qcs9075-thermal.dtsi.
3) Ensure all qcs9075 board dts include qcs9075-ride.dtsi
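
A rough sketch of what option 1 could look like, shown for the rb8 board
only. The file qcs9075.dtsi is just the name proposed above and does not
exist in this series; the include lists are taken from the current patch.

```dts
/* qcs9075.dtsi (hypothetical file, named as in option 1 above) */
#include "sa8775p.dtsi"
#include "qcs9075-thermal.dtsi"

/*
 * qcs9075-rb8.dts would then start with:
 *
 *	/dts-v1/;
 *
 *	#include "qcs9075.dtsi"
 *	#include "sa8775p-pmics.dtsi"
 *
 * instead of including sa8775p.dtsi and qcs9075-thermal.dtsi directly.
 */
```

With this layout the thermal additions follow the SoC include, so a board
file cannot pick up one without the other.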

> 
>  / {
>  	model = "Qualcomm Technologies, Inc. QCS9075 Ride";
> diff --git a/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi b/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi
> new file mode 100644
> index 000000000000..40544c8582c4
> --- /dev/null
> +++ b/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi
> @@ -0,0 +1,287 @@
> +// SPDX-License-Identifier: BSD-3-Clause
> +/*
> + * Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <dt-bindings/thermal/thermal.h>
> +
> +&cpu0 {
> +	#cooling-cells = <2>;

Why is cpu0 treated specially, with no cpu0_idle/thermal-idle node?
Could you add the explanation to the commit message?

By the way, if there is no cpu0_idle, does that mean the #cooling-cells
property is not needed either?

> +};
> +
> +&cpu1 {
[...]
> +
> +/ {
> +	thermal-zones {

The first /thermal-zones is located in sa8775p.dtsi. Should it be
referenced through a label instead of the full node path? Using a label
helps the reviewer find the original node's information and makes it
clear that this is an override rather than a newly added node.
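
For illustration only, a label-based override could look like the sketch
below. It assumes that sa8775p.dtsi attaches a label to the node (written
here as "thermal_zones", which is an assumption and should be checked
against the base file); the zone content is copied from this patch.

```dts
/* Assumes: thermal_zones: thermal-zones { ... }; in sa8775p.dtsi */
&thermal_zones {
	cpu-0-1-0-thermal {
		trips {
			cpu_0_1_0_passive: trip-point1 {
				temperature = <116000>;
			};
		};

		cooling-maps {
			map0 {
				trip = <&cpu_0_1_0_passive>;
				cooling-device = <&cpu1_idle 100 100>;
			};
		};
	};
};
```

A label reference fails at build time if the base node is missing, whereas
the "/ { thermal-zones { ... }; };" form would silently create a new node.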

> +		cpu-0-1-0-thermal {
> +			trips {
> +				cpu_0_1_0_passive: trip-point1 {

It seems like a common attribute for cpu1-cpu7. Can it be a common trips
node that can be referenced by different cpu-*-*-*-thermal nodes?
Konrad Dybcio Dec. 30, 2024, 3:35 p.m. UTC | #2
On 29.12.2024 4:23 PM, Wasim Nazir wrote:
> From: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> 
> In QCS9100 SoC, the safety subsystem monitors all thermal sensors and
> does corrective action for each subsystem based on sensor violation
> to comply safety standards. But as QCS9075 is non-safe SoC it
> requires conventional thermal mitigation to control thermal for
> different subsystems.
> 
> The cpu frequency throttling for different cpu tsens is enabled in
> hardware as first defense for cpu thermal control. But QCS9075 SoC
> has higher ambient specification. During high ambient condition, even
> lowest frequency with multi cores can slowly build heat over the time
> and it can lead to thermal run-away situations. This patch restrict
> cpu cores during this scenario helps further thermal control and
> avoids thermal critical violation.
> 
> Add cpu idle injection cooling bindings for cpu tsens thermal zones
> as a mitigation for cpu subsystem prior to thermal shutdown.
> 
> Add cpu frequency cooling devices that will be used by userspace
> thermal governor to mitigate skin thermal management.
> 
> Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> ---

Does this bring measurable benefits over just making the CPU a cooling
device and pointing the thermal zones to it (and not the idle subnode)?
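
To make the comparison concrete, the alternative I have in mind is
roughly the following sketch for one zone. It reuses the cpu1 label and
the trip from this patch; the THERMAL_NO_LIMIT bounds are only an
example choice, not a tuned configuration.

```dts
#include <dt-bindings/thermal/thermal.h>

/* The CPU itself is the cooling device (cpufreq cooling), no idle subnode */
&cpu1 {
	#cooling-cells = <2>;
};

/ {
	thermal-zones {
		cpu-0-1-0-thermal {
			trips {
				cpu_0_1_0_passive: trip-point1 {
					temperature = <116000>;
				};
			};

			cooling-maps {
				map0 {
					trip = <&cpu_0_1_0_passive>;
					/* Let the governor use any cooling state */
					cooling-device = <&cpu1 THERMAL_NO_LIMIT
							  THERMAL_NO_LIMIT>;
				};
			};
		};
	};
};
```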

Konrad
Dmitry Baryshkov Dec. 30, 2024, 3:40 p.m. UTC | #3
On Sun, Dec 29, 2024 at 08:53:32PM +0530, Wasim Nazir wrote:
> From: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> 
> In QCS9100 SoC, the safety subsystem monitors all thermal sensors and
> does corrective action for each subsystem based on sensor violation
> to comply safety standards. But as QCS9075 is non-safe SoC it
> requires conventional thermal mitigation to control thermal for
> different subsystems.
> 
> The cpu frequency throttling for different cpu tsens is enabled in
> hardware as first defense for cpu thermal control. But QCS9075 SoC
> has higher ambient specification. During high ambient condition, even
> lowest frequency with multi cores can slowly build heat over the time
> and it can lead to thermal run-away situations. This patch restrict
> cpu cores during this scenario helps further thermal control and
> avoids thermal critical violation.
> 
> Add cpu idle injection cooling bindings for cpu tsens thermal zones
> as a mitigation for cpu subsystem prior to thermal shutdown.
> 
> Add cpu frequency cooling devices that will be used by userspace
> thermal governor to mitigate skin thermal management.

Does anything prevent us from having this config as a part of the basic
sa8775p.dtsi setup? If the HW is present in the base version but is not
accessible for whatever reason, please move it to the base device config
and set status "disabled" or "reserved" in the respective board files.

> 
> Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
> ---
>  arch/arm64/boot/dts/qcom/qcs9075-rb8.dts      |   1 +
>  arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts  |   1 +
>  arch/arm64/boot/dts/qcom/qcs9075-ride.dts     |   1 +
>  arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi | 287 ++++++++++++++++++
>  4 files changed, 290 insertions(+)
>  create mode 100644 arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi
> 
> diff --git a/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts b/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts
> index ecaa383b6508..3ab6deeaacf1 100644
> --- a/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts
> +++ b/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts
> @@ -9,6 +9,7 @@
> 
>  #include "sa8775p.dtsi"
>  #include "sa8775p-pmics.dtsi"
> +#include "qcs9075-thermal.dtsi"
> 
>  / {
>  	model = "Qualcomm Technologies, Inc. Robotics RB8";
> diff --git a/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts b/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts
> index d9a8956d3a76..5f2d9f416617 100644
> --- a/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts
> +++ b/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts
> @@ -5,6 +5,7 @@
>  /dts-v1/;
> 
>  #include "sa8775p-ride.dtsi"
> +#include "qcs9075-thermal.dtsi"
> 
>  / {
>  	model = "Qualcomm Technologies, Inc. QCS9075 Ride Rev3";
> diff --git a/arch/arm64/boot/dts/qcom/qcs9075-ride.dts b/arch/arm64/boot/dts/qcom/qcs9075-ride.dts
> index 3b524359a72d..10ce48e7ba2f 100644
> --- a/arch/arm64/boot/dts/qcom/qcs9075-ride.dts
> +++ b/arch/arm64/boot/dts/qcom/qcs9075-ride.dts
> @@ -5,6 +5,7 @@
>  /dts-v1/;
> 
>  #include "sa8775p-ride.dtsi"
> +#include "qcs9075-thermal.dtsi"
> 
>  / {
>  	model = "Qualcomm Technologies, Inc. QCS9075 Ride";
> diff --git a/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi b/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi
> new file mode 100644
> index 000000000000..40544c8582c4
> --- /dev/null
> +++ b/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi
> @@ -0,0 +1,287 @@
> +// SPDX-License-Identifier: BSD-3-Clause
> +/*
> + * Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <dt-bindings/thermal/thermal.h>
> +
> +&cpu0 {
> +	#cooling-cells = <2>;
> +};
> +
> +&cpu1 {
> +	#cooling-cells = <2>;
> +	cpu1_idle: thermal-idle {
> +		#cooling-cells = <2>;
> +		duration-us = <800000>;
> +		exit-latency-us = <10000>;
> +	};
> +};
> +
> +&cpu2 {
> +	#cooling-cells = <2>;
> +	cpu2_idle: thermal-idle {
> +		#cooling-cells = <2>;
> +		duration-us = <800000>;
> +		exit-latency-us = <10000>;
> +	};
> +};
> +
> +&cpu3 {
> +	#cooling-cells = <2>;
> +	cpu3_idle: thermal-idle {
> +		#cooling-cells = <2>;
> +		duration-us = <800000>;
> +		exit-latency-us = <10000>;
> +	};
> +};
> +
> +&cpu4 {
> +	#cooling-cells = <2>;
> +	cpu4_idle: thermal-idle {
> +		#cooling-cells = <2>;
> +		duration-us = <800000>;
> +		exit-latency-us = <10000>;
> +	};
> +};
> +
> +&cpu5 {
> +	#cooling-cells = <2>;
> +	cpu5_idle: thermal-idle {
> +		#cooling-cells = <2>;
> +		duration-us = <800000>;
> +		exit-latency-us = <10000>;
> +	};
> +};
> +
> +&cpu6 {
> +	#cooling-cells = <2>;
> +	cpu6_idle: thermal-idle {
> +		#cooling-cells = <2>;
> +		duration-us = <800000>;
> +		exit-latency-us = <10000>;
> +	};
> +};
> +
> +&cpu7 {
> +	#cooling-cells = <2>;
> +	cpu7_idle: thermal-idle {
> +		#cooling-cells = <2>;
> +		duration-us = <800000>;
> +		exit-latency-us = <10000>;
> +	};
> +};
> +
> +/ {
> +	thermal-zones {
> +		cpu-0-1-0-thermal {
> +			trips {
> +				cpu_0_1_0_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_0_1_0_passive>;
> +					cooling-device = <&cpu1_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-0-2-0-thermal {
> +			trips {
> +				cpu_0_2_0_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_0_2_0_passive>;
> +					cooling-device = <&cpu2_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-0-3-0-thermal {
> +			trips {
> +				cpu_0_3_0_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_0_3_0_passive>;
> +					cooling-device = <&cpu3_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-0-1-1-thermal {
> +			trips {
> +				cpu_0_1_1_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_0_1_1_passive>;
> +					cooling-device = <&cpu1_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-0-2-1-thermal {
> +			trips {
> +				cpu_0_2_1_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_0_2_1_passive>;
> +					cooling-device = <&cpu2_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-0-3-1-thermal {
> +			trips {
> +				cpu_0_3_1_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_0_3_1_passive>;
> +					cooling-device = <&cpu3_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-1-0-0-thermal {
> +			trips {
> +				cpu_1_0_0_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_1_0_0_passive>;
> +					cooling-device = <&cpu4_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-1-1-0-thermal {
> +			trips {
> +				cpu_1_1_0_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_1_1_0_passive>;
> +					cooling-device = <&cpu5_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-1-2-0-thermal {
> +			trips {
> +				cpu_1_2_0_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_1_2_0_passive>;
> +					cooling-device = <&cpu6_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-1-3-0-thermal {
> +			trips {
> +				cpu_1_3_0_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_1_3_0_passive>;
> +					cooling-device = <&cpu7_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-1-0-1-thermal {
> +			trips {
> +				cpu_1_0_1_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_1_0_1_passive>;
> +					cooling-device = <&cpu4_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-1-1-1-thermal {
> +			trips {
> +				cpu_1_1_1_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_1_1_1_passive>;
> +					cooling-device = <&cpu5_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-1-2-1-thermal {
> +			trips {
> +				cpu_1_2_1_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_1_2_1_passive>;
> +					cooling-device = <&cpu6_idle 100 100>;
> +				};
> +			};
> +		};
> +
> +		cpu-1-3-1-thermal {
> +			trips {
> +				cpu_1_3_1_passive: trip-point1 {
> +					temperature = <116000>;
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_1_3_1_passive>;
> +					cooling-device = <&cpu7_idle 100 100>;
> +				};
> +			};
> +		};
> +	};
> +};
> --
> 2.47.0
>
Manaf Meethalavalappu Pallikunhi Dec. 31, 2024, 11:05 a.m. UTC | #4
Hi Konrad,

On 12/30/2024 9:05 PM, Konrad Dybcio wrote:
> On 29.12.2024 4:23 PM, Wasim Nazir wrote:
>> From: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
>>
>> In QCS9100 SoC, the safety subsystem monitors all thermal sensors and
>> does corrective action for each subsystem based on sensor violation
>> to comply safety standards. But as QCS9075 is non-safe SoC it
>> requires conventional thermal mitigation to control thermal for
>> different subsystems.
>>
>> The cpu frequency throttling for different cpu tsens is enabled in
>> hardware as first defense for cpu thermal control. But QCS9075 SoC
>> has higher ambient specification. During high ambient condition, even
>> lowest frequency with multi cores can slowly build heat over the time
>> and it can lead to thermal run-away situations. This patch restrict
>> cpu cores during this scenario helps further thermal control and
>> avoids thermal critical violation.
>>
>> Add cpu idle injection cooling bindings for cpu tsens thermal zones
>> as a mitigation for cpu subsystem prior to thermal shutdown.
>>
>> Add cpu frequency cooling devices that will be used by userspace
>> thermal governor to mitigate skin thermal management.
>>
>> Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
>> ---
> Does this bring measurable benefits over just making the CPU a cooling
> device and pointing the thermal zones to it (and not the idle subnode)?
>
> Konrad
As noted in the commit message, CPU frequency mitigation is handled by
hardware as a first-level mitigation. The software/scheduler is informed
of this mitigation via the arch_update_hw_pressure API [1]. Adding the
same CPU mitigation in thermal zones is redundant. We are adding idle
injection with a 100% duty cycle as an additional mitigation step at a
higher trip to further reduce CPU power consumption. This further
improves device thermal stability, especially in high ambient conditions.
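
For reference, this is how the values in the patch map to that behaviour,
shown for cpu1 only. The comments reflect my reading of the thermal-idle
binding and the cpuidle cooling driver (cooling states 0..100 interpreted
as idle injection percentage), so please treat them as a summary rather
than a specification.

```dts
&cpu1 {
	#cooling-cells = <2>;

	cpu1_idle: thermal-idle {
		#cooling-cells = <2>;
		/* Length of each injected idle period: 800 ms */
		duration-us = <800000>;
		/* Latency constraint used when selecting the idle state: 10 ms */
		exit-latency-us = <10000>;
	};
};

/ {
	thermal-zones {
		cpu-0-1-0-thermal {
			trips {
				cpu_0_1_0_passive: trip-point1 {
					temperature = <116000>;
				};
			};

			cooling-maps {
				map0 {
					trip = <&cpu_0_1_0_passive>;
					/*
					 * Cells are <min-state max-state>. Both are
					 * 100, so the only state the governor can
					 * apply on this trip is full idle injection.
					 */
					cooling-device = <&cpu1_idle 100 100>;
				};
			};
		};
	};
};
```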

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/cpufreq/qcom-cpufreq-hw.c?h=next-20241220#n352

Best regards,

Manaf
Manaf Meethalavalappu Pallikunhi Dec. 31, 2024, 12:01 p.m. UTC | #5
Hi Dmitry,

On 12/30/2024 9:10 PM, Dmitry Baryshkov wrote:
> On Sun, Dec 29, 2024 at 08:53:32PM +0530, Wasim Nazir wrote:
>> From: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
>>
>> In QCS9100 SoC, the safety subsystem monitors all thermal sensors and
>> does corrective action for each subsystem based on sensor violation
>> to comply safety standards. But as QCS9075 is non-safe SoC it
>> requires conventional thermal mitigation to control thermal for
>> different subsystems.
>>
>> The cpu frequency throttling for different cpu tsens is enabled in
>> hardware as first defense for cpu thermal control. But QCS9075 SoC
>> has higher ambient specification. During high ambient condition, even
>> lowest frequency with multi cores can slowly build heat over the time
>> and it can lead to thermal run-away situations. This patch restrict
>> cpu cores during this scenario helps further thermal control and
>> avoids thermal critical violation.
>>
>> Add cpu idle injection cooling bindings for cpu tsens thermal zones
>> as a mitigation for cpu subsystem prior to thermal shutdown.
>>
>> Add cpu frequency cooling devices that will be used by userspace
>> thermal governor to mitigate skin thermal management.
> Does anything prevent us from having this config as a part of the basic
> sa8775p.dtsi setup? If HW is present in the base version but it is not
> accessible for whatever reason, please move it the base device config
> and use status "disabled" or "reserved" to the respective board files.

Sure, I will move the idle injection node for each cpu to sa8775p.dtsi
and keep it in the disabled state. I would still like to keep the
#cooling-cells property for the CPUs in the board files, as we don't
want to enable any cooling device in the base DT.
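
A minimal sketch of the split I have in mind, for cpu1 only. This is not
the final patch, and whether the cpuidle cooling driver honours "status"
on the thermal-idle subnode still needs to be confirmed.

```dts
/* sa8775p.dtsi: node described in the base file, but kept off */
&cpu1 {
	cpu1_idle: thermal-idle {
		#cooling-cells = <2>;
		duration-us = <800000>;
		exit-latency-us = <10000>;
		status = "disabled";
	};
};

/* QCS9075 board file (or a shared QCS9075 include): opt in */
&cpu1 {
	#cooling-cells = <2>;
};

&cpu1_idle {
	status = "okay";
};
```

In the real base file the thermal-idle node would sit inside the existing
cpu1 node definition; the "&cpu1" override form is used here only to keep
the sketch short.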

Best Regards,

Manaf

>
>> Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
>> ---
>>   arch/arm64/boot/dts/qcom/qcs9075-rb8.dts      |   1 +
>>   arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts  |   1 +
>>   arch/arm64/boot/dts/qcom/qcs9075-ride.dts     |   1 +
>>   arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi | 287 ++++++++++++++++++
>>   4 files changed, 290 insertions(+)
>>   create mode 100644 arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi
>>
>> diff --git a/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts b/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts
>> index ecaa383b6508..3ab6deeaacf1 100644
>> --- a/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts
>> +++ b/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts
>> @@ -9,6 +9,7 @@
>>
>>   #include "sa8775p.dtsi"
>>   #include "sa8775p-pmics.dtsi"
>> +#include "qcs9075-thermal.dtsi"
>>
>>   / {
>>   	model = "Qualcomm Technologies, Inc. Robotics RB8";
>> diff --git a/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts b/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts
>> index d9a8956d3a76..5f2d9f416617 100644
>> --- a/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts
>> +++ b/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts
>> @@ -5,6 +5,7 @@
>>   /dts-v1/;
>>
>>   #include "sa8775p-ride.dtsi"
>> +#include "qcs9075-thermal.dtsi"
>>
>>   / {
>>   	model = "Qualcomm Technologies, Inc. QCS9075 Ride Rev3";
>> diff --git a/arch/arm64/boot/dts/qcom/qcs9075-ride.dts b/arch/arm64/boot/dts/qcom/qcs9075-ride.dts
>> index 3b524359a72d..10ce48e7ba2f 100644
>> --- a/arch/arm64/boot/dts/qcom/qcs9075-ride.dts
>> +++ b/arch/arm64/boot/dts/qcom/qcs9075-ride.dts
>> @@ -5,6 +5,7 @@
>>   /dts-v1/;
>>
>>   #include "sa8775p-ride.dtsi"
>> +#include "qcs9075-thermal.dtsi"
>>
>>   / {
>>   	model = "Qualcomm Technologies, Inc. QCS9075 Ride";
>> diff --git a/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi b/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi
>> new file mode 100644
>> index 000000000000..40544c8582c4
>> --- /dev/null
>> +++ b/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi
>> @@ -0,0 +1,287 @@
>> +// SPDX-License-Identifier: BSD-3-Clause
>> +/*
>> + * Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
>> + */
>> +
>> +#include <dt-bindings/thermal/thermal.h>
>> +
>> +&cpu0 {
>> +	#cooling-cells = <2>;
>> +};
>> +
>> +&cpu1 {
>> +	#cooling-cells = <2>;
>> +	cpu1_idle: thermal-idle {
>> +		#cooling-cells = <2>;
>> +		duration-us = <800000>;
>> +		exit-latency-us = <10000>;
>> +	};
>> +};
>> +
>> +&cpu2 {
>> +	#cooling-cells = <2>;
>> +	cpu2_idle: thermal-idle {
>> +		#cooling-cells = <2>;
>> +		duration-us = <800000>;
>> +		exit-latency-us = <10000>;
>> +	};
>> +};
>> +
>> +&cpu3 {
>> +	#cooling-cells = <2>;
>> +	cpu3_idle: thermal-idle {
>> +		#cooling-cells = <2>;
>> +		duration-us = <800000>;
>> +		exit-latency-us = <10000>;
>> +	};
>> +};
>> +
>> +&cpu4 {
>> +	#cooling-cells = <2>;
>> +	cpu4_idle: thermal-idle {
>> +		#cooling-cells = <2>;
>> +		duration-us = <800000>;
>> +		exit-latency-us = <10000>;
>> +	};
>> +};
>> +
>> +&cpu5 {
>> +	#cooling-cells = <2>;
>> +	cpu5_idle: thermal-idle {
>> +		#cooling-cells = <2>;
>> +		duration-us = <800000>;
>> +		exit-latency-us = <10000>;
>> +	};
>> +};
>> +
>> +&cpu6 {
>> +	#cooling-cells = <2>;
>> +	cpu6_idle: thermal-idle {
>> +		#cooling-cells = <2>;
>> +		duration-us = <800000>;
>> +		exit-latency-us = <10000>;
>> +	};
>> +};
>> +
>> +&cpu7 {
>> +	#cooling-cells = <2>;
>> +	cpu7_idle: thermal-idle {
>> +		#cooling-cells = <2>;
>> +		duration-us = <800000>;
>> +		exit-latency-us = <10000>;
>> +	};
>> +};
>> +
>> +/ {
>> +	thermal-zones {
>> +		cpu-0-1-0-thermal {
>> +			trips {
>> +				cpu_0_1_0_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_0_1_0_passive>;
>> +					cooling-device = <&cpu1_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-0-2-0-thermal {
>> +			trips {
>> +				cpu_0_2_0_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_0_2_0_passive>;
>> +					cooling-device = <&cpu2_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-0-3-0-thermal {
>> +			trips {
>> +				cpu_0_3_0_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_0_3_0_passive>;
>> +					cooling-device = <&cpu3_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-0-1-1-thermal {
>> +			trips {
>> +				cpu_0_1_1_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_0_1_1_passive>;
>> +					cooling-device = <&cpu1_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-0-2-1-thermal {
>> +			trips {
>> +				cpu_0_2_1_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_0_2_1_passive>;
>> +					cooling-device = <&cpu2_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-0-3-1-thermal {
>> +			trips {
>> +				cpu_0_3_1_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_0_3_1_passive>;
>> +					cooling-device = <&cpu3_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-1-0-0-thermal {
>> +			trips {
>> +				cpu_1_0_0_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_1_0_0_passive>;
>> +					cooling-device = <&cpu4_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-1-1-0-thermal {
>> +			trips {
>> +				cpu_1_1_0_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_1_1_0_passive>;
>> +					cooling-device = <&cpu5_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-1-2-0-thermal {
>> +			trips {
>> +				cpu_1_2_0_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_1_2_0_passive>;
>> +					cooling-device = <&cpu6_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-1-3-0-thermal {
>> +			trips {
>> +				cpu_1_3_0_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_1_3_0_passive>;
>> +					cooling-device = <&cpu7_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-1-0-1-thermal {
>> +			trips {
>> +				cpu_1_0_1_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_1_0_1_passive>;
>> +					cooling-device = <&cpu4_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-1-1-1-thermal {
>> +			trips {
>> +				cpu_1_1_1_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_1_1_1_passive>;
>> +					cooling-device = <&cpu5_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-1-2-1-thermal {
>> +			trips {
>> +				cpu_1_2_1_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_1_2_1_passive>;
>> +					cooling-device = <&cpu6_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +
>> +		cpu-1-3-1-thermal {
>> +			trips {
>> +				cpu_1_3_1_passive: trip-point1 {
>> +					temperature = <116000>;
>> +				};
>> +			};
>> +
>> +			cooling-maps {
>> +				map0 {
>> +					trip = <&cpu_1_3_1_passive>;
>> +					cooling-device = <&cpu7_idle 100 100>;
>> +				};
>> +			};
>> +		};
>> +	};
>> +};
>> --
>> 2.47.0
>>
Konrad Dybcio Dec. 31, 2024, 4:21 p.m. UTC | #6
On 31.12.2024 12:05 PM, Manaf Meethalavalappu Pallikunhi wrote:
> 
> Hi Konrad,
> 
> On 12/30/2024 9:05 PM, Konrad Dybcio wrote:
>> On 29.12.2024 4:23 PM, Wasim Nazir wrote:
>>> From: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
>>>
>>> In QCS9100 SoC, the safety subsystem monitors all thermal sensors and
>>> does corrective action for each subsystem based on sensor violation
>>> to comply safety standards. But as QCS9075 is non-safe SoC it
>>> requires conventional thermal mitigation to control thermal for
>>> different subsystems.
>>>
>>> The cpu frequency throttling for different cpu tsens is enabled in
>>> hardware as first defense for cpu thermal control. But QCS9075 SoC
>>> has higher ambient specification. During high ambient condition, even
>>> lowest frequency with multi cores can slowly build heat over the time
>>> and it can lead to thermal run-away situations. This patch restrict
>>> cpu cores during this scenario helps further thermal control and
>>> avoids thermal critical violation.
>>>
>>> Add cpu idle injection cooling bindings for cpu tsens thermal zones
>>> as a mitigation for cpu subsystem prior to thermal shutdown.
>>>
>>> Add cpu frequency cooling devices that will be used by userspace
>>> thermal governor to mitigate skin thermal management.
>>>
>>> Signed-off-by: Manaf Meethalavalappu Pallikunhi <quic_manafm@quicinc.com>
>>> ---
>> Does this bring measurable benefits over just making the CPU a cooling
>> device and pointing the thermal zones to it (and not the idle subnode)?
>>
>> Konrad
> As noted in the commit, CPU frequency mitigation is handled by hardware as a first level mitigation. The software/scheduler will be updated via arch_update_hw_pressure API [1] for this mitigation. Adding the same CPU mitigation in thermal zones is redundant. We are adding idle injection with a 100% duty cycle as an additional mitigation step  at higher trip to further reduce CPU power consumption. This helps device thermal stability further, especially in high ambient conditions.

I understood this much from the commit message.

What I'm asking is, whether your solution actually works better than just
letting Linux software-throttle the CPUs, preferably backed by some
numbers.

I'm also unsure how this is supposed to reduce power consumption. If the
CPUs aren't busy, they should idle, and if they are not fully utilized, a
lower frequency would likely be scheduled.

Konrad


> 
> [1]. https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/cpufreq/qcom-cpufreq-hw.c?h=next-20241220#n352
> 
> Best regards,
> 
> Manaf
>

Patch

diff --git a/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts b/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts
index ecaa383b6508..3ab6deeaacf1 100644
--- a/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts
+++ b/arch/arm64/boot/dts/qcom/qcs9075-rb8.dts
@@ -9,6 +9,7 @@ 

 #include "sa8775p.dtsi"
 #include "sa8775p-pmics.dtsi"
+#include "qcs9075-thermal.dtsi"

 / {
 	model = "Qualcomm Technologies, Inc. Robotics RB8";
diff --git a/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts b/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts
index d9a8956d3a76..5f2d9f416617 100644
--- a/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts
+++ b/arch/arm64/boot/dts/qcom/qcs9075-ride-r3.dts
@@ -5,6 +5,7 @@ 
 /dts-v1/;

 #include "sa8775p-ride.dtsi"
+#include "qcs9075-thermal.dtsi"

 / {
 	model = "Qualcomm Technologies, Inc. QCS9075 Ride Rev3";
diff --git a/arch/arm64/boot/dts/qcom/qcs9075-ride.dts b/arch/arm64/boot/dts/qcom/qcs9075-ride.dts
index 3b524359a72d..10ce48e7ba2f 100644
--- a/arch/arm64/boot/dts/qcom/qcs9075-ride.dts
+++ b/arch/arm64/boot/dts/qcom/qcs9075-ride.dts
@@ -5,6 +5,7 @@ 
 /dts-v1/;

 #include "sa8775p-ride.dtsi"
+#include "qcs9075-thermal.dtsi"

 / {
 	model = "Qualcomm Technologies, Inc. QCS9075 Ride";
diff --git a/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi b/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi
new file mode 100644
index 000000000000..40544c8582c4
--- /dev/null
+++ b/arch/arm64/boot/dts/qcom/qcs9075-thermal.dtsi
@@ -0,0 +1,287 @@ 
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <dt-bindings/thermal/thermal.h>
+
+&cpu0 {
+	#cooling-cells = <2>;
+};
+
+&cpu1 {
+	#cooling-cells = <2>;
+	cpu1_idle: thermal-idle {
+		#cooling-cells = <2>;
+		duration-us = <800000>;
+		exit-latency-us = <10000>;
+	};
+};
+
+&cpu2 {
+	#cooling-cells = <2>;
+	cpu2_idle: thermal-idle {
+		#cooling-cells = <2>;
+		duration-us = <800000>;
+		exit-latency-us = <10000>;
+	};
+};
+
+&cpu3 {
+	#cooling-cells = <2>;
+	cpu3_idle: thermal-idle {
+		#cooling-cells = <2>;
+		duration-us = <800000>;
+		exit-latency-us = <10000>;
+	};
+};
+
+&cpu4 {
+	#cooling-cells = <2>;
+	cpu4_idle: thermal-idle {
+		#cooling-cells = <2>;
+		duration-us = <800000>;
+		exit-latency-us = <10000>;
+	};
+};
+
+&cpu5 {
+	#cooling-cells = <2>;
+	cpu5_idle: thermal-idle {
+		#cooling-cells = <2>;
+		duration-us = <800000>;
+		exit-latency-us = <10000>;
+	};
+};
+
+&cpu6 {
+	#cooling-cells = <2>;
+	cpu6_idle: thermal-idle {
+		#cooling-cells = <2>;
+		duration-us = <800000>;
+		exit-latency-us = <10000>;
+	};
+};
+
+&cpu7 {
+	#cooling-cells = <2>;
+	cpu7_idle: thermal-idle {
+		#cooling-cells = <2>;
+		duration-us = <800000>;
+		exit-latency-us = <10000>;
+	};
+};
+
+/ {
+	thermal-zones {
+		cpu-0-1-0-thermal {
+			trips {
+				cpu_0_1_0_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_0_1_0_passive>;
+					cooling-device = <&cpu1_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-0-2-0-thermal {
+			trips {
+				cpu_0_2_0_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_0_2_0_passive>;
+					cooling-device = <&cpu2_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-0-3-0-thermal {
+			trips {
+				cpu_0_3_0_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_0_3_0_passive>;
+					cooling-device = <&cpu3_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-0-1-1-thermal {
+			trips {
+				cpu_0_1_1_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_0_1_1_passive>;
+					cooling-device = <&cpu1_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-0-2-1-thermal {
+			trips {
+				cpu_0_2_1_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_0_2_1_passive>;
+					cooling-device = <&cpu2_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-0-3-1-thermal {
+			trips {
+				cpu_0_3_1_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_0_3_1_passive>;
+					cooling-device = <&cpu3_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-1-0-0-thermal {
+			trips {
+				cpu_1_0_0_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_1_0_0_passive>;
+					cooling-device = <&cpu4_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-1-1-0-thermal {
+			trips {
+				cpu_1_1_0_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_1_1_0_passive>;
+					cooling-device = <&cpu5_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-1-2-0-thermal {
+			trips {
+				cpu_1_2_0_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_1_2_0_passive>;
+					cooling-device = <&cpu6_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-1-3-0-thermal {
+			trips {
+				cpu_1_3_0_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_1_3_0_passive>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-1-0-1-thermal {
+			trips {
+				cpu_1_0_1_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_1_0_1_passive>;
+					cooling-device = <&cpu4_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-1-1-1-thermal {
+			trips {
+				cpu_1_1_1_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_1_1_1_passive>;
+					cooling-device = <&cpu5_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-1-2-1-thermal {
+			trips {
+				cpu_1_2_1_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_1_2_1_passive>;
+					cooling-device = <&cpu6_idle 100 100>;
+				};
+			};
+		};
+
+		cpu-1-3-1-thermal {
+			trips {
+				cpu_1_3_1_passive: trip-point1 {
+					temperature = <116000>;
+				};
+			};
+
+			cooling-maps {
+				map0 {
+					trip = <&cpu_1_3_1_passive>;
+					cooling-device = <&cpu7_idle 100 100>;
+				};
+			};
+		};
+	};
+};