diff mbox

[1/2] clk: scpi: RfC - Allow to ignore invalid SCPI DVFS clock rates

Message ID d7b8f9db-7512-646c-bb96-ef74e7266661@gmail.com (mailing list archive)
State RFC
Headers show

Commit Message

Heiner Kallweit Feb. 4, 2017, 9:03 p.m. UTC
Introduce an optional property "clock-max-frequency" for SCPI DVFS
clocks. All frequencies for the respective clock exceeding this
threshold will be ignored.

This is useful on systems where the firmware offers too optimistic
clock rates causing instabilities and crashes.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 Documentation/devicetree/bindings/arm/arm,scpi.txt |  7 +++++++
 drivers/clk/clk-scpi.c                             | 15 ++++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

Comments

Sudeep Holla Feb. 8, 2017, 11:23 a.m. UTC | #1
On 04/02/17 21:03, Heiner Kallweit wrote:
> Introduce an optional property "clock-max-frequency" for SCPI DVFS
> clocks. All frequencies for the respective clock exceeding this
> threshold will be ignored.
> 
> This is useful on systems where the firmware offers too optimistic
> clock rates causing instabilities and crashes.
> 

It clearly means the firmware/hardware(IOW platform) was not tested
correctly before firmware advertised the OPPs. It needs to fixed in the
firmware. The approach should be advertise the known minimal set working
rather than the set for which hardware was designed.

That's the whole reason while these are kept in firmware so the OS need
not worry about such details.

So NACK, go fix the firmware or disable it completely and be happy with
the boot frequency.
Kevin Hilman Feb. 8, 2017, 7:45 p.m. UTC | #2
Sudeep Holla <sudeep.holla@arm.com> writes:

> On 04/02/17 21:03, Heiner Kallweit wrote:
>> Introduce an optional property "clock-max-frequency" for SCPI DVFS
>> clocks. All frequencies for the respective clock exceeding this
>> threshold will be ignored.
>> 
>> This is useful on systems where the firmware offers too optimistic
>> clock rates causing instabilities and crashes.
>> 
>
> It clearly means the firmware/hardware(IOW platform) was not tested
> correctly before firmware advertised the OPPs. It needs to fixed in the
> firmware. The approach should be advertise the known minimal set working
> rather than the set for which hardware was designed.
>
> That's the whole reason while these are kept in firmware so the OS need
> not worry about such details.
>
> So NACK, go fix the firmware 

Sorry, but "go fix the firmware" is not an option for most users of
these boards.

Even if the source were provided for the firwmare (it's not), it usually
needs signing by the vendor, and we know how likely that will be
provided by the vendors.

Firmware will will always be buggy and/or broken and we will be stuck
with it.  IMO, not allowing the kernel to work around broken
firmwaretakes a very idealistic view of firmware, and is not based on
historical reality with ARM SoC vendors.

> or disable it completely and be happy with the boot frequency.

That's an awful solution also, when we know that most of the frequencies
work just fine.

Kevin
Sudeep Holla Feb. 9, 2017, 10:52 a.m. UTC | #3
On 08/02/17 19:45, Kevin Hilman wrote:
> Sudeep Holla <sudeep.holla@arm.com> writes:
> 
>> On 04/02/17 21:03, Heiner Kallweit wrote:
>>> Introduce an optional property "clock-max-frequency" for SCPI DVFS
>>> clocks. All frequencies for the respective clock exceeding this
>>> threshold will be ignored.
>>>
>>> This is useful on systems where the firmware offers too optimistic
>>> clock rates causing instabilities and crashes.
>>>
>>
>> It clearly means the firmware/hardware(IOW platform) was not tested
>> correctly before firmware advertised the OPPs. It needs to fixed in the
>> firmware. The approach should be advertise the known minimal set working
>> rather than the set for which hardware was designed.
>>
>> That's the whole reason while these are kept in firmware so the OS need
>> not worry about such details.
>>
>> So NACK, go fix the firmware 
> 
> Sorry, but "go fix the firmware" is not an option for most users of
> these boards.
> 

I knew this was coming :). I just wanted to shout at vendors who are not
validating their firmware. Sometimes I feel it's better have platform
driver and drive everything from Linux and don't use buggy firmware at
all instead of adding tons of workaround. It defeats the whole purpose
of having the firmware.

> Even if the source were provided for the firmware (it's not), it
> usually needs signing by the vendor, and we know how likely that will
> be provided by the vendors.
> 

I agree, but the main reason for raising my voice is to communicate the
message to those vendors. Blindly accepting whatever they give is also
not a good practice.

> Firmware will will always be buggy and/or broken and we will be
> stuck with it.  IMO, not allowing the kernel to work around broken 
> firmware takes a very idealistic view of firmware, and is not based
> on historical reality with ARM SoC vendors.
> 

Yes but things should start to improve somewhere sometime. It really
stupid to keep working around everything vendor screws up giving them no
incentive to get things right the next time. We need to accept the fact
that firmware will do more as the complex systems evolve and that won't
go away. It's time we make them aware of that and ensure they take
necessary steps to fix it for future platforms.

>> or disable it completely and be happy with the boot frequency.
> 
> That's an awful solution also, when we know that most of the
> frequencies work just fine.
> 

OK, I understand the concern. I will look into the patches.
Michał Zegan Feb. 9, 2017, 12:19 p.m. UTC | #4
W dniu 09.02.2017 o 11:52, Sudeep Holla pisze:
> 
> 
> On 08/02/17 19:45, Kevin Hilman wrote:
>> Sudeep Holla <sudeep.holla@arm.com> writes:
>>
>>> On 04/02/17 21:03, Heiner Kallweit wrote:
>>>> Introduce an optional property "clock-max-frequency" for SCPI DVFS
>>>> clocks. All frequencies for the respective clock exceeding this
>>>> threshold will be ignored.
>>>>
>>>> This is useful on systems where the firmware offers too optimistic
>>>> clock rates causing instabilities and crashes.
>>>>
>>>
>>> It clearly means the firmware/hardware(IOW platform) was not tested
>>> correctly before firmware advertised the OPPs. It needs to fixed in the
>>> firmware. The approach should be advertise the known minimal set working
>>> rather than the set for which hardware was designed.
>>>
>>> That's the whole reason while these are kept in firmware so the OS need
>>> not worry about such details.
>>>
>>> So NACK, go fix the firmware 
>>
>> Sorry, but "go fix the firmware" is not an option for most users of
>> these boards.
>>
> 
> I knew this was coming :). I just wanted to shout at vendors who are not
> validating their firmware. Sometimes I feel it's better have platform
> driver and drive everything from Linux and don't use buggy firmware at
> all instead of adding tons of workaround. It defeats the whole purpose
> of having the firmware.
> 
Well, at least in the case of odroid c2 from hardkernel, I believe those
unstable frequencies are supported intentionally. The wiki of the board
lists them explicitly, and says when they are stable.
So if that was intentional, then a frequency capping set by default
would be needed, it can be lifted by a specific user if he wants. Unless
hk should disable that "feature" instead.
>> Even if the source were provided for the firmware (it's not), it
>> usually needs signing by the vendor, and we know how likely that will
>> be provided by the vendors.
>>
> 
> I agree, but the main reason for raising my voice is to communicate the
> message to those vendors. Blindly accepting whatever they give is also
> not a good practice.
> 
>> Firmware will will always be buggy and/or broken and we will be
>> stuck with it.  IMO, not allowing the kernel to work around broken 
>> firmware takes a very idealistic view of firmware, and is not based
>> on historical reality with ARM SoC vendors.
>>
> 
> Yes but things should start to improve somewhere sometime. It really
> stupid to keep working around everything vendor screws up giving them no
> incentive to get things right the next time. We need to accept the fact
> that firmware will do more as the complex systems evolve and that won't
> go away. It's time we make them aware of that and ensure they take
> necessary steps to fix it for future platforms.
> 
>>> or disable it completely and be happy with the boot frequency.
>>
>> That's an awful solution also, when we know that most of the
>> frequencies work just fine.
>>
> 
> OK, I understand the concern. I will look into the patches.
>
Sudeep Holla Feb. 9, 2017, 12:25 p.m. UTC | #5
On 09/02/17 12:19, Michał Zegan wrote:
> 
> 
> W dniu 09.02.2017 o 11:52, Sudeep Holla pisze:
>>
>>
>> On 08/02/17 19:45, Kevin Hilman wrote:
>>> Sudeep Holla <sudeep.holla@arm.com> writes:
>>>
>>>> On 04/02/17 21:03, Heiner Kallweit wrote:
>>>>> Introduce an optional property "clock-max-frequency" for SCPI DVFS
>>>>> clocks. All frequencies for the respective clock exceeding this
>>>>> threshold will be ignored.
>>>>>
>>>>> This is useful on systems where the firmware offers too optimistic
>>>>> clock rates causing instabilities and crashes.
>>>>>
>>>>
>>>> It clearly means the firmware/hardware(IOW platform) was not tested
>>>> correctly before firmware advertised the OPPs. It needs to fixed in the
>>>> firmware. The approach should be advertise the known minimal set working
>>>> rather than the set for which hardware was designed.
>>>>
>>>> That's the whole reason while these are kept in firmware so the OS need
>>>> not worry about such details.
>>>>
>>>> So NACK, go fix the firmware 
>>>
>>> Sorry, but "go fix the firmware" is not an option for most users of
>>> these boards.
>>>
>>
>> I knew this was coming :). I just wanted to shout at vendors who are not
>> validating their firmware. Sometimes I feel it's better have platform
>> driver and drive everything from Linux and don't use buggy firmware at
>> all instead of adding tons of workaround. It defeats the whole purpose
>> of having the firmware.
>>
> Well, at least in the case of odroid c2 from hardkernel, I believe those
> unstable frequencies are supported intentionally. The wiki of the board
> lists them explicitly, and says when they are stable.
> So if that was intentional, then a frequency capping set by default
> would be needed, it can be lifted by a specific user if he wants. Unless
> hk should disable that "feature" instead.

If those frequency are known to cause any stability issues, they should
be considered as *not supported*. If it's just thermal constraints then
yes the user/developer should be allowed to use them as they can take
care of thermal management so that the platform remains stable for usage.
Michał Zegan Feb. 9, 2017, 12:51 p.m. UTC | #6
W dniu 09.02.2017 o 13:25, Sudeep Holla pisze:
> 
> 
> On 09/02/17 12:19, Michał Zegan wrote:
>>
>>
>> W dniu 09.02.2017 o 11:52, Sudeep Holla pisze:
>>>
>>>
>>> On 08/02/17 19:45, Kevin Hilman wrote:
>>>> Sudeep Holla <sudeep.holla@arm.com> writes:
>>>>
>>>>> On 04/02/17 21:03, Heiner Kallweit wrote:
>>>>>> Introduce an optional property "clock-max-frequency" for SCPI DVFS
>>>>>> clocks. All frequencies for the respective clock exceeding this
>>>>>> threshold will be ignored.
>>>>>>
>>>>>> This is useful on systems where the firmware offers too optimistic
>>>>>> clock rates causing instabilities and crashes.
>>>>>>
>>>>>
>>>>> It clearly means the firmware/hardware(IOW platform) was not tested
>>>>> correctly before firmware advertised the OPPs. It needs to fixed in the
>>>>> firmware. The approach should be advertise the known minimal set working
>>>>> rather than the set for which hardware was designed.
>>>>>
>>>>> That's the whole reason while these are kept in firmware so the OS need
>>>>> not worry about such details.
>>>>>
>>>>> So NACK, go fix the firmware 
>>>>
>>>> Sorry, but "go fix the firmware" is not an option for most users of
>>>> these boards.
>>>>
>>>
>>> I knew this was coming :). I just wanted to shout at vendors who are not
>>> validating their firmware. Sometimes I feel it's better have platform
>>> driver and drive everything from Linux and don't use buggy firmware at
>>> all instead of adding tons of workaround. It defeats the whole purpose
>>> of having the firmware.
>>>
>> Well, at least in the case of odroid c2 from hardkernel, I believe those
>> unstable frequencies are supported intentionally. The wiki of the board
>> lists them explicitly, and says when they are stable.
>> So if that was intentional, then a frequency capping set by default
>> would be needed, it can be lifted by a specific user if he wants. Unless
>> hk should disable that "feature" instead.
> 
> If those frequency are known to cause any stability issues, they should
> be considered as *not supported*. If it's just thermal constraints then
> yes the user/developer should be allowed to use them as they can take
> care of thermal management so that the platform remains stable for usage.
> 
The unstability does not occur when you do not use all cores at once, so hmm
Neil Armstrong Feb. 9, 2017, 1:31 p.m. UTC | #7
On 02/09/2017 01:51 PM, Michał Zegan wrote:
> 
> 
> W dniu 09.02.2017 o 13:25, Sudeep Holla pisze:
>>
>>
>> On 09/02/17 12:19, Michał Zegan wrote:
>>>
>>>
>>> W dniu 09.02.2017 o 11:52, Sudeep Holla pisze:
>>>>
>>>>
>>>> On 08/02/17 19:45, Kevin Hilman wrote:
>>>>> Sudeep Holla <sudeep.holla@arm.com> writes:
>>>>>
>>>>>> On 04/02/17 21:03, Heiner Kallweit wrote:
>>>>>>> Introduce an optional property "clock-max-frequency" for SCPI DVFS
>>>>>>> clocks. All frequencies for the respective clock exceeding this
>>>>>>> threshold will be ignored.
>>>>>>>
>>>>>>> This is useful on systems where the firmware offers too optimistic
>>>>>>> clock rates causing instabilities and crashes.
>>>>>>>
>>>>>>
>>>>>> It clearly means the firmware/hardware(IOW platform) was not tested
>>>>>> correctly before firmware advertised the OPPs. It needs to fixed in the
>>>>>> firmware. The approach should be advertise the known minimal set working
>>>>>> rather than the set for which hardware was designed.
>>>>>>
>>>>>> That's the whole reason while these are kept in firmware so the OS need
>>>>>> not worry about such details.
>>>>>>
>>>>>> So NACK, go fix the firmware 
>>>>>
>>>>> Sorry, but "go fix the firmware" is not an option for most users of
>>>>> these boards.
>>>>>
>>>>
>>>> I knew this was coming :). I just wanted to shout at vendors who are not
>>>> validating their firmware. Sometimes I feel it's better have platform
>>>> driver and drive everything from Linux and don't use buggy firmware at
>>>> all instead of adding tons of workaround. It defeats the whole purpose
>>>> of having the firmware.
>>>>
>>> Well, at least in the case of odroid c2 from hardkernel, I believe those
>>> unstable frequencies are supported intentionally. The wiki of the board
>>> lists them explicitly, and says when they are stable.
>>> So if that was intentional, then a frequency capping set by default
>>> would be needed, it can be lifted by a specific user if he wants. Unless
>>> hk should disable that "feature" instead.
>>
>> If those frequency are known to cause any stability issues, they should
>> be considered as *not supported*. If it's just thermal constraints then
>> yes the user/developer should be allowed to use them as they can take
>> care of thermal management so that the platform remains stable for usage.
>>
> The unstability does not occur when you do not use all cores at once, so hmm

Hi Sudeep,

On Hardkernel ODroid C2 based on Amlogic S905, due to a bad hardware design the platform
crashes when using >1536MHz frequencies while activating all cores.

But Hardkernel kept the frequencies to be able to use them when having a single core enabled
using their off-tree highly modified linux kernel.

So we need to handle these kind of cases where the vendor leaves them for strong reasons
but mainlinux linux is not (yet) a strong reason to change the firmware.

I'm sure we can find a smart solution to handle these use cases without adding too much
ugly code.

Neil
Sudeep Holla Feb. 9, 2017, 2:29 p.m. UTC | #8
On 09/02/17 13:31, Neil Armstrong wrote:
> On 02/09/2017 01:51 PM, Michał Zegan wrote:
>>
>>
>> W dniu 09.02.2017 o 13:25, Sudeep Holla pisze:
>>>
>>>
>>> On 09/02/17 12:19, Michał Zegan wrote:
>>>>
>>>>
>>>> W dniu 09.02.2017 o 11:52, Sudeep Holla pisze:
>>>>>
>>>>>
>>>>> On 08/02/17 19:45, Kevin Hilman wrote:
>>>>>> Sudeep Holla <sudeep.holla@arm.com> writes:
>>>>>>
>>>>>>> On 04/02/17 21:03, Heiner Kallweit wrote:
>>>>>>>> Introduce an optional property "clock-max-frequency" for SCPI DVFS
>>>>>>>> clocks. All frequencies for the respective clock exceeding this
>>>>>>>> threshold will be ignored.
>>>>>>>>
>>>>>>>> This is useful on systems where the firmware offers too optimistic
>>>>>>>> clock rates causing instabilities and crashes.
>>>>>>>>
>>>>>>>
>>>>>>> It clearly means the firmware/hardware(IOW platform) was not tested
>>>>>>> correctly before firmware advertised the OPPs. It needs to fixed in the
>>>>>>> firmware. The approach should be advertise the known minimal set working
>>>>>>> rather than the set for which hardware was designed.
>>>>>>>
>>>>>>> That's the whole reason while these are kept in firmware so the OS need
>>>>>>> not worry about such details.
>>>>>>>
>>>>>>> So NACK, go fix the firmware 
>>>>>>
>>>>>> Sorry, but "go fix the firmware" is not an option for most users of
>>>>>> these boards.
>>>>>>
>>>>>
>>>>> I knew this was coming :). I just wanted to shout at vendors who are not
>>>>> validating their firmware. Sometimes I feel it's better have platform
>>>>> driver and drive everything from Linux and don't use buggy firmware at
>>>>> all instead of adding tons of workaround. It defeats the whole purpose
>>>>> of having the firmware.
>>>>>
>>>> Well, at least in the case of odroid c2 from hardkernel, I believe those
>>>> unstable frequencies are supported intentionally. The wiki of the board
>>>> lists them explicitly, and says when they are stable.
>>>> So if that was intentional, then a frequency capping set by default
>>>> would be needed, it can be lifted by a specific user if he wants. Unless
>>>> hk should disable that "feature" instead.
>>>
>>> If those frequency are known to cause any stability issues, they should
>>> be considered as *not supported*. If it's just thermal constraints then
>>> yes the user/developer should be allowed to use them as they can take
>>> care of thermal management so that the platform remains stable for usage.
>>>
>> The unstability does not occur when you do not use all cores at once, so hmm
> 
> Hi Sudeep,
> 
> On Hardkernel ODroid C2 based on Amlogic S905, due to a bad hardware design the platform
> crashes when using >1536MHz frequencies while activating all cores.
> 
> But Hardkernel kept the frequencies to be able to use them when having a single core enabled
> using their off-tree highly modified linux kernel.
> 

Some kind of turbo mode which needs complete software attention to deal
with the hardware+firmware(platform) mess. Generally such frequencies
are exposed as just one OPP above the nominal range. If selected, the
firmware can choose is the maximum freq an individual processor may
reach, assuming ideal conditions, not be sustainable for long durations,
and may only be achievable if other processors are in a specific
idle/power state.

The point is it needs to be well hidden in the platform. Exposing such
specific and leaving it to the user to deal with seems stupid. They may
just fry the chip trying to experiment something :)

> So we need to handle these kind of cases where the vendor leaves them for strong reasons
> but mainlinux linux is not (yet) a strong reason to change the firmware.
> 

OK

> I'm sure we can find a smart solution to handle these use cases without adding too much
> ugly code.
> 

Agreed, I was thinking if adding some thermal driver and capping
frequency is a feasible option. Adding max-frequency to DT means one
needs to change the DT to run that custom Hardkernel ? That's doesn't
sound good, but I am sure many argue otherwise ;)
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/arm/arm,scpi.txt b/Documentation/devicetree/bindings/arm/arm,scpi.txt
index 40183197..7582dd2e 100644
--- a/Documentation/devicetree/bindings/arm/arm,scpi.txt
+++ b/Documentation/devicetree/bindings/arm/arm,scpi.txt
@@ -56,6 +56,13 @@  Other required properties for all clocks(all from common clock binding):
 	node. It can be non linear and hence provide the mapping of identifiers
 	into the clock-output-names array.
 
+For DVFS SCPI clocks a further optional property is supported to deal with
+SCPI DVFS clocks where some firmware-provided clocks rates are too optimistic
+and cause stability issues.
+
+- clock-max-frequency : Ignore all firmware-provided frequencies above this
+			threshold. Value 0 means no threshold.
+
 SRAM and Shared Memory for SCPI
 -------------------------------
 
diff --git a/drivers/clk/clk-scpi.c b/drivers/clk/clk-scpi.c
index 96d37175..fc99bb6e 100644
--- a/drivers/clk/clk-scpi.c
+++ b/drivers/clk/clk-scpi.c
@@ -27,6 +27,7 @@ 
 
 struct scpi_clk {
 	u32 id;
+	u32 max_freq;
 	struct clk_hw hw;
 	struct scpi_dvfs_info *info;
 	struct scpi_ops *scpi_ops;
@@ -36,6 +37,11 @@  struct scpi_clk {
 
 static struct platform_device *cpufreq_dev;
 
+static inline bool invalid_freq(struct scpi_clk *clk, u32 freq)
+{
+	return clk->max_freq && freq > clk->max_freq;
+}
+
 static unsigned long scpi_clk_recalc_rate(struct clk_hw *hw,
 					  unsigned long parent_rate)
 {
@@ -79,6 +85,8 @@  static int __scpi_dvfs_round_rate(struct scpi_clk *clk, unsigned long rate)
 
 	for (idx = 0; idx < clk->info->count; idx++, opp++) {
 		ftmp = opp->freq;
+		if (invalid_freq(clk, ftmp))
+			continue;
 		if (ftmp >= (u32)rate) {
 			if (ftmp <= fmax)
 				fmax = ftmp;
@@ -118,7 +126,7 @@  static int __scpi_find_dvfs_index(struct scpi_clk *clk, unsigned long rate)
 	const struct scpi_opp *opp = clk->info->opps;
 
 	for (idx = 0; idx < max_opp; idx++, opp++)
-		if (opp->freq == rate)
+		if (opp->freq == rate && !invalid_freq(clk, opp->freq))
 			return idx;
 	return -EINVAL;
 }
@@ -165,6 +173,7 @@  scpi_clk_ops_init(struct device *dev, const struct of_device_id *match,
 		sclk->info = sclk->scpi_ops->dvfs_get_info(sclk->id);
 		if (IS_ERR(sclk->info))
 			return PTR_ERR(sclk->info);
+		max = sclk->max_freq;
 	} else if (init.ops == &scpi_clk_ops) {
 		if (sclk->scpi_ops->clk_get_range(sclk->id, &min, &max) || !max)
 			return -EINVAL;
@@ -244,6 +253,10 @@  static int scpi_clk_add(struct device *dev, struct device_node *np,
 
 		sclk->id = val;
 
+		if (match->data == &scpi_dvfs_ops)
+			of_property_read_u32_index(np, "clock-max-frequency",
+						   idx, &sclk->max_freq);
+
 		err = scpi_clk_ops_init(dev, match, sclk, name);
 		if (err)
 			dev_err(dev, "failed to register clock '%s'\n", name);