Message ID | 20240704-topic-sm8x50-upstream-fix-battmgr-temp-tz-warn-v1-1-9d66d6f6efde@linaro.org |
---|---|
State | Not Applicable |
Series | power: supply: core: return -EAGAIN on uninitialized read temp |
On Thu, Jul 04, 2024 at 10:52:08AM +0200, Neil Armstrong wrote:
> If the thermal core tries to update the temperature from an
> uninitialized power supply, it will spawn the following warning:
> thermal thermal_zoneXX: failed to read out thermal zone (-19)
>
> But reading from an uninitialized power supply should not be
> considered as a fatal error, but the thermal core expects
> the -EAGAIN error to be returned in this particular case.
>
> So convert -ENODEV as -EAGAIN to express the fact that reading
> temperature from an uninitialized power supply shouldn't be
> a fatal error, but should indicate to the thermal zone it should
> retry later.
>
> It notably removes such messages on Qualcomm platforms using the
> qcom_battmgr driver spawning warnings until the aDSP firmware
> gets up and the battery manager reports valid data.
>
> Link: https://lore.kernel.org/all/2ed4c630-204a-4f80-a37f-f2ca838eb455@linaro.org/
> Fixes: 5bc28b93a36e ("power_supply: power_supply_read_temp only if use_cnt > 0")
> Fixes: 3be330bf8860 ("power_supply: Register battery as a thermal zone")
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/power/supply/power_supply_core.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him
a patch that has triggered this response. He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created. Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
  older released kernel, yet you do not have a cc: stable line in the
  signed-off-by area at all, which means that the patch will not be
  applied to any older kernel releases. To properly fix this, please
  follow the documented rules in the
  Documentation/process/stable-kernel-rules.rst file for how to resolve
  this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot
On 04/07/2024 10:52, Neil Armstrong wrote:
> If the thermal core tries to update the temperature from an
> uninitialized power supply, it will spawn the following warning:
> thermal thermal_zoneXX: failed to read out thermal zone (-19)
>
> But reading from an uninitialized power supply should not be
> considered as a fatal error, but the thermal core expects
> the -EAGAIN error to be returned in this particular case.
>
> So convert -ENODEV as -EAGAIN to express the fact that reading
> temperature from an uninitialized power supply shouldn't be
> a fatal error, but should indicate to the thermal zone it should
> retry later.
>
> It notably removes such messages on Qualcomm platforms using the
> qcom_battmgr driver spawning warnings until the aDSP firmware
> gets up and the battery manager reports valid data.

Is it possible to have the aDSP firmware ready first?

> Link: https://lore.kernel.org/all/2ed4c630-204a-4f80-a37f-f2ca838eb455@linaro.org/
> Fixes: 5bc28b93a36e ("power_supply: power_supply_read_temp only if use_cnt > 0")
> Fixes: 3be330bf8860 ("power_supply: Register battery as a thermal zone")
> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
> ---
>  drivers/power/supply/power_supply_core.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
> index 8f6025acd10a..b38bff4dbfc7 100644
> --- a/drivers/power/supply/power_supply_core.c
> +++ b/drivers/power/supply/power_supply_core.c
> @@ -1287,8 +1287,13 @@ static int power_supply_read_temp(struct thermal_zone_device *tzd,
>  	WARN_ON(tzd == NULL);
>  	psy = thermal_zone_device_priv(tzd);
>  	ret = power_supply_get_property(psy, POWER_SUPPLY_PROP_TEMP, &val);
> +	/*
> +	 * The thermal core expects -EAGAIN as non-fatal error,
> +	 * convert -ENODEV as -EAGAIN since -ENODEV is returned
> +	 * when a power supply device isn't initialized
> +	 */
>  	if (ret)
> -		return ret;
> +		return ret == -ENODEV ? -EAGAIN : ret;
>
>  	/* Convert tenths of degree Celsius to milli degree Celsius. */
>  	*temp = val.intval * 100;
>
> ---
> base-commit: 82e4255305c554b0bb18b7ccf2db86041b4c8b6e
> change-id: 20240704-topic-sm8x50-upstream-fix-battmgr-temp-tz-warn-077166861efb
>
> Best regards,
On 04/07/2024 18:41, Daniel Lezcano wrote:
> On 04/07/2024 10:52, Neil Armstrong wrote:
>> If the thermal core tries to update the temperature from an
>> uninitialized power supply, it will spawn the following warning:
>> thermal thermal_zoneXX: failed to read out thermal zone (-19)
>>
>> But reading from an uninitialized power supply should not be
>> considered as a fatal error, but the thermal core expects
>> the -EAGAIN error to be returned in this particular case.
>>
>> So convert -ENODEV as -EAGAIN to express the fact that reading
>> temperature from an uninitialized power supply shouldn't be
>> a fatal error, but should indicate to the thermal zone it should
>> retry later.
>>
>> It notably removes such messages on Qualcomm platforms using the
>> qcom_battmgr driver spawning warnings until the aDSP firmware
>> gets up and the battery manager reports valid data.
>
> Is it possible to have the aDSP firmware ready first?

I don't think so. ADSP firmware is a file, so as every firmware it can
be loaded from rootfs, not initramfs (unlike this driver), or even missing.

Best regards,
Krzysztof
On 05/07/2024 07:56, Krzysztof Kozlowski wrote:
> On 04/07/2024 18:41, Daniel Lezcano wrote:
>> On 04/07/2024 10:52, Neil Armstrong wrote:
>>> If the thermal core tries to update the temperature from an
>>> uninitialized power supply, it will spawn the following warning:
>>> thermal thermal_zoneXX: failed to read out thermal zone (-19)
>>>
>>> But reading from an uninitialized power supply should not be
>>> considered as a fatal error, but the thermal core expects
>>> the -EAGAIN error to be returned in this particular case.
>>>
>>> So convert -ENODEV as -EAGAIN to express the fact that reading
>>> temperature from an uninitialized power supply shouldn't be
>>> a fatal error, but should indicate to the thermal zone it should
>>> retry later.
>>>
>>> It notably removes such messages on Qualcomm platforms using the
>>> qcom_battmgr driver spawning warnings until the aDSP firmware
>>> gets up and the battery manager reports valid data.
>>
>> Is it possible to have the aDSP firmware ready first?
>
> I don't think so. ADSP firmware is a file, so as every firmware it can
> be loaded from rootfs, not initramfs (unlike this driver), or even missing.

Ok, said differently, can't we initialize the thermal zone after the
firmware is loaded?
On 05/07/2024 10:08, Daniel Lezcano wrote:
> On 05/07/2024 07:56, Krzysztof Kozlowski wrote:
>> On 04/07/2024 18:41, Daniel Lezcano wrote:
>>> On 04/07/2024 10:52, Neil Armstrong wrote:
>>>> If the thermal core tries to update the temperature from an
>>>> uninitialized power supply, it will spawn the following warning:
>>>> thermal thermal_zoneXX: failed to read out thermal zone (-19)
>>>>
>>>> But reading from an uninitialized power supply should not be
>>>> considered as a fatal error, but the thermal core expects
>>>> the -EAGAIN error to be returned in this particular case.
>>>>
>>>> So convert -ENODEV as -EAGAIN to express the fact that reading
>>>> temperature from an uninitialized power supply shouldn't be
>>>> a fatal error, but should indicate to the thermal zone it should
>>>> retry later.
>>>>
>>>> It notably removes such messages on Qualcomm platforms using the
>>>> qcom_battmgr driver spawning warnings until the aDSP firmware
>>>> gets up and the battery manager reports valid data.
>>>
>>> Is it possible to have the aDSP firmware ready first?
>>
>> I don't think so. ADSP firmware is a file, so as every firmware it can
>> be loaded from rootfs, not initramfs (unlike this driver), or even missing.
>
> Ok, said differently, can't we initialize the thermal zone after the firmware is loaded?

This is the goal, but this can't be a fix but a proper rework.

>

I think changing power_supply_core.c is not the right solution.
qcom_battmgr_bat_get_property() should return -EAGAIN instead of
-ENODEV.

Neil
On 15/07/2024 11:30, Neil Armstrong wrote:
> On 05/07/2024 10:08, Daniel Lezcano wrote:
>> On 05/07/2024 07:56, Krzysztof Kozlowski wrote:
>>> On 04/07/2024 18:41, Daniel Lezcano wrote:
>>>> On 04/07/2024 10:52, Neil Armstrong wrote:
>>>>> If the thermal core tries to update the temperature from an
>>>>> uninitialized power supply, it will spawn the following warning:
>>>>> thermal thermal_zoneXX: failed to read out thermal zone (-19)
>>>>>
>>>>> But reading from an uninitialized power supply should not be
>>>>> considered as a fatal error, but the thermal core expects
>>>>> the -EAGAIN error to be returned in this particular case.
>>>>>
>>>>> So convert -ENODEV as -EAGAIN to express the fact that reading
>>>>> temperature from an uninitialized power supply shouldn't be
>>>>> a fatal error, but should indicate to the thermal zone it should
>>>>> retry later.
>>>>>
>>>>> It notably removes such messages on Qualcomm platforms using the
>>>>> qcom_battmgr driver spawning warnings until the aDSP firmware
>>>>> gets up and the battery manager reports valid data.
>>>>
>>>> Is it possible to have the aDSP firmware ready first?
>>>
>>> I don't think so. ADSP firmware is a file, so as every firmware it can
>>> be loaded from rootfs, not initramfs (unlike this driver), or even
>>> missing.
>>
>> Ok, said differently, can't we initialize the thermal zone after the
>> firmware is loaded?
>
> This is the goal, but this can't be a fix but a proper rework.

Right, it is a design issue and we are finding this problem in several
drivers using the thermal zone. Unfortunately that forces the thermal
core to do cumbersome mechanisms because of this and obviously it is a
friction for thermal core cleanups / rework.

IOW, bad driver design => thermal core impacted.

> I think changing power_supply_core.c is not the right solution.

From my POV, it is the right solution but I agree it could take a cycle
or more to fix.

> qcom_battmgr_bat_get_property() should return -EAGAIN instead of
> -ENODEV.

Yes, we can do that in the first place and come back to solve this
firmware / async issue in a more generic way later.
diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
index 8f6025acd10a..b38bff4dbfc7 100644
--- a/drivers/power/supply/power_supply_core.c
+++ b/drivers/power/supply/power_supply_core.c
@@ -1287,8 +1287,13 @@ static int power_supply_read_temp(struct thermal_zone_device *tzd,
 	WARN_ON(tzd == NULL);
 	psy = thermal_zone_device_priv(tzd);
 	ret = power_supply_get_property(psy, POWER_SUPPLY_PROP_TEMP, &val);
+	/*
+	 * The thermal core expects -EAGAIN as non-fatal error,
+	 * convert -ENODEV as -EAGAIN since -ENODEV is returned
+	 * when a power supply device isn't initialized
+	 */
 	if (ret)
-		return ret;
+		return ret == -ENODEV ? -EAGAIN : ret;
 
 	/* Convert tenths of degree Celsius to milli degree Celsius. */
 	*temp = val.intval * 100;
If the thermal core tries to update the temperature from an
uninitialized power supply, it will spawn the following warning
(-19 is -ENODEV):

thermal thermal_zoneXX: failed to read out thermal zone (-19)

However, reading from an uninitialized power supply should not be
considered a fatal error; the thermal core expects -EAGAIN to be
returned in this particular case.

So convert -ENODEV to -EAGAIN to express the fact that reading the
temperature from an uninitialized power supply is not a fatal error,
and to indicate to the thermal zone that it should retry later.

In particular, this removes the warnings spawned on Qualcomm platforms
using the qcom_battmgr driver until the aDSP firmware is up and the
battery manager reports valid data.

Link: https://lore.kernel.org/all/2ed4c630-204a-4f80-a37f-f2ca838eb455@linaro.org/
Fixes: 5bc28b93a36e ("power_supply: power_supply_read_temp only if use_cnt > 0")
Fixes: 3be330bf8860 ("power_supply: Register battery as a thermal zone")
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
---
 drivers/power/supply/power_supply_core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

---
base-commit: 82e4255305c554b0bb18b7ccf2db86041b4c8b6e
change-id: 20240704-topic-sm8x50-upstream-fix-battmgr-temp-tz-warn-077166861efb

Best regards,