Message ID | 20250113-mt8192-lvts-filtered-suspend-fix-v2-2-07a25200c7c6@collabora.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Daniel Lezcano |
Headers | show |
Series | thermal/drivers/mediatek/lvts: Fixes for suspend and IRQ storm, and cleanups | expand |
On 13/01/2025 14:27, Nícolas F. R. A. Prado wrote: > The Stage 3 thermal threshold is currently configured during > the controller initialization to 105 Celsius. From the kernel > perspective, this configuration is harmful because: > * The stage 3 interrupt that gets triggered when the threshold is > crossed is not handled in any way by the IRQ handler, it just gets > cleared. Besides, the temperature used for stage 3 comes from the > sensors, and the critical thermal trip points described in the > Devicetree will already cause a shutdown when crossed (at a lower > temperature, of 100 Celsius, for all SoCs currently using this > driver). > * The only effect of crossing the stage 3 threshold that has been > observed is that it causes the machine to no longer be able to enter > suspend. Even if that was a result of a momentary glitch in the > temperature reading of a sensor (as has been observed on the > MT8192-based Chromebooks). > > For those reasons, disable the Stage 3 thermal threshold configuration. Does this stage 3 not designed to reset the system ? So the interrupt line should be attached to the reset line ? (just asking)
On Tue, Jan 14, 2025 at 12:54:43PM +0100, Daniel Lezcano wrote: > On 13/01/2025 14:27, Nícolas F. R. A. Prado wrote: > > The Stage 3 thermal threshold is currently configured during > > the controller initialization to 105 Celsius. From the kernel > > perspective, this configuration is harmful because: > > * The stage 3 interrupt that gets triggered when the threshold is > > crossed is not handled in any way by the IRQ handler, it just gets > > cleared. Besides, the temperature used for stage 3 comes from the > > sensors, and the critical thermal trip points described in the > > Devicetree will already cause a shutdown when crossed (at a lower > > temperature, of 100 Celsius, for all SoCs currently using this > > driver). > > * The only effect of crossing the stage 3 threshold that has been > > observed is that it causes the machine to no longer be able to enter > > suspend. Even if that was a result of a momentary glitch in the > > temperature reading of a sensor (as has been observed on the > > MT8192-based Chromebooks). > > > > For those reasons, disable the Stage 3 thermal threshold configuration. > > Does this stage 3 not designed to reset the system ? So the interrupt line > should be attached to the reset line ? (just asking) Yes, my guess is that the intention of stage 3 is to cause a system reset, however it clearly does not cause a system reset, so it is not directly connected to the reset line in any way. Instead it is up to the kernel to receive the interrupt and deal with it. But then, since there are lower thermal thresholds that already cause a system shutdown, it's useless to keep this around - it only causes suspend/resume misbehaviors in case there are spurious readings. Thanks, Nícolas
diff --git a/drivers/thermal/mediatek/lvts_thermal.c b/drivers/thermal/mediatek/lvts_thermal.c index a1a438ebad33c1fff8ca9781e12ef9e278eef785..0aaa44b734ca43e6abfd97b2ca4ce34dc6f15826 100644 --- a/drivers/thermal/mediatek/lvts_thermal.c +++ b/drivers/thermal/mediatek/lvts_thermal.c @@ -65,7 +65,7 @@ #define LVTS_HW_FILTER 0x0 #define LVTS_TSSEL_CONF 0x13121110 #define LVTS_CALSCALE_CONF 0x300 -#define LVTS_MONINT_CONF 0x8300318C +#define LVTS_MONINT_CONF 0x0300318C #define LVTS_MONINT_OFFSET_SENSOR0 0xC #define LVTS_MONINT_OFFSET_SENSOR1 0x180 @@ -91,8 +91,6 @@ #define LVTS_MSR_READ_TIMEOUT_US 400 #define LVTS_MSR_READ_WAIT_US (LVTS_MSR_READ_TIMEOUT_US / 2) -#define LVTS_HW_TSHUT_TEMP 105000 - #define LVTS_MINIMUM_THRESHOLD 20000 static int golden_temp = LVTS_GOLDEN_TEMP_DEFAULT; @@ -145,7 +143,6 @@ struct lvts_ctrl { struct lvts_sensor sensors[LVTS_SENSOR_MAX]; const struct lvts_data *lvts_data; u32 calibration[LVTS_SENSOR_MAX]; - u32 hw_tshut_raw_temp; u8 valid_sensor_mask; int mode; void __iomem *base; @@ -837,14 +834,6 @@ static int lvts_ctrl_init(struct device *dev, struct lvts_domain *lvts_td, */ lvts_ctrl[i].mode = lvts_data->lvts_ctrl[i].mode; - /* - * The temperature to raw temperature must be done - * after initializing the calibration. - */ - lvts_ctrl[i].hw_tshut_raw_temp = - lvts_temp_to_raw(LVTS_HW_TSHUT_TEMP, - lvts_data->temp_factor); - lvts_ctrl[i].low_thresh = INT_MIN; lvts_ctrl[i].high_thresh = INT_MIN; } @@ -919,7 +908,6 @@ static int lvts_irq_init(struct lvts_ctrl *lvts_ctrl) * 10 : Selected sensor with bits 19-18 * 11 : Reserved */ - writel(BIT(16), LVTS_PROTCTL(lvts_ctrl->base)); /* * LVTS_PROTTA : Stage 1 temperature threshold @@ -932,8 +920,8 @@ static int lvts_irq_init(struct lvts_ctrl *lvts_ctrl) * * writel(0x0, LVTS_PROTTA(lvts_ctrl->base)); * writel(0x0, LVTS_PROTTB(lvts_ctrl->base)); + * writel(0x0, LVTS_PROTTC(lvts_ctrl->base)); */ - writel(lvts_ctrl->hw_tshut_raw_temp, LVTS_PROTTC(lvts_ctrl->base)); /* * LVTS_MONINT : Interrupt configuration register