diff mbox series

[1/2] thermal/core: Hardening the self-encapsulation

Message ID 20231012102700.2858952-1-daniel.lezcano@linaro.org (mailing list archive)
State Deferred, archived
Headers show
Series [1/2] thermal/core: Hardening the self-encapsulation | expand

Commit Message

Daniel Lezcano Oct. 12, 2023, 10:26 a.m. UTC
The thermal private header has leaked all around the drivers which
interacted with the core internals. The thermal zone structure which
was part of the exported header led also to a leakage of the fields
into the different drivers, making very difficult to improve the core
code without having to change the drivers.

Now we mostly fixed how the thermal drivers were interacting with the
thermal zones (actually fixed how they should not interact). The
thermal zone structure will be moved to the private thermal core
header. This header has been removed from the different drivers and
must belong to the core code only. In order to prevent this private
header to be included again in the drivers, make explicit only the
core code can include this header by defining a THERMAL_CORE_SUBSYS
macro. The private header will contain a check against this macro.

The Tegra SoCtherm driver needs to access thermal_core.h to have the
get_thermal_instance() function definition. It is the only one
remaining driver which need to access the thermal_core.h header, so
the check will emit a warning at compilation time.

Thierry Reding is reworking the driver to get rid of this function [1]
and thus when the changes will be merged, the compilation warning will
be converted to a compilation error, closing definitively the door to
the drivers willing to play with the thermal zone device internals.

No functional changes intended.

[1] https://lore.kernel.org/all/20230414125721.1043589-1-thierry.reding@gmail.com/

Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/thermal/gov_bang_bang.c       | 1 +
 drivers/thermal/gov_fair_share.c      | 1 +
 drivers/thermal/gov_power_allocator.c | 1 +
 drivers/thermal/gov_step_wise.c       | 1 +
 drivers/thermal/gov_user_space.c      | 1 +
 drivers/thermal/thermal_acpi.c        | 1 +
 drivers/thermal/thermal_core.c        | 1 +
 drivers/thermal/thermal_core.h        | 4 ++++
 drivers/thermal/thermal_helpers.c     | 1 +
 drivers/thermal/thermal_hwmon.c       | 1 +
 drivers/thermal/thermal_netlink.c     | 1 +
 drivers/thermal/thermal_of.c          | 1 +
 drivers/thermal/thermal_sysfs.c       | 1 +
 drivers/thermal/thermal_trip.c        | 1 +
 14 files changed, 17 insertions(+)

Comments

Lukasz Luba Oct. 12, 2023, 12:01 p.m. UTC | #1
Hi Daniel,

On 10/12/23 11:26, Daniel Lezcano wrote:
> The thermal private header has leaked all around the drivers which
> interacted with the core internals. The thermal zone structure which
> was part of the exported header led also to a leakage of the fields
> into the different drivers, making very difficult to improve the core
> code without having to change the drivers.
> 
> Now we mostly fixed how the thermal drivers were interacting with the
> thermal zones (actually fixed how they should not interact). The
> thermal zone structure will be moved to the private thermal core
> header. This header has been removed from the different drivers and
> must belong to the core code only. In order to prevent this private
> header to be included again in the drivers, make explicit only the
> core code can include this header by defining a THERMAL_CORE_SUBSYS
> macro. The private header will contain a check against this macro.
> 
> The Tegra SoCtherm driver needs to access thermal_core.h to have the
> get_thermal_instance() function definition. It is the only one
> remaining driver which need to access the thermal_core.h header, so
> the check will emit a warning at compilation time.
> 
> Thierry Reding is reworking the driver to get rid of this function [1]
> and thus when the changes will be merged, the compilation warning will
> be converted to a compilation error, closing definitively the door to
> the drivers willing to play with the thermal zone device internals.

That looks like a good idea. Although, shouldn't we avoid the
compilation warnings and just first merge the fixes for drivers?

Regards,
Lukasz
Daniel Lezcano Oct. 12, 2023, 1:14 p.m. UTC | #2
Hi Lukasz,

On 12/10/2023 14:01, Lukasz Luba wrote:
> Hi Daniel,
> 
> On 10/12/23 11:26, Daniel Lezcano wrote:
>> The thermal private header has leaked all around the drivers which
>> interacted with the core internals. The thermal zone structure which
>> was part of the exported header led also to a leakage of the fields
>> into the different drivers, making very difficult to improve the core
>> code without having to change the drivers.
>>
>> Now we mostly fixed how the thermal drivers were interacting with the
>> thermal zones (actually fixed how they should not interact). The
>> thermal zone structure will be moved to the private thermal core
>> header. This header has been removed from the different drivers and
>> must belong to the core code only. In order to prevent this private
>> header to be included again in the drivers, make explicit only the
>> core code can include this header by defining a THERMAL_CORE_SUBSYS
>> macro. The private header will contain a check against this macro.
>>
>> The Tegra SoCtherm driver needs to access thermal_core.h to have the
>> get_thermal_instance() function definition. It is the only one
>> remaining driver which need to access the thermal_core.h header, so
>> the check will emit a warning at compilation time.
>>
>> Thierry Reding is reworking the driver to get rid of this function [1]
>> and thus when the changes will be merged, the compilation warning will
>> be converted to a compilation error, closing definitively the door to
>> the drivers willing to play with the thermal zone device internals.
> 
> That looks like a good idea. Although, shouldn't we avoid the
> compilation warnings and just first merge the fixes for drivers?

Yes, we should but there is the series for nvidia (pointed in the 
changelog) which need a slight refresh for the bindings AFAIR. That 
series is since March 2023 and Thierry seems busy [1]. I'm holding the 
hardening since then.

So I don't know how to make progress on this? I was assuming we can 
merge this series and let the compiler recall what has to be fixed.

[1] https://lore.kernel.org/all/ZK14edZUih1kH_sZ@orome/

and as soon as it is fixed, we convert the WARNING to ERROR :P
Thierry Reding Oct. 12, 2023, 2:30 p.m. UTC | #3
On Thu, Oct 12, 2023 at 03:14:23PM +0200, Daniel Lezcano wrote:
> 
> Hi Lukasz,
> 
> On 12/10/2023 14:01, Lukasz Luba wrote:
> > Hi Daniel,
> > 
> > On 10/12/23 11:26, Daniel Lezcano wrote:
> > > The thermal private header has leaked all around the drivers which
> > > interacted with the core internals. The thermal zone structure which
> > > was part of the exported header led also to a leakage of the fields
> > > into the different drivers, making very difficult to improve the core
> > > code without having to change the drivers.
> > > 
> > > Now we mostly fixed how the thermal drivers were interacting with the
> > > thermal zones (actually fixed how they should not interact). The
> > > thermal zone structure will be moved to the private thermal core
> > > header. This header has been removed from the different drivers and
> > > must belong to the core code only. In order to prevent this private
> > > header to be included again in the drivers, make explicit only the
> > > core code can include this header by defining a THERMAL_CORE_SUBSYS
> > > macro. The private header will contain a check against this macro.
> > > 
> > > The Tegra SoCtherm driver needs to access thermal_core.h to have the
> > > get_thermal_instance() function definition. It is the only one
> > > remaining driver which need to access the thermal_core.h header, so
> > > the check will emit a warning at compilation time.
> > > 
> > > Thierry Reding is reworking the driver to get rid of this function [1]
> > > and thus when the changes will be merged, the compilation warning will
> > > be converted to a compilation error, closing definitively the door to
> > > the drivers willing to play with the thermal zone device internals.
> > 
> > That looks like a good idea. Although, shouldn't we avoid the
> > compilation warnings and just first merge the fixes for drivers?
> 
> Yes, we should but there is the series for nvidia (pointed in the changelog)
> which need a slight refresh for the bindings AFAIR. That series is since
> March 2023 and Thierry seems busy [1]. I'm holding the hardening since then.
> 
> So I don't know how to make progress on this? I was assuming we can merge
> this series and let the compiler recall what has to be fixed.
> 
> [1] https://lore.kernel.org/all/ZK14edZUih1kH_sZ@orome/
> 
> and as soon as it is fixed, we convert the WARNING to ERROR :P

Yeah, sorry about that. This has been plagued by bad luck. Since then
the device that I need to test this on broke and it then took a while to
get a new one. I still need to find time to set this up and do some more
extensive testing to make sure I haven't regressed anything.

Thierry
Rafael J. Wysocki Oct. 12, 2023, 5:44 p.m. UTC | #4
On Thu, Oct 12, 2023 at 3:14 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
>
> Hi Lukasz,
>
> On 12/10/2023 14:01, Lukasz Luba wrote:
> > Hi Daniel,
> >
> > On 10/12/23 11:26, Daniel Lezcano wrote:
> >> The thermal private header has leaked all around the drivers which
> >> interacted with the core internals. The thermal zone structure which
> >> was part of the exported header led also to a leakage of the fields
> >> into the different drivers, making very difficult to improve the core
> >> code without having to change the drivers.
> >>
> >> Now we mostly fixed how the thermal drivers were interacting with the
> >> thermal zones (actually fixed how they should not interact). The
> >> thermal zone structure will be moved to the private thermal core
> >> header. This header has been removed from the different drivers and
> >> must belong to the core code only. In order to prevent this private
> >> header to be included again in the drivers, make explicit only the
> >> core code can include this header by defining a THERMAL_CORE_SUBSYS
> >> macro. The private header will contain a check against this macro.
> >>
> >> The Tegra SoCtherm driver needs to access thermal_core.h to have the
> >> get_thermal_instance() function definition. It is the only one
> >> remaining driver which need to access the thermal_core.h header, so
> >> the check will emit a warning at compilation time.
> >>
> >> Thierry Reding is reworking the driver to get rid of this function [1]
> >> and thus when the changes will be merged, the compilation warning will
> >> be converted to a compilation error, closing definitively the door to
> >> the drivers willing to play with the thermal zone device internals.
> >
> > That looks like a good idea. Although, shouldn't we avoid the
> > compilation warnings and just first merge the fixes for drivers?
>
> Yes, we should but there is the series for nvidia (pointed in the
> changelog) which need a slight refresh for the bindings AFAIR. That
> series is since March 2023 and Thierry seems busy [1]. I'm holding the
> hardening since then.
>
> So I don't know how to make progress on this? I was assuming we can
> merge this series and let the compiler recall what has to be fixed.
>
> [1] https://lore.kernel.org/all/ZK14edZUih1kH_sZ@orome/
>
> and as soon as it is fixed, we convert the WARNING to ERROR :P

To be honest, I'm not sure if anything needs to be done along the
lines of this patch right now or even at all.

The only concern here would be that some new drivers would include
thermal_core.h while we were waiting for the remaining existing
abusers to be fixed, but since this hasn't happened for the last 6
months, I'm not worried.

It would be good to add a notice to thermal_core.h that this file is
for internal use in the thermal core only, though.
Daniel Lezcano Oct. 12, 2023, 9:13 p.m. UTC | #5
On 12/10/2023 19:44, Rafael J. Wysocki wrote:

[ ... ]

>> Yes, we should but there is the series for nvidia (pointed in the
>> changelog) which need a slight refresh for the bindings AFAIR. That
>> series is since March 2023 and Thierry seems busy [1]. I'm holding the
>> hardening since then.
>>
>> So I don't know how to make progress on this? I was assuming we can
>> merge this series and let the compiler recall what has to be fixed.
>>
>> [1] https://lore.kernel.org/all/ZK14edZUih1kH_sZ@orome/
>>
>> and as soon as it is fixed, we convert the WARNING to ERROR :P
> 
> To be honest, I'm not sure if anything needs to be done along the
> lines of this patch right now or even at all.
> 
> The only concern here would be that some new drivers would include
> thermal_core.h while we were waiting for the remaining existing
> abusers to be fixed, but since this hasn't happened for the last 6
> months, I'm not worried.
> 
> It would be good to add a notice to thermal_core.h that this file is
> for internal use in the thermal core only, though.

So this series will give a warning for the remaining nvidia driver but 
Thierry just send the changes to get rid of the thermal_core.h (Thanks!)

AFAICT, the last user of the thermal_zone_device structure is the 
int340x driver but the patch fixing the structure internal usage is 
already in the bleeding edge (well perhaps one nit is missing for the trace)

As soon as the bindings are acked, then I can pick the patches from 
Thierry which will end up in your tree. Then you can apply the current 
series. And finally I'll send the last patch moving the thermal zone 
device structure to the thermal_core.h. And we will be done with this part.

Having a compilation warning (I would prefer a more coercive error if we 
agree on that) will help to not have to double check every time 
thermal_core.h is not pulled in the drivers when patches are submitted.
Lukasz Luba Oct. 13, 2023, 8:03 a.m. UTC | #6
On 10/12/23 14:14, Daniel Lezcano wrote:
> 
> Hi Lukasz,
> 
> On 12/10/2023 14:01, Lukasz Luba wrote:
>> Hi Daniel,
>>
>> On 10/12/23 11:26, Daniel Lezcano wrote:
>>> The thermal private header has leaked all around the drivers which
>>> interacted with the core internals. The thermal zone structure which
>>> was part of the exported header led also to a leakage of the fields
>>> into the different drivers, making very difficult to improve the core
>>> code without having to change the drivers.
>>>
>>> Now we mostly fixed how the thermal drivers were interacting with the
>>> thermal zones (actually fixed how they should not interact). The
>>> thermal zone structure will be moved to the private thermal core
>>> header. This header has been removed from the different drivers and
>>> must belong to the core code only. In order to prevent this private
>>> header to be included again in the drivers, make explicit only the
>>> core code can include this header by defining a THERMAL_CORE_SUBSYS
>>> macro. The private header will contain a check against this macro.
>>>
>>> The Tegra SoCtherm driver needs to access thermal_core.h to have the
>>> get_thermal_instance() function definition. It is the only one
>>> remaining driver which need to access the thermal_core.h header, so
>>> the check will emit a warning at compilation time.
>>>
>>> Thierry Reding is reworking the driver to get rid of this function [1]
>>> and thus when the changes will be merged, the compilation warning will
>>> be converted to a compilation error, closing definitively the door to
>>> the drivers willing to play with the thermal zone device internals.
>>
>> That looks like a good idea. Although, shouldn't we avoid the
>> compilation warnings and just first merge the fixes for drivers?
> 
> Yes, we should but there is the series for nvidia (pointed in the 
> changelog) which need a slight refresh for the bindings AFAIR. That 
> series is since March 2023 and Thierry seems busy [1]. I'm holding the 
> hardening since then.
> 
> So I don't know how to make progress on this? I was assuming we can 
> merge this series and let the compiler recall what has to be fixed.
> 
> [1] https://lore.kernel.org/all/ZK14edZUih1kH_sZ@orome/
> 
> and as soon as it is fixed, we convert the WARNING to ERROR :P
> 
> 
> 
> 

OK, so that should be good for possible distro configs IMO.
The chicken egg problem should be addressed and we can start from
this patch approach.

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Rafael J. Wysocki Oct. 13, 2023, 9:43 a.m. UTC | #7
On Thu, Oct 12, 2023 at 11:13 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 12/10/2023 19:44, Rafael J. Wysocki wrote:
>
> [ ... ]
>
> >> Yes, we should but there is the series for nvidia (pointed in the
> >> changelog) which need a slight refresh for the bindings AFAIR. That
> >> series is since March 2023 and Thierry seems busy [1]. I'm holding the
> >> hardening since then.
> >>
> >> So I don't know how to make progress on this? I was assuming we can
> >> merge this series and let the compiler recall what has to be fixed.
> >>
> >> [1] https://lore.kernel.org/all/ZK14edZUih1kH_sZ@orome/
> >>
> >> and as soon as it is fixed, we convert the WARNING to ERROR :P
> >
> > To be honest, I'm not sure if anything needs to be done along the
> > lines of this patch right now or even at all.
> >
> > The only concern here would be that some new drivers would include
> > thermal_core.h while we were waiting for the remaining existing
> > abusers to be fixed, but since this hasn't happened for the last 6
> > months, I'm not worried.
> >
> > It would be good to add a notice to thermal_core.h that this file is
> > for internal use in the thermal core only, though.
>
> So this series will give a warning for the remaining nvidia driver but
> Thierry just send the changes to get rid of the thermal_core.h (Thanks!)
>
> AFAICT, the last user of the thermal_zone_device structure is the
> int340x driver but the patch fixing the structure internal usage is
> already in the bleeding edge (well perhaps one nit is missing for the trace)
>
> As soon as the bindings are acked, then I can pick the patches from
> Thierry which will end up in your tree. Then you can apply the current
> series. And finally I'll send the last patch moving the thermal zone
> device structure to the thermal_core.h. And we will be done with this part.

OK

> Having a compilation warning (I would prefer a more coercive error if we
> agree on that) will help to not have to double check every time
> thermal_core.h is not pulled in the drivers when patches are submitted.

Well, at least it doesn't hurt to have it.
diff mbox series

Patch

diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c
index 1b121066521f..752c627075ba 100644
--- a/drivers/thermal/gov_bang_bang.c
+++ b/drivers/thermal/gov_bang_bang.c
@@ -11,6 +11,7 @@ 
 
 #include <linux/thermal.h>
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 static int thermal_zone_trip_update(struct thermal_zone_device *tz, int trip_id)
diff --git a/drivers/thermal/gov_fair_share.c b/drivers/thermal/gov_fair_share.c
index 03c2daeb6ee8..108cb5074594 100644
--- a/drivers/thermal/gov_fair_share.c
+++ b/drivers/thermal/gov_fair_share.c
@@ -13,6 +13,7 @@ 
 #include <linux/thermal.h>
 #include "thermal_trace.h"
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 /**
diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 8642f1096b91..d1c6ad92e5b4 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -14,6 +14,7 @@ 
 #define CREATE_TRACE_POINTS
 #include "thermal_trace_ipa.h"
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 #define INVALID_TRIP -1
diff --git a/drivers/thermal/gov_step_wise.c b/drivers/thermal/gov_step_wise.c
index 1050fb4d94c2..1c844004afe5 100644
--- a/drivers/thermal/gov_step_wise.c
+++ b/drivers/thermal/gov_step_wise.c
@@ -14,6 +14,7 @@ 
 #include <linux/minmax.h>
 #include "thermal_trace.h"
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 /*
diff --git a/drivers/thermal/gov_user_space.c b/drivers/thermal/gov_user_space.c
index 8bc1c22aaf03..8883c9ca930f 100644
--- a/drivers/thermal/gov_user_space.c
+++ b/drivers/thermal/gov_user_space.c
@@ -13,6 +13,7 @@ 
 #include <linux/slab.h>
 #include <linux/thermal.h>
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 static int user_space_bind(struct thermal_zone_device *tz)
diff --git a/drivers/thermal/thermal_acpi.c b/drivers/thermal/thermal_acpi.c
index 0e5698818f69..556c9f0cc40d 100644
--- a/drivers/thermal/thermal_acpi.c
+++ b/drivers/thermal/thermal_acpi.c
@@ -9,6 +9,7 @@ 
 #include <linux/acpi.h>
 #include <linux/units.h>
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 /*
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 58533ea75cd9..9ee0ec3bdff6 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -24,6 +24,7 @@ 
 #define CREATE_TRACE_POINTS
 #include "thermal_trace.h"
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 #include "thermal_hwmon.h"
 
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index de884bea28b6..387b06c49415 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -14,6 +14,10 @@ 
 
 #include "thermal_netlink.h"
 
+#ifndef THERMAL_CORE_SUBSYS
+#warning This header can only be included by the thermal core code
+#endif
+
 /* Default Thermal Governor */
 #if defined(CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE)
 #define DEFAULT_THERMAL_GOVERNOR       "step_wise"
diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
index 4d66372c9629..26804cfdd494 100644
--- a/drivers/thermal/thermal_helpers.c
+++ b/drivers/thermal/thermal_helpers.c
@@ -19,6 +19,7 @@ 
 #include <linux/string.h>
 #include <linux/sysfs.h>
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 #include "thermal_trace.h"
 
diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
index c3ae44659b81..ba2ae5be0c23 100644
--- a/drivers/thermal/thermal_hwmon.c
+++ b/drivers/thermal/thermal_hwmon.c
@@ -17,6 +17,7 @@ 
 #include <linux/thermal.h>
 
 #include "thermal_hwmon.h"
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 /* hwmon sys I/F */
diff --git a/drivers/thermal/thermal_netlink.c b/drivers/thermal/thermal_netlink.c
index 08bc46c3ec7b..f3ac6432bf5f 100644
--- a/drivers/thermal/thermal_netlink.c
+++ b/drivers/thermal/thermal_netlink.c
@@ -11,6 +11,7 @@ 
 #include <net/genetlink.h>
 #include <uapi/linux/thermal.h>
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 static const struct genl_multicast_group thermal_genl_mcgrps[] = {
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index 1e0655b63259..db83596ea105 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -16,6 +16,7 @@ 
 #include <linux/types.h>
 #include <linux/string.h>
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 /***   functions parsing device tree nodes   ***/
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index 4e6a97db894e..ca616b3f5172 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -19,6 +19,7 @@ 
 #include <linux/string.h>
 #include <linux/jiffies.h>
 
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 /* sys I/F for thermal zone */
diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c
index 024e2e365a26..2f4c6fa83bea 100644
--- a/drivers/thermal/thermal_trip.c
+++ b/drivers/thermal/thermal_trip.c
@@ -7,6 +7,7 @@ 
  *
  * Thermal trips handling
  */
+#define THERMAL_CORE_SUBSYS
 #include "thermal_core.h"
 
 int for_each_thermal_trip(struct thermal_zone_device *tz,