diff mbox series

thermal/core: Correctly free tz->tzp in thermal zone registration error path

Message ID 20231219082726.844508-1-wenst@chromium.org (mailing list archive)
State Superseded, archived
Headers show
Series thermal/core: Correctly free tz->tzp in thermal zone registration error path | expand

Commit Message

Chen-Yu Tsai Dec. 19, 2023, 8:27 a.m. UTC
After commit 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal
zone parameters structure"), the core now copies the thermal zone
parameters structure, and frees it if an error happens during thermal
zone device registration, or upon unregistration of the device.

In the error path, if device_register() was called, then `tz` disappears
before kfree(tz->tzp) happens, causing a NULL pointer deference crash.

In my case, the error path was entered from the sbs power supply driver,
which through the power supply core registers a thermal zone *without
trip points* for the battery temperature sensor. This combined with
setting the default thermal governor to "power allocator", which
*requires* trip_max, causes the thermal zone registration to error out.

The error path should handle the two cases, one where device_register
has not happened and the kobj hasn't been reference counted, and vice
versa where it has. The original commit tried to cover the first case,
but fails for the second. Fix this by adding kfree(tz->tzp) before
put_device() to cover the second case, and check if `tz` is still valid
before calling kfree(tz->tzp) to avoid crashing in the second case.

Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
This includes the minimal changes to fix the crash. I suppose some other
things in the thermal core could be reworked:
- Don't use "power allocator" for thermal zones without trip points
- Move some of the thermal zone cleanup code into the release function

 drivers/thermal/thermal_core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Rafael J. Wysocki Dec. 19, 2023, 3:28 p.m. UTC | #1
On Tue, Dec 19, 2023 at 9:27 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> After commit 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal
> zone parameters structure"), the core now copies the thermal zone
> parameters structure, and frees it if an error happens during thermal
> zone device registration, or upon unregistration of the device.
>
> In the error path, if device_register() was called, then `tz` disappears
> before kfree(tz->tzp) happens, causing a NULL pointer deference crash.
>
> In my case, the error path was entered from the sbs power supply driver,
> which through the power supply core registers a thermal zone *without
> trip points* for the battery temperature sensor. This combined with
> setting the default thermal governor to "power allocator", which
> *requires* trip_max, causes the thermal zone registration to error out.
>
> The error path should handle the two cases, one where device_register
> has not happened and the kobj hasn't been reference counted, and vice
> versa where it has. The original commit tried to cover the first case,
> but fails for the second. Fix this by adding kfree(tz->tzp) before
> put_device() to cover the second case, and check if `tz` is still valid
> before calling kfree(tz->tzp) to avoid crashing in the second case.
>
> Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> ---
> This includes the minimal changes to fix the crash. I suppose some other
> things in the thermal core could be reworked:
> - Don't use "power allocator" for thermal zones without trip points
> - Move some of the thermal zone cleanup code into the release function
>
>  drivers/thermal/thermal_core.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 2415dc50c31d..e47826d82062 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -1392,12 +1392,16 @@ thermal_zone_device_register_with_trips(const char *type, struct thermal_trip *t
>  unregister:
>         device_del(&tz->device);
>  release_device:
> +       /* Free tz->tzp before tz goes away. */
> +       kfree(tz->tzp);
>         put_device(&tz->device);
>         tz = NULL;
>  remove_id:
>         ida_free(&thermal_tz_ida, id);
>  free_tzp:
> -       kfree(tz->tzp);
> +       /* If we arrived here before device_register() was called. */
> +       if (tz)
> +               kfree(tz->tzp);
>  free_tz:
>         kfree(tz);
>         return ERR_PTR(result);
> --

Can you please test linux-next from today?  The issue addressed by
your patch should be fixed there.
Chen-Yu Tsai Jan. 9, 2024, 3:45 a.m. UTC | #2
On Tue, Dec 19, 2023 at 11:28 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Dec 19, 2023 at 9:27 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
> >
> > After commit 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal
> > zone parameters structure"), the core now copies the thermal zone
> > parameters structure, and frees it if an error happens during thermal
> > zone device registration, or upon unregistration of the device.
> >
> > In the error path, if device_register() was called, then `tz` disappears
> > before kfree(tz->tzp) happens, causing a NULL pointer deference crash.
> >
> > In my case, the error path was entered from the sbs power supply driver,
> > which through the power supply core registers a thermal zone *without
> > trip points* for the battery temperature sensor. This combined with
> > setting the default thermal governor to "power allocator", which
> > *requires* trip_max, causes the thermal zone registration to error out.
> >
> > The error path should handle the two cases, one where device_register
> > has not happened and the kobj hasn't been reference counted, and vice
> > versa where it has. The original commit tried to cover the first case,
> > but fails for the second. Fix this by adding kfree(tz->tzp) before
> > put_device() to cover the second case, and check if `tz` is still valid
> > before calling kfree(tz->tzp) to avoid crashing in the second case.
> >
> > Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure")
> > Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> > ---
> > This includes the minimal changes to fix the crash. I suppose some other
> > things in the thermal core could be reworked:
> > - Don't use "power allocator" for thermal zones without trip points
> > - Move some of the thermal zone cleanup code into the release function
> >
> >  drivers/thermal/thermal_core.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> > index 2415dc50c31d..e47826d82062 100644
> > --- a/drivers/thermal/thermal_core.c
> > +++ b/drivers/thermal/thermal_core.c
> > @@ -1392,12 +1392,16 @@ thermal_zone_device_register_with_trips(const char *type, struct thermal_trip *t
> >  unregister:
> >         device_del(&tz->device);
> >  release_device:
> > +       /* Free tz->tzp before tz goes away. */
> > +       kfree(tz->tzp);
> >         put_device(&tz->device);
> >         tz = NULL;
> >  remove_id:
> >         ida_free(&thermal_tz_ida, id);
> >  free_tzp:
> > -       kfree(tz->tzp);
> > +       /* If we arrived here before device_register() was called. */
> > +       if (tz)
> > +               kfree(tz->tzp);
> >  free_tz:
> >         kfree(tz);
> >         return ERR_PTR(result);
> > --
>
> Can you please test linux-next from today?  The issue addressed by
> your patch should be fixed there.

Sorry for the very late reply. Yes it does. Thanks.

ChenYu
diff mbox series

Patch

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 2415dc50c31d..e47826d82062 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -1392,12 +1392,16 @@  thermal_zone_device_register_with_trips(const char *type, struct thermal_trip *t
 unregister:
 	device_del(&tz->device);
 release_device:
+	/* Free tz->tzp before tz goes away. */
+	kfree(tz->tzp);
 	put_device(&tz->device);
 	tz = NULL;
 remove_id:
 	ida_free(&thermal_tz_ida, id);
 free_tzp:
-	kfree(tz->tzp);
+	/* If we arrived here before device_register() was called. */
+	if (tz)
+		kfree(tz->tzp);
 free_tz:
 	kfree(tz);
 	return ERR_PTR(result);