Message ID | 2236794.NgBsaNRSFp@rjwysocki.net (mailing list archive) |
---|---|
State | Mainlined, archived |
Headers | show |
Series | thermal: Rework binding cooling devices to trip points | expand |
On 19/08/2024 18:30, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Make the thermal_of driver use the .should_bind() thermal zone callback > to provide the thermal core with the information on whether or not to > bind the given cooling device to the given trip point in the given > thermal zone. If it returns 'true', the thermal core will bind the > cooling device to the trip and the corresponding unbinding will be > taken care of automatically by the core on the removal of the involved > thermal zone or cooling device. > > This replaces the .bind() and .unbind() thermal zone callbacks which > assumed the same trip points ordering in the driver and in the thermal > core (that may not be true any more in the future). The .bind() > callback would walk the given thermal zone's cooling maps to find all > of the valid trip point combinations with the given cooling device and > it would call thermal_zone_bind_cooling_device() for all of them using > trip point indices reflecting the ordering of the trips in the DT. > > The .should_bind() callback still walks the thermal zone's cooling maps, > but it can use the trip object passed to it by the thermal core to find > the trip in question in the first place and then it uses the > corresponding 'cooling-device' entries to look up the given cooling > device. To be able to match the trip object provided by the thermal > core to a specific device node, the driver sets the 'priv' field of each > trip to the corresponding device node pointer during initialization. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> # rk3399-rock960
On 19.08.2024 18:30, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Make the thermal_of driver use the .should_bind() thermal zone callback > to provide the thermal core with the information on whether or not to > bind the given cooling device to the given trip point in the given > thermal zone. If it returns 'true', the thermal core will bind the > cooling device to the trip and the corresponding unbinding will be > taken care of automatically by the core on the removal of the involved > thermal zone or cooling device. > > This replaces the .bind() and .unbind() thermal zone callbacks which > assumed the same trip points ordering in the driver and in the thermal > core (that may not be true any more in the future). The .bind() > callback would walk the given thermal zone's cooling maps to find all > of the valid trip point combinations with the given cooling device and > it would call thermal_zone_bind_cooling_device() for all of them using > trip point indices reflecting the ordering of the trips in the DT. > > The .should_bind() callback still walks the thermal zone's cooling maps, > but it can use the trip object passed to it by the thermal core to find > the trip in question in the first place and then it uses the > corresponding 'cooling-device' entries to look up the given cooling > device. To be able to match the trip object provided by the thermal > core to a specific device node, the driver sets the 'priv' field of each > trip to the corresponding device node pointer during initialization. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> This patch landed recently in linux-next as commit 6d71d55c3b12 ("thermal/of: Use the .should_bind() thermal zone callback"). In my tests I found that it breaks booting some on my test boars: Exynos-based (OdroidXU4 with ARM32 bit kernel from multi_v7_defconfig) and Amlogic Meson based boards (OdroidC4, VIM3 with ARM64 defconfig+some debug options). Reverting $subject on top of next-20240823 together with c1ee6e1f68f5 ("thermal: core: Clean up trip bind/unbind functions") and 526954900465 ("thermal: core: Drop unused bind/unbind functions and callbacks") due to compile dependencies fixes the issue. On Odroid C4 I see the following warnings before the boards hangs: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 263, name: systemd-udevd preempt_count: 1, expected: 0 RCU nest depth: 0, expected: 0 4 locks held by systemd-udevd/263: #0: ffff0000013768f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x90/0x1ac #1: ffff80008349e1a0 (thermal_list_lock){+.+.}-{3:3}, at: __thermal_cooling_device_register.part.0+0x154/0x2f4 #2: ffff000000988700 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_cdev_binding+0x84/0x1e4 #3: ffff8000834b8a98 (devtree_lock){....}-{2:2}, at: of_get_next_child+0x2c/0x80 irq event stamp: 7936 hardirqs last enabled at (7935): [<ffff8000812b1700>] _raw_spin_unlock_irqrestore+0x74/0x78 hardirqs last disabled at (7936): [<ffff8000812b0b14>] _raw_spin_lock_irqsave+0x84/0x88 softirqs last enabled at (7302): [<ffff8000800b13dc>] handle_softirqs+0x4cc/0x4e4 softirqs last disabled at (7295): [<ffff8000800105b0>] __do_softirq+0x14/0x20 CPU: 3 UID: 0 PID: 263 Comm: systemd-udevd Not tainted 6.11.0-rc3+ #15264 Hardware name: Hardkernel ODROID-C4 (DT) Call trace: dump_backtrace+0x94/0xec show_stack+0x18/0x24 dump_stack_lvl+0x90/0xd0 dump_stack+0x18/0x24 __might_resched+0x144/0x248 __might_sleep+0x48/0x98 down_write+0x28/0xe8 kernfs_remove+0x34/0x58 sysfs_remove_dir+0x54/0x70 __kobject_del+0x40/0xb8 kobject_put+0x104/0x124 of_node_put+0x18/0x28 of_get_next_child+0x4c/0x80 thermal_of_should_bind+0xec/0x28c thermal_zone_cdev_binding+0x104/0x1e4 __thermal_cooling_device_register.part.0+0x194/0x2f4 thermal_of_cooling_device_register+0x3c/0x54 of_devfreq_cooling_register_power+0x220/0x298 devfreq_cooling_em_register+0x48/0xa8 panfrost_devfreq_init+0x294/0x320 [panfrost] panfrost_device_init+0x16c/0x5c8 [panfrost] panfrost_probe+0xbc/0x194 [panfrost] platform_probe+0x68/0xdc really_probe+0xbc/0x298 __driver_probe_device+0x78/0x12c driver_probe_device+0x40/0x164 __driver_attach+0x9c/0x1ac bus_for_each_dev+0x74/0xd4 driver_attach+0x24/0x30 bus_add_driver+0xe4/0x208 driver_register+0x60/0x128 __platform_driver_register+0x28/0x34 panfrost_driver_init+0x20/0x1000 [panfrost] do_one_initcall+0x68/0x300 do_init_module+0x60/0x224 load_module+0x1b0c/0x1cb0 init_module_from_file+0x84/0xc4 idempotent_init_module+0x18c/0x284 __arm64_sys_finit_module+0x64/0xa0 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x40/0xe8 do_el0_svc_compat+0x20/0x3c el0_svc_compat+0x44/0xe0 el0t_32_sync_handler+0x98/0x148 el0t_32_sync+0x194/0x198 ============================= [ BUG: Invalid wait context ] 6.11.0-rc3+ #15264 Tainted: G W ----------------------------- systemd-udevd/263 is trying to lock: ffff0000000e5948 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_remove+0x34/0x58 other info that might help us debug this: context-{4:4} 4 locks held by systemd-udevd/263: #0: ffff0000013768f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x90/0x1ac #1: ffff80008349e1a0 (thermal_list_lock){+.+.}-{3:3}, at: __thermal_cooling_device_register.part.0+0x154/0x2f4 #2: ffff000000988700 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_cdev_binding+0x84/0x1e4 #3: ffff8000834b8a98 (devtree_lock){....}-{2:2}, at: of_get_next_child+0x2c/0x80 stack backtrace: CPU: 3 UID: 0 PID: 263 Comm: systemd-udevd Tainted: G W 6.11.0-rc3+ #15264 Tainted: [W]=WARN Hardware name: Hardkernel ODROID-C4 (DT) Call trace: dump_backtrace+0x94/0xec show_stack+0x18/0x24 dump_stack_lvl+0x90/0xd0 dump_stack+0x18/0x24 __lock_acquire+0x9fc/0x21a0 lock_acquire+0x200/0x340 down_write+0x50/0xe8 kernfs_remove+0x34/0x58 sysfs_remove_dir+0x54/0x70 __kobject_del+0x40/0xb8 kobject_put+0x104/0x124 of_node_put+0x18/0x28 of_get_next_child+0x4c/0x80 thermal_of_should_bind+0xec/0x28c thermal_zone_cdev_binding+0x104/0x1e4 __thermal_cooling_device_register.part.0+0x194/0x2f4 thermal_of_cooling_device_register+0x3c/0x54 of_devfreq_cooling_register_power+0x220/0x298 devfreq_cooling_em_register+0x48/0xa8 panfrost_devfreq_init+0x294/0x320 [panfrost] panfrost_device_init+0x16c/0x5c8 [panfrost] panfrost_probe+0xbc/0x194 [panfrost] platform_probe+0x68/0xdc really_probe+0xbc/0x298 __driver_probe_device+0x78/0x12c driver_probe_device+0x40/0x164 __driver_attach+0x9c/0x1ac bus_for_each_dev+0x74/0xd4 driver_attach+0x24/0x30 bus_add_driver+0xe4/0x208 driver_register+0x60/0x128 __platform_driver_register+0x28/0x34 panfrost_driver_init+0x20/0x1000 [panfrost] do_one_initcall+0x68/0x300 do_init_module+0x60/0x224 load_module+0x1b0c/0x1cb0 init_module_from_file+0x84/0xc4 idempotent_init_module+0x18c/0x284 __arm64_sys_finit_module+0x64/0xa0 invoke_syscall+0x48/0x110 el0_svc_common.constprop.0+0x40/0xe8 do_el0_svc_compat+0x20/0x3c el0_svc_compat+0x44/0xe0 el0t_32_sync_handler+0x98/0x148 el0t_32_sync+0x194/0x198 rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: 2-...!: (0 ticks this GP) idle=2aac/1/0x4000000000000000 softirq=798/798 fqs=4 rcu: 3-...!: (0 ticks this GP) idle=28a4/1/0x4000000000000000 softirq=1007/1007 fqs=4 rcu: (detected by 0, t=6505 jiffies, g=349, q=46 ncpus=4) Sending NMI from CPU 0 to CPUs 2: Sending NMI from CPU 0 to CPUs 3: rcu: rcu_preempt kthread timer wakeup didn't happen for 6483 jiffies! g349 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 rcu: Possible timer handling issue on cpu=1 timer-softirq=260 rcu: rcu_preempt kthread starved for 6484 jiffies! g349 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:I stack:0 pid:16 tgid:16 ppid:2 flags:0x00000008 Call trace: __switch_to+0xe0/0x124 __schedule+0x318/0xc30 schedule+0x50/0x15c schedule_timeout+0xac/0x134 rcu_gp_fqs_loop+0x16c/0x8b4 rcu_gp_kthread+0x280/0x314 kthread+0x124/0x128 ret_from_fork+0x10/0x20 rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 0 to CPUs 1: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: 2-...!: (0 ticks this GP) idle=2aac/1/0x4000000000000000 softirq=798/798 fqs=4 rcu: 3-...!: (0 ticks this GP) idle=28a4/1/0x4000000000000000 softirq=1007/1007 fqs=4 rcu: (detected by 0, t=26013 jiffies, g=349, q=46 ncpus=4) Sending NMI from CPU 0 to CPUs 2: Sending NMI from CPU 0 to CPUs 3: Let me know if I can help debugging this issue further. > --- > > v2 -> v3: Reorder (previously [14/17]) > > v1 -> v2: > * Fix a build issue (undefined symbol) > > This patch only depends on the [06/14] introducing the .should_bind() > thermal zone callback: > > https://lore.kernel.org/linux-pm/9334403.CDJkKcVGEf@rjwysocki.net/ > > --- > drivers/thermal/thermal_of.c | 171 ++++++++++--------------------------------- > 1 file changed, 41 insertions(+), 130 deletions(-) > > Index: linux-pm/drivers/thermal/thermal_of.c > =================================================================== > --- linux-pm.orig/drivers/thermal/thermal_of.c > +++ linux-pm/drivers/thermal/thermal_of.c > @@ -20,37 +20,6 @@ > > /*** functions parsing device tree nodes ***/ > > -static int of_find_trip_id(struct device_node *np, struct device_node *trip) > -{ > - struct device_node *trips; > - struct device_node *t; > - int i = 0; > - > - trips = of_get_child_by_name(np, "trips"); > - if (!trips) { > - pr_err("Failed to find 'trips' node\n"); > - return -EINVAL; > - } > - > - /* > - * Find the trip id point associated with the cooling device map > - */ > - for_each_child_of_node(trips, t) { > - > - if (t == trip) { > - of_node_put(t); > - goto out; > - } > - i++; > - } > - > - i = -ENXIO; > -out: > - of_node_put(trips); > - > - return i; > -} > - > /* > * It maps 'enum thermal_trip_type' found in include/linux/thermal.h > * into the device tree binding of 'trip', property type. > @@ -119,6 +88,8 @@ static int thermal_of_populate_trip(stru > > trip->flags = THERMAL_TRIP_FLAG_RW_TEMP; > > + trip->priv = np; > + > return 0; > } > > @@ -290,39 +261,9 @@ static struct device_node *thermal_of_zo > return tz_np; > } > > -static int __thermal_of_unbind(struct device_node *map_np, int index, int trip_id, > - struct thermal_zone_device *tz, struct thermal_cooling_device *cdev) > -{ > - struct of_phandle_args cooling_spec; > - int ret; > - > - ret = of_parse_phandle_with_args(map_np, "cooling-device", "#cooling-cells", > - index, &cooling_spec); > - > - if (ret < 0) { > - pr_err("Invalid cooling-device entry\n"); > - return ret; > - } > - > - of_node_put(cooling_spec.np); > - > - if (cooling_spec.args_count < 2) { > - pr_err("wrong reference to cooling device, missing limits\n"); > - return -EINVAL; > - } > - > - if (cooling_spec.np != cdev->np) > - return 0; > - > - ret = thermal_zone_unbind_cooling_device(tz, trip_id, cdev); > - if (ret) > - pr_err("Failed to unbind '%s' with '%s': %d\n", tz->type, cdev->type, ret); > - > - return ret; > -} > - > -static int __thermal_of_bind(struct device_node *map_np, int index, int trip_id, > - struct thermal_zone_device *tz, struct thermal_cooling_device *cdev) > +static bool thermal_of_get_cooling_spec(struct device_node *map_np, int index, > + struct thermal_cooling_device *cdev, > + struct cooling_spec *c) > { > struct of_phandle_args cooling_spec; > int ret, weight = THERMAL_WEIGHT_DEFAULT; > @@ -334,104 +275,75 @@ static int __thermal_of_bind(struct devi > > if (ret < 0) { > pr_err("Invalid cooling-device entry\n"); > - return ret; > + return false; > } > > of_node_put(cooling_spec.np); > > if (cooling_spec.args_count < 2) { > pr_err("wrong reference to cooling device, missing limits\n"); > - return -EINVAL; > + return false; > } > > if (cooling_spec.np != cdev->np) > - return 0; > - > - ret = thermal_zone_bind_cooling_device(tz, trip_id, cdev, cooling_spec.args[1], > - cooling_spec.args[0], > - weight); > - if (ret) > - pr_err("Failed to bind '%s' with '%s': %d\n", tz->type, cdev->type, ret); > - > - return ret; > -} > - > -static int thermal_of_for_each_cooling_device(struct device_node *tz_np, struct device_node *map_np, > - struct thermal_zone_device *tz, struct thermal_cooling_device *cdev, > - int (*action)(struct device_node *, int, int, > - struct thermal_zone_device *, struct thermal_cooling_device *)) > -{ > - struct device_node *tr_np; > - int count, i, trip_id; > - > - tr_np = of_parse_phandle(map_np, "trip", 0); > - if (!tr_np) > - return -ENODEV; > - > - trip_id = of_find_trip_id(tz_np, tr_np); > - if (trip_id < 0) > - return trip_id; > - > - count = of_count_phandle_with_args(map_np, "cooling-device", "#cooling-cells"); > - if (count <= 0) { > - pr_err("Add a cooling_device property with at least one device\n"); > - return -ENOENT; > - } > + return false; > > - /* > - * At this point, we don't want to bail out when there is an > - * error, we will try to bind/unbind as many as possible > - * cooling devices > - */ > - for (i = 0; i < count; i++) > - action(map_np, i, trip_id, tz, cdev); > + c->lower = cooling_spec.args[0]; > + c->upper = cooling_spec.args[1]; > + c->weight = weight; > > - return 0; > + return true; > } > > -static int thermal_of_for_each_cooling_maps(struct thermal_zone_device *tz, > - struct thermal_cooling_device *cdev, > - int (*action)(struct device_node *, int, int, > - struct thermal_zone_device *, struct thermal_cooling_device *)) > +static bool thermal_of_should_bind(struct thermal_zone_device *tz, > + const struct thermal_trip *trip, > + struct thermal_cooling_device *cdev, > + struct cooling_spec *c) > { > struct device_node *tz_np, *cm_np, *child; > - int ret = 0; > + bool result = false; > > tz_np = thermal_of_zone_get_by_name(tz); > if (IS_ERR(tz_np)) { > pr_err("Failed to get node tz by name\n"); > - return PTR_ERR(tz_np); > + return false; > } > > cm_np = of_get_child_by_name(tz_np, "cooling-maps"); > if (!cm_np) > goto out; > > + /* Look up the trip and the cdev in the cooling maps. */ > for_each_child_of_node(cm_np, child) { > - ret = thermal_of_for_each_cooling_device(tz_np, child, tz, cdev, action); > - if (ret) { > + struct device_node *tr_np; > + int count, i; > + > + tr_np = of_parse_phandle(child, "trip", 0); > + if (tr_np != trip->priv) { > of_node_put(child); > - break; > + continue; > + } > + > + /* The trip has been found, look up the cdev. */ > + count = of_count_phandle_with_args(child, "cooling-device", "#cooling-cells"); > + if (count <= 0) > + pr_err("Add a cooling_device property with at least one device\n"); > + > + for (i = 0; i < count; i++) { > + result = thermal_of_get_cooling_spec(child, i, cdev, c); > + if (result) > + break; > } > + > + of_node_put(child); > + break; > } > > of_node_put(cm_np); > out: > of_node_put(tz_np); > > - return ret; > -} > - > -static int thermal_of_bind(struct thermal_zone_device *tz, > - struct thermal_cooling_device *cdev) > -{ > - return thermal_of_for_each_cooling_maps(tz, cdev, __thermal_of_bind); > -} > - > -static int thermal_of_unbind(struct thermal_zone_device *tz, > - struct thermal_cooling_device *cdev) > -{ > - return thermal_of_for_each_cooling_maps(tz, cdev, __thermal_of_unbind); > + return result; > } > > /** > @@ -502,8 +414,7 @@ static struct thermal_zone_device *therm > > thermal_of_parameters_init(np, &tzp); > > - of_ops.bind = thermal_of_bind; > - of_ops.unbind = thermal_of_unbind; > + of_ops.should_bind = thermal_of_should_bind; > > ret = of_property_read_string(np, "critical-action", &action); > if (!ret) > > > > Best regards
On Mon, Aug 26, 2024 at 1:32 PM Marek Szyprowski <m.szyprowski@samsung.com> wrote: > > On 19.08.2024 18:30, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > Make the thermal_of driver use the .should_bind() thermal zone callback > > to provide the thermal core with the information on whether or not to > > bind the given cooling device to the given trip point in the given > > thermal zone. If it returns 'true', the thermal core will bind the > > cooling device to the trip and the corresponding unbinding will be > > taken care of automatically by the core on the removal of the involved > > thermal zone or cooling device. > > > > This replaces the .bind() and .unbind() thermal zone callbacks which > > assumed the same trip points ordering in the driver and in the thermal > > core (that may not be true any more in the future). The .bind() > > callback would walk the given thermal zone's cooling maps to find all > > of the valid trip point combinations with the given cooling device and > > it would call thermal_zone_bind_cooling_device() for all of them using > > trip point indices reflecting the ordering of the trips in the DT. > > > > The .should_bind() callback still walks the thermal zone's cooling maps, > > but it can use the trip object passed to it by the thermal core to find > > the trip in question in the first place and then it uses the > > corresponding 'cooling-device' entries to look up the given cooling > > device. To be able to match the trip object provided by the thermal > > core to a specific device node, the driver sets the 'priv' field of each > > trip to the corresponding device node pointer during initialization. > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > This patch landed recently in linux-next as commit 6d71d55c3b12 > ("thermal/of: Use the .should_bind() thermal zone callback") It has been fixed since and it is commit 94c6110b0b13c6416146 now. Bottom line is that it was calling of_node_put() too many times due to a coding mistake. > In my tests I found that it breaks booting some on my test boars: Exynos-based > (OdroidXU4 with ARM32 bit kernel from multi_v7_defconfig) and Amlogic > Meson based boards (OdroidC4, VIM3 with ARM64 defconfig+some debug > options). Reverting $subject on top of next-20240823 together with > c1ee6e1f68f5 ("thermal: core: Clean up trip bind/unbind functions") and > 526954900465 ("thermal: core: Drop unused bind/unbind functions and > callbacks") due to compile dependencies fixes the issue. Thanks for the report!
On 26.08.2024 14:14, Rafael J. Wysocki wrote: > On Mon, Aug 26, 2024 at 1:32 PM Marek Szyprowski > <m.szyprowski@samsung.com> wrote: >> On 19.08.2024 18:30, Rafael J. Wysocki wrote: >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>> >>> Make the thermal_of driver use the .should_bind() thermal zone callback >>> to provide the thermal core with the information on whether or not to >>> bind the given cooling device to the given trip point in the given >>> thermal zone. If it returns 'true', the thermal core will bind the >>> cooling device to the trip and the corresponding unbinding will be >>> taken care of automatically by the core on the removal of the involved >>> thermal zone or cooling device. >>> >>> This replaces the .bind() and .unbind() thermal zone callbacks which >>> assumed the same trip points ordering in the driver and in the thermal >>> core (that may not be true any more in the future). The .bind() >>> callback would walk the given thermal zone's cooling maps to find all >>> of the valid trip point combinations with the given cooling device and >>> it would call thermal_zone_bind_cooling_device() for all of them using >>> trip point indices reflecting the ordering of the trips in the DT. >>> >>> The .should_bind() callback still walks the thermal zone's cooling maps, >>> but it can use the trip object passed to it by the thermal core to find >>> the trip in question in the first place and then it uses the >>> corresponding 'cooling-device' entries to look up the given cooling >>> device. To be able to match the trip object provided by the thermal >>> core to a specific device node, the driver sets the 'priv' field of each >>> trip to the corresponding device node pointer during initialization. >>> >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> This patch landed recently in linux-next as commit 6d71d55c3b12 >> ("thermal/of: Use the .should_bind() thermal zone callback") > It has been fixed since and it is commit 94c6110b0b13c6416146 now. Confirmed. Thanks for fixing it and sorry for the noise. > Bottom line is that it was calling of_node_put() too many times due to > a coding mistake. > >> In my tests I found that it breaks booting some on my test boars: Exynos-based >> (OdroidXU4 with ARM32 bit kernel from multi_v7_defconfig) and Amlogic >> Meson based boards (OdroidC4, VIM3 with ARM64 defconfig+some debug >> options). Reverting $subject on top of next-20240823 together with >> c1ee6e1f68f5 ("thermal: core: Clean up trip bind/unbind functions") and >> 526954900465 ("thermal: core: Drop unused bind/unbind functions and >> callbacks") due to compile dependencies fixes the issue. > Thanks for the report! > Best regards
On Mon, Aug 26, 2024 at 10:49 PM Marek Szyprowski <m.szyprowski@samsung.com> wrote: > > On 26.08.2024 14:14, Rafael J. Wysocki wrote: > > On Mon, Aug 26, 2024 at 1:32 PM Marek Szyprowski > > <m.szyprowski@samsung.com> wrote: > >> On 19.08.2024 18:30, Rafael J. Wysocki wrote: > >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >>> > >>> Make the thermal_of driver use the .should_bind() thermal zone callback > >>> to provide the thermal core with the information on whether or not to > >>> bind the given cooling device to the given trip point in the given > >>> thermal zone. If it returns 'true', the thermal core will bind the > >>> cooling device to the trip and the corresponding unbinding will be > >>> taken care of automatically by the core on the removal of the involved > >>> thermal zone or cooling device. > >>> > >>> This replaces the .bind() and .unbind() thermal zone callbacks which > >>> assumed the same trip points ordering in the driver and in the thermal > >>> core (that may not be true any more in the future). The .bind() > >>> callback would walk the given thermal zone's cooling maps to find all > >>> of the valid trip point combinations with the given cooling device and > >>> it would call thermal_zone_bind_cooling_device() for all of them using > >>> trip point indices reflecting the ordering of the trips in the DT. > >>> > >>> The .should_bind() callback still walks the thermal zone's cooling maps, > >>> but it can use the trip object passed to it by the thermal core to find > >>> the trip in question in the first place and then it uses the > >>> corresponding 'cooling-device' entries to look up the given cooling > >>> device. To be able to match the trip object provided by the thermal > >>> core to a specific device node, the driver sets the 'priv' field of each > >>> trip to the corresponding device node pointer during initialization. > >>> > >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > >> This patch landed recently in linux-next as commit 6d71d55c3b12 > >> ("thermal/of: Use the .should_bind() thermal zone callback") > > It has been fixed since and it is commit 94c6110b0b13c6416146 now. > > > Confirmed. Thanks for fixing it and sorry for the noise. Thank you! And it wasn't noise. You reported the problem as soon as you saw it and before you could see the fix. Somebody else saw it earlier, but there's nothing wrong with that.
Index: linux-pm/drivers/thermal/thermal_of.c =================================================================== --- linux-pm.orig/drivers/thermal/thermal_of.c +++ linux-pm/drivers/thermal/thermal_of.c @@ -20,37 +20,6 @@ /*** functions parsing device tree nodes ***/ -static int of_find_trip_id(struct device_node *np, struct device_node *trip) -{ - struct device_node *trips; - struct device_node *t; - int i = 0; - - trips = of_get_child_by_name(np, "trips"); - if (!trips) { - pr_err("Failed to find 'trips' node\n"); - return -EINVAL; - } - - /* - * Find the trip id point associated with the cooling device map - */ - for_each_child_of_node(trips, t) { - - if (t == trip) { - of_node_put(t); - goto out; - } - i++; - } - - i = -ENXIO; -out: - of_node_put(trips); - - return i; -} - /* * It maps 'enum thermal_trip_type' found in include/linux/thermal.h * into the device tree binding of 'trip', property type. @@ -119,6 +88,8 @@ static int thermal_of_populate_trip(stru trip->flags = THERMAL_TRIP_FLAG_RW_TEMP; + trip->priv = np; + return 0; } @@ -290,39 +261,9 @@ static struct device_node *thermal_of_zo return tz_np; } -static int __thermal_of_unbind(struct device_node *map_np, int index, int trip_id, - struct thermal_zone_device *tz, struct thermal_cooling_device *cdev) -{ - struct of_phandle_args cooling_spec; - int ret; - - ret = of_parse_phandle_with_args(map_np, "cooling-device", "#cooling-cells", - index, &cooling_spec); - - if (ret < 0) { - pr_err("Invalid cooling-device entry\n"); - return ret; - } - - of_node_put(cooling_spec.np); - - if (cooling_spec.args_count < 2) { - pr_err("wrong reference to cooling device, missing limits\n"); - return -EINVAL; - } - - if (cooling_spec.np != cdev->np) - return 0; - - ret = thermal_zone_unbind_cooling_device(tz, trip_id, cdev); - if (ret) - pr_err("Failed to unbind '%s' with '%s': %d\n", tz->type, cdev->type, ret); - - return ret; -} - -static int __thermal_of_bind(struct device_node *map_np, int index, int trip_id, - struct thermal_zone_device *tz, struct thermal_cooling_device *cdev) +static bool thermal_of_get_cooling_spec(struct device_node *map_np, int index, + struct thermal_cooling_device *cdev, + struct cooling_spec *c) { struct of_phandle_args cooling_spec; int ret, weight = THERMAL_WEIGHT_DEFAULT; @@ -334,104 +275,75 @@ static int __thermal_of_bind(struct devi if (ret < 0) { pr_err("Invalid cooling-device entry\n"); - return ret; + return false; } of_node_put(cooling_spec.np); if (cooling_spec.args_count < 2) { pr_err("wrong reference to cooling device, missing limits\n"); - return -EINVAL; + return false; } if (cooling_spec.np != cdev->np) - return 0; - - ret = thermal_zone_bind_cooling_device(tz, trip_id, cdev, cooling_spec.args[1], - cooling_spec.args[0], - weight); - if (ret) - pr_err("Failed to bind '%s' with '%s': %d\n", tz->type, cdev->type, ret); - - return ret; -} - -static int thermal_of_for_each_cooling_device(struct device_node *tz_np, struct device_node *map_np, - struct thermal_zone_device *tz, struct thermal_cooling_device *cdev, - int (*action)(struct device_node *, int, int, - struct thermal_zone_device *, struct thermal_cooling_device *)) -{ - struct device_node *tr_np; - int count, i, trip_id; - - tr_np = of_parse_phandle(map_np, "trip", 0); - if (!tr_np) - return -ENODEV; - - trip_id = of_find_trip_id(tz_np, tr_np); - if (trip_id < 0) - return trip_id; - - count = of_count_phandle_with_args(map_np, "cooling-device", "#cooling-cells"); - if (count <= 0) { - pr_err("Add a cooling_device property with at least one device\n"); - return -ENOENT; - } + return false; - /* - * At this point, we don't want to bail out when there is an - * error, we will try to bind/unbind as many as possible - * cooling devices - */ - for (i = 0; i < count; i++) - action(map_np, i, trip_id, tz, cdev); + c->lower = cooling_spec.args[0]; + c->upper = cooling_spec.args[1]; + c->weight = weight; - return 0; + return true; } -static int thermal_of_for_each_cooling_maps(struct thermal_zone_device *tz, - struct thermal_cooling_device *cdev, - int (*action)(struct device_node *, int, int, - struct thermal_zone_device *, struct thermal_cooling_device *)) +static bool thermal_of_should_bind(struct thermal_zone_device *tz, + const struct thermal_trip *trip, + struct thermal_cooling_device *cdev, + struct cooling_spec *c) { struct device_node *tz_np, *cm_np, *child; - int ret = 0; + bool result = false; tz_np = thermal_of_zone_get_by_name(tz); if (IS_ERR(tz_np)) { pr_err("Failed to get node tz by name\n"); - return PTR_ERR(tz_np); + return false; } cm_np = of_get_child_by_name(tz_np, "cooling-maps"); if (!cm_np) goto out; + /* Look up the trip and the cdev in the cooling maps. */ for_each_child_of_node(cm_np, child) { - ret = thermal_of_for_each_cooling_device(tz_np, child, tz, cdev, action); - if (ret) { + struct device_node *tr_np; + int count, i; + + tr_np = of_parse_phandle(child, "trip", 0); + if (tr_np != trip->priv) { of_node_put(child); - break; + continue; + } + + /* The trip has been found, look up the cdev. */ + count = of_count_phandle_with_args(child, "cooling-device", "#cooling-cells"); + if (count <= 0) + pr_err("Add a cooling_device property with at least one device\n"); + + for (i = 0; i < count; i++) { + result = thermal_of_get_cooling_spec(child, i, cdev, c); + if (result) + break; } + + of_node_put(child); + break; } of_node_put(cm_np); out: of_node_put(tz_np); - return ret; -} - -static int thermal_of_bind(struct thermal_zone_device *tz, - struct thermal_cooling_device *cdev) -{ - return thermal_of_for_each_cooling_maps(tz, cdev, __thermal_of_bind); -} - -static int thermal_of_unbind(struct thermal_zone_device *tz, - struct thermal_cooling_device *cdev) -{ - return thermal_of_for_each_cooling_maps(tz, cdev, __thermal_of_unbind); + return result; } /** @@ -502,8 +414,7 @@ static struct thermal_zone_device *therm thermal_of_parameters_init(np, &tzp); - of_ops.bind = thermal_of_bind; - of_ops.unbind = thermal_of_unbind; + of_ops.should_bind = thermal_of_should_bind; ret = of_property_read_string(np, "critical-action", &action); if (!ret)