
[v1] trace/hwlat: stop worker if !is_percpu_thread due to hotplug event

Message ID 20240201062455.3154803-1-andy.chiu@sifive.com (mailing list archive)
State New

Commit Message

Andy Chiu Feb. 1, 2024, 6:24 a.m. UTC
If the task happens to run after a CPU hot-plug offline event, then it
would not be running in a percpu_thread. Instead, it would be re-queued
into an UNBOUND workqueue. This would trigger a warning if kernel
preemption is enabled.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
---
 kernel/trace/trace_hwlat.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

Comments

Steven Rostedt Feb. 1, 2024, 2:57 p.m. UTC | #1
On Thu,  1 Feb 2024 14:24:55 +0800
Andy Chiu <andy.chiu@sifive.com> wrote:

> If the task happens to run after a CPU hot-plug offline event, then it
> would not be running in a percpu_thread. Instead, it would be re-queued
> into an UNBOUND workqueue. This would trigger a warning if kernel
> preemption is enabled.
> 
> Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
> ---
>  kernel/trace/trace_hwlat.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> index b791524a6536..87258ddc2141 100644
> --- a/kernel/trace/trace_hwlat.c
> +++ b/kernel/trace/trace_hwlat.c
> @@ -511,7 +511,16 @@ static int start_cpu_kthread(unsigned int cpu)
>  static void hwlat_hotplug_workfn(struct work_struct *dummy)
>  {
>  	struct trace_array *tr = hwlat_trace;
> -	unsigned int cpu = smp_processor_id();
> +	unsigned int cpu;
> +
> +	/*
> +	 * If the work is scheduled after CPU hotplug offline has been
> +	 * invoked, then it would be queued into an UNBOUND workqueue.

Is it due to a CPU going online and then right back offline? This
function is called by the online path.

From the queue_work_on() comment:

/**
 * queue_work_on - queue work on specific cpu
 * @cpu: CPU number to execute work on
 * @wq: workqueue to use
 * @work: work to queue
 *
 * We queue the work to a specific CPU, the caller must ensure it
 * can't go away.  Callers that fail to ensure that the specified
 * CPU cannot go away will execute on a randomly chosen CPU.
 * But note well that callers specifying a CPU that never has been
 * online will get a splat.
 *
 * Return: %false if @work was already on a queue, %true otherwise.
 */

So the work gets queued on the CPU but then the CPU immediately goes
offline. I wonder what happens if it comes back online?

The above states that if the work is already queued on a CPU it won't queue
it again. Hmm, reading the comments, I'm not sure you can run the same
work item on multiple CPUs. It looks like only one will run and all the
others will fail.

I think we need to change this to a global work queue and use CPU masks.
Where in the *cpu_init() function, it just sets the current CPU bit in a
bit mask and calls the workqueue, and that workqueue is responsible for all
CPUs.

Then the workqueue function will call cpus_read_lock() and just iterate the
CPU mask, starting the threads for each of the bits that are set and
clearing the bit (possibly with __test_and_clear_bit()).

I don't think the schedule_on_cpu() is doing what we think it should be
doing.
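
For illustration only, the global-work-plus-cpumask scheme described above
might look roughly like the sketch below. This is hypothetical code, not
the actual trace_hwlat.c implementation: `hwlat_pending_mask`,
`hwlat_hotplug_work`, and the callback shape are invented here, while
`start_cpu_kthread()` is the function from this file and
`cpumask_set_cpu()`, `cpumask_test_and_clear_cpu()`, `cpus_read_lock()`,
`DECLARE_WORK()`, and `schedule_work()` are real kernel APIs.

```c
/*
 * Hypothetical sketch of the proposed approach. A single global work
 * item is kicked from the hotplug callback; the work function owns all
 * pending CPUs via a cpumask, so it no longer cares which CPU it runs on.
 */
static struct cpumask hwlat_pending_mask;

static void hwlat_hotplug_workfn(struct work_struct *dummy)
{
	unsigned int cpu;

	cpus_read_lock();	/* keep CPUs from going away under us */
	for_each_cpu(cpu, &hwlat_pending_mask) {
		/* atomically claim this CPU's bit */
		if (!cpumask_test_and_clear_cpu(cpu, &hwlat_pending_mask))
			continue;
		if (cpu_online(cpu))
			start_cpu_kthread(cpu);
	}
	cpus_read_unlock();
}
static DECLARE_WORK(hwlat_hotplug_work, hwlat_hotplug_workfn);

/* hotplug online callback: mark the CPU and kick the global work */
static int hwlat_cpu_init(unsigned int cpu)
{
	cpumask_set_cpu(cpu, &hwlat_pending_mask);
	schedule_work(&hwlat_hotplug_work);
	return 0;
}
```

Because the work item is no longer pinned to the onlining CPU, it can
safely run on an unbound worker; it holds cpus_read_lock() while walking
the mask instead of relying on running in a per-CPU thread.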


-- Steve


> +	 */
> +	if (!is_percpu_thread())
> +		return;
> +
> +	cpu = smp_processor_id();
>  
>  	mutex_lock(&trace_types_lock);
>  	mutex_lock(&hwlat_data.lock);

Patch

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index b791524a6536..87258ddc2141 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -511,7 +511,16 @@  static int start_cpu_kthread(unsigned int cpu)
 static void hwlat_hotplug_workfn(struct work_struct *dummy)
 {
 	struct trace_array *tr = hwlat_trace;
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu;
+
+	/*
+	 * If the work is scheduled after CPU hotplug offline has been
+	 * invoked, then it would be queued into an UNBOUND workqueue.
+	 */
+	if (!is_percpu_thread())
+		return;
+
+	cpu = smp_processor_id();
 
 	mutex_lock(&trace_types_lock);
 	mutex_lock(&hwlat_data.lock);