diff mbox series

[19/39] sched_ext: Print sched_ext info when dumping stack

Message ID 20240501151312.635565-20-tj@kernel.org (mailing list archive)
State Not Applicable
Headers show
Series [01/39] cgroup: Implement cgroup_show_cftypes() | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch, async

Commit Message

Tejun Heo May 1, 2024, 3:09 p.m. UTC
From: David Vernet <void@manifault.com>

It would be useful to see what the sched_ext scheduler state is, and what
scheduler is running, when we're dumping a task's stack. This patch
therefore adds a new print_scx_info() function that's called in the same
context as print_worker_info() and print_stop_info(). An example dump
follows.

  BUG: kernel NULL pointer dereference, address: 0000000000000999
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] PREEMPT SMP
  CPU: 13 PID: 2047 Comm: insmod Tainted: G           O       6.6.0-work-10323-gb58d4cae8e99-dirty #34
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS unknown 2/2/2022
  Sched_ext: qmap (enabled+all), task: runnable_at=-17ms
  RIP: 0010:init_module+0x9/0x1000 [test_module]
  ...

v3: - scx_ops_enable_state_str[] definition moved to an earlier patch as
      it's now used by core implementation.

    - Convert jiffy delta to msecs using jiffies_to_msecs() instead of
      multiplying by (HZ / MSEC_PER_SEC). The conversion is implemented in
      jiffies_delta_msecs().

v2: - We are now using scx_ops_enable_state_str[] outside
      CONFIG_SCHED_DEBUG. Move it outside of CONFIG_SCHED_DEBUG and to the
      top. This was reported by Changwoo and Andrea.

Signed-off-by: David Vernet <void@manifault.com>
Reported-by: Changwoo Min <changwoo@igalia.com>
Reported-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 include/linux/sched/ext.h |  2 ++
 kernel/sched/core.c       |  1 +
 kernel/sched/ext.c        | 53 +++++++++++++++++++++++++++++++++++++++
 lib/dump_stack.c          |  1 +
 4 files changed, 57 insertions(+)

Comments

Peter Zijlstra June 24, 2024, 12:46 p.m. UTC | #1
On Wed, May 01, 2024 at 05:09:54AM -1000, Tejun Heo wrote:

> +static long jiffies_delta_msecs(unsigned long at, unsigned long now)
> +{
> +	if (time_after(at, now))
> +		return jiffies_to_msecs(at - now);
> +	else
> +		return -(long)jiffies_to_msecs(now - at);
> +}

You have this weird superfluous else:

	if ()
		return foo;
	else
		return bar;

pattern in multiple patches, please change that to:

	if ()
		return foo;
	return bar;

Throughout the series.

Also, if we consider 2s complement, does the above actually make sense?
Linus Torvalds June 24, 2024, 2:25 p.m. UTC | #2
On Mon, 24 Jun 2024 at 08:46, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Also, if we consider 2s complement, does the above actually make sense?

jiffies_to_msecs() does not work on 2s complement. It just takes
'unsigned long', and just considers large positive numbers to be
exactly that - not a negative jiffy.

We have a separate jiffies_delta_to_msecs(), but that clamps to zero.
So again, it doesn't actually work for negative jiffies.

Side note: jiffies_to_msecs() seems to return an "unsigned int". That
seems a bit odd.

             Linus
Tejun Heo June 24, 2024, 6:34 p.m. UTC | #3
Hello, Peter.

On Mon, Jun 24, 2024 at 02:46:18PM +0200, Peter Zijlstra wrote:
> On Wed, May 01, 2024 at 05:09:54AM -1000, Tejun Heo wrote:
> 
> > +static long jiffies_delta_msecs(unsigned long at, unsigned long now)
> > +{
> > +	if (time_after(at, now))
> > +		return jiffies_to_msecs(at - now);
> > +	else
> > +		return -(long)jiffies_to_msecs(now - at);
> > +}
> 
> You have this weird superfluous else:
> 
> 	if ()
> 		return foo;
> 	else
> 		return bar;
> 
> pattern in multiple patches, please change that to:
> 
> 	if ()
> 		return foo;
> 	return bar;
> 
> Throughout the series.

It's just my personal preference. When the if and else blocks are relatively
short and completely symmetric like the above case, using exlicit if/else
structure looks more immediately intuitive to me. If it isn't too bothersome
to everybody, I'd like to keep it that way.

> Also, if we consider 2s complement, does the above actually make sense?

Maybe we can update jiffifes_to_msecs() to be 2s complement transparent but
probably not a good idea given how many are calling it.

Thanks.
diff mbox series

Patch

diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
index 2608a8f548db..123d6dffdf26 100644
--- a/include/linux/sched/ext.h
+++ b/include/linux/sched/ext.h
@@ -155,10 +155,12 @@  struct sched_ext_entity {
 };
 
 void sched_ext_free(struct task_struct *p);
+void print_scx_info(const char *log_lvl, struct task_struct *p);
 
 #else	/* !CONFIG_SCHED_CLASS_EXT */
 
 static inline void sched_ext_free(struct task_struct *p) {}
+static inline void print_scx_info(const char *log_lvl, struct task_struct *p) {}
 
 #endif	/* CONFIG_SCHED_CLASS_EXT */
 #endif	/* _LINUX_SCHED_EXT_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aae9c1297622..42fe654bf946 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9272,6 +9272,7 @@  void sched_show_task(struct task_struct *p)
 
 	print_worker_info(KERN_INFO, p);
 	print_stop_info(KERN_INFO, p);
+	print_scx_info(KERN_INFO, p);
 	show_stack(p, NULL, KERN_INFO);
 	put_task_stack(p);
 }
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 8d2ff81e8dd4..ff080b5f0330 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -577,6 +577,14 @@  static __printf(3, 4) void scx_ops_exit_kind(enum scx_exit_kind kind,
 
 #define SCX_HAS_OP(op)	static_branch_likely(&scx_has_op[SCX_OP_IDX(op)])
 
+static long jiffies_delta_msecs(unsigned long at, unsigned long now)
+{
+	if (time_after(at, now))
+		return jiffies_to_msecs(at - now);
+	else
+		return -(long)jiffies_to_msecs(now - at);
+}
+
 /* if the highest set bit is N, return a mask with bits [N+1, 31] set */
 static u32 higher_bits(u32 flags)
 {
@@ -3695,6 +3703,51 @@  static const struct sysrq_key_op sysrq_sched_ext_reset_op = {
 	.enable_mask	= SYSRQ_ENABLE_RTNICE,
 };
 
+/**
+ * print_scx_info - print out sched_ext scheduler state
+ * @log_lvl: the log level to use when printing
+ * @p: target task
+ *
+ * If a sched_ext scheduler is enabled, print the name and state of the
+ * scheduler. If @p is on sched_ext, print further information about the task.
+ *
+ * This function can be safely called on any task as long as the task_struct
+ * itself is accessible. While safe, this function isn't synchronized and may
+ * print out mixups or garbages of limited length.
+ */
+void print_scx_info(const char *log_lvl, struct task_struct *p)
+{
+	enum scx_ops_enable_state state = scx_ops_enable_state();
+	const char *all = READ_ONCE(scx_switching_all) ? "+all" : "";
+	char runnable_at_buf[22] = "?";
+	struct sched_class *class;
+	unsigned long runnable_at;
+
+	if (state == SCX_OPS_DISABLED)
+		return;
+
+	/*
+	 * Carefully check if the task was running on sched_ext, and then
+	 * carefully copy the time it's been runnable, and its state.
+	 */
+	if (copy_from_kernel_nofault(&class, &p->sched_class, sizeof(class)) ||
+	    class != &ext_sched_class) {
+		printk("%sSched_ext: %s (%s%s)", log_lvl, scx_ops.name,
+		       scx_ops_enable_state_str[state], all);
+		return;
+	}
+
+	if (!copy_from_kernel_nofault(&runnable_at, &p->scx.runnable_at,
+				      sizeof(runnable_at)))
+		scnprintf(runnable_at_buf, sizeof(runnable_at_buf), "%+ldms",
+			  jiffies_delta_msecs(runnable_at, jiffies));
+
+	/* print everything onto one line to conserve console space */
+	printk("%sSched_ext: %s (%s%s), task: runnable_at=%s",
+	       log_lvl, scx_ops.name, scx_ops_enable_state_str[state], all,
+	       runnable_at_buf);
+}
+
 void __init init_sched_ext_class(void)
 {
 	s32 cpu, v;
diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index 222c6d6c8281..9581ef4efec5 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -68,6 +68,7 @@  void dump_stack_print_info(const char *log_lvl)
 
 	print_worker_info(log_lvl, current);
 	print_stop_info(log_lvl, current);
+	print_scx_info(log_lvl, current);
 }
 
 /**