diff mbox series

[2/3] rcu: print first CPU on expedited stall line

Message ID 20221219202910.3063036-3-elliott@hpe.com (mailing list archive)
State New, archived
Headers show
Series crypto/rcu: suppress unnecessary CPU stall warnings | expand

Commit Message

Elliott, Robert (Servers) Dec. 19, 2022, 8:29 p.m. UTC
Include the first CPU number in the first pr_err() call reporting
an expedited stall warning.

Printing the CPU numbers with subsequent pr_cont() calls can
result in the prints being many lines away or being dropped entirely
in a busy system. This change ensures there is indication of at
least one of the CPUs with the problem with the original message.

Before (if prints are interspersed with other prints):
  rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {
  13-....
  } 32 jiffies s: 6685 root: 0x1/.

After:
  rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {13-....
  } 32 jiffies s: 6685 root: 0x1/.

Signed-off-by: Robert Elliott <elliott@hpe.com>
---
 kernel/rcu/tree_exp.h | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

Comments

Paul E. McKenney Dec. 21, 2022, 7:02 p.m. UTC | #1
On Mon, Dec 19, 2022 at 02:29:09PM -0600, Robert Elliott wrote:
> Include the first CPU number in the first pr_err() call reporting
> an expedited stall warning.
> 
> Printing the CPU numbers with subsequent pr_cont() calls can
> result in the prints being many lines away or being dropped entirely
> in a busy system. This change ensures there is indication of at
> least one of the CPUs with the problem with the original message.
> 
> Before (if prints are interspersed with other prints):
>   rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {
>   13-....
>   } 32 jiffies s: 6685 root: 0x1/.
> 
> After:
>   rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: {13-....
>   } 32 jiffies s: 6685 root: 0x1/.
> 
> Signed-off-by: Robert Elliott <elliott@hpe.com>

A couple of questions below.

> ---
>  kernel/rcu/tree_exp.h | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index ed6c3cce28f2..ade6a18e6c07 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -624,8 +624,6 @@ static void synchronize_rcu_expedited_wait(void)
>  		if (rcu_stall_is_suppressed())
>  			continue;
>  		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
> -		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
> -		       rcu_state.name);
>  		ndetected = 0;
>  		rcu_for_each_leaf_node(rnp) {
>  			ndetected += rcu_print_task_exp_stall(rnp);
> @@ -637,11 +635,21 @@ static void synchronize_rcu_expedited_wait(void)
>  					continue;
>  				ndetected++;
>  				rdp = per_cpu_ptr(&rcu_data, cpu);
> -				pr_cont(" %d-%c%c%c%c", cpu,
> -					"O."[!!cpu_online(cpu)],
> -					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
> -					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
> -					"D."[!!(rdp->cpu_no_qs.b.exp)]);
> +				// print the prefix and the first CPU number together
> +				// under heavy load, the pr_cont prints can be far away or dropped
> +				if (ndetected == 1)

Is the purpose here to print the header only on the first detected task?
If so, what if there is more than one task blocking the first rcu_node
structure?  Wouldn't that omit the header entirely?

> +					pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {%d-%c%c%c%c",

We can of course get other console output interspersed at this point.
This might be OK in practice, but the commit log should clearly spell
out the reasons.

> +					       rcu_state.name, cpu,
> +					       "O."[!!cpu_online(cpu)],
> +					       "o."[!!(rdp->grpmask & rnp->expmaskinit)],
> +					       "N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
> +					       "D."[!!(rdp->cpu_no_qs.b.exp)]);
> +				else
> +					pr_cont(" %d-%c%c%c%c", cpu,
> +						"O."[!!cpu_online(cpu)],
> +						"o."[!!(rdp->grpmask & rnp->expmaskinit)],
> +						"N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
> +						"D."[!!(rdp->cpu_no_qs.b.exp)]);
>  			}
>  		}
>  		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",
> -- 
> 2.38.1
>
diff mbox series

Patch

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index ed6c3cce28f2..ade6a18e6c07 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -624,8 +624,6 @@  static void synchronize_rcu_expedited_wait(void)
 		if (rcu_stall_is_suppressed())
 			continue;
 		trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
-		pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {",
-		       rcu_state.name);
 		ndetected = 0;
 		rcu_for_each_leaf_node(rnp) {
 			ndetected += rcu_print_task_exp_stall(rnp);
@@ -637,11 +635,21 @@  static void synchronize_rcu_expedited_wait(void)
 					continue;
 				ndetected++;
 				rdp = per_cpu_ptr(&rcu_data, cpu);
-				pr_cont(" %d-%c%c%c%c", cpu,
-					"O."[!!cpu_online(cpu)],
-					"o."[!!(rdp->grpmask & rnp->expmaskinit)],
-					"N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
-					"D."[!!(rdp->cpu_no_qs.b.exp)]);
+				// print the prefix and the first CPU number together
+				// under heavy load, the pr_cont prints can be far away or dropped
+				if (ndetected == 1)
+					pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {%d-%c%c%c%c",
+					       rcu_state.name, cpu,
+					       "O."[!!cpu_online(cpu)],
+					       "o."[!!(rdp->grpmask & rnp->expmaskinit)],
+					       "N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
+					       "D."[!!(rdp->cpu_no_qs.b.exp)]);
+				else
+					pr_cont(" %d-%c%c%c%c", cpu,
+						"O."[!!cpu_online(cpu)],
+						"o."[!!(rdp->grpmask & rnp->expmaskinit)],
+						"N."[!!(rdp->grpmask & rnp->expmaskinitnext)],
+						"D."[!!(rdp->cpu_no_qs.b.exp)]);
 			}
 		}
 		pr_cont(" } %lu jiffies s: %lu root: %#lx/%c\n",