
rcu: Reduce the frequency of triggering irq-work for strict grace period

Message ID 20220808022626.12825-1-qiang1.zhang@intel.com (mailing list archive)
State Accepted
Commit 1e5aa3d5d53a6b38718510cce2b91a8199303891
Headers show
Series rcu: Reduce the frequency of triggering irq-work for strict grace period | expand

Commit Message

Zqiang Aug. 8, 2022, 2:26 a.m. UTC
For kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y,
irq-work is currently triggered at high frequency to make the
scheduler re-evaluate and invoke hooks that check whether there are
quiescent states to report when an RCU read-side critical section is
exited in irq-disabled context. However, if there is no grace period
in progress, triggering this irq-work is pointless.

This commit reduces the irq-work trigger frequency in
RCU_STRICT_GRACE_PERIOD=y kernels by checking whether the current CPU
has not yet passed through a quiescent state or a task is blocking
the current grace period, and triggering the irq-work only in those
cases.

The test results are as follows:

echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
insmod rcutorture.ko
sleep 20
rmmod rcutorture.ko
echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
echo > /sys/kernel/debug/tracing/set_ftrace_filter

head /sys/kernel/debug/tracing/trace_stat/function*

original:

==> /sys/kernel/debug/tracing/trace_stat/function0 <==
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  rcu_preempt_deferred_qs_handle      838746    182650.3 us     0.217 us        0.004 us

==> /sys/kernel/debug/tracing/trace_stat/function1 <==
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  rcu_preempt_deferred_qs_handle      841768    191138.6 us     0.227 us        0.024 us

==> /sys/kernel/debug/tracing/trace_stat/function2 <==
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  rcu_preempt_deferred_qs_handle      828243    180455.6 us     0.217 us        0.004 us

==> /sys/kernel/debug/tracing/trace_stat/function3 <==
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  rcu_preempt_deferred_qs_handle      810258    189158.1 us     0.233 us        0.023 us

apply patch:

==> /sys/kernel/debug/tracing/trace_stat/function0 <==
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  rcu_preempt_deferred_qs_handle      302373    67434.95 us     0.223 us        0.001 us

==> /sys/kernel/debug/tracing/trace_stat/function1 <==
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  rcu_preempt_deferred_qs_handle      307174    68398.34 us     0.222 us        0.002 us

==> /sys/kernel/debug/tracing/trace_stat/function2 <==
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  rcu_preempt_deferred_qs_handle      250910    56157.42 us     0.223 us        0.002 us

==> /sys/kernel/debug/tracing/trace_stat/function3 <==
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  rcu_preempt_deferred_qs_handle      279902    62644.64 us     0.223 us        0.003 us

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
 kernel/rcu/tree_plugin.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Paul E. McKenney Aug. 8, 2022, 6:03 p.m. UTC | #1
On Mon, Aug 08, 2022 at 10:26:26AM +0800, Zqiang wrote:
> For kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y,
> irq-work is currently triggered at high frequency to make the
> scheduler re-evaluate and invoke hooks that check whether there are
> quiescent states to report when an RCU read-side critical section is
> exited in irq-disabled context. However, if there is no grace period
> in progress, triggering this irq-work is pointless.
> 
> This commit reduces the irq-work trigger frequency in
> RCU_STRICT_GRACE_PERIOD=y kernels by checking whether the current CPU
> has not yet passed through a quiescent state or a task is blocking
> the current grace period, and triggering the irq-work only in those
> cases.
> 
> The test results are as follows:
> 
> echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
> echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
> insmod rcutorture.ko
> sleep 20
> rmmod rcutorture.ko
> echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
> echo > /sys/kernel/debug/tracing/set_ftrace_filter
> 
> head /sys/kernel/debug/tracing/trace_stat/function*
> 
> original:
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function0 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      838746    182650.3 us     0.217 us        0.004 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function1 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      841768    191138.6 us     0.227 us        0.024 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function2 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      828243    180455.6 us     0.217 us        0.004 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function3 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      810258    189158.1 us     0.233 us        0.023 us
> 
> apply patch:
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function0 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      302373    67434.95 us     0.223 us        0.001 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function1 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      307174    68398.34 us     0.222 us        0.002 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function2 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      250910    56157.42 us     0.223 us        0.002 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function3 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      279902    62644.64 us     0.223 us        0.003 us
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>

Looks good, queued for testing and review, thank you!

Please check the version shown below for the usual wordsmithing.

							Thanx, Paul

------------------------------------------------------------------------

commit 1e5aa3d5d53a6b38718510cce2b91a8199303891
Author: Zqiang <qiang1.zhang@intel.com>
Date:   Mon Aug 8 10:26:26 2022 +0800

    rcu: Avoid triggering strict-GP irq-work when RCU is idle
    
    Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger
    irq-work from rcu_read_unlock(), and the resulting irq-work handler
    invokes rcu_preempt_deferred_qs_handler().  The point of this triggering
    is to force grace periods to end quickly in order to give tools like KASAN
    a better chance of detecting RCU usage bugs such as leaking RCU-protected
    pointers out of an RCU read-side critical section.
    
    However, this irq-work triggering is unconditional.  This works, but
    there is no point in doing this irq-work unless the current grace period
    is waiting on the running CPU or task, which is not the common case.
    After all, in the common case there are many rcu_read_unlock() calls
    per CPU per grace period.
    
    This commit therefore triggers the irq-work only when the current grace
    period is waiting on the running CPU or task.
    
    This change was tested as follows on a four-CPU system:
    
            echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
            echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
            insmod rcutorture.ko
            sleep 20
            rmmod rcutorture.ko
            echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
            echo > /sys/kernel/debug/tracing/set_ftrace_filter
    
    This procedure produces results in this per-CPU set of files:
    
            /sys/kernel/debug/tracing/trace_stat/function*
    
    Sample output from one of these files is as follows:
    
      Function                               Hit    Time            Avg             s^2
      --------                               ---    ----            ---             ---
      rcu_preempt_deferred_qs_handle      838746    182650.3 us     0.217 us        0.004 us
    
    The baseline sum of the "Hit" values (the number of calls to this
    function) was 3,319,015.  With this commit, that sum was 1,140,359,
    for a 2.9x reduction.  The worst-case variance across the CPUs was less
    than 25%, so this large effect size is statistically significant.
    
    The raw data is available in the Link: URL.
    
    Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/
    Signed-off-by: Zqiang <qiang1.zhang@intel.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b76076014e126..1514909cf905f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -641,7 +641,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
 
 		expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
 			   (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
-			   IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
+			   (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
+			   ((rdp->grpmask & READ_ONCE(rnp->qsmask)) || t->rcu_blocked_node)) ||
 			   (IS_ENABLED(CONFIG_RCU_BOOST) && irqs_were_disabled &&
 			    t->rcu_blocked_node);
 		// Need to defer quiescent state until everything is enabled.
Zqiang Aug. 9, 2022, 8:14 a.m. UTC | #2
On Mon, Aug 08, 2022 at 10:26:26AM +0800, Zqiang wrote:
> For kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y,
> irq-work is currently triggered at high frequency to make the
> scheduler re-evaluate and invoke hooks that check whether there are
> quiescent states to report when an RCU read-side critical section is
> exited in irq-disabled context. However, if there is no grace period
> in progress, triggering this irq-work is pointless.
> 
> This commit reduces the irq-work trigger frequency in
> RCU_STRICT_GRACE_PERIOD=y kernels by checking whether the current CPU
> has not yet passed through a quiescent state or a task is blocking
> the current grace period, and triggering the irq-work only in those
> cases.
> 
> The test results are as follows:
> 
> echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
> echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
> insmod rcutorture.ko
> sleep 20
> rmmod rcutorture.ko
> echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
> echo > /sys/kernel/debug/tracing/set_ftrace_filter
> 
> head /sys/kernel/debug/tracing/trace_stat/function*
> 
> original:
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function0 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      838746    182650.3 us     0.217 us        0.004 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function1 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      841768    191138.6 us     0.227 us        0.024 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function2 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      828243    180455.6 us     0.217 us        0.004 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function3 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      810258    189158.1 us     0.233 us        0.023 us
> 
> apply patch:
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function0 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      302373    67434.95 us     0.223 us        0.001 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function1 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      307174    68398.34 us     0.222 us        0.002 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function2 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      250910    56157.42 us     0.223 us        0.002 us
> 
> ==> /sys/kernel/debug/tracing/trace_stat/function3 <==
>   Function                               Hit    Time            Avg             s^2
>   --------                               ---    ----            ---             ---
>   rcu_preempt_deferred_qs_handle      279902    62644.64 us     0.223 us        0.003 us
> 
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>
> Looks good, queued for testing and review, thank you!
>
> Please check the version shown below for the usual wordsmithing.

Thank you for a more detailed description.

Thanks
Zqiang


Patch

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index b76076014e12..1514909cf905 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -641,7 +641,8 @@  static void rcu_read_unlock_special(struct task_struct *t)
 
 		expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
 			   (rdp->grpmask & READ_ONCE(rnp->expmask)) ||
-			   IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
+			   (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
+			   ((rdp->grpmask & READ_ONCE(rnp->qsmask)) || t->rcu_blocked_node)) ||
 			   (IS_ENABLED(CONFIG_RCU_BOOST) && irqs_were_disabled &&
 			    t->rcu_blocked_node);
 		// Need to defer quiescent state until everything is enabled.