Message ID | 20240308173409.335345-1-urezki@gmail.com |
---|---|
Series | Reduce synchronize_rcu() latency(v6) |
On Fri, Mar 08, 2024 at 06:34:03PM +0100, Uladzislau Rezki (Sony) wrote:
> This is v6. It is based on Paul's "dev" branch:
>
> HEAD: f1bfe538c7970283040a7188a291aca9f18f0c42
>
> Please note that the patches should be applied from scratch,
> i.e. v5 has to be dropped from "dev".
>
> v5 -> v6:
> - Fix a race due to releasing a wait-head from the gp-kthread;
> - Use our own private workqueue with WQ_MEM_RECLAIM to have
>   at least one execution context.
>
> v5: https://lore.kernel.org/lkml/20240220183115.74124-1-urezki@gmail.com/
> v4: https://lore.kernel.org/lkml/ZZ2bi5iPwXLgjB-f@google.com/T/
> v3: https://lore.kernel.org/lkml/cd45b0b5-f86b-43fb-a5f3-47d340cd4f9f@paulmck-laptop/T/
> v2: https://lore.kernel.org/all/20231030131254.488186-1-urezki@gmail.com/T/
> v1: https://lore.kernel.org/lkml/20231025140915.590390-1-urezki@gmail.com/T/

Queued in place of your earlier series, thank you!

Not urgent, but which rcutorture scenario should be pressed into service
testing this?

							Thanx, Paul

> Uladzislau Rezki (Sony) (6):
>   rcu: Add data structures for synchronize_rcu()
>   rcu: Reduce synchronize_rcu() latency
>   rcu: Add a trace event for synchronize_rcu_normal()
>   rcu: Support direct wake-up of synchronize_rcu() users
>   rcu: Do not release a wait-head from a GP kthread
>   rcu: Allocate WQ with WQ_MEM_RECLAIM bit set
>
>  .../admin-guide/kernel-parameters.txt |  14 +
>  include/trace/events/rcu.h            |  27 ++
>  kernel/rcu/tree.c                     | 361 +++++++++++++++++-
>  kernel/rcu/tree.h                     |  20 +
>  kernel/rcu/tree_exp.h                 |   2 +-
>  5 files changed, 422 insertions(+), 2 deletions(-)
>
> --
> 2.39.2
>
On Fri, Mar 08, 2024 at 01:51:29PM -0800, Paul E. McKenney wrote:
> On Fri, Mar 08, 2024 at 06:34:03PM +0100, Uladzislau Rezki (Sony) wrote:
> > This is v6. It is based on Paul's "dev" branch:
> > [...]
>
> Queued in place of your earlier series, thank you!
>
Thank you!

>
> Not urgent, but which rcutorture scenario should be pressed into service
> testing this?
>
I tested with the '5*TREE01 5*TREE02 5*TREE03 5*TREE04' scenario list;
apart from that, I used some private test cases. The
rcutree.rcu_normal_wake_from_gp=1 boot parameter has to be passed as well.

Also, "rcuscale" can be used to stress the "cur_ops->sync()" path:

<snip>
#! /usr/bin/env bash

LOOPS=1

for (( i=0; i<$LOOPS; i++ )); do
	tools/testing/selftests/rcutorture/bin/kvm.sh --memory 10G --torture rcuscale \
	--allcpus \
	--kconfig CONFIG_NR_CPUS=64 \
	--kconfig CONFIG_RCU_NOCB_CPU=y \
	--kconfig CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y \
	--kconfig CONFIG_RCU_LAZY=n \
	--bootargs "rcuscale.nwriters=200 rcuscale.nreaders=220 rcuscale.minruntime=50000 \
	torture.disable_onoff_at_boot rcutree.rcu_normal_wake_from_gp=1" --trust-make
	echo "Done $i"
done
<snip>

--
Uladzislau Rezki
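For reference, the rcutorture run described above could be expressed as a single kvm.sh invocation. This is a minimal sketch, not the exact command used in the thread: it assumes it is run from the top of a kernel tree and relies on kvm.sh's --configs, --bootargs, and --trust-make options. The wrapper name run_tree_scenarios and the KVM_SH override (handy for a dry run against a stub) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around rcutorture's kvm.sh: run five instances
# each of TREE01..TREE04 with the new synchronize_rcu() wake-up path
# enabled via rcutree.rcu_normal_wake_from_gp=1.
# KVM_SH is a hypothetical override, e.g. pointing at a stub for a dry run.
KVM_SH=${KVM_SH:-tools/testing/selftests/rcutorture/bin/kvm.sh}

run_tree_scenarios() {
	# Any extra arguments (e.g. --duration) are passed through to kvm.sh.
	"$KVM_SH" \
		--configs '5*TREE01 5*TREE02 5*TREE03 5*TREE04' \
		--bootargs 'rcutree.rcu_normal_wake_from_gp=1' \
		--trust-make \
		"$@"
}

# Usage, from the top of a kernel tree:
#   run_tree_scenarios --duration 30
```

Keeping the boot parameter in the wrapper rather than in the scenario files leaves the stock TREE0x scenarios unchanged for other testing.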
On Mon, Mar 11, 2024 at 09:43:51AM +0100, Uladzislau Rezki wrote:
> On Fri, Mar 08, 2024 at 01:51:29PM -0800, Paul E. McKenney wrote:
> > On Fri, Mar 08, 2024 at 06:34:03PM +0100, Uladzislau Rezki (Sony) wrote:
> > > This is v6. It is based on Paul's "dev" branch:
> > > [...]
> >
> > Queued in place of your earlier series, thank you!
> >
> Thank you!
>
> >
> > Not urgent, but which rcutorture scenario should be pressed into service
> > testing this?
> >
> I tested with the '5*TREE01 5*TREE02 5*TREE03 5*TREE04' scenario list;
> apart from that, I used some private test cases. The
> rcutree.rcu_normal_wake_from_gp=1 boot parameter has to be passed as well.
>
> Also, "rcuscale" can be used to stress the "cur_ops->sync()" path:
>
> <snip>
> [...]
> <snip>

Very good, thank you!

Of those five options (TREE01, TREE02, TREE03, TREE04, and rcuscale),
which one should be changed so that my own testing automatically covers
the rcutree.rcu_normal_wake_from_gp=1 case? I would guess that we should
leave out TREE03, since it covers tall rcu_node trees. TREE01 looks
closest to the ChromeOS/Android use case, but you tell me!

And it might be time to rework the test cases to better align with the
use cases. For example, I created TREE10 to cover Meta's fleet. But
ChromeOS and Android have relatively small numbers of CPUs, so it should
be possible to rework things a bit so that one of the existing tests
covers that case, while modifying other tests to take up any situations
that these changes exclude.

Thoughts?

							Thanx, Paul
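One way to act on the question above would be to add the parameter to the chosen scenario's boot-argument file, so that every run of that scenario exercises the new path. This is a minimal sketch under stated assumptions: TREE01 is used only as an example of the scenario that might be picked, its boot arguments are assumed to live in tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot, and the add_boot_arg helper is hypothetical. It appends the parameter only if it is not already present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a kernel boot parameter to an rcutorture
# scenario's .boot file unless it is already there.
add_boot_arg() {
	local file=$1 arg=$2

	if [ ! -f "$file" ]; then
		echo "add_boot_arg: no such file: $file" >&2
		return 1
	fi
	# -F: literal match (the "." in the name is not a regex wildcard);
	# -w: whole-word match, so "...=1" does not match "...=10".
	if grep -qwF -- "$arg" "$file"; then
		return 0
	fi
	# Keep the parameter on its own line even if the file lacks a
	# trailing newline.
	if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
		echo >> "$file"
	fi
	printf '%s\n' "$arg" >> "$file"
}

# Usage, from the top of a kernel tree (TREE01 is only an example):
#   add_boot_arg tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot \
#       rcutree.rcu_normal_wake_from_gp=1
```

Because the helper is idempotent, rerunning it leaves the .boot file unchanged, which matters if it ends up in a setup script rather than being applied once as a patch.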