| Message ID | 20230420051946.7463-8-yury.norov@gmail.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | sched/topology: add for_each_numa_cpu() macro |

| Context | Check | Description |
|---|---|---|
| netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
On 19/04/23 22:19, Yury Norov wrote:
> +	for (node = 0; node < sched_domains_numa_levels; node++) {
> +		unsigned int hop, c = 0;
> +
> +		rcu_read_lock();
> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> +		rcu_read_unlock();
> +	}

I'm not fond of the export of sched_domains_numa_levels, especially
considering it's just there for tests.

Furthermore, is there any value in testing parity with
cpumask_local_spread()? Rather, shouldn't we check that using this API
does yield CPUs of increasing NUMA distance?

Something like:

	for_each_node(node) {
		unsigned int prev_cpu, hop = 0;

		cpu = cpumask_first(cpumask_of_node(node));
		prev_cpu = cpu;

		rcu_read_lock();

		/* Assert distance is monotonically increasing */
		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
			expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
			prev_cpu = cpu;
		}

		rcu_read_unlock();
	}
Hi Valentin,

Thanks for the review!

On Mon, Apr 24, 2023 at 06:09:52PM +0100, Valentin Schneider wrote:
> On 19/04/23 22:19, Yury Norov wrote:
> > +	for (node = 0; node < sched_domains_numa_levels; node++) {
> > +		unsigned int hop, c = 0;
> > +
> > +		rcu_read_lock();
> > +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> > +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> > +		rcu_read_unlock();
> > +	}
>
> I'm not fond of the export of sched_domains_numa_levels, especially
> considering it's just there for tests.
>
> Furthermore, is there any value in testing parity with
> cpumask_local_spread()?

I wanted to emphasize that the new NUMA-aware functions are coherent
with each other, just like find_nth_bit() is coherent with
find_next_bit().

But all that coherence matters only in the non-NUMA case, because
client code may depend on the fact that the next CPU is never less
than the current one. That doesn't hold for the NUMA iterators
anyway...

> Rather, shouldn't we check that using this API does
> yield CPUs of increasing NUMA distance?
>
> Something like:
>
>	for_each_node(node) {
>		unsigned int prev_cpu, hop = 0;
>
>		cpu = cpumask_first(cpumask_of_node(node));
>		prev_cpu = cpu;
>
>		rcu_read_lock();
>
>		/* Assert distance is monotonically increasing */
>		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>			expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
>			prev_cpu = cpu;
>		}
>
>		rcu_read_unlock();
>	}

Your version of the test looks more straightforward. I need to think
about it some more, but it looks like I can take it in v3.

Thanks,
Yury
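To make the ordering caveat above concrete: a distance-ordered walk can legitimately step backwards in CPU-id space, which is exactly what rules out the "next CPU is never less than the current one" assumption. The standalone userspace sketch below is not kernel code, and the interleaved two-node layout is hypothetical; it only models the hop-ordered visiting order that for_each_numa_cpu() produces, starting from node 1:

```c
#include <stdio.h>

#define NR_CPUS 8

/* Hypothetical layout: CPU ids interleaved across two NUMA nodes */
static const int cpu_node[NR_CPUS] = { 0, 1, 0, 1, 0, 1, 0, 1 };

int main(void)
{
	/*
	 * A hop-ordered walk from node 1 visits local CPUs first,
	 * then remote ones - a toy model of for_each_numa_cpu().
	 */
	for (int hop = 0; hop < 2; hop++)
		for (int c = 0; c < NR_CPUS; c++)
			if ((cpu_node[c] == 1) == (hop == 0))
				printf("%d ", c);
	printf("\n");	/* prints: 1 3 5 7 0 2 4 6 */
	return 0;
}
```

The id sequence drops from 7 back to 0 mid-walk, so monotonic ids are not something NUMA-aware iteration can promise.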
On 25/04/23 22:50, Yury Norov wrote:
> Hi Valentin,
>
> Thanks for the review!
>
> On Mon, Apr 24, 2023 at 06:09:52PM +0100, Valentin Schneider wrote:
>> On 19/04/23 22:19, Yury Norov wrote:
>> > +	for (node = 0; node < sched_domains_numa_levels; node++) {
>> > +		unsigned int hop, c = 0;
>> > +
>> > +		rcu_read_lock();
>> > +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
>> > +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
>> > +		rcu_read_unlock();
>> > +	}
>>
>> I'm not fond of the export of sched_domains_numa_levels, especially
>> considering it's just there for tests.
>>
>> Furthermore, is there any value in testing parity with
>> cpumask_local_spread()?
>
> I wanted to emphasize that the new NUMA-aware functions are coherent
> with each other, just like find_nth_bit() is coherent with
> find_next_bit().
>
> But all that coherence matters only in the non-NUMA case, because
> client code may depend on the fact that the next CPU is never less
> than the current one. That doesn't hold for the NUMA iterators
> anyway...
>

Ah right, I see your point. But yes, distance-ordered walks break this
assumption.

>> Rather, shouldn't we check that using this API does
>> yield CPUs of increasing NUMA distance?
>>
>> Something like:
>>
>>	for_each_node(node) {
>>		unsigned int prev_cpu, hop = 0;
>>
>>		cpu = cpumask_first(cpumask_of_node(node));
>>		prev_cpu = cpu;
>>
>>		rcu_read_lock();
>>
>>		/* Assert distance is monotonically increasing */
>>		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>>			expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
>>			prev_cpu = cpu;
>>		}
>>
>>		rcu_read_unlock();
>>	}
>
> Your version of the test looks more straightforward. I need to think
> about it some more, but it looks like I can take it in v3.
>

I realized I only wrote half the relevant code - comparing node IDs is
meaningless, I meant to compare distances as we walk through the
CPUs... I tested the below against a few NUMA topologies and it seems
to be sane:

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 6becb044a66f0..8f8512d139d58 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -174,11 +174,23 @@ __check_eq_str(const char *srcfile, unsigned int line,
 	return eq;
 }
 
-#define __expect_eq(suffix, ...)					\
+static bool __init
+__check_ge_uint(const char *srcfile, unsigned int line,
+		const unsigned int a, unsigned int b)
+{
+	if (a < b) {
+		pr_err("[%s:%u] expected a(%u) >= b(%u)\n",
+			srcfile, line, a, b);
+		return false;
+	}
+	return true;
+}
+
+#define __expect_op(op, suffix, ...)					\
 ({									\
 	int result = 0;							\
 	total_tests++;							\
-	if (!__check_eq_ ## suffix(__FILE__, __LINE__,			\
+	if (!__check_## op ## _ ## suffix(__FILE__, __LINE__,		\
 				   ##__VA_ARGS__)) {			\
 		failed_tests++;						\
 		result = 1;						\
@@ -186,6 +198,9 @@ __check_eq_str(const char *srcfile, unsigned int line,
 	result;								\
 })
 
+#define __expect_eq(suffix, ...)	__expect_op(eq, suffix, ##__VA_ARGS__)
+#define __expect_ge(suffix, ...)	__expect_op(ge, suffix, ##__VA_ARGS__)
+
 #define expect_eq_uint(...)		__expect_eq(uint, ##__VA_ARGS__)
 #define expect_eq_bitmap(...)		__expect_eq(bitmap, ##__VA_ARGS__)
 #define expect_eq_pbl(...)		__expect_eq(pbl, ##__VA_ARGS__)
@@ -193,6 +208,8 @@ __check_eq_str(const char *srcfile, unsigned int line,
 #define expect_eq_clump8(...)		__expect_eq(clump8, ##__VA_ARGS__)
 #define expect_eq_str(...)		__expect_eq(str, ##__VA_ARGS__)
 
+#define expect_ge_uint(...)		__expect_ge(uint, ##__VA_ARGS__)
+
 static void __init test_zero_clear(void)
 {
 	DECLARE_BITMAP(bmap, 1024);
@@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
 {
 	unsigned int cpu, node;
 
-	for (node = 0; node < sched_domains_numa_levels; node++) {
-		unsigned int hop, c = 0;
+	for_each_node(node) {
+		unsigned int start_cpu, prev_dist, hop = 0;
+
+		cpu = cpumask_first(cpumask_of_node(node));
+		prev_dist = node_distance(node, node);
+		start_cpu = cpu;
 
 		rcu_read_lock();
-		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
-			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+
+		/* Assert distance is monotonically increasing */
+		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
+			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));
+
+			expect_ge_uint(dist, prev_dist);
+			prev_dist = dist;
+		}
+
 		rcu_read_unlock();
 	}
 }
> I realized I only wrote half the relevant code - comparing node IDs is
> meaningless, I meant to compare distances as we walk through the
> CPUs... I tested the below against a few NUMA topologies and it seems
> to be sane:
>
> @@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
>  {
>  	unsigned int cpu, node;
>  
> -	for (node = 0; node < sched_domains_numa_levels; node++) {
> -		unsigned int hop, c = 0;
> +	for_each_node(node) {
> +		unsigned int start_cpu, prev_dist, hop = 0;
> +
> +		cpu = cpumask_first(cpumask_of_node(node));
> +		prev_dist = node_distance(node, node);
> +		start_cpu = cpu;
>  
>  		rcu_read_lock();
> -		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> -			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> +
> +		/* Assert distance is monotonically increasing */
> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
> +			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));

Interestingly, node_distance() is an arch-specific function. The
generic implementation is quite useless:

	#define node_distance(from,to) ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

Particularly, arm64 takes the above. With node_distance() implemented
like that, we can barely test anything...

Taking that into account, I think it's better to test the iterator
against cpumask_local_spread(), like in v2. I'll add a comment about
that in v3.

> +
> +			expect_ge_uint(dist, prev_dist);
> +			prev_dist = dist;
> +		}
> +
>  		rcu_read_unlock();
>  	}
>  }
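The point about the fallback is worth spelling out: with the generic macro, every off-node distance is the same constant, so the monotonicity assertion can only ever observe a single 10 -> 20 step. A standalone userspace illustration (not kernel code; LOCAL_DISTANCE and REMOTE_DISTANCE are the kernel's generic 10/20 values from include/linux/topology.h):

```c
#include <stdio.h>

/* Generic values from include/linux/topology.h */
#define LOCAL_DISTANCE	10
#define REMOTE_DISTANCE	20

/* The generic fallback quoted above */
#define node_distance(from, to) \
	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

int main(void)
{
	/*
	 * On a 4-node box the fallback "matrix" degenerates to
	 * local-vs-remote: every row is one 10 surrounded by 20s,
	 * so a monotonicity test has almost nothing to check.
	 */
	for (int from = 0; from < 4; from++) {
		for (int to = 0; to < 4; to++)
			printf("%3d", node_distance(from, to));
		printf("\n");
	}
	return 0;
}
```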
On 26/04/23 13:51, Yury Norov wrote:
>> I realized I only wrote half the relevant code - comparing node IDs is
>> meaningless, I meant to compare distances as we walk through the
>> CPUs... I tested the below against a few NUMA topologies and it seems
>> to be sane:
>>
>> @@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
>>  {
>>  	unsigned int cpu, node;
>>  
>> -	for (node = 0; node < sched_domains_numa_levels; node++) {
>> -		unsigned int hop, c = 0;
>> +	for_each_node(node) {
>> +		unsigned int start_cpu, prev_dist, hop = 0;
>> +
>> +		cpu = cpumask_first(cpumask_of_node(node));
>> +		prev_dist = node_distance(node, node);
>> +		start_cpu = cpu;
>>  
>>  		rcu_read_lock();
>> -		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
>> -			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
>> +
>> +		/* Assert distance is monotonically increasing */
>> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>> +			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));
>
> Interestingly, node_distance() is an arch-specific function. The
> generic implementation is quite useless:
>
>	#define node_distance(from,to) ((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>
> Particularly, arm64 takes the above. With node_distance() implemented
> like that, we can barely test anything...
>

riscv and arm64 rely on drivers/base/arch_numa.c to provide
__node_distance() (cf. CONFIG_GENERIC_ARCH_NUMA).

x86, sparc, powerpc and ia64 define __node_distance().

loongarch and mips define their own node_distance().

So all of those archs will have a usable node_distance(); the others
won't, and there's not much the scheduler can do about that - it relies
on node_distance() to understand the topology!
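For context on why this matters to the iterator itself: the hop levels that for_each_numa_cpu() walks are derived from the set of distinct node_distance() values the scheduler collects at boot (in sched_init_numa()), so a degenerate distance function collapses the walk to "local node, then everything else". Below is a standalone userspace sketch of that derivation, using a made-up 4-node SLIT-style matrix (illustrative only; the real logic lives in kernel/sched/topology.c):

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_NODES 4

/* Made-up distance matrix for a 4-node, two-socket-like system */
static const int dist[NR_NODES][NR_NODES] = {
	{ 10, 12, 20, 22 },
	{ 12, 10, 22, 20 },
	{ 20, 22, 10, 12 },
	{ 22, 20, 12, 10 },
};

int main(void)
{
	int levels[NR_NODES * NR_NODES];
	int nr_levels = 0;

	/* Each distinct distance becomes one "hop" level of the walk */
	for (int i = 0; i < NR_NODES; i++) {
		for (int j = 0; j < NR_NODES; j++) {
			bool seen = false;

			for (int k = 0; k < nr_levels; k++)
				seen |= (levels[k] == dist[i][j]);
			if (!seen)
				levels[nr_levels++] = dist[i][j];
		}
	}

	/* 4 levels here; only 2 under the generic LOCAL/REMOTE fallback */
	printf("%d hop levels\n", nr_levels);
	return 0;
}
```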
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index a8005ad3bd58..1b5f805f6879 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -12,6 +12,7 @@
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/topology.h>
 #include <linux/uaccess.h>
 
 #include "../tools/testing/selftests/kselftest_module.h"
@@ -751,6 +752,33 @@ static void __init test_for_each_set_bit_wrap(void)
 	}
 }
 
+static void __init test_for_each_numa(void)
+{
+	unsigned int cpu, node;
+
+	for (node = 0; node < sched_domains_numa_levels; node++) {
+		const struct cpumask *m, *p = cpu_none_mask;
+		unsigned int c = 0;
+
+		rcu_read_lock();
+		for_each_numa_hop_mask(m, node) {
+			for_each_cpu_andnot(cpu, m, p)
+				expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+			p = m;
+		}
+		rcu_read_unlock();
+	}
+
+	for (node = 0; node < sched_domains_numa_levels; node++) {
+		unsigned int hop, c = 0;
+
+		rcu_read_lock();
+		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
+			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+		rcu_read_unlock();
+	}
+}
+
 static void __init test_for_each_set_bit(void)
 {
 	DECLARE_BITMAP(orig, 500);
@@ -1237,6 +1265,7 @@ static void __init selftest(void)
 	test_for_each_clear_bitrange_from();
 	test_for_each_set_clump8();
 	test_for_each_set_bit_wrap();
+	test_for_each_numa();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
The test ensures that the enumerators' output is consistent with
cpumask_local_spread().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/test_bitmap.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)