Message ID | 20230915154438.82931-2-ilpo.jarvinen@linux.intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | selftests/resctrl: Fixes to failing tests | expand |
Hi Ilpo, On 9/15/2023 8:44 AM, Ilpo Järvinen wrote: > Unmounting resctrl FS has been moved into the per test functions in > resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move > resctrl FS mount/umount to higher level"). In case a signal (SIGINT, > SIGTERM, or SIGHUP) is received, the running selftest is aborted by > ctrlc_handler() which then unmounts resctrl fs before exiting. The > current section between signal_handler_register() and > signal_handler_unregister(), however, does not cover the entire > duration when resctrl FS is mounted. > > Move signal_handler_register() and signal_handler_unregister() calls > from per test files into resctrl_tests.c to properly unmount resctrl > fs. In order to not add signal_handler_register()/unregister() n times, > create helpers test_prepare() and test_cleanup(). > > Adjust child process kill() call in ctrlc_handler() to only be invoked > if the child was already forked. > > Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> > Cc: <stable@vger.kernel.org> > --- > tools/testing/selftests/resctrl/cat_test.c | 8 --- > .../testing/selftests/resctrl/resctrl_tests.c | 65 +++++++++++-------- > tools/testing/selftests/resctrl/resctrl_val.c | 22 +++---- > 3 files changed, 48 insertions(+), 47 deletions(-) > > diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c > index 97b87285ab2a..224ba8544d8a 100644 > --- a/tools/testing/selftests/resctrl/cat_test.c > +++ b/tools/testing/selftests/resctrl/cat_test.c > @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) > strcpy(param.filename, RESULT_FILE_NAME1); > param.num_of_runs = 0; > param.cpu_no = sibling_cpu_no; > - } else { > - ret = signal_handler_register(); > - if (ret) { > - kill(bm_pid, SIGKILL); > - goto out; > - } > } > > remove(param.filename); > @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) > } > close(pipefd[0]); > kill(bm_pid, SIGKILL); > - signal_handler_unregister(); > } > > -out: > cat_test_cleanup(); > > return ret; > diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c > index 823672a20a43..524ba83d7568 100644 > --- a/tools/testing/selftests/resctrl/resctrl_tests.c > +++ b/tools/testing/selftests/resctrl/resctrl_tests.c > @@ -67,21 +67,41 @@ void tests_cleanup(void) > cat_test_cleanup(); > } > > -static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) > +static int test_prepare() > { > int res; > > - ksft_print_msg("Starting MBM BW change ...\n"); > + res = signal_handler_register(); > + if (res) > + return res; > > res = mount_resctrlfs(); > if (res) { > + signal_handler_unregister(); > ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > - return; > + return res; > } > + return 0; > +} > + > +static void test_cleanup() > +{ > + umount_resctrlfs(); > + signal_handler_unregister(); > +} Thank you for adding these. > + > +static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) > +{ > + int res; > + > + ksft_print_msg("Starting MBM BW change ...\n"); > + > + if (test_prepare()) > + return; > I am not sure about this. With this exit the kselftest machinery is not aware of the test passing or failing. I wonder if there should not rather be a "goto" here that triggers ksft_test_result()? This needs some more thought though. First, with this change test_prepare() officially gains responsibility to determine if a failure is transient (just a single test fails) or permanent (no use trying any other tests if this fails). For the former it would then be up to the caller to call ksft_test_result() and for the latter test_prepare() will call ksft_exit_fail_msg(). Second, that SNC warning may be an inconvenience with a new goto. Here it may be ok to print that message before the test failure? > if (!validate_resctrl_feature_request(MBM_STR) || (get_vendor() != ARCH_INTEL)) { > ksft_test_result_skip("Hardware does not support MBM or MBM is disabled\n"); > - goto umount; > + goto cleanup; > } > > res = mbm_bw_change(cpu_no, benchmark_cmd); > @@ -89,8 +109,8 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) > if ((get_vendor() == ARCH_INTEL) && res) > ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n"); > > -umount: > - umount_resctrlfs(); > +cleanup: > + test_cleanup(); > } > > static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) > @@ -99,22 +119,19 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) > > ksft_print_msg("Starting MBA Schemata change ...\n"); > > - res = mount_resctrlfs(); > - if (res) { > - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > + if (test_prepare()) > return; > - } > > if (!validate_resctrl_feature_request(MBA_STR) || (get_vendor() != ARCH_INTEL)) { > ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n"); > - goto umount; > + goto cleanup; > } > > res = mba_schemata_change(cpu_no, benchmark_cmd); > ksft_test_result(!res, "MBA: schemata change\n"); > > -umount: > - umount_resctrlfs(); > +cleanup: > + test_cleanup(); > } > > static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) > @@ -123,15 +140,12 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) > > ksft_print_msg("Starting CMT test ...\n"); > > - res = mount_resctrlfs(); > - if (res) { > - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > + if (test_prepare()) > return; > - } > > if (!validate_resctrl_feature_request(CMT_STR)) { > ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n"); > - goto umount; > + goto cleanup; > } > > res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd); > @@ -139,8 +153,8 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) > if ((get_vendor() == ARCH_INTEL) && res) > ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n"); > > -umount: > - umount_resctrlfs(); > +cleanup: > + test_cleanup(); > } > > static void run_cat_test(int cpu_no, int no_of_bits) > @@ -149,22 +163,19 @@ static void run_cat_test(int cpu_no, int no_of_bits) > > ksft_print_msg("Starting CAT test ...\n"); > > - res = mount_resctrlfs(); > - if (res) { > - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > + if (test_prepare()) > return; > - } > > if (!validate_resctrl_feature_request(CAT_STR)) { > ksft_test_result_skip("Hardware does not support CAT or CAT is disabled\n"); > - goto umount; > + goto cleanup; > } > > res = cat_perf_miss_val(cpu_no, no_of_bits, "L3"); > ksft_test_result(!res, "CAT: test\n"); > > -umount: > - umount_resctrlfs(); > +cleanup: > + test_cleanup(); > } > > int main(int argc, char **argv) > diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c > index 51963a6f2186..a9fe61133119 100644 > --- a/tools/testing/selftests/resctrl/resctrl_val.c > +++ b/tools/testing/selftests/resctrl/resctrl_val.c > @@ -468,7 +468,9 @@ pid_t bm_pid, ppid; > > void ctrlc_handler(int signum, siginfo_t *info, void *ptr) > { > - kill(bm_pid, SIGKILL); > + /* Only kill child after bm_pid is set after fork() */ > + if (bm_pid) > + kill(bm_pid, SIGKILL); > umount_resctrlfs(); > tests_cleanup(); > ksft_print_msg("Ending\n\n"); > @@ -485,6 +487,8 @@ int signal_handler_register(void) > struct sigaction sigact; > int ret = 0; > > + bm_pid = 0; > + Since this is an initialization fix in this area ... what do you think of also initializing sigact? It could just be a change to struct sigaction sigact = {}; This will prevent registering a signal handler with uninitialized sa_flags. Reinette
Hello > On 9/15/2023 8:44 AM, Ilpo Järvinen wrote: > > Unmounting resctrl FS has been moved into the per test functions in > > resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move > > resctrl FS mount/umount to higher level"). In case a signal (SIGINT, > > SIGTERM, or SIGHUP) is received, the running selftest is aborted by > > ctrlc_handler() which then unmounts resctrl fs before exiting. The > > current section between signal_handler_register() and > > signal_handler_unregister(), however, does not cover the entire > > duration when resctrl FS is mounted. > > > > Move signal_handler_register() and signal_handler_unregister() calls > > from per test files into resctrl_tests.c to properly unmount resctrl > > fs. In order to not add signal_handler_register()/unregister() n > > times, create helpers test_prepare() and test_cleanup(). > > > > Adjust child process kill() call in ctrlc_handler() to only be invoked > > if the child was already forked. > > > > Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount > > to higher level") > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> > > Cc: <stable@vger.kernel.org> > > --- > > tools/testing/selftests/resctrl/cat_test.c | 8 --- > > .../testing/selftests/resctrl/resctrl_tests.c | 65 > > +++++++++++-------- tools/testing/selftests/resctrl/resctrl_val.c | > > 22 +++---- > > 3 files changed, 48 insertions(+), 47 deletions(-) > > > > diff --git a/tools/testing/selftests/resctrl/cat_test.c > > b/tools/testing/selftests/resctrl/cat_test.c > > index 97b87285ab2a..224ba8544d8a 100644 > > --- a/tools/testing/selftests/resctrl/cat_test.c > > +++ b/tools/testing/selftests/resctrl/cat_test.c > > @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char > *cache_type) > > strcpy(param.filename, RESULT_FILE_NAME1); > > param.num_of_runs = 0; > > param.cpu_no = sibling_cpu_no; > > - } else { > > - ret = signal_handler_register(); > > - if (ret) { > > - kill(bm_pid, SIGKILL); > > - goto out; > > - } > > } > > > > remove(param.filename); > > @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char > *cache_type) > > } > > close(pipefd[0]); > > kill(bm_pid, SIGKILL); > > - signal_handler_unregister(); > > } > > > > -out: > > cat_test_cleanup(); > > > > return ret; > > diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c > > b/tools/testing/selftests/resctrl/resctrl_tests.c > > index 823672a20a43..524ba83d7568 100644 > > --- a/tools/testing/selftests/resctrl/resctrl_tests.c > > +++ b/tools/testing/selftests/resctrl/resctrl_tests.c > > @@ -67,21 +67,41 @@ void tests_cleanup(void) > > cat_test_cleanup(); > > } > > > > -static void run_mbm_test(const char * const *benchmark_cmd, int > > cpu_no) > > +static int test_prepare() > > { > > int res; > > > > - ksft_print_msg("Starting MBM BW change ...\n"); > > + res = signal_handler_register(); > > + if (res) > > + return res; > > > > res = mount_resctrlfs(); > > if (res) { > > + signal_handler_unregister(); > > ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > > - return; > > + return res; > > } > > + return 0; > > +} > > + > > +static void test_cleanup() > > +{ > > + umount_resctrlfs(); > > + signal_handler_unregister(); > > +} > > Thank you for adding these. > > > + > > +static void run_mbm_test(const char * const *benchmark_cmd, int > > +cpu_no) { > > + int res; > > + > > + ksft_print_msg("Starting MBM BW change ...\n"); > > + > > + if (test_prepare()) > > + return; > > > > I am not sure about this. With this exit the kselftest machinery is not aware of > the test passing or failing. I wonder if there should not rather be a "goto" here > that triggers ksft_test_result()? This needs some more thought though. First, > with this change test_prepare() officially gains responsibility to determine if a > failure is transient (just a single test > fails) or permanent (no use trying any other tests if this fails). For the former it > would then be up to the caller to call ksft_test_result() and for the latter > test_prepare() will call ksft_exit_fail_msg(). > Second, that SNC warning may be an inconvenience with a new goto. Here it > may be ok to print that message before the test failure? If a failure may be permanent, it may be best to detect it before running all tests, rather than in test_prepare(). Now some detections are completed before running all tests. For example: 273 if (geteuid() != 0) 274 return ksft_exit_skip("Not running as root. Skipping...\n"); 275 276 if (!check_resctrlfs_support()) 277 return ksft_exit_skip("resctrl FS does not exist. Enable X86_CPU_RESCTRL config option.\n"); 278 279 if (umount_resctrlfs()) 280 return ksft_exit_skip("resctrl FS unmount failed.\n"); Best regards, Shaopeng TAN > > if (!validate_resctrl_feature_request(MBM_STR) || (get_vendor() != > ARCH_INTEL)) { > > ksft_test_result_skip("Hardware does not support MBM or > MBM is disabled\n"); > > - goto umount; > > + goto cleanup; > > } > > > > res = mbm_bw_change(cpu_no, benchmark_cmd); @@ -89,8 +109,8 > @@ > > static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) > > if ((get_vendor() == ARCH_INTEL) && res) > > ksft_print_msg("Intel MBM may be inaccurate when > Sub-NUMA > > Clustering is enabled. Check BIOS configuration.\n"); > > > > -umount: > > - umount_resctrlfs(); > > +cleanup: > > + test_cleanup(); > > } > > > > static void run_mba_test(const char * const *benchmark_cmd, int > > cpu_no) @@ -99,22 +119,19 @@ static void run_mba_test(const char * > > const *benchmark_cmd, int cpu_no) > > > > ksft_print_msg("Starting MBA Schemata change ...\n"); > > > > - res = mount_resctrlfs(); > > - if (res) { > > - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > > + if (test_prepare()) > > return; > > - } > > > > if (!validate_resctrl_feature_request(MBA_STR) || (get_vendor() != > ARCH_INTEL)) { > > ksft_test_result_skip("Hardware does not support MBA or > MBA is disabled\n"); > > - goto umount; > > + goto cleanup; > > } > > > > res = mba_schemata_change(cpu_no, benchmark_cmd); > > ksft_test_result(!res, "MBA: schemata change\n"); > > > > -umount: > > - umount_resctrlfs(); > > +cleanup: > > + test_cleanup(); > > } > > > > static void run_cmt_test(const char * const *benchmark_cmd, int > > cpu_no) @@ -123,15 +140,12 @@ static void run_cmt_test(const char * > > const *benchmark_cmd, int cpu_no) > > > > ksft_print_msg("Starting CMT test ...\n"); > > > > - res = mount_resctrlfs(); > > - if (res) { > > - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > > + if (test_prepare()) > > return; > > - } > > > > if (!validate_resctrl_feature_request(CMT_STR)) { > > ksft_test_result_skip("Hardware does not support CMT or > CMT is disabled\n"); > > - goto umount; > > + goto cleanup; > > } > > > > res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd); @@ -139,8 +153,8 > @@ > > static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) > > if ((get_vendor() == ARCH_INTEL) && res) > > ksft_print_msg("Intel CMT may be inaccurate when > Sub-NUMA > > Clustering is enabled. Check BIOS configuration.\n"); > > > > -umount: > > - umount_resctrlfs(); > > +cleanup: > > + test_cleanup(); > > } > > > > static void run_cat_test(int cpu_no, int no_of_bits) @@ -149,22 > > +163,19 @@ static void run_cat_test(int cpu_no, int no_of_bits) > > > > ksft_print_msg("Starting CAT test ...\n"); > > > > - res = mount_resctrlfs(); > > - if (res) { > > - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > > + if (test_prepare()) > > return; > > - } > > > > if (!validate_resctrl_feature_request(CAT_STR)) { > > ksft_test_result_skip("Hardware does not support CAT or CAT > is disabled\n"); > > - goto umount; > > + goto cleanup; > > } > > > > res = cat_perf_miss_val(cpu_no, no_of_bits, "L3"); > > ksft_test_result(!res, "CAT: test\n"); > > > > -umount: > > - umount_resctrlfs(); > > +cleanup: > > + test_cleanup(); > > } > > > > int main(int argc, char **argv) > > diff --git a/tools/testing/selftests/resctrl/resctrl_val.c > > b/tools/testing/selftests/resctrl/resctrl_val.c > > index 51963a6f2186..a9fe61133119 100644 > > --- a/tools/testing/selftests/resctrl/resctrl_val.c > > +++ b/tools/testing/selftests/resctrl/resctrl_val.c > > @@ -468,7 +468,9 @@ pid_t bm_pid, ppid; > > > > void ctrlc_handler(int signum, siginfo_t *info, void *ptr) { > > - kill(bm_pid, SIGKILL); > > + /* Only kill child after bm_pid is set after fork() */ > > + if (bm_pid) > > + kill(bm_pid, SIGKILL); > > umount_resctrlfs(); > > tests_cleanup(); > > ksft_print_msg("Ending\n\n"); > > @@ -485,6 +487,8 @@ int signal_handler_register(void) > > struct sigaction sigact; > > int ret = 0; > > > > + bm_pid = 0; > > + > > Since this is an initialization fix in this area ... what do you think of also > initializing sigact? It could just be a change to > struct sigaction sigact = {}; > > This will prevent registering a signal handler with uninitialized sa_flags. > > Reinette
On Tue, 26 Sep 2023, Reinette Chatre wrote: > On 9/15/2023 8:44 AM, Ilpo Järvinen wrote: > > Unmounting resctrl FS has been moved into the per test functions in > > resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move > > resctrl FS mount/umount to higher level"). In case a signal (SIGINT, > > SIGTERM, or SIGHUP) is received, the running selftest is aborted by > > ctrlc_handler() which then unmounts resctrl fs before exiting. The > > current section between signal_handler_register() and > > signal_handler_unregister(), however, does not cover the entire > > duration when resctrl FS is mounted. > > > > Move signal_handler_register() and signal_handler_unregister() calls > > from per test files into resctrl_tests.c to properly unmount resctrl > > fs. In order to not add signal_handler_register()/unregister() n times, > > create helpers test_prepare() and test_cleanup(). > > > > Adjust child process kill() call in ctrlc_handler() to only be invoked > > if the child was already forked. > > > > Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> > > Cc: <stable@vger.kernel.org> > > --- > > tools/testing/selftests/resctrl/cat_test.c | 8 --- > > .../testing/selftests/resctrl/resctrl_tests.c | 65 +++++++++++-------- > > tools/testing/selftests/resctrl/resctrl_val.c | 22 +++---- > > 3 files changed, 48 insertions(+), 47 deletions(-) > > > > diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c > > index 97b87285ab2a..224ba8544d8a 100644 > > --- a/tools/testing/selftests/resctrl/cat_test.c > > +++ b/tools/testing/selftests/resctrl/cat_test.c > > @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) > > strcpy(param.filename, RESULT_FILE_NAME1); > > param.num_of_runs = 0; > > param.cpu_no = sibling_cpu_no; > > - } else { > > - ret = signal_handler_register(); > > - if (ret) { > > - kill(bm_pid, SIGKILL); > > - goto out; > > - } > > } > > > > remove(param.filename); > > @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) > > } > > close(pipefd[0]); > > kill(bm_pid, SIGKILL); > > - signal_handler_unregister(); > > } > > > > -out: > > cat_test_cleanup(); > > > > return ret; > > diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c > > index 823672a20a43..524ba83d7568 100644 > > --- a/tools/testing/selftests/resctrl/resctrl_tests.c > > +++ b/tools/testing/selftests/resctrl/resctrl_tests.c > > @@ -67,21 +67,41 @@ void tests_cleanup(void) > > cat_test_cleanup(); > > } > > > > -static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) > > +static int test_prepare() > > { > > int res; > > > > - ksft_print_msg("Starting MBM BW change ...\n"); > > + res = signal_handler_register(); > > + if (res) > > + return res; > > res = mount_resctrlfs(); > > if (res) { > > + signal_handler_unregister(); > > ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > > - return; > > + return res; > > } > > + return 0; > > +} > > + > > +static void test_cleanup() > > +{ > > + umount_resctrlfs(); > > + signal_handler_unregister(); > > +} > > Thank you for adding these. > > > + > > +static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) > > +{ > > + int res; > > + > > + ksft_print_msg("Starting MBM BW change ...\n"); > > + > > + if (test_prepare()) > > + return; > > > > I am not sure about this. With this exit the kselftest machinery is not > aware of the test passing or failing. I wonder if there should not rather > be a "goto" here that triggers ksft_test_result()? Yes, ksft_test_result() is needed here (I forgot to add it). > This needs some more > thought though. First, with this change test_prepare() officially gains > responsibility to determine if a failure is transient (just a single test > fails) or permanent (no use trying any other tests if this fails). For > the former it would then be up to the caller to call ksft_test_result() > and for the latter test_prepare() will call ksft_exit_fail_msg(). Well, I didn't initially have test_prepare() at all but all this was within the test functions (which will be consolidated to a single function by the series that comes after the two series are done + one patch from Maciej). I was just trying to do what was done previously but it seems I forgot to handle the result status on signal reg fail path. TBH, I wouldn't mind if also the signal reg fail is just up'ed to ksft_exit_fail_msg(). I don't think it can ever fail with the parameters given to it so its error handling feels pretty much dead-code (unless some crazy thing such as apparmor does something out of the blue, I don't know if apparmor has capability override sigaction() but I've seen apparmor to create errors that from the surface make no sense whatsoever comparable to this case). So basically this discussion is now about what to do with the mount failing which already does _exit() before this patch (and possibly some hypotethical, new prepare code after the consolidation work which also will have some impact and I believe we might actually want to kill test_prepare() at that point anyway). > Second, that SNC warning may be an inconvenience with a new goto. Here > it may be ok to print that message before the test failure? I don't follow what you're referring to with "that SNC warning". To the "Intel CMT may be inaccurate ..." one? > > if (!validate_resctrl_feature_request(MBM_STR) || (get_vendor() != ARCH_INTEL)) { > > ksft_test_result_skip("Hardware does not support MBM or MBM is disabled\n"); > > - goto umount; > > + goto cleanup; > > } > > > > res = mbm_bw_change(cpu_no, benchmark_cmd); > > @@ -89,8 +109,8 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) > > if ((get_vendor() == ARCH_INTEL) && res) > > ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n"); > > > > -umount: > > - umount_resctrlfs(); > > +cleanup: > > + test_cleanup(); > > } > > > > static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) > > @@ -99,22 +119,19 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) > > > > ksft_print_msg("Starting MBA Schemata change ...\n"); > > > > - res = mount_resctrlfs(); > > - if (res) { > > - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > > + if (test_prepare()) > > return; > > - } > > > > if (!validate_resctrl_feature_request(MBA_STR) || (get_vendor() != ARCH_INTEL)) { > > ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n"); > > - goto umount; > > + goto cleanup; > > } > > > > res = mba_schemata_change(cpu_no, benchmark_cmd); > > ksft_test_result(!res, "MBA: schemata change\n"); > > > > -umount: > > - umount_resctrlfs(); > > +cleanup: > > + test_cleanup(); > > } > > > > static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) > > @@ -123,15 +140,12 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) > > > > ksft_print_msg("Starting CMT test ...\n"); > > > > - res = mount_resctrlfs(); > > - if (res) { > > - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > > + if (test_prepare()) > > return; > > - } > > > > if (!validate_resctrl_feature_request(CMT_STR)) { > > ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n"); > > - goto umount; > > + goto cleanup; > > } > > > > res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd); > > @@ -139,8 +153,8 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) > > if ((get_vendor() == ARCH_INTEL) && res) > > ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n"); > > > > -umount: > > - umount_resctrlfs(); > > +cleanup: > > + test_cleanup(); > > } > > > > static void run_cat_test(int cpu_no, int no_of_bits) > > @@ -149,22 +163,19 @@ static void run_cat_test(int cpu_no, int no_of_bits) > > > > ksft_print_msg("Starting CAT test ...\n"); > > > > - res = mount_resctrlfs(); > > - if (res) { > > - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); > > + if (test_prepare()) > > return; > > - } > > > > if (!validate_resctrl_feature_request(CAT_STR)) { > > ksft_test_result_skip("Hardware does not support CAT or CAT is disabled\n"); > > - goto umount; > > + goto cleanup; > > } > > > > res = cat_perf_miss_val(cpu_no, no_of_bits, "L3"); > > ksft_test_result(!res, "CAT: test\n"); > > > > -umount: > > - umount_resctrlfs(); > > +cleanup: > > + test_cleanup(); > > } > > > > int main(int argc, char **argv) > > diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c > > index 51963a6f2186..a9fe61133119 100644 > > --- a/tools/testing/selftests/resctrl/resctrl_val.c > > +++ b/tools/testing/selftests/resctrl/resctrl_val.c > > @@ -468,7 +468,9 @@ pid_t bm_pid, ppid; > > > > void ctrlc_handler(int signum, siginfo_t *info, void *ptr) > > { > > - kill(bm_pid, SIGKILL); > > + /* Only kill child after bm_pid is set after fork() */ > > + if (bm_pid) > > + kill(bm_pid, SIGKILL); > > umount_resctrlfs(); > > tests_cleanup(); > > ksft_print_msg("Ending\n\n"); > > @@ -485,6 +487,8 @@ int signal_handler_register(void) > > struct sigaction sigact; > > int ret = 0; > > > > + bm_pid = 0; > > + > > Since this is an initialization fix in this area ... what > do you think of also initializing sigact? It could just be > a change to > struct sigaction sigact = {}; > > This will prevent registering a signal handler with > uninitialized sa_flags. Nice catch. It seems quite bad bug, I'll add another patch to fix it. Thanks once again for your reviews! I'll also address the changelog improvements you mentioned against the other patches.
Hi Ilpo, On 9/28/2023 5:47 AM, Ilpo Järvinen wrote: > On Tue, 26 Sep 2023, Reinette Chatre wrote: >> On 9/15/2023 8:44 AM, Ilpo Järvinen wrote: ... >>> + >>> +static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) >>> +{ >>> + int res; >>> + >>> + ksft_print_msg("Starting MBM BW change ...\n"); >>> + >>> + if (test_prepare()) >>> + return; >>> >> >> I am not sure about this. With this exit the kselftest machinery is not >> aware of the test passing or failing. I wonder if there should not rather >> be a "goto" here that triggers ksft_test_result()? > > Yes, ksft_test_result() is needed here (I forgot to add it). > >> This needs some more >> thought though. First, with this change test_prepare() officially gains >> responsibility to determine if a failure is transient (just a single test >> fails) or permanent (no use trying any other tests if this fails). For >> the former it would then be up to the caller to call ksft_test_result() >> and for the latter test_prepare() will call ksft_exit_fail_msg(). > > Well, I didn't initially have test_prepare() at all but all this was > within the test functions (which will be consolidated to a single function > by the series that comes after the two series are done + one patch from > Maciej). > > I was just trying to do what was done previously but it seems I forgot to > handle the result status on signal reg fail path. > > TBH, I wouldn't mind if also the signal reg fail is just up'ed to > ksft_exit_fail_msg(). I don't think it can ever fail with the parameters > given to it so its error handling feels pretty much dead-code (unless some > crazy thing such as apparmor does something out of the blue, I don't know > if apparmor has capability override sigaction() but I've seen apparmor to > create errors that from the surface make no sense whatsoever comparable > to this case). > > So basically this discussion is now about what to do with the mount > failing which already does _exit() before this patch (and possibly some > hypotethical, new prepare code after the consolidation work which also > will have some impact and I believe we might actually want to kill > test_prepare() at that point anyway). Having failure during signal handler registration also trigger ksft_exit_fail_msg() sounds fair to me. I am also ok with keeping the exit when mount fails. If any future test_prepare() code does not imply a test exit then I hope it would be obvious that ksft_test_result() needs to be called. Perhaps that can be accomplished if test_prepare() does not exit the test but instead just returns an error code (if needed it can use ksft_print_msg() internally for any details about particular failures) and the caller call ksft_exit_fail_msg() if test_prepare() fails? With the caller responsible for the ksft_exit_fail_msg() as well as ksft_test_result() then any new addition may be guided to the right calls. This considers hypothetical future changes to code that is being consolidated so surely no strong opinions from my side. >> Second, that SNC warning may be an inconvenience with a new goto. Here >> it may be ok to print that message before the test failure? > > I don't follow what you're referring to with "that SNC warning". To the > "Intel CMT may be inaccurate ..." one? Yes, that is the warning. I envisioned addressing the issue by adding a goto label right before the ksft_test_result() call within run_cmt_test() in this case (but also in run_mbm_test()). Doing so would solve the issue that test counters are incremented on test_prepare() failure but it will also trigger the message you note and that would be confusing to the user if the test failure was because of signal handler registration failure. ... >>> diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c >>> index 51963a6f2186..a9fe61133119 100644 >>> --- a/tools/testing/selftests/resctrl/resctrl_val.c >>> +++ b/tools/testing/selftests/resctrl/resctrl_val.c >>> @@ -468,7 +468,9 @@ pid_t bm_pid, ppid; >>> >>> void ctrlc_handler(int signum, siginfo_t *info, void *ptr) >>> { >>> - kill(bm_pid, SIGKILL); >>> + /* Only kill child after bm_pid is set after fork() */ >>> + if (bm_pid) >>> + kill(bm_pid, SIGKILL); >>> umount_resctrlfs(); >>> tests_cleanup(); >>> ksft_print_msg("Ending\n\n"); >>> @@ -485,6 +487,8 @@ int signal_handler_register(void) >>> struct sigaction sigact; >>> int ret = 0; >>> >>> + bm_pid = 0; >>> + >> >> Since this is an initialization fix in this area ... what >> do you think of also initializing sigact? It could just be >> a change to >> struct sigaction sigact = {}; >> >> This will prevent registering a signal handler with >> uninitialized sa_flags. > > Nice catch. It seems quite bad bug, I'll add another patch to fix it. > > Thanks once again for your reviews! I'll also address the changelog > improvements you mentioned against the other patches. > Thanks to you for improving the resctrl selftests so significantly. This work is very valuable because we use it to measure and gain confidence in the health of the resctrl subsystem. Reinette
Hi Shaopeng, On 9/28/2023 1:10 AM, Shaopeng Tan (Fujitsu) wrote: >> On 9/15/2023 8:44 AM, Ilpo Järvinen wrote: ... >>> +static void run_mbm_test(const char * const *benchmark_cmd, int >>> +cpu_no) { >>> + int res; >>> + >>> + ksft_print_msg("Starting MBM BW change ...\n"); >>> + >>> + if (test_prepare()) >>> + return; >>> >> >> I am not sure about this. With this exit the kselftest machinery is not aware of >> the test passing or failing. I wonder if there should not rather be a "goto" here >> that triggers ksft_test_result()? This needs some more thought though. First, >> with this change test_prepare() officially gains responsibility to determine if a >> failure is transient (just a single test >> fails) or permanent (no use trying any other tests if this fails). For the former it >> would then be up to the caller to call ksft_test_result() and for the latter >> test_prepare() will call ksft_exit_fail_msg(). >> Second, that SNC warning may be an inconvenience with a new goto. Here it >> may be ok to print that message before the test failure? > > If a failure may be permanent, it may be best to detect it before running all tests, rather than in test_prepare(). > Now some detections are completed before running all tests. For example: > 273 if (geteuid() != 0) > 274 return ksft_exit_skip("Not running as root. Skipping...\n"); > 275 > 276 if (!check_resctrlfs_support()) > 277 return ksft_exit_skip("resctrl FS does not exist. Enable X86_CPU_RESCTRL config option.\n"); > 278 > 279 if (umount_resctrlfs()) > 280 return ksft_exit_skip("resctrl FS unmount failed.\n"); > You are correct that the tests should aim to detect as early as possible if no test has a chance of succeeding. This is covered in the checks you mention. The purpose of test_prepare()/test_cleanup() pair is to perform actions that should be done for every test. For example, resctrl is mounted before each test and unmounted after each test. Since these actions are required to be done for every test it cannot be a single call before all tests are run. It may be possible to add a test_prepare() directly followed by a test_cleanup() before any test is run to be more explicit about early detection but that does not seem necessary considering the checks would be done anyway when the first test is run. Even when doing so it would not eliminate the need for test_prepare()/test_cleanup() to form part of every test run and needing to exit if, for example, a previous test triggered a fault preventing resctrl from being mounted. Reinette
Hello Reinette, > On 9/28/2023 1:10 AM, Shaopeng Tan (Fujitsu) wrote: > >> On 9/15/2023 8:44 AM, Ilpo Järvinen wrote: > > ... > > >>> +static void run_mbm_test(const char * const *benchmark_cmd, int > >>> +cpu_no) { > >>> + int res; > >>> + > >>> + ksft_print_msg("Starting MBM BW change ...\n"); > >>> + > >>> + if (test_prepare()) > >>> + return; > >>> > >> > >> I am not sure about this. With this exit the kselftest machinery is > >> not aware of the test passing or failing. I wonder if there should > >> not rather be a "goto" here that triggers ksft_test_result()? This > >> needs some more thought though. First, with this change > >> test_prepare() officially gains responsibility to determine if a > >> failure is transient (just a single test > >> fails) or permanent (no use trying any other tests if this fails). > >> For the former it would then be up to the caller to call > >> ksft_test_result() and for the latter > >> test_prepare() will call ksft_exit_fail_msg(). > >> Second, that SNC warning may be an inconvenience with a new goto. > >> Here it may be ok to print that message before the test failure? > > > > If a failure may be permanent, it may be best to detect it before running all > tests, rather than in test_prepare(). > > Now some detections are completed before running all tests. For example: > > 273 if (geteuid() != 0) > > 274 return ksft_exit_skip("Not running as root. > Skipping...\n"); > > 275 > > 276 if (!check_resctrlfs_support()) > > 277 return ksft_exit_skip("resctrl FS does not exist. Enable > X86_CPU_RESCTRL config option.\n"); > > 278 > > 279 if (umount_resctrlfs()) > > 280 return ksft_exit_skip("resctrl FS unmount failed.\n"); > > > > You are correct that the tests should aim to detect as early as possible if no test > has a chance of succeeding. This is covered in the checks you mention. > The purpose of test_prepare()/test_cleanup() pair is to perform actions that > should be done for every test. For example, resctrl is mounted before each test > and unmounted after each test. Since these actions are required to be done for > every test it cannot be a single call before all tests are run. > > It may be possible to add a test_prepare() directly followed by a test_cleanup() > before any test is run to be more explicit about early detection but that does not > seem necessary considering the checks would be done anyway when the first > test is run. Even when doing so it would not eliminate the need for > test_prepare()/test_cleanup() to form part of every test run and needing to exit > if, for example, a previous test triggered a fault preventing resctrl from being > mounted. Thanks for your explanation. I understand Best regards, Shaopeng TAN
diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c index 97b87285ab2a..224ba8544d8a 100644 --- a/tools/testing/selftests/resctrl/cat_test.c +++ b/tools/testing/selftests/resctrl/cat_test.c @@ -167,12 +167,6 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) strcpy(param.filename, RESULT_FILE_NAME1); param.num_of_runs = 0; param.cpu_no = sibling_cpu_no; - } else { - ret = signal_handler_register(); - if (ret) { - kill(bm_pid, SIGKILL); - goto out; - } } remove(param.filename); @@ -209,10 +203,8 @@ int cat_perf_miss_val(int cpu_no, int n, char *cache_type) } close(pipefd[0]); kill(bm_pid, SIGKILL); - signal_handler_unregister(); } -out: cat_test_cleanup(); return ret; diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c index 823672a20a43..524ba83d7568 100644 --- a/tools/testing/selftests/resctrl/resctrl_tests.c +++ b/tools/testing/selftests/resctrl/resctrl_tests.c @@ -67,21 +67,41 @@ void tests_cleanup(void) cat_test_cleanup(); } -static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) +static int test_prepare() { int res; - ksft_print_msg("Starting MBM BW change ...\n"); + res = signal_handler_register(); + if (res) + return res; res = mount_resctrlfs(); if (res) { + signal_handler_unregister(); ksft_exit_fail_msg("Failed to mount resctrl FS\n"); - return; + return res; } + return 0; +} + +static void test_cleanup() +{ + umount_resctrlfs(); + signal_handler_unregister(); +} + +static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) +{ + int res; + + ksft_print_msg("Starting MBM BW change ...\n"); + + if (test_prepare()) + return; if (!validate_resctrl_feature_request(MBM_STR) || (get_vendor() != ARCH_INTEL)) { ksft_test_result_skip("Hardware does not support MBM or MBM is disabled\n"); - goto umount; + goto cleanup; } res = mbm_bw_change(cpu_no, benchmark_cmd); @@ -89,8 +109,8 @@ static void run_mbm_test(const char * const *benchmark_cmd, int cpu_no) if ((get_vendor() == ARCH_INTEL) && res) ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n"); -umount: - umount_resctrlfs(); +cleanup: + test_cleanup(); } static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) @@ -99,22 +119,19 @@ static void run_mba_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting MBA Schemata change ...\n"); - res = mount_resctrlfs(); - if (res) { - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); + if (test_prepare()) return; - } if (!validate_resctrl_feature_request(MBA_STR) || (get_vendor() != ARCH_INTEL)) { ksft_test_result_skip("Hardware does not support MBA or MBA is disabled\n"); - goto umount; + goto cleanup; } res = mba_schemata_change(cpu_no, benchmark_cmd); ksft_test_result(!res, "MBA: schemata change\n"); -umount: - umount_resctrlfs(); +cleanup: + test_cleanup(); } static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) @@ -123,15 +140,12 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) ksft_print_msg("Starting CMT test ...\n"); - res = mount_resctrlfs(); - if (res) { - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); + if (test_prepare()) return; - } if (!validate_resctrl_feature_request(CMT_STR)) { ksft_test_result_skip("Hardware does not support CMT or CMT is disabled\n"); - goto umount; + goto cleanup; } res = cmt_resctrl_val(cpu_no, 5, benchmark_cmd); @@ -139,8 +153,8 @@ static void run_cmt_test(const char * const *benchmark_cmd, int cpu_no) if ((get_vendor() == ARCH_INTEL) && res) ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n"); -umount: - umount_resctrlfs(); +cleanup: + test_cleanup(); } static void run_cat_test(int cpu_no, int no_of_bits) @@ -149,22 +163,19 @@ static void run_cat_test(int cpu_no, int no_of_bits) ksft_print_msg("Starting CAT test ...\n"); - res = mount_resctrlfs(); - if (res) { - ksft_exit_fail_msg("Failed to mount resctrl FS\n"); + if (test_prepare()) return; - } if (!validate_resctrl_feature_request(CAT_STR)) { ksft_test_result_skip("Hardware does not support CAT or CAT is disabled\n"); - goto umount; + goto cleanup; } res = cat_perf_miss_val(cpu_no, no_of_bits, "L3"); ksft_test_result(!res, "CAT: test\n"); -umount: - umount_resctrlfs(); +cleanup: + test_cleanup(); } int main(int argc, char **argv) diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c index 51963a6f2186..a9fe61133119 100644 --- a/tools/testing/selftests/resctrl/resctrl_val.c +++ b/tools/testing/selftests/resctrl/resctrl_val.c @@ -468,7 +468,9 @@ pid_t bm_pid, ppid; void ctrlc_handler(int signum, siginfo_t *info, void *ptr) { - kill(bm_pid, SIGKILL); + /* Only kill child after bm_pid is set after fork() */ + if (bm_pid) + kill(bm_pid, SIGKILL); umount_resctrlfs(); tests_cleanup(); ksft_print_msg("Ending\n\n"); @@ -485,6 +487,8 @@ int signal_handler_register(void) struct sigaction sigact; int ret = 0; + bm_pid = 0; + sigact.sa_sigaction = ctrlc_handler; sigemptyset(&sigact.sa_mask); sigact.sa_flags = SA_SIGINFO; @@ -706,10 +710,6 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par ksft_print_msg("Benchmark PID: %d\n", bm_pid); - ret = signal_handler_register(); - if (ret) - goto out; - /* * The cast removes constness but nothing mutates benchmark_cmd within * the context of this process. At the receiving process, it becomes @@ -721,19 +721,19 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par /* Taskset benchmark to specified cpu */ ret = taskset_benchmark(bm_pid, param->cpu_no); if (ret) - goto unregister; + goto out; /* Write benchmark to specified control&monitoring grp in resctrl FS */ ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp, resctrl_val); if (ret) - goto unregister; + goto out; if (!strncmp(resctrl_val, MBM_STR, sizeof(MBM_STR)) || !strncmp(resctrl_val, MBA_STR, sizeof(MBA_STR))) { ret = initialize_mem_bw_imc(); if (ret) - goto unregister; + goto out; initialize_mem_bw_resctrl(param->ctrlgrp, param->mongrp, param->cpu_no, resctrl_val); @@ -748,7 +748,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par sizeof(pipe_message)) { perror("# failed reading message from child process"); close(pipefd[0]); - goto unregister; + goto out; } } close(pipefd[0]); @@ -757,7 +757,7 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par if (sigqueue(bm_pid, SIGUSR1, value) == -1) { perror("# sigqueue SIGUSR1 to child"); ret = errno; - goto unregister; + goto out; } /* Give benchmark enough time to fully run */ @@ -786,8 +786,6 @@ int resctrl_val(const char * const *benchmark_cmd, struct resctrl_val_param *par } } -unregister: - signal_handler_unregister(); out: kill(bm_pid, SIGKILL);
Unmounting resctrl FS has been moved into the per test functions in resctrl_tests.c by commit caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level"). In case a signal (SIGINT, SIGTERM, or SIGHUP) is received, the running selftest is aborted by ctrlc_handler() which then unmounts resctrl fs before exiting. The current section between signal_handler_register() and signal_handler_unregister(), however, does not cover the entire duration when resctrl FS is mounted. Move signal_handler_register() and signal_handler_unregister() calls from per test files into resctrl_tests.c to properly unmount resctrl fs. In order to not add signal_handler_register()/unregister() n times, create helpers test_prepare() and test_cleanup(). Adjust child process kill() call in ctrlc_handler() to only be invoked if the child was already forked. Fixes: caddc0fbe495 ("selftests/resctrl: Move resctrl FS mount/umount to higher level") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: <stable@vger.kernel.org> --- tools/testing/selftests/resctrl/cat_test.c | 8 --- .../testing/selftests/resctrl/resctrl_tests.c | 65 +++++++++++-------- tools/testing/selftests/resctrl/resctrl_val.c | 22 +++---- 3 files changed, 48 insertions(+), 47 deletions(-)