Message ID | 1662001807-7-1-git-send-email-lizhijian@fujitsu.com (mailing list archive) |
---|---|
State | Accepted |
Series | [v3] ksefltests: pidfd: Fix wait_states: Test terminated by timeout |
ping

On 01/09/2022 11:10, Li Zhijian wrote:
> 0Day/LKP observed that the kselftest blocks forever since one of the
> pidfd_wait doesn't terminate in 1 of 30 runs. After digging into
> the source, we found that it blocks at:
> ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
>
> wait_states has below testing flow:
>   CHILD                 PARENT
>   ---------------+--------------
> 1 STOP itself
> 2                   WAIT for CHILD STOPPED
> 3                   SIGNAL CHILD to CONT
> 4 CONT
> 5 STOP itself
> 5'                  WAIT for CHILD CONT
> 6                   WAIT for CHILD STOPPED
>
> The problem is that the kernel cannot ensure the order of 5 and 5', once
> 5 goes first, the test will fail.
>
> we can reproduce it by:
> $ while true; do make run_tests -C pidfd; done
>
> Introduce a blocking read in child process to make sure the parent can
> check its WCONTINUED.
>
> CC: Philip Li <philip.li@intel.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> ---
> I have almost forgotten this patch since the former version post over 6 months
> ago. This time I just do a rebase and update the comments.
>
> V3: fixes description and add review tag
> V2: rewrite with pipe to avoid usleep
> ---
>  tools/testing/selftests/pidfd/pidfd_wait.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/tools/testing/selftests/pidfd/pidfd_wait.c b/tools/testing/selftests/pidfd/pidfd_wait.c
> index 070c1c876df1..c3e2a3041f55 100644
> --- a/tools/testing/selftests/pidfd/pidfd_wait.c
> +++ b/tools/testing/selftests/pidfd/pidfd_wait.c
> @@ -95,20 +95,28 @@ static int sys_waitid(int which, pid_t pid, siginfo_t *info, int options,
>  		.flags = CLONE_PIDFD | CLONE_PARENT_SETTID,
>  		.exit_signal = SIGCHLD,
>  	};
> +	int pfd[2];
>  	pid_t pid;
>  	siginfo_t info = {
>  		.si_signo = 0,
>  	};
>
> +	ASSERT_EQ(pipe(pfd), 0);
>  	pid = sys_clone3(&args);
>  	ASSERT_GE(pid, 0);
>
>  	if (pid == 0) {
> +		char buf[2];
> +
> +		close(pfd[1]);
>  		kill(getpid(), SIGSTOP);
> +		ASSERT_EQ(read(pfd[0], buf, 1), 1);
> +		close(pfd[0]);
>  		kill(getpid(), SIGSTOP);
>  		exit(EXIT_SUCCESS);
>  	}
>
> +	close(pfd[0]);
>  	ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WSTOPPED, NULL), 0);
>  	ASSERT_EQ(info.si_signo, SIGCHLD);
>  	ASSERT_EQ(info.si_code, CLD_STOPPED);
> @@ -117,6 +125,8 @@ static int sys_waitid(int which, pid_t pid, siginfo_t *info, int options,
>  	ASSERT_EQ(sys_pidfd_send_signal(pidfd, SIGCONT, NULL, 0), 0);
>
>  	ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
> +	ASSERT_EQ(write(pfd[1], "C", 1), 1);
> +	close(pfd[1]);
>  	ASSERT_EQ(info.si_signo, SIGCHLD);
>  	ASSERT_EQ(info.si_code, CLD_CONTINUED);
>  	ASSERT_EQ(info.si_pid, parent_tid);
ping again

On 29/09/2022 08:56, Li Zhijian wrote:
> ping
On 8/31/22 21:10, Li Zhijian wrote:
> 0Day/LKP observed that the kselftest blocks forever since one of the
> pidfd_wait doesn't terminate in 1 of 30 runs. After digging into
> the source, we found that it blocks at:
> ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
>
> wait_states has below testing flow:
>   CHILD                 PARENT
>   ---------------+--------------
> 1 STOP itself
> 2                   WAIT for CHILD STOPPED
> 3                   SIGNAL CHILD to CONT
> 4 CONT
> 5 STOP itself
> 5'                  WAIT for CHILD CONT
> 6                   WAIT for CHILD STOPPED
>
> The problem is that the kernel cannot ensure the order of 5 and 5', once
> 5 goes first, the test will fail.
>
> we can reproduce it by:
> $ while true; do make run_tests -C pidfd; done
>
> Introduce a blocking read in child process to make sure the parent can
> check its WCONTINUED.
>
> CC: Philip Li <philip.li@intel.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
> ---

Sorry for the delay. Now applied to linux-kselftest fixes for rc4

thanks,
-- Shuah
diff --git a/tools/testing/selftests/pidfd/pidfd_wait.c b/tools/testing/selftests/pidfd/pidfd_wait.c
index 070c1c876df1..c3e2a3041f55 100644
--- a/tools/testing/selftests/pidfd/pidfd_wait.c
+++ b/tools/testing/selftests/pidfd/pidfd_wait.c
@@ -95,20 +95,28 @@ static int sys_waitid(int which, pid_t pid, siginfo_t *info, int options,
 		.flags = CLONE_PIDFD | CLONE_PARENT_SETTID,
 		.exit_signal = SIGCHLD,
 	};
+	int pfd[2];
 	pid_t pid;
 	siginfo_t info = {
 		.si_signo = 0,
 	};
 
+	ASSERT_EQ(pipe(pfd), 0);
 	pid = sys_clone3(&args);
 	ASSERT_GE(pid, 0);
 
 	if (pid == 0) {
+		char buf[2];
+
+		close(pfd[1]);
 		kill(getpid(), SIGSTOP);
+		ASSERT_EQ(read(pfd[0], buf, 1), 1);
+		close(pfd[0]);
 		kill(getpid(), SIGSTOP);
 		exit(EXIT_SUCCESS);
 	}
 
+	close(pfd[0]);
 	ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WSTOPPED, NULL), 0);
 	ASSERT_EQ(info.si_signo, SIGCHLD);
 	ASSERT_EQ(info.si_code, CLD_STOPPED);
@@ -117,6 +125,8 @@ static int sys_waitid(int which, pid_t pid, siginfo_t *info, int options,
 	ASSERT_EQ(sys_pidfd_send_signal(pidfd, SIGCONT, NULL, 0), 0);
 
 	ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
+	ASSERT_EQ(write(pfd[1], "C", 1), 1);
+	close(pfd[1]);
 	ASSERT_EQ(info.si_signo, SIGCHLD);
 	ASSERT_EQ(info.si_code, CLD_CONTINUED);
 	ASSERT_EQ(info.si_pid, parent_tid);
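
For anyone who wants to try the ordering fix outside the kselftest harness, the pipe handshake from the patch can be sketched as a standalone program. This is only an illustration under some assumptions: it uses plain fork() and waitid(P_PID, ...) in place of the test's sys_clone3() and P_PIDFD, and the names run_wait_states() and expect_state() are hypothetical helpers, not part of pidfd_wait.c. The step numbers in the comments refer to the flow table in the commit message.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for one state change of pid and check its code. */
static bool expect_state(pid_t pid, int options, int code)
{
	siginfo_t info = { .si_signo = 0 };

	return waitid(P_PID, pid, &info, options) == 0 &&
	       info.si_signo == SIGCHLD && info.si_code == code;
}

/*
 * Hypothetical stand-in for the wait_states test, using fork() and P_PID
 * instead of clone3() and P_PIDFD. Returns 0 when the parent observed
 * stop, continue, stop and kill in that order.
 */
static int run_wait_states(void)
{
	int pfd[2];
	pid_t pid;
	bool ok;

	if (pipe(pfd) != 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}

	if (pid == 0) {
		char buf[1];

		close(pfd[1]);
		kill(getpid(), SIGSTOP);	/* step 1 */
		/* Block until the parent has consumed the continued event. */
		if (read(pfd[0], buf, 1) != 1)
			_exit(EXIT_FAILURE);
		close(pfd[0]);
		kill(getpid(), SIGSTOP);	/* step 5 */
		_exit(EXIT_SUCCESS);
	}

	close(pfd[0]);

	ok = expect_state(pid, WSTOPPED, CLD_STOPPED) &&	/* step 2 */
	     kill(pid, SIGCONT) == 0 &&				/* step 3 */
	     expect_state(pid, WCONTINUED, CLD_CONTINUED) &&	/* step 5' */
	     write(pfd[1], "C", 1) == 1 &&			/* let child reach step 5 */
	     expect_state(pid, WSTOPPED, CLD_STOPPED);		/* step 6 */
	close(pfd[1]);

	/* Always kill and reap the child so no stopped process is left behind. */
	kill(pid, SIGKILL);
	return (expect_state(pid, WEXITED, CLD_KILLED) && ok) ? 0 : -1;
}
```

Calling run_wait_states() in a loop from main() is one way to exercise it repeatedly. Dropping the read()/write() pair reopens the window described in the commit message, where step 5 can run before step 5' and the WCONTINUED wait never returns.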