Message ID | 20230111223018.3965423-1-stefanb@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | tests/qtest: Poll on waitpid() for a while before sending SIGKILL |
On Wed, Jan 11, 2023 at 05:30:18PM -0500, Stefan Berger wrote:
> To prevent getting stuck on waitpid() in case the target process does
> not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> process has not changed state until then send a SIGKILL to it.
>
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> ---
>  tests/qtest/libqtest.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)

Since this is a test suite and we know our CI system gets very heavily
loaded, I think we should wait more than 10 secs, to ensure QEMU has
time to flush pending I/O in particular which is most likely to delay
things.

If you bump the time to 30 secs then

  Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

>
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index 2fbc3b88f3..362b1f724f 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
>  {
>  #ifndef _WIN32
>      pid_t pid;
> +    uint64_t end;
> +
> +    /* poll for 10s until sending SIGKILL */
> +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
> +
> +    do {
> +        pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
> +        if (pid != 0) {
> +            break;
> +        }
> +        g_usleep(100 * 1000);
> +    } while (g_get_monotonic_time() < end);
> +
> +    if (pid == 0) {
> +        kill(s->qemu_pid, SIGKILL);
> +        TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
> +    }
>
> -    TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
>      assert(pid == s->qemu_pid);
>  #else
>      DWORD ret;
> --
> 2.39.0
>

With regards,
Daniel

On 11/1/23 23:30, Stefan Berger wrote:
> To prevent getting stuck on waitpid() in case the target process does
> not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> process has not changed state until then send a SIGKILL to it.
>
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> ---
>  tests/qtest/libqtest.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index 2fbc3b88f3..362b1f724f 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
>  {
>  #ifndef _WIN32
>      pid_t pid;
> +    uint64_t end;
> +
> +    /* poll for 10s until sending SIGKILL */
> +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;

Maybe we could use getenv() to allow tuning / using different value?

> +    do {
> +        pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
> +        if (pid != 0) {
> +            break;
> +        }
> +        g_usleep(100 * 1000);
> +    } while (g_get_monotonic_time() < end);
> +
> +    if (pid == 0) {
> +        kill(s->qemu_pid, SIGKILL);
> +        TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
> +    }
>
> -    TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
>      assert(pid == s->qemu_pid);
>  #else
>      DWORD ret;

On Thu, Jan 12, 2023 at 10:18:01AM +0100, Philippe Mathieu-Daudé wrote:
> On 11/1/23 23:30, Stefan Berger wrote:
> > To prevent getting stuck on waitpid() in case the target process does
> > not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> > process has not changed state until then send a SIGKILL to it.
> >
> > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> > ---
> >  tests/qtest/libqtest.c | 18 +++++++++++++++++-
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> > index 2fbc3b88f3..362b1f724f 100644
> > --- a/tests/qtest/libqtest.c
> > +++ b/tests/qtest/libqtest.c
> > @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
> >  {
> >  #ifndef _WIN32
> >      pid_t pid;
> > +    uint64_t end;
> > +
> > +    /* poll for 10s until sending SIGKILL */
> > +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
>
> Maybe we could use getenv() to allow tuning / using different value?

I'd rather we picked a value large enough that it will work
reliably out of the box for all scenarios with no magic
env required. We're just trying to prevent infinite waits if
something unexpected happens. We don't need to use an
aggressively short value, as most users will never hit this
scenario. I think 30 seconds is large enough to be reliable
but we could easily go higher to 60/120 if we want to be
really really sure.

With regards,
Daniel

On 12/1/23 10:54, Daniel P. Berrangé wrote:
> On Thu, Jan 12, 2023 at 10:18:01AM +0100, Philippe Mathieu-Daudé wrote:
>> On 11/1/23 23:30, Stefan Berger wrote:
>>> To prevent getting stuck on waitpid() in case the target process does
>>> not terminate on SIGTERM, poll on waitpid() for 10s and if the target
>>> process has not changed state until then send a SIGKILL to it.
>>>
>>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
>>> ---
>>>  tests/qtest/libqtest.c | 18 +++++++++++++++++-
>>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
>>> index 2fbc3b88f3..362b1f724f 100644
>>> --- a/tests/qtest/libqtest.c
>>> +++ b/tests/qtest/libqtest.c
>>> @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
>>>  {
>>>  #ifndef _WIN32
>>>      pid_t pid;
>>> +    uint64_t end;
>>> +
>>> +    /* poll for 10s until sending SIGKILL */
>>> +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
>>
>> Maybe we could use getenv() to allow tuning / using different value?
>
> I'd rather we picked a value large enough that it will work
> reliably out of the box for all scenarios with no magic
> env required. We're just trying to prevent infinite waits if
> something unexpected happens. We don't need to use an
> aggressively short value, as most users will never hit this
> scenario. I think 30 seconds is large enough to be reliable
> but we could easily go higher to 60/120 if we want to be
> really really sure.

I read your other comment later and I agree with you.

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 2fbc3b88f3..362b1f724f 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
 {
 #ifndef _WIN32
     pid_t pid;
+    uint64_t end;
+
+    /* poll for 10s until sending SIGKILL */
+    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
+
+    do {
+        pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
+        if (pid != 0) {
+            break;
+        }
+        g_usleep(100 * 1000);
+    } while (g_get_monotonic_time() < end);
+
+    if (pid == 0) {
+        kill(s->qemu_pid, SIGKILL);
+        TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
+    }
 
-    TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
     assert(pid == s->qemu_pid);
 #else
     DWORD ret;

To prevent getting stuck on waitpid() in case the target process does
not terminate on SIGTERM, poll on waitpid() for 10s and if the target
process has not changed state until then send a SIGKILL to it.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
---
 tests/qtest/libqtest.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
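
For readers who want to try the poll-then-SIGKILL idea from the commit message outside of QEMU, here is a minimal standalone sketch. It is not the libqtest code: it uses plain POSIX calls in place of GLib's g_get_monotonic_time()/g_usleep() and QEMU's TFR() macro. The helper names wait_or_kill() and monotonic_ms() and the millisecond timeout parameter are hypothetical and chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: current monotonic clock reading in milliseconds. */
static int64_t monotonic_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: poll waitpid() with WNOHANG for up to timeout_ms.
 * If the child has not changed state by then, send SIGKILL and do a
 * blocking waitpid(). Returns what waitpid() returned: the child's pid
 * on success, or -1 on error.
 */
static pid_t wait_or_kill(pid_t child, int *wstatus, int64_t timeout_ms)
{
    const struct timespec step = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
    int64_t end = monotonic_ms() + timeout_ms;
    pid_t pid;

    do {
        pid = waitpid(child, wstatus, WNOHANG);
        if (pid != 0) {
            return pid;             /* child reaped, or waitpid() failed */
        }
        nanosleep(&step, NULL);     /* sleep 100ms between polls */
    } while (monotonic_ms() < end);

    /* Deadline passed and the child is still running: force it down. */
    kill(child, SIGKILL);
    do {
        pid = waitpid(child, wstatus, 0);
    } while (pid == -1 && errno == EINTR);  /* retry if interrupted */

    return pid;
}
```

A caller that has already sent SIGTERM to the child could then call wait_or_kill(pid, &status, 30 * 1000), which corresponds to the 30 second value suggested in review.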