tests/qtest: Poll on waitpid() for a while before sending SIGKILL

Message ID 20230111223018.3965423-1-stefanb@linux.ibm.com (mailing list archive)
State New, archived
Series tests/qtest: Poll on waitpid() for a while before sending SIGKILL

Commit Message

Stefan Berger Jan. 11, 2023, 10:30 p.m. UTC
To avoid getting stuck in waitpid() when the target process does
not terminate on SIGTERM, poll on waitpid() for 10s and, if the target
process has not changed state by then, send it a SIGKILL.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
---
 tests/qtest/libqtest.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

Comments

Daniel P. Berrangé Jan. 12, 2023, 8:53 a.m. UTC | #1
On Wed, Jan 11, 2023 at 05:30:18PM -0500, Stefan Berger wrote:
> To prevent getting stuck on waitpid() in case the target process does
> not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> process has not changed state until then send a SIGKILL to it.
> 
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> ---
>  tests/qtest/libqtest.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)

Since this is a test suite and we know our CI system gets very
heavily loaded, I think we should wait more than 10 secs to
ensure QEMU has time to finish shutting down; flushing pending I/O
in particular is most likely to delay things. If you bump the time
to 30 secs then

  Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

> 
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index 2fbc3b88f3..362b1f724f 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
>  {
>  #ifndef _WIN32
>      pid_t pid;
> +    uint64_t end;
> +
> +    /* poll for 10s until sending SIGKILL */
> +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
> +
> +    do {
> +        pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
> +        if (pid != 0) {
> +            break;
> +        }
> +        g_usleep(100 * 1000);
> +    } while (g_get_monotonic_time() < end);
> +
> +    if (pid == 0) {
> +        kill(s->qemu_pid, SIGKILL);
> +        TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
> +    }
>  
> -    TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
>      assert(pid == s->qemu_pid);
>  #else
>      DWORD ret;
> -- 
> 2.39.0
> 

With regards,
Daniel

Philippe Mathieu-Daudé Jan. 12, 2023, 9:18 a.m. UTC | #2
On 11/1/23 23:30, Stefan Berger wrote:
> To prevent getting stuck on waitpid() in case the target process does
> not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> process has not changed state until then send a SIGKILL to it.
> 
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> ---
>   tests/qtest/libqtest.c | 18 +++++++++++++++++-
>   1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index 2fbc3b88f3..362b1f724f 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
>   {
>   #ifndef _WIN32
>       pid_t pid;
> +    uint64_t end;
> +
> +    /* poll for 10s until sending SIGKILL */
> +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;

Maybe we could use getenv() to allow tuning / using different value?

> +    do {
> +        pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
> +        if (pid != 0) {
> +            break;
> +        }
> +        g_usleep(100 * 1000);
> +    } while (g_get_monotonic_time() < end);
> +
> +    if (pid == 0) {
> +        kill(s->qemu_pid, SIGKILL);
> +        TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
> +    }
>   
> -    TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
>       assert(pid == s->qemu_pid);
>   #else
>       DWORD ret;

Daniel P. Berrangé Jan. 12, 2023, 9:54 a.m. UTC | #3
On Thu, Jan 12, 2023 at 10:18:01AM +0100, Philippe Mathieu-Daudé wrote:
> On 11/1/23 23:30, Stefan Berger wrote:
> > To prevent getting stuck on waitpid() in case the target process does
> > not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> > process has not changed state until then send a SIGKILL to it.
> > 
> > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> > ---
> >   tests/qtest/libqtest.c | 18 +++++++++++++++++-
> >   1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> > index 2fbc3b88f3..362b1f724f 100644
> > --- a/tests/qtest/libqtest.c
> > +++ b/tests/qtest/libqtest.c
> > @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
> >   {
> >   #ifndef _WIN32
> >       pid_t pid;
> > +    uint64_t end;
> > +
> > +    /* poll for 10s until sending SIGKILL */
> > +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
> 
> Maybe we could use getenv() to allow tuning / using different value?

I'd rather we picked a value large enough that it will work
reliably out of the box for all scenarios with no magic
env required. We're just trying to prevent infinite waits if
something unexpected happens. We don't need to use an
aggressively short value, as most users will never hit this
scenario. I think 30 seconds is large enough to be reliable
but we could easily go higher to 60/120 if we want to be
really really sure.


With regards,
Daniel

Philippe Mathieu-Daudé Jan. 12, 2023, 10:28 a.m. UTC | #4
On 12/1/23 10:54, Daniel P. Berrangé wrote:
> On Thu, Jan 12, 2023 at 10:18:01AM +0100, Philippe Mathieu-Daudé wrote:
>> On 11/1/23 23:30, Stefan Berger wrote:
>>> To prevent getting stuck on waitpid() in case the target process does
>>> not terminate on SIGTERM, poll on waitpid() for 10s and if the target
>>> process has not changed state until then send a SIGKILL to it.
>>>
>>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
>>> ---
>>>    tests/qtest/libqtest.c | 18 +++++++++++++++++-
>>>    1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
>>> index 2fbc3b88f3..362b1f724f 100644
>>> --- a/tests/qtest/libqtest.c
>>> +++ b/tests/qtest/libqtest.c
>>> @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
>>>    {
>>>    #ifndef _WIN32
>>>        pid_t pid;
>>> +    uint64_t end;
>>> +
>>> +    /* poll for 10s until sending SIGKILL */
>>> +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
>>
>> Maybe we could use getenv() to allow tuning / using different value?
> 
> I'd rather we picked a value large enough that it will work
> reliably out of the box for all scenarios with no magic
> env required. We're just trying to prevent infinite waits if
> something unexpected happens. We don't need to use an
> aggressively short value, as most users will never hit this
> scenario. I think 30 seconds is large enough to be reliable
> but we could easily go higher to 60/120 if we want to be
> really really sure.

I read your other comment later and I agree with you.

Patch

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 2fbc3b88f3..362b1f724f 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -202,8 +202,24 @@  void qtest_wait_qemu(QTestState *s)
 {
 #ifndef _WIN32
     pid_t pid;
+    uint64_t end;
+
+    /* poll for 10s until sending SIGKILL */
+    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
+
+    do {
+        pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
+        if (pid != 0) {
+            break;
+        }
+        g_usleep(100 * 1000);
+    } while (g_get_monotonic_time() < end);
+
+    if (pid == 0) {
+        kill(s->qemu_pid, SIGKILL);
+        TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
+    }
 
-    TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
     assert(pid == s->qemu_pid);
 #else
     DWORD ret;
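
For readers who want to try the pattern outside QEMU, the following is a minimal standalone sketch of the poll-then-SIGKILL logic, with the timeout passed in by the caller so that the 30 seconds suggested in review can be used. It is not the code from the patch: wait_or_kill(), waitpid_retry() and monotonic_us() are hypothetical names, clock_gettime() and nanosleep() stand in for g_get_monotonic_time() and g_usleep(), and the retry-on-EINTR loop is an assumption about what the TFR() macro does.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in microseconds (stand-in for g_get_monotonic_time()). */
static int64_t monotonic_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* waitpid() retried on EINTR (assumed to match what TFR() provides). */
static pid_t waitpid_retry(pid_t pid, int *wstatus, int options)
{
    pid_t ret;

    do {
        ret = waitpid(pid, wstatus, options);
    } while (ret == -1 && errno == EINTR);
    return ret;
}

/*
 * Hypothetical helper: poll the child for up to timeout_s seconds and,
 * if it has still not changed state, send SIGKILL and reap it.
 * Returns the final waitpid() result (the child's pid on success).
 */
pid_t wait_or_kill(pid_t child, int timeout_s, int *wstatus)
{
    const struct timespec delay = { 0, 100 * 1000 * 1000 }; /* 100 ms */
    int64_t end = monotonic_us() + (int64_t)timeout_s * 1000000;
    pid_t pid;

    do {
        pid = waitpid_retry(child, wstatus, WNOHANG);
        if (pid != 0) {
            break; /* child reaped, or waitpid() failed */
        }
        nanosleep(&delay, NULL);
    } while (monotonic_us() < end);

    if (pid == 0) {
        /* Still running after the timeout: force termination. */
        kill(child, SIGKILL);
        pid = waitpid_retry(child, wstatus, 0);
    }
    return pid;
}
```

A caller would send SIGTERM first, then call wait_or_kill(pid, 30, &wstatus) and inspect wstatus with WIFEXITED() or WIFSIGNALED() to see whether the child exited on its own or had to be killed.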