diff mbox series

[for-next,1/3] selftests/watchdog: add count parameter for watchdog-test

Message ID 20241025013933.6516-1-lizhijian@fujitsu.com (mailing list archive)
State New
Headers show
Series [for-next,1/3] selftests/watchdog: add count parameter for watchdog-test | expand

Commit Message

Zhijian Li (Fujitsu) Oct. 25, 2024, 1:39 a.m. UTC
Currently, watchdog-test keep running until it gets a SIGINT. However,
when watchdog-test is executed from the kselftests framework, where it
launches test via timeout which will send SIGTERM in time up. This could
lead to
1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS
   in silent.
2. kselftests gets an timeout exit code, and judge watchdog-test as
  'not ok'

This patch is prepare to fix above 2 issues

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
Hey,
Cover letter is here.

It's notice that a OS reboot was triggerred after ran the watchdog-test
in kselftests framwork 'make run_tests', that's because watchdog-test
didn't stop feeding the watchdog after enable it.

In addition, current watchdog-test didn't adapt to the kselftests
framework which launchs the test with /usr/bin/timeout and no timeout
is expected.
---
 tools/testing/selftests/watchdog/watchdog-test.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Comments

Shuah Khan Oct. 27, 2024, 12:28 a.m. UTC | #1
On 10/24/24 19:39, Li Zhijian wrote:
> Currently, watchdog-test keep running until it gets a SIGINT. However,
> when watchdog-test is executed from the kselftests framework, where it
> launches test via timeout which will send SIGTERM in time up. This could
> lead to
> 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS
>     in silent.
> 2. kselftests gets an timeout exit code, and judge watchdog-test as
>    'not ok'
> 
This test isn't really supposed to be run from kselftest framework.
This is the reason why it isn't included in the default run.

> This patch is prepare to fix above 2 issues

This series needs a separate cover letter explaining how this problem is
being fixed.

> 
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
> Hey,
> Cover letter is here.
> 
> It's notice that a OS reboot was triggerred after ran the watchdog-test
> in kselftests framwork 'make run_tests', that's because watchdog-test
> didn't stop feeding the watchdog after enable it.
> 
> In addition, current watchdog-test didn't adapt to the kselftests
> framework which launchs the test with /usr/bin/timeout and no timeout
> is expected.
> ---
>   tools/testing/selftests/watchdog/watchdog-test.c | 13 +++++++++----
>   1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c
> index bc71cbca0dde..2f8fd2670897 100644
> --- a/tools/testing/selftests/watchdog/watchdog-test.c
> +++ b/tools/testing/selftests/watchdog/watchdog-test.c
> @@ -27,7 +27,7 @@
>   
>   int fd;
>   const char v = 'V';
> -static const char sopts[] = "bdehp:st:Tn:NLf:i";
> +static const char sopts[] = "bdehp:st:Tn:NLf:c:i";
>   static const struct option lopts[] = {
>   	{"bootstatus",          no_argument, NULL, 'b'},
>   	{"disable",             no_argument, NULL, 'd'},
> @@ -42,6 +42,7 @@ static const struct option lopts[] = {
>   	{"gettimeleft",		no_argument, NULL, 'L'},
>   	{"file",          required_argument, NULL, 'f'},
>   	{"info",		no_argument, NULL, 'i'},
> +	{"count",         required_argument, NULL, 'c'},
>   	{NULL,                  no_argument, NULL, 0x0}
>   };
>   
> @@ -95,6 +96,7 @@ static void usage(char *progname)
>   	printf(" -n, --pretimeout=T\tSet the pretimeout to T seconds\n");
>   	printf(" -N, --getpretimeout\tGet the pretimeout\n");
>   	printf(" -L, --gettimeleft\tGet the time left until timer expires\n");
> +	printf(" -c, --count\tStop after feeding the watchdog count times\n");
>   	printf("\n");
>   	printf("Parameters are parsed left-to-right in real-time.\n");
>   	printf("Example: %s -d -t 10 -p 5 -e\n", progname);
> @@ -174,7 +176,7 @@ int main(int argc, char *argv[])
>   	unsigned int ping_rate = DEFAULT_PING_RATE;
>   	int ret;
>   	int c;
> -	int oneshot = 0;
> +	int oneshot = 0, stop = 1, count = 0;
>   	char *file = "/dev/watchdog";
>   	struct watchdog_info info;
>   	int temperature;
> @@ -307,6 +309,9 @@ int main(int argc, char *argv[])
>   			else
>   				printf("WDIOC_GETTIMELEFT error '%s'\n", strerror(errno));
>   			break;
> +		case 'c':
> +			stop = 0;
> +			count = strtoul(optarg, NULL, 0);
>   		case 'f':
>   			/* Handled above */
>   			break;
> @@ -336,8 +341,8 @@ int main(int argc, char *argv[])
>   
>   	signal(SIGINT, term);
>   
> -	while (1) {
> -		keep_alive();
> +	while (stop || count--) {
> +		exit_code = keep_alive();
>   		sleep(ping_rate);
>   	}
>   end:
Zhijian Li (Fujitsu) Oct. 28, 2024, 12:50 a.m. UTC | #2
On 27/10/2024 08:28, Shuah Khan wrote:
> On 10/24/24 19:39, Li Zhijian wrote:
>> Currently, watchdog-test keep running until it gets a SIGINT. However,
>> when watchdog-test is executed from the kselftests framework, where it
>> launches test via timeout which will send SIGTERM in time up. This could
>> lead to
>> 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS
>>     in silent.
>> 2. kselftests gets an timeout exit code, and judge watchdog-test as
>>    'not ok'
>>
> This test isn't really supposed to be run from kselftest framework.
> This is the reason why it isn't included in the default run.

May I know what's the default run, is it different from `make run_tests` ?


> 
>> This patch is prepare to fix above 2 issues
> 
> This series needs a separate cover letter explaining how this problem is
> being fixed.

Cover letter is in this patch, see below:
In addition, we can get the 'How' by reading the simple change in each change.


> 
>>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> ---
>> Hey,
>> Cover letter is here.
>>
>> It's notice that a OS reboot was triggerred after ran the watchdog-test
>> in kselftests framwork 'make run_tests', that's because watchdog-test
>> didn't stop feeding the watchdog after enable it.
>>
>> In addition, current watchdog-test didn't adapt to the kselftests
>> framework which launchs the test with /usr/bin/timeout and no timeout
>> is expected.
>> ---
Shuah Khan Oct. 28, 2024, 3:29 a.m. UTC | #3
On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 27/10/2024 08:28, Shuah Khan wrote:
>> On 10/24/24 19:39, Li Zhijian wrote:
>>> Currently, watchdog-test keep running until it gets a SIGINT. However,
>>> when watchdog-test is executed from the kselftests framework, where it
>>> launches test via timeout which will send SIGTERM in time up. This could
>>> lead to
>>> 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS
>>>      in silent.
>>> 2. kselftests gets an timeout exit code, and judge watchdog-test as
>>>     'not ok'
>>>
>> This test isn't really supposed to be run from kselftest framework.
>> This is the reason why it isn't included in the default run.
> 
> May I know what's the default run, is it different from `make run_tests` ?

No it isn't. "make kselftest" runs only the targets mentioned in the
selftests Makefile. That is considered the kselftest default run.

There is a reason why watchdog isn't included in the default run.
It isn't intended to be run by users by default as this is test is
just for testing watchdog api

> 
> 
>>
>>> This patch is prepare to fix above 2 issues
>>
>> This series needs a separate cover letter explaining how this problem is
>> being fixed.
> 
> Cover letter is in this patch, see below:
> In addition, we can get the 'How' by reading the simple change in each change.

That isn't enough to understand why this change is needed.
Send patch series with a cover letter explaining what you are
doing.

> 
> 
>>
>>>
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>> ---
>>> Hey,
>>> Cover letter is here.
>>>
>>> It's notice that a OS reboot was triggerred after ran the watchdog-test
>>> in kselftests framwork 'make run_tests', that's because watchdog-test
>>> didn't stop feeding the watchdog after enable it.
>>>
>>> In addition, current watchdog-test didn't adapt to the kselftests
>>> framework which launchs the test with /usr/bin/timeout and no timeout
>>> is expected.
>>> ---

thanks,
-- Shuah
Zhijian Li (Fujitsu) Oct. 28, 2024, 4:02 a.m. UTC | #4
On 28/10/2024 11:29, Shuah Khan wrote:
> On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 27/10/2024 08:28, Shuah Khan wrote:
>>> On 10/24/24 19:39, Li Zhijian wrote:
>>>> Currently, watchdog-test keep running until it gets a SIGINT. However,
>>>> when watchdog-test is executed from the kselftests framework, where it
>>>> launches test via timeout which will send SIGTERM in time up. This could
>>>> lead to
>>>> 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS
>>>>      in silent.
>>>> 2. kselftests gets an timeout exit code, and judge watchdog-test as
>>>>     'not ok'
>>>>
>>> This test isn't really supposed to be run from kselftest framework.
>>> This is the reason why it isn't included in the default run.
>>
>> May I know what's the default run, is it different from `make run_tests` ?
> 
> No it isn't. "make kselftest" runs only the targets mentioned in the
> selftests Makefile. That is considered the kselftest default run.

Hey, Shuah,


Thanks for your explanation.
If that is the case, I do not have an urgent need for the current patch, expect
I'd like to avoid the reboot issue after an accidentally `make run_tests`

Some changes are make as below, please take a look. I will send it out we reach a consensus.


commit 2296f9d88fde4921758a45bf160a7f1b9d4678a0 (HEAD)
Author: Li Zhijian <lizhijian@fujitsu.com>
Date:   Mon Oct 28 11:54:03 2024 +0800

     selftests/watchdog-test: Fix system accidentally reset after watchdog-test
     
     After `make run_tests` to run watchdog-test, a system reboot would
     happen due to watchdog not stop.
     ```
     [ 1367.185172] watchdog: watchdog0: watchdog did not stop!
     ```
     
     Fix it by registering a timeout signal in watchdog-test, where its
     signal handler will stop the watchdog.
     
     After that
      # timeout 1 ./watchdog-test
      Watchdog Ticking Away!
      .
      Stopping watchdog ticks...
     
     Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c
index bc71cbca0dde..97acb90f8b30 100644
--- a/tools/testing/selftests/watchdog/watchdog-test.c
+++ b/tools/testing/selftests/watchdog/watchdog-test.c
@@ -335,6 +335,10 @@ int main(int argc, char *argv[])
         printf("Watchdog Ticking Away!\n");
  
         signal(SIGINT, term);
+       /*
+        * Register the timeout signal
+        */
+       signal(SIGTERM, term);
  
         while (1) {
                 keep_alive();




> 
> There is a reason why watchdog isn't included in the default run.
> It isn't intended to be run by users by default as this is test is
> just for testing watchdog api
> 
>>
>>
>>>
>>>> This patch is prepare to fix above 2 issues
>>>
>>> This series needs a separate cover letter explaining how this problem is
>>> being fixed.
>>
>> Cover letter is in this patch, see below:
>> In addition, we can get the 'How' by reading the simple change in each change.
> 
> That isn't enough to understand why this change is needed.
> Send patch series with a cover letter explaining what you are
> doing.
> 
>>
>>
>>>
>>>>
>>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>>> ---
>>>> Hey,
>>>> Cover letter is here.
>>>>
>>>> It's notice that a OS reboot was triggerred after ran the watchdog-test
>>>> in kselftests framwork 'make run_tests', that's because watchdog-test
>>>> didn't stop feeding the watchdog after enable it.
>>>>
>>>> In addition, current watchdog-test didn't adapt to the kselftests
>>>> framework which launchs the test with /usr/bin/timeout and no timeout
>>>> is expected.
>>>> ---
> 
> thanks,
> -- Shuah
Shuah Khan Oct. 28, 2024, 5:31 a.m. UTC | #5
On 10/27/24 22:02, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 28/10/2024 11:29, Shuah Khan wrote:
>> On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
>>>
>>>
>>> On 27/10/2024 08:28, Shuah Khan wrote:
>>>> On 10/24/24 19:39, Li Zhijian wrote:
>>>>> Currently, watchdog-test keep running until it gets a SIGINT. However,
>>>>> when watchdog-test is executed from the kselftests framework, where it
>>>>> launches test via timeout which will send SIGTERM in time up. This could
>>>>> lead to
>>>>> 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS
>>>>>       in silent.
>>>>> 2. kselftests gets an timeout exit code, and judge watchdog-test as
>>>>>      'not ok'
>>>>>
>>>> This test isn't really supposed to be run from kselftest framework.
>>>> This is the reason why it isn't included in the default run.
>>>
>>> May I know what's the default run, is it different from `make run_tests` ?
>>
>> No it isn't. "make kselftest" runs only the targets mentioned in the
>> selftests Makefile. That is considered the kselftest default run.
> 
> Hey, Shuah,
> 
> 
> Thanks for your explanation.
> If that is the case, I do not have an urgent need for the current patch, expect
> I'd like to avoid the reboot issue after an accidentally `make run_tests`
> 
> Some changes are make as below, please take a look. I will send it out we reach a consensus.
> 
> 
> commit 2296f9d88fde4921758a45bf160a7f1b9d4678a0 (HEAD)
> Author: Li Zhijian <lizhijian@fujitsu.com>
> Date:   Mon Oct 28 11:54:03 2024 +0800
> 
>       selftests/watchdog-test: Fix system accidentally reset after watchdog-test
>       
>       After `make run_tests` to run watchdog-test, a system reboot would
>       happen due to watchdog not stop.
>       ```

The system shouldn't reboot just because watchdog test is left running.
watchdog test keeps calling ioctl() with WDIOC_KEEPALIVE to make sure
the watchdog card timer is reset.

If you are seeing reboots, that means watchdog test couldn't reset the
timer. This usually mean system is unresponsive or something is wrong
with the watchdog card on your system.

This is the behavior you would expect from a watchdog timer. Does your
system have a watchdog card ot or you enabling softdog module?

Either way there is some other reason for the system reboot.

thanks,
-- Shuah
Zhijian Li (Fujitsu) Oct. 28, 2024, 5:45 a.m. UTC | #6
On 28/10/2024 13:31, Shuah Khan wrote:
> On 10/27/24 22:02, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 28/10/2024 11:29, Shuah Khan wrote:
>>> On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
>>>>
>>>>
>>>> On 27/10/2024 08:28, Shuah Khan wrote:
>>>>> On 10/24/24 19:39, Li Zhijian wrote:
>>>>>> Currently, watchdog-test keep running until it gets a SIGINT. However,
>>>>>> when watchdog-test is executed from the kselftests framework, where it
>>>>>> launches test via timeout which will send SIGTERM in time up. This could
>>>>>> lead to
>>>>>> 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS
>>>>>>       in silent.
>>>>>> 2. kselftests gets an timeout exit code, and judge watchdog-test as
>>>>>>      'not ok'
>>>>>>
>>>>> This test isn't really supposed to be run from kselftest framework.
>>>>> This is the reason why it isn't included in the default run.
>>>>
>>>> May I know what's the default run, is it different from `make run_tests` ?
>>>
>>> No it isn't. "make kselftest" runs only the targets mentioned in the
>>> selftests Makefile. That is considered the kselftest default run.
>>
>> Hey, Shuah,
>>
>>
>> Thanks for your explanation.
>> If that is the case, I do not have an urgent need for the current patch, expect
>> I'd like to avoid the reboot issue after an accidentally `make run_tests`
>>
>> Some changes are make as below, please take a look. I will send it out we reach a consensus.
>>
>>
>> commit 2296f9d88fde4921758a45bf160a7f1b9d4678a0 (HEAD)
>> Author: Li Zhijian <lizhijian@fujitsu.com>
>> Date:   Mon Oct 28 11:54:03 2024 +0800
>>
>>       selftests/watchdog-test: Fix system accidentally reset after watchdog-test
>>       After `make run_tests` to run watchdog-test, a system reboot would
>>       happen due to watchdog not stop.
>>       ```
> 
> The system shouldn't reboot just because watchdog test is left running.
> watchdog test keeps calling ioctl() with WDIOC_KEEPALIVE to make sure
> the watchdog card timer is reset.

Err..

How watchdog test keep calling ioctl() with WDIOC_KEEPALIVE after ./watchdog_test has finished?

In my understanding, the cause is that, ./watchdog_test didn't goto neither
A)
347 end:
348         /*
349          * Send specific magic character 'V' just in case Magic Close is
350          * enabled to ensure watchdog gets disabled on close.
351          */
352         ret = write(fd, &v, 1);
353         if (ret < 0)
354                 printf("Stopping watchdog ticks failed (%d)...\n", errno);

nor B)

  68 static void term(int sig)
  69 {
  70         int ret = write(fd, &v, 1);
  71
  72         close(fd);
  73         if (ret < 0)
  74                 printf("\nStopping watchdog ticks failed (%d)...\n", errno);
  75         else
  76                 printf("\nStopping watchdog ticks...\n");
  77         exit(0);
  78 }

to "ensure watchdog gets disabled on close"


The timeout default signal is SIGTERM, watchdog_test only registered SIGINT handler.

Thanks
Zhijian



> 
> If you are seeing reboots, that means watchdog test couldn't reset the
> timer. This usually mean system is unresponsive or something is wrong
> with the watchdog card on your system.
> 
> This is the behavior you would expect from a watchdog timer. Does your
> system have a watchdog card ot or you enabling softdog module?
> 
> Either way there is some other reason for the system reboot.
> 
> thanks,
> -- Shuah
Shuah Khan Oct. 28, 2024, 5:55 a.m. UTC | #7
On 10/27/24 23:45, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 28/10/2024 13:31, Shuah Khan wrote:
>> On 10/27/24 22:02, Zhijian Li (Fujitsu) wrote:
>>>
>>>
>>> On 28/10/2024 11:29, Shuah Khan wrote:
>>>> On 10/27/24 18:50, Zhijian Li (Fujitsu) wrote:
>>>>>
>>>>>
>>>>> On 27/10/2024 08:28, Shuah Khan wrote:
>>>>>> On 10/24/24 19:39, Li Zhijian wrote:
>>>>>>> Currently, watchdog-test keep running until it gets a SIGINT. However,
>>>>>>> when watchdog-test is executed from the kselftests framework, where it
>>>>>>> launches test via timeout which will send SIGTERM in time up. This could
>>>>>>> lead to
>>>>>>> 1. watchdog haven't stop, a watchdog reset is triggered to reboot the OS
>>>>>>>        in silent.
>>>>>>> 2. kselftests gets an timeout exit code, and judge watchdog-test as
>>>>>>>       'not ok'
>>>>>>>
>>>>>> This test isn't really supposed to be run from kselftest framework.
>>>>>> This is the reason why it isn't included in the default run.
>>>>>
>>>>> May I know what's the default run, is it different from `make run_tests` ?
>>>>
>>>> No it isn't. "make kselftest" runs only the targets mentioned in the
>>>> selftests Makefile. That is considered the kselftest default run.
>>>
>>> Hey, Shuah,
>>>
>>>
>>> Thanks for your explanation.
>>> If that is the case, I do not have an urgent need for the current patch, expect
>>> I'd like to avoid the reboot issue after an accidentally `make run_tests`
>>>
>>> Some changes are make as below, please take a look. I will send it out we reach a consensus.
>>>
>>>
>>> commit 2296f9d88fde4921758a45bf160a7f1b9d4678a0 (HEAD)
>>> Author: Li Zhijian <lizhijian@fujitsu.com>
>>> Date:   Mon Oct 28 11:54:03 2024 +0800
>>>
>>>        selftests/watchdog-test: Fix system accidentally reset after watchdog-test
>>>        After `make run_tests` to run watchdog-test, a system reboot would
>>>        happen due to watchdog not stop.
>>>        ```
>>
>> The system shouldn't reboot just because watchdog test is left running.
>> watchdog test keeps calling ioctl() with WDIOC_KEEPALIVE to make sure
>> the watchdog card timer is reset.
> 
> Err..
> 
> How watchdog test keep calling ioctl() with WDIOC_KEEPALIVE after ./watchdog_test has finished?
> 
> In my understanding, the cause is that, ./watchdog_test didn't goto neither
> A)
> 347 end:
> 348         /*
> 349          * Send specific magic character 'V' just in case Magic Close is
> 350          * enabled to ensure watchdog gets disabled on close.
> 351          */
> 352         ret = write(fd, &v, 1);
> 353         if (ret < 0)
> 354                 printf("Stopping watchdog ticks failed (%d)...\n", errno);
> 
> nor B)

Can you send strace output from "make run_tests" from your system?

thanks,
-- Shuah
Zhijian Li (Fujitsu) Oct. 28, 2024, 6:06 a.m. UTC | #8
linux/tools/testing/selftests/watchdog# make run_tests
TAP version 13
1..1
# timeout set to 45
# selftests: watchdog: watchdog-test
# Watchdog Ticking Away!
# .............................................#
not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds


And i got warning in dmesg

                                                                                                                   
[ 1953.229511] watchdog: watchdog0: watchdog did not stop!




On 28/10/2024 13:55, Shuah Khan wrote:
>>>>
>>>
>>> The system shouldn't reboot just because watchdog test is left running.
>>> watchdog test keeps calling ioctl() with WDIOC_KEEPALIVE to make sure
>>> the watchdog card timer is reset.
>>
>> Err..
>>
>> How watchdog test keep calling ioctl() with WDIOC_KEEPALIVE after ./watchdog_test has finished?
>>
>> In my understanding, the cause is that, ./watchdog_test didn't goto neither
>> A)
>> 347 end:
>> 348         /*
>> 349          * Send specific magic character 'V' just in case Magic Close is
>> 350          * enabled to ensure watchdog gets disabled on close.
>> 351          */
>> 352         ret = write(fd, &v, 1);
>> 353         if (ret < 0)
>> 354                 printf("Stopping watchdog ticks failed (%d)...\n", errno);
>>
>> nor B)
> 
> Can you send strace output from "make run_tests" from your system?
> 
> thanks,
> -- Shuah
Shuah Khan Oct. 28, 2024, 6:25 a.m. UTC | #9
On 10/28/24 00:06, Zhijian Li (Fujitsu) wrote:
> linux/tools/testing/selftests/watchdog# make run_tests
> TAP version 13
> 1..1
> # timeout set to 45
> # selftests: watchdog: watchdog-test
> # Watchdog Ticking Away!
> # .............................................#
> not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
> 
> 
> And i got warning in dmesg
> 
>                                                                                                                     
> [ 1953.229511] watchdog: watchdog0: watchdog did not stop!
> 
> 
> 
> 

Run "make run_tests" under strace and send me the output.

thanks,
-- Shuah
Zhijian Li (Fujitsu) Oct. 28, 2024, 6:32 a.m. UTC | #10
On 28/10/2024 14:25, Shuah Khan wrote:
> On 10/28/24 00:06, Zhijian Li (Fujitsu) wrote:
>> linux/tools/testing/selftests/watchdog# make run_tests
>> TAP version 13
>> 1..1
>> # timeout set to 45
>> # selftests: watchdog: watchdog-test
>> # Watchdog Ticking Away!
>> # .............................................#
>> not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
>>
>>
>> And i got warning in dmesg
>>
>> [ 1953.229511] watchdog: watchdog0: watchdog did not stop!
>>
>>
>>
>>
> 
> Run "make run_tests" under strace and send me the output.


Could you share the exact command, how to 'Run "make run_tests" under strace'




> 
> thanks,
> -- Shuah
> 
> 
>
Shuah Khan Oct. 28, 2024, 6:44 a.m. UTC | #11
On 10/28/24 00:32, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 28/10/2024 14:25, Shuah Khan wrote:
>> On 10/28/24 00:06, Zhijian Li (Fujitsu) wrote:
>>> linux/tools/testing/selftests/watchdog# make run_tests
>>> TAP version 13
>>> 1..1
>>> # timeout set to 45
>>> # selftests: watchdog: watchdog-test
>>> # Watchdog Ticking Away!
>>> # .............................................#
>>> not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
>>>
>>>
>>> And i got warning in dmesg
>>>
>>> [ 1953.229511] watchdog: watchdog0: watchdog did not stop!
>>>
>>>
>>>
>>>
>>
>> Run "make run_tests" under strace and send me the output.
> 
> 
> Could you share the exact command, how to 'Run "make run_tests" under strace'
> 

strace make run_tests > strace.out 2>&1

Send me strace.out

thanks.
-- Shuah
Shuah Khan Oct. 29, 2024, 2:24 a.m. UTC | #12
On 10/28/24 00:44, Shuah Khan wrote:
> On 10/28/24 00:32, Zhijian Li (Fujitsu) wrote:
>>
>>
>> On 28/10/2024 14:25, Shuah Khan wrote:
>>> On 10/28/24 00:06, Zhijian Li (Fujitsu) wrote:
>>>> linux/tools/testing/selftests/watchdog# make run_tests
>>>> TAP version 13
>>>> 1..1
>>>> # timeout set to 45
>>>> # selftests: watchdog: watchdog-test
>>>> # Watchdog Ticking Away!
>>>> # .............................................#
>>>> not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
>>>>
>>>>
>>>> And i got warning in dmesg
>>>>
>>>> [ 1953.229511] watchdog: watchdog0: watchdog did not stop!
>>>>
>>>>
>>>>
>>>>
>>>
>>> Run "make run_tests" under strace and send me the output.
>>
>>
>> Could you share the exact command, how to 'Run "make run_tests" under strace'
>>
> 
> strace make run_tests > strace.out 2>&1
> 
> Send me strace.out

Thank you for the strace output. kselftest uses a timeout to terminate
hung tests - that timeout is 45 seconds. When you run "make run_tests"
under watchdog directory, you are running into this.

Yes your fix to add SIGTERM handling makes sense. Please also handle
other signals - SIGKILL, SIGQUIT.

Thanks for the find.

thanks,
-- Shuah
Zhijian Li (Fujitsu) Oct. 29, 2024, 3:03 a.m. UTC | #13
On 29/10/2024 10:24, Shuah Khan wrote:
> On 10/28/24 00:44, Shuah Khan wrote:
>> On 10/28/24 00:32, Zhijian Li (Fujitsu) wrote:
>>>
>>>
>>> On 28/10/2024 14:25, Shuah Khan wrote:
>>>> On 10/28/24 00:06, Zhijian Li (Fujitsu) wrote:
>>>>> linux/tools/testing/selftests/watchdog# make run_tests
>>>>> TAP version 13
>>>>> 1..1
>>>>> # timeout set to 45
>>>>> # selftests: watchdog: watchdog-test
>>>>> # Watchdog Ticking Away!
>>>>> # .............................................#
>>>>> not ok 1 selftests: watchdog: watchdog-test # TIMEOUT 45 seconds
>>>>>
>>>>>
>>>>> And i got warning in dmesg
>>>>>
>>>>> [ 1953.229511] watchdog: watchdog0: watchdog did not stop!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> Run "make run_tests" under strace and send me the output.
>>>
>>>
>>> Could you share the exact command, how to 'Run "make run_tests" under strace'
>>>
>>
>> strace make run_tests > strace.out 2>&1
>>
>> Send me strace.out
> 
> Thank you for the strace output. kselftest uses a timeout to terminate
> hung tests - that timeout is 45 seconds. When you run "make run_tests"
> under watchdog directory, you are running into this.
> 
> Yes your fix to add SIGTERM handling makes sense. Please also handle
> other signals - SIGKILL, SIGQUIT.


Understood, I will update the patch soon.


Thanks
Zhijian

> 
> Thanks for the find.
> 
> thanks,
> -- Shuah
diff mbox series

Patch

diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c
index bc71cbca0dde..2f8fd2670897 100644
--- a/tools/testing/selftests/watchdog/watchdog-test.c
+++ b/tools/testing/selftests/watchdog/watchdog-test.c
@@ -27,7 +27,7 @@ 
 
 int fd;
 const char v = 'V';
-static const char sopts[] = "bdehp:st:Tn:NLf:i";
+static const char sopts[] = "bdehp:st:Tn:NLf:c:i";
 static const struct option lopts[] = {
 	{"bootstatus",          no_argument, NULL, 'b'},
 	{"disable",             no_argument, NULL, 'd'},
@@ -42,6 +42,7 @@  static const struct option lopts[] = {
 	{"gettimeleft",		no_argument, NULL, 'L'},
 	{"file",          required_argument, NULL, 'f'},
 	{"info",		no_argument, NULL, 'i'},
+	{"count",         required_argument, NULL, 'c'},
 	{NULL,                  no_argument, NULL, 0x0}
 };
 
@@ -95,6 +96,7 @@  static void usage(char *progname)
 	printf(" -n, --pretimeout=T\tSet the pretimeout to T seconds\n");
 	printf(" -N, --getpretimeout\tGet the pretimeout\n");
 	printf(" -L, --gettimeleft\tGet the time left until timer expires\n");
+	printf(" -c, --count\tStop after feeding the watchdog count times\n");
 	printf("\n");
 	printf("Parameters are parsed left-to-right in real-time.\n");
 	printf("Example: %s -d -t 10 -p 5 -e\n", progname);
@@ -174,7 +176,7 @@  int main(int argc, char *argv[])
 	unsigned int ping_rate = DEFAULT_PING_RATE;
 	int ret;
 	int c;
-	int oneshot = 0;
+	int oneshot = 0, stop = 1, count = 0;
 	char *file = "/dev/watchdog";
 	struct watchdog_info info;
 	int temperature;
@@ -307,6 +309,9 @@  int main(int argc, char *argv[])
 			else
 				printf("WDIOC_GETTIMELEFT error '%s'\n", strerror(errno));
 			break;
+		case 'c':
+			stop = 0;
+			count = strtoul(optarg, NULL, 0);
 		case 'f':
 			/* Handled above */
 			break;
@@ -336,8 +341,8 @@  int main(int argc, char *argv[])
 
 	signal(SIGINT, term);
 
-	while (1) {
-		keep_alive();
+	while (stop || count--) {
+		exit_code = keep_alive();
 		sleep(ping_rate);
 	}
 end: