diff mbox series

was [Re: [PATCH v3] watchdog: Add hrtimer-based pretimeout feature]

Message ID 85ce6068-c98e-dbe0-a4b6-5c877b460f64@kernel.org (mailing list archive)
State Not Applicable
Headers show
Series was [Re: [PATCH v3] watchdog: Add hrtimer-based pretimeout feature] | expand

Commit Message

Jiri Slaby Sept. 4, 2021, 8:16 a.m. UTC
On 02. 09. 21, 16:05, Guenter Roeck wrote:
> On 9/1/21 11:55 PM, Jiri Slaby wrote:
>> On 03. 02. 21, 21:11, Curtis Klein wrote:
>>> This adds the option to use a hrtimer to generate a watchdog pretimeout
>>> event for hardware watchdogs that do not natively support watchdog
>>> pretimeouts.
>>>
>>> With this enabled, all watchdogs will appear to have pretimeout support
>>> in userspace. If no pretimeout value is set, there will be no change in
>>> the watchdog's behavior.
>>
>> Hi,
>>
>> on my Dell Latitude 7280, CONFIG_WATCHDOG_HRTIMER_PRETIMEOUT=y causes 
>> all reboot, kexec, suspend to panic. Disabling that option makes it 
>> all work again. Provided it happens very late in the process, I don't 
>> know how to grab some logs...
>>
>> Any ideas?
>>
> 
> AFAICS the timer does not stop on reboot. I think we'll need to augment 
> the code
> to do that.

No, it is stopped via device unregister -> watchdog_dev_unregister -> 
watchdog_cdev_unregister -> watchdog_hrtimer_pretimeout_stop.

But look:
watchdog_cdev_unregister
   -> wdd->wd_data = NULL;
   -> watchdog_hrtimer_pretimeout_stop
     -> hrtimer_cancel(&wdd->wd_data->pretimeout_timer);

The diff below obviously fixes the issue, but I don't know what the 
consequences are. The other possibility would be to pass wd_data 
directly to watchdog_hrtimer_pretimeout_stop.

I don't know how this can work on some machines (I verified it does).

         wdd->wd_data = NULL;
@@ -1103,7 +1105,6 @@ static void watchdog_cdev_unregister(struct 
watchdog_device *wdd)

         hrtimer_cancel(&wd_data->timer);
         kthread_cancel_work_sync(&wd_data->work);
-       watchdog_hrtimer_pretimeout_stop(wdd);

         put_device(&wd_data->dev);
  }

thanks,

Comments

Curtis Klein Sept. 5, 2021, 10:22 p.m. UTC | #1
On 9/4/21, 1:19AM, Jiri Slaby wrote:
> On 04. 09. 21, 10:16, Jiri Slaby wrote:
> > On 02. 09. 21, 16:05, Guenter Roeck wrote:
> >> On 9/1/21 11:55 PM, Jiri Slaby wrote:
> >>> On 03. 02. 21, 21:11, Curtis Klein wrote:
> >>>> This adds the option to use a hrtimer to generate a watchdog pretimeout
> >>>> event for hardware watchdogs that do not natively support watchdog
> >>>> pretimeouts.
> >>>>
> >>>> With this enabled, all watchdogs will appear to have pretimeout support
> >>>> in userspace. If no pretimeout value is set, there will be no change in
> >>>> the watchdog's behavior.
> >>>
> >>> Hi,
> >>>
> >>> on my Dell Latitude 7280, CONFIG_WATCHDOG_HRTIMER_PRETIMEOUT=y causes
> >>> all reboot, kexec, suspend to panic. Disabling that option makes it
> >>> all work again. Provided it happens very late in the process, I don't
> >>> know how to grab some logs...
> >>>
> >>> Any ideas?
> >>>
> >>
> >> AFAICS the timer does not stop on reboot. I think we'll need to
> >> augment the code
> >> to do that.
> >
> > No, it is stopped via device unregister -> watchdog_dev_unregister ->
> > watchdog_cdev_unregister -> watchdog_hrtimer_pretimeout_stop.
> >
> > But look:
> > watchdog_cdev_unregister
> >    -> wdd->wd_data = NULL;
> >    -> watchdog_hrtimer_pretimeout_stop
> >      -> hrtimer_cancel(&wdd->wd_data->pretimeout_timer);
> >
> > The diff below obviously fixes the issue,
> 
> Which is exactly a -next commit:
> commit c7b178dae139f8857edc50888cfbf251cd974a38
> Author: Curtis Klein <curtis.klein@hpe.com>
> Date:   Tue Jun 22 23:26:23 2021 -0700
> 
>      watchdog: Fix NULL pointer dereference when releasing cdev
> 
> > --- a/drivers/watchdog/watchdog_dev.c
> > +++ b/drivers/watchdog/watchdog_dev.c
> > @@ -1096,6 +1096,8 @@ static void watchdog_cdev_unregister(struct
> > watchdog_device *wdd)
> >                  watchdog_stop(wdd);
> >          }
> >
> > +       watchdog_hrtimer_pretimeout_stop(wdd);
> > +
> >          mutex_lock(&wd_data->lock);
> >          wd_data->wdd = NULL;
> >          wdd->wd_data = NULL;
> > @@ -1103,7 +1105,6 @@ static void watchdog_cdev_unregister(struct
> > watchdog_device *wdd)
> >
> >          hrtimer_cancel(&wd_data->timer);
> >          kthread_cancel_work_sync(&wd_data->work);
> > -       watchdog_hrtimer_pretimeout_stop(wdd);
> >
> >          put_device(&wd_data->dev);
> >   }
> >
> > thanks,
> 
> 
> --
> js
> suse labs

Does it still make sense to stop the timer on reboot or suspend?

I haven't had any problems with rebooting but I haven't been able to test
suspending.

-Curtis
Guenter Roeck Sept. 5, 2021, 10:56 p.m. UTC | #2
On 9/5/21 3:22 PM, Klein, Curtis wrote:
> On 9/4/21, 1:19AM, Jiri Slaby wrote:
>> On 04. 09. 21, 10:16, Jiri Slaby wrote:
>>> On 02. 09. 21, 16:05, Guenter Roeck wrote:
>>>> On 9/1/21 11:55 PM, Jiri Slaby wrote:
>>>>> On 03. 02. 21, 21:11, Curtis Klein wrote:
>>>>>> This adds the option to use a hrtimer to generate a watchdog pretimeout
>>>>>> event for hardware watchdogs that do not natively support watchdog
>>>>>> pretimeouts.
>>>>>>
>>>>>> With this enabled, all watchdogs will appear to have pretimeout support
>>>>>> in userspace. If no pretimeout value is set, there will be no change in
>>>>>> the watchdog's behavior.
>>>>>
>>>>> Hi,
>>>>>
>>>>> on my Dell Latitude 7280, CONFIG_WATCHDOG_HRTIMER_PRETIMEOUT=y causes
>>>>> all reboot, kexec, suspend to panic. Disabling that option makes it
>>>>> all work again. Provided it happens very late in the process, I don't
>>>>> know how to grab some logs...
>>>>>
>>>>> Any ideas?
>>>>>
>>>>
>>>> AFAICS the timer does not stop on reboot. I think we'll need to
>>>> augment the code
>>>> to do that.
>>>
>>> No, it is stopped via device unregister -> watchdog_dev_unregister ->
>>> watchdog_cdev_unregister -> watchdog_hrtimer_pretimeout_stop.
>>>
>>> But look:
>>> watchdog_cdev_unregister
>>>     -> wdd->wd_data = NULL;
>>>     -> watchdog_hrtimer_pretimeout_stop
>>>       -> hrtimer_cancel(&wdd->wd_data->pretimeout_timer);
>>>
>>> The diff below obviously fixes the issue,
>>
>> Which is exactly a -next commit:
>> commit c7b178dae139f8857edc50888cfbf251cd974a38
>> Author: Curtis Klein <curtis.klein@hpe.com>
>> Date:   Tue Jun 22 23:26:23 2021 -0700
>>
>>       watchdog: Fix NULL pointer dereference when releasing cdev
>>
>>> --- a/drivers/watchdog/watchdog_dev.c
>>> +++ b/drivers/watchdog/watchdog_dev.c
>>> @@ -1096,6 +1096,8 @@ static void watchdog_cdev_unregister(struct
>>> watchdog_device *wdd)
>>>                   watchdog_stop(wdd);
>>>           }
>>>
>>> +       watchdog_hrtimer_pretimeout_stop(wdd);
>>> +
>>>           mutex_lock(&wd_data->lock);
>>>           wd_data->wdd = NULL;
>>>           wdd->wd_data = NULL;
>>> @@ -1103,7 +1105,6 @@ static void watchdog_cdev_unregister(struct
>>> watchdog_device *wdd)
>>>
>>>           hrtimer_cancel(&wd_data->timer);
>>>           kthread_cancel_work_sync(&wd_data->work);
>>> -       watchdog_hrtimer_pretimeout_stop(wdd);
>>>
>>>           put_device(&wd_data->dev);
>>>    }
>>>
>>> thanks,
>>
>>
>> --
>> js
>> suse labs
> 
> Does it still make sense to stop the timer on reboot or suspend?
> 
> I haven't had any problems with rebooting but I haven't been able to test
> suspending.
> 
Only if it is really a problem.

Guenter
diff mbox series

Patch

--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -1096,6 +1096,8 @@  static void watchdog_cdev_unregister(struct 
watchdog_device *wdd)
                 watchdog_stop(wdd);
         }

+       watchdog_hrtimer_pretimeout_stop(wdd);
+
         mutex_lock(&wd_data->lock);
         wd_data->wdd = NULL;