diff mbox

[v6,05/19] watchdog: orion: Make sure the watchdog is initially stopped

Message ID 1391707226-18258-6-git-send-email-ezequiel.garcia@free-electrons.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ezequiel Garcia Feb. 6, 2014, 5:20 p.m. UTC
Having the watchdog initially fully stopped is important to avoid
any spurious watchdog triggers, in case the registers are not in
its reset state.

Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
---
 drivers/watchdog/orion_wdt.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Guenter Roeck Feb. 7, 2014, 2:02 a.m. UTC | #1
On 02/06/2014 09:20 AM, Ezequiel Garcia wrote:
> Having the watchdog initially fully stopped is important to avoid
> any spurious watchdog triggers, in case the registers are not in
> its reset state.
>
> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> Tested-by: Willy Tarreau <w@1wt.eu>
> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> ---
>   drivers/watchdog/orion_wdt.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c
> index 6746033..2dbeee9 100644
> --- a/drivers/watchdog/orion_wdt.c
> +++ b/drivers/watchdog/orion_wdt.c
> @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev)
>   	orion_wdt.max_timeout = wdt_max_duration;
>   	watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev);
>
> +	/* Let's make sure the watchdog is fully stopped */
> +	orion_wdt_stop(&orion_wdt);
> +

Actually we just had that in another driver, and I stumbled over it there.

Problem with stopping the watchdog in probe unconditionally is that you can
use it to defeat nowayout: unload the module, then load it again,
and the watchdog is stopped even if nowayout is true.

Is this really what you want ? Or, in other words, what is the problem
you are trying to solve ?

Thanks,
Guenter


>   	watchdog_set_nowayout(&orion_wdt, nowayout);
>   	ret = watchdog_register_device(&orion_wdt);
>   	if (ret)
>
Ezequiel Garcia Feb. 7, 2014, 10:40 a.m. UTC | #2
On Thu, Feb 06, 2014 at 06:02:56PM -0800, Guenter Roeck wrote:
> On 02/06/2014 09:20 AM, Ezequiel Garcia wrote:
> > Having the watchdog initially fully stopped is important to avoid
> > any spurious watchdog triggers, in case the registers are not in
> > its reset state.
> >
> > Reviewed-by: Guenter Roeck <linux@roeck-us.net>
> > Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> > Tested-by: Willy Tarreau <w@1wt.eu>
> > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> > ---
> >   drivers/watchdog/orion_wdt.c | 3 +++
> >   1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c
> > index 6746033..2dbeee9 100644
> > --- a/drivers/watchdog/orion_wdt.c
> > +++ b/drivers/watchdog/orion_wdt.c
> > @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev)
> >   	orion_wdt.max_timeout = wdt_max_duration;
> >   	watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev);
> >
> > +	/* Let's make sure the watchdog is fully stopped */
> > +	orion_wdt_stop(&orion_wdt);
> > +
> 
> Actually we just had that in another driver, and I stumbled over it there.
> 
> Problem with stopping the watchdog in probe unconditionally is that you can
> use it to defeat nowayout: unload the module, then load it again,
> and the watchdog is stopped even if nowayout is true.
> 

Hm... I see.

> Is this really what you want ? Or, in other words, what is the problem
> you are trying to solve ?
> 

Well, this is related to the discussion about the bootloader not
reseting the watchdog properly, provoking spurious watchdog triggering.

Jason Gunthorpe explained [1] that we needed a particular sequence:

 1. Disable WDT
 2. Clear bridge
 3. Enable WDT

We added the irq handling to satisfy (2), and the watchdog stop for (1).

The watchdog stop was agreed specifically [2].

Ideas?

[1] http://www.spinics.net/lists/arm-kernel/msg302340.html
[2] http://www.spinics.net/lists/arm-kernel/msg302507.html
Guenter Roeck Feb. 7, 2014, 1:38 p.m. UTC | #3
On 02/07/2014 02:40 AM, Ezequiel Garcia wrote:
> On Thu, Feb 06, 2014 at 06:02:56PM -0800, Guenter Roeck wrote:
>> On 02/06/2014 09:20 AM, Ezequiel Garcia wrote:
>>> Having the watchdog initially fully stopped is important to avoid
>>> any spurious watchdog triggers, in case the registers are not in
>>> its reset state.
>>>
>>> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
>>> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
>>> Tested-by: Willy Tarreau <w@1wt.eu>
>>> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
>>> ---
>>>    drivers/watchdog/orion_wdt.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c
>>> index 6746033..2dbeee9 100644
>>> --- a/drivers/watchdog/orion_wdt.c
>>> +++ b/drivers/watchdog/orion_wdt.c
>>> @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev)
>>>    	orion_wdt.max_timeout = wdt_max_duration;
>>>    	watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev);
>>>
>>> +	/* Let's make sure the watchdog is fully stopped */
>>> +	orion_wdt_stop(&orion_wdt);
>>> +
>>
>> Actually we just had that in another driver, and I stumbled over it there.
>>
>> Problem with stopping the watchdog in probe unconditionally is that you can
>> use it to defeat nowayout: unload the module, then load it again,
>> and the watchdog is stopped even if nowayout is true.
>>
>
> Hm... I see.
>
>> Is this really what you want ? Or, in other words, what is the problem
>> you are trying to solve ?
>>
>
> Well, this is related to the discussion about the bootloader not
> reseting the watchdog properly, provoking spurious watchdog triggering.
>
> Jason Gunthorpe explained [1] that we needed a particular sequence:
>
>   1. Disable WDT
>   2. Clear bridge
>   3. Enable WDT
>
> We added the irq handling to satisfy (2), and the watchdog stop for (1).
>
> The watchdog stop was agreed specifically [2].
>
> Ideas?
>

Other drivers assume that if the watchdog is running, it is supposed
to be running. The more common approach in such cases is to ping the
watchdog once to give userspace more time to get ready, but leave
it enabled. So you could check if the watchdog is enabled, and if
it was enabled re-enable it after initialization is complete
(and maybe log a message stating that the watchdog is enabled).

If you don't want to do that, and if you are defeating nowayout
on purpose to fix a problem with a broken bootloader,
you should at least put in comment describing the problem you are
trying to solve, and that you accept breaking nowayout with your fix.

Thanks,
Guenter
Ezequiel Garcia Feb. 7, 2014, 3:17 p.m. UTC | #4
On Fri, Feb 07, 2014 at 05:38:09AM -0800, Guenter Roeck wrote:
> On 02/07/2014 02:40 AM, Ezequiel Garcia wrote:
> > On Thu, Feb 06, 2014 at 06:02:56PM -0800, Guenter Roeck wrote:
> >> On 02/06/2014 09:20 AM, Ezequiel Garcia wrote:
> >>> Having the watchdog initially fully stopped is important to avoid
> >>> any spurious watchdog triggers, in case the registers are not in
> >>> its reset state.
> >>>
> >>> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
> >>> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> >>> Tested-by: Willy Tarreau <w@1wt.eu>
> >>> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> >>> ---
> >>>    drivers/watchdog/orion_wdt.c | 3 +++
> >>>    1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c
> >>> index 6746033..2dbeee9 100644
> >>> --- a/drivers/watchdog/orion_wdt.c
> >>> +++ b/drivers/watchdog/orion_wdt.c
> >>> @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev)
> >>>    	orion_wdt.max_timeout = wdt_max_duration;
> >>>    	watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev);
> >>>
> >>> +	/* Let's make sure the watchdog is fully stopped */
> >>> +	orion_wdt_stop(&orion_wdt);
> >>> +
> >>
> >> Actually we just had that in another driver, and I stumbled over it there.
> >>
> >> Problem with stopping the watchdog in probe unconditionally is that you can
> >> use it to defeat nowayout: unload the module, then load it again,
> >> and the watchdog is stopped even if nowayout is true.
> >>
> >
> > Hm... I see.
> >
> >> Is this really what you want ? Or, in other words, what is the problem
> >> you are trying to solve ?
> >>
> >
> > Well, this is related to the discussion about the bootloader not
> > reseting the watchdog properly, provoking spurious watchdog triggering.
> >
> > Jason Gunthorpe explained [1] that we needed a particular sequence:
> >
> >   1. Disable WDT
> >   2. Clear bridge
> >   3. Enable WDT
> >
> > We added the irq handling to satisfy (2), and the watchdog stop for (1).
> >
> > The watchdog stop was agreed specifically [2].
> >
> > Ideas?
> >
> 
> Other drivers assume that if the watchdog is running, it is supposed
> to be running. The more common approach in such cases is to ping the
> watchdog once to give userspace more time to get ready, but leave
> it enabled. So you could check if the watchdog is enabled, and if
> it was enabled re-enable it after initialization is complete
> (and maybe log a message stating that the watchdog is enabled).
> 
> If you don't want to do that, and if you are defeating nowayout
> on purpose to fix a problem with a broken bootloader,
> you should at least put in comment describing the problem you are
> trying to solve, and that you accept breaking nowayout with your fix.
> 

I'm not fond of not having "nowayout" option on our driver, given I'm sure
it's a watchdog feature for a good reason.

On the other side, I can't see how can we distinguish a previously
and explicitly enabled watchdog, from a spurious enable by broken bootloader.

Jason, what do you say?
Jason Cooper Feb. 7, 2014, 3:44 p.m. UTC | #5
On Fri, Feb 07, 2014 at 12:17:28PM -0300, Ezequiel Garcia wrote:
> On Fri, Feb 07, 2014 at 05:38:09AM -0800, Guenter Roeck wrote:
> > On 02/07/2014 02:40 AM, Ezequiel Garcia wrote:
> > > On Thu, Feb 06, 2014 at 06:02:56PM -0800, Guenter Roeck wrote:
> > >> On 02/06/2014 09:20 AM, Ezequiel Garcia wrote:
> > >>> Having the watchdog initially fully stopped is important to avoid
> > >>> any spurious watchdog triggers, in case the registers are not in
> > >>> its reset state.
> > >>>
> > >>> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
> > >>> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> > >>> Tested-by: Willy Tarreau <w@1wt.eu>
> > >>> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> > >>> ---
> > >>>    drivers/watchdog/orion_wdt.c | 3 +++
> > >>>    1 file changed, 3 insertions(+)
> > >>>
> > >>> diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c
> > >>> index 6746033..2dbeee9 100644
> > >>> --- a/drivers/watchdog/orion_wdt.c
> > >>> +++ b/drivers/watchdog/orion_wdt.c
> > >>> @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev)
> > >>>    	orion_wdt.max_timeout = wdt_max_duration;
> > >>>    	watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev);
> > >>>
> > >>> +	/* Let's make sure the watchdog is fully stopped */
> > >>> +	orion_wdt_stop(&orion_wdt);
> > >>> +
> > >>
> > >> Actually we just had that in another driver, and I stumbled over it there.
> > >>
> > >> Problem with stopping the watchdog in probe unconditionally is that you can
> > >> use it to defeat nowayout: unload the module, then load it again,
> > >> and the watchdog is stopped even if nowayout is true.
> > >>

How often would a user legitimately want to unload/load the watchdog
module?

> > >
> > > Hm... I see.
> > >
> > >> Is this really what you want ? Or, in other words, what is the problem
> > >> you are trying to solve ?
> > >>
> > >
> > > Well, this is related to the discussion about the bootloader not
> > > reseting the watchdog properly, provoking spurious watchdog triggering.
> > >
> > > Jason Gunthorpe explained [1] that we needed a particular sequence:
> > >
> > >   1. Disable WDT
> > >   2. Clear bridge
> > >   3. Enable WDT
> > >
> > > We added the irq handling to satisfy (2), and the watchdog stop for (1).
> > >
> > > The watchdog stop was agreed specifically [2].
> > >
> > > Ideas?
> > >
> > 
> > Other drivers assume that if the watchdog is running, it is supposed
> > to be running. The more common approach in such cases is to ping the
> > watchdog once to give userspace more time to get ready, but leave
> > it enabled. So you could check if the watchdog is enabled, and if
> > it was enabled re-enable it after initialization is complete
> > (and maybe log a message stating that the watchdog is enabled).
> > 
> > If you don't want to do that, and if you are defeating nowayout
> > on purpose to fix a problem with a broken bootloader,
> > you should at least put in comment describing the problem you are
> > trying to solve, and that you accept breaking nowayout with your fix.

Yes, this should be commented.

> I'm not fond of not having "nowayout" option on our driver, given I'm sure
> it's a watchdog feature for a good reason.
> 
> On the other side, I can't see how can we distinguish a previously
> and explicitly enabled watchdog, from a spurious enable by broken bootloader.

How about we just don't define module_exit() and leave a comment as
such?  It's not unprecedented, a couple of the atm drivers are
explicitly setup like this (uPD98402.c, zatm.c, eni.c).

thx,

Jason.
Guenter Roeck Feb. 7, 2014, 4:55 p.m. UTC | #6
On Fri, Feb 07, 2014 at 10:44:53AM -0500, Jason Cooper wrote:
> On Fri, Feb 07, 2014 at 12:17:28PM -0300, Ezequiel Garcia wrote:
> > On Fri, Feb 07, 2014 at 05:38:09AM -0800, Guenter Roeck wrote:
> > > On 02/07/2014 02:40 AM, Ezequiel Garcia wrote:
> > > > On Thu, Feb 06, 2014 at 06:02:56PM -0800, Guenter Roeck wrote:
> > > >> On 02/06/2014 09:20 AM, Ezequiel Garcia wrote:
> > > >>> Having the watchdog initially fully stopped is important to avoid
> > > >>> any spurious watchdog triggers, in case the registers are not in
> > > >>> its reset state.
> > > >>>
> > > >>> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
> > > >>> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> > > >>> Tested-by: Willy Tarreau <w@1wt.eu>
> > > >>> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> > > >>> ---
> > > >>>    drivers/watchdog/orion_wdt.c | 3 +++
> > > >>>    1 file changed, 3 insertions(+)
> > > >>>
> > > >>> diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c
> > > >>> index 6746033..2dbeee9 100644
> > > >>> --- a/drivers/watchdog/orion_wdt.c
> > > >>> +++ b/drivers/watchdog/orion_wdt.c
> > > >>> @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev)
> > > >>>    	orion_wdt.max_timeout = wdt_max_duration;
> > > >>>    	watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev);
> > > >>>
> > > >>> +	/* Let's make sure the watchdog is fully stopped */
> > > >>> +	orion_wdt_stop(&orion_wdt);
> > > >>> +
> > > >>
> > > >> Actually we just had that in another driver, and I stumbled over it there.
> > > >>
> > > >> Problem with stopping the watchdog in probe unconditionally is that you can
> > > >> use it to defeat nowayout: unload the module, then load it again,
> > > >> and the watchdog is stopped even if nowayout is true.
> > > >>
> 
> How often would a user legitimately want to unload/load the watchdog
> module?
> 
Not sure what you are saying here. If you don't think the nowayout option
should be supported, drop it. Don't claim it is supported when it isn't.

> > > >
> > > > Hm... I see.
> > > >
> > > >> Is this really what you want ? Or, in other words, what is the problem
> > > >> you are trying to solve ?
> > > >>
> > > >
> > > > Well, this is related to the discussion about the bootloader not
> > > > reseting the watchdog properly, provoking spurious watchdog triggering.
> > > >
> > > > Jason Gunthorpe explained [1] that we needed a particular sequence:
> > > >
> > > >   1. Disable WDT
> > > >   2. Clear bridge
> > > >   3. Enable WDT
> > > >
> > > > We added the irq handling to satisfy (2), and the watchdog stop for (1).
> > > >
> > > > The watchdog stop was agreed specifically [2].
> > > >
> > > > Ideas?
> > > >
> > > 
> > > Other drivers assume that if the watchdog is running, it is supposed
> > > to be running. The more common approach in such cases is to ping the
> > > watchdog once to give userspace more time to get ready, but leave
> > > it enabled. So you could check if the watchdog is enabled, and if
> > > it was enabled re-enable it after initialization is complete
> > > (and maybe log a message stating that the watchdog is enabled).
> > > 
> > > If you don't want to do that, and if you are defeating nowayout
> > > on purpose to fix a problem with a broken bootloader,
> > > you should at least put in comment describing the problem you are
> > > trying to solve, and that you accept breaking nowayout with your fix.
> 
> Yes, this should be commented.
> 
> > I'm not fond of not having "nowayout" option on our driver, given I'm sure
> > it's a watchdog feature for a good reason.
> > 
> > On the other side, I can't see how can we distinguish a previously
> > and explicitly enabled watchdog, from a spurious enable by broken bootloader.
> 
> How about we just don't define module_exit() and leave a comment as
> such?  It's not unprecedented, a couple of the atm drivers are
> explicitly setup like this (uPD98402.c, zatm.c, eni.c).
> 
Or don't support tristate in the first place. There was some argument from
others earlier that a watchdog should never be optional. Or drop the nowayout
option from the driver.

There is one problem, though, if you don't support a pre-enabled watchdog:
If the system dies before the watchdog application starts running, it will
hang. This is the reason why the watchdog is enabled on purpose by some
boot loaders.

Guenter
Jason Gunthorpe Feb. 7, 2014, 5:43 p.m. UTC | #7
On Fri, Feb 07, 2014 at 07:40:45AM -0300, Ezequiel Garcia wrote:

> Well, this is related to the discussion about the bootloader not
> reseting the watchdog properly, provoking spurious watchdog triggering.
> 
> Jason Gunthorpe explained [1] that we needed a particular sequence:
> 
>  1. Disable WDT
>  2. Clear bridge
>  3. Enable WDT
> 
> We added the irq handling to satisfy (2), and the watchdog stop for (1).

The issue here is the driver configures two 'machine kill' elements:
the PANIC IRQ and the RstOut setup.

Before configuring either of those the driver needs to ensure that any
old watchdog events are cleared out of the HW. We must not get a
spurious event. 

I agree not disabling an already functional and properly configured
counter from the bootloader is desirable.

So lets break it down a bit..

1) The IRQ:
  It looks like the cause bit latches high on watchdog timer
  expiration but has no side effect unless it is unmasked.

  The new IRQ flow code ensures the bit is cleared during request_irq
  so no old events can trigger the IRQ. Thus it is solved now.

2) The RstOut:
  It is a bit unspecific, but I think this signal latches high when
  any unmasked rst event occurs and stays high until the RstIn pin
  is asserted.

  So we don't need to worry about clearing old events from here

  Alternatively the WDRstOutEn routes the Cause bit into the RstOut
  pin.

3) The timer itself:
  The WDT is just a general timer with an optional hookup to the
  rst control. If it is harmlessly counting but not resetting we need
  to stop that before enabling rst out.

So, how about this for psuedo-code in probe:

if (readl(RSTOUTn) & WDRstOutEn)
{
    /* Watchdog is configured and may be down counting,
       don't touch it */
    request_irq(..);
}
else
{
    /* Watchdog is not configured, fully disable the timer
       and configure for watchdog operation. */
    disable_watchdog();
    request_irq();
    writel(RSTOUTn), .. WDRstOutEn);
}

Jason
Ezequiel Garcia Feb. 10, 2014, 12:22 p.m. UTC | #8
On Fri, Feb 07, 2014 at 10:43:14AM -0700, Jason Gunthorpe wrote:
> On Fri, Feb 07, 2014 at 07:40:45AM -0300, Ezequiel Garcia wrote:
> 
> > Well, this is related to the discussion about the bootloader not
> > reseting the watchdog properly, provoking spurious watchdog triggering.
> > 
> > Jason Gunthorpe explained [1] that we needed a particular sequence:
> > 
> >  1. Disable WDT
> >  2. Clear bridge
> >  3. Enable WDT
> > 
> > We added the irq handling to satisfy (2), and the watchdog stop for (1).
> 
> The issue here is the driver configures two 'machine kill' elements:
> the PANIC IRQ and the RstOut setup.
> 
> Before configuring either of those the driver needs to ensure that any
> old watchdog events are cleared out of the HW. We must not get a
> spurious event. 
> 
> I agree not disabling an already functional and properly configured
> counter from the bootloader is desirable.
> 
> So lets break it down a bit..
> 
> 1) The IRQ:
>   It looks like the cause bit latches high on watchdog timer
>   expiration but has no side effect unless it is unmasked.
> 
>   The new IRQ flow code ensures the bit is cleared during request_irq
>   so no old events can trigger the IRQ. Thus it is solved now.
> 

Agreed.

> 3) The timer itself:
>   The WDT is just a general timer with an optional hookup to the
>   rst control. If it is harmlessly counting but not resetting we need
>   to stop that before enabling rst out.
> 

Actually, the current flow is to:

1. Disable rst out and then disable the counter, in probe().

2. Enable the counter, and then enable rst out, in start().

> So, how about this for psuedo-code in probe:
> 
> if (readl(RSTOUTn) & WDRstOutEn)
> {
>     /* Watchdog is configured and may be down counting,
>        don't touch it */
>     request_irq(..);
> }
> else
> {
>     /* Watchdog is not configured, fully disable the timer
>        and configure for watchdog operation. */
>     disable_watchdog();
>     request_irq();
>     writel(RSTOUTn), .. WDRstOutEn);
> }
> 

Sounds good, although it seems to me it's actually simpler:

  /* Let's make sure the watchdog is fully stopped, unless
   * it's explicitly enabled and running
   */
  if ( !(wdt_rst_out_en && wdt_timer_enabled) ) {
    watchdog_stop();
  }
Guenter Roeck Feb. 10, 2014, 4:48 p.m. UTC | #9
Quoting Ezequiel Garcia <ezequiel.garcia@free-electrons.com>:

> On Fri, Feb 07, 2014 at 10:43:14AM -0700, Jason Gunthorpe wrote:
>> On Fri, Feb 07, 2014 at 07:40:45AM -0300, Ezequiel Garcia wrote:
>>
>> > Well, this is related to the discussion about the bootloader not
>> > reseting the watchdog properly, provoking spurious watchdog triggering.
>> >
>> > Jason Gunthorpe explained [1] that we needed a particular sequence:
>> >
>> >  1. Disable WDT
>> >  2. Clear bridge
>> >  3. Enable WDT
>> >
>> > We added the irq handling to satisfy (2), and the watchdog stop for (1).
>>
>> The issue here is the driver configures two 'machine kill' elements:
>> the PANIC IRQ and the RstOut setup.
>>
>> Before configuring either of those the driver needs to ensure that any
>> old watchdog events are cleared out of the HW. We must not get a
>> spurious event.
>>
>> I agree not disabling an already functional and properly configured
>> counter from the bootloader is desirable.
>>
>> So lets break it down a bit..
>>
>> 1) The IRQ:
>>   It looks like the cause bit latches high on watchdog timer
>>   expiration but has no side effect unless it is unmasked.
>>
>>   The new IRQ flow code ensures the bit is cleared during request_irq
>>   so no old events can trigger the IRQ. Thus it is solved now.
>>
>
> Agreed.
>
>> 3) The timer itself:
>>   The WDT is just a general timer with an optional hookup to the
>>   rst control. If it is harmlessly counting but not resetting we need
>>   to stop that before enabling rst out.
>>
>
> Actually, the current flow is to:
>
> 1. Disable rst out and then disable the counter, in probe().
>
> 2. Enable the counter, and then enable rst out, in start().
>
>> So, how about this for psuedo-code in probe:
>>
>> if (readl(RSTOUTn) & WDRstOutEn)
>> {
>>     /* Watchdog is configured and may be down counting,
>>        don't touch it */
>>     request_irq(..);
>> }
>> else
>> {
>>     /* Watchdog is not configured, fully disable the timer
>>        and configure for watchdog operation. */
>>     disable_watchdog();
>>     request_irq();
>>     writel(RSTOUTn), .. WDRstOutEn);
>> }
>>
>
> Sounds good, although it seems to me it's actually simpler:
>
>   /* Let's make sure the watchdog is fully stopped, unless
>    * it's explicitly enabled and running
>    */
>   if ( !(wdt_rst_out_en && wdt_timer_enabled) ) {
>     watchdog_stop();
>   }
>

     if (!wdt_rst_out_en || !wdt_timer_enabled)
         watchdog_stop();

seems to be a bit easier to understand.

Also watch out for coding style issues. Above code
snippet has multiple violations ;-).

Guenter
Ezequiel Garcia Feb. 10, 2014, 4:54 p.m. UTC | #10
On Mon, Feb 10, 2014 at 10:48:50AM -0600, linux@roeck-us.net wrote:
> Quoting Ezequiel Garcia <ezequiel.garcia@free-electrons.com>:
> 
> > On Fri, Feb 07, 2014 at 10:43:14AM -0700, Jason Gunthorpe wrote:
> >> On Fri, Feb 07, 2014 at 07:40:45AM -0300, Ezequiel Garcia wrote:
> >>
> >> > Well, this is related to the discussion about the bootloader not
> >> > reseting the watchdog properly, provoking spurious watchdog triggering.
> >> >
> >> > Jason Gunthorpe explained [1] that we needed a particular sequence:
> >> >
> >> >  1. Disable WDT
> >> >  2. Clear bridge
> >> >  3. Enable WDT
> >> >
> >> > We added the irq handling to satisfy (2), and the watchdog stop for (1).
> >>
> >> The issue here is the driver configures two 'machine kill' elements:
> >> the PANIC IRQ and the RstOut setup.
> >>
> >> Before configuring either of those the driver needs to ensure that any
> >> old watchdog events are cleared out of the HW. We must not get a
> >> spurious event.
> >>
> >> I agree not disabling an already functional and properly configured
> >> counter from the bootloader is desirable.
> >>
> >> So lets break it down a bit..
> >>
> >> 1) The IRQ:
> >>   It looks like the cause bit latches high on watchdog timer
> >>   expiration but has no side effect unless it is unmasked.
> >>
> >>   The new IRQ flow code ensures the bit is cleared during request_irq
> >>   so no old events can trigger the IRQ. Thus it is solved now.
> >>
> >
> > Agreed.
> >
> >> 3) The timer itself:
> >>   The WDT is just a general timer with an optional hookup to the
> >>   rst control. If it is harmlessly counting but not resetting we need
> >>   to stop that before enabling rst out.
> >>
> >
> > Actually, the current flow is to:
> >
> > 1. Disable rst out and then disable the counter, in probe().
> >
> > 2. Enable the counter, and then enable rst out, in start().
> >
> >> So, how about this for psuedo-code in probe:
> >>
> >> if (readl(RSTOUTn) & WDRstOutEn)
> >> {
> >>     /* Watchdog is configured and may be down counting,
> >>        don't touch it */
> >>     request_irq(..);
> >> }
> >> else
> >> {
> >>     /* Watchdog is not configured, fully disable the timer
> >>        and configure for watchdog operation. */
> >>     disable_watchdog();
> >>     request_irq();
> >>     writel(RSTOUTn), .. WDRstOutEn);
> >> }
> >>
> >
> > Sounds good, although it seems to me it's actually simpler:
> >
> >   /* Let's make sure the watchdog is fully stopped, unless
> >    * it's explicitly enabled and running
> >    */
> >   if ( !(wdt_rst_out_en && wdt_timer_enabled) ) {
> >     watchdog_stop();
> >   }
> >
> 
>      if (!wdt_rst_out_en || !wdt_timer_enabled)
>          watchdog_stop();
> 
> seems to be a bit easier to understand.
> 

Yeah, I was actually planning to have a orion_wdt_enabled() function
to get the running and enabled status. Looks cleaner I think.

	if (!orion_wdt_enabled())
		watchdog_stop();

So the idea is OK?

I'll push the new series in a short while. Unfortunately, this change
means I have to rebase almost all the series, because I introduce the
orion watchdog struct :-(
Guenter Roeck Feb. 10, 2014, 4:57 p.m. UTC | #11
Quoting Ezequiel Garcia <ezequiel.garcia@free-electrons.com>:

> On Mon, Feb 10, 2014 at 10:48:50AM -0600, linux@roeck-us.net wrote:
>> Quoting Ezequiel Garcia <ezequiel.garcia@free-electrons.com>:
>>
>> > On Fri, Feb 07, 2014 at 10:43:14AM -0700, Jason Gunthorpe wrote:
>> >> On Fri, Feb 07, 2014 at 07:40:45AM -0300, Ezequiel Garcia wrote:
>> >>
>> >> > Well, this is related to the discussion about the bootloader not
>> >> > reseting the watchdog properly, provoking spurious watchdog triggering.
>> >> >
>> >> > Jason Gunthorpe explained [1] that we needed a particular sequence:
>> >> >
>> >> >  1. Disable WDT
>> >> >  2. Clear bridge
>> >> >  3. Enable WDT
>> >> >
>> >> > We added the irq handling to satisfy (2), and the watchdog  
>> stop for (1).
>> >>
>> >> The issue here is the driver configures two 'machine kill' elements:
>> >> the PANIC IRQ and the RstOut setup.
>> >>
>> >> Before configuring either of those the driver needs to ensure that any
>> >> old watchdog events are cleared out of the HW. We must not get a
>> >> spurious event.
>> >>
>> >> I agree not disabling an already functional and properly configured
>> >> counter from the bootloader is desirable.
>> >>
>> >> So lets break it down a bit..
>> >>
>> >> 1) The IRQ:
>> >>   It looks like the cause bit latches high on watchdog timer
>> >>   expiration but has no side effect unless it is unmasked.
>> >>
>> >>   The new IRQ flow code ensures the bit is cleared during request_irq
>> >>   so no old events can trigger the IRQ. Thus it is solved now.
>> >>
>> >
>> > Agreed.
>> >
>> >> 3) The timer itself:
>> >>   The WDT is just a general timer with an optional hookup to the
>> >>   rst control. If it is harmlessly counting but not resetting we need
>> >>   to stop that before enabling rst out.
>> >>
>> >
>> > Actually, the current flow is to:
>> >
>> > 1. Disable rst out and then disable the counter, in probe().
>> >
>> > 2. Enable the counter, and then enable rst out, in start().
>> >
>> >> So, how about this for psuedo-code in probe:
>> >>
>> >> if (readl(RSTOUTn) & WDRstOutEn)
>> >> {
>> >>     /* Watchdog is configured and may be down counting,
>> >>        don't touch it */
>> >>     request_irq(..);
>> >> }
>> >> else
>> >> {
>> >>     /* Watchdog is not configured, fully disable the timer
>> >>        and configure for watchdog operation. */
>> >>     disable_watchdog();
>> >>     request_irq();
>> >>     writel(RSTOUTn), .. WDRstOutEn);
>> >> }
>> >>
>> >
>> > Sounds good, although it seems to me it's actually simpler:
>> >
>> >   /* Let's make sure the watchdog is fully stopped, unless
>> >    * it's explicitly enabled and running
>> >    */
>> >   if ( !(wdt_rst_out_en && wdt_timer_enabled) ) {
>> >     watchdog_stop();
>> >   }
>> >
>>
>>      if (!wdt_rst_out_en || !wdt_timer_enabled)
>>          watchdog_stop();
>>
>> seems to be a bit easier to understand.
>>
>
> Yeah, I was actually planning to have a orion_wdt_enabled() function
> to get the running and enabled status. Looks cleaner I think.
>
> 	if (!orion_wdt_enabled())
> 		watchdog_stop();
>
> So the idea is OK?
>
Sounds good to me.

> I'll push the new series in a short while. Unfortunately, this change
> means I have to rebase almost all the series, because I introduce the
> orion watchdog struct :-(

Just trying to keep you busy :-)

Guenter
diff mbox

Patch

diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c
index 6746033..2dbeee9 100644
--- a/drivers/watchdog/orion_wdt.c
+++ b/drivers/watchdog/orion_wdt.c
@@ -142,6 +142,9 @@  static int orion_wdt_probe(struct platform_device *pdev)
 	orion_wdt.max_timeout = wdt_max_duration;
 	watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev);
 
+	/* Let's make sure the watchdog is fully stopped */
+	orion_wdt_stop(&orion_wdt);
+
 	watchdog_set_nowayout(&orion_wdt, nowayout);
 	ret = watchdog_register_device(&orion_wdt);
 	if (ret)