Message ID | 1391707226-18258-6-git-send-email-ezequiel.garcia@free-electrons.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 02/06/2014 09:20 AM, Ezequiel Garcia wrote: > Having the watchdog initially fully stopped is important to avoid > any spurious watchdog triggers, in case the registers are not in > its reset state. > > Reviewed-by: Guenter Roeck <linux@roeck-us.net> > Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> > Tested-by: Willy Tarreau <w@1wt.eu> > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > --- > drivers/watchdog/orion_wdt.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c > index 6746033..2dbeee9 100644 > --- a/drivers/watchdog/orion_wdt.c > +++ b/drivers/watchdog/orion_wdt.c > @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev) > orion_wdt.max_timeout = wdt_max_duration; > watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev); > > + /* Let's make sure the watchdog is fully stopped */ > + orion_wdt_stop(&orion_wdt); > + Actually we just had that in another driver, and I stumbled over it there. Problem with stopping the watchdog in probe unconditionally is that you can use it to defeat nowayout: unload the module, then load it again, and the watchdog is stopped even if nowayout is true. Is this really what you want ? Or, in other words, what is the problem you are trying to solve ? Thanks, Guenter > watchdog_set_nowayout(&orion_wdt, nowayout); > ret = watchdog_register_device(&orion_wdt); > if (ret) >
On Thu, Feb 06, 2014 at 06:02:56PM -0800, Guenter Roeck wrote: > On 02/06/2014 09:20 AM, Ezequiel Garcia wrote: > > Having the watchdog initially fully stopped is important to avoid > > any spurious watchdog triggers, in case the registers are not in > > its reset state. > > > > Reviewed-by: Guenter Roeck <linux@roeck-us.net> > > Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> > > Tested-by: Willy Tarreau <w@1wt.eu> > > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > > --- > > drivers/watchdog/orion_wdt.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c > > index 6746033..2dbeee9 100644 > > --- a/drivers/watchdog/orion_wdt.c > > +++ b/drivers/watchdog/orion_wdt.c > > @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev) > > orion_wdt.max_timeout = wdt_max_duration; > > watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev); > > > > + /* Let's make sure the watchdog is fully stopped */ > > + orion_wdt_stop(&orion_wdt); > > + > > Actually we just had that in another driver, and I stumbled over it there. > > Problem with stopping the watchdog in probe unconditionally is that you can > use it to defeat nowayout: unload the module, then load it again, > and the watchdog is stopped even if nowayout is true. > Hm... I see. > Is this really what you want ? Or, in other words, what is the problem > you are trying to solve ? > Well, this is related to the discussion about the bootloader not reseting the watchdog properly, provoking spurious watchdog triggering. Jason Gunthorpe explained [1] that we needed a particular sequence: 1. Disable WDT 2. Clear bridge 3. Enable WDT We added the irq handling to satisfy (2), and the watchdog stop for (1). The watchdog stop was agreed specifically [2]. Ideas? [1] http://www.spinics.net/lists/arm-kernel/msg302340.html [2] http://www.spinics.net/lists/arm-kernel/msg302507.html
On 02/07/2014 02:40 AM, Ezequiel Garcia wrote: > On Thu, Feb 06, 2014 at 06:02:56PM -0800, Guenter Roeck wrote: >> On 02/06/2014 09:20 AM, Ezequiel Garcia wrote: >>> Having the watchdog initially fully stopped is important to avoid >>> any spurious watchdog triggers, in case the registers are not in >>> its reset state. >>> >>> Reviewed-by: Guenter Roeck <linux@roeck-us.net> >>> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> >>> Tested-by: Willy Tarreau <w@1wt.eu> >>> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> >>> --- >>> drivers/watchdog/orion_wdt.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c >>> index 6746033..2dbeee9 100644 >>> --- a/drivers/watchdog/orion_wdt.c >>> +++ b/drivers/watchdog/orion_wdt.c >>> @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev) >>> orion_wdt.max_timeout = wdt_max_duration; >>> watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev); >>> >>> + /* Let's make sure the watchdog is fully stopped */ >>> + orion_wdt_stop(&orion_wdt); >>> + >> >> Actually we just had that in another driver, and I stumbled over it there. >> >> Problem with stopping the watchdog in probe unconditionally is that you can >> use it to defeat nowayout: unload the module, then load it again, >> and the watchdog is stopped even if nowayout is true. >> > > Hm... I see. > >> Is this really what you want ? Or, in other words, what is the problem >> you are trying to solve ? >> > > Well, this is related to the discussion about the bootloader not > reseting the watchdog properly, provoking spurious watchdog triggering. > > Jason Gunthorpe explained [1] that we needed a particular sequence: > > 1. Disable WDT > 2. Clear bridge > 3. Enable WDT > > We added the irq handling to satisfy (2), and the watchdog stop for (1). > > The watchdog stop was agreed specifically [2]. > > Ideas? > Other drivers assume that if the watchdog is running, it is supposed to be running. The more common approach in such cases is to ping the watchdog once to give userspace more time to get ready, but leave it enabled. So you could check if the watchdog is enabled, and if it was enabled re-enable it after initialization is complete (and maybe log a message stating that the watchdog is enabled). If you don't want to do that, and if you are defeating nowayout on purpose to fix a problem with a broken bootloader, you should at least put in comment describing the problem you are trying to solve, and that you accept breaking nowayout with your fix. Thanks, Guenter
On Fri, Feb 07, 2014 at 05:38:09AM -0800, Guenter Roeck wrote: > On 02/07/2014 02:40 AM, Ezequiel Garcia wrote: > > On Thu, Feb 06, 2014 at 06:02:56PM -0800, Guenter Roeck wrote: > >> On 02/06/2014 09:20 AM, Ezequiel Garcia wrote: > >>> Having the watchdog initially fully stopped is important to avoid > >>> any spurious watchdog triggers, in case the registers are not in > >>> its reset state. > >>> > >>> Reviewed-by: Guenter Roeck <linux@roeck-us.net> > >>> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> > >>> Tested-by: Willy Tarreau <w@1wt.eu> > >>> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > >>> --- > >>> drivers/watchdog/orion_wdt.c | 3 +++ > >>> 1 file changed, 3 insertions(+) > >>> > >>> diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c > >>> index 6746033..2dbeee9 100644 > >>> --- a/drivers/watchdog/orion_wdt.c > >>> +++ b/drivers/watchdog/orion_wdt.c > >>> @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev) > >>> orion_wdt.max_timeout = wdt_max_duration; > >>> watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev); > >>> > >>> + /* Let's make sure the watchdog is fully stopped */ > >>> + orion_wdt_stop(&orion_wdt); > >>> + > >> > >> Actually we just had that in another driver, and I stumbled over it there. > >> > >> Problem with stopping the watchdog in probe unconditionally is that you can > >> use it to defeat nowayout: unload the module, then load it again, > >> and the watchdog is stopped even if nowayout is true. > >> > > > > Hm... I see. > > > >> Is this really what you want ? Or, in other words, what is the problem > >> you are trying to solve ? > >> > > > > Well, this is related to the discussion about the bootloader not > > reseting the watchdog properly, provoking spurious watchdog triggering. > > > > Jason Gunthorpe explained [1] that we needed a particular sequence: > > > > 1. Disable WDT > > 2. Clear bridge > > 3. Enable WDT > > > > We added the irq handling to satisfy (2), and the watchdog stop for (1). > > > > The watchdog stop was agreed specifically [2]. > > > > Ideas? > > > > Other drivers assume that if the watchdog is running, it is supposed > to be running. The more common approach in such cases is to ping the > watchdog once to give userspace more time to get ready, but leave > it enabled. So you could check if the watchdog is enabled, and if > it was enabled re-enable it after initialization is complete > (and maybe log a message stating that the watchdog is enabled). > > If you don't want to do that, and if you are defeating nowayout > on purpose to fix a problem with a broken bootloader, > you should at least put in comment describing the problem you are > trying to solve, and that you accept breaking nowayout with your fix. > I'm not fond of not having "nowayout" option on our driver, given I'm sure it's a watchdog feature for a good reason. On the other side, I can't see how can we distinguish a previously and explicitly enabled watchdog, from a spurious enable by broken bootloader. Jason, what do you say?
On Fri, Feb 07, 2014 at 12:17:28PM -0300, Ezequiel Garcia wrote: > On Fri, Feb 07, 2014 at 05:38:09AM -0800, Guenter Roeck wrote: > > On 02/07/2014 02:40 AM, Ezequiel Garcia wrote: > > > On Thu, Feb 06, 2014 at 06:02:56PM -0800, Guenter Roeck wrote: > > >> On 02/06/2014 09:20 AM, Ezequiel Garcia wrote: > > >>> Having the watchdog initially fully stopped is important to avoid > > >>> any spurious watchdog triggers, in case the registers are not in > > >>> its reset state. > > >>> > > >>> Reviewed-by: Guenter Roeck <linux@roeck-us.net> > > >>> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> > > >>> Tested-by: Willy Tarreau <w@1wt.eu> > > >>> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > > >>> --- > > >>> drivers/watchdog/orion_wdt.c | 3 +++ > > >>> 1 file changed, 3 insertions(+) > > >>> > > >>> diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c > > >>> index 6746033..2dbeee9 100644 > > >>> --- a/drivers/watchdog/orion_wdt.c > > >>> +++ b/drivers/watchdog/orion_wdt.c > > >>> @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev) > > >>> orion_wdt.max_timeout = wdt_max_duration; > > >>> watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev); > > >>> > > >>> + /* Let's make sure the watchdog is fully stopped */ > > >>> + orion_wdt_stop(&orion_wdt); > > >>> + > > >> > > >> Actually we just had that in another driver, and I stumbled over it there. > > >> > > >> Problem with stopping the watchdog in probe unconditionally is that you can > > >> use it to defeat nowayout: unload the module, then load it again, > > >> and the watchdog is stopped even if nowayout is true. > > >> How often would a user legitimately want to unload/load the watchdog module? > > > > > > Hm... I see. > > > > > >> Is this really what you want ? Or, in other words, what is the problem > > >> you are trying to solve ? > > >> > > > > > > Well, this is related to the discussion about the bootloader not > > > reseting the watchdog properly, provoking spurious watchdog triggering. > > > > > > Jason Gunthorpe explained [1] that we needed a particular sequence: > > > > > > 1. Disable WDT > > > 2. Clear bridge > > > 3. Enable WDT > > > > > > We added the irq handling to satisfy (2), and the watchdog stop for (1). > > > > > > The watchdog stop was agreed specifically [2]. > > > > > > Ideas? > > > > > > > Other drivers assume that if the watchdog is running, it is supposed > > to be running. The more common approach in such cases is to ping the > > watchdog once to give userspace more time to get ready, but leave > > it enabled. So you could check if the watchdog is enabled, and if > > it was enabled re-enable it after initialization is complete > > (and maybe log a message stating that the watchdog is enabled). > > > > If you don't want to do that, and if you are defeating nowayout > > on purpose to fix a problem with a broken bootloader, > > you should at least put in comment describing the problem you are > > trying to solve, and that you accept breaking nowayout with your fix. Yes, this should be commented. > I'm not fond of not having "nowayout" option on our driver, given I'm sure > it's a watchdog feature for a good reason. > > On the other side, I can't see how can we distinguish a previously > and explicitly enabled watchdog, from a spurious enable by broken bootloader. How about we just don't define module_exit() and leave a comment as such? It's not unprecedented, a couple of the atm drivers are explicitly setup like this (uPD98402.c, zatm.c, eni.c). thx, Jason.
On Fri, Feb 07, 2014 at 10:44:53AM -0500, Jason Cooper wrote: > On Fri, Feb 07, 2014 at 12:17:28PM -0300, Ezequiel Garcia wrote: > > On Fri, Feb 07, 2014 at 05:38:09AM -0800, Guenter Roeck wrote: > > > On 02/07/2014 02:40 AM, Ezequiel Garcia wrote: > > > > On Thu, Feb 06, 2014 at 06:02:56PM -0800, Guenter Roeck wrote: > > > >> On 02/06/2014 09:20 AM, Ezequiel Garcia wrote: > > > >>> Having the watchdog initially fully stopped is important to avoid > > > >>> any spurious watchdog triggers, in case the registers are not in > > > >>> its reset state. > > > >>> > > > >>> Reviewed-by: Guenter Roeck <linux@roeck-us.net> > > > >>> Tested-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> > > > >>> Tested-by: Willy Tarreau <w@1wt.eu> > > > >>> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > > > >>> --- > > > >>> drivers/watchdog/orion_wdt.c | 3 +++ > > > >>> 1 file changed, 3 insertions(+) > > > >>> > > > >>> diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c > > > >>> index 6746033..2dbeee9 100644 > > > >>> --- a/drivers/watchdog/orion_wdt.c > > > >>> +++ b/drivers/watchdog/orion_wdt.c > > > >>> @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev) > > > >>> orion_wdt.max_timeout = wdt_max_duration; > > > >>> watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev); > > > >>> > > > >>> + /* Let's make sure the watchdog is fully stopped */ > > > >>> + orion_wdt_stop(&orion_wdt); > > > >>> + > > > >> > > > >> Actually we just had that in another driver, and I stumbled over it there. > > > >> > > > >> Problem with stopping the watchdog in probe unconditionally is that you can > > > >> use it to defeat nowayout: unload the module, then load it again, > > > >> and the watchdog is stopped even if nowayout is true. > > > >> > > How often would a user legitimately want to unload/load the watchdog > module? > Not sure what you are saying here. If you don't think the nowayout option should be supported, drop it. Don't claim it is supported when it isn't. > > > > > > > > Hm... I see. > > > > > > > >> Is this really what you want ? Or, in other words, what is the problem > > > >> you are trying to solve ? > > > >> > > > > > > > > Well, this is related to the discussion about the bootloader not > > > > reseting the watchdog properly, provoking spurious watchdog triggering. > > > > > > > > Jason Gunthorpe explained [1] that we needed a particular sequence: > > > > > > > > 1. Disable WDT > > > > 2. Clear bridge > > > > 3. Enable WDT > > > > > > > > We added the irq handling to satisfy (2), and the watchdog stop for (1). > > > > > > > > The watchdog stop was agreed specifically [2]. > > > > > > > > Ideas? > > > > > > > > > > Other drivers assume that if the watchdog is running, it is supposed > > > to be running. The more common approach in such cases is to ping the > > > watchdog once to give userspace more time to get ready, but leave > > > it enabled. So you could check if the watchdog is enabled, and if > > > it was enabled re-enable it after initialization is complete > > > (and maybe log a message stating that the watchdog is enabled). > > > > > > If you don't want to do that, and if you are defeating nowayout > > > on purpose to fix a problem with a broken bootloader, > > > you should at least put in comment describing the problem you are > > > trying to solve, and that you accept breaking nowayout with your fix. > > Yes, this should be commented. > > > I'm not fond of not having "nowayout" option on our driver, given I'm sure > > it's a watchdog feature for a good reason. > > > > On the other side, I can't see how can we distinguish a previously > > and explicitly enabled watchdog, from a spurious enable by broken bootloader. > > How about we just don't define module_exit() and leave a comment as > such? It's not unprecedented, a couple of the atm drivers are > explicitly setup like this (uPD98402.c, zatm.c, eni.c). > Or don't support tristate in the first place. There was some argument from others earlier that a watchdog should never be optional. Or drop the nowayout option from the driver. There is one problem, though, if you don't support a pre-enabled watchdog: If the system dies before the watchdog application starts running, it will hang. This is the reason why the watchdog is enabled on purpose by some boot loaders. Guenter
On Fri, Feb 07, 2014 at 07:40:45AM -0300, Ezequiel Garcia wrote: > Well, this is related to the discussion about the bootloader not > reseting the watchdog properly, provoking spurious watchdog triggering. > > Jason Gunthorpe explained [1] that we needed a particular sequence: > > 1. Disable WDT > 2. Clear bridge > 3. Enable WDT > > We added the irq handling to satisfy (2), and the watchdog stop for (1). The issue here is the driver configures two 'machine kill' elements: the PANIC IRQ and the RstOut setup. Before configuring either of those the driver needs to ensure that any old watchdog events are cleared out of the HW. We must not get a spurious event. I agree not disabling an already functional and properly configured counter from the bootloader is desirable. So lets break it down a bit.. 1) The IRQ: It looks like the cause bit latches high on watchdog timer expiration but has no side effect unless it is unmasked. The new IRQ flow code ensures the bit is cleared during request_irq so no old events can trigger the IRQ. Thus it is solved now. 2) The RstOut: It is a bit unspecific, but I think this signal latches high when any unmasked rst event occurs and stays high until the RstIn pin is asserted. So we don't need to worry about clearing old events from here Alternatively the WDRstOutEn routes the Cause bit into the RstOut pin. 3) The timer itself: The WDT is just a general timer with an optional hookup to the rst control. If it is harmlessly counting but not resetting we need to stop that before enabling rst out. So, how about this for psuedo-code in probe: if (readl(RSTOUTn) & WDRstOutEn) { /* Watchdog is configured and may be down counting, don't touch it */ request_irq(..); } else { /* Watchdog is not configured, fully disable the timer and configure for watchdog operation. */ disable_watchdog(); request_irq(); writel(RSTOUTn), .. WDRstOutEn); } Jason
On Fri, Feb 07, 2014 at 10:43:14AM -0700, Jason Gunthorpe wrote: > On Fri, Feb 07, 2014 at 07:40:45AM -0300, Ezequiel Garcia wrote: > > > Well, this is related to the discussion about the bootloader not > > reseting the watchdog properly, provoking spurious watchdog triggering. > > > > Jason Gunthorpe explained [1] that we needed a particular sequence: > > > > 1. Disable WDT > > 2. Clear bridge > > 3. Enable WDT > > > > We added the irq handling to satisfy (2), and the watchdog stop for (1). > > The issue here is the driver configures two 'machine kill' elements: > the PANIC IRQ and the RstOut setup. > > Before configuring either of those the driver needs to ensure that any > old watchdog events are cleared out of the HW. We must not get a > spurious event. > > I agree not disabling an already functional and properly configured > counter from the bootloader is desirable. > > So lets break it down a bit.. > > 1) The IRQ: > It looks like the cause bit latches high on watchdog timer > expiration but has no side effect unless it is unmasked. > > The new IRQ flow code ensures the bit is cleared during request_irq > so no old events can trigger the IRQ. Thus it is solved now. > Agreed. > 3) The timer itself: > The WDT is just a general timer with an optional hookup to the > rst control. If it is harmlessly counting but not resetting we need > to stop that before enabling rst out. > Actually, the current flow is to: 1. Disable rst out and then disable the counter, in probe(). 2. Enable the counter, and then enable rst out, in start(). > So, how about this for psuedo-code in probe: > > if (readl(RSTOUTn) & WDRstOutEn) > { > /* Watchdog is configured and may be down counting, > don't touch it */ > request_irq(..); > } > else > { > /* Watchdog is not configured, fully disable the timer > and configure for watchdog operation. */ > disable_watchdog(); > request_irq(); > writel(RSTOUTn), .. WDRstOutEn); > } > Sounds good, although it seems to me it's actually simpler: /* Let's make sure the watchdog is fully stopped, unless * it's explicitly enabled and running */ if ( !(wdt_rst_out_en && wdt_timer_enabled) ) { watchdog_stop(); }
Quoting Ezequiel Garcia <ezequiel.garcia@free-electrons.com>: > On Fri, Feb 07, 2014 at 10:43:14AM -0700, Jason Gunthorpe wrote: >> On Fri, Feb 07, 2014 at 07:40:45AM -0300, Ezequiel Garcia wrote: >> >> > Well, this is related to the discussion about the bootloader not >> > reseting the watchdog properly, provoking spurious watchdog triggering. >> > >> > Jason Gunthorpe explained [1] that we needed a particular sequence: >> > >> > 1. Disable WDT >> > 2. Clear bridge >> > 3. Enable WDT >> > >> > We added the irq handling to satisfy (2), and the watchdog stop for (1). >> >> The issue here is the driver configures two 'machine kill' elements: >> the PANIC IRQ and the RstOut setup. >> >> Before configuring either of those the driver needs to ensure that any >> old watchdog events are cleared out of the HW. We must not get a >> spurious event. >> >> I agree not disabling an already functional and properly configured >> counter from the bootloader is desirable. >> >> So lets break it down a bit.. >> >> 1) The IRQ: >> It looks like the cause bit latches high on watchdog timer >> expiration but has no side effect unless it is unmasked. >> >> The new IRQ flow code ensures the bit is cleared during request_irq >> so no old events can trigger the IRQ. Thus it is solved now. >> > > Agreed. > >> 3) The timer itself: >> The WDT is just a general timer with an optional hookup to the >> rst control. If it is harmlessly counting but not resetting we need >> to stop that before enabling rst out. >> > > Actually, the current flow is to: > > 1. Disable rst out and then disable the counter, in probe(). > > 2. Enable the counter, and then enable rst out, in start(). > >> So, how about this for psuedo-code in probe: >> >> if (readl(RSTOUTn) & WDRstOutEn) >> { >> /* Watchdog is configured and may be down counting, >> don't touch it */ >> request_irq(..); >> } >> else >> { >> /* Watchdog is not configured, fully disable the timer >> and configure for watchdog operation. */ >> disable_watchdog(); >> request_irq(); >> writel(RSTOUTn), .. WDRstOutEn); >> } >> > > Sounds good, although it seems to me it's actually simpler: > > /* Let's make sure the watchdog is fully stopped, unless > * it's explicitly enabled and running > */ > if ( !(wdt_rst_out_en && wdt_timer_enabled) ) { > watchdog_stop(); > } > if (!wdt_rst_out_en || !wdt_timer_enabled) watchdog_stop(); seems to be a bit easier to understand. Also watch out for coding style issues. Above code snippet has multiple violations ;-). Guenter
On Mon, Feb 10, 2014 at 10:48:50AM -0600, linux@roeck-us.net wrote: > Quoting Ezequiel Garcia <ezequiel.garcia@free-electrons.com>: > > > On Fri, Feb 07, 2014 at 10:43:14AM -0700, Jason Gunthorpe wrote: > >> On Fri, Feb 07, 2014 at 07:40:45AM -0300, Ezequiel Garcia wrote: > >> > >> > Well, this is related to the discussion about the bootloader not > >> > reseting the watchdog properly, provoking spurious watchdog triggering. > >> > > >> > Jason Gunthorpe explained [1] that we needed a particular sequence: > >> > > >> > 1. Disable WDT > >> > 2. Clear bridge > >> > 3. Enable WDT > >> > > >> > We added the irq handling to satisfy (2), and the watchdog stop for (1). > >> > >> The issue here is the driver configures two 'machine kill' elements: > >> the PANIC IRQ and the RstOut setup. > >> > >> Before configuring either of those the driver needs to ensure that any > >> old watchdog events are cleared out of the HW. We must not get a > >> spurious event. > >> > >> I agree not disabling an already functional and properly configured > >> counter from the bootloader is desirable. > >> > >> So lets break it down a bit.. > >> > >> 1) The IRQ: > >> It looks like the cause bit latches high on watchdog timer > >> expiration but has no side effect unless it is unmasked. > >> > >> The new IRQ flow code ensures the bit is cleared during request_irq > >> so no old events can trigger the IRQ. Thus it is solved now. > >> > > > > Agreed. > > > >> 3) The timer itself: > >> The WDT is just a general timer with an optional hookup to the > >> rst control. If it is harmlessly counting but not resetting we need > >> to stop that before enabling rst out. > >> > > > > Actually, the current flow is to: > > > > 1. Disable rst out and then disable the counter, in probe(). > > > > 2. Enable the counter, and then enable rst out, in start(). > > > >> So, how about this for psuedo-code in probe: > >> > >> if (readl(RSTOUTn) & WDRstOutEn) > >> { > >> /* Watchdog is configured and may be down counting, > >> don't touch it */ > >> request_irq(..); > >> } > >> else > >> { > >> /* Watchdog is not configured, fully disable the timer > >> and configure for watchdog operation. */ > >> disable_watchdog(); > >> request_irq(); > >> writel(RSTOUTn), .. WDRstOutEn); > >> } > >> > > > > Sounds good, although it seems to me it's actually simpler: > > > > /* Let's make sure the watchdog is fully stopped, unless > > * it's explicitly enabled and running > > */ > > if ( !(wdt_rst_out_en && wdt_timer_enabled) ) { > > watchdog_stop(); > > } > > > > if (!wdt_rst_out_en || !wdt_timer_enabled) > watchdog_stop(); > > seems to be a bit easier to understand. > Yeah, I was actually planning to have a orion_wdt_enabled() function to get the running and enabled status. Looks cleaner I think. if (!orion_wdt_enabled()) watchdog_stop(); So the idea is OK? I'll push the new series in a short while. Unfortunately, this change means I have to rebase almost all the series, because I introduce the orion watchdog struct :-(
Quoting Ezequiel Garcia <ezequiel.garcia@free-electrons.com>: > On Mon, Feb 10, 2014 at 10:48:50AM -0600, linux@roeck-us.net wrote: >> Quoting Ezequiel Garcia <ezequiel.garcia@free-electrons.com>: >> >> > On Fri, Feb 07, 2014 at 10:43:14AM -0700, Jason Gunthorpe wrote: >> >> On Fri, Feb 07, 2014 at 07:40:45AM -0300, Ezequiel Garcia wrote: >> >> >> >> > Well, this is related to the discussion about the bootloader not >> >> > reseting the watchdog properly, provoking spurious watchdog triggering. >> >> > >> >> > Jason Gunthorpe explained [1] that we needed a particular sequence: >> >> > >> >> > 1. Disable WDT >> >> > 2. Clear bridge >> >> > 3. Enable WDT >> >> > >> >> > We added the irq handling to satisfy (2), and the watchdog >> stop for (1). >> >> >> >> The issue here is the driver configures two 'machine kill' elements: >> >> the PANIC IRQ and the RstOut setup. >> >> >> >> Before configuring either of those the driver needs to ensure that any >> >> old watchdog events are cleared out of the HW. We must not get a >> >> spurious event. >> >> >> >> I agree not disabling an already functional and properly configured >> >> counter from the bootloader is desirable. >> >> >> >> So lets break it down a bit.. >> >> >> >> 1) The IRQ: >> >> It looks like the cause bit latches high on watchdog timer >> >> expiration but has no side effect unless it is unmasked. >> >> >> >> The new IRQ flow code ensures the bit is cleared during request_irq >> >> so no old events can trigger the IRQ. Thus it is solved now. >> >> >> > >> > Agreed. >> > >> >> 3) The timer itself: >> >> The WDT is just a general timer with an optional hookup to the >> >> rst control. If it is harmlessly counting but not resetting we need >> >> to stop that before enabling rst out. >> >> >> > >> > Actually, the current flow is to: >> > >> > 1. Disable rst out and then disable the counter, in probe(). >> > >> > 2. Enable the counter, and then enable rst out, in start(). >> > >> >> So, how about this for psuedo-code in probe: >> >> >> >> if (readl(RSTOUTn) & WDRstOutEn) >> >> { >> >> /* Watchdog is configured and may be down counting, >> >> don't touch it */ >> >> request_irq(..); >> >> } >> >> else >> >> { >> >> /* Watchdog is not configured, fully disable the timer >> >> and configure for watchdog operation. */ >> >> disable_watchdog(); >> >> request_irq(); >> >> writel(RSTOUTn), .. WDRstOutEn); >> >> } >> >> >> > >> > Sounds good, although it seems to me it's actually simpler: >> > >> > /* Let's make sure the watchdog is fully stopped, unless >> > * it's explicitly enabled and running >> > */ >> > if ( !(wdt_rst_out_en && wdt_timer_enabled) ) { >> > watchdog_stop(); >> > } >> > >> >> if (!wdt_rst_out_en || !wdt_timer_enabled) >> watchdog_stop(); >> >> seems to be a bit easier to understand. >> > > Yeah, I was actually planning to have a orion_wdt_enabled() function > to get the running and enabled status. Looks cleaner I think. > > if (!orion_wdt_enabled()) > watchdog_stop(); > > So the idea is OK? > Sounds good to me. > I'll push the new series in a short while. Unfortunately, this change > means I have to rebase almost all the series, because I introduce the > orion watchdog struct :-( Just trying to keep you busy :-) Guenter
diff --git a/drivers/watchdog/orion_wdt.c b/drivers/watchdog/orion_wdt.c index 6746033..2dbeee9 100644 --- a/drivers/watchdog/orion_wdt.c +++ b/drivers/watchdog/orion_wdt.c @@ -142,6 +142,9 @@ static int orion_wdt_probe(struct platform_device *pdev) orion_wdt.max_timeout = wdt_max_duration; watchdog_init_timeout(&orion_wdt, heartbeat, &pdev->dev); + /* Let's make sure the watchdog is fully stopped */ + orion_wdt_stop(&orion_wdt); + watchdog_set_nowayout(&orion_wdt, nowayout); ret = watchdog_register_device(&orion_wdt); if (ret)