Message ID | 1f2cd155057e5ab0cdb20a9a11614bbb09bb49ad.camel@yadro.com (mailing list archive) |
---|---|
State | Superseded |
Series | add dual-boot support |
On Wed, Aug 21, 2019 at 06:57:43PM +0300, Ivan Mikhaylov wrote:
> Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS
> to clear out boot code source and re-enable access to the primary SPI flash
> chip while booted via wdt2 from the alternate chip.
>
> AST2400 datasheet says:
> "In the 2nd flash booting mode, all the address mapping to CS0# would be
> re-directed to CS1#. And CS0# is not accessable under this mode. To access
> CS0#, firmware should clear the 2nd boot mode register in the WDT2 status
> register WDT30.bit[1]."

Is there reason to not do this automatically when loading the module
in alt-boot mode ? What means does userspace have to determine if CS0
or CS1 is active at any given time ? If there is reason to ever have CS1
active instead of CS0, what means would userspace have to enable it ?

If userspace can not really determine if CS1 or CS0 is active, all it could
ever do was to enable CS0 to be in a deterministic state. If so, it doesn't
make sense to ever have CS1 active, and re-enabling CS0 could be automatic.

Similar, if CS1 can ever be enabled, there is no means for userspace to ensure
that some other application did not re-enable CS0 while it believes that CS1
is enabled. If there is no means for userspace to enable CS1, it can never be
sure what is enabled (because some other entity may have enabled CS0 while
userspace just thought that CS1 is still enabled). Again, the only means
to guarantee a well defined state would be to explicitly enable CS0 and provide
no means to enable CS1. Again, this could be done during boot, not requiring
an explicit request from userspace.

>
> Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com>
> ---
>  drivers/watchdog/aspeed_wdt.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/drivers/watchdog/aspeed_wdt.c b/drivers/watchdog/aspeed_wdt.c
> index cc71861e033a..858e62f1c7ba 100644
> --- a/drivers/watchdog/aspeed_wdt.c
> +++ b/drivers/watchdog/aspeed_wdt.c
> @@ -53,6 +53,8 @@ MODULE_DEVICE_TABLE(of, aspeed_wdt_of_table);
>  #define WDT_CTRL_ENABLE BIT(0)
>  #define WDT_TIMEOUT_STATUS 0x10
>  #define WDT_TIMEOUT_STATUS_BOOT_SECONDARY BIT(1)
> +#define WDT_CLEAR_TIMEOUT_STATUS 0x14
> +#define WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION BIT(0)
>
>  /*
>   * WDT_RESET_WIDTH controls the characteristics of the external pulse (if
> @@ -165,6 +167,29 @@ static int aspeed_wdt_restart(struct watchdog_device *wdd,
>  	return 0;
>  }
>
> +static ssize_t access_cs0_store(struct device *dev,
> +				struct device_attribute *attr,
> +				const char *buf, size_t size)
> +{
> +	struct aspeed_wdt *wdt = dev_get_drvdata(dev);
> +
> +	if (unlikely(!wdt))
> +		return -ENODEV;
> +

How would this ever happen, and how / where is drvdata set to NULL ?

> +	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
> +	       wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
> +	wdt->wdd.bootstatus |= WDIOF_EXTERN1;

The variable reflects the _boot status_. It should not change after booting.

> +
> +	return size;
> +}
> +static DEVICE_ATTR_WO(access_cs0);
> +
> +static struct attribute *bswitch_attrs[] = {
> +	&dev_attr_access_cs0.attr,
> +	NULL
> +};
> +ATTRIBUTE_GROUPS(bswitch);
> +
>  static const struct watchdog_ops aspeed_wdt_ops = {
>  	.start = aspeed_wdt_start,
>  	.stop = aspeed_wdt_stop,
> @@ -223,6 +248,9 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
>
>  	wdt->ctrl = WDT_CTRL_1MHZ_CLK;
>
> +	if (of_property_read_bool(np, "aspeed,alt-boot"))
> +		wdt->wdd.groups = bswitch_groups;
> +

Why does this have to be separate to the existing evaluation of
aspeed,alt-boot, and why does the existing code not work ?

Also, is it guaranteed that this does not interfere with existing
support for alt-boot ?

>  	/*
>  	 * Control reset on a per-device basis to ensure the
>  	 * host is not affected by a BMC reboot
> @@ -309,6 +337,8 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
>  	if (status & WDT_TIMEOUT_STATUS_BOOT_SECONDARY)
>  		wdt->wdd.bootstatus = WDIOF_CARDRESET;
>
> +	dev_set_drvdata(dev, wdt);
> +
>  	return devm_watchdog_register_device(dev, &wdt->wdd);
>  }
>
> --
> 2.20.1
>
21.08.2019 19:32, Guenter Roeck wrote:
> On Wed, Aug 21, 2019 at 06:57:43PM +0300, Ivan Mikhaylov wrote:
>> Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS
>> to clear out boot code source and re-enable access to the primary SPI flash
>> chip while booted via wdt2 from the alternate chip.
>>
>> AST2400 datasheet says:
>> "In the 2nd flash booting mode, all the address mapping to CS0# would be
>> re-directed to CS1#. And CS0# is not accessable under this mode. To access
>> CS0#, firmware should clear the 2nd boot mode register in the WDT2 status
>> register WDT30.bit[1]."
> Is there reason to not do this automatically when loading the module
> in alt-boot mode ? What means does userspace have to determine if CS0
> or CS1 is active at any given time ? If there is reason to ever have CS1
> active instead of CS0, what means would userspace have to enable it ?

Yes, there is. The driver is loaded long before the filesystems are mounted.
The filesystems, in the event of alternate/recovery boot, need to be mounted
from the same chip that the kernel was booted. For one reason because the main
chip at CS0 is most probably corrupt. If you clear that bit when driver is
loaded, your software will not know that and will try to mount the wrong
filesystems. The whole idea of ASPEED's switching chipselects is to have
identical firmware in both chips, without the need to process the alternate
boot state in any way except for indicating a successful boot and restoring
access to CS0 when needed.

The userspace can read bootstatus sysfs node to determine if an alternate
boot has occurred.

With ASPEED, CS1 is activated automatically by wdt2 when system fails to boot
from the primary flash chip (at CS0) and disable the watchdog to indicate a
successful boot. When that happens, both CS0 and CS1 controls get routed in
hardware to CS1 line, making the primary flash chip inaccessible. Depending
on the architecture of the user-space software, it may choose to re-enable
access to the primary chip via CS0 at different times. There must be a way to do so.

> If userspace can not really determine if CS1 or CS0 is active, all it could
> ever do was to enable CS0 to be in a deterministic state. If so, it doesn't
> make sense to ever have CS1 active, and re-enabling CS0 could be automatic.
>
> Similar, if CS1 can ever be enabled, there is no means for userspace to ensure
> that some other application did not re-enable CS0 while it believes that CS1
> is enabled. If there is no means for userspace to enable CS1, it can never be
> sure what is enabled (because some other entity may have enabled CS0 while
> userspace just thought that CS1 is still enabled). Again, the only means
> to guarantee a well defined state would be to explicitly enable CS0 and provide
> no means to enable CS1. Again, this could be done during boot, not requiring
> an explicit request from userspace.

Please understand that activation of CS1 in place of CS0 is NOT a software choice!

>> +	if (unlikely(!wdt))
>> +		return -ENODEV;
>> +
> How would this ever happen, and how / where is drvdata set to NULL ?

This is purely for robustness. Seeing a pointer obtained via a function
accessed without first checking it for validity makes me nervous.

This code most probably adds nothing at the assembly level.

>
>> +	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
>> +	       wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
>> +	wdt->wdd.bootstatus |= WDIOF_EXTERN1;
> The variable reflects the _boot status_. It should not change after booting.

Is there any documentation that dictates that? All I could find is

"bootstatus: status of the device after booting". That doesn't look to me like
it absolutely can not change to reflect the updated status (that is, to reflect
that the originally set up alternate CS routing has been reset to normal).

If you absolutely disallow that, I think we could make 'access_cs0' readable
instead, so it could report the current state of the boot code selection bit.
Reverted, I suppose. That way 'access_cs0' would report 1 after 1 has been
written to it (it wouldn't be possible to write a zero).

> @@ -223,6 +248,9 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
>
>  	wdt->ctrl = WDT_CTRL_1MHZ_CLK;
>
> +	if (of_property_read_bool(np, "aspeed,alt-boot"))
> +		wdt->wdd.groups = bswitch_groups;
> +
> Why does this have to be separate to the existing evaluation of
> aspeed,alt-boot, and why does the existing code not work ?
>
> Also, is it guaranteed that this does not interfere with existing
> support for alt-boot ?

I think Ivan will comment on this.

With best regards,
Alexander Amelkin,
BIOS/BMC Team Lead, YADRO
https://yadro.com
On Wed, Aug 21, 2019 at 08:42:24PM +0300, Alexander Amelkin wrote:
> 21.08.2019 19:32, Guenter Roeck wrote:
> > On Wed, Aug 21, 2019 at 06:57:43PM +0300, Ivan Mikhaylov wrote:
> >> Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS
> >> to clear out boot code source and re-enable access to the primary SPI flash
> >> chip while booted via wdt2 from the alternate chip.
> >>
> >> AST2400 datasheet says:
> >> "In the 2nd flash booting mode, all the address mapping to CS0# would be
> >> re-directed to CS1#. And CS0# is not accessable under this mode. To access
> >> CS0#, firmware should clear the 2nd boot mode register in the WDT2 status
> >> register WDT30.bit[1]."
> > Is there reason to not do this automatically when loading the module
> > in alt-boot mode ? What means does userspace have to determine if CS0
> > or CS1 is active at any given time ? If there is reason to ever have CS1
> > active instead of CS0, what means would userspace have to enable it ?
>
> Yes, there is. The driver is loaded long before the filesystems are mounted.
> The filesystems, in the event of alternate/recovery boot, need to be mounted
> from the same chip that the kernel was booted. For one reason because the main
> chip at CS0 is most probably corrupt. If you clear that bit when driver is
> loaded, your software will not know that and will try to mount the wrong
> filesystems. The whole idea of ASPEED's switching chipselects is to have
> identical firmware in both chips, without the need to process the alternate
> boot state in any way except for indicating a successful boot and restoring
> access to CS0 when needed.
>
> The userspace can read bootstatus sysfs node to determine if an alternate
> boot has occurred.
>
> With ASPEED, CS1 is activated automatically by wdt2 when system fails to boot
> from the primary flash chip (at CS0) and disable the watchdog to indicate a
> successful boot. When that happens, both CS0 and CS1 controls get routed in
> hardware to CS1 line, making the primary flash chip inaccessible. Depending
> on the architecture of the user-space software, it may choose to re-enable
> access to the primary chip via CS0 at different times. There must be a way to do so.
>

So by activating cs0, userspace would essentially pull its own root file system
from underneath itself ?

> > If userspace can not really determine if CS1 or CS0 is active, all it could
> > ever do was to enable CS0 to be in a deterministic state. If so, it doesn't
> > make sense to ever have CS1 active, and re-enabling CS0 could be automatic.
> >
> > Similar, if CS1 can ever be enabled, there is no means for userspace to ensure
> > that some other application did not re-enable CS0 while it believes that CS1
> > is enabled. If there is no means for userspace to enable CS1, it can never be
> > sure what is enabled (because some other entity may have enabled CS0 while
> > userspace just thought that CS1 is still enabled). Again, the only means
> > to guarantee a well defined state would be to explicitly enable CS0 and provide
> > no means to enable CS1. Again, this could be done during boot, not requiring
> > an explicit request from userspace.
>
> Please understand that activation of CS1 in place of CS0 is NOT a software choice!
>
> >> +	if (unlikely(!wdt))
> >> +		return -ENODEV;
> >> +
> > How would this ever happen, and how / where is drvdata set to NULL ?
>
> This is purely for robustness. Seeing a pointer obtained via a function
> accessed without first checking it for validity makes me nervous.
>

This is not how kernel code is commonly written. Sure, we could add
similar checks to each sysfs access code in the kernel, blowing up
its size significantly. I do not see a point of this.

> This code most probably adds nothing at the assembly level.
>

That seems quite unlikely. Please demonstrate.

> >
> >> +	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
> >> +	       wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
> >> +	wdt->wdd.bootstatus |= WDIOF_EXTERN1;
> > The variable reflects the _boot status_. It should not change after booting.
> Is there any documentation that dictates that? All I could find is
>
> "bootstatus: status of the device after booting". That doesn't look to me like
> it absolutely can not change to reflect the updated status (that is, to reflect
> that the originally set up alternate CS routing has been reset to normal).
>

You choose to interpret "after booting" in a kind of novel way,
which I find a bit disturbing. I am not really sure how else to
describe "boot status" in a way that does not permit such
reinterpretation of the term.

On top of that, how specifically would "WDIOF_EXTERN1" reflect
what you claim it does ? Not only you are hijacking bootstatus
(which is supposed to describe the reason for a reboot), you
are also hijacking WDIOF_EXTERN1. That seems highly arbitrary
to me, and is not really how an API/ABI should be used.

Guenter

> If you absolutely disallow that, I think we could make 'access_cs0' readable
> instead, so it could report the current state of the boot code selection bit.
> Reverted, I suppose. That way 'access_cs0' would report 1 after 1 has been
> written to it (it wouldn't be possible to write a zero).
>
> > @@ -223,6 +248,9 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
> >
> >  	wdt->ctrl = WDT_CTRL_1MHZ_CLK;
> >
> > +	if (of_property_read_bool(np, "aspeed,alt-boot"))
> > +		wdt->wdd.groups = bswitch_groups;
> > +
> > Why does this have to be separate to the existing evaluation of
> > aspeed,alt-boot, and why does the existing code not work ?
> >
> > Also, is it guaranteed that this does not interfere with existing
> > support for alt-boot ?
>
> I think Ivan will comment on this.
>
> With best regards,
> Alexander Amelkin,
> BIOS/BMC Team Lead, YADRO
> https://yadro.com
>
On Wed, 2019-08-21 at 09:32 -0700, Guenter Roeck wrote:
>
> > +	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
> > +	       wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
> > +	wdt->wdd.bootstatus |= WDIOF_EXTERN1;
>
> The variable reflects the _boot status_. It should not change after booting.

Okay, then perhaps may we set 'status' handler for watchdog device and check
'status' file? Right now 'bootstatus' and 'status' are same because there is no
handler for 'status'.

> > +
> > +	return size;
> > +}
> > +static DEVICE_ATTR_WO(access_cs0);
> > +
> > +static struct attribute *bswitch_attrs[] = {
> > +	&dev_attr_access_cs0.attr,
> > +	NULL
> > +};
> > +ATTRIBUTE_GROUPS(bswitch);
> > +
> >  static const struct watchdog_ops aspeed_wdt_ops = {
> >  	.start = aspeed_wdt_start,
> >  	.stop = aspeed_wdt_stop,
> > @@ -223,6 +248,9 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
> >
> >  	wdt->ctrl = WDT_CTRL_1MHZ_CLK;
> >
> > +	if (of_property_read_bool(np, "aspeed,alt-boot"))
> > +		wdt->wdd.groups = bswitch_groups;
> > +
> Why does this have to be separate to the existing evaluation of
> aspeed,alt-boot, and why does the existing code not work ?
>
> Also, is it guaranteed that this does not interfere with existing
> support for alt-boot ?

It doesn't, it just provides for ast2400 switch to cs0 at side 1(cs1). Problem
is that only one flash chip(side 1/cs1) is accessible on alternate boot, there
is citation from the documentation in commit body. So if by some reason side 0
is corrupted, need to switch into alternate boot to cs1, do the load from it,
drop that bit to make side 0 accessible and do the flash of first side. On
ast2500/2600 this problem is solved already, in alternate boot there both flash
chips are present. It's additional requirement for alternate boot on ast2400, to
make the possibility to access at all side 0 flash chip after we boot to the
alternate side.
On Thu, Aug 22, 2019 at 12:15:20PM +0300, Ivan Mikhaylov wrote:
> On Wed, 2019-08-21 at 09:32 -0700, Guenter Roeck wrote:
> >
> > > +	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
> > > +	       wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
> > > +	wdt->wdd.bootstatus |= WDIOF_EXTERN1;
> >
> > The variable reflects the _boot status_. It should not change after booting.
>
> Okay, then perhaps may we set 'status' handler for watchdog device and check
> 'status' file? Right now 'bootstatus' and 'status' are same because there is no
> handler for 'status'.
>

You would still have to redefine one of the status bits to mean something
driver specific. You would also still have two different flags to read
and control cs0 - to read the status, you would read an ioctl (or the
status sysfs attribute), to write it you would write into access_cs0.

I guess I must be missing something. What is the problem with using
access_cs0 for both ?

Guenter

> > > +
> > > +	return size;
> > > +}
> > > +static DEVICE_ATTR_WO(access_cs0);
> > > +
> > > +static struct attribute *bswitch_attrs[] = {
> > > +	&dev_attr_access_cs0.attr,
> > > +	NULL
> > > +};
> > > +ATTRIBUTE_GROUPS(bswitch);
> > > +
> > >  static const struct watchdog_ops aspeed_wdt_ops = {
> > >  	.start = aspeed_wdt_start,
> > >  	.stop = aspeed_wdt_stop,
> > > @@ -223,6 +248,9 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
> > >
> > >  	wdt->ctrl = WDT_CTRL_1MHZ_CLK;
> > >
> > > +	if (of_property_read_bool(np, "aspeed,alt-boot"))
> > > +		wdt->wdd.groups = bswitch_groups;
> > > +
> > Why does this have to be separate to the existing evaluation of
> > aspeed,alt-boot, and why does the existing code not work ?
> >
> > Also, is it guaranteed that this does not interfere with existing
> > support for alt-boot ?
>
> It doesn't, it just provides for ast2400 switch to cs0 at side 1(cs1). Problem
> is that only one flash chip(side 1/cs1) is accessible on alternate boot, there
> is citation from the documentation in commit body. So if by some reason side 0
> is corrupted, need to switch into alternate boot to cs1, do the load from it,
> drop that bit to make side 0 accessible and do the flash of first side. On
> ast2500/2600 this problem is solved already, in alternate boot there both flash
> chips are present. It's additional requirement for alternate boot on ast2400, to
> make the possibility to access at all side 0 flash chip after we boot to the
> alternate side.
>
On Thu, 2019-08-22 at 06:55 -0700, Guenter Roeck wrote:
> On Thu, Aug 22, 2019 at 12:15:20PM +0300, Ivan Mikhaylov wrote:
> > On Wed, 2019-08-21 at 09:32 -0700, Guenter Roeck wrote:
> > > > +	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
> > > > +	       wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
> > > > +	wdt->wdd.bootstatus |= WDIOF_EXTERN1;
> > >
> > > The variable reflects the _boot status_. It should not change after
> > > booting.
> >
> > Okay, then perhaps may we set 'status' handler for watchdog device and check
> > 'status' file? Right now 'bootstatus' and 'status' are same because there is no
> > handler for 'status'.
> >
>
> You would still have to redefine one of the status bits to mean something
> driver specific. You would also still have two different flags to read
> and control cs0 - to read the status, you would read an ioctl (or the
> status sysfs attribute), to write it you would write into access_cs0.
>
> I guess I must be missing something. What is the problem with using
> access_cs0 for both ?
>
> Guenter
>

There is no problem, I'll do that way, thanks!
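For readers following the outcome of this sub-thread, here is a minimal user-space sketch of the semantics agreed on above: a single 'access_cs0' attribute that reads back 1 once CS0 is reachable and accepts only a write of 1. It is a model, not driver code. The register bit names are taken from the patch; `struct wdt_model`, `wdt_model_write_clear()` and the show/store signatures are hypothetical stand-ins, and the model only tracks the boot-code-selection bit (the real register write also clears the timeout status). An in-kernel version would presumably use DEVICE_ATTR_RW with readl()/writel() instead.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n) (1u << (n))
#define WDT_TIMEOUT_STATUS_BOOT_SECONDARY         BIT(1)
#define WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION BIT(0)

/* Hypothetical model of the WDT2 state relevant to this thread. */
struct wdt_model {
	uint32_t timeout_status; /* stands in for WDT_TIMEOUT_STATUS (0x10) */
};

/* Models a write to WDT_CLEAR_TIMEOUT_STATUS (0x14). */
static void wdt_model_write_clear(struct wdt_model *w, uint32_t val)
{
	if (val & WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION)
		w->timeout_status &= ~WDT_TIMEOUT_STATUS_BOOT_SECONDARY;
}

/* Read side: 1 when CS0 is reachable, 0 while still in 2nd boot mode. */
static int access_cs0_show(const struct wdt_model *w)
{
	return !(w->timeout_status & WDT_TIMEOUT_STATUS_BOOT_SECONDARY);
}

/* Write side: only "1" is accepted; returns bytes consumed or -EINVAL. */
static long access_cs0_store(struct wdt_model *w, const char *buf, size_t size)
{
	if (size == 0 || buf[0] != '1' ||
	    (size > 1 && buf[1] != '\n' && buf[1] != '\0'))
		return -EINVAL;

	wdt_model_write_clear(w, WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION);
	return (long)size;
}
```

Reading and writing through the same attribute keeps the state and its control in one place, which is the point made in the review; bootstatus is left untouched.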
21.08.2019 21:10, Guenter Roeck wrote:
> On Wed, Aug 21, 2019 at 08:42:24PM +0300, Alexander Amelkin wrote:
>> 21.08.2019 19:32, Guenter Roeck wrote:
>>> On Wed, Aug 21, 2019 at 06:57:43PM +0300, Ivan Mikhaylov wrote:
>>>> Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS
>>>> to clear out boot code source and re-enable access to the primary SPI flash
>>>> chip while booted via wdt2 from the alternate chip.
>>>>
>>>> AST2400 datasheet says:
>>>> "In the 2nd flash booting mode, all the address mapping to CS0# would be
>>>> re-directed to CS1#. And CS0# is not accessable under this mode. To access
>>>> CS0#, firmware should clear the 2nd boot mode register in the WDT2 status
>>>> register WDT30.bit[1]."
>>> Is there reason to not do this automatically when loading the module
>>> in alt-boot mode ? What means does userspace have to determine if CS0
>>> or CS1 is active at any given time ? If there is reason to ever have CS1
>>> active instead of CS0, what means would userspace have to enable it ?
>> Yes, there is. The driver is loaded long before the filesystems are mounted.
>> The filesystems, in the event of alternate/recovery boot, need to be mounted
>> from the same chip that the kernel was booted. For one reason because the main
>> chip at CS0 is most probably corrupt. If you clear that bit when driver is
>> loaded, your software will not know that and will try to mount the wrong
>> filesystems. The whole idea of ASPEED's switching chipselects is to have
>> identical firmware in both chips, without the need to process the alternate
>> boot state in any way except for indicating a successful boot and restoring
>> access to CS0 when needed.
>>
>> The userspace can read bootstatus sysfs node to determine if an alternate
>> boot has occurred.
>>
>> With ASPEED, CS1 is activated automatically by wdt2 when system fails to boot
>> from the primary flash chip (at CS0) and disable the watchdog to indicate a
>> successful boot. When that happens, both CS0 and CS1 controls get routed in
>> hardware to CS1 line, making the primary flash chip inaccessible. Depending
>> on the architecture of the user-space software, it may choose to re-enable
>> access to the primary chip via CS0 at different times. There must be a way to do so.
>>
> So by activating cs0, userspace would essentially pull its own root file system
> from underneath itself ?

Exactly. That's why for alternate boot the firmware would usually copy
all filesystems to memory and mount from there. Some embedded systems
do that always, regardless of which chip they boot from.

However, to be able to recover the main flash chip, the system needs CS0
to function as such (not as CS1). That's why this control is needed.

As Ivan mentioned, for AST2500 and the upcoming AST2600 the behavior
is slightly different. They don't just connect both CS controls to CS1 but instead
swap them so the primary chip becomes secondary from the software point
of view. The means to restore the normal wiring may still be needed.

>
>> This code most probably adds nothing at the assembly level.
>>
> That seems quite unlikely. Please demonstrate.

Yes, you were right. It adds 7 instructions. We'll drop the check.
It's just my DO-178 background, I add 'robustness' checks everywhere.

>>>> +	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
>>>> +	       wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
>>>> +	wdt->wdd.bootstatus |= WDIOF_EXTERN1;
>>> The variable reflects the _boot status_. It should not change after booting.
>> Is there any documentation that dictates that? All I could find is
>>
>> "bootstatus: status of the device after booting". That doesn't look to me like
>> it absolutely can not change to reflect the updated status (that is, to reflect
>> that the originally set up alternate CS routing has been reset to normal).
>>
> You choose to interpret "after booting" in a kind of novel way,
> which I find a bit disturbing. I am not really sure how else to
> describe "boot status" in a way that does not permit such
> reinterpretation of the term.

How about "Reflects reasons that caused a reboot, remains constant until the next boot" ?

> On top of that, how specifically would "WDIOF_EXTERN1" reflect
> what you claim it does ? Not only you are hijacking bootstatus
> (which is supposed to describe the reason for a reboot), you
> are also hijacking WDIOF_EXTERN1. That seems highly arbitrary
> to me, and is not really how an API/ABI should be used.

We used WDIOF_EXTERN1 because:

1. We thought that bootstatus _can_ change

2. We thought that adding extra bits wouldn't be appreciated

Now as you clarified that assumption 1 was wrong we are going to implement
status as I proposed earlier:

>
>> I think we could make 'access_cs0' readable instead, so it could report the
>> current state of the boot code selection bit. Reverted, I suppose. That
>> way 'access_cs0' would report 1 after 1 has been written to it (it wouldn't
>> be possible to write a zero).

With best regards,
Alexander Amelkin,
BIOS/BMC Team Lead, YADRO
https://yadro.com
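The chip-select routing described in this message can be summarised in a few lines of C. This is only a sketch of the behaviour as stated in the thread (AST2400: both selects land on CS1 in 2nd boot mode; AST2500/AST2600: the two selects are swapped); it has not been checked against the datasheets, and `enum soc_model` and `routed_cs()` are hypothetical names used for illustration.

```c
/* Hypothetical SoC families, following the distinction drawn in the thread. */
enum soc_model { SOC_AST2400, SOC_AST2500 };

/*
 * Returns the physical chip select (0 or 1) that is reached when software
 * addresses `requested_cs`, given whether the 2nd boot mode bit is set.
 */
static int routed_cs(enum soc_model soc, int second_boot_mode, int requested_cs)
{
	if (!second_boot_mode)
		return requested_cs;      /* normal wiring */

	if (soc == SOC_AST2400)
		return 1;                 /* CS0 and CS1 both routed to CS1 */

	return requested_cs ^ 1;          /* AST2500-style: selects swapped */
}
```

Under this model, clearing the 2nd boot mode bit is the only way for AST2400 software to reach the primary chip, while on the swapped layout both chips stay reachable and clearing the bit merely restores the normal ordering.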
On Thu, Aug 22, 2019 at 05:36:21PM +0300, Alexander Amelkin wrote:
> 21.08.2019 21:10, Guenter Roeck wrote:
> > On Wed, Aug 21, 2019 at 08:42:24PM +0300, Alexander Amelkin wrote:
> >> 21.08.2019 19:32, Guenter Roeck wrote:
> >>> On Wed, Aug 21, 2019 at 06:57:43PM +0300, Ivan Mikhaylov wrote:
> >>>> Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS
> >>>> to clear out boot code source and re-enable access to the primary SPI flash
> >>>> chip while booted via wdt2 from the alternate chip.
> >>>>
> >>>> AST2400 datasheet says:
> >>>> "In the 2nd flash booting mode, all the address mapping to CS0# would be
> >>>> re-directed to CS1#. And CS0# is not accessable under this mode. To access
> >>>> CS0#, firmware should clear the 2nd boot mode register in the WDT2 status
> >>>> register WDT30.bit[1]."
> >>> Is there reason to not do this automatically when loading the module
> >>> in alt-boot mode ? What means does userspace have to determine if CS0
> >>> or CS1 is active at any given time ? If there is reason to ever have CS1
> >>> active instead of CS0, what means would userspace have to enable it ?
> >> Yes, there is. The driver is loaded long before the filesystems are mounted.
> >> The filesystems, in the event of alternate/recovery boot, need to be mounted
> >> from the same chip that the kernel was booted. For one reason because the main
> >> chip at CS0 is most probably corrupt. If you clear that bit when driver is
> >> loaded, your software will not know that and will try to mount the wrong
> >> filesystems. The whole idea of ASPEED's switching chipselects is to have
> >> identical firmware in both chips, without the need to process the alternate
> >> boot state in any way except for indicating a successful boot and restoring
> >> access to CS0 when needed.
> >>
> >> The userspace can read bootstatus sysfs node to determine if an alternate
> >> boot has occurred.
> >>
> >> With ASPEED, CS1 is activated automatically by wdt2 when system fails to boot
> >> from the primary flash chip (at CS0) and disable the watchdog to indicate a
> >> successful boot. When that happens, both CS0 and CS1 controls get routed in
> >> hardware to CS1 line, making the primary flash chip inaccessible. Depending
> >> on the architecture of the user-space software, it may choose to re-enable
> >> access to the primary chip via CS0 at different times. There must be a way to do so.
> >>
> > So by activating cs0, userspace would essentially pull its own root file system
> > from underneath itself ?
>
> Exactly. That's why for alternate boot the firmware would usually copy
> all filesystems to memory and mount from there. Some embedded systems
> do that always, regardless of which chip they boot from.
>

That is different, though, to what you said earlier. Linux would then start
with a clean file system, and not need access to the file system in cs1 at all.
Clearing the flag when starting the driver would then be ok.

> However, to be able to recover the main flash chip, the system needs CS0
> to function as such (not as CS1). That's why this control is needed.
>

If what you said is correct, not really. It should be fine and create more
predictable behavior if the probe function selects cs0 automatically.

Guenter

> As Ivan mentioned, for AST2500 and the upcoming AST2600 the behavior
> is slightly different. They don't just connect both CS controls to CS1 but instead
> swap them so the primary chip becomes secondary from the software point
> of view. The means to restore the normal wiring may still be needed.
>
> >
> >> This code most probably adds nothing at the assembly level.
> >>
> > That seems quite unlikely. Please demonstrate.
>
> Yes, you were right. It adds 7 instructions. We'll drop the check.
> It's just my DO-178 background, I add 'robustness' checks everywhere.
>
> >>>> +	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
> >>>> +	       wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
> >>>> +	wdt->wdd.bootstatus |= WDIOF_EXTERN1;
> >>> The variable reflects the _boot status_. It should not change after booting.
> >> Is there any documentation that dictates that? All I could find is
> >>
> >> "bootstatus: status of the device after booting". That doesn't look to me like
> >> it absolutely can not change to reflect the updated status (that is, to reflect
> >> that the originally set up alternate CS routing has been reset to normal).
> >>
> > You choose to interpret "after booting" in a kind of novel way,
> > which I find a bit disturbing. I am not really sure how else to
> > describe "boot status" in a way that does not permit such
> > reinterpretation of the term.
>
> How about "Reflects reasons that caused a reboot, remains constant until the next boot" ?
>
> > On top of that, how specifically would "WDIOF_EXTERN1" reflect
> > what you claim it does ? Not only you are hijacking bootstatus
> > (which is supposed to describe the reason for a reboot), you
> > are also hijacking WDIOF_EXTERN1. That seems highly arbitrary
> > to me, and is not really how an API/ABI should be used.
>
> We used WDIOF_EXTERN1 because:
>
> 1. We thought that bootstatus _can_ change
>
> 2. We thought that adding extra bits wouldn't be appreciated
>
> Now as you clarified that assumption 1 was wrong we are going to implement
> status as I proposed earlier:
>
> >
> >> I think we could make 'access_cs0' readable instead, so it could report the
> >> current state of the boot code selection bit. Reverted, I suppose. That
> >> way 'access_cs0' would report 1 after 1 has been written to it (it wouldn't
> >> be possible to write a zero).
>
> With best regards,
> Alexander Amelkin,
> BIOS/BMC Team Lead, YADRO
> https://yadro.com
>
22.08.2019 19:01, Guenter Roeck wrote: > On Thu, Aug 22, 2019 at 05:36:21PM +0300, Alexander Amelkin wrote: >> 21.08.2019 21:10, Guenter Roeck wrote: >>> On Wed, Aug 21, 2019 at 08:42:24PM +0300, Alexander Amelkin wrote: >>>> 21.08.2019 19:32, Guenter Roeck wrote: >>>>> On Wed, Aug 21, 2019 at 06:57:43PM +0300, Ivan Mikhaylov wrote: >>>>>> Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS >>>>>> to clear out boot code source and re-enable access to the primary SPI flash >>>>>> chip while booted via wdt2 from the alternate chip. >>>>>> >>>>>> AST2400 datasheet says: >>>>>> "In the 2nd flash booting mode, all the address mapping to CS0# would be >>>>>> re-directed to CS1#. And CS0# is not accessable under this mode. To access >>>>>> CS0#, firmware should clear the 2nd boot mode register in the WDT2 status >>>>>> register WDT30.bit[1]." >>>>> Is there reason to not do this automatically when loading the module >>>>> in alt-boot mode ? What means does userspace have to determine if CS0 >>>>> or CS1 is active at any given time ? If there is reason to ever have CS1 >>>>> active instead of CS0, what means would userspace have to enable it ? >>>> Yes, there is. The driver is loaded long before the filesystems are mounted. >>>> The filesystems, in the event of alternate/recovery boot, need to be mounted >>>> from the same chip that the kernel was booted. For one reason because the main >>>> chip at CS0 is most probably corrupt. If you clear that bit when driver is >>>> loaded, your software will not know that and will try to mount the wrong >>>> filesystems. The whole idea of ASPEED's switching chipselects is to have >>>> identical firmware in both chips, without the need to process the alternate >>>> boot state in any way except for indicating a successful boot and restoring >>>> access to CS0 when needed. >>>> >>>> The userspace can read bootstatus sysfs node to determine if an alternate >>>> boot has occured. 
>>>>
>>>> With ASPEED, CS1 is activated automatically by wdt2 when the system fails to boot
>>>> from the primary flash chip (at CS0) and to disable the watchdog to indicate a
>>>> successful boot. When that happens, both CS0 and CS1 controls get routed in
>>>> hardware to CS1 line, making the primary flash chip inaccessible. Depending
>>>> on the architecture of the user-space software, it may choose to re-enable
>>>> access to the primary chip via CS0 at different times. There must be a way to do so.
>>>>
>>> So by activating cs0, userspace would essentially pull its own root file system
>>> from underneath itself ?
>> Exactly. That's why for alternate boot the firmware would usually copy
>> all filesystems to memory and mount from there. Some embedded systems
>> do that always, regardless of which chip they boot from.
>>
> That is different, though, to what you said earlier. Linux would then start
> with a clean file system, and not need access to the file system in cs1 at all.
> Clearing the flag when starting the driver would then be ok.

I don't see how that is different. Copying to memory may be done by startup scripts that run after the driver is loaded, so they need to read the data from the chip they are booted from. That is how it is done in OpenBMC, for instance. Other flavors of firmware may choose a different approach. Having the control available via sysfs gives more flexibility.

>> However, to be able to recover the main flash chip, the system needs CS0
>> to function as such (not as CS1). That's why this control is needed.
>>
> If what you said is correct, not really. It should be fine and create more
> predictable behavior if the probe function selects cs0 automatically.

Well, this is not a function for home users. This is for servers. You won't even find an ASPEED BMC chip in a home PC. Aspeed's dual-boot is quite an advanced feature and people willing to use it are expected to be able to predict the behavior.
To me, as an embedded systems developer, automatic selection of cs0 by probe is a limitation. I prefer flexibility.

With best regards,
Alexander Amelkin,
BIOS/BMC Team Lead, YADRO
https://yadro.com
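The userspace flow argued for in this message (check bootstatus, copy filesystems to memory from the booted chip, and only then restore CS0) might look roughly like the sketch below. It is an illustration under assumptions: the helper names are hypothetical, the sysfs directory is passed in by the caller because the watchdog index is board-specific, 'access_cs0' is the attribute proposed in this patch, and `WDIOF_CARDRESET` is defined locally with the value from `<linux/watchdog.h>` to keep the example self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Same value as WDIOF_CARDRESET in <linux/watchdog.h>. */
#define WDIOF_CARDRESET 0x0020

/*
 * Decide from the text of the bootstatus sysfs node whether the system
 * was booted from the alternate chip. The driver sets WDIOF_CARDRESET
 * when WDT_TIMEOUT_STATUS_BOOT_SECONDARY is set at probe time.
 */
static int booted_from_alternate(const char *bootstatus_text)
{
	unsigned long flags = strtoul(bootstatus_text, NULL, 0);

	return (flags & WDIOF_CARDRESET) != 0;
}

/* Hypothetical helper: write a value to <dir>/<attr>; 0 on success. */
static int write_sysfs_attr(const char *dir, const char *attr, const char *val)
{
	char path[256];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
	if (n < 0 || n >= (int)sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(val, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/*
 * Restore access to the primary chip. Intended to be called only after
 * the startup scripts have copied the filesystems to memory, since CS0
 * stops being redirected to CS1 afterwards.
 */
static int restore_cs0_access(const char *wdt_sysfs_dir)
{
	return write_sysfs_attr(wdt_sysfs_dir, "access_cs0", "1");
}
```

The point of the ordering is that nothing in userspace still depends on the redirected CS0 mapping by the time `restore_cs0_access()` runs.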
diff --git a/drivers/watchdog/aspeed_wdt.c b/drivers/watchdog/aspeed_wdt.c
index cc71861e033a..858e62f1c7ba 100644
--- a/drivers/watchdog/aspeed_wdt.c
+++ b/drivers/watchdog/aspeed_wdt.c
@@ -53,6 +53,8 @@ MODULE_DEVICE_TABLE(of, aspeed_wdt_of_table);
 #define   WDT_CTRL_ENABLE		BIT(0)
 #define WDT_TIMEOUT_STATUS	0x10
 #define   WDT_TIMEOUT_STATUS_BOOT_SECONDARY	BIT(1)
+#define WDT_CLEAR_TIMEOUT_STATUS	0x14
+#define   WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION	BIT(0)
 
 /*
  * WDT_RESET_WIDTH controls the characteristics of the external pulse (if
@@ -165,6 +167,29 @@ static int aspeed_wdt_restart(struct watchdog_device *wdd,
 	return 0;
 }
 
+static ssize_t access_cs0_store(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t size)
+{
+	struct aspeed_wdt *wdt = dev_get_drvdata(dev);
+
+	if (unlikely(!wdt))
+		return -ENODEV;
+
+	writel(WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION,
+			wdt->base + WDT_CLEAR_TIMEOUT_STATUS);
+	wdt->wdd.bootstatus |= WDIOF_EXTERN1;
+
+	return size;
+}
+static DEVICE_ATTR_WO(access_cs0);
+
+static struct attribute *bswitch_attrs[] = {
+	&dev_attr_access_cs0.attr,
+	NULL
+};
+ATTRIBUTE_GROUPS(bswitch);
+
 static const struct watchdog_ops aspeed_wdt_ops = {
 	.start		= aspeed_wdt_start,
 	.stop		= aspeed_wdt_stop,
@@ -223,6 +248,9 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
 
 	wdt->ctrl = WDT_CTRL_1MHZ_CLK;
 
+	if (of_property_read_bool(np, "aspeed,alt-boot"))
+		wdt->wdd.groups = bswitch_groups;
+
 	/*
 	 * Control reset on a per-device basis to ensure the
 	 * host is not affected by a BMC reboot
@@ -309,6 +337,8 @@ static int aspeed_wdt_probe(struct platform_device *pdev)
 	if (status & WDT_TIMEOUT_STATUS_BOOT_SECONDARY)
 		wdt->wdd.bootstatus = WDIOF_CARDRESET;
 
+	dev_set_drvdata(dev, wdt);
+
 	return devm_watchdog_register_device(dev, &wdt->wdd);
 }
Set WDT_CLEAR_TIMEOUT_AND_BOOT_CODE_SELECTION into WDT_CLEAR_TIMEOUT_STATUS
to clear out boot code source and re-enable access to the primary SPI flash
chip while booted via wdt2 from the alternate chip.

AST2400 datasheet says:
"In the 2nd flash booting mode, all the address mapping to CS0# would be
re-directed to CS1#. And CS0# is not accessable under this mode. To access
CS0#, firmware should clear the 2nd boot mode register in the WDT2 status
register WDT30.bit[1]."

Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com>
---
 drivers/watchdog/aspeed_wdt.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
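The routing rule in the datasheet quote can be restated as a small pure function. This is a sketch for illustration only: `enum flash_cs` and `routed_cs()` are hypothetical names, and the function models nothing beyond the quoted sentence (in 2nd flash booting mode, accesses addressed to CS0# go to CS1#).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two SPI flash chip selects. */
enum flash_cs { FLASH_CS0 = 0, FLASH_CS1 = 1 };

/*
 * Returns the chip select that is actually driven for a requested one.
 * While the 2nd boot mode bit is set, CS0 accesses are redirected to
 * CS1; once the bit is cleared, CS0 reaches the primary chip again.
 */
static enum flash_cs routed_cs(enum flash_cs requested, bool second_boot_mode)
{
	if (second_boot_mode && requested == FLASH_CS0)
		return FLASH_CS1;
	return requested;
}
```

Seen this way, the write to WDT_CLEAR_TIMEOUT_STATUS in the patch is what flips `second_boot_mode` back to false.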