Message ID | 1481147303-7979-5-git-send-email-tbaicar@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Dec 07, 2016 at 02:48:17PM -0700, Tyler Baicar wrote: > SEA exceptions are often caused by an uncorrected hardware > error, and are handled when data abort and instruction abort > exception classes have specific values for their Fault Status > Code. > When SEA occurs, before killing the process, go through > the handlers registered in the notification list. > Update fault_info[] with specific SEA faults so that the > new SEA handler is used. > > Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> > Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> > Signed-off-by: Naveen Kaje <nkaje@codeaurora.org> > --- > arch/arm64/include/asm/system_misc.h | 13 ++++++++ > arch/arm64/mm/fault.c | 58 +++++++++++++++++++++++++++++------- > 2 files changed, 61 insertions(+), 10 deletions(-) > > diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h > index 57f110b..9040e1d 100644 > --- a/arch/arm64/include/asm/system_misc.h > +++ b/arch/arm64/include/asm/system_misc.h > @@ -64,4 +64,17 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd); > > #endif /* __ASSEMBLY__ */ > > +/* > + * The functions below are used to register and unregister callbacks > + * that are to be invoked when a Synchronous External Abort (SEA) > + * occurs. An SEA is raised by certain fault status codes that have > + * either data or instruction abort as the exception class, and > + * callbacks may be registered to parse or handle such hardware errors. > + * > + * Registered callbacks are run in an interrupt/atomic context. They > + * are not allowed to block or sleep. > + */ > +int register_synchronous_ext_abort_notifier(struct notifier_block *nb); > +void unregister_synchronous_ext_abort_notifier(struct notifier_block *nb); I think that we may as well use the "SEA" acronym consistently in code, expanding it only for strings and comments, so these can be renamed to {register,unregister}_sea_notifier. That said, what is the use of having a notifier chain here as well as in the ghes code? If the ghes code is the only place to register a notifier, we may as well start simple and call that code directly, like we call handle_mm_fault directly for data aborts. > static const struct fault_info { > int (*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs); > int sig; > @@ -502,22 +540,22 @@ static const struct fault_info { > { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" }, > { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 permission fault" }, > { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 permission fault" }, > - { do_bad, SIGBUS, 0, "synchronous external abort" }, > + { do_synch_ext_abort, SIGBUS, 0, "synchronous external abort" }, Again, just stick with do_sea for the function name... > { do_bad, SIGBUS, 0, "unknown 17" }, > { do_bad, SIGBUS, 0, "unknown 18" }, > { do_bad, SIGBUS, 0, "unknown 19" }, > - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, > - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, > - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, > - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, > - { do_bad, SIGBUS, 0, "synchronous parity error" }, > + { do_synch_ext_abort, SIGBUS, 0, "level 0 SEA (trans tbl walk)" }, ... but there's no need to abbreviate "translation table walk" here. Long strings that run over 80 chars are fine. Similarly for "SEA". > + { do_synch_ext_abort, SIGBUS, 0, "level 1 SEA (trans tbl walk)" }, > + { do_synch_ext_abort, SIGBUS, 0, "level 2 SEA (trans tbl walk)" }, > + { do_synch_ext_abort, SIGBUS, 0, "level 3 SEA (trans tbl walk)" }, > + { do_synch_ext_abort, SIGBUS, 0, "synchronous parity or ECC err" }, > { do_bad, SIGBUS, 0, "unknown 25" }, > { do_bad, SIGBUS, 0, "unknown 26" }, > { do_bad, SIGBUS, 0, "unknown 27" }, > - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, > - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, > - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, > - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, > + { do_synch_ext_abort, SIGBUS, 0, "level 0 synch parity error" }, > + { do_synch_ext_abort, SIGBUS, 0, "level 1 synch parity error" }, > + { do_synch_ext_abort, SIGBUS, 0, "level 2 synch parity error" }, > + { do_synch_ext_abort, SIGBUS, 0, "level 3 synch parity error" }, Please keep mention of "translation table walk", since we have exception levels too and it's confusing just saying "level n". Will -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Will, On 1/4/2017 6:54 AM, Will Deacon wrote: > On Wed, Dec 07, 2016 at 02:48:17PM -0700, Tyler Baicar wrote: >> SEA exceptions are often caused by an uncorrected hardware >> error, and are handled when data abort and instruction abort >> exception classes have specific values for their Fault Status >> Code. >> When SEA occurs, before killing the process, go through >> the handlers registered in the notification list. >> Update fault_info[] with specific SEA faults so that the >> new SEA handler is used. >> >> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> >> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> >> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org> >> --- >> arch/arm64/include/asm/system_misc.h | 13 ++++++++ >> arch/arm64/mm/fault.c | 58 +++++++++++++++++++++++++++++------- >> 2 files changed, 61 insertions(+), 10 deletions(-) >> >> diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h >> index 57f110b..9040e1d 100644 >> --- a/arch/arm64/include/asm/system_misc.h >> +++ b/arch/arm64/include/asm/system_misc.h >> @@ -64,4 +64,17 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd); >> >> #endif /* __ASSEMBLY__ */ >> >> +/* >> + * The functions below are used to register and unregister callbacks >> + * that are to be invoked when a Synchronous External Abort (SEA) >> + * occurs. An SEA is raised by certain fault status codes that have >> + * either data or instruction abort as the exception class, and >> + * callbacks may be registered to parse or handle such hardware errors. >> + * >> + * Registered callbacks are run in an interrupt/atomic context. They >> + * are not allowed to block or sleep. >> + */ >> +int register_synchronous_ext_abort_notifier(struct notifier_block *nb); >> +void unregister_synchronous_ext_abort_notifier(struct notifier_block *nb); > I think that we may as well use the "SEA" acronym consistently in code, > expanding it only for strings and comments, so these can be renamed to > {register,unregister}_sea_notifier. That said, what is the use of having a > notifier chain here as well as in the ghes code? If the ghes code is the > only place to register a notifier, we may as well start simple and call that > code directly, like we call handle_mm_fault directly for data aborts. I originally used the acronym and got feedback to expand it, but I'll revert back to just using the acronym. Using a notifier here is consistent with the SCI error type in the GHES code which is also only registered in the GHES code. I can remove the notifier for SEA if you think making the call directly is better here. >> static const struct fault_info { >> int (*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs); >> int sig; >> @@ -502,22 +540,22 @@ static const struct fault_info { >> { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" }, >> { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 permission fault" }, >> { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 permission fault" }, >> - { do_bad, SIGBUS, 0, "synchronous external abort" }, >> + { do_synch_ext_abort, SIGBUS, 0, "synchronous external abort" }, > Again, just stick with do_sea for the function name... > >> { do_bad, SIGBUS, 0, "unknown 17" }, >> { do_bad, SIGBUS, 0, "unknown 18" }, >> { do_bad, SIGBUS, 0, "unknown 19" }, >> - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, >> - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, >> - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, >> - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, >> - { do_bad, SIGBUS, 0, "synchronous parity error" }, >> + { do_synch_ext_abort, SIGBUS, 0, "level 0 SEA (trans tbl walk)" }, > ... but there's no need to abbreviate "translation table walk" here. Long > strings that run over 80 chars are fine. Similarly for "SEA". I will expand this in the next patchset. >> + { do_synch_ext_abort, SIGBUS, 0, "level 1 SEA (trans tbl walk)" }, >> + { do_synch_ext_abort, SIGBUS, 0, "level 2 SEA (trans tbl walk)" }, >> + { do_synch_ext_abort, SIGBUS, 0, "level 3 SEA (trans tbl walk)" }, >> + { do_synch_ext_abort, SIGBUS, 0, "synchronous parity or ECC err" }, >> { do_bad, SIGBUS, 0, "unknown 25" }, >> { do_bad, SIGBUS, 0, "unknown 26" }, >> { do_bad, SIGBUS, 0, "unknown 27" }, >> - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, >> - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, >> - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, >> - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, >> + { do_synch_ext_abort, SIGBUS, 0, "level 0 synch parity error" }, >> + { do_synch_ext_abort, SIGBUS, 0, "level 1 synch parity error" }, >> + { do_synch_ext_abort, SIGBUS, 0, "level 2 synch parity error" }, >> + { do_synch_ext_abort, SIGBUS, 0, "level 3 synch parity error" }, > Please keep mention of "translation table walk", since we have exception > levels too and it's confusing just saying "level n". I will add them back in the next patchset. Thanks, Tyler > Will
diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h index 57f110b..9040e1d 100644 --- a/arch/arm64/include/asm/system_misc.h +++ b/arch/arm64/include/asm/system_misc.h @@ -64,4 +64,17 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd); #endif /* __ASSEMBLY__ */ +/* + * The functions below are used to register and unregister callbacks + * that are to be invoked when a Synchronous External Abort (SEA) + * occurs. An SEA is raised by certain fault status codes that have + * either data or instruction abort as the exception class, and + * callbacks may be registered to parse or handle such hardware errors. + * + * Registered callbacks are run in an interrupt/atomic context. They + * are not allowed to block or sleep. + */ +int register_synchronous_ext_abort_notifier(struct notifier_block *nb); +void unregister_synchronous_ext_abort_notifier(struct notifier_block *nb); + #endif /* __ASM_SYSTEM_MISC_H */ diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c index 05d2bd7..fcc49f1 100644 --- a/arch/arm64/mm/fault.c +++ b/arch/arm64/mm/fault.c @@ -39,6 +39,22 @@ #include <asm/pgtable.h> #include <asm/tlbflush.h> +/* + * GHES SEA handler code may register a notifier call here to + * handle HW error record passed from platform. + */ +static ATOMIC_NOTIFIER_HEAD(sea_handler_chain); + +int register_synchronous_ext_abort_notifier(struct notifier_block *nb) +{ + return atomic_notifier_chain_register(&sea_handler_chain, nb); +} + +void unregister_synchronous_ext_abort_notifier(struct notifier_block *nb) +{ + atomic_notifier_chain_unregister(&sea_handler_chain, nb); +} + static const char *fault_name(unsigned int esr); #ifdef CONFIG_KPROBES @@ -480,6 +496,28 @@ static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs) return 1; } +/* + * This abort handler deals with Synchronous External Abort. + * It calls notifiers, and then returns "fault". + */ +static int do_synch_ext_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs) +{ + struct siginfo info; + + atomic_notifier_call_chain(&sea_handler_chain, 0, NULL); + + pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n", + fault_name(esr), esr, addr); + + info.si_signo = SIGBUS; + info.si_errno = 0; + info.si_code = 0; + info.si_addr = (void __user *)addr; + arm64_notify_die("", regs, &info, esr); + + return 0; +} + static const struct fault_info { int (*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs); int sig; @@ -502,22 +540,22 @@ static const struct fault_info { { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 1 permission fault" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 2 permission fault" }, { do_page_fault, SIGSEGV, SEGV_ACCERR, "level 3 permission fault" }, - { do_bad, SIGBUS, 0, "synchronous external abort" }, + { do_synch_ext_abort, SIGBUS, 0, "synchronous external abort" }, { do_bad, SIGBUS, 0, "unknown 17" }, { do_bad, SIGBUS, 0, "unknown 18" }, { do_bad, SIGBUS, 0, "unknown 19" }, - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, - { do_bad, SIGBUS, 0, "synchronous abort (translation table walk)" }, - { do_bad, SIGBUS, 0, "synchronous parity error" }, + { do_synch_ext_abort, SIGBUS, 0, "level 0 SEA (trans tbl walk)" }, + { do_synch_ext_abort, SIGBUS, 0, "level 1 SEA (trans tbl walk)" }, + { do_synch_ext_abort, SIGBUS, 0, "level 2 SEA (trans tbl walk)" }, + { do_synch_ext_abort, SIGBUS, 0, "level 3 SEA (trans tbl walk)" }, + { do_synch_ext_abort, SIGBUS, 0, "synchronous parity or ECC err" }, { do_bad, SIGBUS, 0, "unknown 25" }, { do_bad, SIGBUS, 0, "unknown 26" }, { do_bad, SIGBUS, 0, "unknown 27" }, - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, - { do_bad, SIGBUS, 0, "synchronous parity error (translation table walk)" }, + { do_synch_ext_abort, SIGBUS, 0, "level 0 synch parity error" }, + { do_synch_ext_abort, SIGBUS, 0, "level 1 synch parity error" }, + { do_synch_ext_abort, SIGBUS, 0, "level 2 synch parity error" }, + { do_synch_ext_abort, SIGBUS, 0, "level 3 synch parity error" }, { do_bad, SIGBUS, 0, "unknown 32" }, { do_alignment_fault, SIGBUS, BUS_ADRALN, "alignment fault" }, { do_bad, SIGBUS, 0, "unknown 34" },