diff mbox

[V6,04/10] arm64: exception: handle Synchronous External Abort

Message ID 1481147303-7979-5-git-send-email-tbaicar@codeaurora.org (mailing list archive)
State New, archived
Headers show

Commit Message

Tyler Baicar Dec. 7, 2016, 9:48 p.m. UTC
SEA exceptions are often caused by an uncorrected hardware
error, and are handled when data abort and instruction abort
exception classes have specific values for their Fault Status
Code.
When SEA occurs, before killing the process, go through
the handlers registered in the notification list.
Update fault_info[] with specific SEA faults so that the
new SEA handler is used.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
---
 arch/arm64/include/asm/system_misc.h | 13 ++++++++
 arch/arm64/mm/fault.c                | 58 +++++++++++++++++++++++++++++-------
 2 files changed, 61 insertions(+), 10 deletions(-)

Comments

Will Deacon Jan. 4, 2017, 1:54 p.m. UTC | #1
On Wed, Dec 07, 2016 at 02:48:17PM -0700, Tyler Baicar wrote:
> SEA exceptions are often caused by an uncorrected hardware
> error, and are handled when data abort and instruction abort
> exception classes have specific values for their Fault Status
> Code.
> When SEA occurs, before killing the process, go through
> the handlers registered in the notification list.
> Update fault_info[] with specific SEA faults so that the
> new SEA handler is used.
> 
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
> ---
>  arch/arm64/include/asm/system_misc.h | 13 ++++++++
>  arch/arm64/mm/fault.c                | 58 +++++++++++++++++++++++++++++-------
>  2 files changed, 61 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
> index 57f110b..9040e1d 100644
> --- a/arch/arm64/include/asm/system_misc.h
> +++ b/arch/arm64/include/asm/system_misc.h
> @@ -64,4 +64,17 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
>  
>  #endif	/* __ASSEMBLY__ */
>  
> +/*
> + * The functions below are used to register and unregister callbacks
> + * that are to be invoked when a Synchronous External Abort (SEA)
> + * occurs. An SEA is raised by certain fault status codes that have
> + * either data or instruction abort as the exception class, and
> + * callbacks may be registered to parse or handle such hardware errors.
> + *
> + * Registered callbacks are run in an interrupt/atomic context. They
> + * are not allowed to block or sleep.
> + */
> +int register_synchronous_ext_abort_notifier(struct notifier_block *nb);
> +void unregister_synchronous_ext_abort_notifier(struct notifier_block *nb);

I think that we may as well use the "SEA" acronym consistently in code,
expanding it only for strings and comments, so these can be renamed to
{register,unregister}_sea_notifier. That said, what is the use of having a
notifier chain here as well as in the ghes code? If the ghes code is the
only place to register a notifier, we may as well start simple and call that
code directly, like we call handle_mm_fault directly for data aborts.

>  static const struct fault_info {
>  	int	(*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
>  	int	sig;
> @@ -502,22 +540,22 @@ static const struct fault_info {
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
> +	{ do_synch_ext_abort,	SIGBUS,  0,		"synchronous external abort"	},

Again, just stick with do_sea for the function name...

>  	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
>  	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
>  	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
> -	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 0 SEA (trans tbl walk)"	},

... but there's no need to abbreviate "translation table walk" here. Long
strings that run over 80 chars are fine. Similarly for "SEA".

> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 1 SEA (trans tbl walk)"	},
> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 2 SEA (trans tbl walk)"	},
> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 3 SEA (trans tbl walk)"	},
> +	{ do_synch_ext_abort,	SIGBUS,  0,		"synchronous parity or ECC err" },
>  	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
>  	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
>  	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 0 synch parity error"	},
> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 1 synch parity error"	},
> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 2 synch parity error"	},
> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 3 synch parity error"	},

Please keep mention of "translation table walk", since we have exception
levels too and it's confusing just saying "level n".

Will
Tyler Baicar Jan. 6, 2017, 4:58 p.m. UTC | #2
Hi Will,

On 1/4/2017 6:54 AM, Will Deacon wrote:
> On Wed, Dec 07, 2016 at 02:48:17PM -0700, Tyler Baicar wrote:
>> SEA exceptions are often caused by an uncorrected hardware
>> error, and are handled when data abort and instruction abort
>> exception classes have specific values for their Fault Status
>> Code.
>> When SEA occurs, before killing the process, go through
>> the handlers registered in the notification list.
>> Update fault_info[] with specific SEA faults so that the
>> new SEA handler is used.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>> Signed-off-by: Naveen Kaje <nkaje@codeaurora.org>
>> ---
>>   arch/arm64/include/asm/system_misc.h | 13 ++++++++
>>   arch/arm64/mm/fault.c                | 58 +++++++++++++++++++++++++++++-------
>>   2 files changed, 61 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
>> index 57f110b..9040e1d 100644
>> --- a/arch/arm64/include/asm/system_misc.h
>> +++ b/arch/arm64/include/asm/system_misc.h
>> @@ -64,4 +64,17 @@ extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
>>   
>>   #endif	/* __ASSEMBLY__ */
>>   
>> +/*
>> + * The functions below are used to register and unregister callbacks
>> + * that are to be invoked when a Synchronous External Abort (SEA)
>> + * occurs. An SEA is raised by certain fault status codes that have
>> + * either data or instruction abort as the exception class, and
>> + * callbacks may be registered to parse or handle such hardware errors.
>> + *
>> + * Registered callbacks are run in an interrupt/atomic context. They
>> + * are not allowed to block or sleep.
>> + */
>> +int register_synchronous_ext_abort_notifier(struct notifier_block *nb);
>> +void unregister_synchronous_ext_abort_notifier(struct notifier_block *nb);
> I think that we may as well use the "SEA" acronym consistently in code,
> expanding it only for strings and comments, so these can be renamed to
> {register,unregister}_sea_notifier. That said, what is the use of having a
> notifier chain here as well as in the ghes code? If the ghes code is the
> only place to register a notifier, we may as well start simple and call that
> code directly, like we call handle_mm_fault directly for data aborts.
I originally used the acronym and got feedback to expand it, but I'll 
revert back to just using the acronym.
Using a notifier here is consistent with the SCI error type in the GHES 
code which is also only registered in
the GHES code. I can remove the notifier for SEA if you think making the 
call directly is better here.
>>   static const struct fault_info {
>>   	int	(*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
>>   	int	sig;
>> @@ -502,22 +540,22 @@ static const struct fault_info {
>>   	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
>>   	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
>>   	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
>> -	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
>> +	{ do_synch_ext_abort,	SIGBUS,  0,		"synchronous external abort"	},
> Again, just stick with do_sea for the function name...
>
>>   	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
>>   	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
>>   	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
>> -	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
>> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 0 SEA (trans tbl walk)"	},
> ... but there's no need to abbreviate "translation table walk" here. Long
> strings that run over 80 chars are fine. Similarly for "SEA".
I will expand this in the next patchset.
>> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 1 SEA (trans tbl walk)"	},
>> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 2 SEA (trans tbl walk)"	},
>> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 3 SEA (trans tbl walk)"	},
>> +	{ do_synch_ext_abort,	SIGBUS,  0,		"synchronous parity or ECC err" },
>>   	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
>>   	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
>>   	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
>> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
>> -	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
>> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 0 synch parity error"	},
>> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 1 synch parity error"	},
>> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 2 synch parity error"	},
>> +	{ do_synch_ext_abort,	SIGBUS,  0,		"level 3 synch parity error"	},
> Please keep mention of "translation table walk", since we have exception
> levels too and it's confusing just saying "level n".
I will add them back in the next patchset.

Thanks,
Tyler
> Will
diff mbox

Patch

diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 57f110b..9040e1d 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -64,4 +64,17 @@  extern void (*arm_pm_restart)(enum reboot_mode reboot_mode, const char *cmd);
 
 #endif	/* __ASSEMBLY__ */
 
+/*
+ * The functions below are used to register and unregister callbacks
+ * that are to be invoked when a Synchronous External Abort (SEA)
+ * occurs. An SEA is raised by certain fault status codes that have
+ * either data or instruction abort as the exception class, and
+ * callbacks may be registered to parse or handle such hardware errors.
+ *
+ * Registered callbacks are run in an interrupt/atomic context. They
+ * are not allowed to block or sleep.
+ */
+int register_synchronous_ext_abort_notifier(struct notifier_block *nb);
+void unregister_synchronous_ext_abort_notifier(struct notifier_block *nb);
+
 #endif	/* __ASM_SYSTEM_MISC_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 05d2bd7..fcc49f1 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -39,6 +39,22 @@ 
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 
+/*
+ * GHES SEA handler code may register a notifier call here to
+ * handle HW error record passed from platform.
+ */
+static ATOMIC_NOTIFIER_HEAD(sea_handler_chain);
+
+int register_synchronous_ext_abort_notifier(struct notifier_block *nb)
+{
+	return atomic_notifier_chain_register(&sea_handler_chain, nb);
+}
+
+void unregister_synchronous_ext_abort_notifier(struct notifier_block *nb)
+{
+	atomic_notifier_chain_unregister(&sea_handler_chain, nb);
+}
+
 static const char *fault_name(unsigned int esr);
 
 #ifdef CONFIG_KPROBES
@@ -480,6 +496,28 @@  static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 	return 1;
 }
 
+/*
+ * This abort handler deals with Synchronous External Abort.
+ * It calls notifiers, and then returns "fault".
+ */
+static int do_synch_ext_abort(unsigned long addr, unsigned int esr, struct pt_regs *regs)
+{
+	struct siginfo info;
+
+	atomic_notifier_call_chain(&sea_handler_chain, 0, NULL);
+
+	pr_err("Synchronous External Abort: %s (0x%08x) at 0x%016lx\n",
+		 fault_name(esr), esr, addr);
+
+	info.si_signo = SIGBUS;
+	info.si_errno = 0;
+	info.si_code  = 0;
+	info.si_addr  = (void __user *)addr;
+	arm64_notify_die("", regs, &info, esr);
+
+	return 0;
+}
+
 static const struct fault_info {
 	int	(*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
 	int	sig;
@@ -502,22 +540,22 @@  static const struct fault_info {
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
-	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"synchronous external abort"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 17"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 0 SEA (trans tbl walk)"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 1 SEA (trans tbl walk)"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 2 SEA (trans tbl walk)"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 3 SEA (trans tbl walk)"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"synchronous parity or ECC err" },
 	{ do_bad,		SIGBUS,  0,		"unknown 25"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
 	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
-	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk)" },
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 0 synch parity error"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 1 synch parity error"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 2 synch parity error"	},
+	{ do_synch_ext_abort,	SIGBUS,  0,		"level 3 synch parity error"	},
 	{ do_bad,		SIGBUS,  0,		"unknown 32"			},
 	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
 	{ do_bad,		SIGBUS,  0,		"unknown 34"			},