[v4,4/4] ARM: BCM5301X: workaround suppress fault

Message ID 1388944697-19927-5-git-send-email-hauke@hauke-m.de (mailing list archive)
State New, archived

Commit Message

Hauke Mehrtens Jan. 5, 2014, 5:58 p.m. UTC
Without this patch I am getting an unhandled fault exception like this
one after "Freeing unused kernel memory":

Freeing unused kernel memory: 1260K (c02c1000 - c03fc000)
Unhandled fault: imprecise external abort (0x1c06) at 0xb6f89005
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007

The faulting address (0xb6f89005 here) changes from boot to boot, and
changes more with a new build. With kernel 3.10 I have also seen this
fault at different places in the boot process, but starting with 3.11
it always occurs after the "Freeing unused kernel memory" message. I
was never able to boot completely to userspace without this handler.
The abort code is always 0x1c06. The fault happens just once in the
boot process; I have never seen it happen twice or more.

I also tried clearing the CPSR.A bit in init_early with the following
code, as Afzal suggested, but that did not change anything:
asm volatile("mrs r12, cpsr\n"
	"bic r12, r12, #0x00000100\n"
	"msr cpsr_c, r12" ::: "r12", "cc", "memory");

Disabling the L2 cache by building with CONFIG_CACHE_L2X0 unset did not
help.

This workaround was copied from the vendor code, including most of the
comments. The vendor comments say they think this is caused by the CFE
boot loader used on this device. I do not have access to any datasheet
or errata document to check this.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
---
 arch/arm/mach-bcm/bcm_5301x.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Arnd Bergmann Jan. 5, 2014, 8:25 p.m. UTC | #1
On Sunday 05 January 2014, Hauke Mehrtens wrote:
> Without this patch I am getting an unhandled fault exception like this
> one after "Freeing unused kernel memory":
> 
> Freeing unused kernel memory: 1260K (c02c1000 - c03fc000)
> Unhandled fault: imprecise external abort (0x1c06) at 0xb6f89005
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> 
> The faulting address (0xb6f89005 here) changes from boot to boot, and
> changes more with a new build. With kernel 3.10 I have also seen this
> fault at different places in the boot process, but starting with 3.11
> it always occurs after the "Freeing unused kernel memory" message. I
> was never able to boot completely to userspace without this handler.
> The abort code is always 0x1c06. The fault happens just once in the
> boot process; I have never seen it happen twice or more.

How about narrowing down the abort handler to only ignore a single
fault after boot, and only with the abort code 0x1c06? That way you
don't risk silent data corruption in case something else goes wrong
after booting.

> This workaround was copied from the vendor code, including most of the
> comments. The vendor comments say they think this is caused by the CFE
> boot loader used on this device. I do not have access to any datasheet
> or errata document to check this.

Does the SoC by chance have a PCI host controller? The only other
platforms with this kind of handler have it to catch things going wrong
with PCI. Maybe another thing to try is to turn off the PCI core
at early boot.

	Arnd

Hauke Mehrtens Jan. 5, 2014, 10:29 p.m. UTC | #2
On 01/05/2014 09:25 PM, Arnd Bergmann wrote:
> On Sunday 05 January 2014, Hauke Mehrtens wrote:
>> Without this patch I am getting an unhandled fault exception like this
>> one after "Freeing unused kernel memory":
>>
>> Freeing unused kernel memory: 1260K (c02c1000 - c03fc000)
>> Unhandled fault: imprecise external abort (0x1c06) at 0xb6f89005
>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
>>
>> The faulting address (0xb6f89005 here) changes from boot to boot, and
>> changes more with a new build. With kernel 3.10 I have also seen this
>> fault at different places in the boot process, but starting with 3.11
>> it always occurs after the "Freeing unused kernel memory" message. I
>> was never able to boot completely to userspace without this handler.
>> The abort code is always 0x1c06. The fault happens just once in the
>> boot process; I have never seen it happen twice or more.
> 
> How about narrowing down the abort handler to only ignore a single
> fault after boot, and only with the abort code 0x1c06? That way you
> don't risk silent data corruption in case something else goes wrong
> after booting.

OK, I extended bcm5301x_abort_handler() to only ignore the fault when
the code is 0x1c06 and it is the first fault.
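
A narrowed handler along those lines might look roughly like the sketch
below. It is a user-space model and not the code from any later revision
of this patch: the `_model` suffix, the `BCM5301X_BOOT_ABORT_FSR` macro
and the `ignored_once` flag are hypothetical names, `struct pt_regs` is
only forward-declared so the example builds outside the kernel, and
printf() stands in for pr_warn().

```c
#include <stdbool.h>
#include <stdio.h>

/* Forward declaration only, so the sketch builds in user space. */
struct pt_regs;

/* Fault status value seen in the boot log (hypothetical macro name). */
#define BCM5301X_BOOT_ABORT_FSR 0x1c06u

static int bcm5301x_abort_handler_model(unsigned long addr, unsigned int fsr,
					struct pt_regs *regs)
{
	/* Hypothetical one-shot flag: set once the boot-time abort is eaten. */
	static bool ignored_once;

	(void)regs;

	/*
	 * Anything other than the first 0x1c06 abort is not ours to hide.
	 * Returning non-zero hands it back to the normal fault path.
	 */
	if (fsr != BCM5301X_BOOT_ABORT_FSR || ignored_once)
		return 1;

	ignored_once = true;
	printf("External imprecise Data abort at addr=%#lx, fsr=%#x ignored.\n",
	       addr, fsr);
	return 0;
}
```

With this shape, a second 0x1c06 abort, or any abort with a different
code, is reported as before instead of being silently dropped.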

>> This workaround was copied from the vendor code, including most of the
>> comments. The vendor comments say they think this is caused by the CFE
>> boot loader used on this device. I do not have access to any datasheet
>> or errata document to check this.
> 
> Does the SoC by chance have a PCI host controller? The only other
> platforms with this kind of handler have it to catch things going wrong
> with PCI. Maybe another thing to try is to turn off the PCI core
> at early boot.

Yes, it has two PCIe host controllers. I haven't tried to initialize the
PCIe core. It could be that the boot loader did something with the PCIe
controller, but I do not think so.

Hauke

Patch

diff --git a/arch/arm/mach-bcm/bcm_5301x.c b/arch/arm/mach-bcm/bcm_5301x.c
index 4b83b52..30dcfea 100644
--- a/arch/arm/mach-bcm/bcm_5301x.c
+++ b/arch/arm/mach-bcm/bcm_5301x.c
@@ -12,6 +12,28 @@ 
 #include <asm/hardware/cache-l2x0.h>
 
 #include <asm/mach/arch.h>
+#include <asm/signal.h>
+
+static int bcm5301x_abort_handler(unsigned long addr, unsigned int fsr,
+				 struct pt_regs *regs)
+{
+	/*
+	 * These happen for no good reason, possibly left over from the CFE
+	 * boot loader.
+	 */
+	pr_warn("External imprecise Data abort at addr=%#lx, fsr=%#x ignored.\n",
+		addr, fsr);
+
+	/* Returning non-zero causes fault display and panic */
+	return 0;
+}
+
+static void __init bcm5301x_init_early(void)
+{
+	/* Install our hook */
+	hook_fault_code(16 + 6, bcm5301x_abort_handler, SIGBUS, 0,
+			"imprecise external abort");
+}
 
 static void __init bcm5301x_timer_init(void)
 {
@@ -31,6 +53,7 @@  static const char __initconst *bcm5301x_dt_compat[] = {
 };
 
 DT_MACHINE_START(BCM5301X, "BCM5301X")
+	.init_early	= bcm5301x_init_early,
 	.init_time	= bcm5301x_timer_init,
 	.init_machine	= bcm5301x_dt_init,
 	.dt_compat	= bcm5301x_dt_compat,
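
The `16 + 6` index passed to hook_fault_code() lines up with the fault
status 0x1c06 from the boot log. The sketch below shows the decoding as a
user-space model, assuming the ARMv7 short-descriptor DFSR layout (FS[3:0]
in bits 3:0, FS[4] in bit 10). The function name `fsr_fs_model` and the
macro names are illustrative assumptions and are not taken from this patch.

```c
/* Assumed short-descriptor DFSR layout: FS[3:0] = bits 3:0, FS[4] = bit 10. */
#define FSR_FS3_0	0x0fu
#define FSR_FS4		(1u << 10)

/* Fold the split fault status field into a single 5-bit table index. */
static unsigned int fsr_fs_model(unsigned int fsr)
{
	return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}
```

For fsr = 0x1c06 the low four bits are 6 and bit 10 is set, so the index
is 16 + 6 = 22, the slot the patch hooks.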