
arm64: panic on synchronous external abort in kernel context

Message ID 20200410015245.23230-1-xiexiuqi@huawei.com (mailing list archive)
State New, archived
Series arm64: panic on synchronous external abort in kernel context

Commit Message

Xie XiuQi April 10, 2020, 1:52 a.m. UTC
We should panic even if panic_on_oops is not set, when we can't recover
from a synchronous external abort in kernel context.

Otherwise, there are two issues:
1) we fall back to do_exit() in exception context, causing this core to hang up.
   do_sea()
   -> arm64_notify_die
      -> die
         -> do_exit
2) errors may be propagated.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
Cc: Xiaofei Tan <tanxiaofei@huawei.com>
---
 arch/arm64/include/asm/esr.h | 12 ++++++++++++
 arch/arm64/kernel/traps.c    |  2 ++
 2 files changed, 14 insertions(+)

Comments

Mark Rutland April 14, 2020, 10:59 a.m. UTC | #1
On Fri, Apr 10, 2020 at 09:52:45AM +0800, Xie XiuQi wrote:
> We should panic even if panic_on_oops is not set, when we can't recover
> from a synchronous external abort in kernel context.
> 
> Otherwise, there are two issues:
> 1) we fall back to do_exit() in exception context, causing this core to hang up.
>    do_sea()
>    -> arm64_notify_die
>       -> die
>          -> do_exit
> 2) errors may be propagated.
> 
> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
> Cc: Xiaofei Tan <tanxiaofei@huawei.com>
> ---
>  arch/arm64/include/asm/esr.h | 12 ++++++++++++
>  arch/arm64/kernel/traps.c    |  2 ++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index cb29253ae86b..acfc71c6d148 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -326,6 +326,18 @@ static inline bool esr_is_data_abort(u32 esr)
>  	return ec == ESR_ELx_EC_DABT_LOW || ec == ESR_ELx_EC_DABT_CUR;
>  }
>  
> +static inline bool esr_is_inst_abort(u32 esr)
> +{
> +	const u32 ec = ESR_ELx_EC(esr);
> +
> +	return ec == ESR_ELx_EC_IABT_LOW || ec == ESR_ELx_EC_IABT_CUR;
> +}
> +
> +static inline bool esr_is_ext_abort(u32 esr)
> +{
> +	return esr_is_data_abort(esr) || esr_is_inst_abort(esr);
> +}

A data abort or an instruction abort is not necessarily a synchronous
external abort, so this isn't right.
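
(For reference: the discriminating field for an external abort is the fault
status code in the ESR, not just the exception class. A minimal sketch of a
check along those lines; the helper name is made up, and the 0x14-0x17 range
for external aborts on a translation table walk comes from the architecture,
not from this patch:)

#include <asm/esr.h>

/*
 * Illustrative only: true if a data/instruction abort's fault status code is
 * one of the synchronous external abort encodings. ESR_ELx_FSC_EXTABT (0x10)
 * is the plain SEA code; 0x14-0x17 are SEAs taken on a translation table
 * walk, levels 0-3.
 */
static inline bool esr_fsc_is_sea(u32 esr)
{
	const u32 fsc = esr & ESR_ELx_FSC;

	return fsc == ESR_ELx_FSC_EXTABT || (fsc >= 0x14 && fsc <= 0x17);
}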

What exactly are you trying to catch here? If you are seeing a problem
in practice, can you please share your log from a crash?

Thanks,
Mark.

> +
>  const char *esr_get_class_string(u32 esr);
>  #endif /* __ASSEMBLY */
>  
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index cf402be5c573..08f7f7688d5b 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -202,6 +202,8 @@ void die(const char *str, struct pt_regs *regs, int err)
>  		panic("Fatal exception in interrupt");
>  	if (panic_on_oops)
>  		panic("Fatal exception");
> +	if (esr_is_ext_abort(err))
> +		panic("Synchronous external abort in kernel context");
>  
>  	raw_spin_unlock_irqrestore(&die_lock, flags);
>  
> -- 
> 2.20.1
>
James Morse April 14, 2020, 12:16 p.m. UTC | #2
Hi Xie,

On 14/04/2020 11:59, Mark Rutland wrote:
> On Fri, Apr 10, 2020 at 09:52:45AM +0800, Xie XiuQi wrote:
>> We should panic even if panic_on_oops is not set, when we can't recover
>> from a synchronous external abort in kernel context.

Hmm, fault-from-kernel-context doesn't mean the fault affects the kernel. If the kernel is
reading or writing from user-space memory for a syscall, it's the user-space memory that is
affected. This thread can't make progress, so we kill it.
If it's a kernel thread or we were in irq context, we panic().

I don't think you really want all faults that happen as a result of a kernel access to be
fatal!
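
(For context, the decision described here is the tail of die() in
arch/arm64/kernel/traps.c. A trimmed sketch, roughly as of v5.5, with the
kexec handling elided; the hunk under discussion adds its check just above
the unlock:)

void die(const char *str, struct pt_regs *regs, int err)
{
	unsigned long flags;
	int ret;

	raw_spin_lock_irqsave(&die_lock, flags);

	oops_enter();
	console_verbose();
	bust_spinlocks(1);
	ret = __die(str, err, regs);
	bust_spinlocks(0);
	add_taint(TAINT_DIE, LOCKDEP_STILL_OK);
	oops_exit();

	/* IRQ context or panic_on_oops: the whole machine goes down */
	if (in_interrupt())
		panic("Fatal exception in interrupt");
	if (panic_on_oops)
		panic("Fatal exception");

	raw_spin_unlock_irqrestore(&die_lock, flags);

	/*
	 * Otherwise only the current thread is killed; this is the path
	 * that ends up in do_exit() -> do_task_dead() in the report below.
	 */
	if (ret != NOTIFY_STOP)
		do_exit(SIGSEGV);
}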

[...]

> What exactly are you trying to catch here? If you are seeing a problem
> in practice, can you please share your log from a crash?

Yes please!


I suspect you want to make memory_failure() smarter about faults that affect the kernel
text or data. If so, please do it in memory_failure() where it benefits all architectures,
and all methods of reporting errors.
(we may need a 'synchronous' hint to memory_failure(), it expects everything to be
asynchronous).
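
(To make the 'synchronous' hint concrete: memory_failure_queue() and
MF_ACTION_REQUIRED already exist, so the hint could simply pick the flags at
the call site. The helper below and its 'sync' parameter are made up for
illustration, this is not mainline code:)

#include <linux/mm.h>
#include <linux/pfn.h>

static void ghes_queue_memory_failure(u64 physical_addr, bool sync)
{
	unsigned long pfn = PHYS_PFN(physical_addr);
	int flags = 0;

	/*
	 * A synchronous (action-required) error means the poisoned page was
	 * consumed by the faulting context, so it has to be dealt with
	 * before that context can safely continue.
	 */
	if (sync)
		flags |= MF_ACTION_REQUIRED;

	memory_failure_queue(pfn, flags);
}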

If it's not memory, we should extend the RAS handling to know what this error is, and that
it is fatal. (e.g. PE state is infected)


Thanks,

James
Xie XiuQi April 14, 2020, 12:19 p.m. UTC | #3
Hi Mark,

On 2020/4/14 18:59, Mark Rutland wrote:
> On Fri, Apr 10, 2020 at 09:52:45AM +0800, Xie XiuQi wrote:
>> We should panic even if panic_on_oops is not set, when we can't recover
>> from a synchronous external abort in kernel context.
>>
>> Otherwise, there are two issues:
>> 1) we fall back to do_exit() in exception context, causing this core to hang up.
>>    do_sea()
>>    -> arm64_notify_die
>>       -> die
>>          -> do_exit
>> 2) errors may be propagated.
>>
>> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
>> Cc: Xiaofei Tan <tanxiaofei@huawei.com>
>> ---
>>  arch/arm64/include/asm/esr.h | 12 ++++++++++++
>>  arch/arm64/kernel/traps.c    |  2 ++
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index cb29253ae86b..acfc71c6d148 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -326,6 +326,18 @@ static inline bool esr_is_data_abort(u32 esr)
>>  	return ec == ESR_ELx_EC_DABT_LOW || ec == ESR_ELx_EC_DABT_CUR;
>>  }
>>  
>> +static inline bool esr_is_inst_abort(u32 esr)
>> +{
>> +	const u32 ec = ESR_ELx_EC(esr);
>> +
>> +	return ec == ESR_ELx_EC_IABT_LOW || ec == ESR_ELx_EC_IABT_CUR;
>> +}
>> +
>> +static inline bool esr_is_ext_abort(u32 esr)
>> +{
>> +	return esr_is_data_abort(esr) || esr_is_inst_abort(esr);
>> +}
> 
> A data abort or an instruction abort is not necessarily a synchronous
> external abort, so this isn't right.
> 
> What exactly are you trying to catch here? If you are seeing a problem
> in practice, can you please share your log from a crash?

Yes, I am seeing a problem in practice.

Tan Xiaofei reported this issue when doing RAS error injection testing:
1) panic_on_oops is not set;
2) trigger an SEA by injecting an ECC error;
3) a CPU core receives an SEA in kernel context:
   do_mem_abort
     -> arm64_notify_die
        -> die                    # kernel context, call die() directly;
           -> do_exit             # kernel process context, call do_exit(SIGSEGV);
              -> do_task_dead()   # call do_task_dead(), and hung up this core;

Actually, we should not call do_exit() for a kernel task, or before returning from
an external abort.

So I sent this patch intending to panic directly on receiving an SEA in kernel context.

crash log:

NOTICE:  [TotemRasIntCpuNodeEri]:[1879L]
NOTICE:  [RasEriInterrupt]:[173L]NodeTYP2Status = 0x0
[  387.740609] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
[  387.748837] {1}[Hardware Error]: event severity: recoverable
[  387.754470] {1}[Hardware Error]:  Error 0, type: recoverable
[  387.760103] {1}[Hardware Error]:   section_type: ARM processor error
[  387.766425] {1}[Hardware Error]:   MIDR: 0x00000000481fd010
[  387.771972] {1}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000081080000
[  387.780628] {1}[Hardware Error]:   error affinity level: 0
[  387.786088] {1}[Hardware Error]:   running state: 0x1
[  387.791115] {1}[Hardware Error]:   Power State Coordination Interface state: 0
[  387.798301] {1}[Hardware Error]:   Error info structure 0:
[  387.803761] {1}[Hardware Error]:   num errors: 1
[  387.808356] {1}[Hardware Error]:    error_type: 0, cache error
[  387.814160] {1}[Hardware Error]:    error_info: 0x0000000024400017
[  387.820311] {1}[Hardware Error]:     transaction type: Instruction
[  387.826461] {1}[Hardware Error]:     operation type: Generic error (type cannot be determined)
[  387.835031] {1}[Hardware Error]:     cache level: 1
[  387.839878] {1}[Hardware Error]:     the error has been corrected
[  387.845942] {1}[Hardware Error]:    physical fault address: 0x00000027caf50770
[  387.853162] Internal error: synchronous external abort: 96000610 [#1] PREEMPT SMP
[  387.860611] Modules linked in: l1l2_inject(O) vfio_iommu_type1 vfio_pci vfio_virqfd vfio ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi hns_roce_hw_v2 crct10dif_ce ses hns3 hclge hnae3 hisi_trng_v2 rng_core hisi_zip hisi_sec2 hisi_hpre hisi_qm uacce hisi_sas_v3_hw hisi_sas_main libsas scsi_transport_sas
[  387.888725] CPU: 0 PID: 940 Comm: kworker/0:3 Kdump: loaded Tainted: G           O      5.5.0-rc4-g5993cbe #1
[  387.898592] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B210.01 03/12/2020
[  387.907429] Workqueue: events error_inject [l1l2_inject]
[  387.914098] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[  387.918867] pc : lsu_inj_ue+0x58/0x70 [l1l2_inject]
[  387.925103] lr : error_inject+0x64/0xb0 [l1l2_inject]
[  387.930132] sp : ffff800020af3d90
[  387.933435] x29: ffff800020af3d90 x28: 0000000000000000
[  387.938723] x27: ffff0027dae6e838 x26: ffff80001178bcc8
[  387.945391] x25: 0000000000000000 x24: ffffa2c77162e090
[  387.950680] x23: 0000000000000000 x22: ffff2027d7c33d40
[  387.955968] x21: ffff2027d7c37a00 x20: ffff0027d679c000
[  387.962636] x19: ffffa2c77162e088 x18: 0000000020cbf59a
[  387.967924] x17: 000000000000000e x16: ffffa2c7b812bc98
[  387.974592] x15: 0000000000000001 x14: 0000000000000000
[  387.979880] x13: ffff2027cf75a780 x12: ffffa2c7b8299c18
[  387.985168] x11: 0000000000000000 x10: 00000000000009f0
[  387.991836] x9 : ffff800020af3d50 x8 : fefefefefefefeff
[  387.997124] x7 : 0000000000000000 x6 : 001d4ed88e000000
[  388.003792] x5 : 000073746e657665 x4 : 0000080110f81381
[  388.009080] x3 : 000000000000002f x2 : 807fffffffffffff
[  388.014369] x1 : ffffa2c77162c518 x0 : 0000000000000081
[  388.021037] Call trace:
[  388.023475]  lsu_inj_ue+0x58/0x70 [l1l2_inject]
[  388.029019]  error_inject+0x64/0xb0 [l1l2_inject]
[  388.033707]  process_one_work+0x158/0x4b8
[  388.037699]  worker_thread+0x50/0x498
[  388.041348]  kthread+0xfc/0x128
[  388.044480]  ret_from_fork+0x10/0x1c
[  388.048042] Code: b2790000 d519f780 f9800020 d5033f9f (58001001)
[  388.054109] ---[ end trace 39d51c21b0e42ba6 ]---

core 0 hung up here.


> 
> Thanks,
> Mark.
> 
>> +
>>  const char *esr_get_class_string(u32 esr);
>>  #endif /* __ASSEMBLY */
>>  
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index cf402be5c573..08f7f7688d5b 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -202,6 +202,8 @@ void die(const char *str, struct pt_regs *regs, int err)
>>  		panic("Fatal exception in interrupt");
>>  	if (panic_on_oops)
>>  		panic("Fatal exception");
>> +	if (esr_is_ext_abort(err))
>> +		panic("Synchronous external abort in kernel context");
>>  
>>  	raw_spin_unlock_irqrestore(&die_lock, flags);
>>  
>> -- 
>> 2.20.1
>>
> .
>
Xie XiuQi April 14, 2020, 12:39 p.m. UTC | #4
Hi James,

On 2020/4/14 20:16, James Morse wrote:
> Hi Xie,
> 
> On 14/04/2020 11:59, Mark Rutland wrote:
>> On Fri, Apr 10, 2020 at 09:52:45AM +0800, Xie XiuQi wrote:
>>> We should panic even if panic_on_oops is not set, when we can't recover
>>> from a synchronous external abort in kernel context.
> 
> Hmm, fault-from-kernel-context doesn't mean the fault affects the kernel. If the kernel is
> reading or writing from user-space memory for a syscall, it's the user-space memory that is
> affected. This thread can't make progress, so we kill it.
> If it's a kernel thread or we were in irq context, we panic().
> 
> I don't think you really want all faults that happen as a result of a kernel access to be
> fatal!

Yes, you're right. I just want to fix a hang seen during RAS error injection testing.

panic_on_oops is not set in the kernel image for testing. When receiving a sea in kernel
context, the PE trap in do_exit(), and can't return any more.

I analyzed the source code; the call trace might look like this:
   do_mem_abort
     -> arm64_notify_die
        -> die                    # kernel context, call die() directly;
           -> do_exit             # kernel process context, call do_exit(SIGSEGV);
              -> do_task_dead()   # call do_task_dead(), and hung up this core;

> 
> [...]
> 
>> What exactly are you trying to catch here? If you are seeing a problem
>> in practice, can you please share your log from a crash?
> 
> Yes please!
> 

crash log:

NOTICE:  [TotemRasIntCpuNodeEri]:[1879L]
NOTICE:  [RasEriInterrupt]:[173L]NodeTYP2Status = 0x0
[  387.740609] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
[  387.748837] {1}[Hardware Error]: event severity: recoverable
[  387.754470] {1}[Hardware Error]:  Error 0, type: recoverable
[  387.760103] {1}[Hardware Error]:   section_type: ARM processor error
[  387.766425] {1}[Hardware Error]:   MIDR: 0x00000000481fd010
[  387.771972] {1}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000081080000
[  387.780628] {1}[Hardware Error]:   error affinity level: 0
[  387.786088] {1}[Hardware Error]:   running state: 0x1
[  387.791115] {1}[Hardware Error]:   Power State Coordination Interface state: 0
[  387.798301] {1}[Hardware Error]:   Error info structure 0:
[  387.803761] {1}[Hardware Error]:   num errors: 1
[  387.808356] {1}[Hardware Error]:    error_type: 0, cache error
[  387.814160] {1}[Hardware Error]:    error_info: 0x0000000024400017
[  387.820311] {1}[Hardware Error]:     transaction type: Instruction
[  387.826461] {1}[Hardware Error]:     operation type: Generic error (type cannot be determined)
[  387.835031] {1}[Hardware Error]:     cache level: 1
[  387.839878] {1}[Hardware Error]:     the error has been corrected
[  387.845942] {1}[Hardware Error]:    physical fault address: 0x00000027caf50770
[  387.853162] Internal error: synchronous external abort: 96000610 [#1] PREEMPT SMP
[  387.860611] Modules linked in: l1l2_inject(O) vfio_iommu_type1 vfio_pci vfio_virqfd vfio ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi hns_roce_hw_v2 crct10dif_ce ses hns3 hclge hnae3 hisi_trng_v2 rng_core hisi_zip hisi_sec2 hisi_hpre hisi_qm uacce hisi_sas_v3_hw hisi_sas_main libsas scsi_transport_sas
[  387.888725] CPU: 0 PID: 940 Comm: kworker/0:3 Kdump: loaded Tainted: G           O      5.5.0-rc4-g5993cbe #1
[  387.898592] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B210.01 03/12/2020
[  387.907429] Workqueue: events error_inject [l1l2_inject]
[  387.914098] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[  387.918867] pc : lsu_inj_ue+0x58/0x70 [l1l2_inject]
[  387.925103] lr : error_inject+0x64/0xb0 [l1l2_inject]
[  387.930132] sp : ffff800020af3d90
[  387.933435] x29: ffff800020af3d90 x28: 0000000000000000
[  387.938723] x27: ffff0027dae6e838 x26: ffff80001178bcc8
[  387.945391] x25: 0000000000000000 x24: ffffa2c77162e090
[  387.950680] x23: 0000000000000000 x22: ffff2027d7c33d40
[  387.955968] x21: ffff2027d7c37a00 x20: ffff0027d679c000
[  387.962636] x19: ffffa2c77162e088 x18: 0000000020cbf59a
[  387.967924] x17: 000000000000000e x16: ffffa2c7b812bc98
[  387.974592] x15: 0000000000000001 x14: 0000000000000000
[  387.979880] x13: ffff2027cf75a780 x12: ffffa2c7b8299c18
[  387.985168] x11: 0000000000000000 x10: 00000000000009f0
[  387.991836] x9 : ffff800020af3d50 x8 : fefefefefefefeff
[  387.997124] x7 : 0000000000000000 x6 : 001d4ed88e000000
[  388.003792] x5 : 000073746e657665 x4 : 0000080110f81381
[  388.009080] x3 : 000000000000002f x2 : 807fffffffffffff
[  388.014369] x1 : ffffa2c77162c518 x0 : 0000000000000081
[  388.021037] Call trace:
[  388.023475]  lsu_inj_ue+0x58/0x70 [l1l2_inject]
[  388.029019]  error_inject+0x64/0xb0 [l1l2_inject]
[  388.033707]  process_one_work+0x158/0x4b8
[  388.037699]  worker_thread+0x50/0x498
[  388.041348]  kthread+0xfc/0x128
[  388.044480]  ret_from_fork+0x10/0x1c
[  388.048042] Code: b2790000 d519f780 f9800020 d5033f9f (58001001)
[  388.054109] ---[ end trace 39d51c21b0e42ba6 ]---

core 0 hung up here.

> 
> I suspect you want to make memory_failure() smarter about faults that affect the kernel
> text or data. If so, please do it in memory_failure() where it benefits all architectures,
> and all methods of reporting errors.
> (we may need a 'synchronous' hint to memory_failure(), it expects everything to be
> asynchronous).
> 
> If its not memory, we should extend the RAS handling to know what this error is, and that
> it is fatal. (e.g. PE state is infected)
> 
> 
> Thanks,
> 
> James
> .
>
James Morse April 14, 2020, 2:53 p.m. UTC | #5
Hi Xie,

On 14/04/2020 13:39, Xie XiuQi wrote:
> On 2020/4/14 20:16, James Morse wrote:
>> On 14/04/2020 11:59, Mark Rutland wrote:
>>> On Fri, Apr 10, 2020 at 09:52:45AM +0800, Xie XiuQi wrote:
>>>> We should panic even if panic_on_oops is not set, when we can't recover
>>>> from a synchronous external abort in kernel context.
>>
>> Hmm, fault-from-kernel-context doesn't mean the fault affects the kernel. If the kernel is
>> reading or writing from user-space memory for a syscall, it's the user-space memory that is
>> affected. This thread can't make progress, so we kill it.
>> If it's a kernel thread or we were in irq context, we panic().
>>
>> I don't think you really want all faults that happen as a result of a kernel access to be
>> fatal!

> Yes, you're right. I just want to fix a hang seen during RAS error injection testing.
> 
> panic_on_oops is not set in the kernel image for testing. When receiving a sea in kernel
> context, the PE trap in do_exit(), and can't return any more.

trap? gets trapped, (or gets stuck, to prevent confusion with the architecture's use of the
word 'trap'!)


> I analyzed the source code; the call trace might look like this:
>    do_mem_abort
>      -> arm64_notify_die
>         -> die                    # kernel context, call die() directly;
>            -> do_exit             # kernel process context, call do_exit(SIGSEGV);
>               -> do_task_dead()   # call do_task_dead(), and hung up this core;

Thanks for the trace. This describes a corrected error in your I-cache that occurred
while the kernel was executing a kernel thread. These shouldn't be fatal, because it was
corrected ... but the kernel doesn't know that because it doesn't know how to parse those
records.

There are two things wrong here:
1. it locks up while trying to kill the thread.
2. it tried to kill the thread in the first place!

For 1, does your l1l2_inject module take any spinlocks or tinker with the pre-empt counter?

I suspect this is some rarely-tested path in do_task_dead() that sleeps, but can't from
your l1l2_inject module because the pre-empt counter has been raised.
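
(For reference, do_task_dead() in kernel/exit.c is tiny; roughly this, as of
v5.5:)

void __noreturn do_task_dead(void)
{
	/* Causes final put_task_struct in finish_task_switch(): */
	set_special_state(TASK_DEAD);

	/* Tell freezer to ignore us: */
	current->flags |= PF_NOFREEZE;

	__schedule(false);
	BUG();

	/* Avoid "noreturn function does return" - but don't continue if BUG() is a NOP: */
	for (;;)
		cpu_relax();
}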

CONFIG_DEBUG_ATOMIC_SLEEP may point at the function to blame.

It may be accessing some thread data that kthreads don't have, taking a second exception
and blocking on the die_lock. LOCKDEP should catch this one.

We should fix this one first.



2 is a bit more complicated. Today, this is fatal because the arch code assumes this was
probably a memory error, and if it returns to user-space it can't avoid getting stuck in a
loop until the scheduled memory_failure() work runs. Instead it unconditionally signals
the process.

[0] fixes this up for memory errors. But in this case it will assume all the work has been
done by APEI, (or will be before we get back to user-space). APEI ignored the processor
error you fed it, because it doesn't know what they are, they are just printed out.

This is fine for corrected errors, but we are reliant on your firmware describing
uncorrected errors with a 'fatal' severity... which might be too heavy a hammer. (Ideally
that would mean 'uncontained', and the kernel should handle, or detect it can't, any other
error...)

We can discuss the right thing to do here when support for parsing these 'arm processor
errors' is posted.
(If you think I need to do something different in [0] because of this, please shout!)


> [  387.740609] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
> [  387.748837] {1}[Hardware Error]: event severity: recoverable
> [  387.754470] {1}[Hardware Error]:  Error 0, type: recoverable

> [  387.760103] {1}[Hardware Error]:   section_type: ARM processor error

et voila! Case 2. Linux doesn't handle these 'ARM processor error' things, because it
doesn't know what they are. It just prints them out.
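
(That dispatch is in ghes_do_proc() in drivers/acpi/apei/ghes.c; a trimmed
excerpt, roughly as of v5.5, with the PCIe and non-standard branches elided.
Memory sections get real handling, 'ARM processor error' sections are only
logged:)

	apei_estatus_for_each_section(estatus, gdata) {
		guid_t *sec_type = (guid_t *)gdata->section_type;

		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
			/* may queue memory_failure() work for the page */
			ghes_handle_memory_failure(gdata, sev);
		} else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
			struct cper_sec_proc_arm *err =
				acpi_hest_get_payload(gdata);

			/* printed/traced, no further action */
			log_arm_hw_error(err);
		}
		/* ... */
	}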


> [  387.766425] {1}[Hardware Error]:   MIDR: 0x00000000481fd010
> [  387.771972] {1}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000081080000
> [  387.780628] {1}[Hardware Error]:   error affinity level: 0
> [  387.786088] {1}[Hardware Error]:   running state: 0x1
> [  387.791115] {1}[Hardware Error]:   Power State Coordination Interface state: 0
> [  387.798301] {1}[Hardware Error]:   Error info structure 0:
> [  387.803761] {1}[Hardware Error]:   num errors: 1
> [  387.808356] {1}[Hardware Error]:    error_type: 0, cache error
> [  387.814160] {1}[Hardware Error]:    error_info: 0x0000000024400017
> [  387.820311] {1}[Hardware Error]:     transaction type: Instruction
> [  387.826461] {1}[Hardware Error]:     operation type: Generic error (type cannot be determined)
> [  387.835031] {1}[Hardware Error]:     cache level: 1

> [  387.839878] {1}[Hardware Error]:     the error has been corrected

As this is corrected, I think the bug is a deadlock somewhere in do_task_dead().


> [  387.845942] {1}[Hardware Error]:    physical fault address: 0x00000027caf50770

(and your firmware gives you the physical address, excellent, the kernel can do something
with this!)


> [  388.021037] Call trace:
> [  388.023475]  lsu_inj_ue+0x58/0x70 [l1l2_inject]
> [  388.029019]  error_inject+0x64/0xb0 [l1l2_inject]
> [  388.033707]  process_one_work+0x158/0x4b8
> [  388.037699]  worker_thread+0x50/0x498
> [  388.041348]  kthread+0xfc/0x128
> [  388.044480]  ret_from_fork+0x10/0x1c
> [  388.048042] Code: b2790000 d519f780 f9800020 d5033f9f (58001001)
> [  388.054109] ---[ end trace 39d51c21b0e42ba6 ]---
> 
> core 0 hung up here.

DEBUG_ATOMIC_SLEEP and maybe LOCKDEP should help you pin down where the kernel is getting
stuck. It looks like a bug in the core code.


Thanks,

James

[0] https://lore.kernel.org/linux-acpi/20200228174817.74278-4-james.morse@arm.com/
Xie XiuQi April 30, 2020, 9:44 a.m. UTC | #6
Hi James,

Sorry for reply late.

On 2020/4/14 22:53, James Morse wrote:
> Hi Xie,
> 
> On 14/04/2020 13:39, Xie XiuQi wrote:
>> On 2020/4/14 20:16, James Morse wrote:
>>> On 14/04/2020 11:59, Mark Rutland wrote:
>>>> On Fri, Apr 10, 2020 at 09:52:45AM +0800, Xie XiuQi wrote:
>>>>> We should panic even if panic_on_oops is not set, when we can't recover
>>>>> from a synchronous external abort in kernel context.
>>>
>>> Hmm, fault-from-kernel-context doesn't mean the fault affects the kernel. If the kernel is
>>> reading or writing from user-space memory for a syscall, it's the user-space memory that is
>>> affected. This thread can't make progress, so we kill it.
>>> If it's a kernel thread or we were in irq context, we panic().
>>>
>>> I don't think you really want all faults that happen as a result of a kernel access to be
>>> fatal!
> 
>> Yes, you're right. I just want to fix a hang seen during RAS error injection testing.
>>
>> panic_on_oops is not set in the kernel image for testing. When receiving a sea in kernel
>> context, the PE trap in do_exit(), and can't return any more.
> 
> trap? gets trapped, (or gets stuck, to prevent confusion with the architectures use of the
> word 'trap'!)
> 
> 
>> I analyzed the source code; the call trace might look like this:
>>    do_mem_abort
>>      -> arm64_notify_die
>>         -> die                    # kernel context, call die() directly;
>>            -> do_exit             # kernel process context, call do_exit(SIGSEGV);
>>               -> do_task_dead()   # call do_task_dead(), and hung up this core;
> 
> Thanks for the trace. This describes a corrected error in your I-cache, that occurred
> while the kernel was executing a kernel thread. These shouldn't be fatal, because it was
> corrected ... but the kernel doesn't know that because it doesn't know how to parse those
> records.
> 
> There are two things wrong here:
> 1. it locks up while trying to kill the thread.
> 2. it tried to kill the thread in the first place!
> 
> For 1, does your l1l2_inject module take any spinlocks or tinker with the pre-empt counter?
> 
> I suspect this is some rarely-tested path in do_task_dead() that sleeps, but can't from
> your l1l2_inject module because the pre-empt counter has been raised.
> 
> CONFIG_DEBUG_ATOMIC_SLEEP may point at the function to blame.
> 
> It may be accessing some thread data that kthreads don't have, taking a second exception
> and blocking on the die_lock. LOCKDEP should catch this one.
> 
> We should fix this one first.
> 

I analyzed the l1l2_inject module; there is a kworker thread used to inject errors.
do_sea() tries to kill the kworker thread, so the work(s) queued on this kworker
are blocked.

The panic_on_oops option is usually set in production environments, so if someone
unsets this option for testing, they do so at their own risk. Is that right?

> 
> 2 is a bit more complicated. Today, this is fatal because the arch code assumes this was
> probably a memory error, and if it returns to user-space it can't avoid getting stuck in a
> loop until the scheduled memory_failure() work runs. Instead it unconditionally signals
> the process.
> 
> [0] fixes this up for memory errors. But in this case it will assume all the work has been
> done by APEI, (or will be before we get back to user-space). APEI ignored the processor
> error you fed it, because it doesn't know what they are, they are just printed out.
> 
> This is fine for corrected errors, but we are reliant on your firmware describing
> uncorrected errors with a 'fatal' severity... which might be too heavy a hammer. (Ideally
> that would mean 'uncontained', and the kernel should handle, or detect it can't, any other
> error...)
> 
> We can discuss the right thing to do here when support for parsing these 'arm processor
> errors' is posted.
> (If you think I need to do something different in [0] because of this, please shout!)

For some errors which could be recoverable or corrected, do_sea() should not kill the
process or die() unconditionally. It would be better to detect this situation.
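
(To make that concrete, a rough sketch of where such a check could sit in
arch/arm64/mm/fault.c:do_sea(). The apei_claim_sea() call is already there in
mainline, but its return value is currently ignored; the early return below,
and treating a claimed/corrected record as fully handled, are the
hypothetical part:)

static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
{
	const struct fault_info *inf = esr_to_fault_info(esr);
	void __user *siaddr;

	/*
	 * Hypothetical: if APEI claimed this notification and the record was
	 * corrected (or the handling has been queued), there is no need to
	 * kill the thread or die() at all.
	 */
	if (apei_claim_sea(regs) == 0)
		return 0;

	if (esr & ESR_ELx_FnV)
		siaddr = NULL;
	else
		siaddr = (void __user *)untagged_addr(addr);
	arm64_notify_die(inf->name, regs, inf->sig, inf->code, siaddr, esr);

	return 0;
}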

> 
> 
>> [  387.740609] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 9
>> [  387.748837] {1}[Hardware Error]: event severity: recoverable
>> [  387.754470] {1}[Hardware Error]:  Error 0, type: recoverable
> 
>> [  387.760103] {1}[Hardware Error]:   section_type: ARM processor error
> 
> et voila! Case 2. Linux doesn't handle these 'ARM processor error' things, because it
> doesn't know what they are. It just prints them out.
> 
> 
>> [  387.766425] {1}[Hardware Error]:   MIDR: 0x00000000481fd010
>> [  387.771972] {1}[Hardware Error]:   Multiprocessor Affinity Register (MPIDR): 0x0000000081080000
>> [  387.780628] {1}[Hardware Error]:   error affinity level: 0
>> [  387.786088] {1}[Hardware Error]:   running state: 0x1
>> [  387.791115] {1}[Hardware Error]:   Power State Coordination Interface state: 0
>> [  387.798301] {1}[Hardware Error]:   Error info structure 0:
>> [  387.803761] {1}[Hardware Error]:   num errors: 1
>> [  387.808356] {1}[Hardware Error]:    error_type: 0, cache error
>> [  387.814160] {1}[Hardware Error]:    error_info: 0x0000000024400017
>> [  387.820311] {1}[Hardware Error]:     transaction type: Instruction
>> [  387.826461] {1}[Hardware Error]:     operation type: Generic error (type cannot be determined)
>> [  387.835031] {1}[Hardware Error]:     cache level: 1
> 
>> [  387.839878] {1}[Hardware Error]:     the error has been corrected
> 
> As this is corrected, I think the bug is a deadlock somewhere in do_task_dead().
> 
> 
>> [  387.845942] {1}[Hardware Error]:    physical fault address: 0x00000027caf50770
> 
> (and your firmware gives you the physical address, excellent, the kernel can do something
> with this!)
> 
> 
>> [  388.021037] Call trace:
>> [  388.023475]  lsu_inj_ue+0x58/0x70 [l1l2_inject]
>> [  388.029019]  error_inject+0x64/0xb0 [l1l2_inject]
>> [  388.033707]  process_one_work+0x158/0x4b8
>> [  388.037699]  worker_thread+0x50/0x498
>> [  388.041348]  kthread+0xfc/0x128
>> [  388.044480]  ret_from_fork+0x10/0x1c
>> [  388.048042] Code: b2790000 d519f780 f9800020 d5033f9f (58001001)
>> [  388.054109] ---[ end trace 39d51c21b0e42ba6 ]---
>>
>> core 0 hung up here.
> 
> DEBUG_ATOMIC_SLEEP and maybe LOCKDEP should help you pin down where the kernel is getting
> stuck. It looks like a bug in the core code.
> 
> 
> Thanks,
> 
> James
> 
> [0] https://lore.kernel.org/linux-acpi/20200228174817.74278-4-james.morse@arm.com/
> .
>

Patch

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index cb29253ae86b..acfc71c6d148 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -326,6 +326,18 @@  static inline bool esr_is_data_abort(u32 esr)
 	return ec == ESR_ELx_EC_DABT_LOW || ec == ESR_ELx_EC_DABT_CUR;
 }
 
+static inline bool esr_is_inst_abort(u32 esr)
+{
+	const u32 ec = ESR_ELx_EC(esr);
+
+	return ec == ESR_ELx_EC_IABT_LOW || ec == ESR_ELx_EC_IABT_CUR;
+}
+
+static inline bool esr_is_ext_abort(u32 esr)
+{
+	return esr_is_data_abort(esr) || esr_is_inst_abort(esr);
+}
+
 const char *esr_get_class_string(u32 esr);
 #endif /* __ASSEMBLY */
 
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index cf402be5c573..08f7f7688d5b 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -202,6 +202,8 @@  void die(const char *str, struct pt_regs *regs, int err)
 		panic("Fatal exception in interrupt");
 	if (panic_on_oops)
 		panic("Fatal exception");
+	if (esr_is_ext_abort(err))
+		panic("Synchronous external abort in kernel context");
 
 	raw_spin_unlock_irqrestore(&die_lock, flags);