Message ID | 20210607032329.28671-1-mark-pk.tsai@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: ftrace: don't dereference a probably invalid address | expand |
On Mon, 7 Jun 2021 11:23:30 +0800 Mark-PK Tsai <mark-pk.tsai@mediatek.com> wrote: > Address in __mcount_loc may be invalid if somthing goes wrong. > On our arm64 platform, the bug in recordmcount make kernel > crash in ftrace_init(). How did it crash? The link below doesn't show any crash. > > https://lore.kernel.org/lkml/20210607023839.26387-1-mark-pk.tsai@mediatek.com/ > > Return -EFAULT if we are dealing with out-of-range condition > to prevent dereference the invalid address in ftrace_bug(), > then the kernel can disable ftrace safely for problematic > __mcount_loc. !mod is not an out-of-range condition. It just happened that the other bug caused this strange side-effect. A !mod does not mean a fault happened. Just because it may have been caused by a fault in your use case does not mean that it's a fault in all use cases. That's like saying that your dog is a poodle, so all dogs are poodles. A return of -EINVAL should not cause a crash. If it does, then that needs to be fixed. -- Steve > > Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> > --- > arch/arm64/kernel/ftrace.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c > index b5d3ddaf69d9..98bec8445a58 100644 > --- a/arch/arm64/kernel/ftrace.c > +++ b/arch/arm64/kernel/ftrace.c > @@ -201,7 +201,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, > preempt_enable(); > > if (WARN_ON(!mod)) > - return -EINVAL; > + return -EFAULT; > } > > /*
> On Mon, 7 Jun 2021 11:23:30 +0800 > Mark-PK Tsai <mark-pk.tsai@mediatek.com> wrote: > > > Address in __mcount_loc may be invalid if somthing goes wrong. > > On our arm64 platform, the bug in recordmcount make kernel > > crash in ftrace_init(). > > How did it crash? The link below doesn't show any crash. Below is the crash log: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2008 ftrace_bug+0x9c/0x38c Modules linked in: CPU: 0 PID: 0 Comm: swapper Tainted: G W 5.4.61-350609-gf78fedda5a5e #1 Hardware name: MediaTek MT5896 (DT) pstate: 60400089 (nZCv daIf +PAN -UAO) pc : ftrace_bug+0x9c/0x38c lr : ftrace_process_locs+0x314/0x3b8 sp : ffffffc011743ef0 x29: ffffffc011743f00 x28: 0000000000000001 x27: ffffff818e401b80 x26: 0000000000000000 x25: ffffff818e480008 x24: ffffffc011749000 x23: 0000000000000008 x22: 0000000000000000 x21: ffffffc010084ac0 x20: 0000000000000024 x19: ffffff818e480000 x18: ffffffc011759c20 x17: ffffffc01133dcf8 x16: 0000000000000068 x15: ffffffc01133dcf8 x14: 0000000000000000 x13: 0000000000000000 x12: ffffffc010084ae4 x11: ffffffc011749000 x10: ffffffc011749000 x9 : 0000000000000001 x8 : ffffffc011749000 x7 : 0000000000000000 x6 : 000000000000003f x5 : 000000000008e93d x4 : 0000000000000000 x3 : 0000000000000101 x2 : ffffffc010084ac0 x1 : ffffff818e480000 x0 : ffffffc01127621c Call trace: ftrace_bug+0x9c/0x38c ftrace_process_locs+0x314/0x3b8 ftrace_init+0x8c/0xbc start_kernel+0x180/0x40c ---[ end trace 59db467eb74a6604 ]--- ftrace failed to modify [<0000000000000024>] 0x24 actual: "Unable to handle kernel read from unreadable memory at virtual address 0000000000000024 "Mem abort info: And the crash is becuase kernel trying to read *rec->ip in print_ip_ins() if ftrace_bug() get error code -EINVAL. > > > > > https://lore.kernel.org/lkml/20210607023839.26387-1-mark-pk.tsai@mediatek.com/ > > > > Return -EFAULT if we are dealing with out-of-range condition > > to prevent dereference the invalid address in ftrace_bug(), > > then the kernel can disable ftrace safely for problematic > > __mcount_loc. > > !mod is not an out-of-range condition. It just happened that the other > bug caused this strange side-effect. A !mod does not mean a fault > happened. Just because it may have been caused by a fault in your use > case does not mean that it's a fault in all use cases. That's like > saying that your dog is a poodle, so all dogs are poodles. > > A return of -EINVAL should not cause a crash. If it does, then that > needs to be fixed. I understand. Keep -EINVAL here make more sense. So maybe we should handle this case in ftrace_bug() by checking the rec->ip? > > -- Steve > > > > > > Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> > > --- > > arch/arm64/kernel/ftrace.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c > > index b5d3ddaf69d9..98bec8445a58 100644 > > --- a/arch/arm64/kernel/ftrace.c > > +++ b/arch/arm64/kernel/ftrace.c > > @@ -201,7 +201,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, > > preempt_enable(); > > > > if (WARN_ON(!mod)) > > - return -EINVAL; > > + return -EFAULT; > > } > > > > /*
> > On Mon, 7 Jun 2021 11:23:30 +0800 > > Mark-PK Tsai <mark-pk.tsai@mediatek.com> wrote: > > > > > Address in __mcount_loc may be invalid if somthing goes wrong. > > > On our arm64 platform, the bug in recordmcount make kernel > > > crash in ftrace_init(). > > > > How did it crash? The link below doesn't show any crash. > > Below is the crash log: > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:2008 ftrace_bug+0x9c/0x38c > Modules linked in: > CPU: 0 PID: 0 Comm: swapper Tainted: G W 5.4.61-350609-gf78fedda5a5e #1 > Hardware name: MediaTek MT5896 (DT) > pstate: 60400089 (nZCv daIf +PAN -UAO) > pc : ftrace_bug+0x9c/0x38c > lr : ftrace_process_locs+0x314/0x3b8 > sp : ffffffc011743ef0 > x29: ffffffc011743f00 x28: 0000000000000001 > x27: ffffff818e401b80 x26: 0000000000000000 > x25: ffffff818e480008 x24: ffffffc011749000 > x23: 0000000000000008 x22: 0000000000000000 > x21: ffffffc010084ac0 x20: 0000000000000024 > x19: ffffff818e480000 x18: ffffffc011759c20 > x17: ffffffc01133dcf8 x16: 0000000000000068 > x15: ffffffc01133dcf8 x14: 0000000000000000 > x13: 0000000000000000 x12: ffffffc010084ae4 > x11: ffffffc011749000 x10: ffffffc011749000 > x9 : 0000000000000001 x8 : ffffffc011749000 > x7 : 0000000000000000 x6 : 000000000000003f > x5 : 000000000008e93d x4 : 0000000000000000 > x3 : 0000000000000101 x2 : ffffffc010084ac0 > x1 : ffffff818e480000 x0 : ffffffc01127621c > Call trace: > ftrace_bug+0x9c/0x38c > ftrace_process_locs+0x314/0x3b8 > ftrace_init+0x8c/0xbc > start_kernel+0x180/0x40c > ---[ end trace 59db467eb74a6604 ]--- > ftrace failed to modify > [<0000000000000024>] 0x24 > actual: > "Unable to handle kernel read from unreadable memory at virtual address 0000000000000024 > "Mem abort info: I'm sorry that the last reply I only post the warning log before crash. Below is the panic log right after this warning. "Unable to handle kernel read from unreadable memory at virtual address 0000000000000024 "Mem abort info: " ESR = 0x96000005 " EC = 0x25: DABT (current EL), IL = 32 bits " SET = 0, FnV = 0 " EA = 0, S1PTW = 0 "Data abort info: " ISV = 0, ISS = 0x00000005 " CM = 0, WnR = 0 "[0000000000000024] user address but active_mm is swapper Internal error: Oops: 96000005 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Tainted: G W 5.4.61-350609-gf78fedda5a5e #1 pstate: 60400089 (nZCv daIf +PAN -UAO) pc : ftrace_bug+0xd8/0x38c lr : ftrace_bug+0xd8/0x38c sp : ffffffc011743ef0 x29: ffffffc011743f00 x28: 0000000000000001 x27: ffffff818e401b80 x26: 0000000000000000 x25: ffffff818e480008 x24: ffffffc011749000 x23: 0000000000000008 x22: 0000000000000000 x21: ffffffc010084ac0 x20: 0000000000000024 x19: ffffff818e480000 x18: ffffffc011759c20 x17: 0000000000000031 x16: ffffffc010d40c38 x15: ffffffc0114ad9ef x14: 000000000000004e x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 00000000ffffffff x9 : 0000000000000000 x8 : 0000000000000000 x7 : 000000000000000b x6 : ffffffc01188720b x5 : 000000000000000b x4 : ffffffc0118848ce x3 : 0000000000000020 x2 : 000000000000000b x1 : ffffffc0118848d9 x0 : 000000000000000b Call trace: ftrace_bug+0xd8/0x38c ftrace_process_locs+0x314/0x3b8 ftrace_init+0x8c/0xbc start_kernel+0x180/0x40c Code: f00085a1 9120d000 913e4421 97fdef85 (39400282) ---[ end trace 59db467eb74a6605 ]--- Kernel panic - not syncing: Attempted to kill the idle task! > > And the crash is becuase kernel trying to read *rec->ip in print_ip_ins() if > ftrace_bug() get error code -EINVAL. > > > > > > > > > https://lore.kernel.org/lkml/20210607023839.26387-1-mark-pk.tsai@mediatek.com/ > > > > > > Return -EFAULT if we are dealing with out-of-range condition > > > to prevent dereference the invalid address in ftrace_bug(), > > > then the kernel can disable ftrace safely for problematic > > > __mcount_loc. > > > > !mod is not an out-of-range condition. It just happened that the other > > bug caused this strange side-effect. A !mod does not mean a fault > > happened. Just because it may have been caused by a fault in your use > > case does not mean that it's a fault in all use cases. That's like > > saying that your dog is a poodle, so all dogs are poodles. > > > > A return of -EINVAL should not cause a crash. If it does, then that > > needs to be fixed. > > I understand. > Keep -EINVAL here make more sense. > So maybe we should handle this case in ftrace_bug() by checking the rec->ip? > > > > > -- Steve > > > > > > > > > > Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> > > > --- > > > arch/arm64/kernel/ftrace.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c > > > index b5d3ddaf69d9..98bec8445a58 100644 > > > --- a/arch/arm64/kernel/ftrace.c > > > +++ b/arch/arm64/kernel/ftrace.c > > > @@ -201,7 +201,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, > > > preempt_enable(); > > > > > > if (WARN_ON(!mod)) > > > - return -EINVAL; > > > + return -EFAULT; > > > } > > > > > > /* >
On Mon, 7 Jun 2021 22:15:22 +0800 Mark-PK Tsai <mark-pk.tsai@mediatek.com> wrote: > And the crash is becuase kernel trying to read *rec->ip in print_ip_ins() if > ftrace_bug() get error code -EINVAL. Right, so the actual fix should be something like this: [ not tested, nor even compiled ] -- Steve diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 2e8a3fde7104..72ef4dccbcc4 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -1967,12 +1967,18 @@ static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops, static void print_ip_ins(const char *fmt, const unsigned char *p) { + char ins[MCOUNT_INSN_SIZE]; int i; + if (copy_from_kernel_nofault(ins, p, MCOUNT_INSN_SIZE)) { + printk(KERN_CONT "%s[FAULT](%px)", fmt, p); + return; + } + printk(KERN_CONT "%s", fmt); for (i = 0; i < MCOUNT_INSN_SIZE; i++) - printk(KERN_CONT "%s%02x", i ? ":" : "", p[i]); + printk(KERN_CONT "%s%02x", i ? ":" : "", ins[i]); } enum ftrace_bug_type ftrace_bug_type;
> On Mon, 7 Jun 2021 22:15:22 +0800 > Mark-PK Tsai <mark-pk.tsai@mediatek.com> wrote: > > > And the crash is becuase kernel trying to read *rec->ip in print_ip_ins() if > > ftrace_bug() get error code -EINVAL. > > Right, so the actual fix should be something like this: > > [ not tested, nor even compiled ] > > -- Steve > I just test this patch, and turns out it work well. All my modification is only rename copy_from_kernel_nofault into probe_kernel_read because the older kernel version I use. Then the kernel will show the error mesage instead of panic as following. [ 0.000000] ftrace failed to modify [ 0.000000] [<0000000000000020>] 0x20 [ 0.000000] actual: [FAULT](0000000000000020) Should I resend this patch as v2? Or you will upstream this fix? > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c > index 2e8a3fde7104..72ef4dccbcc4 100644 > --- a/kernel/trace/ftrace.c > +++ b/kernel/trace/ftrace.c > @@ -1967,12 +1967,18 @@ static int ftrace_hash_ipmodify_update(struct ftrace_ops *ops, > > static void print_ip_ins(const char *fmt, const unsigned char *p) > { > + char ins[MCOUNT_INSN_SIZE]; > int i; > > + if (copy_from_kernel_nofault(ins, p, MCOUNT_INSN_SIZE)) { > + printk(KERN_CONT "%s[FAULT](%px)", fmt, p); > + return; > + } > + > printk(KERN_CONT "%s", fmt); > > for (i = 0; i < MCOUNT_INSN_SIZE; i++) > - printk(KERN_CONT "%s%02x", i ? ":" : "", p[i]); > + printk(KERN_CONT "%s%02x", i ? ":" : "", ins[i]); > } > > enum ftrace_bug_type ftrace_bug_type;
On Tue, 8 Jun 2021 02:14:03 +0800 Mark-PK Tsai <mark-pk.tsai@mediatek.com> wrote: > Should I resend this patch as v2? > Or you will upstream this fix? I'll push it. I have some other fixes to add as well. Thanks! -- Steve
> On Tue, 8 Jun 2021 02:14:03 +0800 > Mark-PK Tsai <mark-pk.tsai@mediatek.com> wrote: > > > Should I resend this patch as v2? > > Or you will upstream this fix? > > I'll push it. I have some other fixes to add as well. > > Thanks! > > -- Steve Thanks! Here is my reported and tested by. Reported-and-tested-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c index b5d3ddaf69d9..98bec8445a58 100644 --- a/arch/arm64/kernel/ftrace.c +++ b/arch/arm64/kernel/ftrace.c @@ -201,7 +201,7 @@ int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec, preempt_enable(); if (WARN_ON(!mod)) - return -EINVAL; + return -EFAULT; } /*
Address in __mcount_loc may be invalid if somthing goes wrong. On our arm64 platform, the bug in recordmcount make kernel crash in ftrace_init(). https://lore.kernel.org/lkml/20210607023839.26387-1-mark-pk.tsai@mediatek.com/ Return -EFAULT if we are dealing with out-of-range condition to prevent dereference the invalid address in ftrace_bug(), then the kernel can disable ftrace safely for problematic __mcount_loc. Signed-off-by: Mark-PK Tsai <mark-pk.tsai@mediatek.com> --- arch/arm64/kernel/ftrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)