Message ID | fadc5f2a295d6cb9f590bbbdd71fc2f78bf3a085.1644216043.git.naveen.n.rao@linux.vnet.ibm.com (mailing list archive)
---|---
State | RFC
Series | powerpc64/bpf: Add support for BPF Trampolines
Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch, async
On Mon, 7 Feb 2022 12:37:21 +0530
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -1137,3 +1137,14 @@ char *arch_ftrace_match_adjust(char *str, const char *search)
>  	return str;
>  }
>  #endif /* PPC64_ELF_ABI_v1 */
> +
> +#ifdef CONFIG_MPROFILE_KERNEL
> +unsigned long ftrace_location_lookup(unsigned long ip)
> +{
> +	/*
> +	 * Per livepatch.h, ftrace location is always within the first
> +	 * 16 bytes of a function on powerpc with -mprofile-kernel.
> +	 */
> +	return ftrace_location_range(ip, ip + 16);

I think this is the wrong approach for the implementation and error-prone.

> +}
> +#endif
> --

What I believe is a safer approach is to use the record address and add the
range to it.

That is, you know that the ftrace modification site is a range (multiple
instructions), where in the ftrace infrastructure, only one ip represents
that range. What you want is: if you pass in an ip that is within that
range, you get back the ip that represents that range to ftrace.

It looks like we need to change the compare function in the bsearch.

Perhaps add a new macro to define the size of the range to be searched,
instead of just using MCOUNT_INSN_SIZE? Then we may not even need this new
lookup function?

static int ftrace_cmp_recs(const void *a, const void *b)
{
	const struct dyn_ftrace *key = a;
	const struct dyn_ftrace *rec = b;

	if (key->flags < rec->ip)
		return -1;
	if (key->ip >= rec->ip + ARCH_IP_SIZE)
		return 1;
	return 0;
}

Where ARCH_IP_SIZE is defined to MCOUNT_INSN_SIZE by default, but an arch
could define it to something else, like 16.

Would that work for you, or am I missing something?

-- Steve
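For context: ARCH_IP_SIZE is a name Steve is proposing in this mail, not an
existing kernel macro, and the key->ip/key->flags pair above is how
lookup_rec() encodes the start and end of the range being searched. A
minimal sketch of the default-plus-override pattern he describes:

	/* kernel/trace/ftrace.c: generic fallback (hypothetical) */
	#ifndef ARCH_IP_SIZE
	#define ARCH_IP_SIZE	MCOUNT_INSN_SIZE
	#endif

	/* arch/powerpc/include/asm/ftrace.h: arch override (hypothetical) */
	#define ARCH_IP_SIZE	16

With that in place, ftrace_cmp_recs() would treat each record as owning
[rec->ip, rec->ip + ARCH_IP_SIZE) instead of a single MCOUNT_INSN_SIZE slot.
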
Steven Rostedt wrote:
> On Mon, 7 Feb 2022 12:37:21 +0530
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
>
>> --- a/arch/powerpc/kernel/trace/ftrace.c
>> +++ b/arch/powerpc/kernel/trace/ftrace.c
>> @@ -1137,3 +1137,14 @@ char *arch_ftrace_match_adjust(char *str, const char *search)
>>  	return str;
>>  }
>>  #endif /* PPC64_ELF_ABI_v1 */
>> +
>> +#ifdef CONFIG_MPROFILE_KERNEL
>> +unsigned long ftrace_location_lookup(unsigned long ip)
>> +{
>> +	/*
>> +	 * Per livepatch.h, ftrace location is always within the first
>> +	 * 16 bytes of a function on powerpc with -mprofile-kernel.
>> +	 */
>> +	return ftrace_location_range(ip, ip + 16);
>
> I think this is the wrong approach for the implementation and error-prone.
>
>> +}
>> +#endif
>> --
>
> What I believe is a safer approach is to use the record address and add the
> range to it.
>
> That is, you know that the ftrace modification site is a range (multiple
> instructions), where in the ftrace infrastructure, only one ip represents
> that range. What you want is: if you pass in an ip that is within that
> range, you get back the ip that represents that range to ftrace.
>
> It looks like we need to change the compare function in the bsearch.
>
> Perhaps add a new macro to define the size of the range to be searched,
> instead of just using MCOUNT_INSN_SIZE? Then we may not even need this new
> lookup function?
>
> static int ftrace_cmp_recs(const void *a, const void *b)
> {
> 	const struct dyn_ftrace *key = a;
> 	const struct dyn_ftrace *rec = b;
>
> 	if (key->flags < rec->ip)
> 		return -1;
> 	if (key->ip >= rec->ip + ARCH_IP_SIZE)
> 		return 1;
> 	return 0;
> }
>
> Where ARCH_IP_SIZE is defined to MCOUNT_INSN_SIZE by default, but an arch
> could define it to something else, like 16.
>
> Would that work for you, or am I missing something?

Yes, I hadn't realized that [un]register_ftrace_direct() and
modify_ftrace_direct() internally look up the correct ftrace location, and
act on that. So, changing ftrace_cmp_recs() does look like it will work
well for powerpc. Thanks for this suggestion.

However, I think we will not be able to use a fixed range. I would like to
reserve instructions from function entry till the branch to _mcount(), and
it can be two or four instructions depending on whether a function has a
global entry point. For this, I am considering adding a field in
'struct dyn_arch_ftrace', and a hook in ftrace_process_locs() to
initialize the same. I may need to override ftrace_cmp_recs().

Thanks,
- Naveen
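To illustrate the two-vs-four instruction point, here is a sketch of the
typical ppc64le ELFv2 function prologue with -mprofile-kernel (exact code
generation varies by compiler):

	func:					# global entry point (GEP)
		addis	r2,r12,(.TOC.-func)@ha	# TOC setup; only emitted for
		addi	r2,r2,(.TOC.-func)@l	#   functions that need a GEP
		.localentry	func,.-func	# local entry point (LEP)
		mflr	r0
		bl	_mcount			# ftrace location: entry+12 with
						#   a GEP, entry+4 without

So the range to reserve runs from the function entry to the bl, and its
length depends on whether the two TOC-setup instructions are present.
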
On Wed, 09 Feb 2022 17:50:09 +0000
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> However, I think we will not be able to use a fixed range. I would like to
> reserve instructions from function entry till the branch to _mcount(), and
> it can be two or four instructions depending on whether a function has a
> global entry point. For this, I am considering adding a field in
> 'struct dyn_arch_ftrace', and a hook in ftrace_process_locs() to
> initialize the same. I may need to override ftrace_cmp_recs().

Be careful about adding anything to dyn_arch_ftrace. powerpc already adds
the pointer to the module. Anything you add to that gets multiplied
thousands of times (which takes up memory).

At boot up you may see something like:

 ftrace: allocating 45363 entries in 178 pages

That's 45,363 dyn_arch_ftrace structures. And each module loads its own as
well. To see how many total you have after boot up:

 # cat /sys/kernel/tracing/dyn_ftrace_total_info
 55974 pages:295 groups: 89

That's from the same kernel. Another 10,000 entries were created by
modules. (This was for x86_64)

What you may be able to do is add a way to look at the already saved
kallsyms, which keeps track of the function entry and exit to know how to
map an address back to the function.

 kallsyms_lookup(addr, NULL, &offset, NULL, NULL);

That should give you the offset of addr from the start of the function.

-- Steve
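A rough sketch of how that lookup could be used on the ftrace side. Note
that the patch below ends up calling kallsyms_lookup_size_offset(), the
variant that needs no name buffer and tolerates NULL for the size; this
assumes kallsyms is usable by the time the records are initialized:

	unsigned long offset;

	/* on success, offset becomes "ip - function entry" */
	if (!kallsyms_lookup_size_offset(ip, NULL, &offset))
		return 0;	/* no symbol covers this ip */

	/* e.g. offset == 12 would imply a global entry point on ppc64le */
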
Steven Rostedt wrote:
> On Wed, 09 Feb 2022 17:50:09 +0000
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
>
>> However, I think we will not be able to use a fixed range. I would like to
>> reserve instructions from function entry till the branch to _mcount(), and
>> it can be two or four instructions depending on whether a function has a
>> global entry point. For this, I am considering adding a field in
>> 'struct dyn_arch_ftrace', and a hook in ftrace_process_locs() to
>> initialize the same. I may need to override ftrace_cmp_recs().
>
> Be careful about adding anything to dyn_arch_ftrace. powerpc already adds
> the pointer to the module. Anything you add to that gets multiplied
> thousands of times (which takes up memory).
>
> At boot up you may see something like:
>
>  ftrace: allocating 45363 entries in 178 pages
>
> That's 45,363 dyn_arch_ftrace structures. And each module loads its own as
> well. To see how many total you have after boot up:
>
>  # cat /sys/kernel/tracing/dyn_ftrace_total_info
>  55974 pages:295 groups: 89
>
> That's from the same kernel. Another 10,000 entries were created by
> modules. (This was for x86_64)
>
> What you may be able to do is add a way to look at the already saved
> kallsyms, which keeps track of the function entry and exit to know how to
> map an address back to the function.
>
>  kallsyms_lookup(addr, NULL, &offset, NULL, NULL);
>
> That should give you the offset of addr from the start of the function.

Good point. I should be able to overload the existing field for this
purpose. Is something like the below ok?

---
 arch/powerpc/include/asm/ftrace.h  | 13 ++++++
 arch/powerpc/kernel/trace/ftrace.c | 73 ++++++++++++++++++++++++++----
 kernel/trace/ftrace.c              |  2 +
 3 files changed, 78 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index debe8c4f706260..96d6e26cee86af 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -59,6 +59,19 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)
 struct dyn_arch_ftrace {
 	struct module *mod;
 };
+
+struct dyn_ftrace;
+struct module *ftrace_mod_addr_get(struct dyn_ftrace *rec);
+void ftrace_mod_addr_set(struct dyn_ftrace *rec, struct module *mod);
+
+#ifdef CONFIG_MPROFILE_KERNEL
+int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
+#define ftrace_init_nop ftrace_init_nop
+
+int ftrace_cmp_recs(const void *a, const void *b);
+#define ftrace_cmp_recs ftrace_cmp_recs
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 80b6285769f27c..d9b6faa4c98a8c 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -428,21 +428,21 @@ int ftrace_make_nop(struct module *mod,
 	 * We should either already have a pointer to the module
 	 * or it has been passed in.
 	 */
-	if (!rec->arch.mod) {
+	if (!ftrace_mod_addr_get(rec)) {
 		if (!mod) {
 			pr_err("No module loaded addr=%lx\n", addr);
 			return -EFAULT;
 		}
-		rec->arch.mod = mod;
+		ftrace_mod_addr_set(rec, mod);
 	} else if (mod) {
-		if (mod != rec->arch.mod) {
+		if (mod != ftrace_mod_addr_get(rec)) {
 			pr_err("Record mod %p not equal to passed in mod %p\n",
-			       rec->arch.mod, mod);
+			       ftrace_mod_addr_get(rec), mod);
 			return -EINVAL;
 		}
 		/* nothing to do if mod == rec->arch.mod */
 	} else
-		mod = rec->arch.mod;
+		mod = ftrace_mod_addr_get(rec);
 
 	return __ftrace_make_nop(mod, rec, addr);
 #else
@@ -451,6 +451,59 @@ int ftrace_make_nop(struct module *mod,
 #endif /* CONFIG_MODULES */
 }
 
+#ifdef CONFIG_MPROFILE_KERNEL
+struct module *ftrace_mod_addr_get(struct dyn_ftrace *rec)
+{
+	return (struct module *)((unsigned long)rec->arch.mod & ~0x1);
+}
+
+void ftrace_mod_addr_set(struct dyn_ftrace *rec, struct module *mod)
+{
+	rec->arch.mod = (struct module *)(((unsigned long)rec->arch.mod & 0x1) | (unsigned long)mod);
+}
+
+bool ftrace_location_has_gep(const struct dyn_ftrace *rec)
+{
+	return !!((unsigned long)rec->arch.mod & 0x1);
+}
+
+int ftrace_cmp_recs(const void *a, const void *b)
+{
+	const struct dyn_ftrace *key = a;
+	const struct dyn_ftrace *rec = b;
+	int offset = ftrace_location_has_gep(rec) ? 12 : 4;
+
+	if (key->flags < rec->ip - offset)
+		return -1;
+	if (key->ip >= rec->ip + MCOUNT_INSN_SIZE)
+		return 1;
+	return 0;
+}
+
+int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
+{
+	unsigned long offset;
+
+	if (!kallsyms_lookup_size_offset(rec->ip, NULL, &offset) || (offset != 12 && offset != 4)) {
+		/* TODO: implement logic to deduce lep/gep from code */
+	} else if (offset == 12) {
+		ftrace_mod_addr_set(rec, (struct module *)1);
+	}
+
+	return ftrace_make_nop(mod, rec, MCOUNT_ADDR);
+}
+#else
+struct module *ftrace_mod_addr_get(struct dyn_ftrace *rec)
+{
+	return rec->arch.mod;
+}
+
+void ftrace_mod_addr_set(struct dyn_ftrace *rec, struct module *mod)
+{
+	rec->arch.mod = mod;
+}
+#endif /* CONFIG_MPROFILE_KERNEL */
+
 #ifdef CONFIG_MODULES
 #ifdef CONFIG_PPC64
 /*
@@ -494,7 +547,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	ppc_inst_t instr;
 	void *ip = (void *)rec->ip;
 	unsigned long entry, ptr, tramp;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_mod_addr_get(rec);
 
 	/* read where this goes */
 	if (copy_inst_from_kernel_nofault(op, ip))
@@ -561,7 +614,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	int err;
 	ppc_inst_t op;
 	u32 *ip = (u32 *)rec->ip;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_mod_addr_get(rec);
 	unsigned long tramp;
 
 	/* read where this goes */
@@ -678,7 +731,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 	 * Being that we are converting from nop, it had better
 	 * already have a module defined.
 	 */
-	if (!rec->arch.mod) {
+	if (!ftrace_mod_addr_get(rec)) {
 		pr_err("No module loaded\n");
 		return -EINVAL;
 	}
@@ -699,7 +752,7 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 	ppc_inst_t op;
 	unsigned long ip = rec->ip;
 	unsigned long entry, ptr, tramp;
-	struct module *mod = rec->arch.mod;
+	struct module *mod = ftrace_mod_addr_get(rec);
 
 	/* If we never set up ftrace trampolines, then bail */
 	if (!mod->arch.tramp || !mod->arch.tramp_regs) {
@@ -814,7 +867,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
 	/*
 	 * Out of range jumps are called from modules.
 	 */
-	if (!rec->arch.mod) {
+	if (!ftrace_mod_addr_get(rec)) {
 		pr_err("No module loaded\n");
 		return -EINVAL;
 	}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index f9feb197b2daaf..68f20cf34b0c47 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1510,6 +1510,7 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
 }
 
 
+#ifndef ftrace_cmp_recs
 static int ftrace_cmp_recs(const void *a, const void *b)
 {
 	const struct dyn_ftrace *key = a;
@@ -1521,6 +1522,7 @@ static int ftrace_cmp_recs(const void *a, const void *b)
 		return 1;
 	return 0;
 }
+#endif
 
 static struct dyn_ftrace *lookup_rec(unsigned long start, unsigned long end)
 {

Thanks,
Naveen
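A note on the tagging trick above (my reading of the patch, with a made-up
address): struct module pointers are at least word-aligned, so bit 0 of
rec->arch.mod is free to carry the "function has a global entry point"
flag. For a GEP function whose entry is at 0xc000000000abc000, rec->ip (the
bl _mcount) is 0xc000000000abc00c, and the overridden ftrace_cmp_recs()
then maps any ip in [0xc000000000abc000, 0xc000000000abc010) back to that
one record. A cheap compile-time guard for the alignment assumption could
look like:

	/* hypothetical check: bit 0 of a module pointer must be free */
	BUILD_BUG_ON(__alignof__(struct module) < 2);
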
On Thu, 10 Feb 2022 13:58:29 +0000
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index f9feb197b2daaf..68f20cf34b0c47 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -1510,6 +1510,7 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
>  }
>  
>  
> +#ifndef ftrace_cmp_recs
>  static int ftrace_cmp_recs(const void *a, const void *b)
>  {
>  	const struct dyn_ftrace *key = a;
> @@ -1521,6 +1522,7 @@ static int ftrace_cmp_recs(const void *a, const void *b)
>  		return 1;
>  	return 0;
>  }
> +#endif

I don't really care for this part, as it seems somewhat ugly. But this
patch does appear to solve your issue, and I can't think of a prettier way
to do this. So, I will reluctantly ack it.

If anything, please add a comment above saying that architectures may need
to override this function, and when doing so, they will define their own
ftrace_cmp_recs.

-- Steve
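One possible wording for that comment (my phrasing, not text from the
thread):

	/*
	 * An architecture may map a range of addresses to a single
	 * ftrace record. Such architectures can override this function
	 * by defining their own ftrace_cmp_recs, along with a matching
	 * #define in <asm/ftrace.h>, so that a search for any ip in the
	 * range finds the record.
	 */
	#ifndef ftrace_cmp_recs
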
Steven Rostedt wrote:
> On Thu, 10 Feb 2022 13:58:29 +0000
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
>
>> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
>> index f9feb197b2daaf..68f20cf34b0c47 100644
>> --- a/kernel/trace/ftrace.c
>> +++ b/kernel/trace/ftrace.c
>> @@ -1510,6 +1510,7 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
>>  }
>>  
>>  
>> +#ifndef ftrace_cmp_recs
>>  static int ftrace_cmp_recs(const void *a, const void *b)
>>  {
>>  	const struct dyn_ftrace *key = a;
>> @@ -1521,6 +1522,7 @@ static int ftrace_cmp_recs(const void *a, const void *b)
>>  		return 1;
>>  	return 0;
>>  }
>> +#endif
>
> I don't really care for this part, as it seems somewhat ugly. But this
> patch does appear to solve your issue, and I can't think of a prettier way
> to do this.

Yes, this approach looks like it will solve a few different issues for us.
We would also like to nop out the instruction before the branch to
_mcount(), so this helps ensure that kprobes won't overwrite that
instruction either.

> So, I will reluctantly ack it.

Since we don't want to change struct dyn_ftrace, we will need to contain
the changes within the architecture code, so I don't see a way to do this
in a generic manner. The other option is to mark ftrace_cmp_recs() as a
__weak function, but I have a vague recollection of you suggesting #ifdef
rather than a __weak function in the past. I might be mis-remembering, so
if you think making this a __weak function is better, I can do that.

> If anything, please add a comment above saying that architectures may need
> to override this function, and when doing so, they will define their own
> ftrace_cmp_recs.

Sure.

- Naveen
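For reference, the two instructions in question at the (local) entry point
with -mprofile-kernel, and what nopping both would look like (a sketch; the
mflr is compiler-generated and is not patched by ftrace today):

	before:	mflr	r0		after:	nop
		bl	_mcount			nop

Because the overridden ftrace_cmp_recs() reserves the whole range,
ftrace_location() reports the mflr address as ftrace-owned too, so a kprobe
there is handled via KPROBES_ON_FTRACE rather than planting a trap on an
instruction ftrace may later rewrite.
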
On Thu, 10 Feb 2022 16:40:28 +0000
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> The other option is to mark ftrace_cmp_recs() as a __weak function, but
> I have a vague recollection of you suggesting #ifdef rather than a
> __weak function in the past. I might be mis-remembering, so if you think
> making this a __weak function is better, I can do that.

No. If I wanted that, I would have suggested it. I think this is the
prettiest of the ugly solutions out there ;-)

As I said, I can't think of a better solution, and we can go with this
until something else comes along.

-- Steve
Steven Rostedt wrote:
> On Thu, 10 Feb 2022 16:40:28 +0000
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
>
>> The other option is to mark ftrace_cmp_recs() as a __weak function, but
>> I have a vague recollection of you suggesting #ifdef rather than a
>> __weak function in the past. I might be mis-remembering, so if you think
>> making this a __weak function is better, I can do that.
>
> No. If I wanted that, I would have suggested it. I think this is the
> prettiest of the ugly solutions out there ;-)

Understood :)

> As I said, I can't think of a better solution, and we can go with this
> until something else comes along.

Thanks,
- Naveen
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9a492fdec1dfbe..03cb50e4c8c3e8 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -50,13 +50,7 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 	addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
 	if (addr && !offset) {
 #ifdef CONFIG_KPROBES_ON_FTRACE
-		unsigned long faddr;
-		/*
-		 * Per livepatch.h, ftrace location is always within the first
-		 * 16 bytes of a function on powerpc with -mprofile-kernel.
-		 */
-		faddr = ftrace_location_range((unsigned long)addr,
-					      (unsigned long)addr + 16);
+		unsigned long faddr = ftrace_location_lookup((unsigned long)addr);
 		if (faddr)
 			addr = (kprobe_opcode_t *)faddr;
 		else
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index b65ca87a2eacb1..5127eb65c299af 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -1137,3 +1137,14 @@ char *arch_ftrace_match_adjust(char *str, const char *search)
 	return str;
 }
 #endif /* PPC64_ELF_ABI_v1 */
+
+#ifdef CONFIG_MPROFILE_KERNEL
+unsigned long ftrace_location_lookup(unsigned long ip)
+{
+	/*
+	 * Per livepatch.h, ftrace location is always within the first
+	 * 16 bytes of a function on powerpc with -mprofile-kernel.
+	 */
+	return ftrace_location_range(ip, ip + 16);
+}
+#endif
With CONFIG_MPROFILE_KERNEL, the ftrace location is within the first 5
instructions of a function. Override ftrace_location_lookup() to search
within this range for the ftrace location. Also convert
kprobe_lookup_name() to use this function.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/kprobes.c      |  8 +-------
 arch/powerpc/kernel/trace/ftrace.c | 11 +++++++++++
 2 files changed, 12 insertions(+), 7 deletions(-)
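For illustration, the behaviour the kprobes hunk relies on (a sketch with a
hypothetical caller; assumes a function with a global entry point):

	/* addr is the function entry returned by kallsyms_lookup_name() */
	faddr = ftrace_location_lookup((unsigned long)addr);
	/* faddr == addr + 12 here (the bl _mcount), or 0 if no ftrace
	 * record lies within [addr, addr + 16] */

kprobe_lookup_name() then probes faddr, so the probe lands on the
instruction ftrace owns and KPROBES_ON_FTRACE can handle it.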