Message ID | 20240607115211.734845-1-zhengyejian1@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Series | [RFC] ftrace: Skip __fentry__ location of overridden weak functions |
On Fri, Jun 07, 2024 at 07:52:11PM +0800, Zheng Yejian wrote:
> ftrace_location() was changed to not only return the __fentry__ location
> when called for the __fentry__ location, but also when called for the
> sym+0 location after commit aebfd12521d9 ("x86/ibt,ftrace: Search for
> __fentry__ location"). That is, if sym+0 location is not __fentry__,
> ftrace_location() would find one over the entire size of the sym.
>
> However, there is a case where more than one __fentry__ exists in the sym
> range (described below) and ftrace_location() would find the wrong
> __fentry__ location by binary searching, which would cause its users like
> livepatch/kprobe/bpf to not work properly on this sym!
>
> The case is that, based on current compiler behavior, suppose:
>  - function A is followed by weak function B1 in same binary file;
>  - weak function B1 is overridden by function B2;
> Then in the final binary file:
>  - symbol B1 will be removed from symbol table while its instructions are
>    not removed;
>  - __fentry__ of B1 will be still in __mcount_loc table;
>  - function size of A is computed by subtracting the symbol address of
>    A from its next symbol address (see kallsyms_lookup_size_offset()),
>    but because symbol info of B1 is removed, the next symbol of A is
>    originally the next symbol of B1. See following example, the function
>    size of A will be (symbol_address_C - symbol_address_A):
>
>      symbol_address_A
>      symbol_address_B1 (Not in symbol table)
>      symbol_address_C
>
> The weak function issue has been discovered in commit b39181f7c690
> ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
> but it didn't resolve the issue in ftrace_location().
>
> There may be the following resolutions:

Oh gawd, sodding weak functions again.

I would suggest changing scripts/kallsyms.c to emit readily identifiable
symbol names for all the weak junk, eg:

  __weak_junk_NNNNN

That instantly fixes the immediate problem and Steve's horrid hack can
go away.

Additionally, I would add a boot up pass that would INT3 fill all such
functions and remove/invalidate all
static_call/static_jump/fentry/alternative entry that is inside of them.
On Fri, 7 Jun 2024 17:02:28 +0200 Peter Zijlstra <peterz@infradead.org> wrote:

> > There may be the following resolutions:
>
> Oh gawd, sodding weak functions again.
>
> I would suggest changing scripts/kallsyms.c to emit readily identifiable
> symbol names for all the weak junk, eg:
>
>   __weak_junk_NNNNN
>
> That instantly fixes the immediate problem and Steve's horrid hack can
> go away.

Right. And when I wrote that hack, I specifically said this should be fixed
in kallsyms, and preferably at build time, as that's when the weak functions
should all be resolved.

-- Steve

>
> Additionally, I would add a boot up pass that would INT3 fill all such
> functions and remove/invalidate all
> static_call/static_jump/fentry/alternative entry that is inside of them.
On 2024/6/7 23:02, Peter Zijlstra wrote:
> On Fri, Jun 07, 2024 at 07:52:11PM +0800, Zheng Yejian wrote:
>> ftrace_location() was changed to not only return the __fentry__ location
>> when called for the __fentry__ location, but also when called for the
>> sym+0 location after commit aebfd12521d9 ("x86/ibt,ftrace: Search for
>> __fentry__ location"). That is, if sym+0 location is not __fentry__,
>> ftrace_location() would find one over the entire size of the sym.
>>
>> However, there is a case where more than one __fentry__ exists in the sym
>> range (described below) and ftrace_location() would find the wrong
>> __fentry__ location by binary searching, which would cause its users like
>> livepatch/kprobe/bpf to not work properly on this sym!
>>
>> The case is that, based on current compiler behavior, suppose:
>>  - function A is followed by weak function B1 in same binary file;
>>  - weak function B1 is overridden by function B2;
>> Then in the final binary file:
>>  - symbol B1 will be removed from symbol table while its instructions are
>>    not removed;
>>  - __fentry__ of B1 will be still in __mcount_loc table;
>>  - function size of A is computed by subtracting the symbol address of
>>    A from its next symbol address (see kallsyms_lookup_size_offset()),
>>    but because symbol info of B1 is removed, the next symbol of A is
>>    originally the next symbol of B1. See following example, the function
>>    size of A will be (symbol_address_C - symbol_address_A):
>>
>>      symbol_address_A
>>      symbol_address_B1 (Not in symbol table)
>>      symbol_address_C
>>
>> The weak function issue has been discovered in commit b39181f7c690
>> ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
>> but it didn't resolve the issue in ftrace_location().
>>
>> There may be the following resolutions:
>
> Oh gawd, sodding weak functions again.
>
> I would suggest changing scripts/kallsyms.c to emit readily identifiable
> symbol names for all the weak junk, eg:
>
>   __weak_junk_NNNNN
>

Sorry for the late reply, I just had a long noon holiday :>

scripts/kallsyms.c is compiled and used to handle symbols in vmlinux.o
or vmlinux.a, see kallsyms_step() in scripts/link-vmlinux.sh. Those
overridden weak symbols have been removed from the symbol table of
vmlinux.o or vmlinux.a. But we can find those symbols in the original
xx/xx.o file, for example, the weak free_initmem() in init/main.c is
overridden, its symbol is not in vmlinux but is still in init/main.o.

How about traversing all original xx/xx.o files and finding all weak
junk symbols?

> That instantly fixes the immediate problem and Steve's horrid hack can
> go away.
>

Yes, this can be done in the same patch series.

> Additionally, I would add a boot up pass that would INT3 fill all such
> functions and remove/invalidate all
> static_call/static_jump/fentry/alternative entry that is inside of them.
>

--
Thanks,
Zheng Yejian
On Tue, Jun 11, 2024 at 09:56:51AM +0800, Zheng Yejian wrote:
> On 2024/6/7 23:02, Peter Zijlstra wrote:

> > Oh gawd, sodding weak functions again.
> >
> > I would suggest changing scripts/kallsyms.c to emit readily identifiable
> > symbol names for all the weak junk, eg:
> >
> >   __weak_junk_NNNNN
> >
>
> Sorry for the late reply, I just had a long noon holiday :>
>
> scripts/kallsyms.c is compiled and used to handle symbols in vmlinux.o
> or vmlinux.a, see kallsyms_step() in scripts/link-vmlinux.sh. Those
> overridden weak symbols have been removed from the symbol table of
> vmlinux.o or vmlinux.a. But we can find those symbols in the original
> xx/xx.o file, for example, the weak free_initmem() in init/main.c is
> overridden, its symbol is not in vmlinux but is still in init/main.o.
>
> How about traversing all original xx/xx.o files and finding all weak
> junk symbols?

You don't need to. ELF symbol tables have an entry size for FUNC type
objects, this means that you can readily find holes in the text and fill
them with a symbol.

Specifically, you can check the mcount locations against the symbol
table and for every one that falls in a hole, generate a new junk
symbol.

Also see 4adb23686795 where objtool adds these holes to the
ignore/unreachable code check.

The lack of size for kallsyms is in a large part what is causing the
problems.
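The hole check described above could be sketched roughly as follows. This is a minimal userspace illustration, not code from scripts/kallsyms.c or objtool: `struct func_sym`, `addr_in_symbol()`, `find_hole_locs()` and `junk_symbol_name()` are hypothetical names, and it assumes the FUNC symbols are already sorted by address, do not overlap, and carry a valid ELF st_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified view of an ELF FUNC symbol (st_value, st_size). */
struct func_sym {
	unsigned long addr;
	unsigned long size;
};

/*
 * Return 1 if addr lies inside [addr, addr + size) of some symbol.
 * Assumes syms[] is sorted by address and the ranges do not overlap.
 */
static int addr_in_symbol(const struct func_sym *syms, size_t nsyms,
			  unsigned long addr)
{
	size_t lo = 0, hi = nsyms;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < syms[mid].addr)
			hi = mid;
		else if (addr >= syms[mid].addr + syms[mid].size)
			lo = mid + 1;
		else
			return 1;
	}
	return 0;
}

/*
 * Collect every mcount location that is not covered by any symbol,
 * i.e. that falls in a hole left behind by an overridden weak function.
 * Returns the number of entries written to holes[].
 */
static size_t find_hole_locs(const struct func_sym *syms, size_t nsyms,
			     const unsigned long *locs, size_t nlocs,
			     unsigned long *holes, size_t max_holes)
{
	size_t i, n = 0;

	for (i = 0; i < nlocs; i++) {
		if (addr_in_symbol(syms, nsyms, locs[i]))
			continue;
		assert(n < max_holes);
		holes[n++] = locs[i];
	}
	return n;
}

/* Format a placeholder name of the form __weak_junk_NNNNN. */
static int junk_symbol_name(char *buf, size_t len, unsigned int idx)
{
	return snprintf(buf, len, "__weak_junk_%05u", idx);
}
```

With symbols at 0x1000 (size 0x100) and 0x1200 (size 0x80), an mcount location at 0x1104 is covered by neither and would be reported as a hole, while 0x1004 and 0x1204 would not.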
On 2024/6/11 17:21, Peter Zijlstra wrote:
> On Tue, Jun 11, 2024 at 09:56:51AM +0800, Zheng Yejian wrote:
>> On 2024/6/7 23:02, Peter Zijlstra wrote:
>
>>> Oh gawd, sodding weak functions again.
>>>
>>> I would suggest changing scripts/kallsyms.c to emit readily identifiable
>>> symbol names for all the weak junk, eg:
>>>
>>>   __weak_junk_NNNNN
>>>
>>
>> Sorry for the late reply, I just had a long noon holiday :>
>>
>> scripts/kallsyms.c is compiled and used to handle symbols in vmlinux.o
>> or vmlinux.a, see kallsyms_step() in scripts/link-vmlinux.sh. Those
>> overridden weak symbols have been removed from the symbol table of
>> vmlinux.o or vmlinux.a. But we can find those symbols in the original
>> xx/xx.o file, for example, the weak free_initmem() in init/main.c is
>> overridden, its symbol is not in vmlinux but is still in init/main.o.
>>
>> How about traversing all original xx/xx.o files and finding all weak
>> junk symbols?
>
> You don't need to. ELF symbol tables have an entry size for FUNC type
> objects, this means that you can readily find holes in the text and fill
> them with a symbol.
>
> Specifically, you can check the mcount locations against the symbol
> table and for every one that falls in a hole, generate a new junk
> symbol.
>
> Also see 4adb23686795 where objtool adds these holes to the
> ignore/unreachable code check.
>
>
> The lack of size for kallsyms is in a large part what is causing the
> problems.

Thanks for your suggestions, I'll try it soon.

--
Thanks,
ZYJ
Wondering where we are with this issue?

I am experiencing an issue in which a fentry/kfunc bpf probe attached to a
function doesn't fire. I have only seen this behavior on Debian kernels
with `CONFIG_X86_KERNEL_IBT` enabled.

Because weak symbols are removed from kallsyms,
kallsyms_lookup_size_offset() reports a size for the function
"acct_process()" that is larger than its actual size, so the function
range now contains two __fentry__ locations. Depending on where the binary
search lands first, either the correct (acct_process + 4) or the incorrect
(acct_process + 260) location is returned.

Thanks,
Dropify
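The dependence on where the binary search lands can be reproduced with a small userspace model. This is only a sketch loosely modeled on a range lookup over a sorted table of __fentry__ addresses; `lookup_range()` is a hypothetical helper, not the kernel's ftrace_location_range(), and the addresses are made up so that the two entries sit at offsets +4 and +260 from a function starting at 0x1000.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Binary-search a sorted table of __fentry__ addresses and return the
 * first entry the search happens to land on inside [start, end], or 0
 * if there is none. When more than one entry lies in the range, which
 * one is returned depends on the layout of the rest of the table.
 */
static unsigned long lookup_range(const unsigned long *locs, size_t n,
				  unsigned long start, unsigned long end)
{
	size_t lo = 0, hi = n;

	assert(start <= end);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (locs[mid] < start)
			lo = mid + 1;
		else if (locs[mid] > end)
			hi = mid;
		else
			return locs[mid];
	}
	return 0;
}
```

For the range [0x1000, 0x11ff], the table { 0x1004, 0x1104 } yields 0x1104 (the stale weak-function entry), while the table { 0x0804, 0x1004, 0x1104 } yields 0x1004. The same function range gives different answers purely because of unrelated entries elsewhere in the table.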
diff --git a/include/linux/module.h b/include/linux/module.h
index ffa1c603163c..3d5a2165160d 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -954,6 +954,9 @@ unsigned long module_kallsyms_lookup_name(const char *name);
 
 unsigned long find_kallsyms_symbol_value(struct module *mod, const char *name);
 
+int find_kallsyms_symbol(struct module *mod, unsigned long addr,
+			 unsigned long *size, unsigned long *offset);
+
 #else	/* CONFIG_MODULES && CONFIG_KALLSYMS */
 
 static inline int module_kallsyms_on_each_symbol(const char *modname,
@@ -997,6 +1000,11 @@ static inline unsigned long find_kallsyms_symbol_value(struct module *mod,
 	return 0;
 }
 
+static inline int find_kallsyms_symbol(struct module *mod, unsigned long addr,
+				       unsigned long *size, unsigned long *offset)
+{
+	return 0;
+}
 #endif  /* CONFIG_MODULES && CONFIG_KALLSYMS */
 
 #endif /* _LINUX_MODULE_H */
diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c
index 62fb57bb9f16..d70fb4ead794 100644
--- a/kernel/module/kallsyms.c
+++ b/kernel/module/kallsyms.c
@@ -253,10 +253,10 @@ static const char *kallsyms_symbol_name(struct mod_kallsyms *kallsyms, unsigned
  * Given a module and address, find the corresponding symbol and return its name
  * while providing its size and offset if needed.
  */
-static const char *find_kallsyms_symbol(struct module *mod,
-					unsigned long addr,
-					unsigned long *size,
-					unsigned long *offset)
+static const char *__find_kallsyms_symbol(struct module *mod,
+					  unsigned long addr,
+					  unsigned long *size,
+					  unsigned long *offset)
 {
 	unsigned int i, best = 0;
 	unsigned long nextval, bestval;
@@ -311,6 +311,17 @@ static const char *find_kallsyms_symbol(struct module *mod,
 	return kallsyms_symbol_name(kallsyms, best);
 }
 
+int find_kallsyms_symbol(struct module *mod, unsigned long addr,
+			 unsigned long *size, unsigned long *offset)
+{
+	const char *ret;
+
+	preempt_disable();
+	ret = __find_kallsyms_symbol(mod, addr, size, offset);
+	preempt_enable();
+	return !!ret;
+}
+
 void * __weak dereference_module_function_descriptor(struct module *mod,
 						     void *ptr)
 {
@@ -344,7 +355,7 @@ const char *module_address_lookup(unsigned long addr,
 #endif
 		}
 
-		ret = find_kallsyms_symbol(mod, addr, size, offset);
+		ret = __find_kallsyms_symbol(mod, addr, size, offset);
 	}
 	/* Make a copy in here where it's safe */
 	if (ret) {
@@ -367,7 +378,7 @@ int lookup_module_symbol_name(unsigned long addr, char *symname)
 		if (within_module(addr, mod)) {
 			const char *sym;
 
-			sym = find_kallsyms_symbol(mod, addr, NULL, NULL);
+			sym = __find_kallsyms_symbol(mod, addr, NULL, NULL);
 			if (!sym)
 				goto out;
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 65208d3b5ed9..3c56be753ae8 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6488,6 +6488,7 @@ static int ftrace_process_locs(struct module *mod,
 	unsigned long addr;
 	unsigned long flags = 0; /* Shut up gcc */
 	int ret = -ENOMEM;
+	unsigned long last_func = 0;
 
 	count = end - start;
 
@@ -6538,6 +6539,8 @@ static int ftrace_process_locs(struct module *mod,
 	pg = start_pg;
 	while (p < end) {
 		unsigned long end_offset;
+		unsigned long cur_func, off;
+
 		addr = ftrace_call_adjust(*p++);
 		/*
 		 * Some architecture linkers will pad between
@@ -6549,6 +6552,16 @@ static int ftrace_process_locs(struct module *mod,
 			skipped++;
 			continue;
 		}
+		if (mod)
+			WARN_ON_ONCE(!find_kallsyms_symbol(mod, addr, NULL, &off));
+		else
+			WARN_ON_ONCE(!kallsyms_lookup_size_offset(addr, NULL, &off));
+		cur_func = addr - off;
+		if (cur_func == last_func) {
+			skipped++;
+			continue;
+		}
+		last_func = cur_func;
 
 		end_offset = (pg->index+1) * sizeof(pg->records[0]);
 		if (end_offset > PAGE_SIZE << pg->order) {
@@ -6860,13 +6873,6 @@ void ftrace_module_enable(struct module *mod)
 		if (!within_module(rec->ip, mod))
 			break;
 
-		/* Weak functions should still be ignored */
-		if (!test_for_valid_rec(rec)) {
-			/* Clear all other flags. Should not be enabled anyway */
-			rec->flags = FTRACE_FL_DISABLED;
-			continue;
-		}
-
 		cnt = 0;
 
 		/*
ftrace_location() was changed to not only return the __fentry__ location
when called for the __fentry__ location, but also when called for the
sym+0 location after commit aebfd12521d9 ("x86/ibt,ftrace: Search for
__fentry__ location"). That is, if sym+0 location is not __fentry__,
ftrace_location() would find one over the entire size of the sym.

However, there is a case where more than one __fentry__ exists in the sym
range (described below) and ftrace_location() would find the wrong
__fentry__ location by binary searching, which would cause its users like
livepatch/kprobe/bpf to not work properly on this sym!

The case is that, based on current compiler behavior, suppose:
 - function A is followed by weak function B1 in same binary file;
 - weak function B1 is overridden by function B2;
Then in the final binary file:
 - symbol B1 will be removed from symbol table while its instructions are
   not removed;
 - __fentry__ of B1 will be still in __mcount_loc table;
 - function size of A is computed by subtracting the symbol address of
   A from its next symbol address (see kallsyms_lookup_size_offset()),
   but because symbol info of B1 is removed, the next symbol of A is
   originally the next symbol of B1. See following example, the function
   size of A will be (symbol_address_C - symbol_address_A):

     symbol_address_A
     symbol_address_B1 (Not in symbol table)
     symbol_address_C

The weak function issue has been discovered in commit b39181f7c690
("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
but it didn't resolve the issue in ftrace_location().

There may be the following resolutions:

1. Shrink the search range when __fentry__ is not a sym+0 location,
   for example use the macro FTRACE_MCOUNT_MAX_OFFSET. This needs every
   arch to define its own FTRACE_MCOUNT_MAX_OFFSET:

     ftrace_location() {
       ...
       if (!offset)
         loc = ftrace_location_range(ip, ip + FTRACE_MCOUNT_MAX_OFFSET + 1);
       ...
     }

2. Define an arch-specific arch_ftrace_location() based on each arch's own
   cases of __fentry__ position, for example:

     ftrace_location() {
       ...
       if (!offset)
         loc = arch_ftrace_location(ip);
       ...
     }

3. Skip the __fentry__ of overridden weak functions in
   ftrace_process_locs() so that all records in ftrace_pages are valid.
   The reason why this scheme may work is that both __mcount_loc and
   symbol table are sorted and it can be assumed that one function has
   only one __fentry__ location. Then commit b39181f7c690 ("ftrace: Add
   FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function") can be
   reverted (not done in this patch). However, looking up the size and
   offset of every record in the __mcount_loc table will slow down system
   boot and module load.

Solutions 1 and 2 need every arch to handle the complex fentry location
case, so I use solution 3 as RFC.

Fixes: aebfd12521d9 ("x86/ibt,ftrace: Search for __fentry__ location")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
---
 include/linux/module.h   |  8 ++++++++
 kernel/module/kallsyms.c | 23 +++++++++++++++++------
 kernel/trace/ftrace.c    | 20 +++++++++++++-------
 3 files changed, 38 insertions(+), 13 deletions(-)
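The filtering idea behind solution 3 can be illustrated outside the kernel with a short sketch. It is a simplified model under stated assumptions, not the patch itself: `containing_func_start()` and `skip_duplicate_locs()` are hypothetical helpers, the symbol table is reduced to a sorted array of start addresses (so, as with kallsyms, a function's extent runs up to the next symbol), and a failed lookup is an assert() rather than a WARN_ON_ONCE().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the start address of the symbol that contains addr, i.e. the
 * largest symbol address that is <= addr, or 0 if there is none.
 * syms[] holds symbol start addresses sorted in ascending order.
 */
static unsigned long containing_func_start(const unsigned long *syms,
					   size_t nsyms, unsigned long addr)
{
	size_t lo = 0, hi = nsyms;

	/* Find the index of the first symbol strictly above addr. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (syms[mid] <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? syms[lo - 1] : 0;
}

/*
 * Walk the sorted mcount locations and keep only the first one seen for
 * each containing symbol. A later location that resolves to the same
 * symbol is assumed to belong to an overridden weak function whose own
 * symbol was dropped, so it is skipped. Returns the number kept.
 */
static size_t skip_duplicate_locs(const unsigned long *syms, size_t nsyms,
				  const unsigned long *locs, size_t nlocs,
				  unsigned long *out)
{
	unsigned long last_func = 0;
	size_t i, n = 0;

	for (i = 0; i < nlocs; i++) {
		unsigned long cur_func;

		cur_func = containing_func_start(syms, nsyms, locs[i]);
		assert(cur_func != 0);	/* every location must resolve */

		if (cur_func == last_func)
			continue;
		last_func = cur_func;
		out[n++] = locs[i];
	}
	return n;
}
```

With symbols A at 0x1000 and C at 0x1200, and locations 0x1004 (A), 0x1104 (the removed B1) and 0x1204 (C), the second location resolves to A again and is dropped, leaving 0x1004 and 0x1204. The per-location symbol lookup is what the commit message identifies as the extra cost at boot and module load.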