mbox series

[v2,0/5] kallsyms: Emit symbol for holes in text and fix weak function issue

Message ID 20240723063258.2240610-1-zhengyejian@huaweicloud.com (mailing list archive)
Headers show
Series kallsyms: Emit symbol for holes in text and fix weak function issue | expand

Message

Zheng Yejian July 23, 2024, 6:32 a.m. UTC
Background of this patch set can be found in v1:
https://lore.kernel.org/all/20240613133711.2867745-1-zhengyejian1@huawei.com/

Here add a reproduction to show the impact to livepatch:
1. Add following hack to make livepatch-sample.ko do patch on do_one_initcall()
   which has an overriden weak function behind in vmlinux, then print the
   actually used __fentry__ location:

  diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
  index 90408500e5a3..20b75e7b1771 100644
  --- a/kernel/livepatch/patch.c
  +++ b/kernel/livepatch/patch.c
  @@ -173,6 +173,8 @@ static int klp_patch_func(struct klp_func *func)
                  unsigned long ftrace_loc;

                  ftrace_loc = ftrace_location((unsigned long)func->old_func);
  +               pr_info("%s: old_func=%pS ftrace_loc=%pS (%lx)\n", __func__,
  +                       func->old_func, (void *)ftrace_loc, ftrace_loc);
                  if (!ftrace_loc) {
                          pr_err("failed to find location for function '%s'\n",
                                  func->old_name);
  diff --git a/samples/livepatch/livepatch-sample.c b/samples/livepatch/livepatch-sample.c
  index cd76d7ebe598..c6e8e78e23b8 100644
  --- a/samples/livepatch/livepatch-sample.c
  +++ b/samples/livepatch/livepatch-sample.c
  @@ -38,7 +38,7 @@ static int livepatch_cmdline_proc_show(struct seq_file *m, void *v)

   static struct klp_func funcs[] = {
          {
  -               .old_name = "cmdline_proc_show",
  +               .old_name = "do_one_initcall",
                  .new_func = livepatch_cmdline_proc_show,
          }, { }
   };

2. We can see the size of do_one_initcall() is 0x27e:
  $ nm -nS vmlinux | grep do_one_initcall
  ffffffff810021e0 000000000000027e T do_one_initcall

3. Insert the livepatch ko, the expected ftrace_loc is do_one_initcall+0x4/0x27e but
   actually do_one_initcall+0x294/0x2c0 which is invalid, also its size is inaccurate.
   Then this livepatch can not properly work!!!

  # insmod livepatch-sample.ko
  [   17.942451] livepatch: klp_patch_func: old_func=do_one_initcall+0x0/0x2c0 ftrace_loc=do_one_initcall+0x294/0x2c0 (ffffffff81002474)
  #
  # cat /sys/kernel/tracing/available_filter_functions_addrs | grep ffffffff81002474
  ffffffff81002474 __ftrace_invalid_address___660

After this patch, this issue is gone:

  # insmod livepatch-sample.ko
  [   42.408632] livepatch: klp_patch_func: old_func=do_one_initcall+0x0/0x27e ftrace_loc=do_one_initcall+0x4/0x27e (ffffffff810021e4)
  # cat /sys/kernel/tracing/available_filter_functions_addrs | grep ffffffff810021e4
  ffffffff810021e4 do_one_initcall
  # cat /sys/kernel/tracing/enabled_functions
  do_one_initcall (1)   I     M   tramp: 0xffffffffa0020000 (klp_ftrace_handler+0x0/0x258)->klp_ftrace_handler+0x0/0x258

Changes:

v1->v2:
  - Drop patch which titled "kallsyms: Optimize multiple times of realloc() to one time of malloc()"
    as suggested by Masahiro;
    Link: https://lore.kernel.org/all/CAK7LNAQkSnZ1nVXiZH9kg52H-A_=urcsv-W7wGXvunMGhGX8Vw@mail.gmail.com/
  - Changes in PATCH 1/5:
    - Change may_exist_hole_after_symbol() to has_hole() and return bool type;
    - User realloc instead of malloc another table in emit_hole_symbols();
      Link: https://lore.kernel.org/all/CAK7LNAQaLc6aDK85qQtPHoCkQSGyL-TxXjpgJTfehe2Q1=jMSA@mail.gmail.com/
    - Use empty symbol type/name as a special case to represent the hole instead of
      using symbol "t__hole_symbol_XXXXX";
      Link: https://lore.kernel.org/all/CAK7LNARiR5z9hPRG932T7YjRWqkX_qZ7WKmbxx7iTo2w5YJojQ@mail.gmail.com/
  - Changes in PATCH 3/5:
    - Now name of hole symbol is NULL, so if __fentry__ is located in a hole,
      kallsyms_lookup() find an NULL name then will return 0, so drop the
      needless kallsyms_is_hole_symbol().

Zheng Yejian (5):
  kallsyms: Emit symbol at the holes in the text
  module: kallsyms: Determine exact function size
  ftrace: Skip invalid __fentry__ in ftrace_process_locs()
  ftrace: Fix possible out-of-bound issue in ftrace_process_locs()
  ftrace: Revert the FTRACE_MCOUNT_MAX_OFFSET workaround

 arch/powerpc/include/asm/ftrace.h |   7 --
 arch/x86/include/asm/ftrace.h     |   7 --
 include/linux/module.h            |  14 +++
 kernel/module/kallsyms.c          |  42 ++++++--
 kernel/trace/ftrace.c             | 171 ++++++------------------------
 scripts/kallsyms.c                |  95 ++++++++++++++++-
 scripts/link-vmlinux.sh           |   4 +-
 scripts/mksysmap                  |   2 +-
 8 files changed, 173 insertions(+), 169 deletions(-)

Comments

Martin Kelly Dec. 10, 2024, 7:15 p.m. UTC | #1
On Tue, 2024-07-23 at 14:32 +0800, Zheng Yejian wrote:
> Background of this patch set can be found in v1:
> 
> https://lore.kernel.org/all/20240613133711.2867745-1-zhengyejian1@huawei.com/
>  
> 
> Here add a reproduction to show the impact to livepatch:
> 1. Add following hack to make livepatch-sample.ko do patch on
> do_one_initcall()
>    which has an overriden weak function behind in vmlinux, then print
> the
>    actually used __fentry__ location:
> 

Hi all, what is the status of this patch series? I'd really like to see
it or some other fix to this issue merged. The underlying bug is a
significant one that can cause ftrace/livepatch/BPF fentry to fail
silently. I've noticed this bug in another context[1] and realized
they're the same issue.

I'm happy to help with this patch series to address any issues as
needed.

[1]
https://lore.kernel.org/bpf/7136605d24de9b1fc62d02a355ef11c950a94153.camel@crowdstrike.com/T/#mb7e6f84ac90fa78989e9e2c3cd8d29f65a78845b
Christophe Leroy Dec. 10, 2024, 8:01 p.m. UTC | #2
Hi,

Le 10/12/2024 à 20:15, Martin Kelly a écrit :
> [Vous ne recevez pas souvent de courriers de martin.kelly@crowdstrike.com. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
> 
> On Tue, 2024-07-23 at 14:32 +0800, Zheng Yejian wrote:
>> Background of this patch set can be found in v1:
>>
>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20240613133711.2867745-1-zhengyejian1%40huawei.com%2F&data=05%7C02%7Cchristophe.leroy%40csgroup.eu%7Cbc4f27151ef04b74fba608dd194f0034%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C638694550404456289%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C80000%7C%7C%7C&sdata=a5XFKy9qxVrM5yXuvJuilJ%2FsUxU4j326MOmEz7dBViY%3D&reserved=0
>>
>>
>> Here add a reproduction to show the impact to livepatch:
>> 1. Add following hack to make livepatch-sample.ko do patch on
>> do_one_initcall()
>>     which has an overriden weak function behind in vmlinux, then print
>> the
>>     actually used __fentry__ location:
>>
> 
> Hi all, what is the status of this patch series? I'd really like to see
> it or some other fix to this issue merged. The underlying bug is a
> significant one that can cause ftrace/livepatch/BPF fentry to fail
> silently. I've noticed this bug in another context[1] and realized
> they're the same issue.
> 
> I'm happy to help with this patch series to address any issues as
> needed.

As far as I can see there are problems on build with patch 1, see 
https://patchwork.kernel.org/project/linux-modules/patch/20240723063258.2240610-2-zhengyejian@huaweicloud.com/

> 
> [1]
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fbpf%2F7136605d24de9b1fc62d02a355ef11c950a94153.camel%40crowdstrike.com%2FT%2F%23mb7e6f84ac90fa78989e9e2c3cd8d29f65a78845b&data=05%7C02%7Cchristophe.leroy%40csgroup.eu%7Cbc4f27151ef04b74fba608dd194f0034%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C638694550404477455%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C80000%7C%7C%7C&sdata=v9qPnj%2FDDWAuSdB6dP19nyxUWijxveymI6mQb63KxbY%3D&reserved=0

Christophe
Martin Kelly Dec. 10, 2024, 8:49 p.m. UTC | #3
On Tue, 2024-12-10 at 21:01 +0100, Christophe Leroy wrote:
> > 
> > Hi all, what is the status of this patch series? I'd really like to
> > see
> > it or some other fix to this issue merged. The underlying bug is a
> > significant one that can cause ftrace/livepatch/BPF fentry to fail
> > silently. I've noticed this bug in another context[1] and realized
> > they're the same issue.
> > 
> > I'm happy to help with this patch series to address any issues as
> > needed.
> 
> As far as I can see there are problems on build with patch 1, see 
> https://patchwork.kernel.org/project/linux-modules/patch/20240723063258.2240610-2-zhengyejian@huaweicloud.com/
>  
> 
> 

Yeah, I see those. Additionally, this patch no longer applies cleanly
to current master, though fixing it up to do so is pretty easy. Having
done that, this series appears to resolve the issues I saw in the other
linked thread.

Zheng, do you plan to send a v3? I'd be happy to help out with this
patch series if you'd like, as I'm hoping to get this issue resolved
(though I am not an ftrace expert).
Zheng Yejian Dec. 12, 2024, 9:52 a.m. UTC | #4
On 2024/12/11 04:49, Martin Kelly wrote:
> On Tue, 2024-12-10 at 21:01 +0100, Christophe Leroy wrote:
>>>
>>> Hi all, what is the status of this patch series? I'd really like to
>>> see
>>> it or some other fix to this issue merged. The underlying bug is a
>>> significant one that can cause ftrace/livepatch/BPF fentry to fail
>>> silently. I've noticed this bug in another context[1] and realized
>>> they're the same issue.
>>>
>>> I'm happy to help with this patch series to address any issues as
>>> needed.
>>
>> As far as I can see there are problems on build with patch 1, see
>> https://patchwork.kernel.org/project/linux-modules/patch/20240723063258.2240610-2-zhengyejian@huaweicloud.com/
>>   
>>
>>
> 
> Yeah, I see those. Additionally, this patch no longer applies cleanly
> to current master, though fixing it up to do so is pretty easy. Having
> done that, this series appears to resolve the issues I saw in the other
> linked thread.
> 
> Zheng, do you plan to send a v3? I'd be happy to help out with this
> patch series if you'd like, as I'm hoping to get this issue resolved
> (though I am not an ftrace expert).

Sorry to rely so late. Thanks for your feedback!

Steve recently started a discussion of the issue in:
https://lore.kernel.org/all/20241014210841.5a4764de@gandalf.local.home/
but there's no conclusion.
  
I can rebase this patch series and send a new version first, and
I'm also hoping to get more feedbacks and help to resolve the issue.
Martin Kelly Dec. 13, 2024, 7:31 p.m. UTC | #5
On Thu, 2024-12-12 at 17:52 +0800, Zheng Yejian wrote:
> On 2024/12/11 04:49, Martin Kelly wrote:
> > 
> > 
> > Zheng, do you plan to send a v3? I'd be happy to help out with this
> > patch series if you'd like, as I'm hoping to get this issue
> > resolved
> > (though I am not an ftrace expert).
> 
> Sorry to rely so late. Thanks for your feedback!
> 
> Steve recently started a discussion of the issue in:
> 
> https://lore.kernel.org/all/20241015100111.37adfb28@gandalf.local.home/
> but there's no conclusion.
>   
> I can rebase this patch series and send a new version first, and
> I'm also hoping to get more feedbacks and help to resolve the issue.
> 

Hi Zheng,

I may have misunderstood, but based on the final message from Steven, I
got the sense that the idea listed in that thread didn't work out and
we should proceed with your current approach.

Please consider me an interested party for this patch series, and let
me know if there's anything I can do to help speed it along (co-
develop, test, anything else). And of course, thanks very much for your
work thus far!
Zheng Yejian Dec. 14, 2024, 8:37 a.m. UTC | #6
On 2024/12/14 03:31, Martin Kelly wrote:
> On Thu, 2024-12-12 at 17:52 +0800, Zheng Yejian wrote:
>> On 2024/12/11 04:49, Martin Kelly wrote:
>>>
>>>
>>> Zheng, do you plan to send a v3? I'd be happy to help out with this
>>> patch series if you'd like, as I'm hoping to get this issue
>>> resolved
>>> (though I am not an ftrace expert).
>>
>> Sorry to rely so late. Thanks for your feedback!
>>
>> Steve recently started a discussion of the issue in:
>>
>> https://lore.kernel.org/all/20241015100111.37adfb28@gandalf.local.home/
>> but there's no conclusion.
>>    
>> I can rebase this patch series and send a new version first, and
>> I'm also hoping to get more feedbacks and help to resolve the issue.
>>
> 
> Hi Zheng,
> 
> I may have misunderstood, but based on the final message from Steven, I
> got the sense that the idea listed in that thread didn't work out and
> we should proceed with your current approach.

This patch set attempts to add length information of symbols to kallsyms,
which can then ensure that there is only one valid fentry within the range
of function length.

However, I think this patch set actually has some performance implications,
including:
   1. Adding hole symbols may cause the symbol table to grow significantly;
   2. When parsing fentry tables, excluding invalid fentries through kallsyms lookups
      may also increase boot or module load times slightly.

The direct cause of this issue is the wrong fentry being founded by ftrace_location(),
following the approach of "FTRACE_MCOUNT_MAX_OFFSET", narrowing down the search range
and re-finding may also solve this problem, demo patch like below (not
fully tested):

     diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
     index 9b17efb1a87d..7d34320ca9d1 100644
     --- a/kernel/trace/ftrace.c
     +++ b/kernel/trace/ftrace.c
     @@ -1678,8 +1678,11 @@ unsigned long ftrace_location(unsigned long ip)
                             goto out;
     
                     /* map sym+0 to __fentry__ */
     -               if (!offset)
     +               if (!offset) {
                             loc = ftrace_location_range(ip, ip + size - 1);
     +                       while (loc > ip && loc - ip > FTRACE_MCOUNT_MAX_OFFSET)
     +                               loc = ftrace_location_range(ip, loc - 1);
     +               }
             }

Steve, Peter, what do you think?

> 
> Please consider me an interested party for this patch series, and let
> me know if there's anything I can do to help speed it along (co-
> develop, test, anything else). And of course, thanks very much for your
> work thus far!