diff mbox series

cgroup/bpf: fix NULL pointer dereference at cgroup_bpf_offline

Message ID 20241016093633.670555-1-chenridong@huaweicloud.com (mailing list archive)
State Not Applicable
Headers show
Series cgroup/bpf: fix NULL pointer dereference at cgroup_bpf_offline | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Chen Ridong Oct. 16, 2024, 9:36 a.m. UTC
From: Chen Ridong <chenridong@huawei.com>

Kernel fault injection test reports NULL pointer dereference as follows:
BUG: kernel NULL pointer dereference, address: 0000000000000010
PGD 0 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 UID: 0 PID: 1611 Comm: umount Tainted: G        W       6.12.0-rc2
Tainted: [W]=WARN
RIP: 0010:__percpu_ref_switch_mode+0x30/0x320
RSP: 0018:ffffc90001f9be28 EFLAGS: 00000002
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffffffff82c12d18 RDI: ffff88810753a7d8
RBP: ffff88810f3a0888 R08: 0000000000000001 R09: 00000000000c0000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000111edc000 CR4: 00000000000006f0
Call Trace:
  percpu_ref_kill_and_confirm+0x3a/0x90
  cgroup_kill_sb+0x61/0x190
  deactivate_locked_super+0x35/0xa0
  cleanup_mnt+0x100/0x160
  task_work_run+0x5c/0x90
  syscall_exit_to_user_mode+0x1bc/0x1d0
  do_syscall_64+0x74/0x140
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

A warning was also found in dmesg when the bug occurs,
WARNING: CPU: 5 PID: 1554 at kernel/cgroup/cgroup.c:2144
Call Trace:
  ? cgroup_setup_root+0x39c/0x440
  cgroup1_get_tree+0x38f/0x860
  ? cgroup1_get_tree+0x71/0x860
  vfs_get_tree+0x2c/0x100
  path_mount+0x2e3/0xb90
  __x64_sys_mount+0x19f/0x200
  do_syscall_64+0x68/0x140
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

As mentioned above, when cgroup_bpf_inherit returns an error in
cgroup_setup_root, cgrp->bpf.refcnt has been exited. If cgrp->bpf.refcnt is
killed again in the cgroup_kill_sb function, the data of cgrp->bpf.refcnt
may have become NULL, leading to NULL pointer dereference.

To fix this issue, goto err when cgroup_bpf_inherit returns an error.
Additionally, if cgroup_bpf_inherit returns an error after rebinding
subsystems, the root_cgrp->self.refcnt is exited, which leads to
cgroup1_root_to_use return 1 (restart) when subsystems is  mounted next.
This is due to a failure trying to get the refcnt(the root is root_cgrp,
without rebinding back to cgrp_dfl_root). So move the call to
cgroup_bpf_inherit above rebind_subsystems in the cgroup_setup_root.

Fixes: 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cgroup.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Michal Koutný Oct. 16, 2024, 1:13 p.m. UTC | #1
Hello.

On Wed, Oct 16, 2024 at 09:36:33AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
> As mentioned above, when cgroup_bpf_inherit returns an error in
> cgroup_setup_root, cgrp->bpf.refcnt has been exited. If cgrp->bpf.refcnt is
> killed again in the cgroup_kill_sb function, the data of cgrp->bpf.refcnt
> may have become NULL, leading to NULL pointer dereference.
> 
> To fix this issue, goto err when cgroup_bpf_inherit returns an error.
> Additionally, if cgroup_bpf_inherit returns an error after rebinding
> subsystems, the root_cgrp->self.refcnt is exited, which leads to
> cgroup1_root_to_use return 1 (restart) when subsystems is  mounted next.
> This is due to a failure trying to get the refcnt(the root is root_cgrp,
> without rebinding back to cgrp_dfl_root). So move the call to
> cgroup_bpf_inherit above rebind_subsystems in the cgroup_setup_root.
> 
> Fixes: 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline")
> Signed-off-by: Chen Ridong <chenridong@huawei.com>

Hm, I always thought that BPF progs can only be attached to the default
hierarchy (cgroup_bpf_prog_attach/cgroup_get_from_fd should prevent
that).

Thus I wonder whether cgroup_bpf_inherit (which is more like
cgroup_bpf_init in this case) needs to be called no v1 roots at all (and
with such a change, 04f8ef5643bc could be effectively reverted too).

Or can bpf data be used on v1 hierarchies somehow?

Thanks,
Michal
Tejun Heo Oct. 16, 2024, 5:04 p.m. UTC | #2
On Wed, Oct 16, 2024 at 03:13:52PM +0200, Michal Koutný wrote:
> Hello.
> 
> On Wed, Oct 16, 2024 at 09:36:33AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
> > As mentioned above, when cgroup_bpf_inherit returns an error in
> > cgroup_setup_root, cgrp->bpf.refcnt has been exited. If cgrp->bpf.refcnt is
> > killed again in the cgroup_kill_sb function, the data of cgrp->bpf.refcnt
> > may have become NULL, leading to NULL pointer dereference.
> > 
> > To fix this issue, goto err when cgroup_bpf_inherit returns an error.
> > Additionally, if cgroup_bpf_inherit returns an error after rebinding
> > subsystems, the root_cgrp->self.refcnt is exited, which leads to
> > cgroup1_root_to_use return 1 (restart) when subsystems is  mounted next.
> > This is due to a failure trying to get the refcnt(the root is root_cgrp,
> > without rebinding back to cgrp_dfl_root). So move the call to
> > cgroup_bpf_inherit above rebind_subsystems in the cgroup_setup_root.
> > 
> > Fixes: 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline")
> > Signed-off-by: Chen Ridong <chenridong@huawei.com>
> 
> Hm, I always thought that BPF progs can only be attached to the default
> hierarchy (cgroup_bpf_prog_attach/cgroup_get_from_fd should prevent
> that).
> 
> Thus I wonder whether cgroup_bpf_inherit (which is more like
> cgroup_bpf_init in this case) needs to be called no v1 roots at all (and
> with such a change, 04f8ef5643bc could be effectively reverted too).
> 
> Or can bpf data be used on v1 hierarchies somehow?

We relaxed some of the usages (see cgroup_v1v2_get_from_fd()) but cgroup BPF
progs can only be attached to v2.

Thanks.
Chen Ridong Oct. 17, 2024, 12:17 p.m. UTC | #3
On 2024/10/17 1:04, Tejun Heo wrote:
> On Wed, Oct 16, 2024 at 03:13:52PM +0200, Michal Koutný wrote:
>> Hello.
>>
>> On Wed, Oct 16, 2024 at 09:36:33AM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
>>> As mentioned above, when cgroup_bpf_inherit returns an error in
>>> cgroup_setup_root, cgrp->bpf.refcnt has been exited. If cgrp->bpf.refcnt is
>>> killed again in the cgroup_kill_sb function, the data of cgrp->bpf.refcnt
>>> may have become NULL, leading to NULL pointer dereference.
>>>
>>> To fix this issue, goto err when cgroup_bpf_inherit returns an error.
>>> Additionally, if cgroup_bpf_inherit returns an error after rebinding
>>> subsystems, the root_cgrp->self.refcnt is exited, which leads to
>>> cgroup1_root_to_use return 1 (restart) when subsystems is  mounted next.
>>> This is due to a failure trying to get the refcnt(the root is root_cgrp,
>>> without rebinding back to cgrp_dfl_root). So move the call to
>>> cgroup_bpf_inherit above rebind_subsystems in the cgroup_setup_root.
>>>
>>> Fixes: 04f8ef5643bc ("cgroup: Fix memory leak caused by missing cgroup_bpf_offline")
>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>
>> Hm, I always thought that BPF progs can only be attached to the default
>> hierarchy (cgroup_bpf_prog_attach/cgroup_get_from_fd should prevent
>> that).
>>
>> Thus I wonder whether cgroup_bpf_inherit (which is more like
>> cgroup_bpf_init in this case) needs to be called no v1 roots at all (and
>> with such a change, 04f8ef5643bc could be effectively reverted too).
>>
>> Or can bpf data be used on v1 hierarchies somehow?
> 
> We relaxed some of the usages (see cgroup_v1v2_get_from_fd()) but cgroup BPF
> progs can only be attached to v2.
> 
> Thanks.
> 

So, should commit 04f8ef5643bc ("cgroup: Fix memory leak caused by 
missing cgroup_bpf_offline") be reverted, and should cgroup_bpf_inherit 
be only called in v2?
Have I understood this correctly?

Best regards,
Ridong
Michal Koutný Oct. 17, 2024, 9:14 p.m. UTC | #4
On Thu, Oct 17, 2024 at 08:17:58PM GMT, Chen Ridong <chenridong@huaweicloud.com> wrote:
> So, should commit 04f8ef5643bc ("cgroup: Fix memory leak caused by missing
> cgroup_bpf_offline") be reverted, and should cgroup_bpf_inherit be only
> called in v2?

Yes, that should resolve both the original leak and the current NULL ptr
dereference without extra rebinding complications.

I'd like to keep only cgroup v2 be attachable by BPF programs.
cgroup_v1v2_get_from_fd serves the needs of traversing v1 hierarchies,
not program attchments.

I hope this proposal is workable also from the CC'ed BPF people
perspective.

Michal
diff mbox series

Patch

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 5886b95c6eae..8a0cbf95cc57 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2136,12 +2136,13 @@  int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	if (ret)
 		goto destroy_root;
 
-	ret = rebind_subsystems(root, ss_mask);
+	ret = cgroup_bpf_inherit(root_cgrp);
 	if (ret)
 		goto exit_stats;
 
-	ret = cgroup_bpf_inherit(root_cgrp);
-	WARN_ON_ONCE(ret);
+	ret = rebind_subsystems(root, ss_mask);
+	if (ret)
+		goto exit_stats;
 
 	trace_cgroup_setup_root(root);