diff mbox series

[net] net: sched: fix NULL pointer dereference in mq_attach

Message ID 20230527093747.3583502-1-shaozhengchao@huawei.com (mailing list archive)
State Accepted
Commit 36eec020fab668719b541f34d97f44e232ffa165
Delegated to: Netdev Maintainers
Headers show
Series [net] net: sched: fix NULL pointer dereference in mq_attach | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 11 this patch: 11
netdev/cc_maintainers fail 1 blamed authors not CCed: kaber@trash.net; 1 maintainers not CCed: kaber@trash.net
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 11 this patch: 11
netdev/checkpatch warning WARNING: line length of 96 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

shaozhengchao May 27, 2023, 9:37 a.m. UTC
When use the following command to test:
1)ip link add bond0 type bond
2)ip link set bond0 up
3)tc qdisc add dev bond0 root handle ffff: mq
4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq

The kernel reports NULL pointer dereference issue. The stack information
is as follows:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
Internal error: Oops: 0000000096000006 [#1] SMP
Modules linked in:
pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mq_attach+0x44/0xa0
lr : qdisc_graft+0x20c/0x5cc
sp : ffff80000e2236a0
x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0
x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1
x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000
x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000
x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000
x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008
x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001
x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000
Call trace:
mq_attach+0x44/0xa0
qdisc_graft+0x20c/0x5cc
tc_modify_qdisc+0x1c4/0x664
rtnetlink_rcv_msg+0x354/0x440
netlink_rcv_skb+0x64/0x144
rtnetlink_rcv+0x28/0x34
netlink_unicast+0x1e8/0x2a4
netlink_sendmsg+0x308/0x4a0
sock_sendmsg+0x64/0xac
____sys_sendmsg+0x29c/0x358
___sys_sendmsg+0x90/0xd0
__sys_sendmsg+0x7c/0xd0
__arm64_sys_sendmsg+0x2c/0x38
invoke_syscall+0x54/0x114
el0_svc_common.constprop.1+0x90/0x174
do_el0_svc+0x3c/0xb0
el0_svc+0x24/0xec
el0t_64_sync_handler+0x90/0xb4
el0t_64_sync+0x174/0x178

This is because when mq is added for the first time, qdiscs in mq is set
to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we
need to initialize qdiscs in the mq before continuing to graft. Otherwise,
it will couse NULL pointer dereference issue in mq_attach(). And the same
issue will occur in the attach functions of mqprio, taprio and htb.
ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow
any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and
the command should be dropped.

Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
---
 net/sched/sch_api.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jamal Hadi Salim May 28, 2023, 7:05 p.m. UTC | #1
On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>
> When use the following command to test:
> 1)ip link add bond0 type bond
> 2)ip link set bond0 up
> 3)tc qdisc add dev bond0 root handle ffff: mq
> 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
>

This is fixed by Peilin in this ongoing discussion:
https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/

cheers,
jamal



> The kernel reports NULL pointer dereference issue. The stack information
> is as follows:
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> Internal error: Oops: 0000000096000006 [#1] SMP
> Modules linked in:
> pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : mq_attach+0x44/0xa0
> lr : qdisc_graft+0x20c/0x5cc
> sp : ffff80000e2236a0
> x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0
> x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1
> x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000
> x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000
> x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000
> x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001
> x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
> x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008
> x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001
> x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000
> Call trace:
> mq_attach+0x44/0xa0
> qdisc_graft+0x20c/0x5cc
> tc_modify_qdisc+0x1c4/0x664
> rtnetlink_rcv_msg+0x354/0x440
> netlink_rcv_skb+0x64/0x144
> rtnetlink_rcv+0x28/0x34
> netlink_unicast+0x1e8/0x2a4
> netlink_sendmsg+0x308/0x4a0
> sock_sendmsg+0x64/0xac
> ____sys_sendmsg+0x29c/0x358
> ___sys_sendmsg+0x90/0xd0
> __sys_sendmsg+0x7c/0xd0
> __arm64_sys_sendmsg+0x2c/0x38
> invoke_syscall+0x54/0x114
> el0_svc_common.constprop.1+0x90/0x174
> do_el0_svc+0x3c/0xb0
> el0_svc+0x24/0xec
> el0t_64_sync_handler+0x90/0xb4
> el0t_64_sync+0x174/0x178
>
> This is because when mq is added for the first time, qdiscs in mq is set
> to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we
> need to initialize qdiscs in the mq before continuing to graft. Otherwise,
> it will couse NULL pointer dereference issue in mq_attach(). And the same
> issue will occur in the attach functions of mqprio, taprio and htb.
> ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow
> any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and
> the command should be dropped.
>
> Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler")
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
>  net/sched/sch_api.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index fdb8f429333d..dbc9cf5eea89 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1596,6 +1596,10 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
>                                         NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected");
>                                         return -ELOOP;
>                                 }
> +                               if (clid == TC_H_INGRESS) {
> +                                       NL_SET_ERR_MSG(extack, "Ingress cannot graft directly");
> +                                       return -EINVAL;
> +                               }
>                                 qdisc_refcount_inc(q);
>                                 goto graft;
>                         } else {
> --
> 2.34.1
>
>
shaozhengchao May 29, 2023, 1:10 a.m. UTC | #2
On 2023/5/29 3:05, Jamal Hadi Salim wrote:
> On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>>
>> When use the following command to test:
>> 1)ip link add bond0 type bond
>> 2)ip link set bond0 up
>> 3)tc qdisc add dev bond0 root handle ffff: mq
>> 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
>>
> 
> This is fixed by Peilin in this ongoing discussion:
> https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/
> 
> cheers,
> jamal
> 
>
Hi jamal:
	Thank you for your reply. I have notice Peilin's patches before,
and test after the patch is incorporated in local host. But it still
triggers the problem.
	Peilin's patches can be filtered out when the query result of
qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches.
+if (q->flags & TCQ_F_INGRESS) {
+	NL_SET_ERR_MSG(extack,
+		       "Cannot regraft ingress or clsact Qdiscs");
+	return -EINVAL;
+}
	However, the query result of my test case in qdisc_lookup is mq.
Therefore, the patch cannot solve my problem.
Thank you.

Zhengchao Shao
> 
>> The kernel reports NULL pointer dereference issue. The stack information
>> is as follows:
>> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>> Internal error: Oops: 0000000096000006 [#1] SMP
>> Modules linked in:
>> pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : mq_attach+0x44/0xa0
>> lr : qdisc_graft+0x20c/0x5cc
>> sp : ffff80000e2236a0
>> x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0
>> x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1
>> x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000
>> x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000
>> x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000
>> x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001
>> x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
>> x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008
>> x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001
>> x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000
>> Call trace:
>> mq_attach+0x44/0xa0
>> qdisc_graft+0x20c/0x5cc
>> tc_modify_qdisc+0x1c4/0x664
>> rtnetlink_rcv_msg+0x354/0x440
>> netlink_rcv_skb+0x64/0x144
>> rtnetlink_rcv+0x28/0x34
>> netlink_unicast+0x1e8/0x2a4
>> netlink_sendmsg+0x308/0x4a0
>> sock_sendmsg+0x64/0xac
>> ____sys_sendmsg+0x29c/0x358
>> ___sys_sendmsg+0x90/0xd0
>> __sys_sendmsg+0x7c/0xd0
>> __arm64_sys_sendmsg+0x2c/0x38
>> invoke_syscall+0x54/0x114
>> el0_svc_common.constprop.1+0x90/0x174
>> do_el0_svc+0x3c/0xb0
>> el0_svc+0x24/0xec
>> el0t_64_sync_handler+0x90/0xb4
>> el0t_64_sync+0x174/0x178
>>
>> This is because when mq is added for the first time, qdiscs in mq is set
>> to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we
>> need to initialize qdiscs in the mq before continuing to graft. Otherwise,
>> it will couse NULL pointer dereference issue in mq_attach(). And the same
>> issue will occur in the attach functions of mqprio, taprio and htb.
>> ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow
>> any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and
>> the command should be dropped.
>>
>> Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler")
>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>> ---
>>   net/sched/sch_api.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
>> index fdb8f429333d..dbc9cf5eea89 100644
>> --- a/net/sched/sch_api.c
>> +++ b/net/sched/sch_api.c
>> @@ -1596,6 +1596,10 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
>>                                          NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected");
>>                                          return -ELOOP;
>>                                  }
>> +                               if (clid == TC_H_INGRESS) {
>> +                                       NL_SET_ERR_MSG(extack, "Ingress cannot graft directly");
>> +                                       return -EINVAL;
>> +                               }
>>                                  qdisc_refcount_inc(q);
>>                                  goto graft;
>>                          } else {
>> --
>> 2.34.1
>>
>>
>
Peilin Ye May 29, 2023, 8:59 a.m. UTC | #3
On Mon, May 29, 2023 at 09:10:23AM +0800, shaozhengchao wrote:
> On 2023/5/29 3:05, Jamal Hadi Salim wrote:
> > On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
> > > When use the following command to test:
> > > 1)ip link add bond0 type bond
> > > 2)ip link set bond0 up
> > > 3)tc qdisc add dev bond0 root handle ffff: mq
> > > 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
> >
> > This is fixed by Peilin in this ongoing discussion:
> > https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/
> >
>       Thank you for your reply. I have notice Peilin's patches before,
> and test after the patch is incorporated in local host. But it still
> triggers the problem.
>       Peilin's patches can be filtered out when the query result of
> qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches.
> +if (q->flags & TCQ_F_INGRESS) {
> +     NL_SET_ERR_MSG(extack,
> +                    "Cannot regraft ingress or clsact Qdiscs");
> +     return -EINVAL;
> +}
>       However, the query result of my test case in qdisc_lookup is mq.
> Therefore, the patch cannot solve my problem.

Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs
from being regrafted (to elsewhere), and Zhengchao's patch prevents other
Qdiscs from being regrafted to ffff:fff1.

Thanks,
Peilin Ye
shaozhengchao May 29, 2023, 10:18 a.m. UTC | #4
On 2023/5/29 16:59, Peilin Ye wrote:
> On Mon, May 29, 2023 at 09:10:23AM +0800, shaozhengchao wrote:
>> On 2023/5/29 3:05, Jamal Hadi Salim wrote:
>>> On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>>>> When use the following command to test:
>>>> 1)ip link add bond0 type bond
>>>> 2)ip link set bond0 up
>>>> 3)tc qdisc add dev bond0 root handle ffff: mq
>>>> 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
>>>
>>> This is fixed by Peilin in this ongoing discussion:
>>> https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/
>>>
>>        Thank you for your reply. I have notice Peilin's patches before,
>> and test after the patch is incorporated in local host. But it still
>> triggers the problem.
>>        Peilin's patches can be filtered out when the query result of
>> qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches.
>> +if (q->flags & TCQ_F_INGRESS) {
>> +     NL_SET_ERR_MSG(extack,
>> +                    "Cannot regraft ingress or clsact Qdiscs");
>> +     return -EINVAL;
>> +}
>>        However, the query result of my test case in qdisc_lookup is mq.
>> Therefore, the patch cannot solve my problem.
> 
> Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs
> from being regrafted (to elsewhere), and Zhengchao's patch prevents other
> Qdiscs from being regrafted to ffff:fff1.
> 
> Thanks,
> Peilin Ye
> 
Hi Peilin:
	Thank you for your ack.

Zhengchao Shao
Jamal Hadi Salim May 29, 2023, 1:53 p.m. UTC | #5
On Mon, May 29, 2023 at 4:59 AM Peilin Ye <yepeilin.cs@gmail.com> wrote:
>
> On Mon, May 29, 2023 at 09:10:23AM +0800, shaozhengchao wrote:
> > On 2023/5/29 3:05, Jamal Hadi Salim wrote:
> > > On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
> > > > When use the following command to test:
> > > > 1)ip link add bond0 type bond
> > > > 2)ip link set bond0 up
> > > > 3)tc qdisc add dev bond0 root handle ffff: mq
> > > > 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
> > >
> > > This is fixed by Peilin in this ongoing discussion:
> > > https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/
> > >
> >       Thank you for your reply. I have notice Peilin's patches before,
> > and test after the patch is incorporated in local host. But it still
> > triggers the problem.
> >       Peilin's patches can be filtered out when the query result of
> > qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches.
> > +if (q->flags & TCQ_F_INGRESS) {
> > +     NL_SET_ERR_MSG(extack,
> > +                    "Cannot regraft ingress or clsact Qdiscs");
> > +     return -EINVAL;
> > +}
> >       However, the query result of my test case in qdisc_lookup is mq.
> > Therefore, the patch cannot solve my problem.
>
> Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs
> from being regrafted (to elsewhere), and Zhengchao's patch prevents other
> Qdiscs from being regrafted to ffff:fff1.


Ok, at first glance it was not obvious.
Do we catch all combinations? for egress (0xffffffff) allowed minor is
0xfff3 (clsact::) and 0xffff. For ingress (0xfffffff1) allowed minor
is 0xfff1 and 0xfff2(clsact).

cheers,
jamal

> Thanks,
> Peilin Ye
>
Peilin Ye May 30, 2023, 12:17 a.m. UTC | #6
On Mon, May 29, 2023 at 09:53:28AM -0400, Jamal Hadi Salim wrote:
> On Mon, May 29, 2023 at 4:59 AM Peilin Ye <yepeilin.cs@gmail.com> wrote:
> > On Mon, May 29, 2023 at 09:10:23AM +0800, shaozhengchao wrote:
> > > On 2023/5/29 3:05, Jamal Hadi Salim wrote:
> > > > On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
> > > > > When use the following command to test:
> > > > > 1)ip link add bond0 type bond
> > > > > 2)ip link set bond0 up
> > > > > 3)tc qdisc add dev bond0 root handle ffff: mq
> > > > > 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
> > > >
> > > > This is fixed by Peilin in this ongoing discussion:
> > > > https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/
> > > >
> > >       Thank you for your reply. I have notice Peilin's patches before,
> > > and test after the patch is incorporated in local host. But it still
> > > triggers the problem.
> > >       Peilin's patches can be filtered out when the query result of
> > > qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches.
> > > +if (q->flags & TCQ_F_INGRESS) {
> > > +     NL_SET_ERR_MSG(extack,
> > > +                    "Cannot regraft ingress or clsact Qdiscs");
> > > +     return -EINVAL;
> > > +}
> > >       However, the query result of my test case in qdisc_lookup is mq.
> > > Therefore, the patch cannot solve my problem.
> >
> > Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs
> > from being regrafted (to elsewhere), and Zhengchao's patch prevents other
> > Qdiscs from being regrafted to ffff:fff1.
> 
> Ok, at first glance it was not obvious.
> Do we catch all combinations? for egress (0xffffffff) allowed minor is
> 0xfff3 (clsact::) and 0xffff. For ingress (0xfffffff1) allowed minor
> is 0xfff1 and 0xfff2(clsact).

ffff:fff1 is special in tc_modify_qdisc(); if minor isn't fff1,
tc_modify_qdisc() thinks user wants to graft a Qdisc under existing ingress
or clsact Qdisc:

	if (clid != TC_H_INGRESS) {	/* ffff:fff1 */
		p = qdisc_lookup(dev, TC_H_MAJ(clid));
		if (!p) {
			NL_SET_ERR_MSG(extack, "Failed to find specified qdisc");
			return -ENOENT;
		}
		q = qdisc_leaf(p, clid);
	} else if (dev_ingress_queue_create(dev)) {
		q = dev_ingress_queue(dev)->qdisc_sleeping;
	}

This will go to the "parent != NULL" path in qdisc_graft(), and
sch_{ingress,clsact} doesn't implement cl_ops->graft(), so -EOPNOTSUPP will
be returned.

In short, yes, I think ffff:fff1 is the only case should be fixed.

By the way I just noticed that currently it is possible to create a e.g.
HTB class with a class ID of ffff:fff1...

  $ tc qdisc add dev eth0 root handle ffff: htb default fff1
  $ tc class add dev eth0 \
            parent ffff: classid ffff:fff1 htb rate 100%

Regrafting a Qdisc to such classes won't work as intended at all.  It's a
separate issue though.

Thanks,
Peilin Ye
Peilin Ye May 30, 2023, 1:07 a.m. UTC | #7
On Sat, May 27, 2023 at 05:37:47PM +0800, Zhengchao Shao wrote:
> When use the following command to test:
> 1)ip link add bond0 type bond
> 2)ip link set bond0 up
> 3)tc qdisc add dev bond0 root handle ffff: mq
> 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
> 
> The kernel reports NULL pointer dereference issue. The stack information
> is as follows:
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> Internal error: Oops: 0000000096000006 [#1] SMP
> Modules linked in:
> pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : mq_attach+0x44/0xa0
> lr : qdisc_graft+0x20c/0x5cc
> sp : ffff80000e2236a0
> x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0
> x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1
> x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000
> x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000
> x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000
> x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001
> x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
> x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008
> x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001
> x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000
> Call trace:
> mq_attach+0x44/0xa0
> qdisc_graft+0x20c/0x5cc
> tc_modify_qdisc+0x1c4/0x664
> rtnetlink_rcv_msg+0x354/0x440
> netlink_rcv_skb+0x64/0x144
> rtnetlink_rcv+0x28/0x34
> netlink_unicast+0x1e8/0x2a4
> netlink_sendmsg+0x308/0x4a0
> sock_sendmsg+0x64/0xac
> ____sys_sendmsg+0x29c/0x358
> ___sys_sendmsg+0x90/0xd0
> __sys_sendmsg+0x7c/0xd0
> __arm64_sys_sendmsg+0x2c/0x38
> invoke_syscall+0x54/0x114
> el0_svc_common.constprop.1+0x90/0x174
> do_el0_svc+0x3c/0xb0
> el0_svc+0x24/0xec
> el0t_64_sync_handler+0x90/0xb4
> el0t_64_sync+0x174/0x178
> 
> This is because when mq is added for the first time, qdiscs in mq is set
> to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we
> need to initialize qdiscs in the mq before continuing to graft. Otherwise,
> it will couse NULL pointer dereference issue in mq_attach(). And the same
> issue will occur in the attach functions of mqprio, taprio and htb.
> ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow
> any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and
> the command should be dropped.
> 
> Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler")
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>

Tested-by: Peilin Ye <peilin.ye@bytedance.com>

Thanks,
Peilin Ye
shaozhengchao May 30, 2023, 2:45 a.m. UTC | #8
Hi Peilin:
	Thank you for your reply.
On 2023/5/30 8:17, Peilin Ye wrote:
> On Mon, May 29, 2023 at 09:53:28AM -0400, Jamal Hadi Salim wrote:
>> On Mon, May 29, 2023 at 4:59 AM Peilin Ye <yepeilin.cs@gmail.com> wrote:
>>> On Mon, May 29, 2023 at 09:10:23AM +0800, shaozhengchao wrote:
>>>> On 2023/5/29 3:05, Jamal Hadi Salim wrote:
>>>>> On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>>>>>> When use the following command to test:
>>>>>> 1)ip link add bond0 type bond
>>>>>> 2)ip link set bond0 up
>>>>>> 3)tc qdisc add dev bond0 root handle ffff: mq
>>>>>> 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
>>>>>
>>>>> This is fixed by Peilin in this ongoing discussion:
>>>>> https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/
>>>>>
>>>>        Thank you for your reply. I have notice Peilin's patches before,
>>>> and test after the patch is incorporated in local host. But it still
>>>> triggers the problem.
>>>>        Peilin's patches can be filtered out when the query result of
>>>> qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches.
>>>> +if (q->flags & TCQ_F_INGRESS) {
>>>> +     NL_SET_ERR_MSG(extack,
>>>> +                    "Cannot regraft ingress or clsact Qdiscs");
>>>> +     return -EINVAL;
>>>> +}
>>>>        However, the query result of my test case in qdisc_lookup is mq.
>>>> Therefore, the patch cannot solve my problem.
>>>
>>> Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs
>>> from being regrafted (to elsewhere), and Zhengchao's patch prevents other
>>> Qdiscs from being regrafted to ffff:fff1.
>>
>> Ok, at first glance it was not obvious.
>> Do we catch all combinations? for egress (0xffffffff) allowed minor is
>> 0xfff3 (clsact::) and 0xffff. For ingress (0xfffffff1) allowed minor
>> is 0xfff1 and 0xfff2(clsact).
> 
> ffff:fff1 is special in tc_modify_qdisc(); if minor isn't fff1,
> tc_modify_qdisc() thinks user wants to graft a Qdisc under existing ingress
> or clsact Qdisc:
> 
> 	if (clid != TC_H_INGRESS) {	/* ffff:fff1 */
> 		p = qdisc_lookup(dev, TC_H_MAJ(clid));
> 		if (!p) {
> 			NL_SET_ERR_MSG(extack, "Failed to find specified qdisc");
> 			return -ENOENT;
> 		}
> 		q = qdisc_leaf(p, clid);
> 	} else if (dev_ingress_queue_create(dev)) {
> 		q = dev_ingress_queue(dev)->qdisc_sleeping;
> 	}
> 
> This will go to the "parent != NULL" path in qdisc_graft(), and
> sch_{ingress,clsact} doesn't implement cl_ops->graft(), so -EOPNOTSUPP will
> be returned.
> 
Yes, I agree.
> In short, yes, I think ffff:fff1 is the only case should be fixed.
> 
> By the way I just noticed that currently it is possible to create a e.g.
> HTB class with a class ID of ffff:fff1...
> 
>    $ tc qdisc add dev eth0 root handle ffff: htb default fff1
>    $ tc class add dev eth0 \
>              parent ffff: classid ffff:fff1 htb rate 100%
> 
> Regrafting a Qdisc to such classes won't work as intended at all.  It's a
> separate issue though.
> 
This seems to be another problem.

Zhengchao Shao
> Thanks,
> Peilin Ye
>
Paolo Abeni May 30, 2023, 10:16 a.m. UTC | #9
On Mon, 2023-05-29 at 17:17 -0700, Peilin Ye wrote:
> On Mon, May 29, 2023 at 09:53:28AM -0400, Jamal Hadi Salim wrote:
> > On Mon, May 29, 2023 at 4:59 AM Peilin Ye <yepeilin.cs@gmail.com> wrote:
> > > Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs
> > > from being regrafted (to elsewhere), and Zhengchao's patch prevents other
> > > Qdiscs from being regrafted to ffff:fff1.
> > 
> > Ok, at first glance it was not obvious.
> > Do we catch all combinations? for egress (0xffffffff) allowed minor is
> > 0xfff3 (clsact::) and 0xffff. For ingress (0xfffffff1) allowed minor
> > is 0xfff1 and 0xfff2(clsact).
> 
> ffff:fff1 is special in tc_modify_qdisc(); if minor isn't fff1,
> tc_modify_qdisc() thinks user wants to graft a Qdisc under existing ingress
> or clsact Qdisc:
> 
> 	if (clid != TC_H_INGRESS) {	/* ffff:fff1 */
> 		p = qdisc_lookup(dev, TC_H_MAJ(clid));
> 		if (!p) {
> 			NL_SET_ERR_MSG(extack, "Failed to find specified qdisc");
> 			return -ENOENT;
> 		}
> 		q = qdisc_leaf(p, clid);
> 	} else if (dev_ingress_queue_create(dev)) {
> 		q = dev_ingress_queue(dev)->qdisc_sleeping;
> 	}
> 
> This will go to the "parent != NULL" path in qdisc_graft(), and
> sch_{ingress,clsact} doesn't implement cl_ops->graft(), so -EOPNOTSUPP will
> be returned.
> 
> In short, yes, I think ffff:fff1 is the only case should be fixed.
> 
> By the way I just noticed that currently it is possible to create a e.g.
> HTB class with a class ID of ffff:fff1...
> 
>   $ tc qdisc add dev eth0 root handle ffff: htb default fff1
>   $ tc class add dev eth0 \
>             parent ffff: classid ffff:fff1 htb rate 100%
> 
> Regrafting a Qdisc to such classes won't work as intended at all.  It's a
> separate issue though.

Jamal, are you ok with the above explanation? Perhaps it would be
worthy to add a specific test-case under tc-testing for this issue?

Thanks!

Paolo
Peilin Ye May 31, 2023, 12:35 a.m. UTC | #10
On Tue, May 30, 2023 at 12:16:10PM +0200, Paolo Abeni wrote:
> Perhaps it would be worthy to add a specific test-case under tc-testing
> for this issue?

FYI I've started creating ingress.json tests for the issues fixed in this
series [1].  Zhengchao, I noticed that you've added a lot of tc-tests
before, would you like to add another one for this issue you fixed?  Or I
can do it if you're not going to.

[1] https://lore.kernel.org/r/cover.1685388545.git.peilin.ye@bytedance.com/

Thanks,
Peilin Ye
Jamal Hadi Salim May 31, 2023, 12:43 a.m. UTC | #11
On Tue, May 30, 2023 at 6:16 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2023-05-29 at 17:17 -0700, Peilin Ye wrote:
> > On Mon, May 29, 2023 at 09:53:28AM -0400, Jamal Hadi Salim wrote:
> > > On Mon, May 29, 2023 at 4:59 AM Peilin Ye <yepeilin.cs@gmail.com> wrote:
> > > > Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs
> > > > from being regrafted (to elsewhere), and Zhengchao's patch prevents other
> > > > Qdiscs from being regrafted to ffff:fff1.
> > >
> > > Ok, at first glance it was not obvious.
> > > Do we catch all combinations? for egress (0xffffffff) allowed minor is
> > > 0xfff3 (clsact::) and 0xffff. For ingress (0xfffffff1) allowed minor
> > > is 0xfff1 and 0xfff2(clsact).
> >
> > ffff:fff1 is special in tc_modify_qdisc(); if minor isn't fff1,
> > tc_modify_qdisc() thinks user wants to graft a Qdisc under existing ingress
> > or clsact Qdisc:
> >
> >       if (clid != TC_H_INGRESS) {     /* ffff:fff1 */
> >               p = qdisc_lookup(dev, TC_H_MAJ(clid));
> >               if (!p) {
> >                       NL_SET_ERR_MSG(extack, "Failed to find specified qdisc");
> >                       return -ENOENT;
> >               }
> >               q = qdisc_leaf(p, clid);
> >       } else if (dev_ingress_queue_create(dev)) {
> >               q = dev_ingress_queue(dev)->qdisc_sleeping;
> >       }
> >
> > This will go to the "parent != NULL" path in qdisc_graft(), and
> > sch_{ingress,clsact} doesn't implement cl_ops->graft(), so -EOPNOTSUPP will
> > be returned.
> >
> > In short, yes, I think ffff:fff1 is the only case should be fixed.
> >
> > By the way I just noticed that currently it is possible to create a e.g.
> > HTB class with a class ID of ffff:fff1...
> >
> >   $ tc qdisc add dev eth0 root handle ffff: htb default fff1
> >   $ tc class add dev eth0 \
> >             parent ffff: classid ffff:fff1 htb rate 100%
> >
> > Regrafting a Qdisc to such classes won't work as intended at all.  It's a
> > separate issue though.
>
> Jamal, are you ok with the above explanation? Perhaps it would be
> worthy to add a specific test-case under tc-testing for this issue?
>

I am fine with this one going in as is and then adding tests. I will ACK it.

cheers,
jamal

> Thanks!
>
> Paolo
>
Jamal Hadi Salim May 31, 2023, 12:50 a.m. UTC | #12
On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote:
>
> When use the following command to test:
> 1)ip link add bond0 type bond
> 2)ip link set bond0 up
> 3)tc qdisc add dev bond0 root handle ffff: mq
> 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
>
> The kernel reports NULL pointer dereference issue. The stack information
> is as follows:
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> Internal error: Oops: 0000000096000006 [#1] SMP
> Modules linked in:
> pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : mq_attach+0x44/0xa0
> lr : qdisc_graft+0x20c/0x5cc
> sp : ffff80000e2236a0
> x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0
> x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1
> x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000
> x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000
> x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000
> x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001
> x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
> x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008
> x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001
> x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000
> Call trace:
> mq_attach+0x44/0xa0
> qdisc_graft+0x20c/0x5cc
> tc_modify_qdisc+0x1c4/0x664
> rtnetlink_rcv_msg+0x354/0x440
> netlink_rcv_skb+0x64/0x144
> rtnetlink_rcv+0x28/0x34
> netlink_unicast+0x1e8/0x2a4
> netlink_sendmsg+0x308/0x4a0
> sock_sendmsg+0x64/0xac
> ____sys_sendmsg+0x29c/0x358
> ___sys_sendmsg+0x90/0xd0
> __sys_sendmsg+0x7c/0xd0
> __arm64_sys_sendmsg+0x2c/0x38
> invoke_syscall+0x54/0x114
> el0_svc_common.constprop.1+0x90/0x174
> do_el0_svc+0x3c/0xb0
> el0_svc+0x24/0xec
> el0t_64_sync_handler+0x90/0xb4
> el0t_64_sync+0x174/0x178
>
> This is because when mq is added for the first time, qdiscs in mq is set
> to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we
> need to initialize qdiscs in the mq before continuing to graft. Otherwise,
> it will couse NULL pointer dereference issue in mq_attach(). And the same
> issue will occur in the attach functions of mqprio, taprio and htb.
> ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow
> any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and
> the command should be dropped.
>
> Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler")
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

> ---
>  net/sched/sch_api.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index fdb8f429333d..dbc9cf5eea89 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -1596,6 +1596,10 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
>                                         NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected");
>                                         return -ELOOP;
>                                 }
> +                               if (clid == TC_H_INGRESS) {
> +                                       NL_SET_ERR_MSG(extack, "Ingress cannot graft directly");
> +                                       return -EINVAL;
> +                               }
>                                 qdisc_refcount_inc(q);
>                                 goto graft;
>                         } else {
> --
> 2.34.1
>
>
shaozhengchao May 31, 2023, 1 a.m. UTC | #13
On 2023/5/31 8:35, Peilin Ye wrote:
> On Tue, May 30, 2023 at 12:16:10PM +0200, Paolo Abeni wrote:
>> Perhaps it would be worthy to add a specific test-case under tc-testing
>> for this issue?
> 
> FYI I've started creating ingress.json tests for the issues fixed in this
> series [1].  Zhengchao, I noticed that you've added a lot of tc-tests
> before, would you like to add another one for this issue you fixed?  Or I
> can do it if you're not going to.
> 
> [1] https://lore.kernel.org/r/cover.1685388545.git.peilin.ye@bytedance.com/
> 
> Thanks,
> Peilin Ye
> 
Hi Peilin:
	Thank you for the notice. I'd like to add the test-case in
another patch.

Zhengchao Shao
patchwork-bot+netdevbpf@kernel.org May 31, 2023, 6:40 a.m. UTC | #14
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 27 May 2023 17:37:47 +0800 you wrote:
> When use the following command to test:
> 1)ip link add bond0 type bond
> 2)ip link set bond0 up
> 3)tc qdisc add dev bond0 root handle ffff: mq
> 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq
> 
> The kernel reports NULL pointer dereference issue. The stack information
> is as follows:
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> Internal error: Oops: 0000000096000006 [#1] SMP
> Modules linked in:
> pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : mq_attach+0x44/0xa0
> lr : qdisc_graft+0x20c/0x5cc
> sp : ffff80000e2236a0
> x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0
> x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1
> x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000
> x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000
> x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000
> x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001
> x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
> x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008
> x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001
> x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000
> Call trace:
> mq_attach+0x44/0xa0
> qdisc_graft+0x20c/0x5cc
> tc_modify_qdisc+0x1c4/0x664
> rtnetlink_rcv_msg+0x354/0x440
> netlink_rcv_skb+0x64/0x144
> rtnetlink_rcv+0x28/0x34
> netlink_unicast+0x1e8/0x2a4
> netlink_sendmsg+0x308/0x4a0
> sock_sendmsg+0x64/0xac
> ____sys_sendmsg+0x29c/0x358
> ___sys_sendmsg+0x90/0xd0
> __sys_sendmsg+0x7c/0xd0
> __arm64_sys_sendmsg+0x2c/0x38
> invoke_syscall+0x54/0x114
> el0_svc_common.constprop.1+0x90/0x174
> do_el0_svc+0x3c/0xb0
> el0_svc+0x24/0xec
> el0t_64_sync_handler+0x90/0xb4
> el0t_64_sync+0x174/0x178
> 
> [...]

Here is the summary with links:
  - [net] net: sched: fix NULL pointer dereference in mq_attach
    https://git.kernel.org/netdev/net/c/36eec020fab6

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index fdb8f429333d..dbc9cf5eea89 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1596,6 +1596,10 @@  static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
 					NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected");
 					return -ELOOP;
 				}
+				if (clid == TC_H_INGRESS) {
+					NL_SET_ERR_MSG(extack, "Ingress cannot graft directly");
+					return -EINVAL;
+				}
 				qdisc_refcount_inc(q);
 				goto graft;
 			} else {