Message ID | 20230527093747.3583502-1-shaozhengchao@huawei.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 36eec020fab668719b541f34d97f44e232ffa165 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: sched: fix NULL pointer dereference in mq_attach | expand |
On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote: > > When use the following command to test: > 1)ip link add bond0 type bond > 2)ip link set bond0 up > 3)tc qdisc add dev bond0 root handle ffff: mq > 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq > This is fixed by Peilin in this ongoing discussion: https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/ cheers, jamal > The kernel reports NULL pointer dereference issue. The stack information > is as follows: > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > Internal error: Oops: 0000000096000006 [#1] SMP > Modules linked in: > pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : mq_attach+0x44/0xa0 > lr : qdisc_graft+0x20c/0x5cc > sp : ffff80000e2236a0 > x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0 > x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1 > x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000 > x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000 > x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000 > x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001 > x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 > x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008 > x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001 > x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000 > Call trace: > mq_attach+0x44/0xa0 > qdisc_graft+0x20c/0x5cc > tc_modify_qdisc+0x1c4/0x664 > rtnetlink_rcv_msg+0x354/0x440 > netlink_rcv_skb+0x64/0x144 > rtnetlink_rcv+0x28/0x34 > netlink_unicast+0x1e8/0x2a4 > netlink_sendmsg+0x308/0x4a0 > sock_sendmsg+0x64/0xac > ____sys_sendmsg+0x29c/0x358 > ___sys_sendmsg+0x90/0xd0 > __sys_sendmsg+0x7c/0xd0 > __arm64_sys_sendmsg+0x2c/0x38 > invoke_syscall+0x54/0x114 > el0_svc_common.constprop.1+0x90/0x174 > do_el0_svc+0x3c/0xb0 > el0_svc+0x24/0xec > el0t_64_sync_handler+0x90/0xb4 > el0t_64_sync+0x174/0x178 > > This is because when mq is added for the first time, qdiscs in mq is set > to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we > need to initialize qdiscs in the mq before continuing to graft. Otherwise, > it will couse NULL pointer dereference issue in mq_attach(). And the same > issue will occur in the attach functions of mqprio, taprio and htb. > ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow > any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and > the command should be dropped. > > Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler") > Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> > --- > net/sched/sch_api.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c > index fdb8f429333d..dbc9cf5eea89 100644 > --- a/net/sched/sch_api.c > +++ b/net/sched/sch_api.c > @@ -1596,6 +1596,10 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n, > NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected"); > return -ELOOP; > } > + if (clid == TC_H_INGRESS) { > + NL_SET_ERR_MSG(extack, "Ingress cannot graft directly"); > + return -EINVAL; > + } > qdisc_refcount_inc(q); > goto graft; > } else { > -- > 2.34.1 > >
On 2023/5/29 3:05, Jamal Hadi Salim wrote: > On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote: >> >> When use the following command to test: >> 1)ip link add bond0 type bond >> 2)ip link set bond0 up >> 3)tc qdisc add dev bond0 root handle ffff: mq >> 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq >> > > This is fixed by Peilin in this ongoing discussion: > https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/ > > cheers, > jamal > > Hi jamal: Thank you for your reply. I have notice Peilin's patches before, and test after the patch is incorporated in local host. But it still triggers the problem. Peilin's patches can be filtered out when the query result of qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches. +if (q->flags & TCQ_F_INGRESS) { + NL_SET_ERR_MSG(extack, + "Cannot regraft ingress or clsact Qdiscs"); + return -EINVAL; +} However, the query result of my test case in qdisc_lookup is mq. Therefore, the patch cannot solve my problem. Thank you. Zhengchao Shao > >> The kernel reports NULL pointer dereference issue. The stack information >> is as follows: >> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 >> Internal error: Oops: 0000000096000006 [#1] SMP >> Modules linked in: >> pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> pc : mq_attach+0x44/0xa0 >> lr : qdisc_graft+0x20c/0x5cc >> sp : ffff80000e2236a0 >> x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0 >> x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1 >> x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000 >> x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000 >> x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000 >> x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001 >> x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 >> x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008 >> x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001 >> x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000 >> Call trace: >> mq_attach+0x44/0xa0 >> qdisc_graft+0x20c/0x5cc >> tc_modify_qdisc+0x1c4/0x664 >> rtnetlink_rcv_msg+0x354/0x440 >> netlink_rcv_skb+0x64/0x144 >> rtnetlink_rcv+0x28/0x34 >> netlink_unicast+0x1e8/0x2a4 >> netlink_sendmsg+0x308/0x4a0 >> sock_sendmsg+0x64/0xac >> ____sys_sendmsg+0x29c/0x358 >> ___sys_sendmsg+0x90/0xd0 >> __sys_sendmsg+0x7c/0xd0 >> __arm64_sys_sendmsg+0x2c/0x38 >> invoke_syscall+0x54/0x114 >> el0_svc_common.constprop.1+0x90/0x174 >> do_el0_svc+0x3c/0xb0 >> el0_svc+0x24/0xec >> el0t_64_sync_handler+0x90/0xb4 >> el0t_64_sync+0x174/0x178 >> >> This is because when mq is added for the first time, qdiscs in mq is set >> to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we >> need to initialize qdiscs in the mq before continuing to graft. Otherwise, >> it will couse NULL pointer dereference issue in mq_attach(). And the same >> issue will occur in the attach functions of mqprio, taprio and htb. >> ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow >> any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and >> the command should be dropped. >> >> Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler") >> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> >> --- >> net/sched/sch_api.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c >> index fdb8f429333d..dbc9cf5eea89 100644 >> --- a/net/sched/sch_api.c >> +++ b/net/sched/sch_api.c >> @@ -1596,6 +1596,10 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n, >> NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected"); >> return -ELOOP; >> } >> + if (clid == TC_H_INGRESS) { >> + NL_SET_ERR_MSG(extack, "Ingress cannot graft directly"); >> + return -EINVAL; >> + } >> qdisc_refcount_inc(q); >> goto graft; >> } else { >> -- >> 2.34.1 >> >> >
On Mon, May 29, 2023 at 09:10:23AM +0800, shaozhengchao wrote: > On 2023/5/29 3:05, Jamal Hadi Salim wrote: > > On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote: > > > When use the following command to test: > > > 1)ip link add bond0 type bond > > > 2)ip link set bond0 up > > > 3)tc qdisc add dev bond0 root handle ffff: mq > > > 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq > > > > This is fixed by Peilin in this ongoing discussion: > > https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/ > > > Thank you for your reply. I have notice Peilin's patches before, > and test after the patch is incorporated in local host. But it still > triggers the problem. > Peilin's patches can be filtered out when the query result of > qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches. > +if (q->flags & TCQ_F_INGRESS) { > + NL_SET_ERR_MSG(extack, > + "Cannot regraft ingress or clsact Qdiscs"); > + return -EINVAL; > +} > However, the query result of my test case in qdisc_lookup is mq. > Therefore, the patch cannot solve my problem. Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs from being regrafted (to elsewhere), and Zhengchao's patch prevents other Qdiscs from being regrafted to ffff:fff1. Thanks, Peilin Ye
On 2023/5/29 16:59, Peilin Ye wrote: > On Mon, May 29, 2023 at 09:10:23AM +0800, shaozhengchao wrote: >> On 2023/5/29 3:05, Jamal Hadi Salim wrote: >>> On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote: >>>> When use the following command to test: >>>> 1)ip link add bond0 type bond >>>> 2)ip link set bond0 up >>>> 3)tc qdisc add dev bond0 root handle ffff: mq >>>> 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq >>> >>> This is fixed by Peilin in this ongoing discussion: >>> https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/ >>> >> Thank you for your reply. I have notice Peilin's patches before, >> and test after the patch is incorporated in local host. But it still >> triggers the problem. >> Peilin's patches can be filtered out when the query result of >> qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches. >> +if (q->flags & TCQ_F_INGRESS) { >> + NL_SET_ERR_MSG(extack, >> + "Cannot regraft ingress or clsact Qdiscs"); >> + return -EINVAL; >> +} >> However, the query result of my test case in qdisc_lookup is mq. >> Therefore, the patch cannot solve my problem. > > Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs > from being regrafted (to elsewhere), and Zhengchao's patch prevents other > Qdiscs from being regrafted to ffff:fff1. > > Thanks, > Peilin Ye > Hi Peilin: Thank you for your ack. Zhengchao Shao
On Mon, May 29, 2023 at 4:59 AM Peilin Ye <yepeilin.cs@gmail.com> wrote: > > On Mon, May 29, 2023 at 09:10:23AM +0800, shaozhengchao wrote: > > On 2023/5/29 3:05, Jamal Hadi Salim wrote: > > > On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote: > > > > When use the following command to test: > > > > 1)ip link add bond0 type bond > > > > 2)ip link set bond0 up > > > > 3)tc qdisc add dev bond0 root handle ffff: mq > > > > 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq > > > > > > This is fixed by Peilin in this ongoing discussion: > > > https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/ > > > > > Thank you for your reply. I have notice Peilin's patches before, > > and test after the patch is incorporated in local host. But it still > > triggers the problem. > > Peilin's patches can be filtered out when the query result of > > qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches. > > +if (q->flags & TCQ_F_INGRESS) { > > + NL_SET_ERR_MSG(extack, > > + "Cannot regraft ingress or clsact Qdiscs"); > > + return -EINVAL; > > +} > > However, the query result of my test case in qdisc_lookup is mq. > > Therefore, the patch cannot solve my problem. > > Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs > from being regrafted (to elsewhere), and Zhengchao's patch prevents other > Qdiscs from being regrafted to ffff:fff1. Ok, at first glance it was not obvious. Do we catch all combinations? for egress (0xffffffff) allowed minor is 0xfff3 (clsact::) and 0xffff. For ingress (0xfffffff1) allowed minor is 0xfff1 and 0xfff2(clsact). cheers, jamal > Thanks, > Peilin Ye >
On Mon, May 29, 2023 at 09:53:28AM -0400, Jamal Hadi Salim wrote: > On Mon, May 29, 2023 at 4:59 AM Peilin Ye <yepeilin.cs@gmail.com> wrote: > > On Mon, May 29, 2023 at 09:10:23AM +0800, shaozhengchao wrote: > > > On 2023/5/29 3:05, Jamal Hadi Salim wrote: > > > > On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote: > > > > > When use the following command to test: > > > > > 1)ip link add bond0 type bond > > > > > 2)ip link set bond0 up > > > > > 3)tc qdisc add dev bond0 root handle ffff: mq > > > > > 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq > > > > > > > > This is fixed by Peilin in this ongoing discussion: > > > > https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/ > > > > > > > Thank you for your reply. I have notice Peilin's patches before, > > > and test after the patch is incorporated in local host. But it still > > > triggers the problem. > > > Peilin's patches can be filtered out when the query result of > > > qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches. > > > +if (q->flags & TCQ_F_INGRESS) { > > > + NL_SET_ERR_MSG(extack, > > > + "Cannot regraft ingress or clsact Qdiscs"); > > > + return -EINVAL; > > > +} > > > However, the query result of my test case in qdisc_lookup is mq. > > > Therefore, the patch cannot solve my problem. > > > > Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs > > from being regrafted (to elsewhere), and Zhengchao's patch prevents other > > Qdiscs from being regrafted to ffff:fff1. > > Ok, at first glance it was not obvious. > Do we catch all combinations? for egress (0xffffffff) allowed minor is > 0xfff3 (clsact::) and 0xffff. For ingress (0xfffffff1) allowed minor > is 0xfff1 and 0xfff2(clsact). ffff:fff1 is special in tc_modify_qdisc(); if minor isn't fff1, tc_modify_qdisc() thinks user wants to graft a Qdisc under existing ingress or clsact Qdisc: if (clid != TC_H_INGRESS) { /* ffff:fff1 */ p = qdisc_lookup(dev, TC_H_MAJ(clid)); if (!p) { NL_SET_ERR_MSG(extack, "Failed to find specified qdisc"); return -ENOENT; } q = qdisc_leaf(p, clid); } else if (dev_ingress_queue_create(dev)) { q = dev_ingress_queue(dev)->qdisc_sleeping; } This will go to the "parent != NULL" path in qdisc_graft(), and sch_{ingress,clsact} doesn't implement cl_ops->graft(), so -EOPNOTSUPP will be returned. In short, yes, I think ffff:fff1 is the only case should be fixed. By the way I just noticed that currently it is possible to create a e.g. HTB class with a class ID of ffff:fff1... $ tc qdisc add dev eth0 root handle ffff: htb default fff1 $ tc class add dev eth0 \ parent ffff: classid ffff:fff1 htb rate 100% Regrafting a Qdisc to such classes won't work as intended at all. It's a separate issue though. Thanks, Peilin Ye
On Sat, May 27, 2023 at 05:37:47PM +0800, Zhengchao Shao wrote: > When use the following command to test: > 1)ip link add bond0 type bond > 2)ip link set bond0 up > 3)tc qdisc add dev bond0 root handle ffff: mq > 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq > > The kernel reports NULL pointer dereference issue. The stack information > is as follows: > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > Internal error: Oops: 0000000096000006 [#1] SMP > Modules linked in: > pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : mq_attach+0x44/0xa0 > lr : qdisc_graft+0x20c/0x5cc > sp : ffff80000e2236a0 > x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0 > x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1 > x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000 > x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000 > x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000 > x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001 > x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 > x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008 > x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001 > x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000 > Call trace: > mq_attach+0x44/0xa0 > qdisc_graft+0x20c/0x5cc > tc_modify_qdisc+0x1c4/0x664 > rtnetlink_rcv_msg+0x354/0x440 > netlink_rcv_skb+0x64/0x144 > rtnetlink_rcv+0x28/0x34 > netlink_unicast+0x1e8/0x2a4 > netlink_sendmsg+0x308/0x4a0 > sock_sendmsg+0x64/0xac > ____sys_sendmsg+0x29c/0x358 > ___sys_sendmsg+0x90/0xd0 > __sys_sendmsg+0x7c/0xd0 > __arm64_sys_sendmsg+0x2c/0x38 > invoke_syscall+0x54/0x114 > el0_svc_common.constprop.1+0x90/0x174 > do_el0_svc+0x3c/0xb0 > el0_svc+0x24/0xec > el0t_64_sync_handler+0x90/0xb4 > el0t_64_sync+0x174/0x178 > > This is because when mq is added for the first time, qdiscs in mq is set > to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we > need to initialize qdiscs in the mq before continuing to graft. Otherwise, > it will couse NULL pointer dereference issue in mq_attach(). And the same > issue will occur in the attach functions of mqprio, taprio and htb. > ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow > any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and > the command should be dropped. > > Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler") > Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Tested-by: Peilin Ye <peilin.ye@bytedance.com> Thanks, Peilin Ye
Hi Peilin: Thank you for your reply. On 2023/5/30 8:17, Peilin Ye wrote: > On Mon, May 29, 2023 at 09:53:28AM -0400, Jamal Hadi Salim wrote: >> On Mon, May 29, 2023 at 4:59 AM Peilin Ye <yepeilin.cs@gmail.com> wrote: >>> On Mon, May 29, 2023 at 09:10:23AM +0800, shaozhengchao wrote: >>>> On 2023/5/29 3:05, Jamal Hadi Salim wrote: >>>>> On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote: >>>>>> When use the following command to test: >>>>>> 1)ip link add bond0 type bond >>>>>> 2)ip link set bond0 up >>>>>> 3)tc qdisc add dev bond0 root handle ffff: mq >>>>>> 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq >>>>> >>>>> This is fixed by Peilin in this ongoing discussion: >>>>> https://lore.kernel.org/netdev/cover.1684887977.git.peilin.ye@bytedance.com/ >>>>> >>>> Thank you for your reply. I have notice Peilin's patches before, >>>> and test after the patch is incorporated in local host. But it still >>>> triggers the problem. >>>> Peilin's patches can be filtered out when the query result of >>>> qdisc_lookup is of the ingress type. Here is 4/6 patch in his patches. >>>> +if (q->flags & TCQ_F_INGRESS) { >>>> + NL_SET_ERR_MSG(extack, >>>> + "Cannot regraft ingress or clsact Qdiscs"); >>>> + return -EINVAL; >>>> +} >>>> However, the query result of my test case in qdisc_lookup is mq. >>>> Therefore, the patch cannot solve my problem. >>> >>> Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs >>> from being regrafted (to elsewhere), and Zhengchao's patch prevents other >>> Qdiscs from being regrafted to ffff:fff1. >> >> Ok, at first glance it was not obvious. >> Do we catch all combinations? for egress (0xffffffff) allowed minor is >> 0xfff3 (clsact::) and 0xffff. For ingress (0xfffffff1) allowed minor >> is 0xfff1 and 0xfff2(clsact). > > ffff:fff1 is special in tc_modify_qdisc(); if minor isn't fff1, > tc_modify_qdisc() thinks user wants to graft a Qdisc under existing ingress > or clsact Qdisc: > > if (clid != TC_H_INGRESS) { /* ffff:fff1 */ > p = qdisc_lookup(dev, TC_H_MAJ(clid)); > if (!p) { > NL_SET_ERR_MSG(extack, "Failed to find specified qdisc"); > return -ENOENT; > } > q = qdisc_leaf(p, clid); > } else if (dev_ingress_queue_create(dev)) { > q = dev_ingress_queue(dev)->qdisc_sleeping; > } > > This will go to the "parent != NULL" path in qdisc_graft(), and > sch_{ingress,clsact} doesn't implement cl_ops->graft(), so -EOPNOTSUPP will > be returned. > Yes, I agree. > In short, yes, I think ffff:fff1 is the only case should be fixed. > > By the way I just noticed that currently it is possible to create a e.g. > HTB class with a class ID of ffff:fff1... > > $ tc qdisc add dev eth0 root handle ffff: htb default fff1 > $ tc class add dev eth0 \ > parent ffff: classid ffff:fff1 htb rate 100% > > Regrafting a Qdisc to such classes won't work as intended at all. It's a > separate issue though. > This seems to be another problem. Zhengchao Shao > Thanks, > Peilin Ye >
On Mon, 2023-05-29 at 17:17 -0700, Peilin Ye wrote: > On Mon, May 29, 2023 at 09:53:28AM -0400, Jamal Hadi Salim wrote: > > On Mon, May 29, 2023 at 4:59 AM Peilin Ye <yepeilin.cs@gmail.com> wrote: > > > Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs > > > from being regrafted (to elsewhere), and Zhengchao's patch prevents other > > > Qdiscs from being regrafted to ffff:fff1. > > > > Ok, at first glance it was not obvious. > > Do we catch all combinations? for egress (0xffffffff) allowed minor is > > 0xfff3 (clsact::) and 0xffff. For ingress (0xfffffff1) allowed minor > > is 0xfff1 and 0xfff2(clsact). > > ffff:fff1 is special in tc_modify_qdisc(); if minor isn't fff1, > tc_modify_qdisc() thinks user wants to graft a Qdisc under existing ingress > or clsact Qdisc: > > if (clid != TC_H_INGRESS) { /* ffff:fff1 */ > p = qdisc_lookup(dev, TC_H_MAJ(clid)); > if (!p) { > NL_SET_ERR_MSG(extack, "Failed to find specified qdisc"); > return -ENOENT; > } > q = qdisc_leaf(p, clid); > } else if (dev_ingress_queue_create(dev)) { > q = dev_ingress_queue(dev)->qdisc_sleeping; > } > > This will go to the "parent != NULL" path in qdisc_graft(), and > sch_{ingress,clsact} doesn't implement cl_ops->graft(), so -EOPNOTSUPP will > be returned. > > In short, yes, I think ffff:fff1 is the only case should be fixed. > > By the way I just noticed that currently it is possible to create a e.g. > HTB class with a class ID of ffff:fff1... > > $ tc qdisc add dev eth0 root handle ffff: htb default fff1 > $ tc class add dev eth0 \ > parent ffff: classid ffff:fff1 htb rate 100% > > Regrafting a Qdisc to such classes won't work as intended at all. It's a > separate issue though. Jamal, are you ok with the above explanation? Perhaps it would be worthy to add a specific test-case under tc-testing for this issue? Thanks! Paolo
On Tue, May 30, 2023 at 12:16:10PM +0200, Paolo Abeni wrote: > Perhaps it would be worthy to add a specific test-case under tc-testing > for this issue? FYI I've started creating ingress.json tests for the issues fixed in this series [1]. Zhengchao, I noticed that you've added a lot of tc-tests before, would you like to add another one for this issue you fixed? Or I can do it if you're not going to. [1] https://lore.kernel.org/r/cover.1685388545.git.peilin.ye@bytedance.com/ Thanks, Peilin Ye
On Tue, May 30, 2023 at 6:16 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On Mon, 2023-05-29 at 17:17 -0700, Peilin Ye wrote: > > On Mon, May 29, 2023 at 09:53:28AM -0400, Jamal Hadi Salim wrote: > > > On Mon, May 29, 2023 at 4:59 AM Peilin Ye <yepeilin.cs@gmail.com> wrote: > > > > Ack, they are different: patch [4/6] prevents ingress (clsact) Qdiscs > > > > from being regrafted (to elsewhere), and Zhengchao's patch prevents other > > > > Qdiscs from being regrafted to ffff:fff1. > > > > > > Ok, at first glance it was not obvious. > > > Do we catch all combinations? for egress (0xffffffff) allowed minor is > > > 0xfff3 (clsact::) and 0xffff. For ingress (0xfffffff1) allowed minor > > > is 0xfff1 and 0xfff2(clsact). > > > > ffff:fff1 is special in tc_modify_qdisc(); if minor isn't fff1, > > tc_modify_qdisc() thinks user wants to graft a Qdisc under existing ingress > > or clsact Qdisc: > > > > if (clid != TC_H_INGRESS) { /* ffff:fff1 */ > > p = qdisc_lookup(dev, TC_H_MAJ(clid)); > > if (!p) { > > NL_SET_ERR_MSG(extack, "Failed to find specified qdisc"); > > return -ENOENT; > > } > > q = qdisc_leaf(p, clid); > > } else if (dev_ingress_queue_create(dev)) { > > q = dev_ingress_queue(dev)->qdisc_sleeping; > > } > > > > This will go to the "parent != NULL" path in qdisc_graft(), and > > sch_{ingress,clsact} doesn't implement cl_ops->graft(), so -EOPNOTSUPP will > > be returned. > > > > In short, yes, I think ffff:fff1 is the only case should be fixed. > > > > By the way I just noticed that currently it is possible to create a e.g. > > HTB class with a class ID of ffff:fff1... > > > > $ tc qdisc add dev eth0 root handle ffff: htb default fff1 > > $ tc class add dev eth0 \ > > parent ffff: classid ffff:fff1 htb rate 100% > > > > Regrafting a Qdisc to such classes won't work as intended at all. It's a > > separate issue though. > > Jamal, are you ok with the above explanation? Perhaps it would be > worthy to add a specific test-case under tc-testing for this issue? > I am fine with this one going in as is and then adding tests. I will ACK it. cheers, jamal > Thanks! > > Paolo >
On Sat, May 27, 2023 at 5:30 AM Zhengchao Shao <shaozhengchao@huawei.com> wrote: > > When use the following command to test: > 1)ip link add bond0 type bond > 2)ip link set bond0 up > 3)tc qdisc add dev bond0 root handle ffff: mq > 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq > > The kernel reports NULL pointer dereference issue. The stack information > is as follows: > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > Internal error: Oops: 0000000096000006 [#1] SMP > Modules linked in: > pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : mq_attach+0x44/0xa0 > lr : qdisc_graft+0x20c/0x5cc > sp : ffff80000e2236a0 > x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0 > x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1 > x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000 > x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000 > x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000 > x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001 > x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 > x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008 > x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001 > x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000 > Call trace: > mq_attach+0x44/0xa0 > qdisc_graft+0x20c/0x5cc > tc_modify_qdisc+0x1c4/0x664 > rtnetlink_rcv_msg+0x354/0x440 > netlink_rcv_skb+0x64/0x144 > rtnetlink_rcv+0x28/0x34 > netlink_unicast+0x1e8/0x2a4 > netlink_sendmsg+0x308/0x4a0 > sock_sendmsg+0x64/0xac > ____sys_sendmsg+0x29c/0x358 > ___sys_sendmsg+0x90/0xd0 > __sys_sendmsg+0x7c/0xd0 > __arm64_sys_sendmsg+0x2c/0x38 > invoke_syscall+0x54/0x114 > el0_svc_common.constprop.1+0x90/0x174 > do_el0_svc+0x3c/0xb0 > el0_svc+0x24/0xec > el0t_64_sync_handler+0x90/0xb4 > el0t_64_sync+0x174/0x178 > > This is because when mq is added for the first time, qdiscs in mq is set > to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we > need to initialize qdiscs in the mq before continuing to graft. Otherwise, > it will couse NULL pointer dereference issue in mq_attach(). And the same > issue will occur in the attach functions of mqprio, taprio and htb. > ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow > any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and > the command should be dropped. > > Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler") > Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> cheers, jamal > --- > net/sched/sch_api.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c > index fdb8f429333d..dbc9cf5eea89 100644 > --- a/net/sched/sch_api.c > +++ b/net/sched/sch_api.c > @@ -1596,6 +1596,10 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n, > NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected"); > return -ELOOP; > } > + if (clid == TC_H_INGRESS) { > + NL_SET_ERR_MSG(extack, "Ingress cannot graft directly"); > + return -EINVAL; > + } > qdisc_refcount_inc(q); > goto graft; > } else { > -- > 2.34.1 > >
On 2023/5/31 8:35, Peilin Ye wrote: > On Tue, May 30, 2023 at 12:16:10PM +0200, Paolo Abeni wrote: >> Perhaps it would be worthy to add a specific test-case under tc-testing >> for this issue? > > FYI I've started creating ingress.json tests for the issues fixed in this > series [1]. Zhengchao, I noticed that you've added a lot of tc-tests > before, would you like to add another one for this issue you fixed? Or I > can do it if you're not going to. > > [1] https://lore.kernel.org/r/cover.1685388545.git.peilin.ye@bytedance.com/ > > Thanks, > Peilin Ye > Hi Peilin: Thank you for the notice. I'd like to add the test-case in another patch. Zhengchao Shao
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Sat, 27 May 2023 17:37:47 +0800 you wrote: > When use the following command to test: > 1)ip link add bond0 type bond > 2)ip link set bond0 up > 3)tc qdisc add dev bond0 root handle ffff: mq > 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq > > The kernel reports NULL pointer dereference issue. The stack information > is as follows: > Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 > Internal error: Oops: 0000000096000006 [#1] SMP > Modules linked in: > pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) > pc : mq_attach+0x44/0xa0 > lr : qdisc_graft+0x20c/0x5cc > sp : ffff80000e2236a0 > x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0 > x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1 > x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000 > x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000 > x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000 > x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001 > x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 > x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008 > x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001 > x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000 > Call trace: > mq_attach+0x44/0xa0 > qdisc_graft+0x20c/0x5cc > tc_modify_qdisc+0x1c4/0x664 > rtnetlink_rcv_msg+0x354/0x440 > netlink_rcv_skb+0x64/0x144 > rtnetlink_rcv+0x28/0x34 > netlink_unicast+0x1e8/0x2a4 > netlink_sendmsg+0x308/0x4a0 > sock_sendmsg+0x64/0xac > ____sys_sendmsg+0x29c/0x358 > ___sys_sendmsg+0x90/0xd0 > __sys_sendmsg+0x7c/0xd0 > __arm64_sys_sendmsg+0x2c/0x38 > invoke_syscall+0x54/0x114 > el0_svc_common.constprop.1+0x90/0x174 > do_el0_svc+0x3c/0xb0 > el0_svc+0x24/0xec > el0t_64_sync_handler+0x90/0xb4 > el0t_64_sync+0x174/0x178 > > [...] Here is the summary with links: - [net] net: sched: fix NULL pointer dereference in mq_attach https://git.kernel.org/netdev/net/c/36eec020fab6 You are awesome, thank you!
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index fdb8f429333d..dbc9cf5eea89 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1596,6 +1596,10 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n, NL_SET_ERR_MSG(extack, "Qdisc parent/child loop detected"); return -ELOOP; } + if (clid == TC_H_INGRESS) { + NL_SET_ERR_MSG(extack, "Ingress cannot graft directly"); + return -EINVAL; + } qdisc_refcount_inc(q); goto graft; } else {
When use the following command to test: 1)ip link add bond0 type bond 2)ip link set bond0 up 3)tc qdisc add dev bond0 root handle ffff: mq 4)tc qdisc replace dev bond0 parent ffff:fff1 handle ffff: mq The kernel reports NULL pointer dereference issue. The stack information is as follows: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Internal error: Oops: 0000000096000006 [#1] SMP Modules linked in: pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : mq_attach+0x44/0xa0 lr : qdisc_graft+0x20c/0x5cc sp : ffff80000e2236a0 x29: ffff80000e2236a0 x28: ffff0000c0e59d80 x27: ffff0000c0be19c0 x26: ffff0000cae3e800 x25: 0000000000000010 x24: 00000000fffffff1 x23: 0000000000000000 x22: ffff0000cae3e800 x21: ffff0000c9df4000 x20: ffff0000c9df4000 x19: 0000000000000000 x18: ffff80000a934000 x17: ffff8000f5b56000 x16: ffff80000bb08000 x15: 0000000000000000 x14: 0000000000000000 x13: 6b6b6b6b6b6b6b6b x12: 6b6b6b6b00000001 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : ffff0000c0be0730 x7 : bbbbbbbbbbbbbbbb x6 : 0000000000000008 x5 : ffff0000cae3e864 x4 : 0000000000000000 x3 : 0000000000000001 x2 : 0000000000000001 x1 : ffff8000090bc23c x0 : 0000000000000000 Call trace: mq_attach+0x44/0xa0 qdisc_graft+0x20c/0x5cc tc_modify_qdisc+0x1c4/0x664 rtnetlink_rcv_msg+0x354/0x440 netlink_rcv_skb+0x64/0x144 rtnetlink_rcv+0x28/0x34 netlink_unicast+0x1e8/0x2a4 netlink_sendmsg+0x308/0x4a0 sock_sendmsg+0x64/0xac ____sys_sendmsg+0x29c/0x358 ___sys_sendmsg+0x90/0xd0 __sys_sendmsg+0x7c/0xd0 __arm64_sys_sendmsg+0x2c/0x38 invoke_syscall+0x54/0x114 el0_svc_common.constprop.1+0x90/0x174 do_el0_svc+0x3c/0xb0 el0_svc+0x24/0xec el0t_64_sync_handler+0x90/0xb4 el0t_64_sync+0x174/0x178 This is because when mq is added for the first time, qdiscs in mq is set to NULL in mq_attach(). Therefore, when replacing mq after adding mq, we need to initialize qdiscs in the mq before continuing to graft. Otherwise, it will couse NULL pointer dereference issue in mq_attach(). And the same issue will occur in the attach functions of mqprio, taprio and htb. ffff:fff1 means that the repalce qdisc is ingress. Ingress does not allow any qdisc to be attached. Therefore, ffff:fff1 is incorrectly used, and the command should be dropped. Fixes: 6ec1c69a8f64 ("net_sched: add classful multiqueue dummy scheduler") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> --- net/sched/sch_api.c | 4 ++++ 1 file changed, 4 insertions(+)