Message ID | 20230608062756.3626573-1-shaozhengchao@huawei.com (mailing list archive) |
---|---|
State | Accepted |
Commit | be3618d9651002cd5ff190dbfc6cf78f03e34e27 |
Delegated to: | Netdev Maintainers |
Series | [net,v2] net/sched: taprio: fix slab-out-of-bounds Read in taprio_dequeue_from_txq |
On 08/06/2023 03:27, Zhengchao Shao wrote:
> As shown in [1], out-of-bounds access occurs in two cases:
> 1)when the qdisc of the taprio type is used to replace the previously
> configured taprio, count and offset in tc_to_txq can be set to 0. In this
> case, the value of *txq in taprio_next_tc_txq() will increases
> continuously. When the number of accessed queues exceeds the number of
> queues on the device, out-of-bounds access occurs.
> 2)When packets are dequeued, taprio can be deleted. In this case, the tc
> rule of dev is cleared. The count and offset values are also set to 0. In
> this case, out-of-bounds access is also caused.
>
> Now the restriction on the queue number is added.
>
> [1] https://groups.google.com/g/syzkaller-bugs/c/_lYOKgkBVMg
> Fixes: 2f530df76c8c ("net/sched: taprio: give higher priority to higher TCs in software dequeue mode")
> Reported-by: syzbot+04afcb3d2c840447559a@syzkaller.appspotmail.com
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>

Tested-by: Pedro Tammela <pctammela@mojatatu.com>

> ---
> v2: set q->cur_txq[tc] to prevent out-of-bounds access during next dequeue
> ---
>  net/sched/sch_taprio.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 3c4c2c334878..82983a6eb8f8 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -799,6 +799,9 @@ static struct sk_buff *taprio_dequeue_tc_priority(struct Qdisc *sch,
>
>  			taprio_next_tc_txq(dev, tc, &q->cur_txq[tc]);
>
> +			if (q->cur_txq[tc] >= dev->num_tx_queues)
> +				q->cur_txq[tc] = first_txq;
> +
>  			if (skb)
>  				return skb;
>  		} while (q->cur_txq[tc] != first_txq);
Zhengchao Shao <shaozhengchao@huawei.com> writes:

> As shown in [1], out-of-bounds access occurs in two cases:
> 1)when the qdisc of the taprio type is used to replace the previously
> configured taprio, count and offset in tc_to_txq can be set to 0. In this
> case, the value of *txq in taprio_next_tc_txq() will increases
> continuously. When the number of accessed queues exceeds the number of
> queues on the device, out-of-bounds access occurs.

The more I think about this, the more I think the problem is somewhere
else, i.e. even enqueuing a packet from a TC with zero queues associated
with it doesn't make much sense.

The behaviors that make more sense to me are:
1. reject configurations with '0@0' as invalid;
2. drop the packets from TCs mapped to the "empty set" queue (0@0)
   during enqueue();

btw, (2) sounds better to me at this point.

Or is there another valid/sensible interpretation to '0@0' that I am missing?

> 2)When packets are dequeued, taprio can be deleted. In this case, the tc
> rule of dev is cleared. The count and offset values are also set to 0. In
> this case, out-of-bounds access is also caused.

This looks more like working around the issue than fixing it, and
it is just a coincidence that both issues have the same symptoms.

>
> Now the restriction on the queue number is added.
>
> [1] https://groups.google.com/g/syzkaller-bugs/c/_lYOKgkBVMg
> Fixes: 2f530df76c8c ("net/sched: taprio: give higher priority to higher TCs in software dequeue mode")
> Reported-by: syzbot+04afcb3d2c840447559a@syzkaller.appspotmail.com
> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
> ---
> v2: set q->cur_txq[tc] to prevent out-of-bounds access during next dequeue
> ---
>  net/sched/sch_taprio.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 3c4c2c334878..82983a6eb8f8 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -799,6 +799,9 @@ static struct sk_buff *taprio_dequeue_tc_priority(struct Qdisc *sch,
>
>  			taprio_next_tc_txq(dev, tc, &q->cur_txq[tc]);
>
> +			if (q->cur_txq[tc] >= dev->num_tx_queues)
> +				q->cur_txq[tc] = first_txq;
> +
>  			if (skb)
>  				return skb;
>  		} while (q->cur_txq[tc] != first_txq);
> --
> 2.34.1
>
>
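Option (1) above, rejecting '0@0' as invalid, amounts to a control-path check on the per-TC queue map. A minimal userspace sketch follows, assuming a simplified stand-in for dev->tc_to_txq[]; the struct and function names are hypothetical and are not taken from the kernel:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one dev->tc_to_txq[] entry ("count@offset"). */
struct tc_txq_model {
	int count;
	int offset;
};

/*
 * Return the index of the first traffic class whose queue range is
 * unusable, or -1 if every class has at least one queue and stays
 * within the device's TX queues. A "0@0" entry fails the first test.
 */
int first_invalid_tc(const struct tc_txq_model *map, size_t num_tc,
		     int num_tx_queues)
{
	assert(map != NULL);

	for (size_t tc = 0; tc < num_tc; tc++) {
		if (map[tc].count <= 0)
			return (int)tc;	/* no queues for this TC */
		if (map[tc].offset < 0 ||
		    map[tc].offset + map[tc].count > num_tx_queues)
			return (int)tc;	/* range exceeds the device */
	}
	return -1;
}
```

A configuration path would call such a check before committing the new map and refuse the request when the result is not -1.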
On 2023/6/9 8:42, Vinicius Costa Gomes wrote:
> Zhengchao Shao <shaozhengchao@huawei.com> writes:
>
>> As shown in [1], out-of-bounds access occurs in two cases:
>> 1)when the qdisc of the taprio type is used to replace the previously
>> configured taprio, count and offset in tc_to_txq can be set to 0. In this
>> case, the value of *txq in taprio_next_tc_txq() will increases
>> continuously. When the number of accessed queues exceeds the number of
>> queues on the device, out-of-bounds access occurs.
>
Hi Vinicius:
	Thank you for your reply.

> The more I think about this, the more I think the problem is somewhere
> else, i.e. even enqueuing a packet from a TC with zero queues associated
> with it doesn't make much sense.
>
> The behaviors that make more sense to me are:
> 1. reject configurations with '0@0' as invalid;
> 2. drop the packets from TCs mapped to the "empty set" queue (0@0)
> during enqueue();
>
> btw, (2) sounds better to me at this point.
>
> Or is there another valid/sensible interpretation to '0@0' that I am missing?

I think I know what you mean. Your intention is to make judgments
simultaneously during the enqueue process, as shown below?

 static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			  struct sk_buff **to_free)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
 	struct Qdisc *child;
 	int queue;
+	int i;
+
+	for (i = 0; i < dev->num_tc; i++) {
+		if (unlikely(!dev->tc_to_txq[i].count))
+			return qdisc_drop(skb, sch, to_free);
+	}

 	queue = skb_get_queue_mapping(skb);

Is it like this?

>
>> 2)When packets are dequeued, taprio can be deleted. In this case, the tc
>> rule of dev is cleared. The count and offset values are also set to 0. In
>> this case, out-of-bounds access is also caused.
>
> This looks like more like working around the issue than fixing it, and
> it just happens, it's a coincidence, that both issues have the same
> symptoms.
>

There are many trigger paths for this problem, and I worry that there
may be missing scenarios after I modify taprio_change and
taprio_destroy, so I modify the dequeue process. Do you have any other
ideas? Thanks.

Zhengchao Shao

>>
>> Now the restriction on the queue number is added.
>>
>> [1] https://groups.google.com/g/syzkaller-bugs/c/_lYOKgkBVMg
>> Fixes: 2f530df76c8c ("net/sched: taprio: give higher priority to higher TCs in software dequeue mode")
>> Reported-by: syzbot+04afcb3d2c840447559a@syzkaller.appspotmail.com
>> Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
>> ---
>> v2: set q->cur_txq[tc] to prevent out-of-bounds access during next dequeue
>> ---
>>  net/sched/sch_taprio.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
>> index 3c4c2c334878..82983a6eb8f8 100644
>> --- a/net/sched/sch_taprio.c
>> +++ b/net/sched/sch_taprio.c
>> @@ -799,6 +799,9 @@ static struct sk_buff *taprio_dequeue_tc_priority(struct Qdisc *sch,
>>
>>  			taprio_next_tc_txq(dev, tc, &q->cur_txq[tc]);
>>
>> +			if (q->cur_txq[tc] >= dev->num_tx_queues)
>> +				q->cur_txq[tc] = first_txq;
>> +
>>  			if (skb)
>>  				return skb;
>>  		} while (q->cur_txq[tc] != first_txq);
>> --
>> 2.34.1
>>
>>
>
On Fri, Jun 09, 2023 at 09:57:20AM +0800, shaozhengchao wrote:
> > btw, (2) sounds better to me at this point.
> >
> > Or is there another valid/sensible interpretation to '0@0' that I am missing?
> I think I know what you mean. Your intention is to make judgments
> simultaneously during the enqueue process, as shown below?
>
> static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> 			  struct sk_buff **to_free)
>  {
> 	struct taprio_sched *q = qdisc_priv(sch);
> +	struct net_device *dev = qdisc_dev(sch);
> 	struct Qdisc *child;
> 	int queue;
> +	int i;
> +
> +	for (i = 0; i < dev->num_tc; i++) {
> +		if (unlikely(!dev->tc_to_txq[i].count))
> +			return qdisc_drop(skb, sch, to_free);
> +	}
>
> 	queue = skb_get_queue_mapping(skb);
>
> Is it like this?

No. If we go down this path (not saying that we should), you should only
validate the queue count of the packet's traffic class, not all queue counts...

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 978c3504fbaa..d1d10341278d 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -633,11 +633,16 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			  struct sk_buff **to_free)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
+	struct net_device *dev = qdisc_dev(sch);
+	int tc, queue, prio = skb->priority;
 	struct Qdisc *child;
-	int queue;

 	queue = skb_get_queue_mapping(skb);

+	tc = netdev_get_prio_tc_map(dev, prio);
+	if (!dev->tc_to_txq[tc].count)
+		return qdisc_drop(skb, sch, to_free);
+
 	child = q->qdiscs[queue];
 	if (unlikely(!child))
 		return qdisc_drop(skb, sch, to_free);

>
> >
> > > 2)When packets are dequeued, taprio can be deleted. In this case, the tc
> > > rule of dev is cleared. The count and offset values are also set to 0. In
> > > this case, out-of-bounds access is also caused.
> >
> > This looks like more like working around the issue than fixing it, and
> > it just happens, it's a coincidence, that both issues have the same
> > symptoms.
> >
> There are many trigger paths for this problem, and I worry that there
> may be missing scenarios after I modify taprio_change and
> taprio_destroy, so I modify the dequeue process.

Many other trigger paths like what?

The main code path leading to 0 TXQs for a traffic class that Vinicius
seems to worry about ("queues 0@0" in configuration) should already be
rejected by mqprio_validate_queue_counts():

tc qdisc replace dev eno0 handle 8001: parent root stab overhead 24 taprio \
	num_tc 3 map 0 1 2 queues 0@0 0@0 0@0 base-time 200 \
	sched-entry S 80 20000 sched-entry S a0 20000 sched-entry S 5f 60000 clockid CLOCK_TAI
Error: sch_mqprio_lib: No queues for TC 0.

We should thus concentrate on the other (involuntary) code paths that
can lead to there being 0 TXQs for a TC. Modifying the data path because
we can't figure out the control path seems desperate.

Is there a reproducer for the bug?
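The difference between the two enqueue checks discussed above (any TC empty versus only the packet's own TC empty) can be illustrated in userspace. This is a minimal sketch under stated assumptions: the structs, the MODEL_* constants and both function names are hypothetical, and the priority lookup only imitates the masked table lookup that netdev_get_prio_tc_map() is assumed to perform:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_TC	16	/* hypothetical bound on traffic classes */
#define MODEL_PRIO_MASK	15	/* hypothetical mask applied to skb priority */

struct tc_txq_model {
	int count;
	int offset;
};

struct netdev_model {
	int prio_tc_map[MODEL_PRIO_MASK + 1];
	struct tc_txq_model tc_to_txq[MODEL_MAX_TC];
};

/* First proposal: drop if ANY of the first num_tc classes has no queues. */
bool any_tc_empty(const struct netdev_model *dev, int num_tc)
{
	for (int i = 0; i < num_tc; i++)
		if (dev->tc_to_txq[i].count == 0)
			return true;
	return false;
}

/* Per-packet variant: drop only if the packet's own class has no queues. */
bool enqueue_should_drop(const struct netdev_model *dev, unsigned int prio)
{
	int tc = dev->prio_tc_map[prio & MODEL_PRIO_MASK];

	assert(tc >= 0 && tc < MODEL_MAX_TC);
	return dev->tc_to_txq[tc].count == 0;
}
```

With TC 0 holding two queues and TC 1 holding none, the per-packet variant keeps priority-0 traffic flowing and drops only priority-1 traffic, whereas the all-classes loop would drop every packet.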
Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 8 Jun 2023 14:27:56 +0800 you wrote:
> As shown in [1], out-of-bounds access occurs in two cases:
> 1)when the qdisc of the taprio type is used to replace the previously
> configured taprio, count and offset in tc_to_txq can be set to 0. In this
> case, the value of *txq in taprio_next_tc_txq() will increases
> continuously. When the number of accessed queues exceeds the number of
> queues on the device, out-of-bounds access occurs.
> 2)When packets are dequeued, taprio can be deleted. In this case, the tc
> rule of dev is cleared. The count and offset values are also set to 0. In
> this case, out-of-bounds access is also caused.
>
> [...]

Here is the summary with links:
  - [net,v2] net/sched: taprio: fix slab-out-of-bounds Read in taprio_dequeue_from_txq
    https://git.kernel.org/netdev/net/c/be3618d96510

You are awesome, thank you!
On 2023/6/9 17:45, Vladimir Oltean wrote:
> On Fri, Jun 09, 2023 at 09:57:20AM +0800, shaozhengchao wrote:
>>> btw, (2) sounds better to me at this point.
>>>
>>> Or is there another valid/sensible interpretation to '0@0' that I am missing?
>> I think I know what you mean. Your intention is to make judgments
>> simultaneously during the enqueue process, as shown below?
>>
>> static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>> 			  struct sk_buff **to_free)
>>  {
>> 	struct taprio_sched *q = qdisc_priv(sch);
>> +	struct net_device *dev = qdisc_dev(sch);
>> 	struct Qdisc *child;
>> 	int queue;
>> +	int i;
>> +
>> +	for (i = 0; i < dev->num_tc; i++) {
>> +		if (unlikely(!dev->tc_to_txq[i].count))
>> +			return qdisc_drop(skb, sch, to_free);
>> +	}
>>
>> 	queue = skb_get_queue_mapping(skb);
>>
>> Is it like this?
>
Hi Vladimir:
	Thank you for your reply.

> No. If we go down this path (not saying that we should), you should only
> validate the queue count of the packet's traffic class, not all queue counts...
>
> diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
> index 978c3504fbaa..d1d10341278d 100644
> --- a/net/sched/sch_taprio.c
> +++ b/net/sched/sch_taprio.c
> @@ -633,11 +633,16 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  			  struct sk_buff **to_free)
>  {
>  	struct taprio_sched *q = qdisc_priv(sch);
> +	struct net_device *dev = qdisc_dev(sch);
> +	int tc, queue, prio = skb->priority;
>  	struct Qdisc *child;
> -	int queue;
>
>  	queue = skb_get_queue_mapping(skb);
>
> +	tc = netdev_get_prio_tc_map(dev, prio);
> +	if (!dev->tc_to_txq[tc].count)
> +		return qdisc_drop(skb, sch, to_free);
> +

It looks good to me. I'll add it in subsequent patch.

>  	child = q->qdiscs[queue];
>  	if (unlikely(!child))
>  		return qdisc_drop(skb, sch, to_free);
>
>>
>>>
>>>> 2)When packets are dequeued, taprio can be deleted. In this case, the tc
>>>> rule of dev is cleared. The count and offset values are also set to 0. In
>>>> this case, out-of-bounds access is also caused.
>>>
>>> This looks like more like working around the issue than fixing it, and
>>> it just happens, it's a coincidence, that both issues have the same
>>> symptoms.
>>>
>> There are many trigger paths for this problem, and I worry that there
>> may be missing scenarios after I modify taprio_change and
>> taprio_destroy, so I modify the dequeue process.
>
> Many other trigger paths like what?
>
> The main code path leading to 0 TXQs for a traffic class that Vinicius
> seems to worry about ("queues 0@0" in configuration) should already be
> rejected by mqprio_validate_queue_counts():
>

I added the local print information to confirm that some scenarios
cannot be filtered by mqprio_validate_queue_counts. But I can't find a
command line that can reproduce the problem.

> tc qdisc replace dev eno0 handle 8001: parent root stab overhead 24 taprio \
> 	num_tc 3 map 0 1 2 queues 0@0 0@0 0@0 base-time 200 \
> 	sched-entry S 80 20000 sched-entry S a0 20000 sched-entry S 5f 60000 clockid CLOCK_TAI
> Error: sch_mqprio_lib: No queues for TC 0.
>
> We should thus concentrate on the other (involuntary) code paths that
> can lead to there being 0 TXQs for a TC. Modifying the data path because
> we can't figure out the control path seems desperate.
>
> Is there a reproducer for the bug?

Only the syz reproduction program.
https://groups.google.com/g/syzkaller-bugs/c/_lYOKgkBVMg

Thank you.

Zhengchao Shao
On Mon, Jun 12, 2023 at 08:49:53AM +0800, shaozhengchao wrote:
> > Is there a reproducer for the bug?
> Only the syz reproduction program.
> https://groups.google.com/g/syzkaller-bugs/c/_lYOKgkBVMg
> Thank you.

Sorry, I don't really have time to become familiar with the syzbot right now.
Can someone help me translate that syz repro into a C program?
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 3c4c2c334878..82983a6eb8f8 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -799,6 +799,9 @@ static struct sk_buff *taprio_dequeue_tc_priority(struct Qdisc *sch,
 
 			taprio_next_tc_txq(dev, tc, &q->cur_txq[tc]);
 
+			if (q->cur_txq[tc] >= dev->num_tx_queues)
+				q->cur_txq[tc] = first_txq;
+
 			if (skb)
 				return skb;
 		} while (q->cur_txq[tc] != first_txq);
As shown in [1], out-of-bounds access occurs in two cases:
1) When a taprio qdisc is used to replace a previously configured taprio,
count and offset in tc_to_txq can be set to 0. In this case, the value of
*txq in taprio_next_tc_txq() increases continuously. When the number of
accessed queues exceeds the number of queues on the device, out-of-bounds
access occurs.
2) When packets are dequeued, taprio can be deleted. In this case, the tc
rule of dev is cleared. The count and offset values are also set to 0, so
out-of-bounds access occurs here as well.

Now a restriction on the queue index is added: once q->cur_txq[tc] reaches
dev->num_tx_queues, it is reset to first_txq.

[1] https://groups.google.com/g/syzkaller-bugs/c/_lYOKgkBVMg
Fixes: 2f530df76c8c ("net/sched: taprio: give higher priority to higher TCs in software dequeue mode")
Reported-by: syzbot+04afcb3d2c840447559a@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
---
v2: set q->cur_txq[tc] to prevent out-of-bounds access during next dequeue
---
 net/sched/sch_taprio.c | 3 +++
 1 file changed, 3 insertions(+)
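The runaway cursor described above can be reproduced in a small userspace model. This is only a sketch: it assumes the cursor is incremented and wraps back to offset when it reaches offset + count, the struct and function names are hypothetical, and the 64-step cap exists solely so that the unclamped walk terminates in the model:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one dev->tc_to_txq[] entry. */
struct tc_txq_model {
	int count;
	int offset;
};

/* Assumed cursor advance: step forward, wrap at offset + count. */
void next_tc_txq_model(const struct tc_txq_model *map, int *txq)
{
	(*txq)++;
	if (*txq == map->offset + map->count)
		*txq = map->offset;
}

/*
 * Walk one traffic class with every queue empty, starting at first_txq.
 * Returns the largest queue index that would have been accessed.
 * With clamp set, the cursor is reset as in the patch.
 */
int max_txq_visited(const struct tc_txq_model *map, int first_txq,
		    int num_tx_queues, bool clamp)
{
	int cur = first_txq;
	int max_seen = first_txq;
	int steps = 0;

	assert(num_tx_queues > 0);

	do {
		if (cur > max_seen)
			max_seen = cur;	/* a dequeue would touch queue cur */

		next_tc_txq_model(map, &cur);

		if (clamp && cur >= num_tx_queues)
			cur = first_txq;
	} while (cur != first_txq && ++steps < 64);	/* cap is model-only */

	return max_seen;
}
```

With count and offset both 0 the wrap condition never matches, so the unclamped walk runs past the device's last queue; with the clamp, the walk stops at the last valid index.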