Message ID | 20220817104646.22861-1-wanghai38@huawei.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net/sched: fix netdevice reference leaks in attach_one_default_qdisc() | expand |
On Wed, 17 Aug 2022 18:46:46 +0800 Wang Hai wrote: > In attach_default_qdiscs(), when attach default qdisc (fq_codel) fails > and fallback to noqueue, if the original attached qdisc is not released > and a new one is directly attached, this will cause netdevice reference > leaks. Could you provide more details on the failure path? My preference would be to try to clean up properly there, if possible. > The following is the bug log: > > veth0: default qdisc (fq_codel) fail, fallback to noqueue > unregister_netdevice: waiting for veth0 to become free. Usage count = 32 > leaked reference. > qdisc_alloc+0x12e/0x210 > qdisc_create_dflt+0x62/0x140 > attach_one_default_qdisc.constprop.41+0x44/0x70 > dev_activate+0x128/0x290 > __dev_open+0x12a/0x190 > __dev_change_flags+0x1a2/0x1f0 > dev_change_flags+0x23/0x60 > do_setlink+0x332/0x1150 > __rtnl_newlink+0x52f/0x8e0 > rtnl_newlink+0x43/0x70 > rtnetlink_rcv_msg+0x140/0x3b0 > netlink_rcv_skb+0x50/0x100 > netlink_unicast+0x1bb/0x290 > netlink_sendmsg+0x37c/0x4e0 > sock_sendmsg+0x5f/0x70 > ____sys_sendmsg+0x208/0x280 > > In attach_one_default_qdisc(), release the old one before attaching > a new qdisc to fix this bug. > > Fixes: bf6dba76d278 ("net: sched: fallback to qdisc noqueue if default qdisc setup fail") > Signed-off-by: Wang Hai <wanghai38@huawei.com> > --- > net/sched/sch_generic.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index d47b9689eba6..87b61ef14497 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -1140,6 +1140,11 @@ static void attach_one_default_qdisc(struct net_device *dev, > > if (!netif_is_multiqueue(dev)) > qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT; > + > + if (dev_queue->qdisc_sleeping && > + dev_queue->qdisc_sleeping != &noop_qdisc) > + qdisc_put(dev_queue->qdisc_sleeping); > + > dev_queue->qdisc_sleeping = qdisc; > } >
在 2022/8/19 1:56, Jakub Kicinski 写道: > On Wed, 17 Aug 2022 18:46:46 +0800 Wang Hai wrote: >> In attach_default_qdiscs(), when attach default qdisc (fq_codel) fails >> and fallback to noqueue, if the original attached qdisc is not released >> and a new one is directly attached, this will cause netdevice reference >> leaks. > Could you provide more details on the failure path? My preference would > be to try to clean up properly there, if possible. Hi Jakub. Here are the details of the failure. Do I need to do cleanup under the failed path? If a dev has multiple queues and queue 0 fails to attach qdisc because there is no memory in attach_one_default_qdisc(). Then dev->qdisc will be noop_qdisc by default. But the other queues may be able to successfully attach to default qdisc. In this case, the fallback to noqueue process will be triggered static void attach_default_qdiscs(struct net_device *dev) { ... if (!netif_is_multiqueue(dev) || dev->priv_flags & IFF_NO_QUEUE) { ... netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL); // queue 0 attach failed because -ENOBUFS, but the other queues attach successfully qdisc = txq->qdisc_sleeping; rcu_assign_pointer(dev->qdisc, qdisc); // dev->qdisc = &noop_qdisc ... } ... if (qdisc == &noop_qdisc) { ... netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL); // Re-attach, but not release the previously created qdisc ... } } >> The following is the bug log: >> >> veth0: default qdisc (fq_codel) fail, fallback to noqueue >> unregister_netdevice: waiting for veth0 to become free. Usage count = 32 >> leaked reference. >> qdisc_alloc+0x12e/0x210 >> qdisc_create_dflt+0x62/0x140 >> attach_one_default_qdisc.constprop.41+0x44/0x70 >> dev_activate+0x128/0x290 >> __dev_open+0x12a/0x190 >> __dev_change_flags+0x1a2/0x1f0 >> dev_change_flags+0x23/0x60 >> do_setlink+0x332/0x1150 >> __rtnl_newlink+0x52f/0x8e0 >> rtnl_newlink+0x43/0x70 >> rtnetlink_rcv_msg+0x140/0x3b0 >> netlink_rcv_skb+0x50/0x100 >> netlink_unicast+0x1bb/0x290 >> netlink_sendmsg+0x37c/0x4e0 >> sock_sendmsg+0x5f/0x70 >> ____sys_sendmsg+0x208/0x280 >> >> In attach_one_default_qdisc(), release the old one before attaching >> a new qdisc to fix this bug. >> >> Fixes: bf6dba76d278 ("net: sched: fallback to qdisc noqueue if default qdisc setup fail") >> Signed-off-by: Wang Hai <wanghai38@huawei.com> >> --- >> net/sched/sch_generic.c | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c >> index d47b9689eba6..87b61ef14497 100644 >> --- a/net/sched/sch_generic.c >> +++ b/net/sched/sch_generic.c >> @@ -1140,6 +1140,11 @@ static void attach_one_default_qdisc(struct net_device *dev, >> >> if (!netif_is_multiqueue(dev)) >> qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT; >> + >> + if (dev_queue->qdisc_sleeping && >> + dev_queue->qdisc_sleeping != &noop_qdisc) >> + qdisc_put(dev_queue->qdisc_sleeping); >> + >> dev_queue->qdisc_sleeping = qdisc; >> } >> > .
在 2022/8/19 23:58, wanghai (M) 写道: > > 在 2022/8/19 1:56, Jakub Kicinski 写道: >> On Wed, 17 Aug 2022 18:46:46 +0800 Wang Hai wrote: >>> In attach_default_qdiscs(), when attach default qdisc (fq_codel) fails >>> and fallback to noqueue, if the original attached qdisc is not released >>> and a new one is directly attached, this will cause netdevice reference >>> leaks. >> Could you provide more details on the failure path? My preference would >> be to try to clean up properly there, if possible. > Hi Jakub. > > Here are the details of the failure. Do I need to do cleanup under the > failed path? > > If a dev has multiple queues and queue 0 fails to attach qdisc > because there is no memory in attach_one_default_qdisc(). Then > dev->qdisc will be noop_qdisc by default. But the other queues > may be able to successfully attach to default qdisc. > > In this case, the fallback to noqueue process will be triggered > > static void attach_default_qdiscs(struct net_device *dev) > { > ... > if (!netif_is_multiqueue(dev) || > dev->priv_flags & IFF_NO_QUEUE) { > ... > netdev_for_each_tx_queue(dev, attach_one_default_qdisc, > NULL); // queue 0 attach failed because -ENOBUFS, but the other queues > attach successfully > qdisc = txq->qdisc_sleeping; > rcu_assign_pointer(dev->qdisc, qdisc); // dev->qdisc = > &noop_qdisc > ... > } > ... > if (qdisc == &noop_qdisc) { > ... > netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL); > // Re-attach, but not release the previously created qdisc > ... > } > } > Hi Jakub. Do you have any other suggestions for this patch? Any replies would be appreciated. >>> The following is the bug log: >>> >>> veth0: default qdisc (fq_codel) fail, fallback to noqueue >>> unregister_netdevice: waiting for veth0 to become free. Usage count >>> = 32 >>> leaked reference. >>> qdisc_alloc+0x12e/0x210 >>> qdisc_create_dflt+0x62/0x140 >>> attach_one_default_qdisc.constprop.41+0x44/0x70 >>> dev_activate+0x128/0x290 >>> __dev_open+0x12a/0x190 >>> __dev_change_flags+0x1a2/0x1f0 >>> dev_change_flags+0x23/0x60 >>> do_setlink+0x332/0x1150 >>> __rtnl_newlink+0x52f/0x8e0 >>> rtnl_newlink+0x43/0x70 >>> rtnetlink_rcv_msg+0x140/0x3b0 >>> netlink_rcv_skb+0x50/0x100 >>> netlink_unicast+0x1bb/0x290 >>> netlink_sendmsg+0x37c/0x4e0 >>> sock_sendmsg+0x5f/0x70 >>> ____sys_sendmsg+0x208/0x280 >>> >>> In attach_one_default_qdisc(), release the old one before attaching >>> a new qdisc to fix this bug. >>> >>> Fixes: bf6dba76d278 ("net: sched: fallback to qdisc noqueue if >>> default qdisc setup fail") >>> Signed-off-by: Wang Hai <wanghai38@huawei.com> >>> --- >>> net/sched/sch_generic.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c >>> index d47b9689eba6..87b61ef14497 100644 >>> --- a/net/sched/sch_generic.c >>> +++ b/net/sched/sch_generic.c >>> @@ -1140,6 +1140,11 @@ static void attach_one_default_qdisc(struct >>> net_device *dev, >>> if (!netif_is_multiqueue(dev)) >>> qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT; >>> + >>> + if (dev_queue->qdisc_sleeping && >>> + dev_queue->qdisc_sleeping != &noop_qdisc) >>> + qdisc_put(dev_queue->qdisc_sleeping); >>> + >>> dev_queue->qdisc_sleeping = qdisc; >>> } >> . >
On Thu, 25 Aug 2022 20:29:21 +0800 wanghai (M) wrote: > 在 2022/8/19 23:58, wanghai (M) 写道: > > Here are the details of the failure. Do I need to do cleanup under the > > failed path? > > > > If a dev has multiple queues and queue 0 fails to attach qdisc > > because there is no memory in attach_one_default_qdisc(). Then > > dev->qdisc will be noop_qdisc by default. But the other queues > > may be able to successfully attach to default qdisc. > > > > In this case, the fallback to noqueue process will be triggered > > > > static void attach_default_qdiscs(struct net_device *dev) > > { > > ... > > if (!netif_is_multiqueue(dev) || > > dev->priv_flags & IFF_NO_QUEUE) { > > ... > > netdev_for_each_tx_queue(dev, attach_one_default_qdisc, > > NULL); // queue 0 attach failed because -ENOBUFS, but the other queues > > attach successfully > > qdisc = txq->qdisc_sleeping; > > rcu_assign_pointer(dev->qdisc, qdisc); // dev->qdisc = > > &noop_qdisc > > ... > > } > > ... > > if (qdisc == &noop_qdisc) { > > ... > > netdev_for_each_tx_queue(dev, attach_one_default_qdisc, NULL); > > // Re-attach, but not release the previously created qdisc > > ... > > } > > } > > Do you have any other suggestions for this patch? Any replies would be > appreciated. Sorry for the silence and thanks for a solid explanation! Can we do a: netdev_for_each_tx_queue(dev, shutdown_scheduler_queue, &noop_qdisc); before trying to re-attach, to clear out any non-noop qdisc that may have gotten assigned?
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index d47b9689eba6..87b61ef14497 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -1140,6 +1140,11 @@ static void attach_one_default_qdisc(struct net_device *dev, if (!netif_is_multiqueue(dev)) qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT; + + if (dev_queue->qdisc_sleeping && + dev_queue->qdisc_sleeping != &noop_qdisc) + qdisc_put(dev_queue->qdisc_sleeping); + dev_queue->qdisc_sleeping = qdisc; }
In attach_default_qdiscs(), when attach default qdisc (fq_codel) fails and fallback to noqueue, if the original attached qdisc is not released and a new one is directly attached, this will cause netdevice reference leaks. The following is the bug log: veth0: default qdisc (fq_codel) fail, fallback to noqueue unregister_netdevice: waiting for veth0 to become free. Usage count = 32 leaked reference. qdisc_alloc+0x12e/0x210 qdisc_create_dflt+0x62/0x140 attach_one_default_qdisc.constprop.41+0x44/0x70 dev_activate+0x128/0x290 __dev_open+0x12a/0x190 __dev_change_flags+0x1a2/0x1f0 dev_change_flags+0x23/0x60 do_setlink+0x332/0x1150 __rtnl_newlink+0x52f/0x8e0 rtnl_newlink+0x43/0x70 rtnetlink_rcv_msg+0x140/0x3b0 netlink_rcv_skb+0x50/0x100 netlink_unicast+0x1bb/0x290 netlink_sendmsg+0x37c/0x4e0 sock_sendmsg+0x5f/0x70 ____sys_sendmsg+0x208/0x280 In attach_one_default_qdisc(), release the old one before attaching a new qdisc to fix this bug. Fixes: bf6dba76d278 ("net: sched: fallback to qdisc noqueue if default qdisc setup fail") Signed-off-by: Wang Hai <wanghai38@huawei.com> --- net/sched/sch_generic.c | 5 +++++ 1 file changed, 5 insertions(+)