Message ID | 20240229114016.2995906-7-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | cdb2f80f1c10654efc66c1624f66df2b87eabf06 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | inet: no longer use RTNL to protect inet_dump_ifaddr() | expand |
On Thu, 29 Feb 2024 11:40:16 +0000 Eric Dumazet wrote: > + if (err < 0 && likely(skb->len)) > + err = skb->len; I think Ido may have commented on one of your early series, but if we set err to skb->len we'll have to do an extra empty message to terminate the dump. You basically only want to return skb->len when message has overflown, so the somewhat idiomatic way to do this is: err = (err == -EMSGSIZE) ? skb->len : err; Assuming err can't be set to some weird positive value. IDK if you want to do this in future patches or it's risky, but I have the itch to tell you every time I see a conversion which doesn't follow this pattern :)
On Thu, Feb 29, 2024 at 4:37 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 29 Feb 2024 11:40:16 +0000 Eric Dumazet wrote: > > + if (err < 0 && likely(skb->len)) > > + err = skb->len; > > I think Ido may have commented on one of your early series, but if we > set err to skb->len we'll have to do an extra empty message to terminate > the dump. > > You basically only want to return skb->len when message has > overflown, so the somewhat idiomatic way to do this is: > > err = (err == -EMSGSIZE) ? skb->len : err; > > Assuming err can't be set to some weird positive value. > > IDK if you want to do this in future patches or it's risky, but I have > the itch to tell you every time I see a conversion which doesn't follow > this pattern :) This totally makes sense. I will send a followup patch to fix all these in one go, if this is ok with you ? Thanks.
On Thu, Feb 29, 2024 at 4:45 PM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Feb 29, 2024 at 4:37 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Thu, 29 Feb 2024 11:40:16 +0000 Eric Dumazet wrote: > > > + if (err < 0 && likely(skb->len)) > > > + err = skb->len; > > > > I think Ido may have commented on one of your early series, but if we > > set err to skb->len we'll have to do an extra empty message to terminate > > the dump. > > > > You basically only want to return skb->len when message has > > overflown, so the somewhat idiomatic way to do this is: > > > > err = (err == -EMSGSIZE) ? skb->len : err; This would set err to zero if skb is empty at this point. I guess a more correct action would be: if (err == -EMSGSIZE && likely(skb->len)) err = skb->len; > > > > Assuming err can't be set to some weird positive value. > > > > IDK if you want to do this in future patches or it's risky, but I have > > the itch to tell you every time I see a conversion which doesn't follow > > this pattern :) > > This totally makes sense. > > I will send a followup patch to fix all these in one go, if this is ok > with you ? > > Thanks.
On Thu, 29 Feb 2024 16:50:45 +0100 Eric Dumazet wrote: > > > You basically only want to return skb->len when message has > > > overflown, so the somewhat idiomatic way to do this is: > > > > > > err = (err == -EMSGSIZE) ? skb->len : err; > > This would set err to zero if skb is empty at this point. > > I guess a more correct action would be: > > if (err == -EMSGSIZE && likely(skb->len)) > err = skb->len; Ugh, fair point. We should probably move the EMSGSIZE handling to the core, this is getting too complicated for average humans.. Like this? diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 765921cf1194..ce27003b90a8 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -2264,6 +2264,8 @@ static int netlink_dump(struct sock *sk, bool lock_taken) if (extra_mutex) mutex_lock(extra_mutex); nlk->dump_done_errno = cb->dump(skb, cb); + if (nlk->dump_done_errno == -EMSGSIZE && skb->len) + nlk->dump_done_errno = skb->len; if (extra_mutex) mutex_unlock(extra_mutex); > > > > > > Assuming err can't be set to some weird positive value. > > > > > > IDK if you want to do this in future patches or it's risky, but I have > > > the itch to tell you every time I see a conversion which doesn't follow > > > this pattern :) > > > > This totally makes sense. > > > > I will send a followup patch to fix all these in one go, if this is ok > > with you ? Definitely not a blocker
On Thu, Feb 29, 2024 at 5:21 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 29 Feb 2024 16:50:45 +0100 Eric Dumazet wrote: > > > > You basically only want to return skb->len when message has > > > > overflown, so the somewhat idiomatic way to do this is: > > > > > > > > err = (err == -EMSGSIZE) ? skb->len : err; > > > > This would set err to zero if skb is empty at this point. > > > > I guess a more correct action would be: > > > > if (err == -EMSGSIZE && likely(skb->len)) > > err = skb->len; > > Ugh, fair point. > We should probably move the EMSGSIZE handling to the core, this is > getting too complicated for average humans.. Like this? > Now you are talking ! Yep, definitely nicer. > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c > index 765921cf1194..ce27003b90a8 100644 > --- a/net/netlink/af_netlink.c > +++ b/net/netlink/af_netlink.c > @@ -2264,6 +2264,8 @@ static int netlink_dump(struct sock *sk, bool lock_taken) > if (extra_mutex) > mutex_lock(extra_mutex); > nlk->dump_done_errno = cb->dump(skb, cb); > + if (nlk->dump_done_errno == -EMSGSIZE && skb->len) > + nlk->dump_done_errno = skb->len; > if (extra_mutex) > mutex_unlock(extra_mutex); > > > > > > > > > Assuming err can't be set to some weird positive value. > > > > > > > > IDK if you want to do this in future patches or it's risky, but I have > > > > the itch to tell you every time I see a conversion which doesn't follow > > > > this pattern :) > > > > > > This totally makes sense. > > > > > > I will send a followup patch to fix all these in one go, if this is ok > > > with you ? > > Definitely not a blocker
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 2afe78dfc3c2f6c0394925f1c35532a2dfd26d71..4daa8124f247c256c4f8c1ff29ac621570af0755 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1676,7 +1676,7 @@ static int put_cacheinfo(struct sk_buff *skb, unsigned long cstamp, return nla_put(skb, IFA_CACHEINFO, sizeof(ci), &ci); } -static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa, +static int inet_fill_ifaddr(struct sk_buff *skb, const struct in_ifaddr *ifa, struct inet_fill_args *args) { struct ifaddrmsg *ifm; @@ -1805,15 +1805,15 @@ static int inet_valid_dump_ifaddr_req(const struct nlmsghdr *nlh, } static int in_dev_dump_addr(struct in_device *in_dev, struct sk_buff *skb, - struct netlink_callback *cb, int s_ip_idx, + struct netlink_callback *cb, int *s_ip_idx, struct inet_fill_args *fillargs) { struct in_ifaddr *ifa; int ip_idx = 0; int err; - in_dev_for_each_ifa_rtnl(ifa, in_dev) { - if (ip_idx < s_ip_idx) { + in_dev_for_each_ifa_rcu(ifa, in_dev) { + if (ip_idx < *s_ip_idx) { ip_idx++; continue; } @@ -1825,9 +1825,9 @@ static int in_dev_dump_addr(struct in_device *in_dev, struct sk_buff *skb, ip_idx++; } err = 0; - + ip_idx = 0; done: - cb->args[2] = ip_idx; + *s_ip_idx = ip_idx; return err; } @@ -1859,75 +1859,53 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb) }; struct net *net = sock_net(skb->sk); struct net *tgt_net = net; - int h, s_h; - int idx, s_idx; - int s_ip_idx; - struct net_device *dev; + struct { + unsigned long ifindex; + int ip_idx; + } *ctx = (void *)cb->ctx; struct in_device *in_dev; - struct hlist_head *head; + struct net_device *dev; int err = 0; - s_h = cb->args[0]; - s_idx = idx = cb->args[1]; - s_ip_idx = cb->args[2]; - + rcu_read_lock(); if (cb->strict_check) { err = inet_valid_dump_ifaddr_req(nlh, &fillargs, &tgt_net, skb->sk, cb); if (err < 0) - goto put_tgt_net; + goto done; - err = 0; if (fillargs.ifindex) { - dev = __dev_get_by_index(tgt_net, fillargs.ifindex); - if (!dev) { - err = -ENODEV; - goto put_tgt_net; - } - - in_dev = __in_dev_get_rtnl(dev); - if (in_dev) { - err = in_dev_dump_addr(in_dev, skb, cb, s_ip_idx, - &fillargs); - } - goto put_tgt_net; - } - } - - for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) { - idx = 0; - head = &tgt_net->dev_index_head[h]; - rcu_read_lock(); - cb->seq = inet_base_seq(tgt_net); - hlist_for_each_entry_rcu(dev, head, index_hlist) { - if (idx < s_idx) - goto cont; - if (h > s_h || idx > s_idx) - s_ip_idx = 0; + err = -ENODEV; + dev = dev_get_by_index_rcu(tgt_net, fillargs.ifindex); + if (!dev) + goto done; in_dev = __in_dev_get_rcu(dev); if (!in_dev) - goto cont; - - err = in_dev_dump_addr(in_dev, skb, cb, s_ip_idx, - &fillargs); - if (err < 0) { - rcu_read_unlock(); goto done; - } -cont: - idx++; + err = in_dev_dump_addr(in_dev, skb, cb, &ctx->ip_idx, + &fillargs); + goto done; } - rcu_read_unlock(); } + cb->seq = inet_base_seq(tgt_net); + + for_each_netdev_dump(net, dev, ctx->ifindex) { + in_dev = __in_dev_get_rcu(dev); + if (!in_dev) + continue; + err = in_dev_dump_addr(in_dev, skb, cb, &ctx->ip_idx, + &fillargs); + if (err < 0) + goto done; + } done: - cb->args[0] = h; - cb->args[1] = idx; -put_tgt_net: + if (err < 0 && likely(skb->len)) + err = skb->len; if (fillargs.netnsid >= 0) put_net(tgt_net); - - return skb->len ? : err; + rcu_read_unlock(); + return err; } static void rtmsg_ifa(int event, struct in_ifaddr *ifa, struct nlmsghdr *nlh, @@ -2818,7 +2796,8 @@ void __init devinet_init(void) rtnl_register(PF_INET, RTM_NEWADDR, inet_rtm_newaddr, NULL, 0); rtnl_register(PF_INET, RTM_DELADDR, inet_rtm_deladdr, NULL, 0); - rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr, 0); + rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr, + RTNL_FLAG_DUMP_UNLOCKED); rtnl_register(PF_INET, RTM_GETNETCONF, inet_netconf_get_devconf, inet_netconf_dump_devconf, RTNL_FLAG_DOIT_UNLOCKED | RTNL_FLAG_DUMP_UNLOCKED);
1) inet_dump_ifaddr() can can run under RCU protection instead of RTNL. 2) properly return 0 at the end of a dump, avoiding an an extra recvmsg() system call. Signed-off-by: Eric Dumazet <edumazet@google.com> --- net/ipv4/devinet.c | 95 ++++++++++++++++++---------------------------- 1 file changed, 37 insertions(+), 58 deletions(-)