diff mbox series

[net-next,6/6] inet: use xa_array iterator to implement inet_dump_ifaddr()

Message ID 20240229114016.2995906-7-edumazet@google.com (mailing list archive)
State Accepted
Commit cdb2f80f1c10654efc66c1624f66df2b87eabf06
Delegated to: Netdev Maintainers
Headers show
Series inet: no longer use RTNL to protect inet_dump_ifaddr() | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 942 this patch: 942
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 958 this patch: 958
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 959 this patch: 959
netdev/checkpatch warning WARNING: Possible repeated word: 'can'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-03-01--06-00 (tests: 882)

Commit Message

Eric Dumazet Feb. 29, 2024, 11:40 a.m. UTC
1) inet_dump_ifaddr() can can run under RCU protection
   instead of RTNL.

2) properly return 0 at the end of a dump, avoiding an
   an extra recvmsg() system call.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/devinet.c | 95 ++++++++++++++++++----------------------------
 1 file changed, 37 insertions(+), 58 deletions(-)

Comments

Jakub Kicinski Feb. 29, 2024, 3:37 p.m. UTC | #1
On Thu, 29 Feb 2024 11:40:16 +0000 Eric Dumazet wrote:
> +	if (err < 0 && likely(skb->len))
> +		err = skb->len;

I think Ido may have commented on one of your early series, but if we
set err to skb->len we'll have to do an extra empty message to terminate
the dump.

You basically only want to return skb->len when message has
overflown, so the somewhat idiomatic way to do this is:

	err = (err == -EMSGSIZE) ? skb->len : err;

Assuming err can't be set to some weird positive value.

IDK if you want to do this in future patches or it's risky, but I have
the itch to tell you every time I see a conversion which doesn't follow
this pattern :)
Eric Dumazet Feb. 29, 2024, 3:45 p.m. UTC | #2
On Thu, Feb 29, 2024 at 4:37 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 29 Feb 2024 11:40:16 +0000 Eric Dumazet wrote:
> > +     if (err < 0 && likely(skb->len))
> > +             err = skb->len;
>
> I think Ido may have commented on one of your early series, but if we
> set err to skb->len we'll have to do an extra empty message to terminate
> the dump.
>
> You basically only want to return skb->len when message has
> overflown, so the somewhat idiomatic way to do this is:
>
>         err = (err == -EMSGSIZE) ? skb->len : err;
>
> Assuming err can't be set to some weird positive value.
>
> IDK if you want to do this in future patches or it's risky, but I have
> the itch to tell you every time I see a conversion which doesn't follow
> this pattern :)

This totally makes sense.

I will send a followup patch to fix all these in one go, if this is ok
with you ?

Thanks.
Eric Dumazet Feb. 29, 2024, 3:50 p.m. UTC | #3
On Thu, Feb 29, 2024 at 4:45 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Feb 29, 2024 at 4:37 PM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 29 Feb 2024 11:40:16 +0000 Eric Dumazet wrote:
> > > +     if (err < 0 && likely(skb->len))
> > > +             err = skb->len;
> >
> > I think Ido may have commented on one of your early series, but if we
> > set err to skb->len we'll have to do an extra empty message to terminate
> > the dump.
> >
> > You basically only want to return skb->len when message has
> > overflown, so the somewhat idiomatic way to do this is:
> >
> >         err = (err == -EMSGSIZE) ? skb->len : err;

This would set err to zero if skb is empty at this point.

I guess a more correct action would be:

if (err == -EMSGSIZE && likely(skb->len))
    err = skb->len;


> >
> > Assuming err can't be set to some weird positive value.
> >
> > IDK if you want to do this in future patches or it's risky, but I have
> > the itch to tell you every time I see a conversion which doesn't follow
> > this pattern :)
>
> This totally makes sense.
>
> I will send a followup patch to fix all these in one go, if this is ok
> with you ?
>
> Thanks.
Jakub Kicinski Feb. 29, 2024, 4:21 p.m. UTC | #4
On Thu, 29 Feb 2024 16:50:45 +0100 Eric Dumazet wrote:
> > > You basically only want to return skb->len when message has
> > > overflown, so the somewhat idiomatic way to do this is:
> > >
> > >         err = (err == -EMSGSIZE) ? skb->len : err;  
> 
> This would set err to zero if skb is empty at this point.
> 
> I guess a more correct action would be:
> 
> if (err == -EMSGSIZE && likely(skb->len))
>     err = skb->len;

Ugh, fair point.
We should probably move the EMSGSIZE handling to the core, this is
getting too complicated for average humans.. Like this?

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 765921cf1194..ce27003b90a8 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2264,6 +2264,8 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
 		if (extra_mutex)
 			mutex_lock(extra_mutex);
 		nlk->dump_done_errno = cb->dump(skb, cb);
+		if (nlk->dump_done_errno == -EMSGSIZE && skb->len)
+			nlk->dump_done_errno = skb->len;
 		if (extra_mutex)
 			mutex_unlock(extra_mutex);

> > >
> > > Assuming err can't be set to some weird positive value.
> > >
> > > IDK if you want to do this in future patches or it's risky, but I have
> > > the itch to tell you every time I see a conversion which doesn't follow
> > > this pattern :)  
> >
> > This totally makes sense.
> >
> > I will send a followup patch to fix all these in one go, if this is ok
> > with you ?

Definitely not a blocker
Eric Dumazet Feb. 29, 2024, 4:26 p.m. UTC | #5
On Thu, Feb 29, 2024 at 5:21 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 29 Feb 2024 16:50:45 +0100 Eric Dumazet wrote:
> > > > You basically only want to return skb->len when message has
> > > > overflown, so the somewhat idiomatic way to do this is:
> > > >
> > > >         err = (err == -EMSGSIZE) ? skb->len : err;
> >
> > This would set err to zero if skb is empty at this point.
> >
> > I guess a more correct action would be:
> >
> > if (err == -EMSGSIZE && likely(skb->len))
> >     err = skb->len;
>
> Ugh, fair point.
> We should probably move the EMSGSIZE handling to the core, this is
> getting too complicated for average humans.. Like this?
>

Now you are talking ! Yep, definitely nicer.

> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 765921cf1194..ce27003b90a8 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2264,6 +2264,8 @@ static int netlink_dump(struct sock *sk, bool lock_taken)
>                 if (extra_mutex)
>                         mutex_lock(extra_mutex);
>                 nlk->dump_done_errno = cb->dump(skb, cb);
> +               if (nlk->dump_done_errno == -EMSGSIZE && skb->len)
> +                       nlk->dump_done_errno = skb->len;
>                 if (extra_mutex)
>                         mutex_unlock(extra_mutex);
>
> > > >
> > > > Assuming err can't be set to some weird positive value.
> > > >
> > > > IDK if you want to do this in future patches or it's risky, but I have
> > > > the itch to tell you every time I see a conversion which doesn't follow
> > > > this pattern :)
> > >
> > > This totally makes sense.
> > >
> > > I will send a followup patch to fix all these in one go, if this is ok
> > > with you ?
>
> Definitely not a blocker
diff mbox series

Patch

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 2afe78dfc3c2f6c0394925f1c35532a2dfd26d71..4daa8124f247c256c4f8c1ff29ac621570af0755 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1676,7 +1676,7 @@  static int put_cacheinfo(struct sk_buff *skb, unsigned long cstamp,
 	return nla_put(skb, IFA_CACHEINFO, sizeof(ci), &ci);
 }
 
-static int inet_fill_ifaddr(struct sk_buff *skb, struct in_ifaddr *ifa,
+static int inet_fill_ifaddr(struct sk_buff *skb, const struct in_ifaddr *ifa,
 			    struct inet_fill_args *args)
 {
 	struct ifaddrmsg *ifm;
@@ -1805,15 +1805,15 @@  static int inet_valid_dump_ifaddr_req(const struct nlmsghdr *nlh,
 }
 
 static int in_dev_dump_addr(struct in_device *in_dev, struct sk_buff *skb,
-			    struct netlink_callback *cb, int s_ip_idx,
+			    struct netlink_callback *cb, int *s_ip_idx,
 			    struct inet_fill_args *fillargs)
 {
 	struct in_ifaddr *ifa;
 	int ip_idx = 0;
 	int err;
 
-	in_dev_for_each_ifa_rtnl(ifa, in_dev) {
-		if (ip_idx < s_ip_idx) {
+	in_dev_for_each_ifa_rcu(ifa, in_dev) {
+		if (ip_idx < *s_ip_idx) {
 			ip_idx++;
 			continue;
 		}
@@ -1825,9 +1825,9 @@  static int in_dev_dump_addr(struct in_device *in_dev, struct sk_buff *skb,
 		ip_idx++;
 	}
 	err = 0;
-
+	ip_idx = 0;
 done:
-	cb->args[2] = ip_idx;
+	*s_ip_idx = ip_idx;
 
 	return err;
 }
@@ -1859,75 +1859,53 @@  static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 	};
 	struct net *net = sock_net(skb->sk);
 	struct net *tgt_net = net;
-	int h, s_h;
-	int idx, s_idx;
-	int s_ip_idx;
-	struct net_device *dev;
+	struct {
+		unsigned long ifindex;
+		int ip_idx;
+	} *ctx = (void *)cb->ctx;
 	struct in_device *in_dev;
-	struct hlist_head *head;
+	struct net_device *dev;
 	int err = 0;
 
-	s_h = cb->args[0];
-	s_idx = idx = cb->args[1];
-	s_ip_idx = cb->args[2];
-
+	rcu_read_lock();
 	if (cb->strict_check) {
 		err = inet_valid_dump_ifaddr_req(nlh, &fillargs, &tgt_net,
 						 skb->sk, cb);
 		if (err < 0)
-			goto put_tgt_net;
+			goto done;
 
-		err = 0;
 		if (fillargs.ifindex) {
-			dev = __dev_get_by_index(tgt_net, fillargs.ifindex);
-			if (!dev) {
-				err = -ENODEV;
-				goto put_tgt_net;
-			}
-
-			in_dev = __in_dev_get_rtnl(dev);
-			if (in_dev) {
-				err = in_dev_dump_addr(in_dev, skb, cb, s_ip_idx,
-						       &fillargs);
-			}
-			goto put_tgt_net;
-		}
-	}
-
-	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
-		idx = 0;
-		head = &tgt_net->dev_index_head[h];
-		rcu_read_lock();
-		cb->seq = inet_base_seq(tgt_net);
-		hlist_for_each_entry_rcu(dev, head, index_hlist) {
-			if (idx < s_idx)
-				goto cont;
-			if (h > s_h || idx > s_idx)
-				s_ip_idx = 0;
+			err = -ENODEV;
+			dev = dev_get_by_index_rcu(tgt_net, fillargs.ifindex);
+			if (!dev)
+				goto done;
 			in_dev = __in_dev_get_rcu(dev);
 			if (!in_dev)
-				goto cont;
-
-			err = in_dev_dump_addr(in_dev, skb, cb, s_ip_idx,
-					       &fillargs);
-			if (err < 0) {
-				rcu_read_unlock();
 				goto done;
-			}
-cont:
-			idx++;
+			err = in_dev_dump_addr(in_dev, skb, cb, &ctx->ip_idx,
+					       &fillargs);
+			goto done;
 		}
-		rcu_read_unlock();
 	}
 
+	cb->seq = inet_base_seq(tgt_net);
+
+	for_each_netdev_dump(net, dev, ctx->ifindex) {
+		in_dev = __in_dev_get_rcu(dev);
+		if (!in_dev)
+			continue;
+		err = in_dev_dump_addr(in_dev, skb, cb, &ctx->ip_idx,
+				       &fillargs);
+		if (err < 0)
+			goto done;
+	}
 done:
-	cb->args[0] = h;
-	cb->args[1] = idx;
-put_tgt_net:
+	if (err < 0 && likely(skb->len))
+		err = skb->len;
 	if (fillargs.netnsid >= 0)
 		put_net(tgt_net);
-
-	return skb->len ? : err;
+	rcu_read_unlock();
+	return err;
 }
 
 static void rtmsg_ifa(int event, struct in_ifaddr *ifa, struct nlmsghdr *nlh,
@@ -2818,7 +2796,8 @@  void __init devinet_init(void)
 
 	rtnl_register(PF_INET, RTM_NEWADDR, inet_rtm_newaddr, NULL, 0);
 	rtnl_register(PF_INET, RTM_DELADDR, inet_rtm_deladdr, NULL, 0);
-	rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr, 0);
+	rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr,
+		      RTNL_FLAG_DUMP_UNLOCKED);
 	rtnl_register(PF_INET, RTM_GETNETCONF, inet_netconf_get_devconf,
 		      inet_netconf_dump_devconf,
 		      RTNL_FLAG_DOIT_UNLOCKED | RTNL_FLAG_DUMP_UNLOCKED);