Message ID | 20240503192059.3884225-6-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 6747a5d4990b8c8d7392f7a06b7a4bb5f4ada80e |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | rtnetlink: more rcu conversions for rtnl_fill_ifinfo() | expand |
On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote: > Following device fields can be read locklessly > in rtnl_fill_ifinfo() : > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group, > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size, > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size, > tso_max_segs, num_rx_queues. Hi Eric, * Regarding mtu, as the comment you added to sruct net_device some time ago mentions, mtu is written in many places. I'm wondering if, in particular wrt ndo_change_mtu implementations, if some it is appropriate to add WRITE_ONCE() annotations. * Likewise, is it appropriate to add WRITE_ONCE() to dev_set_group() ?
On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote: > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote: > > Following device fields can be read locklessly > > in rtnl_fill_ifinfo() : > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group, > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size, > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size, > > tso_max_segs, num_rx_queues. > > Hi Eric, > > * Regarding mtu, as the comment you added to sruct net_device > some time ago mentions, mtu is written in many places. > > I'm wondering if, in particular wrt ndo_change_mtu implementations, > if some it is appropriate to add WRITE_ONCE() annotations. Sure thing. I called for these changes in commit 501a90c94510 ("inet: protect against too small mtu values.") when I said "Hopefully we will add the missing ones in followup patches." > > * Likewise, is it appropriate to add WRITE_ONCE() to dev_set_group() ? In general, a lot of write sides would need to be changed. In practice, most compilers will not perform store tearing, this would be quite expensive. Also, adding WRITE_ONCE() will not prevent a reader from reading some temporary values, take a look at dev_change_tx_queue_len() for instance.
On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote: > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote: > > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote: > > > Following device fields can be read locklessly > > > in rtnl_fill_ifinfo() : > > > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group, > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size, > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size, > > > tso_max_segs, num_rx_queues. > > > > Hi Eric, > > > > * Regarding mtu, as the comment you added to sruct net_device > > some time ago mentions, mtu is written in many places. > > > > I'm wondering if, in particular wrt ndo_change_mtu implementations, > > if some it is appropriate to add WRITE_ONCE() annotations. > > Sure thing. I called for these changes in commit > 501a90c94510 ("inet: protect against too small mtu values.") > when I said "Hopefully we will add the missing ones in followup patches." Ok, so basically it would be nice to add them, but they don't block progress of this patchset? > > * Likewise, is it appropriate to add WRITE_ONCE() to dev_set_group() ? > > In general, a lot of write sides would need to be changed. > > In practice, most compilers will not perform store tearing, this would > be quite expensive. > > Also, adding WRITE_ONCE() will not prevent a reader from reading some > temporary values, > take a look at dev_change_tx_queue_len() for instance. Thank you, I will study this more closely.
On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote: > > On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote: > > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote: > > > > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote: > > > > Following device fields can be read locklessly > > > > in rtnl_fill_ifinfo() : > > > > > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group, > > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size, > > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size, > > > > tso_max_segs, num_rx_queues. > > > > > > Hi Eric, > > > > > > * Regarding mtu, as the comment you added to sruct net_device > > > some time ago mentions, mtu is written in many places. > > > > > > I'm wondering if, in particular wrt ndo_change_mtu implementations, > > > if some it is appropriate to add WRITE_ONCE() annotations. > > > > Sure thing. I called for these changes in commit > > 501a90c94510 ("inet: protect against too small mtu values.") > > when I said "Hopefully we will add the missing ones in followup patches." > > Ok, so basically it would be nice to add them, > but they don't block progress of this patchset? A patch set adding WRITE_ONCE() on all dev->mtu would be great, and seems orthogonal. Note that we already have many points in the stack where dev->mtu is read locklessly.
On Sun, May 05, 2024 at 05:14:58PM +0200, Eric Dumazet wrote: > On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote: > > > > On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote: > > > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote: > > > > > > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote: > > > > > Following device fields can be read locklessly > > > > > in rtnl_fill_ifinfo() : > > > > > > > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group, > > > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size, > > > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size, > > > > > tso_max_segs, num_rx_queues. > > > > > > > > Hi Eric, > > > > > > > > * Regarding mtu, as the comment you added to sruct net_device > > > > some time ago mentions, mtu is written in many places. > > > > > > > > I'm wondering if, in particular wrt ndo_change_mtu implementations, > > > > if some it is appropriate to add WRITE_ONCE() annotations. > > > > > > Sure thing. I called for these changes in commit > > > 501a90c94510 ("inet: protect against too small mtu values.") > > > when I said "Hopefully we will add the missing ones in followup patches." > > > > Ok, so basically it would be nice to add them, > > but they don't block progress of this patchset? > > A patch set adding WRITE_ONCE() on all dev->mtu would be great, > and seems orthogonal. Ack. I'm guessing an incremental approach to getting better coverage would be best. I'll add this to my todo list. > Note that we already have many points in the > stack where dev->mtu is read locklessly. Understood. For the record, I have no objections to this patch. Reviewed-by: Simon Horman <horms@kernel.org>
On Tue, May 7, 2024 at 6:38 PM Simon Horman <horms@kernel.org> wrote: > > On Sun, May 05, 2024 at 05:14:58PM +0200, Eric Dumazet wrote: > > On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote: > > > > > > On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote: > > > > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote: > > > > > > > > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote: > > > > > > Following device fields can be read locklessly > > > > > > in rtnl_fill_ifinfo() : > > > > > > > > > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group, > > > > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size, > > > > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size, > > > > > > tso_max_segs, num_rx_queues. > > > > > > > > > > Hi Eric, > > > > > > > > > > * Regarding mtu, as the comment you added to sruct net_device > > > > > some time ago mentions, mtu is written in many places. > > > > > > > > > > I'm wondering if, in particular wrt ndo_change_mtu implementations, > > > > > if some it is appropriate to add WRITE_ONCE() annotations. > > > > > > > > Sure thing. I called for these changes in commit > > > > 501a90c94510 ("inet: protect against too small mtu values.") > > > > when I said "Hopefully we will add the missing ones in followup patches." > > > > > > Ok, so basically it would be nice to add them, > > > but they don't block progress of this patchset? > > > > A patch set adding WRITE_ONCE() on all dev->mtu would be great, > > and seems orthogonal. > > Ack. I'm guessing an incremental approach to getting better coverage would > be best. I'll add this to my todo list. I sent a single patch about that already ;) https://patchwork.kernel.org/project/netdevbpf/patch/20240506102812.3025432-1-edumazet@google.com/ Thanks !
On Tue, May 07, 2024 at 06:39:54PM +0200, Eric Dumazet wrote: > On Tue, May 7, 2024 at 6:38 PM Simon Horman <horms@kernel.org> wrote: > > > > On Sun, May 05, 2024 at 05:14:58PM +0200, Eric Dumazet wrote: > > > On Sun, May 5, 2024 at 5:06 PM Simon Horman <horms@kernel.org> wrote: > > > > > > > > On Sun, May 05, 2024 at 05:00:10PM +0200, Eric Dumazet wrote: > > > > > On Sun, May 5, 2024 at 4:47 PM Simon Horman <horms@kernel.org> wrote: > > > > > > > > > > > > On Fri, May 03, 2024 at 07:20:56PM +0000, Eric Dumazet wrote: > > > > > > > Following device fields can be read locklessly > > > > > > > in rtnl_fill_ifinfo() : > > > > > > > > > > > > > > type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group, > > > > > > > promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size, > > > > > > > gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size, > > > > > > > tso_max_segs, num_rx_queues. > > > > > > > > > > > > Hi Eric, > > > > > > > > > > > > * Regarding mtu, as the comment you added to sruct net_device > > > > > > some time ago mentions, mtu is written in many places. > > > > > > > > > > > > I'm wondering if, in particular wrt ndo_change_mtu implementations, > > > > > > if some it is appropriate to add WRITE_ONCE() annotations. > > > > > > > > > > Sure thing. I called for these changes in commit > > > > > 501a90c94510 ("inet: protect against too small mtu values.") > > > > > when I said "Hopefully we will add the missing ones in followup patches." > > > > > > > > Ok, so basically it would be nice to add them, > > > > but they don't block progress of this patchset? > > > > > > A patch set adding WRITE_ONCE() on all dev->mtu would be great, > > > and seems orthogonal. > > > > Ack. I'm guessing an incremental approach to getting better coverage would > > be best. I'll add this to my todo list. > > I sent a single patch about that already ;) :)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 77d14528bdefc8b655f5da37ed88d0b937f35a61..242c24e857ec4c799f0239be3371fd589a8ed191 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1603,7 +1603,8 @@ static int put_master_ifindex(struct sk_buff *skb, struct net_device *dev) upper_dev = netdev_master_upper_dev_get_rcu(dev); if (upper_dev) - ret = nla_put_u32(skb, IFLA_MASTER, upper_dev->ifindex); + ret = nla_put_u32(skb, IFLA_MASTER, + READ_ONCE(upper_dev->ifindex)); rcu_read_unlock(); return ret; @@ -1825,8 +1826,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, ifm = nlmsg_data(nlh); ifm->ifi_family = AF_UNSPEC; ifm->__ifi_pad = 0; - ifm->ifi_type = dev->type; - ifm->ifi_index = dev->ifindex; + ifm->ifi_type = READ_ONCE(dev->type); + ifm->ifi_index = READ_ONCE(dev->ifindex); ifm->ifi_flags = dev_get_flags(dev); ifm->ifi_change = change; @@ -1839,24 +1840,34 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, if (nla_put_u32(skb, IFLA_TXQLEN, READ_ONCE(dev->tx_queue_len)) || nla_put_u8(skb, IFLA_OPERSTATE, - netif_running(dev) ? dev->operstate : IF_OPER_DOWN) || - nla_put_u8(skb, IFLA_LINKMODE, dev->link_mode) || - nla_put_u32(skb, IFLA_MTU, dev->mtu) || - nla_put_u32(skb, IFLA_MIN_MTU, dev->min_mtu) || - nla_put_u32(skb, IFLA_MAX_MTU, dev->max_mtu) || - nla_put_u32(skb, IFLA_GROUP, dev->group) || - nla_put_u32(skb, IFLA_PROMISCUITY, dev->promiscuity) || - nla_put_u32(skb, IFLA_ALLMULTI, dev->allmulti) || - nla_put_u32(skb, IFLA_NUM_TX_QUEUES, dev->num_tx_queues) || - nla_put_u32(skb, IFLA_GSO_MAX_SEGS, dev->gso_max_segs) || - nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) || - nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) || - nla_put_u32(skb, IFLA_GSO_IPV4_MAX_SIZE, dev->gso_ipv4_max_size) || - nla_put_u32(skb, IFLA_GRO_IPV4_MAX_SIZE, dev->gro_ipv4_max_size) || - nla_put_u32(skb, IFLA_TSO_MAX_SIZE, dev->tso_max_size) || - nla_put_u32(skb, IFLA_TSO_MAX_SEGS, dev->tso_max_segs) || + netif_running(dev) ? READ_ONCE(dev->operstate) : + IF_OPER_DOWN) || + nla_put_u8(skb, IFLA_LINKMODE, READ_ONCE(dev->link_mode)) || + nla_put_u32(skb, IFLA_MTU, READ_ONCE(dev->mtu)) || + nla_put_u32(skb, IFLA_MIN_MTU, READ_ONCE(dev->min_mtu)) || + nla_put_u32(skb, IFLA_MAX_MTU, READ_ONCE(dev->max_mtu)) || + nla_put_u32(skb, IFLA_GROUP, READ_ONCE(dev->group)) || + nla_put_u32(skb, IFLA_PROMISCUITY, READ_ONCE(dev->promiscuity)) || + nla_put_u32(skb, IFLA_ALLMULTI, READ_ONCE(dev->allmulti)) || + nla_put_u32(skb, IFLA_NUM_TX_QUEUES, + READ_ONCE(dev->num_tx_queues)) || + nla_put_u32(skb, IFLA_GSO_MAX_SEGS, + READ_ONCE(dev->gso_max_segs)) || + nla_put_u32(skb, IFLA_GSO_MAX_SIZE, + READ_ONCE(dev->gso_max_size)) || + nla_put_u32(skb, IFLA_GRO_MAX_SIZE, + READ_ONCE(dev->gro_max_size)) || + nla_put_u32(skb, IFLA_GSO_IPV4_MAX_SIZE, + READ_ONCE(dev->gso_ipv4_max_size)) || + nla_put_u32(skb, IFLA_GRO_IPV4_MAX_SIZE, + READ_ONCE(dev->gro_ipv4_max_size)) || + nla_put_u32(skb, IFLA_TSO_MAX_SIZE, + READ_ONCE(dev->tso_max_size)) || + nla_put_u32(skb, IFLA_TSO_MAX_SEGS, + READ_ONCE(dev->tso_max_segs)) || #ifdef CONFIG_RPS - nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) || + nla_put_u32(skb, IFLA_NUM_RX_QUEUES, + READ_ONCE(dev->num_rx_queues)) || #endif put_master_ifindex(skb, dev) || nla_put_u8(skb, IFLA_CARRIER, netif_carrier_ok(dev)) ||
Following device fields can be read locklessly in rtnl_fill_ifinfo() : type, ifindex, operstate, link_mode, mtu, min_mtu, max_mtu, group, promiscuity, allmulti, num_tx_queues, gso_max_segs, gso_max_size, gro_max_size, gso_ipv4_max_size, gro_ipv4_max_size, tso_max_size, tso_max_segs, num_rx_queues. Signed-off-by: Eric Dumazet <edumazet@google.com> --- net/core/rtnetlink.c | 51 +++++++++++++++++++++++++++----------------- 1 file changed, 31 insertions(+), 20 deletions(-)