Message ID | 20220116090220.2378360-1-eric.dumazet@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 0a6e6b3c7db6c34e3d149f09cd714972f8753e3f |
Delegated to: | Netdev Maintainers |
Series | [v2,net] ipv4: update fib_info_cnt under spinlock protection |
On Sun, Jan 16, 2022 at 01:02:20AM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> In the past, free_fib_info() was supposed to be called
> under RTNL protection.
>
> This eventually was no longer the case.
>
> Instead of enforcing RTNL it seems we simply can
> move fib_info_cnt changes to occur when fib_info_lock
> is held.
>
> v2: David Laight suggested to update fib_info_cnt
> only when an entry is added/deleted to/from the hash table,
> as fib_info_cnt is used to make sure hash table size
> is optimal.

[...]

> Fixes: 48bb9eb47b27 ("netdevsim: fib: Add dummy implementation for FIB offload")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Cc: David Laight <David.Laight@ACULAB.COM>
> Cc: Ido Schimmel <idosch@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
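For readers following the thread: the hazard the changelog describes is a classic lost update, with the free_fib_info() decrement of fib_info_cnt able to run without RTNL while fib_create_info() incremented it under RTNL only. A minimal userspace sketch of that hazard (illustrative only; adder(), remover(), and the plain counter are stand-ins, not kernel code):

/* Lost-update race on an unlocked counter; cnt plays the role of
 * fib_info_cnt.  Build with: cc -pthread race.c */
#include <pthread.h>
#include <stdio.h>

static unsigned int cnt;

static void *adder(void *arg)
{
	for (int i = 0; i < 1000000; i++)
		cnt++;	/* mirrors the old fib_create_info() increment (RTNL-only) */
	return NULL;
}

static void *remover(void *arg)
{
	for (int i = 0; i < 1000000; i++)
		cnt--;	/* mirrors the old free_fib_info() decrement, which
			 * could run without RTNL */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, adder, NULL);
	pthread_create(&b, NULL, remover, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/* Equal numbers of ++ and -- should leave 0; lost updates can
	 * leave anything, including a huge value after unsigned wrap. */
	printf("cnt = %u\n", cnt);
	return 0;
}

Run it a few times: the final count is rarely zero, which is exactly the corruption syzbot observed.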
Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Sun, 16 Jan 2022 01:02:20 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> In the past, free_fib_info() was supposed to be called
> under RTNL protection.
>
> This eventually was no longer the case.
>
> [...]

Here is the summary with links:
  - [v2,net] ipv4: update fib_info_cnt under spinlock protection
    https://git.kernel.org/netdev/net/c/0a6e6b3c7db6

You are awesome, thank you!
From: Eric Dumazet
> Sent: 16 January 2022 09:02
>
> In the past, free_fib_info() was supposed to be called
> under RTNL protection.
>
> This eventually was no longer the case.
>
> Instead of enforcing RTNL it seems we simply can
> move fib_info_cnt changes to occur when fib_info_lock
> is held.
>
> v2: David Laight suggested to update fib_info_cnt
> only when an entry is added/deleted to/from the hash table,
> as fib_info_cnt is used to make sure hash table size
> is optimal.

Already applied, but
acked-by: David Laight

...
If you are going to add READ_ONCE() markers then one on
'fib_info_hash_size' would be much more appropriate since
this value is used twice.

> 	err = -ENOBUFS;
> -	if (fib_info_cnt >= fib_info_hash_size) {
> +
> +	/* Paired with WRITE_ONCE() in fib_release_info() */
> +	if (READ_ONCE(fib_info_cnt) >= fib_info_hash_size) {
> 		unsigned int new_size = fib_info_hash_size << 1;
> 		struct hlist_head *new_info_hash;
> 		struct hlist_head *new_laddrhash;
> @@ -1462,7 +1467,6 @@ struct fib_info *fib_create_info(struct fib_config *cfg,

It is also possible for two (or many) threads to decide to
increase the hash table size at the same time.

The code that moves the items to the new hash tables should
probably discard the new tables if they aren't larger than
the existing ones.
The copy does look safe - just a waste of time.

It is also technically possible (but very unlikely) that the table
will get shrunk!
It will grow again on the next allocate.

But this is a different bug.

I also thought Linus said that the WRITE_ONCE() weren't needed
here because the kernel basically assumes the compiler isn't
stupid enough to do 'write tearing' on word sized items
(or just write zero before every write).

	David
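As a sketch of what David's READ_ONCE() suggestion on 'fib_info_hash_size' could look like, assuming a hypothetical needs_resize() helper and a simplified stand-in for the kernel's READ_ONCE() so the fragment compiles on its own (no such helper exists upstream):

/* Simplified stand-in; the real READ_ONCE() lives in <linux/compiler.h>. */
#define READ_ONCE(x) (*(const volatile typeof(x) *)&(x))

static unsigned int fib_info_cnt;
static unsigned int fib_info_hash_size;

/* Hypothetical helper: snapshot fib_info_hash_size once so that both
 * uses (the comparison and the doubling) see one consistent value. */
static int needs_resize(unsigned int *new_size)
{
	unsigned int hash_size = READ_ONCE(fib_info_hash_size);

	if (READ_ONCE(fib_info_cnt) < hash_size)
		return 0;
	*new_size = hash_size ? hash_size << 1 : 16;
	return 1;
}

As Eric notes further down, fib_create_info() runs under RTNL, so the snapshot is not actually needed in-tree; this only illustrates the shape of the suggestion.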
From: Eric Dumazet
> Sent: 16 January 2022 09:02
>
> From: Eric Dumazet <edumazet@google.com>
>
> In the past, free_fib_info() was supposed to be called
> under RTNL protection.
>
> This eventually was no longer the case.
>
> Instead of enforcing RTNL it seems we simply can
> move fib_info_cnt changes to occur when fib_info_lock
> is held.

This probably ought to be a stable candidate.

If an increment is lost due to the unlocked inc/dec it is
possible for the count to wrap to -1 if all fib are deleted.
That will cause the table size to be doubled on every
allocate (until the count goes +ve again).

Losing a decrement is less of a problem.
You'd need to lose a lot of them for any ill effect.

	David
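The doubling behaviour follows directly from fib_info_cnt being declared as an unsigned int: a wrap below zero yields UINT_MAX, which satisfies the grow-the-table comparison on every allocation. A tiny standalone demo (the local variables are illustrative stand-ins for the kernel globals):

#include <stdio.h>

int main(void)
{
	unsigned int fib_info_cnt = 0;		/* stand-in for the kernel global */
	unsigned int fib_info_hash_size = 16;	/* stand-in for the kernel global */

	fib_info_cnt--;	/* one extra decrement: wraps to UINT_MAX */

	if (fib_info_cnt >= fib_info_hash_size)
		printf("resize triggered: cnt=%u >= size=%u\n",
		       fib_info_cnt, fib_info_hash_size);
	return 0;
}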
On Mon, Jan 17, 2022 at 1:20 AM David Laight <David.Laight@aculab.com> wrote:
> This probably ought to be a stable candidate.
>
> If an increment is lost due to the unlocked inc/dec it is
> possible for the count to wrap to -1 if all fib are deleted.
> That will cause the table size to be doubled on every
> allocate (until the count goes +ve again).
>
> Losing a decrement is less of a problem.
> You'd need to lose a lot of them for any ill effect.
>

You are rephrasing the syzbot report, and the Fixes: tag I added
in the changelog.

hint: netdev maintainers handle the stable backports based on the
Fixes: tag, and their own judgement.
On Sun, Jan 16, 2022 at 7:23 PM David Laight <David.Laight@aculab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 16 January 2022 09:02
> >
> > In the past, free_fib_info() was supposed to be called
> > under RTNL protection.
> >
> > This eventually was no longer the case.
> >
> > Instead of enforcing RTNL it seems we simply can
> > move fib_info_cnt changes to occur when fib_info_lock
> > is held.
> >
> > v2: David Laight suggested to update fib_info_cnt
> > only when an entry is added/deleted to/from the hash table,
> > as fib_info_cnt is used to make sure hash table size
> > is optimal.
>
> Already applied, but
> acked-by: David Laight
>
> ...
> If you are going to add READ_ONCE() markers then one on
> 'fib_info_hash_size' would be much more appropriate since
> this value is used twice.
>
> > 	err = -ENOBUFS;
> > -	if (fib_info_cnt >= fib_info_hash_size) {
> > +
> > +	/* Paired with WRITE_ONCE() in fib_release_info() */
> > +	if (READ_ONCE(fib_info_cnt) >= fib_info_hash_size) {
> > 		unsigned int new_size = fib_info_hash_size << 1;
> > 		struct hlist_head *new_info_hash;
> > 		struct hlist_head *new_laddrhash;
> > @@ -1462,7 +1467,6 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
>
> It is also possible for two (or many) threads to decide to
> increase the hash table size at the same time.
>
> The code that moves the items to the new hash tables should
> probably discard the new tables if they aren't larger than
> the existing ones.
> The copy does look safe - just a waste of time.
>
> It is also technically possible (but very unlikely) that the table
> will get shrunk!
> It will grow again on the next allocate.
>
> But this is a different bug.
>

There is no bug. fib_create_info() is called with RTNL held.

> I also thought Linus said that the WRITE_ONCE() weren't needed
> here because the kernel basically assumes the compiler isn't
> stupid enough to do 'write tearing' on word sized items
> (or just write zero before every write).
>

That is not true. READ_ONCE()/WRITE_ONCE() have their own purpose,
we cannot assume a compiler will follow arbitrary rules about word
sizes.
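To spell the pairing out: WRITE_ONCE() keeps the locked writer's store whole, and READ_ONCE() stops the compiler from tearing or refetching the lockless read. A compile-able sketch using simplified stand-in macros (the real ones live in <linux/compiler.h>; the two wrapper functions are hypothetical mirrors of fib_release_info() and fib_create_info(), not in-tree code):

#define READ_ONCE(x)     (*(const volatile typeof(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile typeof(x) *)&(x) = (v))

static unsigned int fib_info_cnt;

/* Hypothetical wrapper mirroring fib_release_info(): the update is
 * serialized by fib_info_lock in the real code; the WRITE_ONCE()
 * keeps the store whole for the lockless reader below. */
static void fib_info_cnt_dec(void)
{
	WRITE_ONCE(fib_info_cnt, fib_info_cnt - 1);
}

/* Hypothetical wrapper mirroring the check in fib_create_info():
 * reads the counter without fib_info_lock, paired with the
 * WRITE_ONCE() above so a torn or half-updated value is never seen. */
static int fib_info_needs_grow(unsigned int hash_size)
{
	return READ_ONCE(fib_info_cnt) >= hash_size;
}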
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 828de171708f599b56f63715514c0259c7cb08a2..45619c005b8dddd7ccd5c7029efa4ed69b6ce1de 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -249,7 +249,6 @@ void free_fib_info(struct fib_info *fi)
 		pr_warn("Freeing alive fib_info %p\n", fi);
 		return;
 	}
-	fib_info_cnt--;
 
 	call_rcu(&fi->rcu, free_fib_info_rcu);
 }
@@ -260,6 +259,10 @@ void fib_release_info(struct fib_info *fi)
 	spin_lock_bh(&fib_info_lock);
 	if (fi && refcount_dec_and_test(&fi->fib_treeref)) {
 		hlist_del(&fi->fib_hash);
+
+		/* Paired with READ_ONCE() in fib_create_info(). */
+		WRITE_ONCE(fib_info_cnt, fib_info_cnt - 1);
+
 		if (fi->fib_prefsrc)
 			hlist_del(&fi->fib_lhash);
 		if (fi->nh) {
@@ -1430,7 +1433,9 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
 #endif
 
 	err = -ENOBUFS;
-	if (fib_info_cnt >= fib_info_hash_size) {
+
+	/* Paired with WRITE_ONCE() in fib_release_info() */
+	if (READ_ONCE(fib_info_cnt) >= fib_info_hash_size) {
 		unsigned int new_size = fib_info_hash_size << 1;
 		struct hlist_head *new_info_hash;
 		struct hlist_head *new_laddrhash;
@@ -1462,7 +1467,6 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
 		return ERR_PTR(err);
 	}
 
-	fib_info_cnt++;
 	fi->fib_net = net;
 	fi->fib_protocol = cfg->fc_protocol;
 	fi->fib_scope = cfg->fc_scope;
@@ -1591,6 +1595,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
 	refcount_set(&fi->fib_treeref, 1);
 	refcount_set(&fi->fib_clntref, 1);
 	spin_lock_bh(&fib_info_lock);
+	fib_info_cnt++;
 	hlist_add_head(&fi->fib_hash,
 		       &fib_info_hash[fib_info_hashfn(fi)]);
 	if (fi->fib_prefsrc) {