Message ID | 20230515085046.4457-1-jnixdorf-oss@avm.de (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next,1/2] bridge: Add a limit on FDB entries | expand |
On 15/05/2023 11:50, Johannes Nixdorf wrote: > A malicious actor behind one bridge port may spam the kernel with packets > with a random source MAC address, each of which will create an FDB entry, > each of which is a dynamic allocation in the kernel. > > There are roughly 2^48 different MAC addresses, further limited by the > rhashtable they are stored in to 2^31. Each entry is of the type struct > net_bridge_fdb_entry, which is currently 128 bytes big. This means the > maximum amount of memory allocated for FDB entries is 2^31 * 128B = > 256GiB, which is too much for most computers. > > Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES, > which, if nonzero, limits the amount of entries to a user specified > maximum. > > For backwards compatibility the default setting of 0 disables the limit. > > All changes to fdb_n_entries are under br->hash_lock, which means we do > not need additional locking. The call paths are (✓ denotes that > br->hash_lock is taken around the next call): > > - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓ > | +- br_fdb_change_mac_address ✓ > | +- br_fdb_delete_by_port ✓ > +- br_fdb_find_delete_local ✓ > +- fdb_add_local <-+- br_fdb_changeaddr ✓ > | +- br_fdb_change_mac_address ✓ > | +- br_fdb_add_local ✓ > +- br_fdb_cleanup ✓ > +- br_fdb_flush ✓ > +- br_fdb_delete_by_port ✓ > +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓ > +- br_fdb_external_learn_del ✓ > - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓ > | +- br_fdb_change_mac_address ✓ > | +- br_fdb_add_local ✓ > +- br_fdb_update ✓ > +- fdb_add_entry <--- __br_fdb_add ✓ > +- br_fdb_external_learn_add ✓ > > Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> > --- > include/uapi/linux/if_link.h | 1 + > net/bridge/br_device.c | 2 ++ > net/bridge/br_fdb.c | 6 ++++++ > net/bridge/br_netlink.c | 9 ++++++++- > net/bridge/br_private.h | 2 ++ > 5 files changed, 19 insertions(+), 1 deletion(-) > Hi, If you're sending a patch series please add a cover letter (--cover-letter) which explains what the series are trying to do and why. I've had a patch that implements this feature for a while but didn't get to upstreaming it. :) Anyway more comments below, > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > index 4ac1000b0ef2..27cf5f2d8790 100644 > --- a/include/uapi/linux/if_link.h > +++ b/include/uapi/linux/if_link.h > @@ -510,6 +510,7 @@ enum { > IFLA_BR_VLAN_STATS_PER_PORT, > IFLA_BR_MULTI_BOOLOPT, > IFLA_BR_MCAST_QUERIER_STATE, > + IFLA_BR_FDB_MAX_ENTRIES, > __IFLA_BR_MAX, > }; > > diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c > index 8eca8a5c80c6..d455a28df7c9 100644 > --- a/net/bridge/br_device.c > +++ b/net/bridge/br_device.c > @@ -528,6 +528,8 @@ void br_dev_setup(struct net_device *dev) > br->bridge_hello_time = br->hello_time = 2 * HZ; > br->bridge_forward_delay = br->forward_delay = 15 * HZ; > br->bridge_ageing_time = br->ageing_time = BR_DEFAULT_AGEING_TIME; > + br->fdb_n_entries = 0; > + br->fdb_max_entries = 0; Unnecessary, the private area is already cleared. > dev->max_mtu = ETH_MAX_MTU; > > br_netfilter_rtable_init(br); > diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c > index e69a872bfc1d..8a833e6dee92 100644 > --- a/net/bridge/br_fdb.c > +++ b/net/bridge/br_fdb.c > @@ -329,6 +329,8 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, > hlist_del_init_rcu(&f->fdb_node); > rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode, > br_fdb_rht_params); > + if (!WARN_ON(!br->fdb_n_entries)) > + br->fdb_n_entries--; This is pointless, just put the WARN_ON(!br->fdb_n_entries) above decrementing, if we hit that we are already in trouble and not decrementing doesn't help us. > fdb_notify(br, f, RTM_DELNEIGH, swdev_notify); > call_rcu(&f->rcu, fdb_rcu_free); > } > @@ -391,6 +393,9 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br, > struct net_bridge_fdb_entry *fdb; > int err; > > + if (unlikely(br->fdb_max_entries && br->fdb_n_entries >= br->fdb_max_entries)) > + return NULL; > + This one needs more work, fdb_create() is also used when user-space is adding new entries, so it would be nice to return a proper error. > fdb = kmem_cache_alloc(br_fdb_cache, GFP_ATOMIC); > if (!fdb) > return NULL; > @@ -408,6 +413,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br, > } > > hlist_add_head_rcu(&fdb->fdb_node, &br->fdb_list); > + br->fdb_n_entries++; > > return fdb; > } > diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c > index 05c5863d2e20..e5b8d36a3291 100644 > --- a/net/bridge/br_netlink.c > +++ b/net/bridge/br_netlink.c > @@ -1527,6 +1527,12 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[], > return err; > } > > + if (data[IFLA_BR_FDB_MAX_ENTRIES]) { > + u32 val = nla_get_u32(data[IFLA_BR_FDB_MAX_ENTRIES]); > + > + br->fdb_max_entries = val; > + } > + > return 0; > } > > @@ -1656,7 +1662,8 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev) > nla_put_u8(skb, IFLA_BR_TOPOLOGY_CHANGE_DETECTED, > br->topology_change_detected) || > nla_put(skb, IFLA_BR_GROUP_ADDR, ETH_ALEN, br->group_addr) || > - nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm)) > + nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm) || > + nla_put_u32(skb, IFLA_BR_FDB_MAX_ENTRIES, br->fdb_max_entries)) You are not returning the current entry count, that is also needed. > return -EMSGSIZE; > > #ifdef CONFIG_BRIDGE_VLAN_FILTERING > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h > index 2119729ded2b..64fb359c6e3e 100644 > --- a/net/bridge/br_private.h > +++ b/net/bridge/br_private.h > @@ -494,6 +494,8 @@ struct net_bridge { > #endif > > struct rhashtable fdb_hash_tbl; > + u32 fdb_n_entries; > + u32 fdb_max_entries; These are not critical, so I'd use 4 byte holes in net_bridge and pack it better instead of making it larger. > struct list_head port_list; > #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) > union {
Hi, On Mon, May 15, 2023 at 12:35:03PM +0300, Nikolay Aleksandrov wrote: > On 15/05/2023 11:50, Johannes Nixdorf wrote: > > A malicious actor behind one bridge port may spam the kernel with packets > > with a random source MAC address, each of which will create an FDB entry, > > each of which is a dynamic allocation in the kernel. > > > > There are roughly 2^48 different MAC addresses, further limited by the > > rhashtable they are stored in to 2^31. Each entry is of the type struct > > net_bridge_fdb_entry, which is currently 128 bytes big. This means the > > maximum amount of memory allocated for FDB entries is 2^31 * 128B = > > 256GiB, which is too much for most computers. > > > > Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES, > > which, if nonzero, limits the amount of entries to a user specified > > maximum. > > > > For backwards compatibility the default setting of 0 disables the limit. > > > > All changes to fdb_n_entries are under br->hash_lock, which means we do > > not need additional locking. The call paths are (✓ denotes that > > br->hash_lock is taken around the next call): > > > > - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓ > > | +- br_fdb_change_mac_address ✓ > > | +- br_fdb_delete_by_port ✓ > > +- br_fdb_find_delete_local ✓ > > +- fdb_add_local <-+- br_fdb_changeaddr ✓ > > | +- br_fdb_change_mac_address ✓ > > | +- br_fdb_add_local ✓ > > +- br_fdb_cleanup ✓ > > +- br_fdb_flush ✓ > > +- br_fdb_delete_by_port ✓ > > +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓ > > +- br_fdb_external_learn_del ✓ > > - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓ > > | +- br_fdb_change_mac_address ✓ > > | +- br_fdb_add_local ✓ > > +- br_fdb_update ✓ > > +- fdb_add_entry <--- __br_fdb_add ✓ > > +- br_fdb_external_learn_add ✓ > > > > Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> > > --- > > include/uapi/linux/if_link.h | 1 + > > net/bridge/br_device.c | 2 ++ > > net/bridge/br_fdb.c | 6 ++++++ > > net/bridge/br_netlink.c | 9 ++++++++- > > net/bridge/br_private.h | 2 ++ > > 5 files changed, 19 insertions(+), 1 deletion(-) > > > > Hi, > If you're sending a patch series please add a cover letter (--cover-letter) which > explains what the series are trying to do and why. Thanks for the hint. I'll do that in the future, including a potential v2. > I've had a patch that implements this feature for a while but didn't get to upstreaming it. :) I'm not too attached to my name on it, so if your patch is further along, I'd be happy if you submitted your version instead. > Anyway more comments below, > > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h > > index 4ac1000b0ef2..27cf5f2d8790 100644 > > --- a/include/uapi/linux/if_link.h > > +++ b/include/uapi/linux/if_link.h > > @@ -510,6 +510,7 @@ enum { > > IFLA_BR_VLAN_STATS_PER_PORT, > > IFLA_BR_MULTI_BOOLOPT, > > IFLA_BR_MCAST_QUERIER_STATE, > > + IFLA_BR_FDB_MAX_ENTRIES, > > __IFLA_BR_MAX, > > }; > > > > diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c > > index 8eca8a5c80c6..d455a28df7c9 100644 > > --- a/net/bridge/br_device.c > > +++ b/net/bridge/br_device.c > > @@ -528,6 +528,8 @@ void br_dev_setup(struct net_device *dev) > > br->bridge_hello_time = br->hello_time = 2 * HZ; > > br->bridge_forward_delay = br->forward_delay = 15 * HZ; > > br->bridge_ageing_time = br->ageing_time = BR_DEFAULT_AGEING_TIME; > > + br->fdb_n_entries = 0; > > + br->fdb_max_entries = 0; > > Unnecessary, the private area is already cleared. This will be taken out in a v2. > > dev->max_mtu = ETH_MAX_MTU; > > > > br_netfilter_rtable_init(br); > > diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c > > index e69a872bfc1d..8a833e6dee92 100644 > > --- a/net/bridge/br_fdb.c > > +++ b/net/bridge/br_fdb.c > > @@ -329,6 +329,8 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, > > hlist_del_init_rcu(&f->fdb_node); > > rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode, > > br_fdb_rht_params); > > + if (!WARN_ON(!br->fdb_n_entries)) > > + br->fdb_n_entries--; > > This is pointless, just put the WARN_ON(!br->fdb_n_entries) above decrementing, if we > hit that we are already in trouble and not decrementing doesn't help us. This will now always be decremented in a v2. > > fdb_notify(br, f, RTM_DELNEIGH, swdev_notify); > > call_rcu(&f->rcu, fdb_rcu_free); > > } > > @@ -391,6 +393,9 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br, > > struct net_bridge_fdb_entry *fdb; > > int err; > > > > + if (unlikely(br->fdb_max_entries && br->fdb_n_entries >= br->fdb_max_entries)) > > + return NULL; > > + > > This one needs more work, fdb_create() is also used when user-space is adding new > entries, so it would be nice to return a proper error. The callers usually map this return code to -ENOMEM, which I deemed an appropriate return code for violating the new limit, as I understood it as a memory limit for the FDB table. Is there a different error return code you had in mind here, or would you rather only count dynamic entries towards the limit at all? > > fdb = kmem_cache_alloc(br_fdb_cache, GFP_ATOMIC); > > if (!fdb) > > return NULL; > > @@ -408,6 +413,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br, > > } > > > > hlist_add_head_rcu(&fdb->fdb_node, &br->fdb_list); > > + br->fdb_n_entries++; > > > > return fdb; > > } > > diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c > > index 05c5863d2e20..e5b8d36a3291 100644 > > --- a/net/bridge/br_netlink.c > > +++ b/net/bridge/br_netlink.c > > @@ -1527,6 +1527,12 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[], > > return err; > > } > > > > + if (data[IFLA_BR_FDB_MAX_ENTRIES]) { > > + u32 val = nla_get_u32(data[IFLA_BR_FDB_MAX_ENTRIES]); > > + > > + br->fdb_max_entries = val; > > + } > > + > > return 0; > > } > > > > @@ -1656,7 +1662,8 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev) > > nla_put_u8(skb, IFLA_BR_TOPOLOGY_CHANGE_DETECTED, > > br->topology_change_detected) || > > nla_put(skb, IFLA_BR_GROUP_ADDR, ETH_ALEN, br->group_addr) || > > - nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm)) > > + nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm) || > > + nla_put_u32(skb, IFLA_BR_FDB_MAX_ENTRIES, br->fdb_max_entries)) > > You are not returning the current entry count, that is also needed. For a v2 this now also returns the current entry count under IFLA_BR_FDB_CUR_ENTRIES. > > return -EMSGSIZE; > > > > #ifdef CONFIG_BRIDGE_VLAN_FILTERING > > diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h > > index 2119729ded2b..64fb359c6e3e 100644 > > --- a/net/bridge/br_private.h > > +++ b/net/bridge/br_private.h > > @@ -494,6 +494,8 @@ struct net_bridge { > > #endif > > > > struct rhashtable fdb_hash_tbl; > > + u32 fdb_n_entries; > > + u32 fdb_max_entries; > > These are not critical, so I'd use 4 byte holes in net_bridge and pack it better > instead of making it larger. For a v2 I now moved it into (conditional) holes now in front of CONFIG_BRIDGE_VLAN_FILTERING (only a hole if it is enabled) and CONFIG_SWITCHDEV (only a hole if it is disabled). I could not find any other holes, but please tell me if you had any others in mind. > > struct list_head port_list; > > #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) > > union { > Thanks for your detailed feedback.
On 16/05/2023 11:12, Johannes Nixdorf wrote: > Hi, > > On Mon, May 15, 2023 at 12:35:03PM +0300, Nikolay Aleksandrov wrote: >> On 15/05/2023 11:50, Johannes Nixdorf wrote: >>> A malicious actor behind one bridge port may spam the kernel with packets >>> with a random source MAC address, each of which will create an FDB entry, >>> each of which is a dynamic allocation in the kernel. >>> >>> There are roughly 2^48 different MAC addresses, further limited by the >>> rhashtable they are stored in to 2^31. Each entry is of the type struct >>> net_bridge_fdb_entry, which is currently 128 bytes big. This means the >>> maximum amount of memory allocated for FDB entries is 2^31 * 128B = >>> 256GiB, which is too much for most computers. >>> >>> Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES, >>> which, if nonzero, limits the amount of entries to a user specified >>> maximum. >>> >>> For backwards compatibility the default setting of 0 disables the limit. >>> >>> All changes to fdb_n_entries are under br->hash_lock, which means we do >>> not need additional locking. The call paths are (✓ denotes that >>> br->hash_lock is taken around the next call): >>> >>> - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓ >>> | +- br_fdb_change_mac_address ✓ >>> | +- br_fdb_delete_by_port ✓ >>> +- br_fdb_find_delete_local ✓ >>> +- fdb_add_local <-+- br_fdb_changeaddr ✓ >>> | +- br_fdb_change_mac_address ✓ >>> | +- br_fdb_add_local ✓ >>> +- br_fdb_cleanup ✓ >>> +- br_fdb_flush ✓ >>> +- br_fdb_delete_by_port ✓ >>> +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓ >>> +- br_fdb_external_learn_del ✓ >>> - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓ >>> | +- br_fdb_change_mac_address ✓ >>> | +- br_fdb_add_local ✓ >>> +- br_fdb_update ✓ >>> +- fdb_add_entry <--- __br_fdb_add ✓ >>> +- br_fdb_external_learn_add ✓ >>> >>> Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> >>> --- >>> include/uapi/linux/if_link.h | 1 + >>> net/bridge/br_device.c | 2 ++ >>> net/bridge/br_fdb.c | 6 ++++++ >>> net/bridge/br_netlink.c | 9 ++++++++- >>> net/bridge/br_private.h | 2 ++ >>> 5 files changed, 19 insertions(+), 1 deletion(-) >>> >> >> Hi, >> If you're sending a patch series please add a cover letter (--cover-letter) which >> explains what the series are trying to do and why. > > Thanks for the hint. I'll do that in the future, including a potential v2. > >> I've had a patch that implements this feature for a while but didn't get to upstreaming it. :) > > I'm not too attached to my name on it, so if your patch is further along, > I'd be happy if you submitted your version instead. > Nah, just mentioning it. I'd be happy to review and ack your patches. I don't have time at the moment to polish and upstream it. Please add a selftest as well. >> Anyway more comments below, > >>> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h >>> index 4ac1000b0ef2..27cf5f2d8790 100644 >>> --- a/include/uapi/linux/if_link.h >>> +++ b/include/uapi/linux/if_link.h >>> @@ -510,6 +510,7 @@ enum { >>> IFLA_BR_VLAN_STATS_PER_PORT, >>> IFLA_BR_MULTI_BOOLOPT, >>> IFLA_BR_MCAST_QUERIER_STATE, >>> + IFLA_BR_FDB_MAX_ENTRIES, >>> __IFLA_BR_MAX, >>> }; >>> >>> diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c >>> index 8eca8a5c80c6..d455a28df7c9 100644 >>> --- a/net/bridge/br_device.c >>> +++ b/net/bridge/br_device.c >>> @@ -528,6 +528,8 @@ void br_dev_setup(struct net_device *dev) >>> br->bridge_hello_time = br->hello_time = 2 * HZ; >>> br->bridge_forward_delay = br->forward_delay = 15 * HZ; >>> br->bridge_ageing_time = br->ageing_time = BR_DEFAULT_AGEING_TIME; >>> + br->fdb_n_entries = 0; >>> + br->fdb_max_entries = 0; >> >> Unnecessary, the private area is already cleared. > > This will be taken out in a v2. > >>> dev->max_mtu = ETH_MAX_MTU; >>> >>> br_netfilter_rtable_init(br); >>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c >>> index e69a872bfc1d..8a833e6dee92 100644 >>> --- a/net/bridge/br_fdb.c >>> +++ b/net/bridge/br_fdb.c >>> @@ -329,6 +329,8 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, >>> hlist_del_init_rcu(&f->fdb_node); >>> rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode, >>> br_fdb_rht_params); >>> + if (!WARN_ON(!br->fdb_n_entries)) >>> + br->fdb_n_entries--; >> >> This is pointless, just put the WARN_ON(!br->fdb_n_entries) above decrementing, if we >> hit that we are already in trouble and not decrementing doesn't help us. > > This will now always be decremented in a v2. > >>> fdb_notify(br, f, RTM_DELNEIGH, swdev_notify); >>> call_rcu(&f->rcu, fdb_rcu_free); >>> } >>> @@ -391,6 +393,9 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br, >>> struct net_bridge_fdb_entry *fdb; >>> int err; >>> >>> + if (unlikely(br->fdb_max_entries && br->fdb_n_entries >= br->fdb_max_entries)) >>> + return NULL; >>> + >> >> This one needs more work, fdb_create() is also used when user-space is adding new >> entries, so it would be nice to return a proper error. > > The callers usually map this return code to -ENOMEM, which I deemed an > appropriate return code for violating the new limit, as I understood it > as a memory limit for the FDB table. > > Is there a different error return code you had in mind here, or would > you rather only count dynamic entries towards the limit at all? > I think we should be able to tell apart real ENOMEM from out of entries, the callers should interpret the returned error and propagate a user-friendly and understandable extack. The error could be E2BIG for example. That would require updating callers though. Please split these into separate patches so they can be easier to review. >>> fdb = kmem_cache_alloc(br_fdb_cache, GFP_ATOMIC); >>> if (!fdb) >>> return NULL; >>> @@ -408,6 +413,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br, >>> } >>> >>> hlist_add_head_rcu(&fdb->fdb_node, &br->fdb_list); >>> + br->fdb_n_entries++; >>> >>> return fdb; >>> } >>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c >>> index 05c5863d2e20..e5b8d36a3291 100644 >>> --- a/net/bridge/br_netlink.c >>> +++ b/net/bridge/br_netlink.c >>> @@ -1527,6 +1527,12 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[], >>> return err; >>> } >>> >>> + if (data[IFLA_BR_FDB_MAX_ENTRIES]) { >>> + u32 val = nla_get_u32(data[IFLA_BR_FDB_MAX_ENTRIES]); >>> + >>> + br->fdb_max_entries = val; >>> + } >>> + >>> return 0; >>> } >>> >>> @@ -1656,7 +1662,8 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev) >>> nla_put_u8(skb, IFLA_BR_TOPOLOGY_CHANGE_DETECTED, >>> br->topology_change_detected) || >>> nla_put(skb, IFLA_BR_GROUP_ADDR, ETH_ALEN, br->group_addr) || >>> - nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm)) >>> + nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm) || >>> + nla_put_u32(skb, IFLA_BR_FDB_MAX_ENTRIES, br->fdb_max_entries)) >> >> You are not returning the current entry count, that is also needed. > > For a v2 this now also returns the current entry count under > IFLA_BR_FDB_CUR_ENTRIES. > >>> return -EMSGSIZE; >>> >>> #ifdef CONFIG_BRIDGE_VLAN_FILTERING >>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h >>> index 2119729ded2b..64fb359c6e3e 100644 >>> --- a/net/bridge/br_private.h >>> +++ b/net/bridge/br_private.h >>> @@ -494,6 +494,8 @@ struct net_bridge { >>> #endif >>> >>> struct rhashtable fdb_hash_tbl; >>> + u32 fdb_n_entries; >>> + u32 fdb_max_entries; >> >> These are not critical, so I'd use 4 byte holes in net_bridge and pack it better >> instead of making it larger. > > For a v2 I now moved it into (conditional) holes now in front of > CONFIG_BRIDGE_VLAN_FILTERING (only a hole if it is enabled) and > CONFIG_SWITCHDEV (only a hole if it is disabled). I could not find any > other holes, but please tell me if you had any others in mind. > >>> struct list_head port_list; >>> #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) >>> union { >> > > Thanks for your detailed feedback.
On 16/05/2023 11:12, Johannes Nixdorf wrote: [snip] >>> return -EMSGSIZE; >>> >>> #ifdef CONFIG_BRIDGE_VLAN_FILTERING >>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h >>> index 2119729ded2b..64fb359c6e3e 100644 >>> --- a/net/bridge/br_private.h >>> +++ b/net/bridge/br_private.h >>> @@ -494,6 +494,8 @@ struct net_bridge { >>> #endif >>> >>> struct rhashtable fdb_hash_tbl; >>> + u32 fdb_n_entries; >>> + u32 fdb_max_entries; >> >> These are not critical, so I'd use 4 byte holes in net_bridge and pack it better >> instead of making it larger. > > For a v2 I now moved it into (conditional) holes now in front of > CONFIG_BRIDGE_VLAN_FILTERING (only a hole if it is enabled) and > CONFIG_SWITCHDEV (only a hole if it is disabled). I could not find any > other holes, but please tell me if you had any others in mind. > Just please don't add them in the first 64 bytes (first cache line) as we use that in the hot path and keep it for variables used there. I'd say use any of the other 4 byte holes and just add both, so another 4 byte hole would be left after the second one. >>> struct list_head port_list; >>> #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) >>> union { >> > > Thanks for your detailed feedback.
On 15/05/2023 11:50, Johannes Nixdorf wrote: > A malicious actor behind one bridge port may spam the kernel with packets > with a random source MAC address, each of which will create an FDB entry, > each of which is a dynamic allocation in the kernel. > > There are roughly 2^48 different MAC addresses, further limited by the > rhashtable they are stored in to 2^31. Each entry is of the type struct > net_bridge_fdb_entry, which is currently 128 bytes big. This means the > maximum amount of memory allocated for FDB entries is 2^31 * 128B = > 256GiB, which is too much for most computers. > > Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES, > which, if nonzero, limits the amount of entries to a user specified > maximum. > > For backwards compatibility the default setting of 0 disables the limit. > > All changes to fdb_n_entries are under br->hash_lock, which means we do > not need additional locking. The call paths are (✓ denotes that > br->hash_lock is taken around the next call): > > - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓ > | +- br_fdb_change_mac_address ✓ > | +- br_fdb_delete_by_port ✓ > +- br_fdb_find_delete_local ✓ > +- fdb_add_local <-+- br_fdb_changeaddr ✓ > | +- br_fdb_change_mac_address ✓ > | +- br_fdb_add_local ✓ > +- br_fdb_cleanup ✓ > +- br_fdb_flush ✓ > +- br_fdb_delete_by_port ✓ > +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓ > +- br_fdb_external_learn_del ✓ > - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓ > | +- br_fdb_change_mac_address ✓ > | +- br_fdb_add_local ✓ > +- br_fdb_update ✓ > +- fdb_add_entry <--- __br_fdb_add ✓ > +- br_fdb_external_learn_add ✓ > > Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> > --- > include/uapi/linux/if_link.h | 1 + > net/bridge/br_device.c | 2 ++ > net/bridge/br_fdb.c | 6 ++++++ > net/bridge/br_netlink.c | 9 ++++++++- > net/bridge/br_private.h | 2 ++ > 5 files changed, 19 insertions(+), 1 deletion(-) > I completely missed the fact that you don't deal with the situation where you already have fdbs created and a limit is set later, then it would be useless because it will start counting from 0 even though there are already entries. Also another issue that came to mind is that you don't deal with fdb_create() for "special" entries, i.e. when adding a port. Currently it will print an error, but you should revisit all callers and see where it might be a problem.
On Tue, May 16, 2023 at 11:38:11AM +0300, Nikolay Aleksandrov wrote: > On 15/05/2023 11:50, Johannes Nixdorf wrote: > > A malicious actor behind one bridge port may spam the kernel with packets > > with a random source MAC address, each of which will create an FDB entry, > > each of which is a dynamic allocation in the kernel. > > > > There are roughly 2^48 different MAC addresses, further limited by the > > rhashtable they are stored in to 2^31. Each entry is of the type struct > > net_bridge_fdb_entry, which is currently 128 bytes big. This means the > > maximum amount of memory allocated for FDB entries is 2^31 * 128B = > > 256GiB, which is too much for most computers. > > > > Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES, > > which, if nonzero, limits the amount of entries to a user specified > > maximum. > > > > For backwards compatibility the default setting of 0 disables the limit. > > > > All changes to fdb_n_entries are under br->hash_lock, which means we do > > not need additional locking. The call paths are (✓ denotes that > > br->hash_lock is taken around the next call): > > > > - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓ > > | +- br_fdb_change_mac_address ✓ > > | +- br_fdb_delete_by_port ✓ > > +- br_fdb_find_delete_local ✓ > > +- fdb_add_local <-+- br_fdb_changeaddr ✓ > > | +- br_fdb_change_mac_address ✓ > > | +- br_fdb_add_local ✓ > > +- br_fdb_cleanup ✓ > > +- br_fdb_flush ✓ > > +- br_fdb_delete_by_port ✓ > > +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓ > > +- br_fdb_external_learn_del ✓ > > - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓ > > | +- br_fdb_change_mac_address ✓ > > | +- br_fdb_add_local ✓ > > +- br_fdb_update ✓ > > +- fdb_add_entry <--- __br_fdb_add ✓ > > +- br_fdb_external_learn_add ✓ > > > > Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> > > --- > > include/uapi/linux/if_link.h | 1 + > > net/bridge/br_device.c | 2 ++ > > net/bridge/br_fdb.c | 6 ++++++ > > net/bridge/br_netlink.c | 9 ++++++++- > > net/bridge/br_private.h | 2 ++ > > 5 files changed, 19 insertions(+), 1 deletion(-) > > > > I completely missed the fact that you don't deal with the situation where you already have fdbs created > and a limit is set later, then it would be useless because it will start counting from 0 even though > there are already entries. This should not be an issue. The accounting starts with the bridge creation and is never suspended, so if the user sets a limit later we do not restart counting at 0. The only corner case I can see there is if the user sets a new limit lower than the current number of FDB entries. In that case the code currently leaves the bridge in a state where the limit is violated, but refuses new FDB entries until the total is back below the limit. The alternative of cleaning out old FDB entries until their number is under the limit again seems to be more error prone to me as well, so I'd rather leave it that way. > Also another issue that came to mind is that you don't deal with fdb_create() > for "special" entries, i.e. when adding a port. Currently it will print an error, but you should revisit > all callers and see where it might be a problem. I'll have a look again, also to see whether only counting dynamic entries created as a reaction to observed packets might be a viable alternative. If the user creates the entries by adding a port or manually via netlink I see no reason to restrict them to the same limit.
On 16/05/2023 11:53, Johannes Nixdorf wrote: > On Tue, May 16, 2023 at 11:38:11AM +0300, Nikolay Aleksandrov wrote: >> On 15/05/2023 11:50, Johannes Nixdorf wrote: >>> A malicious actor behind one bridge port may spam the kernel with packets >>> with a random source MAC address, each of which will create an FDB entry, >>> each of which is a dynamic allocation in the kernel. >>> >>> There are roughly 2^48 different MAC addresses, further limited by the >>> rhashtable they are stored in to 2^31. Each entry is of the type struct >>> net_bridge_fdb_entry, which is currently 128 bytes big. This means the >>> maximum amount of memory allocated for FDB entries is 2^31 * 128B = >>> 256GiB, which is too much for most computers. >>> >>> Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES, >>> which, if nonzero, limits the amount of entries to a user specified >>> maximum. >>> >>> For backwards compatibility the default setting of 0 disables the limit. >>> >>> All changes to fdb_n_entries are under br->hash_lock, which means we do >>> not need additional locking. The call paths are (✓ denotes that >>> br->hash_lock is taken around the next call): >>> >>> - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓ >>> | +- br_fdb_change_mac_address ✓ >>> | +- br_fdb_delete_by_port ✓ >>> +- br_fdb_find_delete_local ✓ >>> +- fdb_add_local <-+- br_fdb_changeaddr ✓ >>> | +- br_fdb_change_mac_address ✓ >>> | +- br_fdb_add_local ✓ >>> +- br_fdb_cleanup ✓ >>> +- br_fdb_flush ✓ >>> +- br_fdb_delete_by_port ✓ >>> +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓ >>> +- br_fdb_external_learn_del ✓ >>> - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓ >>> | +- br_fdb_change_mac_address ✓ >>> | +- br_fdb_add_local ✓ >>> +- br_fdb_update ✓ >>> +- fdb_add_entry <--- __br_fdb_add ✓ >>> +- br_fdb_external_learn_add ✓ >>> >>> Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> >>> --- >>> include/uapi/linux/if_link.h | 1 + >>> net/bridge/br_device.c | 2 ++ >>> net/bridge/br_fdb.c | 6 ++++++ >>> net/bridge/br_netlink.c | 9 ++++++++- >>> net/bridge/br_private.h | 2 ++ >>> 5 files changed, 19 insertions(+), 1 deletion(-) >>> >> >> I completely missed the fact that you don't deal with the situation where you already have fdbs created >> and a limit is set later, then it would be useless because it will start counting from 0 even though >> there are already entries. > > This should not be an issue. The accounting starts with the bridge > creation and is never suspended, so if the user sets a limit later we > do not restart counting at 0. > > The only corner case I can see there is if the user sets a new limit > lower than the current number of FDB entries. In that case the code > currently leaves the bridge in a state where the limit is violated, > but refuses new FDB entries until the total is back below the limit. The > alternative of cleaning out old FDB entries until their number is under > the limit again seems to be more error prone to me as well, so I'd rather > leave it that way. > Ah, good. That's ok then. >> Also another issue that came to mind is that you don't deal with fdb_create() >> for "special" entries, i.e. when adding a port. Currently it will print an error, but you should revisit >> all callers and see where it might be a problem. > > I'll have a look again, also to see whether only counting dynamic > entries created as a reaction to observed packets might be a viable > alternative. If the user creates the entries by adding a port or manually > via netlink I see no reason to restrict them to the same limit. Hmm.. perhaps we can add a flag mask of entries to count. Initially it can be only dynamic entries. We should include more people in this discussion (+CC Ido and Vladimir). Switchdev folks might have more specific requirements and restrictions, so it'd be nice to get their input as well.
Hi, On Tue, May 16, 2023 at 11:56:41AM +0300, Nikolay Aleksandrov wrote: > Hmm.. perhaps we can add a flag mask of entries to count. Initially it can be > only dynamic entries. We should include more people in this discussion (+CC Ido and Vladimir). > Switchdev folks might have more specific requirements and restrictions, so it'd be nice to get > their input as well. I have some other things to do until I can take a closer look at this discussion, but in principle, switchdev drivers will likely want to impose their own limit on FDB entries because the hardware itself is inherently limited in size, so I'm thinking there should be another way for the software bridge to be informed about this limit other than UAPI. Which ports that limit should affect (think bridging between ports of different switches with different FDB sizes) I don't know. If we only consider switchdev, FDB limits should probably be per hwdom. Also, in terms of static vs dynamic limits, I've seen hardware implementations where static FDB entries go to a different FDB table compared to dynamic ones (Microchip KSZ DSA switches), implementations where static partitioning between static and dynamic FDB entries is possible but configurable, and implementations where they all consume from the shared space and you'd have to evict a dynamic entry to install a static one. So it's hard to really say what's the size. That, plus not to mention, many hardware FDBs are not fully associative, and due to hash collisions, you may be unable to install an entry in the 4-way associative bin where its {MAC,VID} hash says it should go, even though the FDB at large is not full. It sounds sexy to take switchdev into consideration, but I'm not really sure what we want. Something flexible to cater for the above, probably. This discussion should probably be merged with: https://lore.kernel.org/netdev/20230324144917.32lnpgtw5auuyovy@skbuf/T/#ma600839815582ca61886e83ba533b1dfbe447557 so I'm CCing Oleksij too, since he probably knows better than me what he wants. In the thread with DSA trace events, there also was a short talk about user space theoretically being able to infer FDB sizes and utilization degree based on instrumenting with ftrace, which is something we wouldn't like to have to maintain. So I'm adding the DSA maintainers too, since there is interest for agreeing on a different API. https://lore.kernel.org/netdev/2f150ad4-34f4-4af9-b3ce-c1aff208ec7e@lunn.ch/T/#mfa895245fd012e8f66db784fa568109dba396aa7
On 16/05/2023 13:21, Vladimir Oltean wrote: > Hi, > > On Tue, May 16, 2023 at 11:56:41AM +0300, Nikolay Aleksandrov wrote: >> Hmm.. perhaps we can add a flag mask of entries to count. Initially it can be >> only dynamic entries. We should include more people in this discussion (+CC Ido and Vladimir). >> Switchdev folks might have more specific requirements and restrictions, so it'd be nice to get >> their input as well. > > I have some other things to do until I can take a closer look at this > discussion, but in principle, switchdev drivers will likely want to > impose their own limit on FDB entries because the hardware itself is > inherently limited in size, so I'm thinking there should be another way > for the software bridge to be informed about this limit other than UAPI. Yep, that's ok but it can be added later. This is pretty much internal. > Which ports that limit should affect (think bridging between ports of > different switches with different FDB sizes) I don't know. If we only > consider switchdev, FDB limits should probably be per hwdom. > Now, that's a whole different issue (per-port limits). I've prototype patches for that too, but it's a much harder problem to solve and scale in software. Let's please focus on the single global limit for the moment. > Also, in terms of static vs dynamic limits, I've seen hardware > implementations where static FDB entries go to a different FDB table > compared to dynamic ones (Microchip KSZ DSA switches), implementations > where static partitioning between static and dynamic FDB entries is > possible but configurable, and implementations where they all consume > from the shared space and you'd have to evict a dynamic entry to install > a static one. So it's hard to really say what's the size. That, plus not > to mention, many hardware FDBs are not fully associative, and due to > hash collisions, you may be unable to install an entry in the 4-way > associative bin where its {MAC,VID} hash says it should go, even though > the FDB at large is not full. > > It sounds sexy to take switchdev into consideration, but I'm not really > sure what we want. Something flexible to cater for the above, probably. > This discussion should probably be merged with: > https://lore.kernel.org/netdev/20230324144917.32lnpgtw5auuyovy@skbuf/T/#ma600839815582ca61886e83ba533b1dfbe447557 > so I'm CCing Oleksij too, since he probably knows better than me what he > wants. > Let's take a step back, I wasn't suggesting we start with a full-fledged switchdev implementation. :) I meant only to see if the minimum global limit implementation suggested would suffice and would be able to later extend so switchdev can use and potentially modify (e.g. drivers setting limits etc). We can start with a simple support for limits and then extend accordingly. The important part here is to not add any uAPI that can't be changed later which would impact future changes. > In the thread with DSA trace events, there also was a short talk about > user space theoretically being able to infer FDB sizes and utilization > degree based on instrumenting with ftrace, which is something we wouldn't > like to have to maintain. So I'm adding the DSA maintainers too, since > there is interest for agreeing on a different API. > https://lore.kernel.org/netdev/2f150ad4-34f4-4af9-b3ce-c1aff208ec7e@lunn.ch/T/#mfa895245fd012e8f66db784fa568109dba396aa7
On Tue, May 16, 2023 at 01:32:05PM +0300, Nikolay Aleksandrov wrote: > Let's take a step back, I wasn't suggesting we start with a full-fledged switchdev > implementation. :) I meant only to see if the minimum global limit implementation > suggested would suffice and would be able to later extend so switchdev can use and > potentially modify (e.g. drivers setting limits etc). We can start with a simple > support for limits and then extend accordingly. The important part here is to > not add any uAPI that can't be changed later which would impact future changes. I guess adding a global per-bridge learning limit now makes sense and would not unreasonably hinder switchdev later on. The focus is on "learning limit" and not a limit to user-created entries as Johannes has currently done in v1. I don't necessarily see an urgent need for IFLA_BR_FDB_CUR_ENTRIES, given the fact that user space can dump the FDB and count what it needs, filtering for FDB types accordingly.
On 16/05/2023 13:44, Vladimir Oltean wrote: > On Tue, May 16, 2023 at 01:32:05PM +0300, Nikolay Aleksandrov wrote: >> Let's take a step back, I wasn't suggesting we start with a full-fledged switchdev >> implementation. :) I meant only to see if the minimum global limit implementation >> suggested would suffice and would be able to later extend so switchdev can use and >> potentially modify (e.g. drivers setting limits etc). We can start with a simple >> support for limits and then extend accordingly. The important part here is to >> not add any uAPI that can't be changed later which would impact future changes. > > I guess adding a global per-bridge learning limit now makes sense and > would not unreasonably hinder switchdev later on. The focus is on > "learning limit" and not a limit to user-created entries as Johannes has > currently done in v1. I don't necessarily see an urgent need for > IFLA_BR_FDB_CUR_ENTRIES, given the fact that user space can dump the FDB > and count what it needs, filtering for FDB types accordingly. Having the current count is just a helper, if you have a high limit dumping the table and counting might take awhile. Thanks for the feedback, then we'll polish and move on with the set for a global limit.
On Tue, May 16, 2023 at 01:47:47PM +0300, Nikolay Aleksandrov wrote: > Having the current count is just a helper, if you have a high limit dumping the table > and counting might take awhile. Thanks for the feedback, then we'll polish and move > on with the set for a global limit. Ok, but to be useful, the current count will have to be directly comparable to the limit, I guess. So the current count will also be for dynamically learned entries? Or is the plan to enforce the global limit for any kind of FDB entries?
On 16/05/2023 13:55, Vladimir Oltean wrote: > On Tue, May 16, 2023 at 01:47:47PM +0300, Nikolay Aleksandrov wrote: >> Having the current count is just a helper, if you have a high limit dumping the table >> and counting might take awhile. Thanks for the feedback, then we'll polish and move >> on with the set for a global limit. > > Ok, but to be useful, the current count will have to be directly > comparable to the limit, I guess. So the current count will also be for > dynamically learned entries? Or is the plan to enforce the global limit > for any kind of FDB entries? That was one of the questions actually. More that I'm thinking about this, the more I want to break it apart by type because we discussed being able to specify a flag mask for the limit (all, dynamic, dynamic+static etc). If we embed these stats into a bridge fdb count attribute, it can be easily extended later if anything new comes along. If switchdev doesn't support some of these global limit configs, we can pass the option and it can deny setting it later. I think this should be more than enough as a first step.
On Tue, May 16, 2023 at 02:04:30PM +0300, Nikolay Aleksandrov wrote: > That was one of the questions actually. More that I'm thinking about this, the more > I want to break it apart by type because we discussed being able to specify a flag > mask for the limit (all, dynamic, dynamic+static etc). If we embed these stats into a > bridge fdb count attribute, it can be easily extended later if anything new comes along. > If switchdev doesn't support some of these global limit configs, we can pass the option > and it can deny setting it later. I think this should be more than enough as a first step. Ok, and by "type" you actually mean the impossibly hard to understand neighbor discovery states used by the bridge UAPI? Like having (overlapping) limits per NUD_REACHABLE, NUD_NOARP etc flags set in ndm->ndm_state? Or how should the UAPI look like?
On 16/05/2023 14:10, Vladimir Oltean wrote: > On Tue, May 16, 2023 at 02:04:30PM +0300, Nikolay Aleksandrov wrote: >> That was one of the questions actually. More that I'm thinking about this, the more >> I want to break it apart by type because we discussed being able to specify a flag >> mask for the limit (all, dynamic, dynamic+static etc). If we embed these stats into a >> bridge fdb count attribute, it can be easily extended later if anything new comes along. >> If switchdev doesn't support some of these global limit configs, we can pass the option >> and it can deny setting it later. I think this should be more than enough as a first step. > > Ok, and by "type" you actually mean the impossibly hard to understand > neighbor discovery states used by the bridge UAPI? Like having Yes, that is what I mean. It's not that hard, we can limit it to a few combinations that are well understood and defined. > (overlapping) limits per NUD_REACHABLE, NUD_NOARP etc flags set in > ndm->ndm_state? Or how should the UAPI look like? Hmm, perhaps for the time being and for keeping it simpler it'd be best if the type initially is just about dynamic entries and the count reflects only those. We can add an abstraction later if we want "per-type"/mask limits. Adding such abstraction should be pretty-straight forward if we keep in mind the flags that can change only under lock, otherwise proper counting would have to be revisited.
On Tue, May 16, 2023 at 02:18:15PM +0300, Nikolay Aleksandrov wrote: > On 16/05/2023 14:10, Vladimir Oltean wrote: > > On Tue, May 16, 2023 at 02:04:30PM +0300, Nikolay Aleksandrov wrote: > >> That was one of the questions actually. More that I'm thinking about this, the more > >> I want to break it apart by type because we discussed being able to specify a flag > >> mask for the limit (all, dynamic, dynamic+static etc). If we embed these stats into a > >> bridge fdb count attribute, it can be easily extended later if anything new comes along. > >> If switchdev doesn't support some of these global limit configs, we can pass the option > >> and it can deny setting it later. I think this should be more than enough as a first step. > > > > Ok, and by "type" you actually mean the impossibly hard to understand > > neighbor discovery states used by the bridge UAPI? Like having > > Yes, that is what I mean. It's not that hard, we can limit it to a few combinations > that are well understood and defined. > > > (overlapping) limits per NUD_REACHABLE, NUD_NOARP etc flags set in > > ndm->ndm_state? Or how should the UAPI look like? > > Hmm, perhaps for the time being and for keeping it simpler it'd be best if the type initially is just about > dynamic entries and the count reflects only those. We can add an abstraction later if we want "per-type"/mask limits. > Adding such abstraction should be pretty-straight forward if we keep in mind the flags that can change only > under lock, otherwise proper counting would have to be revisited. Now that I implemented most of v2, except that I kept the netlink API roughly the same as v1, I noticed that we probably need to discuss the UAPI design more, or else we'd be stuck with the new netlink attributes that do not fit the later abstraction design. I see several options from what was discussed here and what seems to be the easiest to implement for me: 1. Everything is a separate netlink attribute: My current draft of v2 adds 2 netlink attributes - IFLA_BR_FDB_MAX_LEARNED_ENTRIES and IFLA_BR_FDB_CUR_LEARNED_ENTRIES. More generally this would be two u32 netlink attributes for each limit (_MAX_ (RW) and _CUR_ (RO)), which can be differentiated by their name. 1.a Each limit is a separate netlink attribute, _CUR_ and _MAX_ are grouped together as a nested message: Like 1., but add only one netlink attribute for each limit (e.g. IFLA_BR_FDB_LIMIT_LEARNED), containing a nested message with the _CUR_ and _MAX_ attributes. 1.b The same as 1.a, but have one nested message (e.g. IFLA_BR_FDB_LIMITS): The message would contain attributes of the form IFLA_BR_FDB_LIMITS_${NAME}_CUR, IFLA_BR_FDB_LIMITS_${NAME}_MAX, initially only for NAME=LEARNED. 2. Add a new dynamically sized list of attributes + flag mask: Permitt the netlink caller to pass a dynamically sized array (NL_ATTR_TYPE_NESTED_ARRAY?) of pairs of a flag (and state) mask combination and the limit to enforce for them. We'd be rejecting everything but NTF_USE + NUD_NOARP for the first implementation. Problems: - Those are the impossibly hard to understand neighbour discovery states. (as in the quoted mail) Having now looked closer at them and the bridge internal flags they translate to, I also would prefer a different approach. - For the general approach of not just rejecting all but one flag combination accounting is more difficult. For the one limit in v1, and the v2 draft, we can just start counting when creating the bridge, and the accounting is up to date when the user sets a limit. For the general approach later we'd probably not want to include separate counters for each combination in the bridge struct. Instead we'd dynamically allocate our counter when the user sets a limit, so for each newly set limit we'd then need to lock the fdb table and count the current fdb entries matching the limit first. 2.a Invent new names for the supported limits without exposing their flag (and state) masks: Conceptually this is equivalent to putting the names in the netlink attribute namespace as in 1., so I'd prefer to go with one of them instead. Do you have a preference for an approach from the list, or do you see different options I did not include?
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 4ac1000b0ef2..27cf5f2d8790 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -510,6 +510,7 @@ enum { IFLA_BR_VLAN_STATS_PER_PORT, IFLA_BR_MULTI_BOOLOPT, IFLA_BR_MCAST_QUERIER_STATE, + IFLA_BR_FDB_MAX_ENTRIES, __IFLA_BR_MAX, }; diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c index 8eca8a5c80c6..d455a28df7c9 100644 --- a/net/bridge/br_device.c +++ b/net/bridge/br_device.c @@ -528,6 +528,8 @@ void br_dev_setup(struct net_device *dev) br->bridge_hello_time = br->hello_time = 2 * HZ; br->bridge_forward_delay = br->forward_delay = 15 * HZ; br->bridge_ageing_time = br->ageing_time = BR_DEFAULT_AGEING_TIME; + br->fdb_n_entries = 0; + br->fdb_max_entries = 0; dev->max_mtu = ETH_MAX_MTU; br_netfilter_rtable_init(br); diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c index e69a872bfc1d..8a833e6dee92 100644 --- a/net/bridge/br_fdb.c +++ b/net/bridge/br_fdb.c @@ -329,6 +329,8 @@ static void fdb_delete(struct net_bridge *br, struct net_bridge_fdb_entry *f, hlist_del_init_rcu(&f->fdb_node); rhashtable_remove_fast(&br->fdb_hash_tbl, &f->rhnode, br_fdb_rht_params); + if (!WARN_ON(!br->fdb_n_entries)) + br->fdb_n_entries--; fdb_notify(br, f, RTM_DELNEIGH, swdev_notify); call_rcu(&f->rcu, fdb_rcu_free); } @@ -391,6 +393,9 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br, struct net_bridge_fdb_entry *fdb; int err; + if (unlikely(br->fdb_max_entries && br->fdb_n_entries >= br->fdb_max_entries)) + return NULL; + fdb = kmem_cache_alloc(br_fdb_cache, GFP_ATOMIC); if (!fdb) return NULL; @@ -408,6 +413,7 @@ static struct net_bridge_fdb_entry *fdb_create(struct net_bridge *br, } hlist_add_head_rcu(&fdb->fdb_node, &br->fdb_list); + br->fdb_n_entries++; return fdb; } diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index 05c5863d2e20..e5b8d36a3291 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -1527,6 +1527,12 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[], return err; } + if (data[IFLA_BR_FDB_MAX_ENTRIES]) { + u32 val = nla_get_u32(data[IFLA_BR_FDB_MAX_ENTRIES]); + + br->fdb_max_entries = val; + } + return 0; } @@ -1656,7 +1662,8 @@ static int br_fill_info(struct sk_buff *skb, const struct net_device *brdev) nla_put_u8(skb, IFLA_BR_TOPOLOGY_CHANGE_DETECTED, br->topology_change_detected) || nla_put(skb, IFLA_BR_GROUP_ADDR, ETH_ALEN, br->group_addr) || - nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm)) + nla_put(skb, IFLA_BR_MULTI_BOOLOPT, sizeof(bm), &bm) || + nla_put_u32(skb, IFLA_BR_FDB_MAX_ENTRIES, br->fdb_max_entries)) return -EMSGSIZE; #ifdef CONFIG_BRIDGE_VLAN_FILTERING diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h index 2119729ded2b..64fb359c6e3e 100644 --- a/net/bridge/br_private.h +++ b/net/bridge/br_private.h @@ -494,6 +494,8 @@ struct net_bridge { #endif struct rhashtable fdb_hash_tbl; + u32 fdb_n_entries; + u32 fdb_max_entries; struct list_head port_list; #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER) union {
A malicious actor behind one bridge port may spam the kernel with packets with a random source MAC address, each of which will create an FDB entry, each of which is a dynamic allocation in the kernel. There are roughly 2^48 different MAC addresses, further limited by the rhashtable they are stored in to 2^31. Each entry is of the type struct net_bridge_fdb_entry, which is currently 128 bytes big. This means the maximum amount of memory allocated for FDB entries is 2^31 * 128B = 256GiB, which is too much for most computers. Mitigate this by adding a bridge netlink setting IFLA_BR_FDB_MAX_ENTRIES, which, if nonzero, limits the amount of entries to a user specified maximum. For backwards compatibility the default setting of 0 disables the limit. All changes to fdb_n_entries are under br->hash_lock, which means we do not need additional locking. The call paths are (✓ denotes that br->hash_lock is taken around the next call): - fdb_delete <-+- fdb_delete_local <-+- br_fdb_changeaddr ✓ | +- br_fdb_change_mac_address ✓ | +- br_fdb_delete_by_port ✓ +- br_fdb_find_delete_local ✓ +- fdb_add_local <-+- br_fdb_changeaddr ✓ | +- br_fdb_change_mac_address ✓ | +- br_fdb_add_local ✓ +- br_fdb_cleanup ✓ +- br_fdb_flush ✓ +- br_fdb_delete_by_port ✓ +- fdb_delete_by_addr_and_port <--- __br_fdb_delete ✓ +- br_fdb_external_learn_del ✓ - fdb_create <-+- fdb_add_local <-+- br_fdb_changeaddr ✓ | +- br_fdb_change_mac_address ✓ | +- br_fdb_add_local ✓ +- br_fdb_update ✓ +- fdb_add_entry <--- __br_fdb_add ✓ +- br_fdb_external_learn_add ✓ Signed-off-by: Johannes Nixdorf <jnixdorf-oss@avm.de> --- include/uapi/linux/if_link.h | 1 + net/bridge/br_device.c | 2 ++ net/bridge/br_fdb.c | 6 ++++++ net/bridge/br_netlink.c | 9 ++++++++- net/bridge/br_private.h | 2 ++ 5 files changed, 19 insertions(+), 1 deletion(-)