[net,1/1] netfilter: nf_tables: wait for rcu grace period on net_device removal

Message ID 20241106235853.169747-2-pablo@netfilter.org (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 3 this patch: 3
netdev/build_tools              success  Errors and warnings before: 2 (+0) this patch: 2 (+0)
netdev/cc_maintainers           warning  3 maintainers not CCed: horms@kernel.org kadlec@netfilter.org coreteam@netfilter.org
netdev/build_clang              success  Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           warning  Found: 'put_net(' was: 0 now: 1
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 63 this patch: 63
netdev/checkpatch               warning  WARNING: line length of 81 exceeds 80 columns
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     fail     Errors and warnings before: 6 this patch: 8
netdev/source_inline            success  Was 0 now: 0

Commit Message

Pablo Neira Ayuso Nov. 6, 2024, 11:58 p.m. UTC
8c873e219970 ("netfilter: core: free hooks with call_rcu") removed the
synchronize_net() call when unregistering a basechain hook. However, the
net_device removal event handler for NFPROTO_NETDEV was not updated to
wait for an RCU grace period.

Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks
on net_device removal") does not remove basechain rules on device
removal. I was later advised to remove rules on net_device removal, see
5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on
netdevice removal").

Although the NETDEV_UNREGISTER event is guaranteed to be handled after
the synchronize_net() call, this path needs to wait for an RCU grace
period via an RCU callback before releasing basechain hooks if the netns
is alive, because a netlink dump could be in progress (sockets hold a
reference on the netns).

Note that nf_tables_pre_exit_net() unregisters and releases basechain
hooks, but it is possible to see NETDEV_UNREGISTER at a later stage in
the netns exit path, e.g. for a veth peer device in another netns:

 cleanup_net()
  default_device_exit_batch()
   unregister_netdevice_many_notify()
    notifier_call_chain()
     nf_tables_netdev_event()
      __nft_release_basechain()

In this particular case, the same rule of thumb applies: if the netns is
alive, then wait for an RCU grace period because a netlink dump in the
other netns could be in progress. Otherwise, if the other netns is going
away, then no netlink dump can be in progress and basechain hooks can be
released immediately.

While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain
validation check, which should never trigger.

Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |  2 ++
 net/netfilter/nf_tables_api.c     | 41 +++++++++++++++++++++++++------
 2 files changed, 36 insertions(+), 7 deletions(-)
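
The rule of thumb in the commit message (defer the release while the
netns is alive, release at once when it is going away) hinges on taking
a reference only if the refcount has not already dropped to zero. The
following is a minimal userspace sketch of that decision using C11
atomics. It is not kernel code: struct demo_ns, demo_maybe_get() and
demo_release_mode() are hypothetical names that only mimic the spirit of
maybe_get_net().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace; only the refcount matters. */
struct demo_ns {
	atomic_int refs;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true when a reference was taken, false when the object is
 * already on its way out.
 */
static bool demo_maybe_get(struct demo_ns *ns)
{
	int old;

	assert(ns != NULL);
	old = atomic_load(&ns->refs);
	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&ns->refs, &old, old + 1))
			return true;
	}
	return false;
}

enum demo_release { DEMO_RELEASE_DEFERRED, DEMO_RELEASE_NOW };

/* Defer while the namespace is alive, otherwise release immediately. */
static enum demo_release demo_release_mode(struct demo_ns *ns)
{
	return demo_maybe_get(ns) ? DEMO_RELEASE_DEFERRED : DEMO_RELEASE_NOW;
}
```

In the deferred case the extra reference has to be dropped once the
deferred work has run, which is the role put_net() plays at the end of
the RCU callback in the patch.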

Comments

Paolo Abeni Nov. 7, 2024, 10:55 a.m. UTC | #1
Hi,
On 11/7/24 00:58, Pablo Neira Ayuso wrote:
> 8c873e219970 ("netfilter: core: free hooks with call_rcu") removed
> synchronize_net() call when unregistering basechain hook, however,
> net_device removal event handler for the NFPROTO_NETDEV was not updated
> to wait for RCU grace period.
> 
> Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks
> on net_device removal") does not remove basechain rules on device
> removal, I was hinted to remove rules on net_device removal later, see
> 5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on
> netdevice removal").
> 
> Although NETDEV_UNREGISTER event is guaranteed to be handled after
> synchronize_net() call, this path needs to wait for rcu grace period via
> rcu callback to release basechain hooks if netns is alive because an
> ongoing netlink dump could be in progress (sockets hold a reference on
> the netns).
> 
> Note that nf_tables_pre_exit_net() unregisters and releases basechain
> hooks but it is possible to see NETDEV_UNREGISTER at a later stage in
> the netns exit path, eg. veth peer device in another netns:
> 
>  cleanup_net()
>   default_device_exit_batch()
>    unregister_netdevice_many_notify()
>     notifier_call_chain()
>      nf_tables_netdev_event()
>       __nft_release_basechain()
> 
> In this particular case, same rule of thumb applies: if netns is alive,
> then wait for rcu grace period because netlink dump in the other netns
> could be in progress. Otherwise, if the other netns is going away then
> no netlink dump can be in progress and basechain hooks can be released
> immediately.
> 
> While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain
> validation, which should not ever happen.
> 
> Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal")
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
>  include/net/netfilter/nf_tables.h |  2 ++
>  net/netfilter/nf_tables_api.c     | 41 +++++++++++++++++++++++++------
>  2 files changed, 36 insertions(+), 7 deletions(-)
> 
> diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
> index 91ae20cb7648..8dd8e278843d 100644
> --- a/include/net/netfilter/nf_tables.h
> +++ b/include/net/netfilter/nf_tables.h
> @@ -1120,6 +1120,7 @@ struct nft_chain {
>  	char				*name;
>  	u16				udlen;
>  	u8				*udata;
> +	struct rcu_head			rcu_head;

I'm sorry to be pedantic but the CI is complaining about the lack of
kdoc for this field...

>  
>  	/* Only used during control plane commit phase: */
>  	struct nft_rule_blob		*blob_next;
> @@ -1282,6 +1283,7 @@ struct nft_table {
>  	struct list_head		sets;
>  	struct list_head		objects;
>  	struct list_head		flowtables;
> +	possible_net_t			net;

... and this one ...

>  	u64				hgenerator;
>  	u64				handle;
>  	u32				use;

[...]
> +static void nft_release_basechain_rcu(struct rcu_head *head)
> +{
> +	struct nft_chain *chain = container_of(head, struct nft_chain, rcu_head);
> +	struct nft_ctx ctx = {
> +		.family	= chain->table->family,
> +		.chain	= chain,
> +		.net	= read_pnet(&chain->table->net),
> +	};
> +
> +	__nft_release_basechain_now(&ctx);
> +	put_net(ctx.net);

... and also about deprecated API usage here, the put_net_track()
version should be preferred.

Given this change will likely land on very old trees I guess the tracker
conversion is better handled as a follow-up net-next patch.

Would you mind addressing the kdoc above? Today's PR will be handled by
Jakub quite a bit later, so there is a bit of time.

Thanks!

Paolo

Pablo Neira Ayuso Nov. 7, 2024, 11:26 a.m. UTC | #2
On Thu, Nov 07, 2024 at 11:55:47AM +0100, Paolo Abeni wrote:
> Hi,
> On 11/7/24 00:58, Pablo Neira Ayuso wrote:
> > 8c873e219970 ("netfilter: core: free hooks with call_rcu") removed
> > synchronize_net() call when unregistering basechain hook, however,
> > net_device removal event handler for the NFPROTO_NETDEV was not updated
> > to wait for RCU grace period.
> > 
> > Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks
> > on net_device removal") does not remove basechain rules on device
> > removal, I was hinted to remove rules on net_device removal later, see
> > 5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on
> > netdevice removal").
> > 
> > Although NETDEV_UNREGISTER event is guaranteed to be handled after
> > synchronize_net() call, this path needs to wait for rcu grace period via
> > rcu callback to release basechain hooks if netns is alive because an
> > ongoing netlink dump could be in progress (sockets hold a reference on
> > the netns).
> > 
> > Note that nf_tables_pre_exit_net() unregisters and releases basechain
> > hooks but it is possible to see NETDEV_UNREGISTER at a later stage in
> > the netns exit path, eg. veth peer device in another netns:
> > 
> >  cleanup_net()
> >   default_device_exit_batch()
> >    unregister_netdevice_many_notify()
> >     notifier_call_chain()
> >      nf_tables_netdev_event()
> >       __nft_release_basechain()
> > 
> > In this particular case, same rule of thumb applies: if netns is alive,
> > then wait for rcu grace period because netlink dump in the other netns
> > could be in progress. Otherwise, if the other netns is going away then
> > no netlink dump can be in progress and basechain hooks can be released
> > immediately.
> > 
> > While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain
> > validation, which should not ever happen.
> > 
> > Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal")
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> >  include/net/netfilter/nf_tables.h |  2 ++
> >  net/netfilter/nf_tables_api.c     | 41 +++++++++++++++++++++++++------
> >  2 files changed, 36 insertions(+), 7 deletions(-)
> > 
> > diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
> > index 91ae20cb7648..8dd8e278843d 100644
> > --- a/include/net/netfilter/nf_tables.h
> > +++ b/include/net/netfilter/nf_tables.h
> > @@ -1120,6 +1120,7 @@ struct nft_chain {
> >  	char				*name;
> >  	u16				udlen;
> >  	u8				*udata;
> > +	struct rcu_head			rcu_head;
> 
> I'm sorry to be pedantic but the CI is complaining about the lack of
> kdoc for this field...
> 
> >  
> >  	/* Only used during control plane commit phase: */
> >  	struct nft_rule_blob		*blob_next;
> > @@ -1282,6 +1283,7 @@ struct nft_table {
> >  	struct list_head		sets;
> >  	struct list_head		objects;
> >  	struct list_head		flowtables;
> > +	possible_net_t			net;
> 
> ... and this one ...
> 
> >  	u64				hgenerator;
> >  	u64				handle;
> >  	u32				use;
> 
> [...]
> > +static void nft_release_basechain_rcu(struct rcu_head *head)
> > +{
> > +	struct nft_chain *chain = container_of(head, struct nft_chain, rcu_head);
> > +	struct nft_ctx ctx = {
> > +		.family	= chain->table->family,
> > +		.chain	= chain,
> > +		.net	= read_pnet(&chain->table->net),
> > +	};
> > +
> > +	__nft_release_basechain_now(&ctx);
> > +	put_net(ctx.net);
> 
> ... and also about deprecated API usage here, the put_net_track()
> version should be preferred.
>
> Given this change will likely land on very old trees I guess the tracker
> conversion is better handled as a follow-up net-next patch.

Agreed.

> Would you mind addressing the kdoc above? Today's PR will be handled by
> Jakub quite a bit later, so there is a bit of time.

I will fix kdoc and resubmit.
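
For readers unfamiliar with the kdoc check, the complaint is that each
struct member needs an "@member:" line in the kernel-doc comment above
the struct. Below is a rough sketch of what such lines might look like
for the two new fields. The structs are trimmed, hypothetical stand-ins
(demo_ prefix) so the fragment compiles outside the kernel tree, and the
description wording is an assumption, not the text of the resubmitted
patch.

```c
/* Stand-ins for struct rcu_head and possible_net_t (assumptions). */
struct demo_rcu_head {
	void *next;
	void (*func)(void *head);
};

struct demo_possible_net {
	void *net;
};

/**
 * struct demo_chain - trimmed stand-in for struct nft_chain
 * @name: name of the chain
 * @rcu_head: used to release the chain from a deferred callback
 */
struct demo_chain {
	char			*name;
	struct demo_rcu_head	rcu_head;
};

/**
 * struct demo_table - trimmed stand-in for struct nft_table
 * @net: network namespace this table belongs to
 * @use: number of references to this table
 */
struct demo_table {
	struct demo_possible_net	net;
	unsigned int			use;
};
```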

Patch

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 91ae20cb7648..8dd8e278843d 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1120,6 +1120,7 @@  struct nft_chain {
 	char				*name;
 	u16				udlen;
 	u8				*udata;
+	struct rcu_head			rcu_head;
 
 	/* Only used during control plane commit phase: */
 	struct nft_rule_blob		*blob_next;
@@ -1282,6 +1283,7 @@  struct nft_table {
 	struct list_head		sets;
 	struct list_head		objects;
 	struct list_head		flowtables;
+	possible_net_t			net;
 	u64				hgenerator;
 	u64				handle;
 	u32				use;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index a24fe62650a7..588a2757986c 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1495,6 +1495,7 @@  static int nf_tables_newtable(struct sk_buff *skb, const struct nfnl_info *info,
 	INIT_LIST_HEAD(&table->sets);
 	INIT_LIST_HEAD(&table->objects);
 	INIT_LIST_HEAD(&table->flowtables);
+	write_pnet(&table->net, net);
 	table->family = family;
 	table->flags = flags;
 	table->handle = ++nft_net->table_handle;
@@ -11430,22 +11431,48 @@  int nft_data_dump(struct sk_buff *skb, int attr, const struct nft_data *data,
 }
 EXPORT_SYMBOL_GPL(nft_data_dump);
 
-int __nft_release_basechain(struct nft_ctx *ctx)
+static void __nft_release_basechain_now(struct nft_ctx *ctx)
 {
 	struct nft_rule *rule, *nr;
 
-	if (WARN_ON(!nft_is_base_chain(ctx->chain)))
-		return 0;
-
-	nf_tables_unregister_hook(ctx->net, ctx->chain->table, ctx->chain);
 	list_for_each_entry_safe(rule, nr, &ctx->chain->rules, list) {
 		list_del(&rule->list);
-		nft_use_dec(&ctx->chain->use);
 		nf_tables_rule_release(ctx, rule);
 	}
+	nf_tables_chain_destroy(ctx->chain);
+}
+
+static void nft_release_basechain_rcu(struct rcu_head *head)
+{
+	struct nft_chain *chain = container_of(head, struct nft_chain, rcu_head);
+	struct nft_ctx ctx = {
+		.family	= chain->table->family,
+		.chain	= chain,
+		.net	= read_pnet(&chain->table->net),
+	};
+
+	__nft_release_basechain_now(&ctx);
+	put_net(ctx.net);
+}
+
+int __nft_release_basechain(struct nft_ctx *ctx)
+{
+	struct nft_rule *rule;
+
+	if (WARN_ON_ONCE(!nft_is_base_chain(ctx->chain)))
+		return 0;
+
+	nf_tables_unregister_hook(ctx->net, ctx->chain->table, ctx->chain);
+	list_for_each_entry(rule, &ctx->chain->rules, list)
+		nft_use_dec(&ctx->chain->use);
+
 	nft_chain_del(ctx->chain);
 	nft_use_dec(&ctx->table->use);
-	nf_tables_chain_destroy(ctx->chain);
+
+	if (maybe_get_net(ctx->net))
+		call_rcu(&ctx->chain->rcu_head, nft_release_basechain_rcu);
+	else
+		__nft_release_basechain_now(ctx);
 
 	return 0;
 }
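
The patch embeds an rcu_head in struct nft_chain and recovers the chain
inside the callback with container_of(). The userspace sketch below
illustrates that pattern only. It assumes a toy "grace period" that is
just a pending list drained by hand, and all demo_ names are
hypothetical; real RCU semantics (readers, call_rcu() batching) are not
modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct rcu_head: a link plus the callback to run. */
struct demo_head {
	struct demo_head *next;
	void (*func)(struct demo_head *head);
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_chain {
	int *released;		/* counter bumped when the chain is freed */
	struct demo_head head;	/* embedded, so deferral needs no allocation */
};

static struct demo_head *demo_pending;

/* Queue a callback; nothing runs until demo_grace_period_end(). */
static void demo_call_deferred(struct demo_head *head,
			       void (*func)(struct demo_head *head))
{
	head->func = func;
	head->next = demo_pending;
	demo_pending = head;
}

/* Pretend all readers are done: run and drain every queued callback. */
static void demo_grace_period_end(void)
{
	while (demo_pending) {
		struct demo_head *head = demo_pending;

		demo_pending = head->next;
		head->func(head);
	}
}

/* Deferred callback: map the embedded head back to its chain and free it. */
static void demo_chain_release(struct demo_head *head)
{
	struct demo_chain *chain =
		demo_container_of(head, struct demo_chain, head);

	(*chain->released)++;
	free(chain);
}

/* Returns 1 if the chain was freed only after the simulated grace period. */
static int demo_run(void)
{
	int released = 0;
	struct demo_chain *chain = calloc(1, sizeof(*chain));

	assert(chain != NULL);
	chain->released = &released;
	demo_call_deferred(&chain->head, demo_chain_release);
	if (released != 0)
		return 0;	/* must not be freed before the grace period */
	demo_grace_period_end();
	return released == 1;
}
```

Embedding the head in the object means that queuing the deferred release
cannot fail for lack of memory, which matters on a teardown path like
this one.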