From patchwork Tue Nov 21 12:27:44 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 13462976
X-Patchwork-Delegate: kuba@kernel.org
Authentication-Results: smtp.subspace.kernel.org; dkim=none
Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:237:300::1])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4FDC292;
	Tue, 21 Nov 2023 04:28:26 -0800 (PST)
Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92)
	(envelope-from )
	id 1r5Prc-0005B9-VB; Tue, 21 Nov 2023 13:28:24 +0100
From: Florian Westphal
To:
Cc: lorenzo@kernel.org, , Florian Westphal
Subject: [PATCH nf-next 1/8] netfilter: flowtable: move nf_flowtable out of container structures
Date: Tue, 21 Nov 2023 13:27:44 +0100
Message-ID: <20231121122800.13521-2-fw@strlen.de>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231121122800.13521-1-fw@strlen.de>
References: <20231121122800.13521-1-fw@strlen.de>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: 
List-Subscribe: 
List-Unsubscribe: 
MIME-Version: 1.0
X-Patchwork-Delegate: kuba@kernel.org

struct nf_flowtable is currently wholly embedded in either nft_flowtable
or tcf_ct_flow_table.

In order to allow flowtable acceleration via XDP, the XDP program will
need to map struct net_device to struct nf_flowtable.

To make this work reliably, make a clear separation of the frontend
(nft, tc) and backend (nf_flowtable) representations.

In this first patch, make it so nft_flowtable and tcf_ct_flow_table
only store pointers to an nf_flowtable structure.

The main goal is to have follow-up patches that allow us to keep the
nf_flowtable structure around for a bit longer (e.g. until after an rcu
grace period has elapsed) when the frontend(s) are tearing the
structures down.
At this time, things are fine, but once xdp programs may be using the
nf_flowtable structure as well, we will need a way to ensure that no
such users exist anymore.

Right now the guarantees are insufficient: nftables only ensures that
the netfilter hooks are unregistered, and tc only ensures the tc
actions have been removed. Any future kfunc might still be called in
parallel from an XDP program.

The easiest way to resolve this is to let the nf_flowtable core handle
release and module reference counting itself.

Signed-off-by: Florian Westphal
---
 include/net/netfilter/nf_tables.h | 15 +++++---
 net/netfilter/nf_tables_api.c     | 62 +++++++++++++++++--------------
 net/netfilter/nft_flow_offload.c  |  4 +-
 net/sched/act_ct.c                | 33 +++++++++-------
 4 files changed, 66 insertions(+), 48 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index b157c5cafd14..362eca5d0451 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -1380,8 +1380,14 @@ void nft_unregister_obj(struct nft_object_type *obj_type);
  * @use: number of references to this flow table
  * @handle: unique object handle
  * @dev_name: array of device names
- * @data: rhashtable and garbage collector
- * @ops: array of hooks
+ * @hook_list: list of struct nft_hook
+ * @ft: pointer to underlying nf_flowtable
+ *
+ * This structure represents the low-level
+ * nf_flowtable within the nf_tables framework.
+ *
+ * nf_flowtable itself has no concept of 'tables', 'transactions',
+ * etc. They do not even have names.
*/ struct nft_flowtable { struct list_head list; @@ -1392,9 +1398,8 @@ struct nft_flowtable { u32 genmask:2; u32 use; u64 handle; - /* runtime data below here */ - struct list_head hook_list ____cacheline_aligned; - struct nf_flowtable data; + struct list_head hook_list; + struct nf_flowtable *ft; }; struct nft_flowtable *nft_flowtable_lookup(const struct nft_table *table, diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index c0a42989b982..ee0de8d9b49c 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8100,11 +8100,11 @@ static int nft_flowtable_parse_hook(const struct nft_ctx *ctx, if (tb[NFTA_FLOWTABLE_HOOK_PRIORITY]) { priority = ntohl(nla_get_be32(tb[NFTA_FLOWTABLE_HOOK_PRIORITY])); - if (priority != flowtable->data.priority) + if (priority != flowtable->ft->priority) return -EOPNOTSUPP; } - flowtable_hook->priority = flowtable->data.priority; + flowtable_hook->priority = flowtable->ft->priority; flowtable_hook->num = flowtable->hooknum; } @@ -8121,8 +8121,8 @@ static int nft_flowtable_parse_hook(const struct nft_ctx *ctx, hook->ops.pf = NFPROTO_NETDEV; hook->ops.hooknum = flowtable_hook->num; hook->ops.priority = flowtable_hook->priority; - hook->ops.priv = &flowtable->data; - hook->ops.hook = flowtable->data.type->hook; + hook->ops.priv = flowtable->ft; + hook->ops.hook = flowtable->ft->type->hook; } return err; @@ -8164,8 +8164,8 @@ static void nft_unregister_flowtable_hook(struct net *net, struct nft_hook *hook) { nf_unregister_net_hook(net, &hook->ops); - flowtable->data.type->setup(&flowtable->data, hook->ops.dev, - FLOW_BLOCK_UNBIND); + flowtable->ft->type->setup(flowtable->ft, hook->ops.dev, + FLOW_BLOCK_UNBIND); } static void __nft_unregister_flowtable_net_hooks(struct net *net, @@ -8212,17 +8212,15 @@ static int nft_register_flowtable_net_hooks(struct net *net, } } - err = flowtable->data.type->setup(&flowtable->data, - hook->ops.dev, - FLOW_BLOCK_BIND); + err = 
flowtable->ft->type->setup(flowtable->ft, hook->ops.dev, + FLOW_BLOCK_BIND); if (err < 0) goto err_unregister_net_hooks; err = nf_register_net_hook(net, &hook->ops); if (err < 0) { - flowtable->data.type->setup(&flowtable->data, - hook->ops.dev, - FLOW_BLOCK_UNBIND); + flowtable->ft->type->setup(flowtable->ft, hook->ops.dev, + FLOW_BLOCK_UNBIND); goto err_unregister_net_hooks; } @@ -8284,13 +8282,13 @@ static int nft_flowtable_update(struct nft_ctx *ctx, const struct nlmsghdr *nlh, err = -EOPNOTSUPP; goto err_flowtable_update_hook; } - if ((flowtable->data.flags & NFT_FLOWTABLE_HW_OFFLOAD) ^ + if ((flowtable->ft->flags & NFT_FLOWTABLE_HW_OFFLOAD) ^ (flags & NFT_FLOWTABLE_HW_OFFLOAD)) { err = -EOPNOTSUPP; goto err_flowtable_update_hook; } } else { - flags = flowtable->data.flags; + flags = flowtable->ft->flags; } err = nft_register_flowtable_net_hooks(ctx->net, ctx->table, @@ -8402,18 +8400,24 @@ static int nf_tables_newflowtable(struct sk_buff *skb, goto err2; } + flowtable->ft = kzalloc(sizeof(*flowtable->ft), GFP_KERNEL_ACCOUNT); + if (!flowtable->ft) { + err = -ENOMEM; + goto err3; + } + if (nla[NFTA_FLOWTABLE_FLAGS]) { - flowtable->data.flags = + flowtable->ft->flags = ntohl(nla_get_be32(nla[NFTA_FLOWTABLE_FLAGS])); - if (flowtable->data.flags & ~NFT_FLOWTABLE_MASK) { + if (flowtable->ft->flags & ~NFT_FLOWTABLE_MASK) { err = -EOPNOTSUPP; goto err3; } } - write_pnet(&flowtable->data.net, net); - flowtable->data.type = type; - err = type->init(&flowtable->data); + write_pnet(&flowtable->ft->net, net); + flowtable->ft->type = type; + err = type->init(flowtable->ft); if (err < 0) goto err3; @@ -8423,7 +8427,7 @@ static int nf_tables_newflowtable(struct sk_buff *skb, goto err4; list_splice(&flowtable_hook.list, &flowtable->hook_list); - flowtable->data.priority = flowtable_hook.priority; + flowtable->ft->priority = flowtable_hook.priority; flowtable->hooknum = flowtable_hook.num; err = nft_register_flowtable_net_hooks(ctx.net, table, @@ -8448,8 +8452,9 @@ static int 
nf_tables_newflowtable(struct sk_buff *skb, kfree_rcu(hook, rcu); } err4: - flowtable->data.type->free(&flowtable->data); + flowtable->ft->type->free(flowtable->ft); err3: + kfree(flowtable->ft); module_put(type->owner); err2: kfree(flowtable->name); @@ -8603,14 +8608,14 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net, } if (nla_put_be32(skb, NFTA_FLOWTABLE_USE, htonl(flowtable->use)) || - nla_put_be32(skb, NFTA_FLOWTABLE_FLAGS, htonl(flowtable->data.flags))) + nla_put_be32(skb, NFTA_FLOWTABLE_FLAGS, htonl(flowtable->ft->flags))) goto nla_put_failure; nest = nla_nest_start_noflag(skb, NFTA_FLOWTABLE_HOOK); if (!nest) goto nla_put_failure; if (nla_put_be32(skb, NFTA_FLOWTABLE_HOOK_NUM, htonl(flowtable->hooknum)) || - nla_put_be32(skb, NFTA_FLOWTABLE_HOOK_PRIORITY, htonl(flowtable->data.priority))) + nla_put_be32(skb, NFTA_FLOWTABLE_HOOK_PRIORITY, htonl(flowtable->ft->priority))) goto nla_put_failure; nest_devs = nla_nest_start_noflag(skb, NFTA_FLOWTABLE_HOOK_DEVS); @@ -8825,15 +8830,16 @@ static void nf_tables_flowtable_destroy(struct nft_flowtable *flowtable) { struct nft_hook *hook, *next; - flowtable->data.type->free(&flowtable->data); + flowtable->ft->type->free(flowtable->ft); list_for_each_entry_safe(hook, next, &flowtable->hook_list, list) { - flowtable->data.type->setup(&flowtable->data, hook->ops.dev, - FLOW_BLOCK_UNBIND); + flowtable->ft->type->setup(flowtable->ft, hook->ops.dev, + FLOW_BLOCK_UNBIND); list_del_rcu(&hook->list); kfree(hook); } kfree(flowtable->name); - module_put(flowtable->data.type->owner); + module_put(flowtable->ft->type->owner); + kfree(flowtable->ft); kfree(flowtable); } @@ -10164,7 +10170,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) break; case NFT_MSG_NEWFLOWTABLE: if (nft_trans_flowtable_update(trans)) { - nft_trans_flowtable(trans)->data.flags = + nft_trans_flowtable(trans)->ft->flags = nft_trans_flowtable_flags(trans); nf_tables_flowtable_notify(&trans->ctx, 
nft_trans_flowtable(trans), diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c index ab3362c483b4..83b631470bab 100644 --- a/net/netfilter/nft_flow_offload.c +++ b/net/netfilter/nft_flow_offload.c @@ -196,7 +196,7 @@ static void nft_dev_forward_path(struct nf_flow_route *route, int i; if (nft_dev_fill_forward_path(route, dst, ct, dir, ha, &stack) >= 0) - nft_dev_path_info(&stack, &info, ha, &ft->data); + nft_dev_path_info(&stack, &info, ha, ft->ft); if (!info.indev || !nft_flowtable_find_dev(info.indev, ft)) return; @@ -293,7 +293,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr, const struct nft_pktinfo *pkt) { struct nft_flow_offload *priv = nft_expr_priv(expr); - struct nf_flowtable *flowtable = &priv->flowtable->data; + struct nf_flowtable *flowtable = priv->flowtable->ft; struct tcphdr _tcph, *tcph = NULL; struct nf_flow_route route = {}; enum ip_conntrack_info ctinfo; diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index b3f4a503ee2b..2a6362becd71 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -45,7 +45,7 @@ struct tcf_ct_flow_table { struct rhash_head node; /* In zones tables */ struct rcu_work rwork; - struct nf_flowtable nf_ft; + struct nf_flowtable *nf_ft; refcount_t ref; u16 zone; @@ -305,6 +305,7 @@ static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params) ct_ft = kzalloc(sizeof(*ct_ft), GFP_KERNEL); if (!ct_ft) goto err_alloc; + refcount_set(&ct_ft->ref, 1); ct_ft->zone = params->zone; @@ -312,24 +313,29 @@ static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params) if (err) goto err_insert; - ct_ft->nf_ft.type = &flowtable_ct; - ct_ft->nf_ft.flags |= NF_FLOWTABLE_HW_OFFLOAD | - NF_FLOWTABLE_COUNTER; - err = nf_flow_table_init(&ct_ft->nf_ft); + ct_ft->nf_ft = kzalloc(sizeof(*ct_ft->nf_ft), GFP_KERNEL); + if (!ct_ft->nf_ft) + goto err_alloc; + + ct_ft->nf_ft->type = &flowtable_ct; + ct_ft->nf_ft->flags |= NF_FLOWTABLE_HW_OFFLOAD | + 
NF_FLOWTABLE_COUNTER; + err = nf_flow_table_init(ct_ft->nf_ft); if (err) goto err_init; - write_pnet(&ct_ft->nf_ft.net, net); + write_pnet(&ct_ft->nf_ft->net, net); __module_get(THIS_MODULE); out_unlock: params->ct_ft = ct_ft; - params->nf_ft = &ct_ft->nf_ft; + params->nf_ft = ct_ft->nf_ft; mutex_unlock(&zones_mutex); return 0; err_init: rhashtable_remove_fast(&zones_ht, &ct_ft->node, zones_params); + kfree(ct_ft->nf_ft); err_insert: kfree(ct_ft); err_alloc: @@ -345,16 +351,17 @@ static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) ct_ft = container_of(to_rcu_work(work), struct tcf_ct_flow_table, rwork); - nf_flow_table_free(&ct_ft->nf_ft); + nf_flow_table_free(ct_ft->nf_ft); /* Remove any remaining callbacks before cleanup */ - block = &ct_ft->nf_ft.flow_block; - down_write(&ct_ft->nf_ft.flow_block_lock); + block = &ct_ft->nf_ft->flow_block; + down_write(&ct_ft->nf_ft->flow_block_lock); list_for_each_entry_safe(block_cb, tmp_cb, &block->cb_list, list) { list_del(&block_cb->list); flow_block_cb_free(block_cb); } - up_write(&ct_ft->nf_ft.flow_block_lock); + up_write(&ct_ft->nf_ft->flow_block_lock); + kfree(ct_ft->nf_ft); kfree(ct_ft); module_put(THIS_MODULE); @@ -417,7 +424,7 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft, tcf_ct_flow_tc_ifidx(entry, act_ct_ext, FLOW_OFFLOAD_DIR_REPLY); } - err = flow_offload_add(&ct_ft->nf_ft, entry); + err = flow_offload_add(ct_ft->nf_ft, entry); if (err) goto err_add; @@ -625,7 +632,7 @@ static bool tcf_ct_flow_table_lookup(struct tcf_ct_params *p, struct sk_buff *skb, u8 family) { - struct nf_flowtable *nf_ft = &p->ct_ft->nf_ft; + struct nf_flowtable *nf_ft = p->ct_ft->nf_ft; struct flow_offload_tuple_rhash *tuplehash; struct flow_offload_tuple tuple = {}; enum ip_conntrack_info ctinfo; From patchwork Tue Nov 21 12:27:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 13462977 
X-Patchwork-Delegate: kuba@kernel.org
Authentication-Results: smtp.subspace.kernel.org; dkim=none
Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:237:300::1])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6051D126;
	Tue, 21 Nov 2023 04:28:30 -0800 (PST)
Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92)
	(envelope-from )
	id 1r5Prh-0005BP-2E; Tue, 21 Nov 2023 13:28:29 +0100
From: Florian Westphal
To:
Cc: lorenzo@kernel.org, , Florian Westphal
Subject: [PATCH nf-next 2/8] netfilter: nf_flowtable: replace init callback with a create one
Date: Tue, 21 Nov 2023 13:27:45 +0100
Message-ID: <20231121122800.13521-3-fw@strlen.de>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231121122800.13521-1-fw@strlen.de>
References: <20231121122800.13521-1-fw@strlen.de>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: 
List-Subscribe: 
List-Unsubscribe: 
MIME-Version: 1.0
X-Patchwork-Delegate: kuba@kernel.org

Let the flowtable core allocate the nf_flowtable structure itself.
While at it, move the net and type assignment into the flowtable core.

The returned nf_flowtable pointer can still be released with plain
kfree(); this will be changed in a followup patch.
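The allocate-and-return ownership pattern this patch introduces can be sketched in plain userspace C. Everything below is an illustrative stand-in, not kernel code: `flowtable_stub`, `flowtable_stub_type` and `flowtable_stub_create` are hypothetical names modeling what `nf_flow_table_create()` does with `kzalloc()` in the actual diff.

```c
#include <stdlib.h>

/* Illustrative userspace sketch of the ->create() pattern: the core
 * allocates the object itself, fills in the type backpointer, and
 * returns it -- callers no longer embed the struct. All names here
 * are hypothetical, not the kernel's.
 */
struct flowtable_stub_type {
	int family;
};

struct flowtable_stub {
	const struct flowtable_stub_type *type;
	unsigned int flags;
};

static struct flowtable_stub *
flowtable_stub_create(const struct flowtable_stub_type *type)
{
	/* zeroed allocation, mirroring kzalloc() in the patch */
	struct flowtable_stub *ft = calloc(1, sizeof(*ft));

	if (!ft)
		return NULL;	/* caller maps this to -ENOMEM */

	ft->type = type;	/* type assignment moved into the "core" */
	return ft;		/* for now still released with plain free() */
}
```

As in the patch, a NULL return is translated to -ENOMEM at the call site, and the object is still released with a plain free() until a later patch moves destruction behind the type's free callback.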
Signed-off-by: Florian Westphal --- include/net/netfilter/nf_flow_table.h | 5 +++-- net/netfilter/nf_flow_table_core.c | 19 ++++++++++++++----- net/netfilter/nf_flow_table_inet.c | 6 +++--- net/netfilter/nf_tables_api.c | 8 +------- net/sched/act_ct.c | 12 ++++-------- 5 files changed, 25 insertions(+), 25 deletions(-) diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index 36351e441316..d365eabd4a3c 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -52,7 +52,8 @@ struct nf_flow_rule { struct nf_flowtable_type { struct list_head list; int family; - int (*init)(struct nf_flowtable *ft); + struct nf_flowtable * (*create)(struct net *net, + const struct nf_flowtable_type *type); bool (*gc)(const struct flow_offload *flow); int (*setup)(struct nf_flowtable *ft, struct net_device *dev, @@ -279,7 +280,7 @@ void nf_flow_table_gc_cleanup(struct nf_flowtable *flowtable, struct net_device *dev); void nf_flow_table_cleanup(struct net_device *dev); -int nf_flow_table_init(struct nf_flowtable *flow_table); +struct nf_flowtable *nf_flow_table_create(struct net *net, const struct nf_flowtable_type *type); void nf_flow_table_free(struct nf_flowtable *flow_table); void flow_offload_teardown(struct flow_offload *flow); diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 920a5a29ae1d..375fc9c24149 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -531,29 +531,38 @@ void nf_flow_dnat_port(const struct flow_offload *flow, struct sk_buff *skb, } EXPORT_SYMBOL_GPL(nf_flow_dnat_port); -int nf_flow_table_init(struct nf_flowtable *flowtable) +struct nf_flowtable *nf_flow_table_create(struct net *net, const struct nf_flowtable_type *type) { + struct nf_flowtable *flowtable = kzalloc(sizeof(*flowtable), GFP_KERNEL_ACCOUNT); int err; + if (!flowtable) + return NULL; + INIT_DELAYED_WORK(&flowtable->gc_work, 
nf_flow_offload_work_gc); flow_block_init(&flowtable->flow_block); init_rwsem(&flowtable->flow_block_lock); err = rhashtable_init(&flowtable->rhashtable, &nf_flow_offload_rhash_params); - if (err < 0) - return err; + if (err < 0) { + kfree(flowtable); + return NULL; + } queue_delayed_work(system_power_efficient_wq, &flowtable->gc_work, HZ); + write_pnet(&flowtable->net, net); + flowtable->type = type; + mutex_lock(&flowtable_lock); list_add(&flowtable->list, &flowtables); mutex_unlock(&flowtable_lock); - return 0; + return flowtable; } -EXPORT_SYMBOL_GPL(nf_flow_table_init); +EXPORT_SYMBOL_GPL(nf_flow_table_create); static void nf_flow_table_do_cleanup(struct nf_flowtable *flow_table, struct flow_offload *flow, void *data) diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c index 9505f9d188ff..7b20a073573d 100644 --- a/net/netfilter/nf_flow_table_inet.c +++ b/net/netfilter/nf_flow_table_inet.c @@ -63,7 +63,7 @@ static int nf_flow_rule_route_inet(struct net *net, static struct nf_flowtable_type flowtable_inet = { .family = NFPROTO_INET, - .init = nf_flow_table_init, + .create = nf_flow_table_create, .setup = nf_flow_table_offload_setup, .action = nf_flow_rule_route_inet, .free = nf_flow_table_free, @@ -73,7 +73,7 @@ static struct nf_flowtable_type flowtable_inet = { static struct nf_flowtable_type flowtable_ipv4 = { .family = NFPROTO_IPV4, - .init = nf_flow_table_init, + .create = nf_flow_table_create, .setup = nf_flow_table_offload_setup, .action = nf_flow_rule_route_ipv4, .free = nf_flow_table_free, @@ -83,7 +83,7 @@ static struct nf_flowtable_type flowtable_ipv4 = { static struct nf_flowtable_type flowtable_ipv6 = { .family = NFPROTO_IPV6, - .init = nf_flow_table_init, + .create = nf_flow_table_create, .setup = nf_flow_table_offload_setup, .action = nf_flow_rule_route_ipv6, .free = nf_flow_table_free, diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index ee0de8d9b49c..cce82fef4488 100644 --- 
a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8400,7 +8400,7 @@ static int nf_tables_newflowtable(struct sk_buff *skb, goto err2; } - flowtable->ft = kzalloc(sizeof(*flowtable->ft), GFP_KERNEL_ACCOUNT); + flowtable->ft = type->create(net, type); if (!flowtable->ft) { err = -ENOMEM; goto err3; @@ -8415,12 +8415,6 @@ static int nf_tables_newflowtable(struct sk_buff *skb, } } - write_pnet(&flowtable->ft->net, net); - flowtable->ft->type = type; - err = type->init(flowtable->ft); - if (err < 0) - goto err3; - err = nft_flowtable_parse_hook(&ctx, nla, &flowtable_hook, flowtable, extack, true); if (err < 0) diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 2a6362becd71..80869cc52348 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -313,17 +313,13 @@ static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params) if (err) goto err_insert; - ct_ft->nf_ft = kzalloc(sizeof(*ct_ft->nf_ft), GFP_KERNEL); + ct_ft->nf_ft->type = &flowtable_ct; + ct_ft->nf_ft = nf_flow_table_create(net, &flowtable_ct); if (!ct_ft->nf_ft) - goto err_alloc; + goto err_create; - ct_ft->nf_ft->type = &flowtable_ct; ct_ft->nf_ft->flags |= NF_FLOWTABLE_HW_OFFLOAD | NF_FLOWTABLE_COUNTER; - err = nf_flow_table_init(ct_ft->nf_ft); - if (err) - goto err_init; - write_pnet(&ct_ft->nf_ft->net, net); __module_get(THIS_MODULE); out_unlock: @@ -333,7 +329,7 @@ static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params) return 0; -err_init: +err_create: rhashtable_remove_fast(&zones_ht, &ct_ft->node, zones_params); kfree(ct_ft->nf_ft); err_insert: From patchwork Tue Nov 21 12:27:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 13462978 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc 
[IPv6:2a0a:51c0:0:237:300::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7111B126; Tue, 21 Nov 2023 04:28:34 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1r5Prl-0005Bo-4H; Tue, 21 Nov 2023 13:28:33 +0100 From: Florian Westphal To: Cc: lorenzo@kernel.org, , Florian Westphal Subject: [PATCH nf-next 3/8] netfilter: nf_flowtable: make free a real free function Date: Tue, 21 Nov 2023 13:27:46 +0100 Message-ID: <20231121122800.13521-4-fw@strlen.de> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231121122800.13521-1-fw@strlen.de> References: <20231121122800.13521-1-fw@strlen.de> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org The nf_flowtable 'free' function only frees the internal data structures, e.g. the rhashtable. Make it so it releases the entire nf_flowtable. This wasn't done before because the nf_flowtable structure used to be embedded into another frontend-representation struct. This is no longer the case, nf_flowtable gets allocated by ->create(), and therefore should also be released via ->free(). This also moves the module_put call into the nf_flowtable core. A followup patch will delay the actual freeing until another rcu grace period has elapsed. 
Signed-off-by: Florian Westphal --- net/netfilter/nf_flow_table_core.c | 2 ++ net/netfilter/nf_tables_api.c | 12 ++++-------- net/sched/act_ct.c | 6 ++---- 3 files changed, 8 insertions(+), 12 deletions(-) diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 375fc9c24149..70cc4e0d5ac9 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -612,6 +612,8 @@ void nf_flow_table_free(struct nf_flowtable *flow_table) nf_flow_table_gc_run(flow_table); nf_flow_table_offload_flush_cleanup(flow_table); rhashtable_destroy(&flow_table->rhashtable); + module_put(flow_table->type->owner); + kfree(flow_table); } EXPORT_SYMBOL_GPL(nf_flow_table_free); diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index cce82fef4488..e779e275d694 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8402,8 +8402,9 @@ static int nf_tables_newflowtable(struct sk_buff *skb, flowtable->ft = type->create(net, type); if (!flowtable->ft) { + module_put(type->owner); err = -ENOMEM; - goto err3; + goto err2; } if (nla[NFTA_FLOWTABLE_FLAGS]) { @@ -8411,7 +8412,7 @@ static int nf_tables_newflowtable(struct sk_buff *skb, ntohl(nla_get_be32(nla[NFTA_FLOWTABLE_FLAGS])); if (flowtable->ft->flags & ~NFT_FLOWTABLE_MASK) { err = -EOPNOTSUPP; - goto err3; + goto err4; } } @@ -8447,9 +8448,6 @@ static int nf_tables_newflowtable(struct sk_buff *skb, } err4: flowtable->ft->type->free(flowtable->ft); -err3: - kfree(flowtable->ft); - module_put(type->owner); err2: kfree(flowtable->name); err1: @@ -8824,7 +8822,6 @@ static void nf_tables_flowtable_destroy(struct nft_flowtable *flowtable) { struct nft_hook *hook, *next; - flowtable->ft->type->free(flowtable->ft); list_for_each_entry_safe(hook, next, &flowtable->hook_list, list) { flowtable->ft->type->setup(flowtable->ft, hook->ops.dev, FLOW_BLOCK_UNBIND); @@ -8832,8 +8829,7 @@ static void nf_tables_flowtable_destroy(struct nft_flowtable 
*flowtable) kfree(hook); } kfree(flowtable->name); - module_put(flowtable->ft->type->owner); - kfree(flowtable->ft); + flowtable->ft->type->free(flowtable->ft); kfree(flowtable); } diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c index 80869cc52348..dc17b313c175 100644 --- a/net/sched/act_ct.c +++ b/net/sched/act_ct.c @@ -321,6 +321,7 @@ static int tcf_ct_flow_table_get(struct net *net, struct tcf_ct_params *params) ct_ft->nf_ft->flags |= NF_FLOWTABLE_HW_OFFLOAD | NF_FLOWTABLE_COUNTER; + /* released via nf_flow_table_free() */ __module_get(THIS_MODULE); out_unlock: params->ct_ft = ct_ft; @@ -347,7 +348,6 @@ static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) ct_ft = container_of(to_rcu_work(work), struct tcf_ct_flow_table, rwork); - nf_flow_table_free(ct_ft->nf_ft); /* Remove any remaining callbacks before cleanup */ block = &ct_ft->nf_ft->flow_block; @@ -357,10 +357,8 @@ static void tcf_ct_flow_table_cleanup_work(struct work_struct *work) flow_block_cb_free(block_cb); } up_write(&ct_ft->nf_ft->flow_block_lock); - kfree(ct_ft->nf_ft); + nf_flow_table_free(ct_ft->nf_ft); kfree(ct_ft); - - module_put(THIS_MODULE); } static void tcf_ct_flow_table_put(struct tcf_ct_flow_table *ct_ft) From patchwork Tue Nov 21 12:27:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 13462979 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=none Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:237:300::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A81D92; Tue, 21 Nov 2023 04:28:38 -0800 (PST) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1r5Prp-0005C6-62; Tue, 21 Nov 2023 13:28:37 +0100 From: Florian Westphal To: Cc: lorenzo@kernel.org, , Florian Westphal Subject: [PATCH nf-next 4/8] netfilter: nf_flowtable: delay 
flowtable release a second time Date: Tue, 21 Nov 2023 13:27:47 +0100 Message-ID: <20231121122800.13521-5-fw@strlen.de> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20231121122800.13521-1-fw@strlen.de> References: <20231121122800.13521-1-fw@strlen.de> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org At this time the frontends (tc, nftables) ensure that the nf_flowtable is removed after the frontend hooks are gone (tc action, netfilter hooks). In both cases the nf_flowtable can be safely free'd as no packets will be traversing these hooks anymore. However, the upcoming nf_flowtable kfunc for XDP will still have a pointer to the nf_flowtable in its own net_device -> nf_flowtable mapping. This mapping is removed via the flow_block UNBIND callback. This callback however comes after an rcu grace period, not before. Therefore defer the real freeing via call_rcu so that no kfunc can possibly be using the nf_flowtable (or flow entries within) anymore. 
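The two-stage teardown described above can be modeled in userspace C. In this sketch the rcu grace period is simulated by an explicit drain step, and every name (`deferred_free`, `drain_grace_period`, `destroy_obj`) is illustrative rather than kernel API; the kernel equivalent is `queue_rcu_work()` running the destructor from a workqueue after the grace period.

```c
#include <stdlib.h>

/* Userspace model of the deferred release in this patch: "free" only
 * queues the destructive work; the queued work runs after a (simulated)
 * grace period, when no reader can still hold a pointer.
 */
struct deferred_work {
	void (*fn)(void *obj);
	void *obj;
	struct deferred_work *next;
};

static struct deferred_work *pending;
static int destroyed;

/* the real destructor -- runs only after the grace period */
static void destroy_obj(void *obj)
{
	free(obj);
	destroyed++;
}

/* analogue of queue_rcu_work(): remember the work, run it later */
static void deferred_free(void *obj, void (*fn)(void *))
{
	struct deferred_work *w = malloc(sizeof(*w));

	w->fn = fn;
	w->obj = obj;
	w->next = pending;
	pending = w;
}

/* stands in for the end of the rcu grace period */
static int drain_grace_period(void)
{
	int ran = 0;

	while (pending) {
		struct deferred_work *w = pending;

		pending = w->next;
		w->fn(w->obj);	/* destruction happens only now */
		free(w);
		ran++;
	}
	return ran;
}
```

The point of the split is visible in the model: between `deferred_free()` and `drain_grace_period()` the object is unlinked but still valid, which is the window in which an XDP kfunc may still dereference it safely.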
Signed-off-by: Florian Westphal --- include/net/netfilter/nf_flow_table.h | 2 ++ net/netfilter/nf_flow_table_core.c | 18 ++++++++++++++---- 2 files changed, 16 insertions(+), 4 deletions(-) diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index d365eabd4a3c..6598ac455d17 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -83,6 +83,8 @@ struct nf_flowtable { struct flow_block flow_block; struct rw_semaphore flow_block_lock; /* Guards flow_block */ possible_net_t net; + + struct rcu_work rwork; }; static inline bool nf_flowtable_hw_offload(struct nf_flowtable *flowtable) diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c index 70cc4e0d5ac9..cae27f8f0f68 100644 --- a/net/netfilter/nf_flow_table_core.c +++ b/net/netfilter/nf_flow_table_core.c @@ -599,11 +599,11 @@ void nf_flow_table_cleanup(struct net_device *dev) } EXPORT_SYMBOL_GPL(nf_flow_table_cleanup); -void nf_flow_table_free(struct nf_flowtable *flow_table) +static void nf_flow_table_free_rwork(struct work_struct *work) { - mutex_lock(&flowtable_lock); - list_del(&flow_table->list); - mutex_unlock(&flowtable_lock); + struct nf_flowtable *flow_table; + + flow_table = container_of(to_rcu_work(work), struct nf_flowtable, rwork); cancel_delayed_work_sync(&flow_table->gc_work); nf_flow_table_offload_flush(flow_table); @@ -615,6 +615,16 @@ void nf_flow_table_free(struct nf_flowtable *flow_table) module_put(flow_table->type->owner); kfree(flow_table); } + +void nf_flow_table_free(struct nf_flowtable *flow_table) +{ + mutex_lock(&flowtable_lock); + list_del(&flow_table->list); + mutex_unlock(&flowtable_lock); + + INIT_RCU_WORK(&flow_table->rwork, nf_flow_table_free_rwork); + queue_rcu_work(system_power_efficient_wq, &flow_table->rwork); +} EXPORT_SYMBOL_GPL(nf_flow_table_free); static int nf_flow_table_init_net(struct net *net) From patchwork Tue Nov 21 12:27:48 2023 Content-Type: text/plain; 
charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 13462980
X-Patchwork-Delegate: kuba@kernel.org
Authentication-Results: smtp.subspace.kernel.org; dkim=none
Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:237:300::1])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9B457197;
	Tue, 21 Nov 2023 04:28:42 -0800 (PST)
Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92)
	(envelope-from )
	id 1r5Prt-0005CT-7q; Tue, 21 Nov 2023 13:28:41 +0100
From: Florian Westphal
To:
Cc: lorenzo@kernel.org, , Florian Westphal
Subject: [PATCH nf-next 5/8] netfilter: nf_tables: reject flowtable hw offload for same device
Date: Tue, 21 Nov 2023 13:27:48 +0100
Message-ID: <20231121122800.13521-6-fw@strlen.de>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20231121122800.13521-1-fw@strlen.de>
References: <20231121122800.13521-1-fw@strlen.de>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: 
List-Subscribe: 
List-Unsubscribe: 
MIME-Version: 1.0
X-Patchwork-Delegate: kuba@kernel.org

The existing check is not sufficient, as it only considers flowtables
within the same table. In case of HW offload, we need to check all
flowtables that exist in the same net namespace. We can skip flowtables
that are slated for removal (not active in next gen).

Ideally this check would supersede the existing one, but this is
probably too risky and might prevent existing configs from working.
As is, you can do all of the following:

table ip t { flowtable f { devices = { lo } } }
table ip6 t { flowtable f { devices = { lo } } }
table inet t { flowtable f { devices = { lo } } }

... but IMO this should not be possible in the first place. Disable
this for HW offload.

This is related to the XDP flowtable work; the idea is to keep a small
hashtable that has a 'struct net_device := struct nf_flowtable' map.
This mapping must be unique.
The idea is to add an "XDP OFFLOAD" flag to the nftables api and then
have this function run for the 'xdp offload' case too.

This is useful, because it would permit the "xdp offload" hashtable to
tolerate duplicate keys -- they would only occur during transactional
updates, e.g. a flush of the current table combined with an atomic
reload.

Without this change, the nf_flowtable core cannot tell whether a
flowtable is a real duplicate or just a temporary artefact of the
two-phase-commit protocol (i.e., the clashing entry is queued for
removal).

Signed-off-by: Florian Westphal
---
 net/netfilter/nf_tables_api.c | 36 +++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index e779e275d694..7437b997ca7e 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -8189,6 +8189,37 @@ static void nft_unregister_flowtable_net_hooks(struct net *net,
 	__nft_unregister_flowtable_net_hooks(net, hook_list, false);
 }
 
+static bool nft_flowtable_offload_clash(struct net *net,
+					const struct nft_hook *hook,
+					struct nft_flowtable *flowtable)
+{
+	const struct nftables_pernet *nft_net;
+	struct nft_flowtable *existing_ft;
+	const struct nft_table *table;
+
+	/* No offload requested, no need to validate */
+	if (!nf_flowtable_hw_offload(flowtable->ft))
+		return false;
+
+	nft_net = nft_pernet(net);
+
+	list_for_each_entry(table, &nft_net->tables, list) {
+		list_for_each_entry(existing_ft, &table->flowtables, list) {
+			const struct nft_hook *hook2;
+
+			if (!nft_is_active_next(net, existing_ft))
+				continue;
+
+			list_for_each_entry(hook2, &existing_ft->hook_list, list) {
+				if (hook->ops.dev == hook2->ops.dev)
+					return true;
+			}
+		}
+	}
+
+	return false;
+}
+
 static int nft_register_flowtable_net_hooks(struct net *net,
 					    struct nft_table *table,
 					    struct list_head *hook_list,
@@ -8199,6 +8230,11 @@ static int nft_register_flowtable_net_hooks(struct net *net,
 	int err, i = 0;
 
 	list_for_each_entry(hook,
hook_list, list) {
+		if (nft_flowtable_offload_clash(net, hook, flowtable)) {
+			err = -EEXIST;
+			goto err_unregister_net_hooks;
+		}
+
 		list_for_each_entry(ft, &table->flowtables, list) {
 			if (!nft_is_active_next(net, ft))
 				continue;

From patchwork Tue Nov 21 12:27:49 2023
Content-Type: text/plain; charset="utf-8"
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 13462981
From: Florian Westphal
Subject: [PATCH nf-next 6/8] netfilter: nf_tables: add xdp offload flag
Date: Tue, 21 Nov 2023 13:27:49 +0100
Message-ID: <20231121122800.13521-7-fw@strlen.de>
In-Reply-To: <20231121122800.13521-1-fw@strlen.de>

Also make sure this flag cannot be set or cleared on updates, just like the regular hw offload flag. Otherwise we'd have to add more complexity and explicitly withdraw or add the device from/to the netdev -> flowtable mapping.
Signed-off-by: Florian Westphal
---
 include/net/netfilter/nf_flow_table.h    |  1 +
 include/uapi/linux/netfilter/nf_tables.h |  5 ++++-
 net/netfilter/nf_tables_api.c            | 14 ++++++++++++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 6598ac455d17..11985d9b8370 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -70,6 +70,7 @@ struct nf_flowtable_type {
 enum nf_flowtable_flags {
 	NF_FLOWTABLE_HW_OFFLOAD		= 0x1,	/* NFT_FLOWTABLE_HW_OFFLOAD */
 	NF_FLOWTABLE_COUNTER		= 0x2,	/* NFT_FLOWTABLE_COUNTER */
+	NF_FLOWTABLE_XDP_OFFLOAD	= 0x4,	/* NFT_FLOWTABLE_XDP_OFFLOAD */
 };
 
 struct nf_flowtable {
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index ca30232b7bc8..ed297dc77288 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -1675,12 +1675,15 @@ enum nft_object_attributes {
  *
  * @NFT_FLOWTABLE_HW_OFFLOAD: flowtable hardware offload is enabled
  * @NFT_FLOWTABLE_COUNTER: enable flow counters
+ * @NFT_FLOWTABLE_XDP_OFFLOAD: flowtable xdp offload is enabled
  */
 enum nft_flowtable_flags {
 	NFT_FLOWTABLE_HW_OFFLOAD	= 0x1,
 	NFT_FLOWTABLE_COUNTER		= 0x2,
+	NFT_FLOWTABLE_XDP_OFFLOAD	= 0x4,
 	NFT_FLOWTABLE_MASK		= (NFT_FLOWTABLE_HW_OFFLOAD |
-					   NFT_FLOWTABLE_COUNTER)
+					   NFT_FLOWTABLE_COUNTER |
+					   NFT_FLOWTABLE_XDP_OFFLOAD),
 };
 
 /**
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 7437b997ca7e..4e21311ec768 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -8288,6 +8288,15 @@ static void nft_hooks_destroy(struct list_head *hook_list)
 	}
 }
 
+static bool nft_flowtable_flag_changes(const struct nf_flowtable *ft,
+				       unsigned int new_flags,
+				       enum nft_flowtable_flags flag)
+{
+	if ((ft->flags & flag) ^ (new_flags & flag))
+		return true;
+
+	return false;
+}
+
 static int nft_flowtable_update(struct nft_ctx *ctx,
const struct nlmsghdr *nlh,
 				struct nft_flowtable *flowtable,
 				struct netlink_ext_ack *extack)
@@ -8318,8 +8327,9 @@ static int nft_flowtable_update(struct nft_ctx *ctx, const struct nlmsghdr *nlh,
 			err = -EOPNOTSUPP;
 			goto err_flowtable_update_hook;
 		}
-		if ((flowtable->ft->flags & NFT_FLOWTABLE_HW_OFFLOAD) ^
-		    (flags & NFT_FLOWTABLE_HW_OFFLOAD)) {
+
+		if (nft_flowtable_flag_changes(flowtable->ft, flags, NFT_FLOWTABLE_HW_OFFLOAD) ||
+		    nft_flowtable_flag_changes(flowtable->ft, flags, NFT_FLOWTABLE_XDP_OFFLOAD)) {
 			err = -EOPNOTSUPP;
 			goto err_flowtable_update_hook;
 		}

From patchwork Tue Nov 21 12:27:50 2023
Content-Type: text/plain; charset="utf-8"
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 13462982
From: Florian Westphal
Subject: [PATCH nf-next 7/8] netfilter: nf_tables: add flowtable map for xdp offload
Date: Tue, 21 Nov 2023 13:27:50 +0100
Message-ID: <20231121122800.13521-8-fw@strlen.de>
In-Reply-To: <20231121122800.13521-1-fw@strlen.de>

This adds a small internal mapping table so that a new bpf (xdp) kfunc can perform lookups in a flowtable.
As-is, an xdp program has access to the device pointer, but no way to do a lookup in a flowtable -- there is no way to obtain the needed struct without questionable stunts. This allows an nf_flowtable pointer to be obtained given a net_device structure.

A device cannot be added to multiple flowtables; the mapping needs to be unique. This is enforced when a flowtable with NF_FLOWTABLE_XDP_OFFLOAD set is added.

Exposure of NF_FLOWTABLE_XDP_OFFLOAD in UAPI could be avoided, iff the 'net_device maps to 0 or 1 flowtable' paradigm were enforced regardless of the offload-or-not flag. HOWEVER, that does break existing behaviour. An alternative would be to repurpose the hw offload flag by allowing XDP fallback when hw offload cannot be done due to lack of ndo callbacks.

Signed-off-by: Florian Westphal
Tested-by: Lorenzo Bianconi
---
 include/net/netfilter/nf_flow_table.h |   7 ++
 net/netfilter/nf_flow_table_offload.c | 131 +++++++++++++++++++++++++-
 net/netfilter/nf_tables_api.c         |   3 +-
 3 files changed, 139 insertions(+), 2 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 11985d9b8370..b8b7fcb98732 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -93,6 +93,11 @@ static inline bool nf_flowtable_hw_offload(struct nf_flowtable *flowtable)
 	return flowtable->flags & NF_FLOWTABLE_HW_OFFLOAD;
 }
 
+static inline bool nf_flowtable_xdp_offload(struct nf_flowtable *flowtable)
+{
+	return flowtable->flags & NF_FLOWTABLE_XDP_OFFLOAD;
+}
+
 enum flow_offload_tuple_dir {
 	FLOW_OFFLOAD_DIR_ORIGINAL = IP_CT_DIR_ORIGINAL,
 	FLOW_OFFLOAD_DIR_REPLY = IP_CT_DIR_REPLY,
@@ -299,6 +304,8 @@ struct flow_ports {
 	__be16 source, dest;
 };
 
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev);
+
 unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
				     const struct nf_hook_state *state);
 unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
diff --git
a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index a010b25076ca..9ec7aa4ad2e5 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -17,6 +17,92 @@ static struct workqueue_struct *nf_flow_offload_add_wq;
 static struct workqueue_struct *nf_flow_offload_del_wq;
 static struct workqueue_struct *nf_flow_offload_stats_wq;
 
+struct flow_offload_xdp {
+	struct hlist_node hnode;
+
+	unsigned long net_device_addr;
+	struct nf_flowtable *ft;
+
+	struct rcu_head rcuhead;
+};
+
+#define NF_XDP_HT_BITS	4
+static DEFINE_HASHTABLE(nf_xdp_hashtable, NF_XDP_HT_BITS);
+static DEFINE_MUTEX(nf_xdp_hashtable_lock);
+
+/* caller must hold rcu read lock */
+struct nf_flowtable *nf_flowtable_by_dev(const struct net_device *dev)
+{
+	unsigned long key = (unsigned long)dev;
+	const struct flow_offload_xdp *cur;
+
+	hash_for_each_possible_rcu(nf_xdp_hashtable, cur, hnode, key) {
+		if (key == cur->net_device_addr)
+			return cur->ft;
+	}
+
+	return NULL;
+}
+
+static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
+				      const struct net_device *dev)
+{
+	unsigned long key = (unsigned long)dev;
+	struct flow_offload_xdp *cur;
+	int err = 0;
+
+	mutex_lock(&nf_xdp_hashtable_lock);
+	hash_for_each_possible(nf_xdp_hashtable, cur, hnode, key) {
+		if (key != cur->net_device_addr)
+			continue;
+		err = -EEXIST;
+		break;
+	}
+
+	if (err == 0) {
+		struct flow_offload_xdp *new;
+
+		new = kzalloc(sizeof(*new), GFP_KERNEL_ACCOUNT);
+		if (new) {
+			new->net_device_addr = key;
+			new->ft = ft;
+
+			hash_add_rcu(nf_xdp_hashtable, &new->hnode, key);
+		} else {
+			err = -ENOMEM;
+		}
+	}
+
+	mutex_unlock(&nf_xdp_hashtable_lock);
+
+	DEBUG_NET_WARN_ON_ONCE(err == 0 && nf_flowtable_by_dev(dev) != ft);
+
+	return err;
+}
+
+static void nf_flowtable_by_dev_remove(const struct net_device *dev)
+{
+	unsigned long key = (unsigned long)dev;
+	struct flow_offload_xdp *cur;
+	bool found = false;
+
+	mutex_lock(&nf_xdp_hashtable_lock);
+
+	
hash_for_each_possible(nf_xdp_hashtable, cur, hnode, key) {
+		if (key != cur->net_device_addr)
+			continue;
+
+		hash_del_rcu(&cur->hnode);
+		kfree_rcu(cur, rcuhead);
+		found = true;
+		break;
+	}
+
+	mutex_unlock(&nf_xdp_hashtable_lock);
+
+	WARN_ON_ONCE(!found);
+}
+
 struct flow_offload_work {
 	struct list_head list;
 	enum flow_cls_command cmd;
@@ -1183,6 +1269,44 @@ static int nf_flow_table_offload_cmd(struct flow_block_offload *bo,
 	return 0;
 }
 
+static int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
+				     struct net_device *dev,
+				     enum flow_block_command cmd)
+{
+	if (!nf_flowtable_xdp_offload(flowtable))
+		return 0;
+
+	switch (cmd) {
+	case FLOW_BLOCK_BIND:
+		return nf_flowtable_by_dev_insert(flowtable, dev);
+	case FLOW_BLOCK_UNBIND:
+		nf_flowtable_by_dev_remove(dev);
+		return 0;
+	}
+
+	WARN_ON_ONCE(1);
+	return 0;
+}
+
+static void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
+				       struct net_device *dev,
+				       enum flow_block_command cmd)
+{
+	if (!nf_flowtable_xdp_offload(flowtable))
+		return;
+
+	switch (cmd) {
+	case FLOW_BLOCK_BIND:
+		nf_flowtable_by_dev_remove(dev);
+		return;
+	case FLOW_BLOCK_UNBIND:
+		/* We do not re-bind in case hw offload would report error
+		 * on *unregister*.
+		 */
+		break;
+	}
+}
+
 int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
				struct net_device *dev,
				enum flow_block_command cmd)
@@ -1191,6 +1315,9 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
 	struct flow_block_offload bo;
 	int err;
 
+	if (nf_flow_offload_xdp_setup(flowtable, dev, cmd))
+		return -EBUSY;
+
 	if (!nf_flowtable_hw_offload(flowtable))
 		return 0;
 
@@ -1200,8 +1327,10 @@ int nf_flow_table_offload_setup(struct nf_flowtable *flowtable,
 	else
 		err = nf_flow_table_indr_offload_cmd(&bo, flowtable, dev, cmd,
						     &extack);
-	if (err < 0)
+	if (err < 0) {
+		nf_flow_offload_xdp_cancel(flowtable, dev, cmd);
 		return err;
+	}
 
 	return nf_flow_table_block_setup(flowtable, &bo, cmd);
 }
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 4e21311ec768..223ca4d0e2a5 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -8198,7 +8198,8 @@ static bool nft_flowtable_offload_clash(struct net *net,
 	const struct nft_table *table;
 
 	/* No offload requested, no need to validate */
-	if (!nf_flowtable_hw_offload(flowtable->ft))
+	if (!nf_flowtable_hw_offload(flowtable->ft) &&
+	    !nf_flowtable_xdp_offload(flowtable->ft))
 		return false;
 
 	nft_net = nft_pernet(net);

From patchwork Tue Nov 21 12:27:51 2023
Content-Type: text/plain; charset="utf-8"
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 13462983
From: Florian Westphal
Subject: [PATCH nf-next 8/8] netfilter:
nf_tables: permit duplicate flowtable mappings
Date: Tue, 21 Nov 2023 13:27:51 +0100
Message-ID: <20231121122800.13521-9-fw@strlen.de>
In-Reply-To: <20231121122800.13521-1-fw@strlen.de>

The core ensures that no duplicate mapping exists (net_device x is always assigned to exactly one flowtable, or none at all). Only exception: it is assigned to a flowtable that is going away in this transaction.

Therefore relax the existing check to permit the duplicate entry; it is only temporary. The existing entry will shadow the new one until the transaction is committed (old entry is removed) or aborted (new entry is removed).

Signed-off-by: Florian Westphal
---
 net/netfilter/nf_flow_table_offload.c | 36 +++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 9ec7aa4ad2e5..b6503fd45871 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -49,13 +49,14 @@ static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
 {
 	unsigned long key = (unsigned long)dev;
 	struct flow_offload_xdp *cur;
+	bool duplicate = false;
 	int err = 0;
 
 	mutex_lock(&nf_xdp_hashtable_lock);
 	hash_for_each_possible(nf_xdp_hashtable, cur, hnode, key) {
 		if (key != cur->net_device_addr)
 			continue;
-		err = -EEXIST;
+		duplicate = true;
 		break;
 	}
 
@@ -67,7 +68,20 @@ static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
 			new->net_device_addr = key;
 			new->ft = ft;
 
-			hash_add_rcu(nf_xdp_hashtable, &new->hnode, key);
+			if (duplicate) {
+				u32 index = hash_min(key, HASH_BITS(nf_xdp_hashtable));
+
+				/* nf_tables_api.c makes sure that only a single flowtable
+				 * has this device.
+				 *
+				 * Only exception: the flowtable is about to be removed.
+				 * Thus we tolerate the duplicated entry, the 'old' entry
+				 * will be unhashed after the transaction completes.
+				 */
+				hlist_add_tail_rcu(&new->hnode, &nf_xdp_hashtable[index]);
+			} else {
+				hash_add_rcu(nf_xdp_hashtable, &new->hnode, key);
+			}
 		} else {
 			err = -ENOMEM;
 		}
@@ -80,7 +94,8 @@ static int nf_flowtable_by_dev_insert(struct nf_flowtable *ft,
 	return err;
 }
 
-static void nf_flowtable_by_dev_remove(const struct net_device *dev)
+static void nf_flowtable_by_dev_remove(const struct nf_flowtable *ft,
+				       const struct net_device *dev)
 {
 	unsigned long key = (unsigned long)dev;
 	struct flow_offload_xdp *cur;
@@ -92,6 +107,17 @@ static void nf_flowtable_by_dev_remove(const struct net_device *dev)
 		if (key != cur->net_device_addr)
 			continue;
 
+		/* duplicate entry, happens when current transaction
+		 * removes flowtable f1 and adds flowtable f2, where both
+		 * have *dev assigned to them.
+		 *
+		 * nf_tables_api.c makes sure that at most one
+		 * flowtable,dev pair with 'xdp' flag enabled can exist
+		 * in the same generation.
+		 */
+		if (cur->ft != ft)
+			continue;
+
 		hash_del_rcu(&cur->hnode);
 		kfree_rcu(cur, rcuhead);
 		found = true;
@@ -1280,7 +1306,7 @@ static int nf_flow_offload_xdp_setup(struct nf_flowtable *flowtable,
 	case FLOW_BLOCK_BIND:
 		return nf_flowtable_by_dev_insert(flowtable, dev);
 	case FLOW_BLOCK_UNBIND:
-		nf_flowtable_by_dev_remove(dev);
+		nf_flowtable_by_dev_remove(flowtable, dev);
 		return 0;
 	}
 
@@ -1297,7 +1323,7 @@ static void nf_flow_offload_xdp_cancel(struct nf_flowtable *flowtable,
 
 	switch (cmd) {
 	case FLOW_BLOCK_BIND:
-		nf_flowtable_by_dev_remove(dev);
+		nf_flowtable_by_dev_remove(flowtable, dev);
 		return;
 	case FLOW_BLOCK_UNBIND:
 		/* We do not re-bind in case hw offload would report error