Message ID | 20211025141400.13698-3-fw@strlen.de |
---|---|
State | Accepted |
Commit | 8c9c296adfae9ea05f655d69e9f6e13daa86fb4a |
Delegated to: | Netdev Maintainers |
Series | vrf: rework interaction with netfilter/conntrack |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Series has a cover letter |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 2 maintainers not CCed: davem@davemloft.net kuba@kernel.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 2 this patch: 2 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | No Fixes tag |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 82 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 2 this patch: 2 |
netdev/header_inline | success | No static functions without inline keyword in header files |
On 10/25/21 8:14 AM, Florian Westphal wrote:
> The VRF driver invokes netfilter for output+postrouting hooks so that users
> can create rules that check for 'oif $vrf' rather than lower device name.
> [...]

Acked-by: David Ahern <dsahern@kernel.org>
Hi,

One question about this.

On Mon, Oct 25, 2021 at 04:14:00PM +0200, Florian Westphal wrote:
> The VRF driver invokes netfilter for output+postrouting hooks so that users
> can create rules that check for 'oif $vrf' rather than lower device name.

If the motivation for these hooks in the driver is to match for 'oif vrf', now that there is an egress hook, it might make more sense to filter from there based on the interface rather than adding these hook calls from the vrf driver?

I wonder if, in the future, it makes sense to entirely disable these hooks in the vrf driver and rely on the egress hook?
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> If the motivation for these hooks in the driver is to match for 'oif vrf',
> now that there is an egress hook, it might make more sense to filter
> from there based on the interface rather than adding these hook calls
> from the vrf driver?
>
> I wonder if, in the future, it makes sense to entirely disable these
> hooks in the vrf driver and rely on the egress hook?

Agree, it would be better to support ingress+egress hooks from vrf
so vrf specific filtering can be done per-device.

I don't think we can just remove the existing NF_HOOK()s in vrf though.
We could add toggles to disable them, but I'm not sure how to best
expose that (ip link attribute, ethtool, sysctl ...)...?
On Tue, Oct 26, 2021 at 02:58:58PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > If the motivation for these hooks in the driver is to match for 'oif vrf',
> > now that there is an egress hook, it might make more sense to filter
> > from there based on the interface rather than adding these hook calls
> > from the vrf driver?
> >
> > I wonder if, in the future, it makes sense to entirely disable these
> > hooks in the vrf driver and rely on the egress hook?
>
> Agree, it would be better to support ingress+egress hooks from vrf
> so vrf specific filtering can be done per-device.
>
> I don't think we can just remove the existing NF_HOOK()s in vrf though.

I understand; there are people relying on this.

> We could add toggles to disable them, but I'm not sure how to best
> expose that (ip link attribute, ethtool, sysctl ...)...?

I would make it a global toggle.

As you mentioned, it might be good to explore an alternative to this via the ingress+egress hooks now that the use cases are better known?
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index bf2fac913942..546aa1aacb77 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -35,6 +35,7 @@
 #include <net/l3mdev.h>
 #include <net/fib_rules.h>
 #include <net/netns/generic.h>
+#include <net/netfilter/nf_conntrack.h>

 #define DRV_NAME	"vrf"
 #define DRV_VERSION	"1.1"
@@ -424,12 +425,26 @@ static int vrf_local_xmit(struct sk_buff *skb, struct net_device *dev,
 	return NETDEV_TX_OK;
 }

+static void vrf_nf_set_untracked(struct sk_buff *skb)
+{
+	if (skb_get_nfct(skb) == 0)
+		nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
+}
+
+static void vrf_nf_reset_ct(struct sk_buff *skb)
+{
+	if (skb_get_nfct(skb) == IP_CT_UNTRACKED)
+		nf_reset_ct(skb);
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static int vrf_ip6_local_out(struct net *net, struct sock *sk,
 			     struct sk_buff *skb)
 {
 	int err;

+	vrf_nf_reset_ct(skb);
+
 	err = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net,
 		      sk, skb, NULL, skb_dst(skb)->dev, dst_output);

@@ -508,6 +523,8 @@ static int vrf_ip_local_out(struct net *net, struct sock *sk,
 {
 	int err;

+	vrf_nf_reset_ct(skb);
+
 	err = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, net, sk,
 		      skb, NULL, skb_dst(skb)->dev, dst_output);
 	if (likely(err == 1))
@@ -626,8 +643,7 @@ static void vrf_finish_direct(struct sk_buff *skb)
 		skb_pull(skb, ETH_HLEN);
 	}

-	/* reset skb device */
-	nf_reset_ct(skb);
+	vrf_nf_reset_ct(skb);
 }

 #if IS_ENABLED(CONFIG_IPV6)
@@ -641,7 +657,7 @@ static int vrf_finish_output6(struct net *net, struct sock *sk,
 	struct neighbour *neigh;
 	int ret;

-	nf_reset_ct(skb);
+	vrf_nf_reset_ct(skb);

 	skb->protocol = htons(ETH_P_IPV6);
 	skb->dev = dev;
@@ -752,6 +768,8 @@ static struct sk_buff *vrf_ip6_out_direct(struct net_device *vrf_dev,

 	skb->dev = vrf_dev;

+	vrf_nf_set_untracked(skb);
+
 	err = nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, net, sk,
 		      skb, NULL, vrf_dev, vrf_ip6_out_direct_finish);

@@ -858,7 +876,7 @@ static int vrf_finish_output(struct net *net, struct sock *sk, struct sk_buff *s
 	struct neighbour *neigh;
 	bool is_v6gw = false;

-	nf_reset_ct(skb);
+	vrf_nf_reset_ct(skb);

 	/* Be paranoid, rather than too clever. */
 	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
@@ -980,6 +998,8 @@ static struct sk_buff *vrf_ip_out_direct(struct net_device *vrf_dev,

 	skb->dev = vrf_dev;

+	vrf_nf_set_untracked(skb);
+
 	err = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, net, sk,
 		      skb, NULL, vrf_dev, vrf_ip_out_direct_finish);
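Each of the two new helpers acts on only one encoding of the conntrack field. Below is a minimal userspace model of their conditions. It is not kernel code. `struct model_skb`, `struct model_conn` and the `model_*()` functions are hypothetical stand-ins for `struct sk_buff`, `struct nf_conn`, `nf_ct_set()` and `nf_reset_ct()`. The model assumes, as in current kernels, that the field packs a conntrack pointer together with the ctinfo value in its low three bits, and that `IP_CT_UNTRACKED` is 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. Like skb->_nfct, 'nfct'
 * packs a conntrack pointer with the ctinfo value in its low three bits:
 * nfct == 0 means "no conntrack state", nfct == IP_CT_UNTRACKED means
 * "marked untracked, no entry attached". */
#define IP_CT_NEW       2UL
#define IP_CT_UNTRACKED 7UL

struct model_conn { _Alignas(8) int id; };  /* stand-in for struct nf_conn */
struct model_skb  { uintptr_t nfct; };      /* stand-in for struct sk_buff */

/* Models nf_ct_set(): store pointer and ctinfo in one word. */
static void model_ct_set(struct model_skb *skb, struct model_conn *ct,
                         unsigned long info)
{
    skb->nfct = (uintptr_t)ct | info;
}

/* Models nf_reset_ct(): drop any conntrack state (the kernel also releases
 * the entry reference). The pre-patch VRF paths called this unconditionally. */
static void model_reset_ct(struct model_skb *skb)
{
    skb->nfct = 0;
}

/* Mirrors vrf_nf_set_untracked(): tag only skbs without conntrack state. */
static void model_vrf_set_untracked(struct model_skb *skb)
{
    if (skb->nfct == 0)
        model_ct_set(skb, NULL, IP_CT_UNTRACKED);
}

/* Mirrors vrf_nf_reset_ct(): clear only the bare "untracked" tag. */
static void model_vrf_reset_ct(struct model_skb *skb)
{
    if (skb->nfct == IP_CT_UNTRACKED)
        model_reset_ct(skb);
}
```

In the model, `model_vrf_set_untracked()` tags an skb that has no conntrack state, and `model_vrf_reset_ct()` untags it again. An skb that already carries an entry passes through both helpers unchanged. Only the unconditional `model_reset_ct()`, which stands in for the pre-patch `nf_reset_ct()` calls, would clear it.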
The VRF driver invokes netfilter for output+postrouting hooks so that users
can create rules that check for 'oif $vrf' rather than lower device name.

This is a problem when NAT rules are configured.

To avoid any conntrack involvement in round 1, tag skbs as 'untracked'
to prevent conntrack from picking them up.

This gets cleared before the packet gets handed to the ip stack so
conntrack will be active on the second iteration.

One remaining issue is that a rule like

   output ... oif $vrfname notrack

won't propagate to the second round because we can't tell
'notrack set via ruleset' and 'notrack set by vrf driver' apart.
However, this isn't a regression: the 'notrack' removal happens
instead of unconditional nf_reset_ct().
I'd also like to avoid leaking more vrf specific conditionals into the
netfilter infra.

For ingress, conntrack has already been done before the packet makes it
to the vrf driver, with this patch egress does connection tracking with
lower/physical device as well.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 drivers/net/vrf.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)
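The two-round egress flow described in the commit message, including the `notrack` limitation, can be sketched as a small state machine. This is a hypothetical userspace model. `enum ct_state`, `struct pkt`, `vrf_round()`, `lower_round()` and `vrf_egress()` are illustrative names, not kernel APIs. Conntrack is reduced to "attach an entry only if the packet has no state yet".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the conntrack state an skb carries; not kernel code. */
enum ct_state {
    CT_NONE,       /* no conntrack info attached (skb_get_nfct() == 0)     */
    CT_UNTRACKED,  /* nf_ct_set(skb, NULL, IP_CT_UNTRACKED) was applied    */
    CT_TRACKED     /* conntrack attached an entry                          */
};

struct pkt { enum ct_state ct; };

/* Simplified conntrack hook: untracked packets are ignored. */
static void conntrack_hook(struct pkt *p)
{
    if (p->ct == CT_NONE)
        p->ct = CT_TRACKED;
}

/* Round 1: hooks invoked by the VRF driver with the VRF as output device. */
static void vrf_round(struct pkt *p, bool ruleset_notrack)
{
    if (p->ct == CT_NONE)        /* models vrf_nf_set_untracked() */
        p->ct = CT_UNTRACKED;
    if (ruleset_notrack)         /* "output ... oif $vrfname notrack" */
        p->ct = CT_UNTRACKED;    /* same value as the driver's own tag */
    conntrack_hook(p);           /* skipped: packet is untracked */
}

/* Round 2: the packet is handed back to the IP stack (lower device). */
static void lower_round(struct pkt *p)
{
    if (p->ct == CT_UNTRACKED)   /* models vrf_nf_reset_ct() */
        p->ct = CT_NONE;
    conntrack_hook(p);           /* conntrack now sees the packet */
}

/* Runs both rounds and returns the final conntrack state. */
static enum ct_state vrf_egress(bool ruleset_notrack)
{
    struct pkt p = { CT_NONE };
    vrf_round(&p, ruleset_notrack);
    lower_round(&p);
    return p.ct;
}
```

Both `vrf_egress(false)` and `vrf_egress(true)` end in `CT_TRACKED`. When the packet reaches the lower device, a `notrack` set by the ruleset looks the same as the driver's own tag, so both are cleared and conntrack runs in the second round.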