Message ID | 20211015090934.2870662-1-zenczykowski@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [netfilter] netfilter: conntrack: udp: generate event on switch to stream timeout | expand |
On Fri, Oct 15, 2021 at 02:09:34AM -0700, Maciej Żenczykowski wrote: > From: Maciej Żenczykowski <maze@google.com> > > Without this it's hard to offload udp due to lack of a conntrack event > at the appropriate time (ie. once udp stream is established and stream > timeout is in effect). > > Without this udp conntrack events 'update/assured/timeout=30' > either need to be ignored, or polling loop needs to be <30 second > instead of <120 second. > > With this change: > [NEW] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 [UNREPLIED] src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 > [UPDATE] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 > [UPDATE] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] > [UPDATE] udp 17 120 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] > [DESTROY] udp 17 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] > (the 3rd update/assured/120 event is new) Hm, I still don't understand why do you need this extra 3rd update/assured event event. Could you explain your usecase? Thanks. > Cc: Florian Westphal <fw@strlen.de> > Cc: Pablo Neira Ayuso <pablo@netfilter.org> > Fixes: d535c8a69c19 'netfilter: conntrack: udp: only extend timeout to stream mode after 2s' > Signed-off-by: Maciej Żenczykowski <maze@google.com> > --- > include/net/netfilter/nf_conntrack.h | 1 + > .../uapi/linux/netfilter/nf_conntrack_common.h | 1 + > net/netfilter/nf_conntrack_proto_udp.c | 16 ++++++++++++++-- > 3 files changed, 16 insertions(+), 2 deletions(-) > > diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h > index cc663c68ddc4..12029d616cfa 100644 > --- a/include/net/netfilter/nf_conntrack.h > +++ b/include/net/netfilter/nf_conntrack.h > @@ -26,6 +26,7 @@ > > struct nf_ct_udp { > unsigned long stream_ts; > + bool notified; > }; > > /* per conntrack: protocol private data */ > diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h > index 4b3395082d15..a8e91b5821fa 100644 > --- a/include/uapi/linux/netfilter/nf_conntrack_common.h > +++ b/include/uapi/linux/netfilter/nf_conntrack_common.h > @@ -144,6 +144,7 @@ enum ip_conntrack_events { > IPCT_SECMARK, /* new security mark has been set */ > IPCT_LABEL, /* new connlabel has been set */ > IPCT_SYNPROXY, /* synproxy has been set */ > + IPCT_UDPSTREAM, /* udp stream has been set */ > #ifdef __KERNEL__ > __IPCT_MAX > #endif > diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c > index 68911fcaa0f1..f0d9869aa30f 100644 > --- a/net/netfilter/nf_conntrack_proto_udp.c > +++ b/net/netfilter/nf_conntrack_proto_udp.c > @@ -97,18 +97,23 @@ int nf_conntrack_udp_packet(struct nf_conn *ct, > if (!timeouts) > timeouts = udp_get_timeouts(nf_ct_net(ct)); > > - if (!nf_ct_is_confirmed(ct)) > + if (!nf_ct_is_confirmed(ct)) { > ct->proto.udp.stream_ts = 2 * HZ + jiffies; > + ct->proto.udp.notified = false; > + } > > /* If we've seen traffic both ways, this is some kind of UDP > * stream. Set Assured. > */ > if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) { > unsigned long extra = timeouts[UDP_CT_UNREPLIED]; > + bool stream = false; > > /* Still active after two seconds? Extend timeout. */ > - if (time_after(jiffies, ct->proto.udp.stream_ts)) > + if (time_after(jiffies, ct->proto.udp.stream_ts)) { > extra = timeouts[UDP_CT_REPLIED]; > + stream = true; > + } > > nf_ct_refresh_acct(ct, ctinfo, skb, extra); > > @@ -116,9 +121,16 @@ int nf_conntrack_udp_packet(struct nf_conn *ct, > if (unlikely((ct->status & IPS_NAT_CLASH))) > return NF_ACCEPT; > > + if (stream) { > + stream = !ct->proto.udp.notified; > + ct->proto.udp.notified = true; > + } > + > /* Also, more likely to be important, and not a probe */ > if (!test_and_set_bit(IPS_ASSURED_BIT, &ct->status)) > nf_conntrack_event_cache(IPCT_ASSURED, ct); > + else if (stream) > + nf_conntrack_event_cache(IPCT_UDPSTREAM, ct); > } else { > nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[UDP_CT_UNREPLIED]); > } > -- > 2.33.0.1079.g6e70778dc9-goog >
On Fri, Oct 15, 2021 at 2:39 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Fri, Oct 15, 2021 at 02:09:34AM -0700, Maciej Żenczykowski wrote: > > From: Maciej Żenczykowski <maze@google.com> > > > > Without this it's hard to offload udp due to lack of a conntrack event > > at the appropriate time (ie. once udp stream is established and stream > > timeout is in effect). > > > > Without this udp conntrack events 'update/assured/timeout=30' > > either need to be ignored, or polling loop needs to be <30 second > > instead of <120 second. > > > > With this change: > > [NEW] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 [UNREPLIED] src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 > > [UPDATE] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 > > [UPDATE] udp 17 30 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] > > [UPDATE] udp 17 120 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] > > [DESTROY] udp 17 src=10.246.11.13 dst=216.239.35.0 sport=37282 dport=123 src=216.239.35.0 dst=10.246.11.13 sport=123 dport=37282 [ASSURED] > > (the 3rd update/assured/120 event is new) > > Hm, I still don't understand why do you need this extra 3rd > update/assured event event. Could you explain your usecase? Currently we populate a flow offload array on the assured event, and thus the flow in both directions starts bypassing the kernel. Hence conntrack timeout is no longer automatically refreshed - and there is no opportunity for the timeout to get bumped to the stream timeout of 120s - it stays at 30s. We periodically (every just over 60-ish seconds) check whether packets on a flow have been offloaded, and if so refresh the conntrack timeout. This isn't cheap and we don't want to do it even more often. However this 60s cycle > 30s non-stream udp timeout, so the kernel conntrack entry expires (and we must thus clear out the flow from the offload). This results in a broken udp stream - but only on newer kernels. Older kernels don't have this '2s' wait feature (which makes a lot of sense btw.) but as a result of this the conntrack assured event happens at the right time - when the timeout hits 120s (or 180s on even older kernels). By generating another assured event when the udp stream is 'confirmed' and the timeout is boosted from 30s to 120s we have an opportunity to ignore the first one (with timeout 30) and only populate the offload on the second one (with timeout 120). I'm not sure if I'm doing a good job of describing this. Ask again if it's not clear and I'll try again. > > Thanks. > > > Cc: Florian Westphal <fw@strlen.de> > > Cc: Pablo Neira Ayuso <pablo@netfilter.org> > > Fixes: d535c8a69c19 'netfilter: conntrack: udp: only extend timeout to stream mode after 2s' > > Signed-off-by: Maciej Żenczykowski <maze@google.com> > > --- > > include/net/netfilter/nf_conntrack.h | 1 + > > .../uapi/linux/netfilter/nf_conntrack_common.h | 1 + > > net/netfilter/nf_conntrack_proto_udp.c | 16 ++++++++++++++-- > > 3 files changed, 16 insertions(+), 2 deletions(-) > > > > diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h > > index cc663c68ddc4..12029d616cfa 100644 > > --- a/include/net/netfilter/nf_conntrack.h > > +++ b/include/net/netfilter/nf_conntrack.h > > @@ -26,6 +26,7 @@ > > > > struct nf_ct_udp { > > unsigned long stream_ts; > > + bool notified; > > }; > > > > /* per conntrack: protocol private data */ > > diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h > > index 4b3395082d15..a8e91b5821fa 100644 > > --- a/include/uapi/linux/netfilter/nf_conntrack_common.h > > +++ b/include/uapi/linux/netfilter/nf_conntrack_common.h > > @@ -144,6 +144,7 @@ enum ip_conntrack_events { > > IPCT_SECMARK, /* new security mark has been set */ > > IPCT_LABEL, /* new connlabel has been set */ > > IPCT_SYNPROXY, /* synproxy has been set */ > > + IPCT_UDPSTREAM, /* udp stream has been set */ > > #ifdef __KERNEL__ > > __IPCT_MAX > > #endif > > diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c > > index 68911fcaa0f1..f0d9869aa30f 100644 > > --- a/net/netfilter/nf_conntrack_proto_udp.c > > +++ b/net/netfilter/nf_conntrack_proto_udp.c > > @@ -97,18 +97,23 @@ int nf_conntrack_udp_packet(struct nf_conn *ct, > > if (!timeouts) > > timeouts = udp_get_timeouts(nf_ct_net(ct)); > > > > - if (!nf_ct_is_confirmed(ct)) > > + if (!nf_ct_is_confirmed(ct)) { > > ct->proto.udp.stream_ts = 2 * HZ + jiffies; > > + ct->proto.udp.notified = false; > > + } > > > > /* If we've seen traffic both ways, this is some kind of UDP > > * stream. Set Assured. > > */ > > if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) { > > unsigned long extra = timeouts[UDP_CT_UNREPLIED]; > > + bool stream = false; > > > > /* Still active after two seconds? Extend timeout. */ > > - if (time_after(jiffies, ct->proto.udp.stream_ts)) > > + if (time_after(jiffies, ct->proto.udp.stream_ts)) { > > extra = timeouts[UDP_CT_REPLIED]; > > + stream = true; > > + } > > > > nf_ct_refresh_acct(ct, ctinfo, skb, extra); > > > > @@ -116,9 +121,16 @@ int nf_conntrack_udp_packet(struct nf_conn *ct, > > if (unlikely((ct->status & IPS_NAT_CLASH))) > > return NF_ACCEPT; > > > > + if (stream) { > > + stream = !ct->proto.udp.notified; > > + ct->proto.udp.notified = true; > > + } > > + > > /* Also, more likely to be important, and not a probe */ > > if (!test_and_set_bit(IPS_ASSURED_BIT, &ct->status)) > > nf_conntrack_event_cache(IPCT_ASSURED, ct); > > + else if (stream) > > + nf_conntrack_event_cache(IPCT_UDPSTREAM, ct); > > } else { > > nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[UDP_CT_UNREPLIED]); > > } > > -- > > 2.33.0.1079.g6e70778dc9-goog > >Maciej Żenczykowski, Kernel Networking Developer @ Google
Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > > Hm, I still don't understand why do you need this extra 3rd > > update/assured event event. Could you explain your usecase? > > Currently we populate a flow offload array on the assured event, and > thus the flow in both directions starts bypassing the kernel. > Hence conntrack timeout is no longer automatically refreshed - and > there is no opportunity for the timeout to get bumped to the stream > timeout of 120s - it stays at 30s. > We periodically (every just over 60-ish seconds) check whether packets > on a flow have been offloaded, and if so refresh the conntrack > timeout. This isn't cheap and we don't want to do it even more often. > However this 60s cycle > 30s non-stream udp timeout, so the kernel > conntrack entry expires (and we must thus clear out the flow from the > offload). This results in a broken udp stream - but only on newer > kernels. Older kernels don't have this '2s' wait feature (which makes > a lot of sense btw.) but as a result of this the conntrack assured > event happens at the right time - when the timeout hits 120s (or 180s > on even older kernels). > > By generating another assured event when the udp stream is 'confirmed' > and the timeout is boosted from 30s to 120s we have an opportunity to > ignore the first one (with timeout 30) and only populate the offload > on the second one (with timeout 120). > > I'm not sure if I'm doing a good job of describing this. Ask again if > it's not clear and I'll try again. Thanks for explaining, no objections to this from my side. Do you think it makes sense to just delay setting the ASSURED bit until after the 2s period?
On Fri, Oct 15, 2021 at 2:57 AM Florian Westphal <fw@strlen.de> wrote: > > Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > > > Hm, I still don't understand why do you need this extra 3rd > > > update/assured event event. Could you explain your usecase? > > > > Currently we populate a flow offload array on the assured event, and > > thus the flow in both directions starts bypassing the kernel. > > Hence conntrack timeout is no longer automatically refreshed - and > > there is no opportunity for the timeout to get bumped to the stream > > timeout of 120s - it stays at 30s. > > We periodically (every just over 60-ish seconds) check whether packets > > on a flow have been offloaded, and if so refresh the conntrack > > timeout. This isn't cheap and we don't want to do it even more often. > > However this 60s cycle > 30s non-stream udp timeout, so the kernel > > conntrack entry expires (and we must thus clear out the flow from the > > offload). This results in a broken udp stream - but only on newer > > kernels. Older kernels don't have this '2s' wait feature (which makes > > a lot of sense btw.) but as a result of this the conntrack assured > > event happens at the right time - when the timeout hits 120s (or 180s > > on even older kernels). > > > > By generating another assured event when the udp stream is 'confirmed' > > and the timeout is boosted from 30s to 120s we have an opportunity to > > ignore the first one (with timeout 30) and only populate the offload > > on the second one (with timeout 120). > > > > I'm not sure if I'm doing a good job of describing this. Ask again if > > it's not clear and I'll try again. > > Thanks for explaining, no objections to this from my side. > > Do you think it makes sense to just delay setting the ASSURED bit > until after the 2s period? That would work for this particular use case.... but I don't know if it's a good idea. I did of course think of it, but the commit message seemed to imply there's a good reason to set the assured bit earlier rather than later... A udp flow becoming bidirectional seems like an important event to notify about... Afterall, the UDP flow might become a stream 29 seconds after it becomes bidirectional... That seems like a pretty long time (and it's user configurable to be even longer) to delay the notification. I imagine the pair of you know best whether 2 events or delay assured event until stream timeout is applied makes more sense... - Maciej
Maciej Żenczykowski <zenczykowski@gmail.com> wrote: > On Fri, Oct 15, 2021 at 2:57 AM Florian Westphal <fw@strlen.de> wrote: > > Do you think it makes sense to just delay setting the ASSURED bit > > until after the 2s period? > > That would work for this particular use case.... but I don't know if > it's a good idea. > I did of course think of it, but the commit message seemed to imply > there's a good reason to set the assured bit earlier rather than > later... > > A udp flow becoming bidirectional seems like an important event to > notify about... > Afterall, the UDP flow might become a stream 29 seconds after it > becomes bidirectional... Oh right, never mind then. Acked-by: Florian Westphal <fw@strlen.de>
On Fri, Oct 15, 2021 at 03:15:07AM -0700, Maciej Żenczykowski wrote: > On Fri, Oct 15, 2021 at 2:57 AM Florian Westphal <fw@strlen.de> wrote: > > Maciej Żenczykowski <zenczykowski@gmail.com> wrote: [...] > A udp flow becoming bidirectional seems like an important event to > notify about... > Afterall, the UDP flow might become a stream 29 seconds after it > becomes bidirectional... > That seems like a pretty long time (and it's user configurable to be > even longer) to delay the notification. > > I imagine the pair of you know best whether 2 events or delay assured > event until stream timeout is applied makes more sense... This 2 events looks awkward to me, currently the model we have to report events is: - status bits are updated - flow has changed protocol state (TCP). but in this case, this is reporting a timer update. Timeout updates are not reported on events, since this would trigger too many events one per packet. What's the concern with delaying the IPS_ASSURED bit? By setting a lower timeout (30 second) my understanding is that this flow is less important to those that are in the stream state (120s), so these should also be candidate to be removed by early_drop. IIRC, the idea behind the stream concept is to reduce lifetime of shortlived UDP flows to release slots from the conntrack table earlier.
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index cc663c68ddc4..12029d616cfa 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -26,6 +26,7 @@ struct nf_ct_udp { unsigned long stream_ts; + bool notified; }; /* per conntrack: protocol private data */ diff --git a/include/uapi/linux/netfilter/nf_conntrack_common.h b/include/uapi/linux/netfilter/nf_conntrack_common.h index 4b3395082d15..a8e91b5821fa 100644 --- a/include/uapi/linux/netfilter/nf_conntrack_common.h +++ b/include/uapi/linux/netfilter/nf_conntrack_common.h @@ -144,6 +144,7 @@ enum ip_conntrack_events { IPCT_SECMARK, /* new security mark has been set */ IPCT_LABEL, /* new connlabel has been set */ IPCT_SYNPROXY, /* synproxy has been set */ + IPCT_UDPSTREAM, /* udp stream has been set */ #ifdef __KERNEL__ __IPCT_MAX #endif diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c index 68911fcaa0f1..f0d9869aa30f 100644 --- a/net/netfilter/nf_conntrack_proto_udp.c +++ b/net/netfilter/nf_conntrack_proto_udp.c @@ -97,18 +97,23 @@ int nf_conntrack_udp_packet(struct nf_conn *ct, if (!timeouts) timeouts = udp_get_timeouts(nf_ct_net(ct)); - if (!nf_ct_is_confirmed(ct)) + if (!nf_ct_is_confirmed(ct)) { ct->proto.udp.stream_ts = 2 * HZ + jiffies; + ct->proto.udp.notified = false; + } /* If we've seen traffic both ways, this is some kind of UDP * stream. Set Assured. */ if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) { unsigned long extra = timeouts[UDP_CT_UNREPLIED]; + bool stream = false; /* Still active after two seconds? Extend timeout. */ - if (time_after(jiffies, ct->proto.udp.stream_ts)) + if (time_after(jiffies, ct->proto.udp.stream_ts)) { extra = timeouts[UDP_CT_REPLIED]; + stream = true; + } nf_ct_refresh_acct(ct, ctinfo, skb, extra); @@ -116,9 +121,16 @@ int nf_conntrack_udp_packet(struct nf_conn *ct, if (unlikely((ct->status & IPS_NAT_CLASH))) return NF_ACCEPT; + if (stream) { + stream = !ct->proto.udp.notified; + ct->proto.udp.notified = true; + } + /* Also, more likely to be important, and not a probe */ if (!test_and_set_bit(IPS_ASSURED_BIT, &ct->status)) nf_conntrack_event_cache(IPCT_ASSURED, ct); + else if (stream) + nf_conntrack_event_cache(IPCT_UDPSTREAM, ct); } else { nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[UDP_CT_UNREPLIED]); }