From patchwork Wed Sep 7 15:41:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 12969145 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA1C7C54EE9 for ; Wed, 7 Sep 2022 15:41:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229672AbiIGPlh (ORCPT ); Wed, 7 Sep 2022 11:41:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38890 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229536AbiIGPlf (ORCPT ); Wed, 7 Sep 2022 11:41:35 -0400 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A3F249E0FF; Wed, 7 Sep 2022 08:41:33 -0700 (PDT) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1oVxBB-00065Z-2v; Wed, 07 Sep 2022 17:41:29 +0200 From: Florian Westphal To: Cc: Eric Dumazet , "David S. Miller" , Paolo Abeni , Jakub Kicinski , , Florian Westphal Subject: [PATCH net-next 1/8] netfilter: conntrack: prepare tcp_in_window for ternary return value Date: Wed, 7 Sep 2022 17:41:03 +0200 Message-Id: <20220907154110.8898-2-fw@strlen.de> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220907154110.8898-1-fw@strlen.de> References: <20220907154110.8898-1-fw@strlen.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org tcp_in_window returns true if the packet is in window and false if it is not. If its outside of window, packet will be treated as INVALID. There are corner cases where the packet should still be tracked, because rulesets may drop or log such packets, even though they can occur during normal operation, such as overly delayed acks. In extreme cases, connection may hang forever because conntrack state differs from real state. There is no retransmission for ACKs. In case of ACK loss after conntrack processing, its possible that a connection can be stuck because the actual retransmits are considered stale ("SEQ is under the lower bound (already ACKed data retransmitted)". The problem is made worse by carrier-grade-nat which can also result in stale packets from old connections to get treated as 'recent' packets in conntrack (it doesn't support tcp timestamps at this time). Prepare tcp_in_window() to return an enum that tells the desired action (in-window/accept, bogus/drop). A third action (accept the packet as in-window, but do not change state) is added in a followup patch. Signed-off-by: Florian Westphal --- net/netfilter/nf_conntrack_proto_tcp.c | 136 ++++++++++++++++--------- 1 file changed, 87 insertions(+), 49 deletions(-) diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index a634c72b1ffc..1731b82dcc97 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -47,6 +47,11 @@ static const char *const tcp_conntrack_names[] = { "SYN_SENT2", }; +enum nf_ct_tcp_action { + NFCT_TCP_INVALID, + NFCT_TCP_ACCEPT, +}; + #define SECS * HZ #define MINS * 60 SECS #define HOURS * 60 MINS @@ -472,13 +477,37 @@ static void tcp_init_sender(struct ip_ct_tcp_state *sender, } } -static bool tcp_in_window(struct nf_conn *ct, - enum ip_conntrack_dir dir, - unsigned int index, - const struct sk_buff *skb, - unsigned int dataoff, - const struct tcphdr *tcph, - const struct nf_hook_state *hook_state) +__printf(6, 7) +static enum nf_ct_tcp_action nf_tcp_log_invalid(const struct sk_buff *skb, + const struct nf_conn *ct, + const struct nf_hook_state *state, + const struct ip_ct_tcp_state *sender, + enum nf_ct_tcp_action ret, + const char *fmt, ...) +{ + const struct nf_tcp_net *tn = nf_tcp_pernet(nf_ct_net(ct)); + struct va_format vaf; + va_list args; + bool be_liberal; + + be_liberal = sender->flags & IP_CT_TCP_FLAG_BE_LIBERAL || tn->tcp_be_liberal; + if (be_liberal) + return NFCT_TCP_ACCEPT; + + va_start(args, fmt); + vaf.fmt = fmt; + vaf.va = &args; + nf_ct_l4proto_log_invalid(skb, ct, state, "%pV", &vaf); + va_end(args); + + return ret; +} + +static enum nf_ct_tcp_action +tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir, + unsigned int index, const struct sk_buff *skb, + unsigned int dataoff, const struct tcphdr *tcph, + const struct nf_hook_state *hook_state) { struct ip_ct_tcp *state = &ct->proto.tcp; struct net *net = nf_ct_net(ct); @@ -486,9 +515,9 @@ static bool tcp_in_window(struct nf_conn *ct, struct ip_ct_tcp_state *sender = &state->seen[dir]; struct ip_ct_tcp_state *receiver = &state->seen[!dir]; __u32 seq, ack, sack, end, win, swin; - u16 win_raw; + bool in_recv_win, seq_ok; s32 receiver_offset; - bool res, in_recv_win; + u16 win_raw; /* * Get the required data from the packet. @@ -517,7 +546,7 @@ static bool tcp_in_window(struct nf_conn *ct, end, win); if (!tcph->ack) /* Simultaneous open */ - return true; + return NFCT_TCP_ACCEPT; } else { /* * We are in the middle of a connection, @@ -560,7 +589,7 @@ static bool tcp_in_window(struct nf_conn *ct, end, win); if (dir == IP_CT_DIR_REPLY && !tcph->ack) - return true; + return NFCT_TCP_ACCEPT; } if (!(tcph->ack)) { @@ -584,13 +613,52 @@ static bool tcp_in_window(struct nf_conn *ct, */ seq = end = sender->td_end; + seq_ok = before(seq, sender->td_maxend + 1); + if (!seq_ok) { + u32 overshot = end - sender->td_maxend + 1; + bool ack_ok; + + ack_ok = after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1); + in_recv_win = receiver->td_maxwin && + after(end, sender->td_end - receiver->td_maxwin - 1); + + if (in_recv_win && + ack_ok && + overshot <= receiver->td_maxwin && + before(sack, receiver->td_end + 1)) { + /* Work around TCPs that send more bytes than allowed by + * the receive window. + * + * If the (marked as invalid) packet is allowed to pass by + * the ruleset and the peer acks this data, then its possible + * all future packets will trigger 'ACK is over upper bound' check. + * + * Thus if only the sequence check fails then do update td_end so + * possible ACK for this data can update internal state. + */ + sender->td_end = end; + sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED; + + nf_ct_l4proto_log_invalid(skb, ct, hook_state, + "%u bytes more than expected", overshot); + return NFCT_TCP_ACCEPT; + } + + return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_INVALID, + "SEQ is over upper bound %u (over the window of the receiver)", + sender->td_maxend + 1); + } + + if (!before(sack, receiver->td_end + 1)) + return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_INVALID, + "ACK is over upper bound %u (ACKed data not seen yet)", + receiver->td_end + 1); + /* Is the ending sequence in the receive window (if available)? */ in_recv_win = !receiver->td_maxwin || after(end, sender->td_end - receiver->td_maxwin - 1); - if (before(seq, sender->td_maxend + 1) && - in_recv_win && - before(sack, receiver->td_end + 1) && + if (in_recv_win && after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1)) { /* * Take into account window scaling (RFC 1323). @@ -648,44 +716,12 @@ static bool tcp_in_window(struct nf_conn *ct, state->retrans = 0; } } - res = true; } else { - res = false; if (sender->flags & IP_CT_TCP_FLAG_BE_LIBERAL || tn->tcp_be_liberal) - res = true; - if (!res) { - bool seq_ok = before(seq, sender->td_maxend + 1); - - if (!seq_ok) { - u32 overshot = end - sender->td_maxend + 1; - bool ack_ok; - - ack_ok = after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1); - - if (in_recv_win && - ack_ok && - overshot <= receiver->td_maxwin && - before(sack, receiver->td_end + 1)) { - /* Work around TCPs that send more bytes than allowed by - * the receive window. - * - * If the (marked as invalid) packet is allowed to pass by - * the ruleset and the peer acks this data, then its possible - * all future packets will trigger 'ACK is over upper bound' check. - * - * Thus if only the sequence check fails then do update td_end so - * possible ACK for this data can update internal state. - */ - sender->td_end = end; - sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED; - - nf_ct_l4proto_log_invalid(skb, ct, hook_state, - "%u bytes more than expected", overshot); - return res; - } - } + return NFCT_TCP_ACCEPT; + { nf_ct_l4proto_log_invalid(skb, ct, hook_state, "%s", before(seq, sender->td_maxend + 1) ? @@ -697,9 +733,11 @@ static bool tcp_in_window(struct nf_conn *ct, : "SEQ is under the lower bound (already ACKed data retransmitted)" : "SEQ is over the upper bound (over the window of the receiver)"); } + + return NFCT_TCP_INVALID; } - return res; + return NFCT_TCP_ACCEPT; } /* table of valid flag combinations - PUSH, ECE and CWR are always valid */ From patchwork Wed Sep 7 15:41:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 12969146 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2C6DC54EE9 for ; Wed, 7 Sep 2022 15:41:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229811AbiIGPlk (ORCPT ); Wed, 7 Sep 2022 11:41:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38966 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229637AbiIGPlh (ORCPT ); Wed, 7 Sep 2022 11:41:37 -0400 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9393CA74F9; Wed, 7 Sep 2022 08:41:36 -0700 (PDT) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1oVxBC-00065t-Fq; Wed, 07 Sep 2022 17:41:30 +0200 From: Florian Westphal To: Cc: Eric Dumazet , "David S. Miller" , Paolo Abeni , Jakub Kicinski , , Florian Westphal Subject: [PATCH net-next 2/8] netfilter: conntrack: ignore overly delayed tcp packets Date: Wed, 7 Sep 2022 17:41:04 +0200 Message-Id: <20220907154110.8898-3-fw@strlen.de> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220907154110.8898-1-fw@strlen.de> References: <20220907154110.8898-1-fw@strlen.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org If 'nf_conntrack_tcp_loose' is off (the default), tcp packets that are outside of the current window are marked as INVALID. nf/iptables rulesets often drop such packets via 'ct state invalid' or similar checks. For overly delayed acks, this can be a nuisance if such 'invalid' packets are also logged. Since they are not invalid in a strict sense, just ignore them, i.e. conntrack won't extend timeout or change state so that they do not match invalid state rules anymore. This also avoids unwantend connection stalls in case conntrack considers retransmission (of data that did not reach the peer) as too old. The else branch of the conditional becomes obsolete. Next patch will reformant the now always-true if condition. The existing workaround for data that exceeds the calculated receive window is adjusted to use the 'ignore' state so that these packets do not refresh the timeout or change state other than updating ->td_end. Signed-off-by: Florian Westphal --- net/netfilter/nf_conntrack_proto_tcp.c | 49 +++++++++++--------------- 1 file changed, 21 insertions(+), 28 deletions(-) diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index 1731b82dcc97..2d6925ef269f 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -48,6 +48,7 @@ static const char *const tcp_conntrack_names[] = { }; enum nf_ct_tcp_action { + NFCT_TCP_IGNORE, NFCT_TCP_INVALID, NFCT_TCP_ACCEPT, }; @@ -510,8 +511,6 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir, const struct nf_hook_state *hook_state) { struct ip_ct_tcp *state = &ct->proto.tcp; - struct net *net = nf_ct_net(ct); - struct nf_tcp_net *tn = nf_tcp_pernet(net); struct ip_ct_tcp_state *sender = &state->seen[dir]; struct ip_ct_tcp_state *receiver = &state->seen[!dir]; __u32 seq, ack, sack, end, win, swin; @@ -639,9 +638,8 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir, sender->td_end = end; sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED; - nf_ct_l4proto_log_invalid(skb, ct, hook_state, + return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_IGNORE, "%u bytes more than expected", overshot); - return NFCT_TCP_ACCEPT; } return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_INVALID, @@ -657,9 +655,15 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir, /* Is the ending sequence in the receive window (if available)? */ in_recv_win = !receiver->td_maxwin || after(end, sender->td_end - receiver->td_maxwin - 1); - - if (in_recv_win && - after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1)) { + if (!in_recv_win) + return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_IGNORE, + "SEQ is under lower bound %u (already ACKed data retransmitted)", + sender->td_end - receiver->td_maxwin - 1); + if (!after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1)) + return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_IGNORE, + "ignored ACK under lower bound %u (possible overly delayed)", + receiver->td_end - MAXACKWINDOW(sender) - 1); + if (1) { /* * Take into account window scaling (RFC 1323). */ @@ -716,25 +720,6 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir, state->retrans = 0; } } - } else { - if (sender->flags & IP_CT_TCP_FLAG_BE_LIBERAL || - tn->tcp_be_liberal) - return NFCT_TCP_ACCEPT; - - { - nf_ct_l4proto_log_invalid(skb, ct, hook_state, - "%s", - before(seq, sender->td_maxend + 1) ? - in_recv_win ? - before(sack, receiver->td_end + 1) ? - after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1) ? "BUG" - : "ACK is under the lower bound (possible overly delayed ACK)" - : "ACK is over the upper bound (ACKed data not seen yet)" - : "SEQ is under the lower bound (already ACKed data retransmitted)" - : "SEQ is over the upper bound (over the window of the receiver)"); - } - - return NFCT_TCP_INVALID; } return NFCT_TCP_ACCEPT; @@ -899,6 +884,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, struct nf_conntrack_tuple *tuple; enum tcp_conntrack new_state, old_state; unsigned int index, *timeouts; + enum nf_ct_tcp_action res; enum ip_conntrack_dir dir; const struct tcphdr *th; struct tcphdr _tcph; @@ -1164,10 +1150,17 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, break; } - if (!tcp_in_window(ct, dir, index, - skb, dataoff, th, state)) { + res = tcp_in_window(ct, dir, index, + skb, dataoff, th, state); + switch (res) { + case NFCT_TCP_IGNORE: + spin_unlock_bh(&ct->lock); + return NF_ACCEPT; + case NFCT_TCP_INVALID: spin_unlock_bh(&ct->lock); return -NF_ACCEPT; + case NFCT_TCP_ACCEPT: + break; } in_window: /* From now on we have got in-window packets */ From patchwork Wed Sep 7 15:41:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 12969147 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 71CBFC54EE9 for ; Wed, 7 Sep 2022 15:41:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229916AbiIGPlm (ORCPT ); Wed, 7 Sep 2022 11:41:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38976 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229721AbiIGPlh (ORCPT ); Wed, 7 Sep 2022 11:41:37 -0400 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E207B98A54; Wed, 7 Sep 2022 08:41:36 -0700 (PDT) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1oVxBD-00066D-QS; Wed, 07 Sep 2022 17:41:31 +0200 From: Florian Westphal To: Cc: Eric Dumazet , "David S. Miller" , Paolo Abeni , Jakub Kicinski , , Florian Westphal Subject: [PATCH net-next 3/8] netfilter: conntrack: remove unneeded indent level Date: Wed, 7 Sep 2022 17:41:05 +0200 Message-Id: <20220907154110.8898-4-fw@strlen.de> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220907154110.8898-1-fw@strlen.de> References: <20220907154110.8898-1-fw@strlen.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org After previous patch, the conditional branch is obsolete, reformat it. gcc generates same code as before this change. Signed-off-by: Florian Westphal --- net/netfilter/nf_conntrack_proto_tcp.c | 98 ++++++++++++-------------- 1 file changed, 45 insertions(+), 53 deletions(-) diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index 2d6925ef269f..0574290326d1 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -663,62 +663,54 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir, return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_IGNORE, "ignored ACK under lower bound %u (possible overly delayed)", receiver->td_end - MAXACKWINDOW(sender) - 1); - if (1) { - /* - * Take into account window scaling (RFC 1323). - */ - if (!tcph->syn) - win <<= sender->td_scale; - - /* - * Update sender data. - */ - swin = win + (sack - ack); - if (sender->td_maxwin < swin) - sender->td_maxwin = swin; - if (after(end, sender->td_end)) { - sender->td_end = end; - sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED; - } - if (tcph->ack) { - if (!(sender->flags & IP_CT_TCP_FLAG_MAXACK_SET)) { - sender->td_maxack = ack; - sender->flags |= IP_CT_TCP_FLAG_MAXACK_SET; - } else if (after(ack, sender->td_maxack)) - sender->td_maxack = ack; - } - /* - * Update receiver data. - */ - if (receiver->td_maxwin != 0 && after(end, sender->td_maxend)) - receiver->td_maxwin += end - sender->td_maxend; - if (after(sack + win, receiver->td_maxend - 1)) { - receiver->td_maxend = sack + win; - if (win == 0) - receiver->td_maxend++; + /* Take into account window scaling (RFC 1323). */ + if (!tcph->syn) + win <<= sender->td_scale; + + /* Update sender data. */ + swin = win + (sack - ack); + if (sender->td_maxwin < swin) + sender->td_maxwin = swin; + if (after(end, sender->td_end)) { + sender->td_end = end; + sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED; + } + if (tcph->ack) { + if (!(sender->flags & IP_CT_TCP_FLAG_MAXACK_SET)) { + sender->td_maxack = ack; + sender->flags |= IP_CT_TCP_FLAG_MAXACK_SET; + } else if (after(ack, sender->td_maxack)) { + sender->td_maxack = ack; } - if (ack == receiver->td_end) - receiver->flags &= ~IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED; + } - /* - * Check retransmissions. - */ - if (index == TCP_ACK_SET) { - if (state->last_dir == dir - && state->last_seq == seq - && state->last_ack == ack - && state->last_end == end - && state->last_win == win_raw) - state->retrans++; - else { - state->last_dir = dir; - state->last_seq = seq; - state->last_ack = ack; - state->last_end = end; - state->last_win = win_raw; - state->retrans = 0; - } + /* Update receiver data. */ + if (receiver->td_maxwin != 0 && after(end, sender->td_maxend)) + receiver->td_maxwin += end - sender->td_maxend; + if (after(sack + win, receiver->td_maxend - 1)) { + receiver->td_maxend = sack + win; + if (win == 0) + receiver->td_maxend++; + } + if (ack == receiver->td_end) + receiver->flags &= ~IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED; + + /* Check retransmissions. */ + if (index == TCP_ACK_SET) { + if (state->last_dir == dir && + state->last_seq == seq && + state->last_ack == ack && + state->last_end == end && + state->last_win == win_raw) { + state->retrans++; + } else { + state->last_dir = dir; + state->last_seq = seq; + state->last_ack = ack; + state->last_end = end; + state->last_win = win_raw; + state->retrans = 0; } } From patchwork Wed Sep 7 15:41:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 12969148 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3A3AC6FA89 for ; Wed, 7 Sep 2022 15:41:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229731AbiIGPlq (ORCPT ); Wed, 7 Sep 2022 11:41:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39084 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229890AbiIGPll (ORCPT ); Wed, 7 Sep 2022 11:41:41 -0400 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 88C759E0FF; Wed, 7 Sep 2022 08:41:40 -0700 (PDT) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1oVxBI-00066j-4a; Wed, 07 Sep 2022 17:41:36 +0200 From: Florian Westphal To: Cc: Eric Dumazet , "David S. Miller" , Paolo Abeni , Jakub Kicinski , , Florian Westphal Subject: [PATCH net-next 4/8] netfilter: conntrack: reduce timeout when receiving out-of-window fin or rst Date: Wed, 7 Sep 2022 17:41:06 +0200 Message-Id: <20220907154110.8898-5-fw@strlen.de> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220907154110.8898-1-fw@strlen.de> References: <20220907154110.8898-1-fw@strlen.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org In case the endpoints and conntrack go out-of-sync, i.e. there is disagreement wrt. validy of sequence/ack numbers between conntracks internal state and those of the endpoints, connections can hang for a long time (until ESTABLISHED timeout). This adds a check to detect a fin/fin exchange even if those are invalid. The timeout is then lowered to UNACKED (default 300s). Signed-off-by: Florian Westphal --- net/netfilter/nf_conntrack_proto_tcp.c | 58 ++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c index 0574290326d1..656631083177 100644 --- a/net/netfilter/nf_conntrack_proto_tcp.c +++ b/net/netfilter/nf_conntrack_proto_tcp.c @@ -717,6 +717,63 @@ tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir, return NFCT_TCP_ACCEPT; } +static void __cold nf_tcp_handle_invalid(struct nf_conn *ct, + enum ip_conntrack_dir dir, + int index, + const struct sk_buff *skb, + const struct nf_hook_state *hook_state) +{ + const unsigned int *timeouts; + const struct nf_tcp_net *tn; + unsigned int timeout; + u32 expires; + + if (!test_bit(IPS_ASSURED_BIT, &ct->status) || + test_bit(IPS_FIXED_TIMEOUT_BIT, &ct->status)) + return; + + /* We don't want to have connections hanging around in ESTABLISHED + * state for long time 'just because' conntrack deemed a FIN/RST + * out-of-window. + * + * Shrink the timeout just like when there is unacked data. + * This speeds up eviction of 'dead' connections where the + * connection and conntracks internal state are out of sync. + */ + switch (index) { + case TCP_RST_SET: + case TCP_FIN_SET: + break; + default: + return; + } + + if (ct->proto.tcp.last_dir != dir && + (ct->proto.tcp.last_index == TCP_FIN_SET || + ct->proto.tcp.last_index == TCP_RST_SET)) { + expires = nf_ct_expires(ct); + if (expires < 120 * HZ) + return; + + tn = nf_tcp_pernet(nf_ct_net(ct)); + timeouts = nf_ct_timeout_lookup(ct); + if (!timeouts) + timeouts = tn->timeouts; + + timeout = READ_ONCE(timeouts[TCP_CONNTRACK_UNACK]); + if (expires > timeout) { + nf_ct_l4proto_log_invalid(skb, ct, hook_state, + "packet (index %d, dir %d) response for index %d lower timeout to %u", + index, dir, ct->proto.tcp.last_index, timeout); + + WRITE_ONCE(ct->timeout, timeout + nfct_time_stamp); + } + } else { + ct->proto.tcp.last_index = index; + ct->proto.tcp.last_dir = dir; + } +} + /* table of valid flag combinations - PUSH, ECE and CWR are always valid */ static const u8 tcp_valid_flags[(TCPHDR_FIN|TCPHDR_SYN|TCPHDR_RST|TCPHDR_ACK| TCPHDR_URG) + 1] = @@ -1149,6 +1206,7 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct, spin_unlock_bh(&ct->lock); return NF_ACCEPT; case NFCT_TCP_INVALID: + nf_tcp_handle_invalid(ct, dir, index, skb, state); spin_unlock_bh(&ct->lock); return -NF_ACCEPT; case NFCT_TCP_ACCEPT: From patchwork Wed Sep 7 15:41:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 12969149 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B250FC6FA8A for ; Wed, 7 Sep 2022 15:41:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229890AbiIGPls (ORCPT ); Wed, 7 Sep 2022 11:41:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229892AbiIGPll (ORCPT ); Wed, 7 Sep 2022 11:41:41 -0400 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 931C5A5713; Wed, 7 Sep 2022 08:41:40 -0700 (PDT) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1oVxBI-000670-HY; Wed, 07 Sep 2022 17:41:36 +0200 From: Florian Westphal To: Cc: Eric Dumazet , "David S. Miller" , Paolo Abeni , Jakub Kicinski , , Florian Westphal Subject: [PATCH net-next 5/8] netfilter: remove NFPROTO_DECNET Date: Wed, 7 Sep 2022 17:41:07 +0200 Message-Id: <20220907154110.8898-6-fw@strlen.de> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220907154110.8898-1-fw@strlen.de> References: <20220907154110.8898-1-fw@strlen.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Decnet has been removed. so no need to reserve space in arrays for it. Signed-off-by: Florian Westphal --- include/uapi/linux/netfilter.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h index 53411ccc69db..5a79ccb76701 100644 --- a/include/uapi/linux/netfilter.h +++ b/include/uapi/linux/netfilter.h @@ -63,7 +63,9 @@ enum { NFPROTO_NETDEV = 5, NFPROTO_BRIDGE = 7, NFPROTO_IPV6 = 10, +#ifndef __KERNEL__ /* no longer supported by kernel */ NFPROTO_DECNET = 12, +#endif NFPROTO_NUMPROTO, }; From patchwork Wed Sep 7 15:41:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 12969151 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2BA1C54EE9 for ; Wed, 7 Sep 2022 15:41:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229919AbiIGPlu (ORCPT ); Wed, 7 Sep 2022 11:41:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39162 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229783AbiIGPlr (ORCPT ); Wed, 7 Sep 2022 11:41:47 -0400 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D5D2EB0881; Wed, 7 Sep 2022 08:41:45 -0700 (PDT) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1oVxBM-00067h-U9; Wed, 07 Sep 2022 17:41:40 +0200 From: Florian Westphal To: Cc: Eric Dumazet , "David S. Miller" , Paolo Abeni , Jakub Kicinski , , Wolfram Sang , Simon Horman , Florian Westphal Subject: [PATCH net-next 6/8] netfilter: move from strlcpy with unused retval to strscpy Date: Wed, 7 Sep 2022 17:41:08 +0200 Message-Id: <20220907154110.8898-7-fw@strlen.de> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220907154110.8898-1-fw@strlen.de> References: <20220907154110.8898-1-fw@strlen.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org From: Wolfram Sang Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang Reviewed-by: Simon Horman Signed-off-by: Florian Westphal --- net/netfilter/ipset/ip_set_core.c | 4 ++-- net/netfilter/ipvs/ip_vs_ctl.c | 8 ++++---- net/netfilter/nf_log.c | 4 ++-- net/netfilter/nf_tables_api.c | 2 +- net/netfilter/nft_osf.c | 2 +- net/netfilter/x_tables.c | 20 ++++++++++---------- net/netfilter/xt_RATEEST.c | 2 +- 7 files changed, 21 insertions(+), 21 deletions(-) diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index 6b31746f9be3..e7ba5b6dd2b7 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -353,7 +353,7 @@ ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment, c = kmalloc(sizeof(*c) + len + 1, GFP_ATOMIC); if (unlikely(!c)) return; - strlcpy(c->str, ext->comment, len + 1); + strscpy(c->str, ext->comment, len + 1); set->ext_size += sizeof(*c) + strlen(c->str) + 1; rcu_assign_pointer(comment->c, c); } @@ -1072,7 +1072,7 @@ static int ip_set_create(struct sk_buff *skb, const struct nfnl_info *info, if (!set) return -ENOMEM; spin_lock_init(&set->lock); - strlcpy(set->name, name, IPSET_MAXNAMELEN); + strscpy(set->name, name, IPSET_MAXNAMELEN); set->family = family; set->revision = revision; diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index 818b0b058b10..988222fff9f0 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -2611,7 +2611,7 @@ ip_vs_copy_service(struct ip_vs_service_entry *dst, struct ip_vs_service *src) dst->addr = src->addr.ip; dst->port = src->port; dst->fwmark = src->fwmark; - strlcpy(dst->sched_name, sched_name, sizeof(dst->sched_name)); + strscpy(dst->sched_name, sched_name, sizeof(dst->sched_name)); dst->flags = src->flags; dst->timeout = src->timeout / HZ; dst->netmask = src->netmask; @@ -2805,13 +2805,13 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len) mutex_lock(&ipvs->sync_mutex); if (ipvs->sync_state & IP_VS_STATE_MASTER) { d[0].state = IP_VS_STATE_MASTER; - strlcpy(d[0].mcast_ifn, ipvs->mcfg.mcast_ifn, + strscpy(d[0].mcast_ifn, ipvs->mcfg.mcast_ifn, sizeof(d[0].mcast_ifn)); d[0].syncid = ipvs->mcfg.syncid; } if (ipvs->sync_state & IP_VS_STATE_BACKUP) { d[1].state = IP_VS_STATE_BACKUP; - strlcpy(d[1].mcast_ifn, ipvs->bcfg.mcast_ifn, + strscpy(d[1].mcast_ifn, ipvs->bcfg.mcast_ifn, sizeof(d[1].mcast_ifn)); d[1].syncid = ipvs->bcfg.syncid; } @@ -3561,7 +3561,7 @@ static int ip_vs_genl_new_daemon(struct netns_ipvs *ipvs, struct nlattr **attrs) attrs[IPVS_DAEMON_ATTR_MCAST_IFN] && attrs[IPVS_DAEMON_ATTR_SYNC_ID])) return -EINVAL; - strlcpy(c.mcast_ifn, nla_data(attrs[IPVS_DAEMON_ATTR_MCAST_IFN]), + strscpy(c.mcast_ifn, nla_data(attrs[IPVS_DAEMON_ATTR_MCAST_IFN]), sizeof(c.mcast_ifn)); c.syncid = nla_get_u32(attrs[IPVS_DAEMON_ATTR_SYNC_ID]); diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c index edee7fa944c1..8a29290149bd 100644 --- a/net/netfilter/nf_log.c +++ b/net/netfilter/nf_log.c @@ -443,9 +443,9 @@ static int nf_log_proc_dostring(struct ctl_table *table, int write, mutex_lock(&nf_log_mutex); logger = nft_log_dereference(net->nf.nf_loggers[tindex]); if (!logger) - strlcpy(buf, "NONE", sizeof(buf)); + strscpy(buf, "NONE", sizeof(buf)); else - strlcpy(buf, logger->name, sizeof(buf)); + strscpy(buf, logger->name, sizeof(buf)); mutex_unlock(&nf_log_mutex); r = proc_dostring(&tmp, write, buffer, lenp, ppos); } diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 2ee50e23c9b7..12406006852d 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -742,7 +742,7 @@ __printf(2, 3) int nft_request_module(struct net *net, const char *fmt, return -ENOMEM; req->done = false; - strlcpy(req->module, module_name, MODULE_NAME_LEN); + strscpy(req->module, module_name, MODULE_NAME_LEN); list_add_tail(&req->list, &nft_net->module_list); return -EAGAIN; diff --git a/net/netfilter/nft_osf.c b/net/netfilter/nft_osf.c index 89342ccccdcc..adacf95b6e2b 100644 --- a/net/netfilter/nft_osf.c +++ b/net/netfilter/nft_osf.c @@ -51,7 +51,7 @@ static void nft_osf_eval(const struct nft_expr *expr, struct nft_regs *regs, snprintf(os_match, NFT_OSF_MAXGENRELEN, "%s:%s", data.genre, data.version); else - strlcpy(os_match, data.genre, NFT_OSF_MAXGENRELEN); + strscpy(os_match, data.genre, NFT_OSF_MAXGENRELEN); strncpy((char *)dest, os_match, NFT_OSF_MAXGENRELEN); } diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c index 54a489f16b17..470282cf3fae 100644 --- a/net/netfilter/x_tables.c +++ b/net/netfilter/x_tables.c @@ -766,7 +766,7 @@ void xt_compat_match_from_user(struct xt_entry_match *m, void **dstptr, msize += off; m->u.user.match_size = msize; - strlcpy(name, match->name, sizeof(name)); + strscpy(name, match->name, sizeof(name)); module_put(match->me); strncpy(m->u.user.name, name, sizeof(m->u.user.name)); @@ -1146,7 +1146,7 @@ void xt_compat_target_from_user(struct xt_entry_target *t, void **dstptr, tsize += off; t->u.user.target_size = tsize; - strlcpy(name, target->name, sizeof(name)); + strscpy(name, target->name, sizeof(name)); module_put(target->me); strncpy(t->u.user.name, name, sizeof(t->u.user.name)); @@ -1827,7 +1827,7 @@ int xt_proto_init(struct net *net, u_int8_t af) root_uid = make_kuid(net->user_ns, 0); root_gid = make_kgid(net->user_ns, 0); - strlcpy(buf, xt_prefix[af], sizeof(buf)); + strscpy(buf, xt_prefix[af], sizeof(buf)); strlcat(buf, FORMAT_TABLES, sizeof(buf)); proc = proc_create_net_data(buf, 0440, net->proc_net, &xt_table_seq_ops, sizeof(struct seq_net_private), @@ -1837,7 +1837,7 @@ int xt_proto_init(struct net *net, u_int8_t af) if (uid_valid(root_uid) && gid_valid(root_gid)) proc_set_user(proc, root_uid, root_gid); - strlcpy(buf, xt_prefix[af], sizeof(buf)); + strscpy(buf, xt_prefix[af], sizeof(buf)); strlcat(buf, FORMAT_MATCHES, sizeof(buf)); proc = proc_create_seq_private(buf, 0440, net->proc_net, &xt_match_seq_ops, sizeof(struct nf_mttg_trav), @@ -1847,7 +1847,7 @@ int xt_proto_init(struct net *net, u_int8_t af) if (uid_valid(root_uid) && gid_valid(root_gid)) proc_set_user(proc, root_uid, root_gid); - strlcpy(buf, xt_prefix[af], sizeof(buf)); + strscpy(buf, xt_prefix[af], sizeof(buf)); strlcat(buf, FORMAT_TARGETS, sizeof(buf)); proc = proc_create_seq_private(buf, 0440, net->proc_net, &xt_target_seq_ops, sizeof(struct nf_mttg_trav), @@ -1862,12 +1862,12 @@ int xt_proto_init(struct net *net, u_int8_t af) #ifdef CONFIG_PROC_FS out_remove_matches: - strlcpy(buf, xt_prefix[af], sizeof(buf)); + strscpy(buf, xt_prefix[af], sizeof(buf)); strlcat(buf, FORMAT_MATCHES, sizeof(buf)); remove_proc_entry(buf, net->proc_net); out_remove_tables: - strlcpy(buf, xt_prefix[af], sizeof(buf)); + strscpy(buf, xt_prefix[af], sizeof(buf)); strlcat(buf, FORMAT_TABLES, sizeof(buf)); remove_proc_entry(buf, net->proc_net); out: @@ -1881,15 +1881,15 @@ void xt_proto_fini(struct net *net, u_int8_t af) #ifdef CONFIG_PROC_FS char buf[XT_FUNCTION_MAXNAMELEN]; - strlcpy(buf, xt_prefix[af], sizeof(buf)); + strscpy(buf, xt_prefix[af], sizeof(buf)); strlcat(buf, FORMAT_TABLES, sizeof(buf)); remove_proc_entry(buf, net->proc_net); - strlcpy(buf, xt_prefix[af], sizeof(buf)); + strscpy(buf, xt_prefix[af], sizeof(buf)); strlcat(buf, FORMAT_TARGETS, sizeof(buf)); remove_proc_entry(buf, net->proc_net); - strlcpy(buf, xt_prefix[af], sizeof(buf)); + strscpy(buf, xt_prefix[af], sizeof(buf)); strlcat(buf, FORMAT_MATCHES, sizeof(buf)); remove_proc_entry(buf, net->proc_net); #endif /*CONFIG_PROC_FS*/ diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c index 8aec1b529364..80f6624e2355 100644 --- a/net/netfilter/xt_RATEEST.c +++ b/net/netfilter/xt_RATEEST.c @@ -144,7 +144,7 @@ static int xt_rateest_tg_checkentry(const struct xt_tgchk_param *par) goto err1; gnet_stats_basic_sync_init(&est->bstats); - strlcpy(est->name, info->name, sizeof(est->name)); + strscpy(est->name, info->name, sizeof(est->name)); spin_lock_init(&est->lock); est->refcnt = 1; est->params.interval = info->interval; From patchwork Wed Sep 7 15:41:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 12969150 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67551C6FA89 for ; Wed, 7 Sep 2022 15:41:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229939AbiIGPlt (ORCPT ); Wed, 7 Sep 2022 11:41:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39202 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229847AbiIGPls (ORCPT ); Wed, 7 Sep 2022 11:41:48 -0400 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72F2DB08A9; Wed, 7 Sep 2022 08:41:46 -0700 (PDT) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1oVxBO-00068C-8O; Wed, 07 Sep 2022 17:41:42 +0200 From: Florian Westphal To: Cc: Eric Dumazet , "David S. Miller" , Paolo Abeni , Jakub Kicinski , , Florian Westphal , Pablo Neira Ayuso Subject: [PATCH net-next 7/8] netfilter: nat: move repetitive nat port reserve loop to a helper Date: Wed, 7 Sep 2022 17:41:09 +0200 Message-Id: <20220907154110.8898-8-fw@strlen.de> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220907154110.8898-1-fw@strlen.de> References: <20220907154110.8898-1-fw@strlen.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Almost all nat helpers reserve an expecation port the same way: Try the port inidcated by the peer, then move to next port if that port is already in use. We can squash this into a helper. Suggested-by: Pablo Neira Ayuso Signed-off-by: Florian Westphal --- include/net/netfilter/nf_nat_helper.h | 1 + net/ipv4/netfilter/nf_nat_h323.c | 60 ++------------------------- net/netfilter/nf_nat_amanda.c | 14 +------ net/netfilter/nf_nat_ftp.c | 17 +------- net/netfilter/nf_nat_helper.c | 19 +++++++++ net/netfilter/nf_nat_irc.c | 16 +------ net/netfilter/nf_nat_sip.c | 14 +------ 7 files changed, 30 insertions(+), 111 deletions(-) diff --git a/include/net/netfilter/nf_nat_helper.h b/include/net/netfilter/nf_nat_helper.h index efae84646353..44c421b9be85 100644 --- a/include/net/netfilter/nf_nat_helper.h +++ b/include/net/netfilter/nf_nat_helper.h @@ -38,4 +38,5 @@ bool nf_nat_mangle_udp_packet(struct sk_buff *skb, struct nf_conn *ct, * to port ct->master->saved_proto. */ void nf_nat_follow_master(struct nf_conn *ct, struct nf_conntrack_expect *this); +u16 nf_nat_exp_find_port(struct nf_conntrack_expect *exp, u16 port); #endif diff --git a/net/ipv4/netfilter/nf_nat_h323.c b/net/ipv4/netfilter/nf_nat_h323.c index a334f0dcc2d0..faee20af4856 100644 --- a/net/ipv4/netfilter/nf_nat_h323.c +++ b/net/ipv4/netfilter/nf_nat_h323.c @@ -291,20 +291,7 @@ static int nat_t120(struct sk_buff *skb, struct nf_conn *ct, exp->expectfn = nf_nat_follow_master; exp->dir = !dir; - /* Try to get same port: if not, try to change it. */ - for (; nated_port != 0; nated_port++) { - int ret; - - exp->tuple.dst.u.tcp.port = htons(nated_port); - ret = nf_ct_expect_related(exp, 0); - if (ret == 0) - break; - else if (ret != -EBUSY) { - nated_port = 0; - break; - } - } - + nated_port = nf_nat_exp_find_port(exp, nated_port); if (nated_port == 0) { /* No port available */ net_notice_ratelimited("nf_nat_h323: out of TCP ports\n"); return 0; @@ -347,20 +334,7 @@ static int nat_h245(struct sk_buff *skb, struct nf_conn *ct, if (info->sig_port[dir] == port) nated_port = ntohs(info->sig_port[!dir]); - /* Try to get same port: if not, try to change it. */ - for (; nated_port != 0; nated_port++) { - int ret; - - exp->tuple.dst.u.tcp.port = htons(nated_port); - ret = nf_ct_expect_related(exp, 0); - if (ret == 0) - break; - else if (ret != -EBUSY) { - nated_port = 0; - break; - } - } - + nated_port = nf_nat_exp_find_port(exp, nated_port); if (nated_port == 0) { /* No port available */ net_notice_ratelimited("nf_nat_q931: out of TCP ports\n"); return 0; @@ -439,20 +413,7 @@ static int nat_q931(struct sk_buff *skb, struct nf_conn *ct, if (info->sig_port[dir] == port) nated_port = ntohs(info->sig_port[!dir]); - /* Try to get same port: if not, try to change it. */ - for (; nated_port != 0; nated_port++) { - int ret; - - exp->tuple.dst.u.tcp.port = htons(nated_port); - ret = nf_ct_expect_related(exp, 0); - if (ret == 0) - break; - else if (ret != -EBUSY) { - nated_port = 0; - break; - } - } - + nated_port = nf_nat_exp_find_port(exp, nated_port); if (nated_port == 0) { /* No port available */ net_notice_ratelimited("nf_nat_ras: out of TCP ports\n"); return 0; @@ -532,20 +493,7 @@ static int nat_callforwarding(struct sk_buff *skb, struct nf_conn *ct, exp->expectfn = ip_nat_callforwarding_expect; exp->dir = !dir; - /* Try to get same port: if not, try to change it. */ - for (nated_port = ntohs(port); nated_port != 0; nated_port++) { - int ret; - - exp->tuple.dst.u.tcp.port = htons(nated_port); - ret = nf_ct_expect_related(exp, 0); - if (ret == 0) - break; - else if (ret != -EBUSY) { - nated_port = 0; - break; - } - } - + nated_port = nf_nat_exp_find_port(exp, ntohs(port)); if (nated_port == 0) { /* No port available */ net_notice_ratelimited("nf_nat_q931: out of TCP ports\n"); return 0; diff --git a/net/netfilter/nf_nat_amanda.c b/net/netfilter/nf_nat_amanda.c index 3bc7e0854efe..98deef6cde69 100644 --- a/net/netfilter/nf_nat_amanda.c +++ b/net/netfilter/nf_nat_amanda.c @@ -44,19 +44,7 @@ static unsigned int help(struct sk_buff *skb, exp->expectfn = nf_nat_follow_master; /* Try to get same port: if not, try to change it. */ - for (port = ntohs(exp->saved_proto.tcp.port); port != 0; port++) { - int res; - - exp->tuple.dst.u.tcp.port = htons(port); - res = nf_ct_expect_related(exp, 0); - if (res == 0) - break; - else if (res != -EBUSY) { - port = 0; - break; - } - } - + port = nf_nat_exp_find_port(exp, ntohs(exp->saved_proto.tcp.port)); if (port == 0) { nf_ct_helper_log(skb, exp->master, "all ports in use"); return NF_DROP; diff --git a/net/netfilter/nf_nat_ftp.c b/net/netfilter/nf_nat_ftp.c index aace6768a64e..c92a436d9c48 100644 --- a/net/netfilter/nf_nat_ftp.c +++ b/net/netfilter/nf_nat_ftp.c @@ -86,22 +86,9 @@ static unsigned int nf_nat_ftp(struct sk_buff *skb, * this one. */ exp->expectfn = nf_nat_follow_master; - /* Try to get same port: if not, try to change it. */ - for (port = ntohs(exp->saved_proto.tcp.port); port != 0; port++) { - int ret; - - exp->tuple.dst.u.tcp.port = htons(port); - ret = nf_ct_expect_related(exp, 0); - if (ret == 0) - break; - else if (ret != -EBUSY) { - port = 0; - break; - } - } - + port = nf_nat_exp_find_port(exp, ntohs(exp->saved_proto.tcp.port)); if (port == 0) { - nf_ct_helper_log(skb, ct, "all ports in use"); + nf_ct_helper_log(skb, exp->master, "all ports in use"); return NF_DROP; } diff --git a/net/netfilter/nf_nat_helper.c b/net/netfilter/nf_nat_helper.c index a263505455fc..067d6d6f6b7d 100644 --- a/net/netfilter/nf_nat_helper.c +++ b/net/netfilter/nf_nat_helper.c @@ -198,3 +198,22 @@ void nf_nat_follow_master(struct nf_conn *ct, nf_nat_setup_info(ct, &range, NF_NAT_MANIP_DST); } EXPORT_SYMBOL(nf_nat_follow_master); + +u16 nf_nat_exp_find_port(struct nf_conntrack_expect *exp, u16 port) +{ + /* Try to get same port: if not, try to change it. */ + for (; port != 0; port++) { + int res; + + exp->tuple.dst.u.tcp.port = htons(port); + res = nf_ct_expect_related(exp, 0); + if (res == 0) + return port; + + if (res != -EBUSY) + break; + } + + return 0; +} +EXPORT_SYMBOL_GPL(nf_nat_exp_find_port); diff --git a/net/netfilter/nf_nat_irc.c b/net/netfilter/nf_nat_irc.c index c691ab8d234c..19c4fcc60c50 100644 --- a/net/netfilter/nf_nat_irc.c +++ b/net/netfilter/nf_nat_irc.c @@ -48,20 +48,8 @@ static unsigned int help(struct sk_buff *skb, exp->dir = IP_CT_DIR_REPLY; exp->expectfn = nf_nat_follow_master; - /* Try to get same port: if not, try to change it. */ - for (port = ntohs(exp->saved_proto.tcp.port); port != 0; port++) { - int ret; - - exp->tuple.dst.u.tcp.port = htons(port); - ret = nf_ct_expect_related(exp, 0); - if (ret == 0) - break; - else if (ret != -EBUSY) { - port = 0; - break; - } - } - + port = nf_nat_exp_find_port(exp, + ntohs(exp->saved_proto.tcp.port)); if (port == 0) { nf_ct_helper_log(skb, ct, "all ports in use"); return NF_DROP; diff --git a/net/netfilter/nf_nat_sip.c b/net/netfilter/nf_nat_sip.c index f0a735e86851..cf4aeb299bde 100644 --- a/net/netfilter/nf_nat_sip.c +++ b/net/netfilter/nf_nat_sip.c @@ -410,19 +410,7 @@ static unsigned int nf_nat_sip_expect(struct sk_buff *skb, unsigned int protoff, exp->dir = !dir; exp->expectfn = nf_nat_sip_expected; - for (; port != 0; port++) { - int ret; - - exp->tuple.dst.u.udp.port = htons(port); - ret = nf_ct_expect_related(exp, NF_CT_EXP_F_SKIP_MASTER); - if (ret == 0) - break; - else if (ret != -EBUSY) { - port = 0; - break; - } - } - + port = nf_nat_exp_find_port(exp, port); if (port == 0) { nf_ct_helper_log(skb, ct, "all ports in use for SIP"); return NF_DROP; From patchwork Wed Sep 7 15:41:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Florian Westphal X-Patchwork-Id: 12969152 X-Patchwork-Delegate: kuba@kernel.org Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C8EEDC38145 for ; Wed, 7 Sep 2022 15:41:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229978AbiIGPlz (ORCPT ); Wed, 7 Sep 2022 11:41:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39490 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229760AbiIGPlw (ORCPT ); Wed, 7 Sep 2022 11:41:52 -0400 Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [IPv6:2a0a:51c0:0:12e:520::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EC64D9E0FF; Wed, 7 Sep 2022 08:41:50 -0700 (PDT) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1oVxBS-00068v-LE; Wed, 07 Sep 2022 17:41:46 +0200 From: Florian Westphal To: Cc: Eric Dumazet , "David S. Miller" , Paolo Abeni , Jakub Kicinski , , Florian Westphal Subject: [PATCH net-next 8/8] netfilter: nat: avoid long-running port range loop Date: Wed, 7 Sep 2022 17:41:10 +0200 Message-Id: <20220907154110.8898-9-fw@strlen.de> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220907154110.8898-1-fw@strlen.de> References: <20220907154110.8898-1-fw@strlen.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org X-Patchwork-Delegate: kuba@kernel.org Looping a large port range takes too long. Instead select a random offset within [ntohs(exp->saved_proto.tcp.port), 65535] and try 128 ports. This is a rehash of an erlier patch to do the same, but generalized to handle other helpers as well. Link: https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210920204439.13179-2-Cole.Dishington@alliedtelesis.co.nz/ Signed-off-by: Florian Westphal --- net/netfilter/nf_nat_helper.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nf_nat_helper.c b/net/netfilter/nf_nat_helper.c index 067d6d6f6b7d..a95a25196943 100644 --- a/net/netfilter/nf_nat_helper.c +++ b/net/netfilter/nf_nat_helper.c @@ -201,8 +201,18 @@ EXPORT_SYMBOL(nf_nat_follow_master); u16 nf_nat_exp_find_port(struct nf_conntrack_expect *exp, u16 port) { + static const unsigned int max_attempts = 128; + int range, attempts_left; + u16 min = port; + + range = USHRT_MAX - port; + attempts_left = range; + + if (attempts_left > max_attempts) + attempts_left = max_attempts; + /* Try to get same port: if not, try to change it. */ - for (; port != 0; port++) { + for (;;) { int res; exp->tuple.dst.u.tcp.port = htons(port); @@ -210,8 +220,10 @@ u16 nf_nat_exp_find_port(struct nf_conntrack_expect *exp, u16 port) if (res == 0) return port; - if (res != -EBUSY) + if (res != -EBUSY || (--attempts_left < 0)) break; + + port = min + prandom_u32_max(range); } return 0;