diff mbox series

[net-next,1/8] netfilter: conntrack: prepare tcp_in_window for ternary return value

Message ID 20220907154110.8898-2-fw@strlen.de (mailing list archive)
State Accepted
Commit d9a6f0d0df1899ff9086a57abc600e414f4b8cdd
Delegated to: Netdev Maintainers
Headers show
Series [net-next,1/8] netfilter: conntrack: prepare tcp_in_window for ternary return value | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Pull request is its own cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 3 maintainers not CCed: pablo@netfilter.org coreteam@netfilter.org kadlec@netfilter.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Florian Westphal Sept. 7, 2022, 3:41 p.m. UTC
tcp_in_window returns true if the packet is in window and false if it is
not.

If its outside of window, packet will be treated as INVALID.

There are corner cases where the packet should still be tracked, because
rulesets may drop or log such packets, even though they can occur during
normal operation, such as overly delayed acks.

In extreme cases, connection may hang forever because conntrack state
differs from real state.

There is no retransmission for ACKs.

In case of ACK loss after conntrack processing, its possible that a
connection can be stuck because the actual retransmits are considered
stale ("SEQ is under the lower bound (already ACKed data
retransmitted)".

The problem is made worse by carrier-grade-nat which can also result
in stale packets from old connections to get treated as 'recent' packets
in conntrack (it doesn't support tcp timestamps at this time).

Prepare tcp_in_window() to return an enum that tells the desired
action (in-window/accept, bogus/drop).

A third action (accept the packet as in-window, but do not change
state) is added in a followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_proto_tcp.c | 136 ++++++++++++++++---------
 1 file changed, 87 insertions(+), 49 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org Sept. 9, 2022, 7:40 a.m. UTC | #1
Hello:

This series was applied to netdev/net-next.git (master)
by Florian Westphal <fw@strlen.de>:

On Wed,  7 Sep 2022 17:41:03 +0200 you wrote:
> tcp_in_window returns true if the packet is in window and false if it is
> not.
> 
> If its outside of window, packet will be treated as INVALID.
> 
> There are corner cases where the packet should still be tracked, because
> rulesets may drop or log such packets, even though they can occur during
> normal operation, such as overly delayed acks.
> 
> [...]

Here is the summary with links:
  - [net-next,1/8] netfilter: conntrack: prepare tcp_in_window for ternary return value
    https://git.kernel.org/netdev/net-next/c/d9a6f0d0df18
  - [net-next,2/8] netfilter: conntrack: ignore overly delayed tcp packets
    https://git.kernel.org/netdev/net-next/c/6e250dcbff1d
  - [net-next,3/8] netfilter: conntrack: remove unneeded indent level
    https://git.kernel.org/netdev/net-next/c/09a59001b0d6
  - [net-next,4/8] netfilter: conntrack: reduce timeout when receiving out-of-window fin or rst
    https://git.kernel.org/netdev/net-next/c/628d694344a0
  - [net-next,5/8] netfilter: remove NFPROTO_DECNET
    https://git.kernel.org/netdev/net-next/c/a0a4de4d897f
  - [net-next,6/8] netfilter: move from strlcpy with unused retval to strscpy
    https://git.kernel.org/netdev/net-next/c/8556bceb9c40
  - [net-next,7/8] netfilter: nat: move repetitive nat port reserve loop to a helper
    https://git.kernel.org/netdev/net-next/c/c92c27171040
  - [net-next,8/8] netfilter: nat: avoid long-running port range loop
    https://git.kernel.org/netdev/net-next/c/adda60cc2bb0

You are awesome, thank you!
diff mbox series

Patch

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index a634c72b1ffc..1731b82dcc97 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -47,6 +47,11 @@  static const char *const tcp_conntrack_names[] = {
 	"SYN_SENT2",
 };
 
+enum nf_ct_tcp_action {
+	NFCT_TCP_INVALID,
+	NFCT_TCP_ACCEPT,
+};
+
 #define SECS * HZ
 #define MINS * 60 SECS
 #define HOURS * 60 MINS
@@ -472,13 +477,37 @@  static void tcp_init_sender(struct ip_ct_tcp_state *sender,
 	}
 }
 
-static bool tcp_in_window(struct nf_conn *ct,
-			  enum ip_conntrack_dir dir,
-			  unsigned int index,
-			  const struct sk_buff *skb,
-			  unsigned int dataoff,
-			  const struct tcphdr *tcph,
-			  const struct nf_hook_state *hook_state)
+__printf(6, 7)
+static enum nf_ct_tcp_action nf_tcp_log_invalid(const struct sk_buff *skb,
+						const struct nf_conn *ct,
+						const struct nf_hook_state *state,
+						const struct ip_ct_tcp_state *sender,
+						enum nf_ct_tcp_action ret,
+						const char *fmt, ...)
+{
+	const struct nf_tcp_net *tn = nf_tcp_pernet(nf_ct_net(ct));
+	struct va_format vaf;
+	va_list args;
+	bool be_liberal;
+
+	be_liberal = sender->flags & IP_CT_TCP_FLAG_BE_LIBERAL || tn->tcp_be_liberal;
+	if (be_liberal)
+		return NFCT_TCP_ACCEPT;
+
+	va_start(args, fmt);
+	vaf.fmt = fmt;
+	vaf.va = &args;
+	nf_ct_l4proto_log_invalid(skb, ct, state, "%pV", &vaf);
+	va_end(args);
+
+	return ret;
+}
+
+static enum nf_ct_tcp_action
+tcp_in_window(struct nf_conn *ct, enum ip_conntrack_dir dir,
+	      unsigned int index, const struct sk_buff *skb,
+	      unsigned int dataoff, const struct tcphdr *tcph,
+	      const struct nf_hook_state *hook_state)
 {
 	struct ip_ct_tcp *state = &ct->proto.tcp;
 	struct net *net = nf_ct_net(ct);
@@ -486,9 +515,9 @@  static bool tcp_in_window(struct nf_conn *ct,
 	struct ip_ct_tcp_state *sender = &state->seen[dir];
 	struct ip_ct_tcp_state *receiver = &state->seen[!dir];
 	__u32 seq, ack, sack, end, win, swin;
-	u16 win_raw;
+	bool in_recv_win, seq_ok;
 	s32 receiver_offset;
-	bool res, in_recv_win;
+	u16 win_raw;
 
 	/*
 	 * Get the required data from the packet.
@@ -517,7 +546,7 @@  static bool tcp_in_window(struct nf_conn *ct,
 					end, win);
 			if (!tcph->ack)
 				/* Simultaneous open */
-				return true;
+				return NFCT_TCP_ACCEPT;
 		} else {
 			/*
 			 * We are in the middle of a connection,
@@ -560,7 +589,7 @@  static bool tcp_in_window(struct nf_conn *ct,
 				end, win);
 
 		if (dir == IP_CT_DIR_REPLY && !tcph->ack)
-			return true;
+			return NFCT_TCP_ACCEPT;
 	}
 
 	if (!(tcph->ack)) {
@@ -584,13 +613,52 @@  static bool tcp_in_window(struct nf_conn *ct,
 		 */
 		seq = end = sender->td_end;
 
+	seq_ok = before(seq, sender->td_maxend + 1);
+	if (!seq_ok) {
+		u32 overshot = end - sender->td_maxend + 1;
+		bool ack_ok;
+
+		ack_ok = after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1);
+		in_recv_win = receiver->td_maxwin &&
+			      after(end, sender->td_end - receiver->td_maxwin - 1);
+
+		if (in_recv_win &&
+		    ack_ok &&
+		    overshot <= receiver->td_maxwin &&
+		    before(sack, receiver->td_end + 1)) {
+			/* Work around TCPs that send more bytes than allowed by
+			 * the receive window.
+			 *
+			 * If the (marked as invalid) packet is allowed to pass by
+			 * the ruleset and the peer acks this data, then its possible
+			 * all future packets will trigger 'ACK is over upper bound' check.
+			 *
+			 * Thus if only the sequence check fails then do update td_end so
+			 * possible ACK for this data can update internal state.
+			 */
+			sender->td_end = end;
+			sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
+
+			nf_ct_l4proto_log_invalid(skb, ct, hook_state,
+						  "%u bytes more than expected", overshot);
+			return NFCT_TCP_ACCEPT;
+		}
+
+		return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_INVALID,
+					  "SEQ is over upper bound %u (over the window of the receiver)",
+					  sender->td_maxend + 1);
+	}
+
+	if (!before(sack, receiver->td_end + 1))
+		return nf_tcp_log_invalid(skb, ct, hook_state, sender, NFCT_TCP_INVALID,
+					  "ACK is over upper bound %u (ACKed data not seen yet)",
+					  receiver->td_end + 1);
+
 	/* Is the ending sequence in the receive window (if available)? */
 	in_recv_win = !receiver->td_maxwin ||
 		      after(end, sender->td_end - receiver->td_maxwin - 1);
 
-	if (before(seq, sender->td_maxend + 1) &&
-	    in_recv_win &&
-	    before(sack, receiver->td_end + 1) &&
+	if (in_recv_win &&
 	    after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1)) {
 		/*
 		 * Take into account window scaling (RFC 1323).
@@ -648,44 +716,12 @@  static bool tcp_in_window(struct nf_conn *ct,
 				state->retrans = 0;
 			}
 		}
-		res = true;
 	} else {
-		res = false;
 		if (sender->flags & IP_CT_TCP_FLAG_BE_LIBERAL ||
 		    tn->tcp_be_liberal)
-			res = true;
-		if (!res) {
-			bool seq_ok = before(seq, sender->td_maxend + 1);
-
-			if (!seq_ok) {
-				u32 overshot = end - sender->td_maxend + 1;
-				bool ack_ok;
-
-				ack_ok = after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1);
-
-				if (in_recv_win &&
-				    ack_ok &&
-				    overshot <= receiver->td_maxwin &&
-				    before(sack, receiver->td_end + 1)) {
-					/* Work around TCPs that send more bytes than allowed by
-					 * the receive window.
-					 *
-					 * If the (marked as invalid) packet is allowed to pass by
-					 * the ruleset and the peer acks this data, then its possible
-					 * all future packets will trigger 'ACK is over upper bound' check.
-					 *
-					 * Thus if only the sequence check fails then do update td_end so
-					 * possible ACK for this data can update internal state.
-					 */
-					sender->td_end = end;
-					sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
-
-					nf_ct_l4proto_log_invalid(skb, ct, hook_state,
-								  "%u bytes more than expected", overshot);
-					return res;
-				}
-			}
+			return NFCT_TCP_ACCEPT;
 
+		{
 			nf_ct_l4proto_log_invalid(skb, ct, hook_state,
 			"%s",
 			before(seq, sender->td_maxend + 1) ?
@@ -697,9 +733,11 @@  static bool tcp_in_window(struct nf_conn *ct,
 			: "SEQ is under the lower bound (already ACKed data retransmitted)"
 			: "SEQ is over the upper bound (over the window of the receiver)");
 		}
+
+		return NFCT_TCP_INVALID;
 	}
 
-	return res;
+	return NFCT_TCP_ACCEPT;
 }
 
 /* table of valid flag combinations - PUSH, ECE and CWR are always valid */