Message ID | 20240523130528.60376-1-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | f4dca95fc0f6350918f2e6727e35b41f7f86fcce |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] tcp: reduce accepted window in NEW_SYN_RECV state | expand |
On Thu, May 23, 2024 at 9:05 AM Eric Dumazet <edumazet@google.com> wrote: > > Jason commit made checks against ACK sequence less strict > and can be exploited by attackers to establish spoofed flows > with less probes. > > Innocent users might use tcp_rmem[1] == 1,000,000,000, > or something more reasonable. > > An attacker can use a regular TCP connection to learn the server > initial tp->rcv_wnd, and use it to optimize the attack. > > If we make sure that only the announced window (smaller than 65535) > is used for ACK validation, we force an attacker to use > 65537 packets to complete the 3WHS (assuming server ISN is unknown) > > Fixes: 378979e94e95 ("tcp: remove 64 KByte limit for initial tp->rcv_wnd value") > Link: https://datatracker.ietf.org/meeting/119/materials/slides-119-tcpm-ghost-acks-00 > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Jason Xing <kernelxing@tencent.com> > Cc: Neal Cardwell <ncardwell@google.com> > --- Acked-by: Neal Cardwell <ncardwell@google.com> Thanks, Eric! neal
On Thu, May 23, 2024 at 9:07 PM Eric Dumazet <edumazet@google.com> wrote: > > Jason commit made checks against ACK sequence less strict > and can be exploited by attackers to establish spoofed flows > with less probes. > > Innocent users might use tcp_rmem[1] == 1,000,000,000, > or something more reasonable. > > An attacker can use a regular TCP connection to learn the server > initial tp->rcv_wnd, and use it to optimize the attack. > > If we make sure that only the announced window (smaller than 65535) > is used for ACK validation, we force an attacker to use > 65537 packets to complete the 3WHS (assuming server ISN is unknown) > > Fixes: 378979e94e95 ("tcp: remove 64 KByte limit for initial tp->rcv_wnd value") > Link: https://datatracker.ietf.org/meeting/119/materials/slides-119-tcpm-ghost-acks-00 > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Jason Xing <kernelxing@tencent.com> > Cc: Neal Cardwell <ncardwell@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Thank you, Eric!
Hello: This patch was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>: On Thu, 23 May 2024 13:05:27 +0000 you wrote: > Jason commit made checks against ACK sequence less strict > and can be exploited by attackers to establish spoofed flows > with less probes. > > Innocent users might use tcp_rmem[1] == 1,000,000,000, > or something more reasonable. > > [...] Here is the summary with links: - [net] tcp: reduce accepted window in NEW_SYN_RECV state https://git.kernel.org/netdev/net/c/f4dca95fc0f6 You are awesome, thank you!
diff --git a/include/net/request_sock.h b/include/net/request_sock.h index bdc737832da66a1eb5c50928e67d45d8b58d7b8e..68c1c5a5444c2c73a5e2209012b0f985f4361704 100644 --- a/include/net/request_sock.h +++ b/include/net/request_sock.h @@ -284,4 +284,16 @@ static inline int reqsk_queue_len_young(const struct request_sock_queue *queue) return atomic_read(&queue->young); } +/* RFC 7323 2.3 Using the Window Scale Option + * The window field (SEG.WND) of every outgoing segment, with the + * exception of <SYN> segments, MUST be right-shifted by + * Rcv.Wind.Shift bits. + * + * This means the SEG.WND carried in SYNACK can not exceed 65535. + * We use this property to harden TCP stack while in NEW_SYN_RECV state. + */ +static inline u32 tcp_synack_window(const struct request_sock *req) +{ + return min(req->rsk_rcv_wnd, 65535U); +} #endif /* _REQUEST_SOCK_H */ diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 30ef0c8f5e92d301c31ea1a05f662c1fc4cf37af..b710958393e64e2278c088018c87ac97a1291a23 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1144,14 +1144,9 @@ static void tcp_v4_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, #endif } - /* RFC 7323 2.3 - * The window field (SEG.WND) of every outgoing segment, with the - * exception of <SYN> segments, MUST be right-shifted by - * Rcv.Wind.Shift bits: - */ tcp_v4_send_ack(sk, skb, seq, tcp_rsk(req)->rcv_nxt, - req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale, + tcp_synack_window(req) >> inet_rsk(req)->rcv_wscale, tcp_rsk_tsval(tcp_rsk(req)), READ_ONCE(req->ts_recent), 0, &key, diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index b93619b2384b3735ecb6e40238f8367d9afb7e15..538c06f95918dedf29e0f4790795fcc417f2516f 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -783,8 +783,11 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb, /* RFC793: "first check sequence number". */ - if (paws_reject || !tcp_in_window(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq, - tcp_rsk(req)->rcv_nxt, tcp_rsk(req)->rcv_nxt + req->rsk_rcv_wnd)) { + if (paws_reject || !tcp_in_window(TCP_SKB_CB(skb)->seq, + TCP_SKB_CB(skb)->end_seq, + tcp_rsk(req)->rcv_nxt, + tcp_rsk(req)->rcv_nxt + + tcp_synack_window(req))) { /* Out of window: send ACK and drop. */ if (!(flg & TCP_FLAG_RST) && !tcp_oow_rate_limited(sock_net(sk), skb, diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 4c3605485b68e7c333a0144df3d685b3db9ff45d..8c577b651bfcd2f94b45e339ed4a2b47e93ff17a 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1272,15 +1272,10 @@ static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, /* sk->sk_state == TCP_LISTEN -> for regular TCP_SYN_RECV * sk->sk_state == TCP_SYN_RECV -> for Fast Open. */ - /* RFC 7323 2.3 - * The window field (SEG.WND) of every outgoing segment, with the - * exception of <SYN> segments, MUST be right-shifted by - * Rcv.Wind.Shift bits: - */ tcp_v6_send_ack(sk, skb, (sk->sk_state == TCP_LISTEN) ? tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt, tcp_rsk(req)->rcv_nxt, - req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale, + tcp_synack_window(req) >> inet_rsk(req)->rcv_wscale, tcp_rsk_tsval(tcp_rsk(req)), READ_ONCE(req->ts_recent), sk->sk_bound_dev_if, &key, ipv6_get_dsfield(ipv6_hdr(skb)), 0,
Jason commit made checks against ACK sequence less strict and can be exploited by attackers to establish spoofed flows with less probes. Innocent users might use tcp_rmem[1] == 1,000,000,000, or something more reasonable. An attacker can use a regular TCP connection to learn the server initial tp->rcv_wnd, and use it to optimize the attack. If we make sure that only the announced window (smaller than 65535) is used for ACK validation, we force an attacker to use 65537 packets to complete the 3WHS (assuming server ISN is unknown) Fixes: 378979e94e95 ("tcp: remove 64 KByte limit for initial tp->rcv_wnd value") Link: https://datatracker.ietf.org/meeting/119/materials/slides-119-tcpm-ghost-acks-00 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason Xing <kernelxing@tencent.com> Cc: Neal Cardwell <ncardwell@google.com> --- include/net/request_sock.h | 12 ++++++++++++ net/ipv4/tcp_ipv4.c | 7 +------ net/ipv4/tcp_minisocks.c | 7 +++++-- net/ipv6/tcp_ipv6.c | 7 +------ 4 files changed, 19 insertions(+), 14 deletions(-)