diff mbox series

[net] tcp: Fix error ts_recent time during three-way handshake

Message ID 20250218105824.34511-1-wanghai38@huawei.com (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series [net] tcp: Fix error ts_recent time during three-way handshake | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 1 this patch: 1
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-02-18--12-00 (tests: 891)

Commit Message

Wang Hai Feb. 18, 2025, 10:58 a.m. UTC
If two ack packets from a connection enter tcp_check_req at the same time
through different cpu, it may happen that req->ts_recent is updated with
with a more recent time and the skb with an older time creates a new sock,
which will cause the tcp_validate_incoming check to fail.

cpu1                                cpu2
tcp_check_req
                                    tcp_check_req
req->ts_recent = tmp_opt.rcv_tsval = t1
                                    req->ts_recent = tmp_opt.rcv_tsval = t2

newsk->ts_recent = req->ts_recent = t2 // t1 < t2
tcp_child_process
tcp_rcv_state_process
tcp_validate_incoming
tcp_paws_check
if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win) // failed

In tcp_check_req, restore ts_recent to this skb's to fix this bug.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
---
 net/ipv4/tcp_minisocks.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jason Xing Feb. 18, 2025, 12:02 p.m. UTC | #1
Hi Wang,

On Tue, Feb 18, 2025 at 7:02 PM Wang Hai <wanghai38@huawei.com> wrote:
>
> If two ack packets from a connection enter tcp_check_req at the same time
> through different cpu, it may happen that req->ts_recent is updated with
> with a more recent time and the skb with an older time creates a new sock,
> which will cause the tcp_validate_incoming check to fail.
>
> cpu1                                cpu2
> tcp_check_req
>                                     tcp_check_req
> req->ts_recent = tmp_opt.rcv_tsval = t1
>                                     req->ts_recent = tmp_opt.rcv_tsval = t2
>
> newsk->ts_recent = req->ts_recent = t2 // t1 < t2
> tcp_child_process
> tcp_rcv_state_process
> tcp_validate_incoming
> tcp_paws_check
> if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win) // failed
>
> In tcp_check_req, restore ts_recent to this skb's to fix this bug.

It's known that it's possible to receive two packets in two different
cpus like this case nearly at the same time. I'm curious if the socket
was running on the loopback?

Even if the above check that you commented in tcp_paws_check() fails,
how about the rest of the checks in tcp_validate_incoming()? In your
test, I doubt if really that check failed which finally caused the skb
from CPU2 to be discarded. Let me put it this way, is the paws_win
check the root cause? What is the drop reason for
tcp_validate_incoming()?

Thanks,
Jason

>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Wang Hai <wanghai38@huawei.com>
> ---
>  net/ipv4/tcp_minisocks.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index b089b08e9617..0208455f9eb8 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -878,6 +878,10 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>         sock_rps_save_rxhash(child, skb);
>         tcp_synack_rtt_meas(child, req);
>         *req_stolen = !own_req;
> +       if (own_req && tcp_sk(child)->rx_opt.tstamp_ok &&
> +           unlikely(tcp_sk(child)->rx_opt.ts_recent != tmp_opt.rcv_tsval))
> +               tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
> +
>         return inet_csk_complete_hashdance(sk, child, req, own_req);
>
>  listen_overflow:
> --
> 2.17.1
>
>
Eric Dumazet Feb. 18, 2025, 1:35 p.m. UTC | #2
On Tue, Feb 18, 2025 at 12:00 PM Wang Hai <wanghai38@huawei.com> wrote:
>
> If two ack packets from a connection enter tcp_check_req at the same time
> through different cpu, it may happen that req->ts_recent is updated with
> with a more recent time and the skb with an older time creates a new sock,
> which will cause the tcp_validate_incoming check to fail.
>
> cpu1                                cpu2
> tcp_check_req
>                                     tcp_check_req
> req->ts_recent = tmp_opt.rcv_tsval = t1
>                                     req->ts_recent = tmp_opt.rcv_tsval = t2
>
> newsk->ts_recent = req->ts_recent = t2 // t1 < t2
> tcp_child_process
> tcp_rcv_state_process
> tcp_validate_incoming
> tcp_paws_check
> if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win) // failed
>
> In tcp_check_req, restore ts_recent to this skb's to fix this bug.
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Wang Hai <wanghai38@huawei.com>
> ---
>  net/ipv4/tcp_minisocks.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index b089b08e9617..0208455f9eb8 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -878,6 +878,10 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>         sock_rps_save_rxhash(child, skb);
>         tcp_synack_rtt_meas(child, req);
>         *req_stolen = !own_req;
> +       if (own_req && tcp_sk(child)->rx_opt.tstamp_ok &&
> +           unlikely(tcp_sk(child)->rx_opt.ts_recent != tmp_opt.rcv_tsval))
> +               tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
> +
>         return inet_csk_complete_hashdance(sk, child, req, own_req);

Have you seen the comment at line 818 ?

/* TODO: We probably should defer ts_recent change once
 * we take ownership of @req.
 */

Plan was clear and explained. Why implement something else (and buggy) ?
diff mbox series

Patch

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index b089b08e9617..0208455f9eb8 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -878,6 +878,10 @@  struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 	sock_rps_save_rxhash(child, skb);
 	tcp_synack_rtt_meas(child, req);
 	*req_stolen = !own_req;
+	if (own_req && tcp_sk(child)->rx_opt.tstamp_ok &&
+	    unlikely(tcp_sk(child)->rx_opt.ts_recent != tmp_opt.rcv_tsval))
+		tcp_sk(child)->rx_opt.ts_recent = tmp_opt.rcv_tsval;
+
 	return inet_csk_complete_hashdance(sk, child, req, own_req);
 
 listen_overflow: