Message ID | 20241203081247.1533534-1-youngmin.nam@samsung.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tcp: check socket state before calling WARN_ON | expand |
On Tue, Dec 3, 2024 at 9:10 AM Youngmin Nam <youngmin.nam@samsung.com> wrote: > > We encountered the following WARNINGs > in tcp_sacktag_write_queue()/tcp_fastretrans_alert() > which triggered a kernel panic due to panic_on_warn. > > case 1. > ------------[ cut here ]------------ > WARNING: CPU: 4 PID: 453 at net/ipv4/tcp_input.c:2026 > Call trace: > tcp_sacktag_write_queue+0xae8/0xb60 > tcp_ack+0x4ec/0x12b8 > tcp_rcv_state_process+0x22c/0xd38 > tcp_v4_do_rcv+0x220/0x300 > tcp_v4_rcv+0xa5c/0xbb4 > ip_protocol_deliver_rcu+0x198/0x34c > ip_local_deliver_finish+0x94/0xc4 > ip_local_deliver+0x74/0x10c > ip_rcv+0xa0/0x13c > Kernel panic - not syncing: kernel: panic_on_warn set ... > > case 2. > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004 > Call trace: > tcp_fastretrans_alert+0x8ac/0xa74 > tcp_ack+0x904/0x12b8 > tcp_rcv_state_process+0x22c/0xd38 > tcp_v4_do_rcv+0x220/0x300 > tcp_v4_rcv+0xa5c/0xbb4 > ip_protocol_deliver_rcu+0x198/0x34c > ip_local_deliver_finish+0x94/0xc4 > ip_local_deliver+0x74/0x10c > ip_rcv+0xa0/0x13c > Kernel panic - not syncing: kernel: panic_on_warn set ... > I have not seen these warnings firing. Neal, have you seen this in the past ? Please provide the kernel version (this must be a pristine LTS one). and symbolized stack traces using scripts/decode_stacktrace.sh If this warning was easy to trigger, please provide a packetdrill test ? > When we check the socket state value at the time of the issue, > it was 0x4. > > skc_state = 0x4, > > This is "TCP_FIN_WAIT1" and which means the device closed its socket. > > enum { > TCP_ESTABLISHED = 1, > TCP_SYN_SENT, > TCP_SYN_RECV, > TCP_FIN_WAIT1, > > And also this means tp->packets_out was initialized as 0 > by tcp_write_queue_purge(). What stack trace leads to this tcp_write_queue_purge() exactly ? > > In a congested network situation, a TCP ACK for > an already closed session may be received with a delay from the peer. > This can trigger the WARN_ON macro to help debug the situation. > > To make this situation more meaningful, we would like to call > WARN_ON only when the state of the socket is "TCP_ESTABLISHED". > This will prevent the kernel from triggering a panic > due to panic_on_warn. > > Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com> > --- > net/ipv4/tcp_input.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > index 5bdf13ac26ef..62f4c285ab80 100644 > --- a/net/ipv4/tcp_input.c > +++ b/net/ipv4/tcp_input.c > @@ -2037,7 +2037,8 @@ tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb, > WARN_ON((int)tp->sacked_out < 0); > WARN_ON((int)tp->lost_out < 0); > WARN_ON((int)tp->retrans_out < 0); > - WARN_ON((int)tcp_packets_in_flight(tp) < 0); > + if (sk->sk_state == TCP_ESTABLISHED) In any case this test on sk_state is too specific. > + WARN_ON((int)tcp_packets_in_flight(tp) < 0); > #endif > return state->flag; > } > @@ -3080,7 +3081,8 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una, > return; > > /* C. Check consistency of the current state. */ > - tcp_verify_left_out(tp); > + if (sk->sk_state == TCP_ESTABLISHED) > + tcp_verify_left_out(tp); > > /* D. Check state exit conditions. State can be terminated > * when high_seq is ACKed. */ > -- > 2.39.2 >
On Tue, Dec 3, 2024 at 6:07 AM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Dec 3, 2024 at 9:10 AM Youngmin Nam <youngmin.nam@samsung.com> wrote: > > > > We encountered the following WARNINGs > > in tcp_sacktag_write_queue()/tcp_fastretrans_alert() > > which triggered a kernel panic due to panic_on_warn. > > > > case 1. > > ------------[ cut here ]------------ > > WARNING: CPU: 4 PID: 453 at net/ipv4/tcp_input.c:2026 > > Call trace: > > tcp_sacktag_write_queue+0xae8/0xb60 > > tcp_ack+0x4ec/0x12b8 > > tcp_rcv_state_process+0x22c/0xd38 > > tcp_v4_do_rcv+0x220/0x300 > > tcp_v4_rcv+0xa5c/0xbb4 > > ip_protocol_deliver_rcu+0x198/0x34c > > ip_local_deliver_finish+0x94/0xc4 > > ip_local_deliver+0x74/0x10c > > ip_rcv+0xa0/0x13c > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > case 2. > > ------------[ cut here ]------------ > > WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004 > > Call trace: > > tcp_fastretrans_alert+0x8ac/0xa74 > > tcp_ack+0x904/0x12b8 > > tcp_rcv_state_process+0x22c/0xd38 > > tcp_v4_do_rcv+0x220/0x300 > > tcp_v4_rcv+0xa5c/0xbb4 > > ip_protocol_deliver_rcu+0x198/0x34c > > ip_local_deliver_finish+0x94/0xc4 > > ip_local_deliver+0x74/0x10c > > ip_rcv+0xa0/0x13c > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > I have not seen these warnings firing. Neal, have you seen this in the past ? I can't recall seeing these warnings over the past 5 years or so, and (from checking our monitoring) they don't seem to be firing in our fleet recently. > In any case this test on sk_state is too specific. I agree with Eric. IMHO TCP_FIN_WAIT1 deserves all the same warnings as ESTABLISHED, since in this state the connection may still have a big queue of data it is trying to reliably send to the other side, with full loss recovery and congestion control logic. I would suggest that instead of running with panic_on_warn it would make more sense to not panic on warning, and instead add more detail to these warning messages in your kernels during your testing, to help debug what is going wrong. I would suggest adding to the warning message: tp->packets_out tp->sacked_out tp->lost_out tp->retrans_out tcp_is_sack(tp) tp->mss_cache inet_csk(sk)->icsk_ca_state inet_csk(sk)->icsk_pmtu_cookie A hunch would be that this is either firing for (a) non-SACK connections, or (b) after an MTU reduction. In particular, you might try `echo 0 > /proc/sys/net/ipv4/tcp_mtu_probing` and see if that makes the warnings go away. cheers, neal
On Tue, 3 Dec 2024 10:34:46 -0500 Neal Cardwell wrote: > > I have not seen these warnings firing. Neal, have you seen this in the past ? > > I can't recall seeing these warnings over the past 5 years or so, and > (from checking our monitoring) they don't seem to be firing in our > fleet recently. FWIW I see this at Meta on 5.12 kernels, but nothing since. Could be that one of our workloads is pinned to 5.12. Youngmin, what's the newest kernel you can repro this on?
Hi Eric. Thanks for looking at this issue. On Tue, Dec 03, 2024 at 12:07:05PM +0100, Eric Dumazet wrote: > On Tue, Dec 3, 2024 at 9:10 AM Youngmin Nam <youngmin.nam@samsung.com> wrote: > > > > We encountered the following WARNINGs > > in tcp_sacktag_write_queue()/tcp_fastretrans_alert() > > which triggered a kernel panic due to panic_on_warn. > > > > case 1. > > ------------[ cut here ]------------ > > WARNING: CPU: 4 PID: 453 at net/ipv4/tcp_input.c:2026 > > Call trace: > > tcp_sacktag_write_queue+0xae8/0xb60 > > tcp_ack+0x4ec/0x12b8 > > tcp_rcv_state_process+0x22c/0xd38 > > tcp_v4_do_rcv+0x220/0x300 > > tcp_v4_rcv+0xa5c/0xbb4 > > ip_protocol_deliver_rcu+0x198/0x34c > > ip_local_deliver_finish+0x94/0xc4 > > ip_local_deliver+0x74/0x10c > > ip_rcv+0xa0/0x13c > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > case 2. > > ------------[ cut here ]------------ > > WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004 > > Call trace: > > tcp_fastretrans_alert+0x8ac/0xa74 > > tcp_ack+0x904/0x12b8 > > tcp_rcv_state_process+0x22c/0xd38 > > tcp_v4_do_rcv+0x220/0x300 > > tcp_v4_rcv+0xa5c/0xbb4 > > ip_protocol_deliver_rcu+0x198/0x34c > > ip_local_deliver_finish+0x94/0xc4 > > ip_local_deliver+0x74/0x10c > > ip_rcv+0xa0/0x13c > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > I have not seen these warnings firing. Neal, have you seen this in the past ? > > Please provide the kernel version (this must be a pristine LTS one). We are running Android kernel for Android mobile device which is based on LTS kernel 6.6-30. But we've seen this issue since kernel 5.15 LTS. > and symbolized stack traces using scripts/decode_stacktrace.sh Unfortunately, we don't have the matched vmlinux right now. So we need to rebuild and reproduce. > > If this warning was easy to trigger, please provide a packetdrill test ? I'm not sure if we can run packetdrill test on Android device. Anyway let me check. FYI, Here are more detailed logs. Case 1. [26496.422651]I[4: napi/wlan0-33: 467] ------------[ cut here ]------------ [26496.422665]I[4: napi/wlan0-33: 467] WARNING: CPU: 4 PID: 467 at net/ipv4/tcp_input.c:2026 tcp_sacktag_write_queue+0xae8/0xb60 [26496.423420]I[4: napi/wlan0-33: 467] CPU: 4 PID: 467 Comm: napi/wlan0-33 Tainted: G S OE 6.6.30-android15-8-geeceb2c9cdf1-ab20240930.125201-4k #1 a1c80b36942fa6e9575b2544032a7536ed502804 [26496.423427]I[4: napi/wlan0-33: 467] Hardware name: Samsung ERD9955 board based on S5E9955 (DT) [26496.423432]I[4: napi/wlan0-33: 467] pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [26496.423438]I[4: napi/wlan0-33: 467] pc : tcp_sacktag_write_queue+0xae8/0xb60 [26496.423446]I[4: napi/wlan0-33: 467] lr : tcp_ack+0x4ec/0x12b8 [26496.423455]I[4: napi/wlan0-33: 467] sp : ffffffc096b8b690 [26496.423458]I[4: napi/wlan0-33: 467] x29: ffffffc096b8b710 x28: 0000000000008001 x27: 000000005526d635 [26496.423469]I[4: napi/wlan0-33: 467] x26: ffffff8a19079684 x25: 000000005526dbfd x24: 0000000000000001 [26496.423480]I[4: napi/wlan0-33: 467] x23: 000000000000000a x22: ffffff88e5f5b680 x21: 000000005526dbc9 [26496.423489]I[4: napi/wlan0-33: 467] x20: ffffff8a19078d80 x19: ffffff88e9f4193e x18: ffffffd083114c80 [26496.423499]I[4: napi/wlan0-33: 467] x17: 00000000529c6ef0 x16: 000000000000ff8b x15: 0000000000000000 [26496.423508]I[4: napi/wlan0-33: 467] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000000 [26496.423517]I[4: napi/wlan0-33: 467] x11: 0000000000000000 x10: 0000000000000001 x9 : 00000000fffffffd [26496.423526]I[4: napi/wlan0-33: 467] x8 : 0000000000000001 x7 : 0000000000000000 x6 : ffffffd081ec0bc4 [26496.423536]I[4: napi/wlan0-33: 467] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffffc096b8b818 [26496.423545]I[4: napi/wlan0-33: 467] x2 : 000000005526d635 x1 : ffffff88e5f5b680 x0 : ffffff8a19078d80 [26496.423555]I[4: napi/wlan0-33: 467] Call trace: [26496.423558]I[4: napi/wlan0-33: 467] tcp_sacktag_write_queue+0xae8/0xb60 [26496.423566]I[4: napi/wlan0-33: 467] tcp_ack+0x4ec/0x12b8 [26496.423573]I[4: napi/wlan0-33: 467] tcp_rcv_state_process+0x22c/0xd38 [26496.423580]I[4: napi/wlan0-33: 467] tcp_v4_do_rcv+0x220/0x300 [26496.423590]I[4: napi/wlan0-33: 467] tcp_v4_rcv+0xa5c/0xbb4 [26496.423596]I[4: napi/wlan0-33: 467] ip_protocol_deliver_rcu+0x198/0x34c [26496.423607]I[4: napi/wlan0-33: 467] ip_local_deliver_finish+0x94/0xc4 [26496.423614]I[4: napi/wlan0-33: 467] ip_local_deliver+0x74/0x10c [26496.423620]I[4: napi/wlan0-33: 467] ip_rcv+0xa0/0x13c [26496.423625]I[4: napi/wlan0-33: 467] __netif_receive_skb_core+0xe14/0x1104 [26496.423642]I[4: napi/wlan0-33: 467] __netif_receive_skb_list_core+0xb8/0x2dc [26496.423649]I[4: napi/wlan0-33: 467] netif_receive_skb_list_internal+0x234/0x320 [26496.423655]I[4: napi/wlan0-33: 467] napi_complete_done+0xb4/0x1a0 [26496.423660]I[4: napi/wlan0-33: 467] slsi_rx_netif_napi_poll+0x22c/0x258 [scsc_wlan 16ac2100e65b7c78ce863cecc238b39b162dbe82] [26496.423822]I[4: napi/wlan0-33: 467] __napi_poll+0x5c/0x238 [26496.423829]I[4: napi/wlan0-33: 467] napi_threaded_poll+0x110/0x204 [26496.423836]I[4: napi/wlan0-33: 467] kthread+0x114/0x138 [26496.423845]I[4: napi/wlan0-33: 467] ret_from_fork+0x10/0x20 [26496.423856]I[4: napi/wlan0-33: 467] Kernel panic - not syncing: kernel: panic_on_warn set .. Case 2. [ 1843.463330]I[0: surfaceflinger: 648] ------------[ cut here ]------------ [ 1843.463355]I[0: surfaceflinger: 648] WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004 tcp_fastretrans_alert+0x8ac/0xa74 [ 1843.464508]I[0: surfaceflinger: 648] CPU: 0 PID: 648 Comm: surfaceflinger Tainted: G S OE 6.6.30-android15-8-geeceb2c9cdf1-ab20241017.075836-4k #1 de751202c2c5ab3ec352a00ae470fc5e907bdcfe [ 1843.464520]I[0: surfaceflinger: 648] Hardware name: Samsung ERD8855 board based on S5E8855 (DT) [ 1843.464527]I[0: surfaceflinger: 648] pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [ 1843.464535]I[0: surfaceflinger: 648] pc : tcp_fastretrans_alert+0x8ac/0xa74 [ 1843.464548]I[0: surfaceflinger: 648] lr : tcp_ack+0x904/0x12b8 [ 1843.464556]I[0: surfaceflinger: 648] sp : ffffffc0800036e0 [ 1843.464561]I[0: surfaceflinger: 648] x29: ffffffc0800036e0 x28: 0000000000008005 x27: 000000001bc05562 [ 1843.464579]I[0: surfaceflinger: 648] x26: ffffff890418a3c4 x25: 0000000000000000 x24: 000000000000cd02 [ 1843.464595]I[0: surfaceflinger: 648] x23: 000000001bc05562 x22: 0000000000000000 x21: ffffffc080003800 [ 1843.464611]I[0: surfaceflinger: 648] x20: ffffffc08000378c x19: ffffff8904189ac0 x18: 0000000000000000 [ 1843.464627]I[0: surfaceflinger: 648] x17: 00000000529c6ef0 x16: 000000000000ff8b x15: 0000000000000001 [ 1843.464642]I[0: surfaceflinger: 648] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000000 [ 1843.464658]I[0: surfaceflinger: 648] x11: ffffff883e9c9540 x10: 0000000000000001 x9 : 0000000000000001 [ 1843.464673]I[0: surfaceflinger: 648] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffffd081ec0bc4 [ 1843.464689]I[0: surfaceflinger: 648] x5 : 0000000000000000 x4 : ffffffc08000378c x3 : ffffffc080003800 [ 1843.464704]I[0: surfaceflinger: 648] x2 : 0000000000000000 x1 : 000000001bc05562 x0 : ffffff8904189ac0 [ 1843.464720]I[0: surfaceflinger: 648] Call trace: [ 1843.464725]I[0: surfaceflinger: 648] tcp_fastretrans_alert+0x8ac/0xa74 [ 1843.464735]I[0: surfaceflinger: 648] tcp_ack+0x904/0x12b8 [ 1843.464743]I[0: surfaceflinger: 648] tcp_rcv_state_process+0x22c/0xd38 [ 1843.464751]I[0: surfaceflinger: 648] tcp_v4_do_rcv+0x220/0x300 [ 1843.464760]I[0: surfaceflinger: 648] tcp_v4_rcv+0xa5c/0xbb4 [ 1843.464767]I[0: surfaceflinger: 648] ip_protocol_deliver_rcu+0x198/0x34c [ 1843.464776]I[0: surfaceflinger: 648] ip_local_deliver_finish+0x94/0xc4 [ 1843.464784]I[0: surfaceflinger: 648] ip_local_deliver+0x74/0x10c [ 1843.464791]I[0: surfaceflinger: 648] ip_rcv+0xa0/0x13c [ 1843.464799]I[0: surfaceflinger: 648] __netif_receive_skb_core+0xe14/0x1104 [ 1843.464810]I[0: surfaceflinger: 648] __netif_receive_skb+0x40/0x124 [ 1843.464818]I[0: surfaceflinger: 648] netif_receive_skb+0x7c/0x234 [ 1843.464825]I[0: surfaceflinger: 648] slsi_rx_data_deliver_skb+0x1e0/0xdbc [scsc_wlan 12b378a8d5cf5e6cd833136fc49079d43751bd28] [ 1843.465025]I[0: surfaceflinger: 648] slsi_ba_process_complete+0x70/0xa4 [scsc_wlan 12b378a8d5cf5e6cd833136fc49079d43751bd28] [ 1843.465219]I[0: surfaceflinger: 648] slsi_ba_aging_timeout_handler+0x324/0x354 [scsc_wlan 12b378a8d5cf5e6cd833136fc49079d43751bd28] [ 1843.465410]I[0: surfaceflinger: 648] call_timer_fn+0xd0/0x360 [ 1843.465423]I[0: surfaceflinger: 648] __run_timers+0x1b4/0x268 [ 1843.465432]I[0: surfaceflinger: 648] run_timer_softirq+0x24/0x4c [ 1843.465440]I[0: surfaceflinger: 648] __do_softirq+0x158/0x45c [ 1843.465448]I[0: surfaceflinger: 648] ____do_softirq+0x10/0x20 [ 1843.465455]I[0: surfaceflinger: 648] call_on_irq_stack+0x3c/0x74 [ 1843.465463]I[0: surfaceflinger: 648] do_softirq_own_stack+0x1c/0x2c [ 1843.465470]I[0: surfaceflinger: 648] __irq_exit_rcu+0x54/0xb4 [ 1843.465480]I[0: surfaceflinger: 648] irq_exit_rcu+0x10/0x1c [ 1843.465489]I[0: surfaceflinger: 648] el0_interrupt+0x54/0xe0 [ 1843.465499]I[0: surfaceflinger: 648] __el0_irq_handler_common+0x18/0x28 [ 1843.465508]I[0: surfaceflinger: 648] el0t_64_irq_handler+0x10/0x1c [ 1843.465516]I[0: surfaceflinger: 648] el0t_64_irq+0x1a8/0x1ac [ 1843.465525]I[0: surfaceflinger: 648] Kernel panic - not syncing: kernel: panic_on_warn set ... > > When we check the socket state value at the time of the issue, > > it was 0x4. > > > > skc_state = 0x4, > > > > This is "TCP_FIN_WAIT1" and which means the device closed its socket. > > > > enum { > > TCP_ESTABLISHED = 1, > > TCP_SYN_SENT, > > TCP_SYN_RECV, > > TCP_FIN_WAIT1, > > > > And also this means tp->packets_out was initialized as 0 > > by tcp_write_queue_purge(). > > What stack trace leads to this tcp_write_queue_purge() exactly ? I couldn't find the exact call stack to this. But I just thought that the function would be called based on ramdump snapshot. (*(struct tcp_sock *)(0xFFFFFF800467AB00)).packets_out = 0 (*(struct inet_connection_sock *)0xFFFFFF800467AB00).icsk_backoff = 0 > > > > In a congested network situation, a TCP ACK for > > an already closed session may be received with a delay from the peer. > > This can trigger the WARN_ON macro to help debug the situation. > > > > To make this situation more meaningful, we would like to call > > WARN_ON only when the state of the socket is "TCP_ESTABLISHED". > > This will prevent the kernel from triggering a panic > > due to panic_on_warn. > > > > Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com> > > --- > > net/ipv4/tcp_input.c | 6 ++++-- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c > > index 5bdf13ac26ef..62f4c285ab80 100644 > > --- a/net/ipv4/tcp_input.c > > +++ b/net/ipv4/tcp_input.c > > @@ -2037,7 +2037,8 @@ tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb, > > WARN_ON((int)tp->sacked_out < 0); > > WARN_ON((int)tp->lost_out < 0); > > WARN_ON((int)tp->retrans_out < 0); > > - WARN_ON((int)tcp_packets_in_flight(tp) < 0); > > + if (sk->sk_state == TCP_ESTABLISHED) > > In any case this test on sk_state is too specific. Yes. I agree as wll. Do you have any idea to avoid a kernel panic that we are going through ? > > + WARN_ON((int)tcp_packets_in_flight(tp) < 0); > > #endif > > return state->flag; > > } > > @@ -3080,7 +3081,8 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una, > > return; > > > > /* C. Check consistency of the current state. */ > > - tcp_verify_left_out(tp); > > + if (sk->sk_state == TCP_ESTABLISHED) > > + tcp_verify_left_out(tp); > > > > /* D. Check state exit conditions. State can be terminated > > * when high_seq is ACKed. */ > > -- > > 2.39.2 > > >
On Tue, Dec 03, 2024 at 10:34:46AM -0500, Neal Cardwell wrote: > On Tue, Dec 3, 2024 at 6:07 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Tue, Dec 3, 2024 at 9:10 AM Youngmin Nam <youngmin.nam@samsung.com> wrote: > > > > > > We encountered the following WARNINGs > > > in tcp_sacktag_write_queue()/tcp_fastretrans_alert() > > > which triggered a kernel panic due to panic_on_warn. > > > > > > case 1. > > > ------------[ cut here ]------------ > > > WARNING: CPU: 4 PID: 453 at net/ipv4/tcp_input.c:2026 > > > Call trace: > > > tcp_sacktag_write_queue+0xae8/0xb60 > > > tcp_ack+0x4ec/0x12b8 > > > tcp_rcv_state_process+0x22c/0xd38 > > > tcp_v4_do_rcv+0x220/0x300 > > > tcp_v4_rcv+0xa5c/0xbb4 > > > ip_protocol_deliver_rcu+0x198/0x34c > > > ip_local_deliver_finish+0x94/0xc4 > > > ip_local_deliver+0x74/0x10c > > > ip_rcv+0xa0/0x13c > > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > > > case 2. > > > ------------[ cut here ]------------ > > > WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004 > > > Call trace: > > > tcp_fastretrans_alert+0x8ac/0xa74 > > > tcp_ack+0x904/0x12b8 > > > tcp_rcv_state_process+0x22c/0xd38 > > > tcp_v4_do_rcv+0x220/0x300 > > > tcp_v4_rcv+0xa5c/0xbb4 > > > ip_protocol_deliver_rcu+0x198/0x34c > > > ip_local_deliver_finish+0x94/0xc4 > > > ip_local_deliver+0x74/0x10c > > > ip_rcv+0xa0/0x13c > > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > > > > I have not seen these warnings firing. Neal, have you seen this in the past ? > > I can't recall seeing these warnings over the past 5 years or so, and > (from checking our monitoring) they don't seem to be firing in our > fleet recently. > > > In any case this test on sk_state is too specific. > > I agree with Eric. IMHO TCP_FIN_WAIT1 deserves all the same warnings > as ESTABLISHED, since in this state the connection may still have a > big queue of data it is trying to reliably send to the other side, > with full loss recovery and congestion control logic. Yes I agree with Eric as well. > > I would suggest that instead of running with panic_on_warn it would > make more sense to not panic on warning, and instead add more detail > to these warning messages in your kernels during your testing, to help > debug what is going wrong. I would suggest adding to the warning > message: > > tp->packets_out > tp->sacked_out > tp->lost_out > tp->retrans_out > tcp_is_sack(tp) > tp->mss_cache > inet_csk(sk)->icsk_ca_state > inet_csk(sk)->icsk_pmtu_cookie Hi Neal. Thanks for your opinion. By the way, we enable panic_on_warn by default for stability. As you know, panic_on_warn is not applied to a specific subsystem but to the entire kernel. We just want to avoid the kernel panic. So when I see below lwn article, I think we might use pr_warn() instaed of WARN_ON(). https://lwn.net/Articles/969923/ How do you think of it ? > > A hunch would be that this is either firing for (a) non-SACK > connections, or (b) after an MTU reduction. > > In particular, you might try `echo 0 > > /proc/sys/net/ipv4/tcp_mtu_probing` and see if that makes the warnings > go away. > > cheers, > neal >
On Tue, Dec 03, 2024 at 06:18:39PM -0800, Jakub Kicinski wrote: > On Tue, 3 Dec 2024 10:34:46 -0500 Neal Cardwell wrote: > > > I have not seen these warnings firing. Neal, have you seen this in the past ? > > > > I can't recall seeing these warnings over the past 5 years or so, and > > (from checking our monitoring) they don't seem to be firing in our > > fleet recently. > > FWIW I see this at Meta on 5.12 kernels, but nothing since. > Could be that one of our workloads is pinned to 5.12. > Youngmin, what's the newest kernel you can repro this on? > Hi Jakub. Thank you for taking an interest in this issue. We've seen this issue since 5.15 kernel. Now, we can see this on 6.6 kernel which is the newest kernel we are running.
On Wed, Dec 4, 2024 at 4:35 AM Youngmin Nam <youngmin.nam@samsung.com> wrote: > > On Tue, Dec 03, 2024 at 06:18:39PM -0800, Jakub Kicinski wrote: > > On Tue, 3 Dec 2024 10:34:46 -0500 Neal Cardwell wrote: > > > > I have not seen these warnings firing. Neal, have you seen this in the past ? > > > > > > I can't recall seeing these warnings over the past 5 years or so, and > > > (from checking our monitoring) they don't seem to be firing in our > > > fleet recently. > > > > FWIW I see this at Meta on 5.12 kernels, but nothing since. > > Could be that one of our workloads is pinned to 5.12. > > Youngmin, what's the newest kernel you can repro this on? > > > Hi Jakub. > Thank you for taking an interest in this issue. > > We've seen this issue since 5.15 kernel. > Now, we can see this on 6.6 kernel which is the newest kernel we are running. The fact that we are processing ACK packets after the write queue has been purged would be a serious bug. Thus the WARN() makes sense to us. It would be easy to build a packetdrill test. Please do so, then we can fix the root cause. Thank you !
On Wed, Dec 4, 2024 at 4:14 PM Eric Dumazet wrote: > To: Youngmin Nam <youngmin.nam@samsung.com> > Cc: Jakub Kicinski <kuba@kernel.org>; Neal Cardwell <ncardwell@google.com>; > davem@davemloft.net; dsahern@kernel.org; pabeni@redhat.com; > horms@kernel.org; dujeong.lee@samsung.com; guo88.liu@samsung.com; > yiwang.cai@samsung.com; netdev@vger.kernel.org; linux- > kernel@vger.kernel.org; joonki.min@samsung.com; hajun.sung@samsung.com; > d7271.choe@samsung.com; sw.ju@samsung.com > Subject: Re: [PATCH] tcp: check socket state before calling WARN_ON > > On Wed, Dec 4, 2024 at 4:35 AM Youngmin Nam <youngmin.nam@samsung.com> > wrote: > > > > On Tue, Dec 03, 2024 at 06:18:39PM -0800, Jakub Kicinski wrote: > > > On Tue, 3 Dec 2024 10:34:46 -0500 Neal Cardwell wrote: > > > > > I have not seen these warnings firing. Neal, have you seen this in > the past ? > > > > > > > > I can't recall seeing these warnings over the past 5 years or so, > > > > and (from checking our monitoring) they don't seem to be firing in > > > > our fleet recently. > > > > > > FWIW I see this at Meta on 5.12 kernels, but nothing since. > > > Could be that one of our workloads is pinned to 5.12. > > > Youngmin, what's the newest kernel you can repro this on? > > > > > Hi Jakub. > > Thank you for taking an interest in this issue. > > > > We've seen this issue since 5.15 kernel. > > Now, we can see this on 6.6 kernel which is the newest kernel we are > running. > > The fact that we are processing ACK packets after the write queue has been > purged would be a serious bug. > > Thus the WARN() makes sense to us. > > It would be easy to build a packetdrill test. Please do so, then we can > fix the root cause. > > Thank you ! Please let me share some more details and clarifications on the issue from ramdump snapshot locally secured. 1) This issue has been reported from Android-T linux kernel when we enabled panic_on_warn for the first time. Reproduction rate is not high and can be seen in any test cases with public internet connection. 2) Analysis from ramdump (which is not available at the moment). 2-A) From ramdump, I was able to find below values. tp->packets_out = 0 tp->retrans_out = 1 tp->max_packets_out = 1 tp->max_packets_Seq = 1575830358 tp->snd_ssthresh = 5 tp->snd_cwnd = 1 tp->prior_cwnd = 10 tp->wite_seq = 1575830359 tp->pushed_seq = 1575830358 tp->lost_out = 1 tp->sacked_out = 0 2-B) The last Tx packet from the device is (Time: 17371.562934) Hex: 4500005b95a3400040063e34c0a848188efacf0a888a01bb5ded432f5ad8ab29801800495b5800000101080a3a52197fef299d901703030022f3589123b0523bdd07be137a98ca9b5d3475332d4382c7b420571e6d437a07ba7787 Internet Protocol Version 4 0100 .... = Version: 4 .... 0101 = Header Length: 20 bytes (5) Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT) Total Length: 91 Identification: 0x95a3 (38307) 010. .... = Flags: 0x2, Don't fragment ...0 0000 0000 0000 = Fragment Offset: 0 Time to Live: 64 Protocol: TCP (6) Header Checksum: 0x3e34 Header checksum status: Unverified Source Address: 192.168.72.24 Destination Address: 142.250.207.10 Stream index: 0 Transmission Control Protocol Source Port: 34954 Destination Port: 443 Stream index: 0 Conversation completeness: Incomplete (0) TCP Segment Len: 39 Sequence Number: 0x5ded432f Sequence Number (raw): 1575830319 Next Sequence Number: 40 Acknowledgment Number: 0x5ad8ab29 Acknowledgment number (raw): 1524149033 1000 .... = Header Length: 32 bytes (8) Flags: 0x018 (PSH, ACK) Window: 73 Calculated window size: 73 Window size scaling factor: -1 (unknown) Checksum: 0x5b58 Checksum Status: Unverified Urgent Pointer: 0 Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps Timestamps SEQ/ACK analysis TCP payload (39 bytes) Transport Layer Security TLSv1.2 Record Layer: Application Data Protocol: Hypertext Transfer Protocol 2-C) When warn hit, DUT was processing (Time: 17399.502603, 28 seconds later since last Tx) Hex: 456000405FA20000720681F08EFACF0AC0A8481801BB888A5AD8AB295DED4356B010010D93D800000101080AEF299EF43A52089F0101050A5DED432F5DED4356 Internet Protocol Version 4 0100 .... = Version: 4 .... 0101 = Header Length: 20 bytes (5) Differentiated Services Field: 0x60 (DSCP: CS3, ECN: Not-ECT) Total Length: 64 Identification: 0x5fa2 (24482) 000. .... = Flags: 0x0 ...0 0000 0000 0000 = Fragment Offset: 0 Time to Live: 114 Protocol: TCP (6) Header Checksum: 0x81f0 Header checksum status: Unverified Source Address: 142.250.207.10 Destination Address: 192.168.72.24 Stream index: 0 Transmission Control Protocol Source Port: 443 Destination Port: 34954 Stream index: 0 Conversation completeness: Incomplete (0) TCP Segment Len: 0 Sequence Number: 0x5ad8ab29 Sequence Number (raw): 1524149033 Next Sequence Number: 1 Acknowledgment Number: 0x5ded4356 Acknowledgment number (raw): 1575830358 1011 .... = Header Length: 44 bytes (11) Flags: 0x010 (ACK) Window: 269 Calculated window size: 269 Window size scaling factor: -1 (unknown) Checksum: 0x93d8 Checksum Status: Unverified Urgent Pointer: 0 Options: (24 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps, No-Operation (NOP), No-Operation (NOP), SACK Timestamps 2-D) The DUT received ack after 28 seconds from Access Point. 3)Clarification on "tcp_write_queue_purge" claim This is just my conjecture based on ramdump snapshot and it is not shown in calltrace. Based on tcp status in snapshot I thought tcp_write_queue_purge was called and packets_out was cleared. 4) In our kernel "/proc/sys/net/ipv4/tcp_mtu_probing" is set to 0.
On Wed, Dec 4, 2024 at 4:22 AM Youngmin Nam <youngmin.nam@samsung.com> wrote: > > On Tue, Dec 03, 2024 at 10:34:46AM -0500, Neal Cardwell wrote: > > On Tue, Dec 3, 2024 at 6:07 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Tue, Dec 3, 2024 at 9:10 AM Youngmin Nam <youngmin.nam@samsung.com> wrote: > > > > > > > > We encountered the following WARNINGs > > > > in tcp_sacktag_write_queue()/tcp_fastretrans_alert() > > > > which triggered a kernel panic due to panic_on_warn. > > > > > > > > case 1. > > > > ------------[ cut here ]------------ > > > > WARNING: CPU: 4 PID: 453 at net/ipv4/tcp_input.c:2026 > > > > Call trace: > > > > tcp_sacktag_write_queue+0xae8/0xb60 > > > > tcp_ack+0x4ec/0x12b8 > > > > tcp_rcv_state_process+0x22c/0xd38 > > > > tcp_v4_do_rcv+0x220/0x300 > > > > tcp_v4_rcv+0xa5c/0xbb4 > > > > ip_protocol_deliver_rcu+0x198/0x34c > > > > ip_local_deliver_finish+0x94/0xc4 > > > > ip_local_deliver+0x74/0x10c > > > > ip_rcv+0xa0/0x13c > > > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > > > > > case 2. > > > > ------------[ cut here ]------------ > > > > WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004 > > > > Call trace: > > > > tcp_fastretrans_alert+0x8ac/0xa74 > > > > tcp_ack+0x904/0x12b8 > > > > tcp_rcv_state_process+0x22c/0xd38 > > > > tcp_v4_do_rcv+0x220/0x300 > > > > tcp_v4_rcv+0xa5c/0xbb4 > > > > ip_protocol_deliver_rcu+0x198/0x34c > > > > ip_local_deliver_finish+0x94/0xc4 > > > > ip_local_deliver+0x74/0x10c > > > > ip_rcv+0xa0/0x13c > > > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > > > > > > > I have not seen these warnings firing. Neal, have you seen this in the past ? > > > > I can't recall seeing these warnings over the past 5 years or so, and > > (from checking our monitoring) they don't seem to be firing in our > > fleet recently. > > > > > In any case this test on sk_state is too specific. > > > > I agree with Eric. IMHO TCP_FIN_WAIT1 deserves all the same warnings > > as ESTABLISHED, since in this state the connection may still have a > > big queue of data it is trying to reliably send to the other side, > > with full loss recovery and congestion control logic. > Yes I agree with Eric as well. > > > > > I would suggest that instead of running with panic_on_warn it would > > make more sense to not panic on warning, and instead add more detail > > to these warning messages in your kernels during your testing, to help > > debug what is going wrong. I would suggest adding to the warning > > message: > > > > tp->packets_out > > tp->sacked_out > > tp->lost_out > > tp->retrans_out > > tcp_is_sack(tp) > > tp->mss_cache > > inet_csk(sk)->icsk_ca_state > > inet_csk(sk)->icsk_pmtu_cookie > > Hi Neal. > Thanks for your opinion. > > By the way, we enable panic_on_warn by default for stability. > As you know, panic_on_warn is not applied to a specific subsystem but to the entire kernel. > We just want to avoid the kernel panic. > > So when I see below lwn article, I think we might use pr_warn() instaed of WARN_ON(). > https://lwn.net/Articles/969923/ > > How do you think of it ? You want to silence a WARN_ON() because you chose to make all WARN_ON() fatal. We want something to be able to fix real bugs, because we really care of TCP being correct. We have these discussions all the time. https://lwn.net/Articles/969923/ is a good summary. It makes sense for debug kernels (for instance used by syzkaller or other fuzzers) to panic, but for production this is a high risk, there is a reason panic_on_warn is not set by default. If we use a soft print there like pr_warn(), no future bug will be caught. What next : add a new sysctl to panic whenever a pr_warn() is hit by syzkaller ? Then Android will set this sysctl "because of stability concerns" > > > > A hunch would be that this is either firing for (a) non-SACK > > connections, or (b) after an MTU reduction. > > > > In particular, you might try `echo 0 > > > /proc/sys/net/ipv4/tcp_mtu_probing` and see if that makes the warnings > > go away. > > > > cheers, > > neal > >
On Wed, Dec 4, 2024 at 4:05 AM Youngmin Nam <youngmin.nam@samsung.com> wrote: > > Hi Eric. > Thanks for looking at this issue. > > On Tue, Dec 03, 2024 at 12:07:05PM +0100, Eric Dumazet wrote: > > On Tue, Dec 3, 2024 at 9:10 AM Youngmin Nam <youngmin.nam@samsung.com> wrote: > > > > > > We encountered the following WARNINGs > > > in tcp_sacktag_write_queue()/tcp_fastretrans_alert() > > > which triggered a kernel panic due to panic_on_warn. > > > > > > case 1. > > > ------------[ cut here ]------------ > > > WARNING: CPU: 4 PID: 453 at net/ipv4/tcp_input.c:2026 > > > Call trace: > > > tcp_sacktag_write_queue+0xae8/0xb60 > > > tcp_ack+0x4ec/0x12b8 > > > tcp_rcv_state_process+0x22c/0xd38 > > > tcp_v4_do_rcv+0x220/0x300 > > > tcp_v4_rcv+0xa5c/0xbb4 > > > ip_protocol_deliver_rcu+0x198/0x34c > > > ip_local_deliver_finish+0x94/0xc4 > > > ip_local_deliver+0x74/0x10c > > > ip_rcv+0xa0/0x13c > > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > > > case 2. > > > ------------[ cut here ]------------ > > > WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004 > > > Call trace: > > > tcp_fastretrans_alert+0x8ac/0xa74 > > > tcp_ack+0x904/0x12b8 > > > tcp_rcv_state_process+0x22c/0xd38 > > > tcp_v4_do_rcv+0x220/0x300 > > > tcp_v4_rcv+0xa5c/0xbb4 > > > ip_protocol_deliver_rcu+0x198/0x34c > > > ip_local_deliver_finish+0x94/0xc4 > > > ip_local_deliver+0x74/0x10c > > > ip_rcv+0xa0/0x13c > > > Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > > > > I have not seen these warnings firing. Neal, have you seen this in the past ? > > > > Please provide the kernel version (this must be a pristine LTS one). > We are running Android kernel for Android mobile device which is based on LTS kernel 6.6-30. > But we've seen this issue since kernel 5.15 LTS. > > > and symbolized stack traces using scripts/decode_stacktrace.sh > Unfortunately, we don't have the matched vmlinux right now. So we need to rebuild and reproduce. > > > > If this warning was easy to trigger, please provide a packetdrill test ? > I'm not sure if we can run packetdrill test on Android device. Anyway let me check. > > FYI, Here are more detailed logs. > > Case 1. > [26496.422651]I[4: napi/wlan0-33: 467] ------------[ cut here ]------------ > [26496.422665]I[4: napi/wlan0-33: 467] WARNING: CPU: 4 PID: 467 at net/ipv4/tcp_input.c:2026 tcp_sacktag_write_queue+0xae8/0xb60 > [26496.423420]I[4: napi/wlan0-33: 467] CPU: 4 PID: 467 Comm: napi/wlan0-33 Tainted: G S OE 6.6.30-android15-8-geeceb2c9cdf1-ab20240930.125201-4k #1 a1c80b36942fa6e9575b2544032a7536ed502804 > [26496.423427]I[4: napi/wlan0-33: 467] Hardware name: Samsung ERD9955 board based on S5E9955 (DT) > [26496.423432]I[4: napi/wlan0-33: 467] pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) > [26496.423438]I[4: napi/wlan0-33: 467] pc : tcp_sacktag_write_queue+0xae8/0xb60 > [26496.423446]I[4: napi/wlan0-33: 467] lr : tcp_ack+0x4ec/0x12b8 > [26496.423455]I[4: napi/wlan0-33: 467] sp : ffffffc096b8b690 > [26496.423458]I[4: napi/wlan0-33: 467] x29: ffffffc096b8b710 x28: 0000000000008001 x27: 000000005526d635 > [26496.423469]I[4: napi/wlan0-33: 467] x26: ffffff8a19079684 x25: 000000005526dbfd x24: 0000000000000001 > [26496.423480]I[4: napi/wlan0-33: 467] x23: 000000000000000a x22: ffffff88e5f5b680 x21: 000000005526dbc9 > [26496.423489]I[4: napi/wlan0-33: 467] x20: ffffff8a19078d80 x19: ffffff88e9f4193e x18: ffffffd083114c80 > [26496.423499]I[4: napi/wlan0-33: 467] x17: 00000000529c6ef0 x16: 000000000000ff8b x15: 0000000000000000 > [26496.423508]I[4: napi/wlan0-33: 467] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000000 > [26496.423517]I[4: napi/wlan0-33: 467] x11: 0000000000000000 x10: 0000000000000001 x9 : 00000000fffffffd > [26496.423526]I[4: napi/wlan0-33: 467] x8 : 0000000000000001 x7 : 0000000000000000 x6 : ffffffd081ec0bc4 > [26496.423536]I[4: napi/wlan0-33: 467] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffffc096b8b818 > [26496.423545]I[4: napi/wlan0-33: 467] x2 : 000000005526d635 x1 : ffffff88e5f5b680 x0 : ffffff8a19078d80 > [26496.423555]I[4: napi/wlan0-33: 467] Call trace: > [26496.423558]I[4: napi/wlan0-33: 467] tcp_sacktag_write_queue+0xae8/0xb60 > [26496.423566]I[4: napi/wlan0-33: 467] tcp_ack+0x4ec/0x12b8 > [26496.423573]I[4: napi/wlan0-33: 467] tcp_rcv_state_process+0x22c/0xd38 > [26496.423580]I[4: napi/wlan0-33: 467] tcp_v4_do_rcv+0x220/0x300 > [26496.423590]I[4: napi/wlan0-33: 467] tcp_v4_rcv+0xa5c/0xbb4 > [26496.423596]I[4: napi/wlan0-33: 467] ip_protocol_deliver_rcu+0x198/0x34c > [26496.423607]I[4: napi/wlan0-33: 467] ip_local_deliver_finish+0x94/0xc4 > [26496.423614]I[4: napi/wlan0-33: 467] ip_local_deliver+0x74/0x10c > [26496.423620]I[4: napi/wlan0-33: 467] ip_rcv+0xa0/0x13c > [26496.423625]I[4: napi/wlan0-33: 467] __netif_receive_skb_core+0xe14/0x1104 > [26496.423642]I[4: napi/wlan0-33: 467] __netif_receive_skb_list_core+0xb8/0x2dc > [26496.423649]I[4: napi/wlan0-33: 467] netif_receive_skb_list_internal+0x234/0x320 > [26496.423655]I[4: napi/wlan0-33: 467] napi_complete_done+0xb4/0x1a0 > [26496.423660]I[4: napi/wlan0-33: 467] slsi_rx_netif_napi_poll+0x22c/0x258 [scsc_wlan 16ac2100e65b7c78ce863cecc238b39b162dbe82] > [26496.423822]I[4: napi/wlan0-33: 467] __napi_poll+0x5c/0x238 > [26496.423829]I[4: napi/wlan0-33: 467] napi_threaded_poll+0x110/0x204 > [26496.423836]I[4: napi/wlan0-33: 467] kthread+0x114/0x138 > [26496.423845]I[4: napi/wlan0-33: 467] ret_from_fork+0x10/0x20 > [26496.423856]I[4: napi/wlan0-33: 467] Kernel panic - not syncing: kernel: panic_on_warn set .. > > Case 2. > [ 1843.463330]I[0: surfaceflinger: 648] ------------[ cut here ]------------ > [ 1843.463355]I[0: surfaceflinger: 648] WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004 tcp_fastretrans_alert+0x8ac/0xa74 > [ 1843.464508]I[0: surfaceflinger: 648] CPU: 0 PID: 648 Comm: surfaceflinger Tainted: G S OE 6.6.30-android15-8-geeceb2c9cdf1-ab20241017.075836-4k #1 de751202c2c5ab3ec352a00ae470fc5e907bdcfe > [ 1843.464520]I[0: surfaceflinger: 648] Hardware name: Samsung ERD8855 board based on S5E8855 (DT) > [ 1843.464527]I[0: surfaceflinger: 648] pstate: 23400005 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) > [ 1843.464535]I[0: surfaceflinger: 648] pc : tcp_fastretrans_alert+0x8ac/0xa74 > [ 1843.464548]I[0: surfaceflinger: 648] lr : tcp_ack+0x904/0x12b8 > [ 1843.464556]I[0: surfaceflinger: 648] sp : ffffffc0800036e0 > [ 1843.464561]I[0: surfaceflinger: 648] x29: ffffffc0800036e0 x28: 0000000000008005 x27: 000000001bc05562 > [ 1843.464579]I[0: surfaceflinger: 648] x26: ffffff890418a3c4 x25: 0000000000000000 x24: 000000000000cd02 > [ 1843.464595]I[0: surfaceflinger: 648] x23: 000000001bc05562 x22: 0000000000000000 x21: ffffffc080003800 > [ 1843.464611]I[0: surfaceflinger: 648] x20: ffffffc08000378c x19: ffffff8904189ac0 x18: 0000000000000000 > [ 1843.464627]I[0: surfaceflinger: 648] x17: 00000000529c6ef0 x16: 000000000000ff8b x15: 0000000000000001 > [ 1843.464642]I[0: surfaceflinger: 648] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000000 > [ 1843.464658]I[0: surfaceflinger: 648] x11: ffffff883e9c9540 x10: 0000000000000001 x9 : 0000000000000001 > [ 1843.464673]I[0: surfaceflinger: 648] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffffffd081ec0bc4 > [ 1843.464689]I[0: surfaceflinger: 648] x5 : 0000000000000000 x4 : ffffffc08000378c x3 : ffffffc080003800 > [ 1843.464704]I[0: surfaceflinger: 648] x2 : 0000000000000000 x1 : 000000001bc05562 x0 : ffffff8904189ac0 > [ 1843.464720]I[0: surfaceflinger: 648] Call trace: > [ 1843.464725]I[0: surfaceflinger: 648] tcp_fastretrans_alert+0x8ac/0xa74 > [ 1843.464735]I[0: surfaceflinger: 648] tcp_ack+0x904/0x12b8 > [ 1843.464743]I[0: surfaceflinger: 648] tcp_rcv_state_process+0x22c/0xd38 > [ 1843.464751]I[0: surfaceflinger: 648] tcp_v4_do_rcv+0x220/0x300 > [ 1843.464760]I[0: surfaceflinger: 648] tcp_v4_rcv+0xa5c/0xbb4 > [ 1843.464767]I[0: surfaceflinger: 648] ip_protocol_deliver_rcu+0x198/0x34c > [ 1843.464776]I[0: surfaceflinger: 648] ip_local_deliver_finish+0x94/0xc4 > [ 1843.464784]I[0: surfaceflinger: 648] ip_local_deliver+0x74/0x10c > [ 1843.464791]I[0: surfaceflinger: 648] ip_rcv+0xa0/0x13c > [ 1843.464799]I[0: surfaceflinger: 648] __netif_receive_skb_core+0xe14/0x1104 > [ 1843.464810]I[0: surfaceflinger: 648] __netif_receive_skb+0x40/0x124 > [ 1843.464818]I[0: surfaceflinger: 648] netif_receive_skb+0x7c/0x234 > [ 1843.464825]I[0: surfaceflinger: 648] slsi_rx_data_deliver_skb+0x1e0/0xdbc [scsc_wlan 12b378a8d5cf5e6cd833136fc49079d43751bd28] > [ 1843.465025]I[0: surfaceflinger: 648] slsi_ba_process_complete+0x70/0xa4 [scsc_wlan 12b378a8d5cf5e6cd833136fc49079d43751bd28] > [ 1843.465219]I[0: surfaceflinger: 648] slsi_ba_aging_timeout_handler+0x324/0x354 [scsc_wlan 12b378a8d5cf5e6cd833136fc49079d43751bd28] > [ 1843.465410]I[0: surfaceflinger: 648] call_timer_fn+0xd0/0x360 > [ 1843.465423]I[0: surfaceflinger: 648] __run_timers+0x1b4/0x268 > [ 1843.465432]I[0: surfaceflinger: 648] run_timer_softirq+0x24/0x4c > [ 1843.465440]I[0: surfaceflinger: 648] __do_softirq+0x158/0x45c > [ 1843.465448]I[0: surfaceflinger: 648] ____do_softirq+0x10/0x20 > [ 1843.465455]I[0: surfaceflinger: 648] call_on_irq_stack+0x3c/0x74 > [ 1843.465463]I[0: surfaceflinger: 648] do_softirq_own_stack+0x1c/0x2c > [ 1843.465470]I[0: surfaceflinger: 648] __irq_exit_rcu+0x54/0xb4 > [ 1843.465480]I[0: surfaceflinger: 648] irq_exit_rcu+0x10/0x1c > [ 1843.465489]I[0: surfaceflinger: 648] el0_interrupt+0x54/0xe0 > [ 1843.465499]I[0: surfaceflinger: 648] __el0_irq_handler_common+0x18/0x28 > [ 1843.465508]I[0: surfaceflinger: 648] el0t_64_irq_handler+0x10/0x1c > [ 1843.465516]I[0: surfaceflinger: 648] el0t_64_irq+0x1a8/0x1ac > [ 1843.465525]I[0: surfaceflinger: 648] Kernel panic - not syncing: kernel: panic_on_warn set ... > > > > When we check the socket state value at the time of the issue, > > > it was 0x4. > > > > > > skc_state = 0x4, > > > > > > This is "TCP_FIN_WAIT1" and which means the device closed its socket. > > > > > > enum { > > > TCP_ESTABLISHED = 1, > > > TCP_SYN_SENT, > > > TCP_SYN_RECV, > > > TCP_FIN_WAIT1, > > > > > > And also this means tp->packets_out was initialized as 0 > > > by tcp_write_queue_purge(). > > > > What stack trace leads to this tcp_write_queue_purge() exactly ? > I couldn't find the exact call stack to this. > But I just thought that the function would be called based on ramdump snapshot. > > (*(struct tcp_sock *)(0xFFFFFF800467AB00)).packets_out = 0 > (*(struct inet_connection_sock *)0xFFFFFF800467AB00).icsk_backoff = 0 TCP_FIN_WAIT1 is set whenever the application does a shutdown(fd, SHUT_WR); This means that all bytes in the send queue and retransmit queue should be kept, and will eventually be sent. tcp_write_queue_purge() must not be called until we receive some valid RST packet or fatal timeout. 6.6.30 is old, LTS 6.6.63 has some TCP changes that might br related. $ git log --oneline v6.6.30..v6.6.63 -- net/ipv4/tcp*c 229dfdc36f31a8d47433438bc0e6e1662c4ab404 tcp: fix mptcp DSS corruption due to large pmtu xmit 2daffbd861de532172079dceef5c0f36a26eee14 tcp: fix TFO SYN_RECV to not zero retrans_stamp with retransmits out 718c49f840ef4e451bf44a8a62aae89ebdd5a687 tcp: new TCP_INFO stats for RTO events 04dce9a120502aea4ca66eebf501f404a22cd493 tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe e676ca60ad2a6fdeb718b5e7a337a8fb1591d45f tcp: fix to allow timestamp undo if no retransmits were sent 5cce1c07bf8972d3ab94c25aa9fb6170f99082e0 tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process 4fe707a2978929b498d3730d77a6ab103881420d tcp: process the 3rd ACK with sk_socket for TFO/MPTCP 9fd29738377c10749cb292510ebc202988ea0a31 tcp: Don't drop SYN+ACK for simultaneous connect(). c8219a27fa43a2cbf99f5176f6dddfe73e7a24ae tcp_bpf: fix return value of tcp_bpf_sendmsg() 69f397e60c3be615c32142682d62fc0b6d5d5d67 net: remove NULL-pointer net parameter in ip_metrics_convert f0974e6bc385f0e53034af17abbb86ac0246ef1c tcp: do not export tcp_twsk_purge() 99580ae890ec8bd98b21a2a9c6668f8f1555b62e tcp: prevent concurrent execution of tcp_sk_exit_batch 7348061662c7149b81e38e545d5dd6bd427bec81 tcp/dccp: do not care about families in inet_twsk_purge() 227355ad4e4a6da5435451b3cc7f3ed9091288fa tcp: Update window clamping condition 77100f2e8412dbb84b3e7f1b947c9531cb509492 tcp_metrics: optimize tcp_metrics_flush_all() 6772c4868a8e7ad5305957cdb834ce881793acb7 net: drop bad gso csum_start and offset in virtio_net_hdr 1cfdc250b3d210acd5a4a47337b654e04693cf7f tcp: Adjust clamping window for applications specifying SO_RCVBUF f9fef23a81db9adc1773979fabf921eba679d5e7 tcp: annotate data-races around tp->window_clamp 44aa1e461ccd1c2e49cffad5e75e1b944ec590ef tcp: fix races in tcp_v[46]_err() bc4f9c2ccd68afec3a8395d08fd329af2022c7e8 tcp: fix race in tcp_write_err() ecc6836d63513fb4857a7525608d7fdd0c837cb3 tcp: add tcp_done_with_error() helper dfcdd7f89e401d2c6616be90c76c2fac3fa98fde tcp: avoid too many retransmit packets b75f281bddebdcf363884f0d53c562366e9ead99 tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() 124886cf20599024eb33608a2c3608b7abf3839b tcp: fix incorrect undo caused by DSACK of TLP retransmit 8c2debdd170e395934ac0e039748576dfde14e99 tcp_metrics: validate source addr length 8a7fc2362d6d234befde681ea4fb6c45c1789ed5 UPSTREAM: tcp: fix DSACK undo in fast recovery to call tcp_try_to_open() b4b26d23a1e2bc188cec8592e111d68d83b9031f tcp: fix tcp_rcv_fastopen_synack() to enter TCP_CA_Loss for failed TFO fdae4d139f4778b20a40c60705c53f5f146459b5 Fix race for duplicate reqsk on identical SYN 250fad18b0c959b137ad745428fb411f1ac1bbc6 tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack() acdf17546ef8ee73c94e442e3f4b933e42c3dfac tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB 50569d12945f86fa4b321c4b1c3005874dbaa0f1 net: tls: fix marking packets as decrypted 02261d3f9dc7d1d7be7d778f839e3404ab99034c tcp: Fix shift-out-of-bounds in dctcp_update_alpha(). 00bb933578acd88395bf6e770cacdbe2d6a0be86 tcp: avoid premature drops in tcp_add_backlog() 6e48faad92be13166184d21506e4e54c79c13adc tcp: Use refcount_inc_not_zero() in tcp_twsk_unique(). f47d0d32fa94e815fdd78b8b88684873e67939f4 tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets
On Wed, Dec 4, 2024 at 2:48 AM Dujeong.lee <dujeong.lee@samsung.com> wrote: > > On Wed, Dec 4, 2024 at 4:14 PM Eric Dumazet wrote: > > To: Youngmin Nam <youngmin.nam@samsung.com> > > Cc: Jakub Kicinski <kuba@kernel.org>; Neal Cardwell <ncardwell@google.com>; > > davem@davemloft.net; dsahern@kernel.org; pabeni@redhat.com; > > horms@kernel.org; dujeong.lee@samsung.com; guo88.liu@samsung.com; > > yiwang.cai@samsung.com; netdev@vger.kernel.org; linux- > > kernel@vger.kernel.org; joonki.min@samsung.com; hajun.sung@samsung.com; > > d7271.choe@samsung.com; sw.ju@samsung.com > > Subject: Re: [PATCH] tcp: check socket state before calling WARN_ON > > > > On Wed, Dec 4, 2024 at 4:35 AM Youngmin Nam <youngmin.nam@samsung.com> > > wrote: > > > > > > On Tue, Dec 03, 2024 at 06:18:39PM -0800, Jakub Kicinski wrote: > > > > On Tue, 3 Dec 2024 10:34:46 -0500 Neal Cardwell wrote: > > > > > > I have not seen these warnings firing. Neal, have you seen this in > > the past ? > > > > > > > > > > I can't recall seeing these warnings over the past 5 years or so, > > > > > and (from checking our monitoring) they don't seem to be firing in > > > > > our fleet recently. > > > > > > > > FWIW I see this at Meta on 5.12 kernels, but nothing since. > > > > Could be that one of our workloads is pinned to 5.12. > > > > Youngmin, what's the newest kernel you can repro this on? > > > > > > > Hi Jakub. > > > Thank you for taking an interest in this issue. > > > > > > We've seen this issue since 5.15 kernel. > > > Now, we can see this on 6.6 kernel which is the newest kernel we are > > running. > > > > The fact that we are processing ACK packets after the write queue has been > > purged would be a serious bug. > > > > Thus the WARN() makes sense to us. > > > > It would be easy to build a packetdrill test. Please do so, then we can > > fix the root cause. > > > > Thank you ! > > > Please let me share some more details and clarifications on the issue from ramdump snapshot locally secured. > > 1) This issue has been reported from Android-T linux kernel when we enabled panic_on_warn for the first time. > Reproduction rate is not high and can be seen in any test cases with public internet connection. > > 2) Analysis from ramdump (which is not available at the moment). > 2-A) From ramdump, I was able to find below values. > tp->packets_out = 0 > tp->retrans_out = 1 > tp->max_packets_out = 1 > tp->max_packets_Seq = 1575830358 > tp->snd_ssthresh = 5 > tp->snd_cwnd = 1 > tp->prior_cwnd = 10 > tp->wite_seq = 1575830359 > tp->pushed_seq = 1575830358 > tp->lost_out = 1 > tp->sacked_out = 0 Thanks for all the details! If the ramdump becomes available again at some point, would it be possible to pull out the following values as well: tp->mss_cache inet_csk(sk)->icsk_pmtu_cookie inet_csk(sk)->icsk_ca_state Thanks, neal
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 5bdf13ac26ef..62f4c285ab80 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2037,7 +2037,8 @@ tcp_sacktag_write_queue(struct sock *sk, const struct sk_buff *ack_skb, WARN_ON((int)tp->sacked_out < 0); WARN_ON((int)tp->lost_out < 0); WARN_ON((int)tp->retrans_out < 0); - WARN_ON((int)tcp_packets_in_flight(tp) < 0); + if (sk->sk_state == TCP_ESTABLISHED) + WARN_ON((int)tcp_packets_in_flight(tp) < 0); #endif return state->flag; } @@ -3080,7 +3081,8 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una, return; /* C. Check consistency of the current state. */ - tcp_verify_left_out(tp); + if (sk->sk_state == TCP_ESTABLISHED) + tcp_verify_left_out(tp); /* D. Check state exit conditions. State can be terminated * when high_seq is ACKed. */
We encountered the following WARNINGs in tcp_sacktag_write_queue()/tcp_fastretrans_alert() which triggered a kernel panic due to panic_on_warn. case 1. ------------[ cut here ]------------ WARNING: CPU: 4 PID: 453 at net/ipv4/tcp_input.c:2026 Call trace: tcp_sacktag_write_queue+0xae8/0xb60 tcp_ack+0x4ec/0x12b8 tcp_rcv_state_process+0x22c/0xd38 tcp_v4_do_rcv+0x220/0x300 tcp_v4_rcv+0xa5c/0xbb4 ip_protocol_deliver_rcu+0x198/0x34c ip_local_deliver_finish+0x94/0xc4 ip_local_deliver+0x74/0x10c ip_rcv+0xa0/0x13c Kernel panic - not syncing: kernel: panic_on_warn set ... case 2. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 648 at net/ipv4/tcp_input.c:3004 Call trace: tcp_fastretrans_alert+0x8ac/0xa74 tcp_ack+0x904/0x12b8 tcp_rcv_state_process+0x22c/0xd38 tcp_v4_do_rcv+0x220/0x300 tcp_v4_rcv+0xa5c/0xbb4 ip_protocol_deliver_rcu+0x198/0x34c ip_local_deliver_finish+0x94/0xc4 ip_local_deliver+0x74/0x10c ip_rcv+0xa0/0x13c Kernel panic - not syncing: kernel: panic_on_warn set ... When we check the socket state value at the time of the issue, it was 0x4. skc_state = 0x4, This is "TCP_FIN_WAIT1" and which means the device closed its socket. enum { TCP_ESTABLISHED = 1, TCP_SYN_SENT, TCP_SYN_RECV, TCP_FIN_WAIT1, And also this means tp->packets_out was initialized as 0 by tcp_write_queue_purge(). In a congested network situation, a TCP ACK for an already closed session may be received with a delay from the peer. This can trigger the WARN_ON macro to help debug the situation. To make this situation more meaningful, we would like to call WARN_ON only when the state of the socket is "TCP_ESTABLISHED". This will prevent the kernel from triggering a panic due to panic_on_warn. Signed-off-by: Youngmin Nam <youngmin.nam@samsung.com> --- net/ipv4/tcp_input.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)