Message ID | 20210316223647.4080796-1-weiwan@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | cb038357937ee4f589aab2469ec3896dce90f317 |
Delegated to: | Netdev Maintainers |
Series | [net,v4] net: fix race between napi kthread mode and busy poll |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 5 maintainers not CCed: daniel@iogearbox.net andriin@fb.com cong.wang@bytedance.com ast@kernel.org ap420073@gmail.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 6938 this patch: 6938 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | WARNING: line length of 81 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 7152 this patch: 7152 |
netdev/header_inline | success | Link |
On Tue, 16 Mar 2021 15:36:47 -0700 Wei Wang wrote:
> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> determine if the kthread owns this napi and could call napi->poll() on
> it. However, if socket busy poll is enabled, it is possible that the
> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> on the same napi. napi_disable() could grab the SCHED bit as well.
> This patch tries to fix this race by adding a new bit
> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> in napi_complete_done(), and we only poll the napi in kthread if this
> bit is set. This helps distinguish the ownership of the napi between
> kthread and other scenarios and fixes the race issue.
>
> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> Reported-by: Martin Zaharinov <micron10@gmail.com>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Cc: Alexander Duyck <alexanderduyck@fb.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
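The ownership problem described in the commit message can be modelled in user space. The following is only a sketch under stated assumptions: the `model_*` names and the bit values are hypothetical, C11 atomics stand in for the kernel's bitops, and the model sets the new bit unconditionally, whereas the real patch skips set_bit() when the kthread is already sleeping and relies on the wakeup instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; they are not the kernel's bit numbers. */
#define MODEL_SCHED          (1u << 0)	/* somebody owns the poll */
#define MODEL_SCHED_THREADED (1u << 1)	/* the owner is the napi kthread */

struct model_napi {
	atomic_uint state;
};

/* Busy poll or napi_disable() path: claims SCHED and nothing else. */
static bool model_grab_sched(struct model_napi *n)
{
	unsigned int old;

	assert(n != NULL);
	old = atomic_fetch_or(&n->state, MODEL_SCHED);
	return !(old & MODEL_SCHED);
}

/* Threaded schedule path: claims SCHED, then marks the kthread as owner. */
static bool model_schedule_threaded(struct model_napi *n)
{
	if (!model_grab_sched(n))
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Completion clears both bits, as the patch does in napi_complete_done(). */
static void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Pre-patch test in the kthread: satisfied by any SCHED owner. */
static bool model_old_kthread_check(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Post-patch test in the kthread: true only for kthread ownership. */
static bool model_new_kthread_check(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}
```

If a busy poller calls model_grab_sched() first, model_old_kthread_check() still returns true even though the kthread does not own the napi, which is the race being fixed; model_new_kthread_check() returns false until model_schedule_threaded() succeeds.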
Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Tue, 16 Mar 2021 15:36:47 -0700 you wrote:
> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> determine if the kthread owns this napi and could call napi->poll() on
> it. However, if socket busy poll is enabled, it is possible that the
> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> on the same napi. napi_disable() could grab the SCHED bit as well.
> This patch tries to fix this race by adding a new bit
> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> in napi_complete_done(), and we only poll the napi in kthread if this
> bit is set. This helps distinguish the ownership of the napi between
> kthread and other scenarios and fixes the race issue.
>
> [...]

Here is the summary with links:
  - [net,v4] net: fix race between napi kthread mode and busy poll
    https://git.kernel.org/netdev/net/c/cb038357937e

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
Hi Wei

Check this:

[ 39.706567] ------------[ cut here ]------------
[ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
[ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
[ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
[ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1
[ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
[ 39.706619] Workqueue: events work_for_cpu_fn
[ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
[ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
[ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
[ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
[ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
[ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
[ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
[ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
[ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
[ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0
[ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 39.706656] Call Trace:
[ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e]
[ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e]
[ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193
[ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205
[ 39.706716] ? __kmalloc+0x37/0x160
[ 39.706720] ? kmem_cache_alloc+0xcb/0x120
[ 39.706723] ? irq_get_irq_data+0x5/0x20
[ 39.706726] ? mp_check_pin_attr+0xe/0xf0
[ 39.706729] ? irq_get_irq_data+0x5/0x20
[ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0
[ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150
[ 39.706739] ? pci_conf1_read+0x9f/0xf0
[ 39.706743] ? pci_bus_read_config_word+0x2e/0x40
[ 39.706746] local_pci_probe+0x1b/0x40
[ 39.706750] work_for_cpu_fn+0xb/0x20
[ 39.706754] process_one_work+0x1ec/0x350
[ 39.706758] worker_thread+0x24b/0x4d0
[ 39.706760] ? process_one_work+0x350/0x350
[ 39.706762] kthread+0xea/0x120
[ 39.706766] ? kthread_park+0x80/0x80
[ 39.706770] ret_from_fork+0x1f/0x30
[ 39.706774] ---[ end trace 7a203f3ec972a377 ]---

Martin

> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>
> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> determine if the kthread owns this napi and could call napi->poll() on
> it. However, if socket busy poll is enabled, it is possible that the
> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> on the same napi. napi_disable() could grab the SCHED bit as well.
> This patch tries to fix this race by adding a new bit
> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> in napi_complete_done(), and we only poll the napi in kthread if this
> bit is set. This helps distinguish the ownership of the napi between
> kthread and other scenarios and fixes the race issue.
>
> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> Reported-by: Martin Zaharinov <micron10@gmail.com>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Cc: Alexander Duyck <alexanderduyck@fb.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
> Change since v3:
> - Add READ_ONCE() for thread->state and add comments in
>   ____napi_schedule().
>
>  include/linux/netdevice.h |  2 ++
>  net/core/dev.c            | 19 ++++++++++++++++++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5b67ea89d5f2..87a5d186faff 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -360,6 +360,7 @@ enum {
>  	NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */
>  	NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/
>  	NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/
> +	NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */
>  };
>
>  enum {
> @@ -372,6 +373,7 @@ enum {
>  	NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
>  	NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>  	NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED),
> +	NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED),
>  };
>
>  enum gro_result {
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6c5967e80132..d3195a95f30e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>  		 */
>  		thread = READ_ONCE(napi->thread);
>  		if (thread) {
> +			/* Avoid doing set_bit() if the thread is in
> +			 * INTERRUPTIBLE state, cause napi_thread_wait()
> +			 * makes sure to proceed with napi polling
> +			 * if the thread is explicitly woken from here.
> +			 */
> +			if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
> +				set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>  			wake_up_process(thread);
>  			return;
>  		}
> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>  		WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>
>  		new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
> +			      NAPIF_STATE_SCHED_THREADED |
>  			      NAPIF_STATE_PREFER_BUSY_POLL);
>
>  		/* If STATE_MISSED was set, leave STATE_SCHED set,
> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>
>  static int napi_thread_wait(struct napi_struct *napi)
>  {
> +	bool woken = false;
> +
>  	set_current_state(TASK_INTERRUPTIBLE);
>
>  	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> -		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> +		/* Testing SCHED_THREADED bit here to make sure the current
> +		 * kthread owns this napi and could poll on this napi.
> +		 * Testing SCHED bit is not enough because SCHED bit might be
> +		 * set by some other busy poll thread or by napi_disable().
> +		 */
> +		if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>  			WARN_ON(!list_empty(&napi->poll_list));
>  			__set_current_state(TASK_RUNNING);
>  			return 0;
>  		}
>
>  		schedule();
> +		/* woken being true indicates this thread owns this napi. */
> +		woken = true;
>  		set_current_state(TASK_INTERRUPTIBLE);
>  	}
>  	__set_current_state(TASK_RUNNING);
> --
> 2.31.0.rc2.261.g7f71774620-goog
>
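The napi_complete_done() hunk quoted above adds the new flag to the mask that is cleared in one atomic update. A rough user-space sketch of that clear-with-retry pattern follows; it assumes C11 atomics in place of the kernel's cmpxchg(), and the `M_*` values and the `model_complete_done()` name are hypothetical. Only the "leave SCHED set if MISSED was set" rule is taken from the quoted context; the rescheduling that the kernel performs afterwards is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define M_SCHED            (1u << 0)
#define M_MISSED           (1u << 1)
#define M_PREFER_BUSY_POLL (1u << 2)
#define M_SCHED_THREADED   (1u << 3)

/*
 * Clear the ownership flags in a single atomic step.
 * Returns true if the poll is complete, false if MISSED was set and
 * SCHED therefore stays set so that the napi gets polled again.
 */
static bool model_complete_done(atomic_uint *state)
{
	unsigned int val, new;

	assert(state != NULL);
	val = atomic_load(state);
	do {
		new = val & ~(M_MISSED | M_SCHED | M_SCHED_THREADED |
			      M_PREFER_BUSY_POLL);
		/* If MISSED was set, leave SCHED set. */
		if (val & M_MISSED)
			new |= M_SCHED;
		/* On failure, val is reloaded with the current state. */
	} while (!atomic_compare_exchange_weak(state, &val, new));

	return !(val & M_MISSED);
}
```

Clearing M_SCHED_THREADED in the same compare-and-exchange as M_SCHED means no other context can observe "kthread-owned" after the poll has been released.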
On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Wei
> Check this:
>
> [ 39.706567] ------------[ cut here ]------------
> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100

Probably more relevant to Intel maintainers than Wei :/

> [...]
Hi Eric

Maybe. I wrote to Wei so that he could check it as well.

There is one other problem, which may be one for the network team; I don't know how much it has to do with this.

Mar 20 06:06:28 [367562.703896][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:33 [367567.824137][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:39 [367572.944079][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:44 [367578.064217][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:49 [367583.184378][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:54 [367588.304470][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:06:59 [367593.414634][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:04 [367598.534996][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:09 [367603.664872][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:14 [367608.785017][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:20 [367613.905101][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:25 [367619.025236][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:30 [367624.145448][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:35 [367629.265489][T1217504] team0: Failed to send options change via netlink (err -105)
Mar 20 06:07:40 [367634.385630][T1217504] team0: Failed to send options change via netlink (err -105)

When this happens, it stops connecting to the server.

Martin

> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
>
> Probably more relevant to Intel maintainers than Wei :/
>
> [...]
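The "err -105" in the team0 messages is a negated errno value. In Linux's generic errno numbering (used on x86, among others), 105 is ENOBUFS, "No buffer space available". A minimal sketch for decoding such values follows; `describe_kernel_err()` is a hypothetical helper name, and the numeric mapping is an assumption that holds only on platforms using that numbering.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Kernel code reports failures as negative errno values, e.g. -105.
 * Returns the C library's message for the corresponding positive code.
 */
static const char *describe_kernel_err(int err)
{
	assert(err <= 0);	/* expect the kernel's negative convention */
	return strerror(-err);
}
```

Calling describe_kernel_err(-ENOBUFS) gives the same text that strerror(ENOBUFS) returns. The message alone does not say which buffer ran out, so it narrows the search rather than identifying the cause.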
Hi Eric and Wei

Please check this log:

[1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
[1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1
[1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
[1584289.107263] Call Trace:
[1584289.107266] dump_stack+0x58/0x6b
[1584289.209562] warn_alloc.cold+0x70/0xd4
[1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
[1584289.209574] __alloc_pages_nodemask+0x15a/0x180
[1584289.474009] allocate_slab+0x272/0x450
[1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0
[1584289.519147] kmem_cache_alloc+0x110/0x120
[1584289.541416] build_skb+0x1a/0x200
[1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
[1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe]
[1584289.605528] __napi_poll+0x1f/0x130
[1584289.625842] napi_threaded_poll+0x110/0x160
[1584289.646110] ? __napi_poll+0x130/0x130
[1584289.665810] kthread+0xea/0x120
[1584289.684836] ? kthread_park+0x80/0x80
[1584289.703440] ret_from_fork+0x1f/0x30
[1584289.721616] Mem-Info:
[1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 active_file:17408 inactive_file:149 isolated_file:32 unevictable:1440359 dirty:17500 writeback:0 slab_reclaimable:43368 slab_unreclaimable:155124 mapped:817431 shmem:7650 pagetables:32093 bounce:0 free:17832 free_pcp:113 free_cma:0
[1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no
[1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[1584289.986882] lowmem_reserve[]: 0 1741 15726 15726
[1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB
[1584290.104980] lowmem_reserve[]: 0 0 13985 13985
[1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB
[1584290.237051] lowmem_reserve[]: 0 0 0 0
[1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB
[1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB
[1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB
[1584290.409087] 1465768 total pagecache pages
[1584290.434531] 4165289 pages RAM
[1584290.459616] 0 pages HighMem/MovableOnly
[1584290.484480] 104766 pages reserved
[1584290.508709] 0 pages hwpoisoned
[1584301.710231] team0: Failed to send options change via netlink (err -105)
[1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0
[1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1
[1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
[1584302.776532] Call Trace:
[1584302.799361] dump_stack+0x58/0x6b
[1584302.821791] dump_header+0x4c/0x2e6
[1584302.843580] oom_kill_process.cold+0xb/0x10
[1584302.865223] out_of_memory.part.0+0x125/0x5f0
[1584302.886641] out_of_memory+0x54/0xa0
[1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0
[1584302.927913] __alloc_pages_nodemask+0x15a/0x180
[1584302.947874] __get_free_pages+0x8/0x30
[1584302.967246] pgd_alloc+0x21/0x180
[1584302.986355] mm_alloc+0x1af/0x250
[1584303.005085] alloc_bprm+0x80/0x2a0
[1584303.023328] do_execveat_common+0x8b/0x330
[1584303.041181] __x64_sys_execve+0x2b/0x40
[1584303.058513] do_syscall_64+0x2d/0x40
[1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1584303.091891] RIP: 0033:0x488376
[1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
[1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b
[1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376
[1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660
[1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000
[1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258
[1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100
[1584303.379094] Mem-Info:
[1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0 active_file:12975 inactive_file:168 isolated_file:32 unevictable:909709 dirty:12864 writeback:10 slab_reclaimable:42415 slab_unreclaimable:154783 mapped:39825 shmem:14744 pagetables:26041 bounce:0 free:537002 free_pcp:1813 free_cma:0
[1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no
[1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[1584303.739414] lowmem_reserve[]: 0 1741 15726 15726
[1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB
[1584303.888935] lowmem_reserve[]: 0 0 13985 13985
[1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB
[1584304.036531] lowmem_reserve[]: 0 0 0 0
[1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB
[1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB
[1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB
[1584304.287094] 933871 total pagecache pages
[1584304.312815] 4165289 pages RAM
[1584304.337915] 0 pages HighMem/MovableOnly
[1584304.362522] 104766 pages reserved
[1584304.386516] 0 pages hwpoisoned

> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>
>> Hi Wei
>> Check this:
>>
>> [ 39.706567] ------------[ cut here ]------------
>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557)
>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100
>
> Probably more relevant to Intel maintainers than Wei :/
>
>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas
>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1
>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020
>> [ 39.706619] Workqueue: events work_for_cpu_fn
>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100
>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292
>> [ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff
>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001
>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea
>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008
>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000
>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000
>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 39.706651] CR2: 00000000004d8f40 CR3:
0000002ca140a002 CR4: 00000000001706f0 >> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> [ 39.706656] Call Trace: >> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] >> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] >> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 >> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 >> [ 39.706716] ? __kmalloc+0x37/0x160 >> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 >> [ 39.706723] ? irq_get_irq_data+0x5/0x20 >> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 >> [ 39.706729] ? irq_get_irq_data+0x5/0x20 >> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 >> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 >> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 >> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 >> [ 39.706746] local_pci_probe+0x1b/0x40 >> [ 39.706750] work_for_cpu_fn+0xb/0x20 >> [ 39.706754] process_one_work+0x1ec/0x350 >> [ 39.706758] worker_thread+0x24b/0x4d0 >> [ 39.706760] ? process_one_work+0x350/0x350 >> [ 39.706762] kthread+0xea/0x120 >> [ 39.706766] ? kthread_park+0x80/0x80 >> [ 39.706770] ret_from_fork+0x1f/0x30 >> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- >> >> Martin >> >> >>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: >>> >>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to >>> determine if the kthread owns this napi and could call napi->poll() on >>> it. However, if socket busy poll is enabled, it is possible that the >>> busy poll thread grabs this SCHED bit (after the previous napi->poll() >>> invokes napi_complete_done() and clears SCHED bit) and tries to poll >>> on the same napi. napi_disable() could grab the SCHED bit as well. >>> This patch tries to fix this race by adding a new bit >>> NAPI_STATE_SCHED_THREADED in napi->state. 
This bit gets set in >>> ____napi_schedule() if the threaded mode is enabled, and gets cleared >>> in napi_complete_done(), and we only poll the napi in kthread if this >>> bit is set. This helps distinguish the ownership of the napi between >>> kthread and other scenarios and fixes the race issue. >>> >>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") >>> Reported-by: Martin Zaharinov <micron10@gmail.com> >>> Suggested-by: Jakub Kicinski <kuba@kernel.org> >>> Signed-off-by: Wei Wang <weiwan@google.com> >>> Cc: Alexander Duyck <alexanderduyck@fb.com> >>> Cc: Eric Dumazet <edumazet@google.com> >>> Cc: Paolo Abeni <pabeni@redhat.com> >>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> >>> --- >>> Change since v3: >>> - Add READ_ONCE() for thread->state and add comments in >>> ____napi_schedule(). >>> >>> include/linux/netdevice.h | 2 ++ >>> net/core/dev.c | 19 ++++++++++++++++++- >>> 2 files changed, 20 insertions(+), 1 deletion(-) >>> >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>> index 5b67ea89d5f2..87a5d186faff 100644 >>> --- a/include/linux/netdevice.h >>> +++ b/include/linux/netdevice.h >>> @@ -360,6 +360,7 @@ enum { >>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ >>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ >>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ >>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ >>> }; >>> >>> enum { >>> @@ -372,6 +373,7 @@ enum { >>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), >>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), >>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), >>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), >>> }; >>> >>> enum gro_result { >>> diff --git a/net/core/dev.c b/net/core/dev.c >>> index 6c5967e80132..d3195a95f30e 100644 >>> --- a/net/core/dev.c >>> +++ b/net/core/dev.c >>> @@ -4294,6 
+4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, >>> */ >>> thread = READ_ONCE(napi->thread); >>> if (thread) { >>> + /* Avoid doing set_bit() if the thread is in >>> + * INTERRUPTIBLE state, cause napi_thread_wait() >>> + * makes sure to proceed with napi polling >>> + * if the thread is explicitly woken from here. >>> + */ >>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) >>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); >>> wake_up_process(thread); >>> return; >>> } >>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) >>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); >>> >>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | >>> + NAPIF_STATE_SCHED_THREADED | >>> NAPIF_STATE_PREFER_BUSY_POLL); >>> >>> /* If STATE_MISSED was set, leave STATE_SCHED set, >>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) >>> >>> static int napi_thread_wait(struct napi_struct *napi) >>> { >>> + bool woken = false; >>> + >>> set_current_state(TASK_INTERRUPTIBLE); >>> >>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { >>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { >>> + /* Testing SCHED_THREADED bit here to make sure the current >>> + * kthread owns this napi and could poll on this napi. >>> + * Testing SCHED bit is not enough because SCHED bit might be >>> + * set by some other busy poll thread or by napi_disable(). >>> + */ >>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { >>> WARN_ON(!list_empty(&napi->poll_list)); >>> __set_current_state(TASK_RUNNING); >>> return 0; >>> } >>> >>> schedule(); >>> + /* woken being true indicates this thread owns this napi. */ >>> + woken = true; >>> set_current_state(TASK_INTERRUPTIBLE); >>> } >>> __set_current_state(TASK_RUNNING); >>> -- >>> 2.31.0.rc2.261.g7f71774620-goog >>> >>
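The ownership problem described in the patch can be sketched in userspace C. This is only a simplified model, not kernel code: the `model_*` names and bit values are hypothetical, the `NAPIF_STATE_MISSED` handling is ignored, and the bit is set unconditionally on a threaded schedule (the real patch skips `set_bit()` when the kthread is in `TASK_INTERRUPTIBLE`). It shows why testing the SCHED bit alone cannot tell the kthread apart from a busy-poll thread, while a separate SCHED_THREADED bit can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit values; they do not match the kernel's enum. */
#define MODEL_SCHED          (1u << 0) /* somebody owns the napi */
#define MODEL_SCHED_THREADED (1u << 1) /* the owner is the napi kthread */

struct model_napi {
	atomic_uint state;
};

/* Busy poll or napi_disable() taking ownership: only SCHED is set. */
static bool model_grab_sched(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0; /* true if the caller became the owner */
}

/* Scheduling in threaded mode: SCHED plus SCHED_THREADED. */
static bool model_schedule_threaded(struct model_napi *n)
{
	if (!model_grab_sched(n))
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Completion drops both bits, as the patched napi_complete_done() does. */
static void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Pre-patch kthread check: SCHED alone. */
static bool model_old_check(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Post-patch kthread check: SCHED_THREADED. */
static bool model_new_check(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}

/* Replays the race: the kthread completes, then busy poll grabs SCHED. */
static int model_race_demo(void)
{
	struct model_napi n;

	atomic_init(&n.state, 0);
	assert(model_schedule_threaded(&n)); /* kthread owns the napi */
	model_complete(&n);                  /* poll finished, bits cleared */
	assert(model_grab_sched(&n));        /* busy poll takes SCHED */
	assert(model_old_check(&n));         /* old check: kthread would poll too */
	assert(!model_new_check(&n));        /* new check: kthread stays out */
	return 0;
}
```

Calling `model_race_demo()` from a `main()` replays the sequence from the commit message; the two checks disagree only in the window where a non-kthread owner holds SCHED.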
On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote: > > Hi Eric and Wei > > Please check this log : > Please send a normal report to netdev. This has nothing to to with us (Eric & Wei) Thanks. > > 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) > [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 > [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 > [1584289.107263] Call Trace: > [1584289.107266] dump_stack+0x58/0x6b > [1584289.209562] warn_alloc.cold+0x70/0xd4 > [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 > [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 > [1584289.474009] allocate_slab+0x272/0x450 > [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 > [1584289.519147] kmem_cache_alloc+0x110/0x120 > [1584289.541416] build_skb+0x1a/0x200 > [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] > [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] > [1584289.605528] __napi_poll+0x1f/0x130 > [1584289.625842] napi_threaded_poll+0x110/0x160 > [1584289.646110] ? __napi_poll+0x130/0x130 > [1584289.665810] kthread+0xea/0x120 > [1584289.684836] ? kthread_park+0x80/0x80 > [1584289.703440] ret_from_fork+0x1f/0x30 > [1584289.721616] Mem-Info: > [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 > active_file:17408 inactive_file:149 isolated_file:32 > unevictable:1440359 dirty:17500 writeback:0 > slab_reclaimable:43368 slab_unreclaimable:155124 > mapped:817431 shmem:7650 pagetables:32093 bounce:0 > free:17832 free_pcp:113 free_cma:0 > [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? 
no > [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 > [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB > [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 > [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB > [1584290.237051] lowmem_reserve[]: 0 0 0 0 > [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB > [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB > [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB > [1584290.409087] 1465768 total pagecache pages > [1584290.434531] 4165289 pages RAM > [1584290.459616] 0 pages HighMem/MovableOnly > [1584290.484480] 104766 pages reserved > [1584290.508709] 0 pages hwpoisoned > [1584301.710231] team0: Failed to send options change via netlink (err -105) > [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 > [1584302.682874] CPU: 
0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 > [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 > [1584302.776532] Call Trace: > [1584302.799361] dump_stack+0x58/0x6b > [1584302.821791] dump_header+0x4c/0x2e6 > [1584302.843580] oom_kill_process.cold+0xb/0x10 > [1584302.865223] out_of_memory.part.0+0x125/0x5f0 > [1584302.886641] out_of_memory+0x54/0xa0 > [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 > [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 > [1584302.947874] __get_free_pages+0x8/0x30 > [1584302.967246] pgd_alloc+0x21/0x180 > [1584302.986355] mm_alloc+0x1af/0x250 > [1584303.005085] alloc_bprm+0x80/0x2a0 > [1584303.023328] do_execveat_common+0x8b/0x330 > [1584303.041181] __x64_sys_execve+0x2b/0x40 > [1584303.058513] do_syscall_64+0x2d/0x40 > [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [1584303.091891] RIP: 0033:0x488376 > [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 > [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b > [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 > [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 > [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 > [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 > [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 > [1584303.379094] Mem-Info: > [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0 > active_file:12975 inactive_file:168 isolated_file:32 > unevictable:909709 dirty:12864 writeback:10 > slab_reclaimable:42415 slab_unreclaimable:154783 > mapped:39825 shmem:14744 pagetables:26041 bounce:0 > free:537002 free_pcp:1813 
free_cma:0 > [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no > [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 > [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB > [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 > [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB > [1584304.036531] lowmem_reserve[]: 0 0 0 0 > [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB > [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB > [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB > 
[1584304.287094] 933871 total pagecache pages > [1584304.312815] 4165289 pages RAM > [1584304.337915] 0 pages HighMem/MovableOnly > [1584304.362522] 104766 pages reserved > [1584304.386516] 0 pages hwpoisoned > > > On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote: > > > > On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: > >> > >> Hi Wei > >> Check this: > >> > >> [ 39.706567] ------------[ cut here ]------------ > >> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) > >> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 > > > > Probably more relevant to Intel maintainers than Wei :/ > > > >> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas > >> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 > >> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 > >> [ 39.706619] Workqueue: events work_for_cpu_fn > >> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 > >> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 > >> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 > >> [ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff > >> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 > >> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea > >> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 > >> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 > >> [ 39.706646] 
FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 > >> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0 > >> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >> [ 39.706656] Call Trace: > >> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] > >> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] > >> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 > >> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 > >> [ 39.706716] ? __kmalloc+0x37/0x160 > >> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 > >> [ 39.706723] ? irq_get_irq_data+0x5/0x20 > >> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 > >> [ 39.706729] ? irq_get_irq_data+0x5/0x20 > >> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 > >> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 > >> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 > >> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 > >> [ 39.706746] local_pci_probe+0x1b/0x40 > >> [ 39.706750] work_for_cpu_fn+0xb/0x20 > >> [ 39.706754] process_one_work+0x1ec/0x350 > >> [ 39.706758] worker_thread+0x24b/0x4d0 > >> [ 39.706760] ? process_one_work+0x350/0x350 > >> [ 39.706762] kthread+0xea/0x120 > >> [ 39.706766] ? kthread_park+0x80/0x80 > >> [ 39.706770] ret_from_fork+0x1f/0x30 > >> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- > >> > >> Martin > >> > >> > >>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: > >>> > >>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to > >>> determine if the kthread owns this napi and could call napi->poll() on > >>> it. However, if socket busy poll is enabled, it is possible that the > >>> busy poll thread grabs this SCHED bit (after the previous napi->poll() > >>> invokes napi_complete_done() and clears SCHED bit) and tries to poll > >>> on the same napi. 
napi_disable() could grab the SCHED bit as well. > >>> This patch tries to fix this race by adding a new bit > >>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in > >>> ____napi_schedule() if the threaded mode is enabled, and gets cleared > >>> in napi_complete_done(), and we only poll the napi in kthread if this > >>> bit is set. This helps distinguish the ownership of the napi between > >>> kthread and other scenarios and fixes the race issue. > >>> > >>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") > >>> Reported-by: Martin Zaharinov <micron10@gmail.com> > >>> Suggested-by: Jakub Kicinski <kuba@kernel.org> > >>> Signed-off-by: Wei Wang <weiwan@google.com> > >>> Cc: Alexander Duyck <alexanderduyck@fb.com> > >>> Cc: Eric Dumazet <edumazet@google.com> > >>> Cc: Paolo Abeni <pabeni@redhat.com> > >>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> > >>> --- > >>> Change since v3: > >>> - Add READ_ONCE() for thread->state and add comments in > >>> ____napi_schedule(). 
> >>> > >>> include/linux/netdevice.h | 2 ++ > >>> net/core/dev.c | 19 ++++++++++++++++++- > >>> 2 files changed, 20 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > >>> index 5b67ea89d5f2..87a5d186faff 100644 > >>> --- a/include/linux/netdevice.h > >>> +++ b/include/linux/netdevice.h > >>> @@ -360,6 +360,7 @@ enum { > >>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ > >>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ > >>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ > >>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ > >>> }; > >>> > >>> enum { > >>> @@ -372,6 +373,7 @@ enum { > >>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), > >>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), > >>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), > >>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), > >>> }; > >>> > >>> enum gro_result { > >>> diff --git a/net/core/dev.c b/net/core/dev.c > >>> index 6c5967e80132..d3195a95f30e 100644 > >>> --- a/net/core/dev.c > >>> +++ b/net/core/dev.c > >>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, > >>> */ > >>> thread = READ_ONCE(napi->thread); > >>> if (thread) { > >>> + /* Avoid doing set_bit() if the thread is in > >>> + * INTERRUPTIBLE state, cause napi_thread_wait() > >>> + * makes sure to proceed with napi polling > >>> + * if the thread is explicitly woken from here. 
> >>> + */ > >>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) > >>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); > >>> wake_up_process(thread); > >>> return; > >>> } > >>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) > >>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); > >>> > >>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | > >>> + NAPIF_STATE_SCHED_THREADED | > >>> NAPIF_STATE_PREFER_BUSY_POLL); > >>> > >>> /* If STATE_MISSED was set, leave STATE_SCHED set, > >>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) > >>> > >>> static int napi_thread_wait(struct napi_struct *napi) > >>> { > >>> + bool woken = false; > >>> + > >>> set_current_state(TASK_INTERRUPTIBLE); > >>> > >>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { > >>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { > >>> + /* Testing SCHED_THREADED bit here to make sure the current > >>> + * kthread owns this napi and could poll on this napi. > >>> + * Testing SCHED bit is not enough because SCHED bit might be > >>> + * set by some other busy poll thread or by napi_disable(). > >>> + */ > >>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { > >>> WARN_ON(!list_empty(&napi->poll_list)); > >>> __set_current_state(TASK_RUNNING); > >>> return 0; > >>> } > >>> > >>> schedule(); > >>> + /* woken being true indicates this thread owns this napi. */ > >>> + woken = true; > >>> set_current_state(TASK_INTERRUPTIBLE); > >>> } > >>> __set_current_state(TASK_RUNNING); > >>> -- > >>> 2.31.0.rc2.261.g7f71774620-goog > >>> > >> >
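The two halves of the hand-off in the diff above (the `TASK_INTERRUPTIBLE` test in `____napi_schedule()` and the `woken` flag in `napi_thread_wait()`) can be read as a pair of predicates. The following is a minimal sketch under that reading; the `model_*` names and the two-value task state are assumptions made for illustration, and real wake-up timing is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the task states the patch looks at. */
enum model_task_state {
	MODEL_TASK_RUNNING,
	MODEL_TASK_INTERRUPTIBLE,
};

/*
 * Scheduler side, modeled on ____napi_schedule(): the bit is set only when
 * the kthread is not in TASK_INTERRUPTIBLE, because a sleeping kthread
 * learns about its ownership from the wake-up itself.
 */
static bool model_sets_threaded_bit(enum model_task_state kthread_state)
{
	return kthread_state != MODEL_TASK_INTERRUPTIBLE;
}

/*
 * Waiter side, modeled on napi_thread_wait(): poll if the bit is set or if
 * the kthread has already returned from schedule() once.
 */
static bool model_kthread_may_poll(bool sched_threaded, bool woken)
{
	return sched_threaded || woken;
}

/* Whichever state the kthread was in, a schedule must lead to a poll. */
static bool model_handoff_ok(enum model_task_state kthread_state)
{
	bool bit = model_sets_threaded_bit(kthread_state);
	/* Simplification: a kthread in TASK_INTERRUPTIBLE is woken explicitly. */
	bool woken = (kthread_state == MODEL_TASK_INTERRUPTIBLE);

	return model_kthread_may_poll(bit, woken);
}
```

In this model the only combination that keeps the kthread waiting is "bit clear and never woken", which is the case where some other context (busy poll or `napi_disable()`) holds the SCHED bit.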
Hi Team

One report from the latest kernel, 5.11.12.

Please check and help to find and fix it.

Apr 10 12:46:25 [214315.519319][ T3345] R13: ffff8cf193ddf700 R14: ffff8cf238ab3500 R15: ffff91ab82133d88
Apr 10 12:46:26 [214315.570814][ T3345] FS: 0000000000000000(0000) GS:ffff8cf3efb00000(0000) knlGS:0000000000000000
Apr 10 12:46:26 [214315.622416][ T3345] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 10 12:46:26 [214315.648390][ T3345] CR2: 00007f7211406000 CR3: 00000001a924a004 CR4: 00000000001706e0
Apr 10 12:46:26 [214315.698998][ T3345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 10 12:46:26 [214315.749508][ T3345] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 10 12:46:26 [214315.799749][ T3345] Call Trace:
Apr 10 12:46:26 [214315.824268][ T3345] netif_receive_skb_list_internal+0x5e/0x2c0
Apr 10 12:46:26 [214315.848996][ T3345] napi_gro_flush+0x11b/0x260
Apr 10 12:46:26 [214315.873320][ T3345] napi_complete_done+0x107/0x180
Apr 10 12:46:26 [214315.897160][ T3345] ixgbe_poll+0x10e/0x2a0 [ixgbe]
Apr 10 12:46:26 [214315.920564][ T3345] __napi_poll+0x1f/0x130
Apr 10 12:46:26 [214315.943475][ T3345] napi_threaded_poll+0x110/0x160
Apr 10 12:46:26 [214315.966252][ T3345] ? __napi_poll+0x130/0x130
Apr 10 12:46:26 [214315.988424][ T3345] kthread+0xea/0x120
Apr 10 12:46:26 [214316.010247][ T3345] ? kthread_park+0x80/0x80
Apr 10 12:46:26 [214316.031729][ T3345] ret_from_fork+0x1f/0x30
Apr 10 12:46:26 [214316.052904][ T3345] ---[ end trace c7726a0541128b42 ]---

Martin
Hello,

On Sat, 2021-04-10 at 14:22 +0300, Martin Zaharinov wrote:
> Hi Team
>
> One report latest kernel 5.11.12
>
> Please check and help to find and fix

Please provide a complete splat, including the trapping instruction.

>
> Apr 10 12:46:25 [214315.519319][ T3345] R13: ffff8cf193ddf700 R14: ffff8cf238ab3500 R15: ffff91ab82133d88
> Apr 10 12:46:26 [214315.570814][ T3345] FS: 0000000000000000(0000) GS:ffff8cf3efb00000(0000) knlGS:0000000000000000
> Apr 10 12:46:26 [214315.622416][ T3345] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Apr 10 12:46:26 [214315.648390][ T3345] CR2: 00007f7211406000 CR3: 00000001a924a004 CR4: 00000000001706e0
> Apr 10 12:46:26 [214315.698998][ T3345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Apr 10 12:46:26 [214315.749508][ T3345] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Apr 10 12:46:26 [214315.799749][ T3345] Call Trace:
> Apr 10 12:46:26 [214315.824268][ T3345] netif_receive_skb_list_internal+0x5e/0x2c0
> Apr 10 12:46:26 [214315.848996][ T3345] napi_gro_flush+0x11b/0x260
> Apr 10 12:46:26 [214315.873320][ T3345] napi_complete_done+0x107/0x180
> Apr 10 12:46:26 [214315.897160][ T3345] ixgbe_poll+0x10e/0x2a0 [ixgbe]
> Apr 10 12:46:26 [214315.920564][ T3345] __napi_poll+0x1f/0x130
> Apr 10 12:46:26 [214315.943475][ T3345] napi_threaded_poll+0x110/0x160
> Apr 10 12:46:26 [214315.966252][ T3345] ? __napi_poll+0x130/0x130
> Apr 10 12:46:26 [214315.988424][ T3345] kthread+0xea/0x120
> Apr 10 12:46:26 [214316.010247][ T3345] ? kthread_park+0x80/0x80
> Apr 10 12:46:26 [214316.031729][ T3345] ret_from_fork+0x1f/0x30

Could you please also provide the decoded stack trace? Something
like the following will do:

cat <file containing the splat> | ./scripts/decode_stacktrace.sh <path to vmlinux>

Even more importantly:

threaded napi is implemented with the merge
commit adbb4fb028452b1b0488a1a7b66ab856cdf20715, which landed in the
vanilla tree as of v5.12.rc1 and is not backported to 5.11.x. What
kernel are you really using?

Thanks,

Paolo
Hi Paolo

Sorry for the delay.

After disabling GRO on the eth interface and team0, it now works fine.

In this case I use kernel 5.11.12, but after 5.12 is released I will migrate to it and check for problems with the kthread.

Thanks, I will update if I have other problems.

Martin

> On 12 Apr 2021, at 11:36, Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hello,
>
> On Sat, 2021-04-10 at 14:22 +0300, Martin Zaharinov wrote:
>> Hi Team
>>
>> One report latest kernel 5.11.12
>>
>> Please check and help to find and fix
>
> Please provide a complete splat, including the trapping instruction.
>>
>> Apr 10 12:46:25 [214315.519319][ T3345] R13: ffff8cf193ddf700 R14: ffff8cf238ab3500 R15: ffff91ab82133d88
>> Apr 10 12:46:26 [214315.570814][ T3345] FS: 0000000000000000(0000) GS:ffff8cf3efb00000(0000) knlGS:0000000000000000
>> Apr 10 12:46:26 [214315.622416][ T3345] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Apr 10 12:46:26 [214315.648390][ T3345] CR2: 00007f7211406000 CR3: 00000001a924a004 CR4: 00000000001706e0
>> Apr 10 12:46:26 [214315.698998][ T3345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Apr 10 12:46:26 [214315.749508][ T3345] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Apr 10 12:46:26 [214315.799749][ T3345] Call Trace:
>> Apr 10 12:46:26 [214315.824268][ T3345] netif_receive_skb_list_internal+0x5e/0x2c0
>> Apr 10 12:46:26 [214315.848996][ T3345] napi_gro_flush+0x11b/0x260
>> Apr 10 12:46:26 [214315.873320][ T3345] napi_complete_done+0x107/0x180
>> Apr 10 12:46:26 [214315.897160][ T3345] ixgbe_poll+0x10e/0x2a0 [ixgbe]
>> Apr 10 12:46:26 [214315.920564][ T3345] __napi_poll+0x1f/0x130
>> Apr 10 12:46:26 [214315.943475][ T3345] napi_threaded_poll+0x110/0x160
>> Apr 10 12:46:26 [214315.966252][ T3345] ? __napi_poll+0x130/0x130
>> Apr 10 12:46:26 [214315.988424][ T3345] kthread+0xea/0x120
>> Apr 10 12:46:26 [214316.010247][ T3345] ? kthread_park+0x80/0x80
>> Apr 10 12:46:26 [214316.031729][ T3345] ret_from_fork+0x1f/0x30
>
> Could you please also provide the decoded the stack trace? Something
> alike the following will do:
>
> cat <file contaning the splat> | ./scripts/decode_stacktrace.sh <path to vmlinux>
>
> Even more importantly:
>
> threaded napi is implemented with the merge
> commit adbb4fb028452b1b0488a1a7b66ab856cdf20715, which landed into the
> vanilla tree since v5.12.rc1 and is not backported to 5.11.x. What
> kernel are you really using?
>
> Thanks,
>
> Paolo
Hi all,

One more bug report. The kernel is 5.12.1. If you need more info, I will send it.

The server runs with 200 users behind NAT.

[81402.540906] rcu: INFO: rcu_sched self-detected stall on CPU
[81402.540909] rcu: 5-....: (3314 ticks this GP) idle=74e/1/0x4000000000000000 softirq=4979878/4979878 fqs=2554 last_accelerate: a926/c0a0 dyntick_enabled: 1
[81402.540911] (t=6001 jiffies g=7517749 q=44479)
[81402.540913] NMI backtrace for cpu 5
[81402.540914] CPU: 5 PID: 36 Comm: ksoftirqd/5 Tainted: G O 5.12.1 #1
[81402.540916] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
[81402.540917] Call Trace:
[81402.540919]
[81402.540920] dump_stack+0x65/0x7d
[81402.540924] ? lapic_can_unplug_cpu+0x70/0x70
[81402.540927] nmi_trigger_cpumask_backtrace.cold+0x40/0x4d
[81402.540929] rcu_dump_cpu_stacks+0xbe/0xec
[81402.540932] rcu_sched_clock_irq.cold+0x195/0x3f1
[81402.540934] ? enqueue_task_fair+0x796/0xbd0
[81402.540938] update_process_times+0x88/0xc0
[81402.540942] tick_sched_timer+0x7f/0x110
[81402.540944] ? tick_nohz_dep_set_task+0x80/0x80
[81402.540945] __hrtimer_run_queues+0x10b/0x1b0
[81402.540947] hrtimer_interrupt+0x10a/0x420
[81402.540949] __sysvec_apic_timer_interrupt+0x47/0x60
[81402.540952] sysvec_apic_timer_interrupt+0x65/0x90
[81402.540955]
[81402.540955] asm_sysvec_apic_timer_interrupt+0xf/0x20
[81402.540959] RIP: 0010:console_unlock+0x366/0x5e0
[81402.540961] Code: ff ff 8b 05 44 5f b2 01 85 c0 75 66 c7 05 3a 5f b2 01 01 00 00 00 e9 0f fd ff ff e8 f4 1c 00 00 48 85 db 74 01 fb 8b 54 24 0c <85> d2 0f 84 4a fd ff ff e8 1d 2b 7c 00 e9 40 fd ff ff 4d 85 ff 74
[81402.540963] RSP: 0018:ffff9dc980203a80 EFLAGS: 00000206
[81402.540964] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000000
[81402.540965] RDX: 0000000000000000 RSI: 0000000000000087 RDI: ffffffff82b59898
[81402.540966] RBP: 0000000000000000 R08: ffff9786814db080 R09: 0000000000000000
[81402.540966] R10: ffff9786a85bf260 R11: ffff9786f7bd7cf0 R12: 0000000000000048
[81402.540967] R13: 0000000000000000 R14: 20c49ba5e353f7cf R15: 0000000000000000
[81402.540968] ? common_interrupt+0x14/0xa0
[81402.540969] ? asm_common_interrupt+0x1b/0x40
[81402.540971] vprintk_default+0x5a/0x150
[81402.540972] printk+0x43/0x45
[81402.540975] create_nat_session+0x1c5e/0x1cfd [xt_NAT]
[81402.540978] ipt_do_table+0x2e5/0x670 [ip_tables]
[81402.540980] ? ip_route_input_noref+0xa8/0x1e0
[81402.540983] nf_hook_slow+0x36/0xa0
[81402.540986] ip_forward+0x40d/0x450
[81402.540987] ? ip4_obj_hashfn+0xc0/0xc0
[81402.540989] process_backlog+0x11a/0x230
[81402.540992] __napi_poll+0x1f/0x130
[81402.540994] net_rx_action+0x239/0x2f0
[81402.540996] ? run_timer_softirq+0x730/0x880
[81402.540998] __do_softirq+0xaf/0x1da
[81402.541000] run_ksoftirqd+0x15/0x20
[81402.541004] smpboot_thread_fn+0xb3/0x140
[81402.541006] ? sort_range+0x20/0x20
[81402.541008] kthread+0xea/0x120
[81402.541010] ? kthread_park+0x80/0x80
[81402.541012] ret_from_fork+0x1f/0x30
[81416.300055] rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: {
[81476.311498] rcu: INFO: rcu_sched self-detected stall on CPU
[81476.311500] rcu: 3-....: (1 GPs behind) idle=86a/1/0x4000000000000000 softirq=4703397/4703398 fqs=2596 last_accelerate: c5ff/dd71 dyntick_enabled: 1
[81476.311503] (t=6001 jiffies g=7517753 q=82419)
[81476.311505] NMI backtrace for cpu 3
[81476.311506] CPU: 3 PID: 527214 Comm: kworker/3:2 Tainted: G O 5.12.1 #1
[81476.311507] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
[81476.311509] Workqueue: rcu_gp wait_rcu_exp_gp
[81476.311512] Call Trace:
[81476.311514]
[81476.311515] dump_stack+0x65/0x7d
[81476.311519] ? lapic_can_unplug_cpu+0x70/0x70
[81476.311521] nmi_trigger_cpumask_backtrace.cold+0x40/0x4d
[81476.311523] rcu_dump_cpu_stacks+0xbe/0xec
[81476.311527] rcu_sched_clock_irq.cold+0x195/0x3f1
[81476.311529] ? timekeeping_advance+0x34e/0x540
[81476.311531] update_process_times+0x88/0xc0
[81476.311534] tick_sched_timer+0x7f/0x110
[81476.311536] ? tick_nohz_dep_set_task+0x80/0x80
[81476.311537] __hrtimer_run_queues+0x10b/0x1b0
[81476.311539] hrtimer_interrupt+0x10a/0x420
[81476.311541] __sysvec_apic_timer_interrupt+0x47/0x60
[81476.311544] sysvec_apic_timer_interrupt+0x65/0x90
[81476.311547]
[81476.311547] asm_sysvec_apic_timer_interrupt+0xf/0x20
[81476.311551] RIP: 0010:console_unlock+0x366/0x5e0
[81476.311554] Code: ff ff 8b 05 44 5f b2 01 85 c0 75 66 c7 05 3a 5f b2 01 01 00 00 00 e9 0f fd ff ff e8 f4 1c 00 00 48 85 db 74 01 fb 8b 54 24 0c <85> d2 0f 84 4a fd ff ff e8 1d 2b 7c 00 e9 40 fd ff ff 4d 85 ff 74
[81476.311555] RSP: 0018:ffff9dc980313cc0 EFLAGS: 00000206
[81476.311556] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000000
[81476.311557] RDX: 0000000000000000 RSI: 0000000000000087 RDI: ffffffff82b59898
[81476.311557] RBP: 0000000000000000 R08: ffff9786814db080 R09: 0000000000000000
[81476.311558] R10: ffff9786a85bac10 R11: ffff97872e90acf0 R12: 0000000000000048
[81476.311559] R13: 0000000000000000 R14: 20c49ba5e353f7cf R15: 0000000000000000
[81476.311560] vprintk_default+0x5a/0x150
[81476.311562] printk+0x43/0x45
[81476.311563] synchronize_rcu_expedited_wait.cold+0x20/0x2db
[81476.311565] rcu_exp_wait_wake+0xc/0x110
[81476.311567] process_one_work+0x1ec/0x350
[81476.311569] worker_thread+0x4f/0x4d0
[81476.311570] ? process_one_work+0x350/0x350
[81476.311571] kthread+0xea/0x120
[81476.311573] ? kthread_park+0x80/0x80
[81476.311574] ret_from_fork+0x1f/0x30
[81551.199572] } 19586 jiffies s: 14473 root: 0x0/.
Hi Paolo,

This is urgent: I get the same bug with the new kernel 5.12.1.

It is a normal server with NAT traffic, and GRO needs to be enabled to deliver adequate speed to users.

Please check:

May 9 12:30:23 [126568.653018][ T3527] ------------[ cut here ]------------
May 9 12:30:23 [126568.653019][ T3527] list_del corruption. prev->next should be ffff9478d6b55a00, but was ffffb0ebc3123d88
May 9 12:30:23 [126568.653023][ T3527] WARNING: CPU: 20 PID: 3527 at lib/list_debug.c:51 __list_del_entry_valid+0x79/0x90
May 9 12:30:23 [126568.653026][ T3527] Modules linked in: nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc xt_dtvqos(O) xt_TCPMSS xt_nat iptable_mangle iptable_nat ip_tables team_mode_loadbalance team netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
May 9 12:30:23 [126568.653049][ T3527] CPU: 20 PID: 3527 Comm: napi/eth1-542 Tainted: G W O 5.12.1 #1
May 9 12:30:23 [126568.653050][ T3527] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 3.3 10/28/2020
May 9 12:30:23 [126568.653051][ T3527] RIP: 0010:__list_del_entry_valid+0x79/0x90
May 9 12:30:23 [126568.653054][ T3527] Code: c3 48 89 fe 4c 89 c2 48 c7 c7 08 db 34 b8 e8 2c df 51 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 db 34 b8 e8 15 df 51 00 <0f> 0b 31 c0 c3 48 c7 c7 80 db 34 b8 e8 04 df 51 00 0f 0b 31 c0 c3
May 9 12:30:23 [126568.653055][ T3527] RSP: 0018:ffffb0ebc3123d78 EFLAGS: 00010296
May 9 12:30:23 [126568.653056][ T3527] RAX: 0000000000000054 RBX: ffff9478d6b55a00 RCX: 80000000fff832ec
May 9 12:30:23 [126568.653057][ T3527] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffffffb8b59898
May 9 12:30:23 [126568.653058][ T3527] RBP: ffff9477eac08158 R08: 00000000000098c4 R09: 000000000000000f
May 9 12:30:23 [126568.653059][ T3527] R10: 0000000000000004 R11: ffff947f1e8fa1b4 R12: ffff9478d6b54400
May 9 12:30:23 [126568.653059][ T3527] R13: ffff9478d6b55a00 R14: ffff94789340d400 R15: ffffb0ebc3123d88
May 9 12:30:23 [126568.653060][ T3527] FS: 0000000000000000(0000) GS:ffff947f1fd00000(0000) knlGS:0000000000000000
May 9 12:30:23 [126568.653061][ T3527] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 9 12:30:24 [126568.653062][ T3527] CR2: 00007fc73d0e0000 CR3: 00000001dea18003 CR4: 00000000001706e0
May 9 12:30:24 [126568.653063][ T3527] Call Trace:
May 9 12:30:24 [126568.653063][ T3527] netif_receive_skb_list_internal+0x5e/0x2b0
May 9 12:30:24 [126568.653066][ T3527] ? napi_gro_receive+0x14d/0x160
May 9 12:30:24 [126568.653068][ T3527] ? enqueue_to_backlog+0x39/0x250
May 9 12:30:24 [126568.653069][ T3527] napi_gro_flush+0x11b/0x260
May 9 12:30:24 [126568.653071][ T3527] napi_complete_done+0x107/0x180
May 9 12:30:24 [126568.653073][ T3527] ixgbe_poll+0x10e/0x2a0 [ixgbe]
May 9 12:30:24 [126568.653080][ T3527] __napi_poll+0x1f/0x130
May 9 12:30:24 [126568.653082][ T3527] napi_threaded_poll+0x105/0x150
May 9 12:30:24 [126568.653084][ T3527] ? __napi_poll+0x130/0x130
May 9 12:30:24 [126568.653086][ T3527] kthread+0xea/0x120
May 9 12:30:24 [126568.653088][ T3527] ? kthread_park+0x80/0x80
May 9 12:30:24 [126568.653090][ T3527] ret_from_fork+0x1f/0x30
May 9 12:30:24 [126568.653092][ T3527] ---[ end trace 946b481f5c11bfe9 ]---
May 9 12:30:24 [126568.653092][ T3527] ------------[ cut here ]------------
May 9 12:30:24 [126568.653093][ T3527] list_del corruption. prev->next should be ffff9478d6b54400, but was ffffb0ebc3123d88
May 9 12:30:24 [126568.653097][ T3527] WARNING: CPU: 20 PID: 3527 at lib/list_debug.c:51 __list_del_entry_valid+0x79/0x90
May 9 12:30:24 [126568.653099][ T3527] Modules linked in: nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc xt_dtvqos(O) xt_TCPMSS xt_nat iptable_mangle iptable_nat ip_tables team_mode_loadbalance team netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos
May 9 12:30:24 [126568.653113][ T3527] CPU: 20 PID: 3527 Comm: napi/eth1-542 Tainted: G W O 5.12.1 #1
May 9 12:30:24 [126568.653114][ T3527] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 3.3 10/28/2020
May 9 12:30:24 [126568.653114][ T3527] RIP: 0010:__list_del_entry_valid+0x79/0x90
May 9 12:30:24 [126568.653116][ T3527] Code: c3 48 89 fe 4c 89 c2 48 c7 c7 08 db 34 b8 e8 2c df 51 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 db 34 b8 e8 15 df 51 00 <0f> 0b 31 c0 c3 48 c7 c7 80 db 34 b8 e8 04 df 51 00 0f 0b 31 c0 c3
May 9 12:30:24 [126568.653117][ T3527] RSP: 0018:ffffb0ebc3123d78 EFLAGS: 00010296
May 9 12:30:24 [126568.653118][ T3527] RAX: 0000000000000054 RBX: ffff9478d6b54400 RCX: 80000000fff8330b
May 9 12:30:24 [126568.653119][ T3527] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffffffb8b59898
May 9 12:30:24 [126568.653120][ T3527] RBP: ffff9477eac08158 R08: 0000000000009921 R09: 000000000000000f
May 9 12:30:24 [126568.653120][ T3527] R10: 0000000000000004 R11: ffff947f1e8fab74 R12: ffff9478d6b55800
May 9 12:30:24 [126568.653121][ T3527] R13: ffff9478d6b54400 R14: ffff9478d6b54700 R15: ffffb0ebc3123d88
May 9 12:30:25 [126568.653122][ T3527] FS: 0000000000000000(0000) GS:ffff947f1fd00000(0000) knlGS:0000000000000000
May 9 12:30:25 [126568.653123][ T3527] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 9 12:30:25 [126568.653124][ T3527] CR2: 00007fc73d0e0000 CR3: 00000001dea18003 CR4: 00000000001706e0
May 9 12:30:25 [126568.653124][ T3527] Call Trace:
May 9 12:30:25 [126568.653125][ T3527] netif_receive_skb_list_internal+0x5e/0x2b0
May 9 12:30:25 [126568.653127][ T3527] ? napi_gro_receive+0x14d/0x160
May 9 12:30:25 [126568.653129][ T3527] ? enqueue_to_backlog+0x39/0x250
May 9 12:30:25 [126568.653130][ T3527] napi_gro_flush+0x11b/0x260
May 9 12:30:25 [126568.653132][ T3527] napi_complete_done+0x107/0x180
May 9 12:30:25 [126568.653134][ T3527] ixgbe_poll+0x10e/0x2a0 [ixgbe]
May 9 12:30:25 [126568.653140][ T3527] __napi_poll+0x1f/0x130
May 9 12:30:25 [126568.653142][ T3527] napi_threaded_poll+0x105/0x150
May 9 12:30:25 [126568.653144][ T3527] ? __napi_poll+0x130/0x130
May 9 12:30:25 [126568.653146][ T3527] kthread+0xea/0x120
May 9 12:30:25 [126568.653148][ T3527] ? kthread_park+0x80/0x80
May 9 12:30:25 [126568.653151][ T3527] ret_from_fork+0x1f/0x30
May 9 12:30:25 [126568.653152][ T3527] ---[ end trace 946b481f5c11bfea ]---

Best Regards,
Martin

On Mon, 12 Apr 2021 at 11:37, Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hello,
>
> On Sat, 2021-04-10 at 14:22 +0300, Martin Zaharinov wrote:
> > Hi Team
> >
> > One report latest kernel 5.11.12
> >
> > Please check and help to find and fix
>
> Please provide a complete splat, including the trapping instruction.
> >
> > Apr 10 12:46:25 [214315.519319][ T3345] R13: ffff8cf193ddf700 R14: ffff8cf238ab3500 R15: ffff91ab82133d88
> > Apr 10 12:46:26 [214315.570814][ T3345] FS: 0000000000000000(0000) GS:ffff8cf3efb00000(0000) knlGS:0000000000000000
> > Apr 10 12:46:26 [214315.622416][ T3345] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > Apr 10 12:46:26 [214315.648390][ T3345] CR2: 00007f7211406000 CR3: 00000001a924a004 CR4: 00000000001706e0
> > Apr 10 12:46:26 [214315.698998][ T3345] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Apr 10 12:46:26 [214315.749508][ T3345] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Apr 10 12:46:26 [214315.799749][ T3345] Call Trace:
> > Apr 10 12:46:26 [214315.824268][ T3345] netif_receive_skb_list_internal+0x5e/0x2c0
> > Apr 10 12:46:26 [214315.848996][ T3345] napi_gro_flush+0x11b/0x260
> > Apr 10 12:46:26 [214315.873320][ T3345] napi_complete_done+0x107/0x180
> > Apr 10 12:46:26 [214315.897160][ T3345] ixgbe_poll+0x10e/0x2a0 [ixgbe]
> > Apr 10 12:46:26 [214315.920564][ T3345] __napi_poll+0x1f/0x130
> > Apr 10 12:46:26 [214315.943475][ T3345] napi_threaded_poll+0x110/0x160
> > Apr 10 12:46:26 [214315.966252][ T3345] ? __napi_poll+0x130/0x130
> > Apr 10 12:46:26 [214315.988424][ T3345] kthread+0xea/0x120
> > Apr 10 12:46:26 [214316.010247][ T3345] ? kthread_park+0x80/0x80
> > Apr 10 12:46:26 [214316.031729][ T3345] ret_from_fork+0x1f/0x30
>
> Could you please also provide the decoded stack trace? Something
> like the following will do:
>
> cat <file containing the splat> | ./scripts/decode_stacktrace.sh <path to vmlinux>
>
> Even more importantly:
>
> threaded napi is implemented with the merge
> commit adbb4fb028452b1b0488a1a7b66ab856cdf20715, which landed into the
> vanilla tree since v5.12.rc1 and is not backported to 5.11.x. What
> kernel are you really using?
>
> Thanks,
>
> Paolo
>
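Note for readers: the "list_del corruption. prev->next should be X, but was Y" warning in the report above comes from a sanity check made before unlinking a doubly linked list entry: both neighbours must still point back at the entry being removed. The following is a minimal userspace sketch of that kind of check, assuming a plain circular list. The model_* names are hypothetical and this is not the code from lib/list_debug.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a circular doubly linked list node. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

static void model_list_init(struct model_list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void model_list_add_tail(struct model_list_head *entry,
				struct model_list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Return true only if both neighbours still point at the entry. */
static bool model_list_del_entry_valid(struct model_list_head *entry)
{
	if (entry->prev->next != entry) {
		fprintf(stderr,
			"list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return false;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr,
			"list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return false;
	}
	return true;
}

/* Unlink the entry, refusing to touch a list that looks corrupted. */
static bool model_list_del(struct model_list_head *entry)
{
	if (!model_list_del_entry_valid(entry))
		return false;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = NULL;
	entry->prev = NULL;
	return true;
}
```

A failed check of this kind means some other code path changed the neighbouring pointers while the entry was still linked. The check reports that the list is inconsistent; it does not identify which path modified it.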
Hi Eric and Wei Please see this bug report from last hour , Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up Uptime before crash : 10day Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158 Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff Sep 9 12:49:32 [829554.642131][ T2925] R10: 
ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 Sep 9 12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 Sep 9 12:49:33 [829555.913285][ 
T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 9 12:49:34 [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 Sep 9 12:49:34 
[829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. > On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote: >> >> Hi Eric and Wei >> >> Please check this log : >> > > Please send a normal report to netdev. > > This has nothing to to with us (Eric & Wei) > > Thanks. > >> >> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) >> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 >> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >> [1584289.107263] Call Trace: >> [1584289.107266] dump_stack+0x58/0x6b >> [1584289.209562] warn_alloc.cold+0x70/0xd4 >> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 >> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 >> [1584289.474009] allocate_slab+0x272/0x450 >> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 >> [1584289.519147] kmem_cache_alloc+0x110/0x120 >> [1584289.541416] build_skb+0x1a/0x200 >> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] >> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] >> [1584289.605528] __napi_poll+0x1f/0x130 >> [1584289.625842] napi_threaded_poll+0x110/0x160 >> [1584289.646110] ? __napi_poll+0x130/0x130 >> [1584289.665810] kthread+0xea/0x120 >> [1584289.684836] ? 
kthread_park+0x80/0x80 >> [1584289.703440] ret_from_fork+0x1f/0x30 >> [1584289.721616] Mem-Info: >> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 >> active_file:17408 inactive_file:149 isolated_file:32 >> unevictable:1440359 dirty:17500 writeback:0 >> slab_reclaimable:43368 slab_unreclaimable:155124 >> mapped:817431 shmem:7650 pagetables:32093 bounce:0 >> free:17832 free_pcp:113 free_cma:0 >> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no >> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 >> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB >> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 >> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB >> [1584290.237051] lowmem_reserve[]: 0 0 0 0 >> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB 
(M) = 13836kB >> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB >> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB >> [1584290.409087] 1465768 total pagecache pages >> [1584290.434531] 4165289 pages RAM >> [1584290.459616] 0 pages HighMem/MovableOnly >> [1584290.484480] 104766 pages reserved >> [1584290.508709] 0 pages hwpoisoned >> [1584301.710231] team0: Failed to send options change via netlink (err -105) >> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 >> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 >> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >> [1584302.776532] Call Trace: >> [1584302.799361] dump_stack+0x58/0x6b >> [1584302.821791] dump_header+0x4c/0x2e6 >> [1584302.843580] oom_kill_process.cold+0xb/0x10 >> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 >> [1584302.886641] out_of_memory+0x54/0xa0 >> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 >> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 >> [1584302.947874] __get_free_pages+0x8/0x30 >> [1584302.967246] pgd_alloc+0x21/0x180 >> [1584302.986355] mm_alloc+0x1af/0x250 >> [1584303.005085] alloc_bprm+0x80/0x2a0 >> [1584303.023328] do_execveat_common+0x8b/0x330 >> [1584303.041181] __x64_sys_execve+0x2b/0x40 >> [1584303.058513] do_syscall_64+0x2d/0x40 >> [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> [1584303.091891] RIP: 0033:0x488376 >> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 >> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b 
>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 >> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 >> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 >> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 >> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 >> [1584303.379094] Mem-Info: >> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0 >> active_file:12975 inactive_file:168 isolated_file:32 >> unevictable:909709 dirty:12864 writeback:10 >> slab_reclaimable:42415 slab_unreclaimable:154783 >> mapped:39825 shmem:14744 pagetables:26041 bounce:0 >> free:537002 free_pcp:1813 free_cma:0 >> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? 
no >> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 >> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB >> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 >> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB >> [1584304.036531] lowmem_reserve[]: 0 0 0 0 >> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB >> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB >> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB >> [1584304.287094] 933871 total pagecache pages >> [1584304.312815] 4165289 pages RAM >> [1584304.337915] 0 pages HighMem/MovableOnly >> [1584304.362522] 104766 pages reserved >> [1584304.386516] 0 pages hwpoisoned >> >>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote: >>> >>> On Sat, Mar 20, 2021 
at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>> >>>> Hi Wei >>>> Check this: >>>> >>>> [ 39.706567] ------------[ cut here ]------------ >>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) >>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 >>> >>> Probably more relevant to Intel maintainers than Wei :/ >>> >>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas >>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 >>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 >>>> [ 39.706619] Workqueue: events work_for_cpu_fn >>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 >>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 >>>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 >>>> [ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff >>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 >>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea >>>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 >>>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 >>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 >>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0 >>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> [ 39.706656] Call Trace: >>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] >>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] >>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 >>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 >>>> [ 39.706716] ? __kmalloc+0x37/0x160 >>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 >>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 >>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 >>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 >>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 >>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 >>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 >>>> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 >>>> [ 39.706746] local_pci_probe+0x1b/0x40 >>>> [ 39.706750] work_for_cpu_fn+0xb/0x20 >>>> [ 39.706754] process_one_work+0x1ec/0x350 >>>> [ 39.706758] worker_thread+0x24b/0x4d0 >>>> [ 39.706760] ? process_one_work+0x350/0x350 >>>> [ 39.706762] kthread+0xea/0x120 >>>> [ 39.706766] ? kthread_park+0x80/0x80 >>>> [ 39.706770] ret_from_fork+0x1f/0x30 >>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- >>>> >>>> Martin >>>> >>>> >>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: >>>>> >>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to >>>>> determine if the kthread owns this napi and could call napi->poll() on >>>>> it. However, if socket busy poll is enabled, it is possible that the >>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll() >>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll >>>>> on the same napi. napi_disable() could grab the SCHED bit as well. >>>>> This patch tries to fix this race by adding a new bit >>>>> NAPI_STATE_SCHED_THREADED in napi->state. 
This bit gets set in >>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared >>>>> in napi_complete_done(), and we only poll the napi in kthread if this >>>>> bit is set. This helps distinguish the ownership of the napi between >>>>> kthread and other scenarios and fixes the race issue. >>>>> >>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") >>>>> Reported-by: Martin Zaharinov <micron10@gmail.com> >>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org> >>>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>>> Cc: Alexander Duyck <alexanderduyck@fb.com> >>>>> Cc: Eric Dumazet <edumazet@google.com> >>>>> Cc: Paolo Abeni <pabeni@redhat.com> >>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> >>>>> --- >>>>> Change since v3: >>>>> - Add READ_ONCE() for thread->state and add comments in >>>>> ____napi_schedule(). >>>>> >>>>> include/linux/netdevice.h | 2 ++ >>>>> net/core/dev.c | 19 ++++++++++++++++++- >>>>> 2 files changed, 20 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>>>> index 5b67ea89d5f2..87a5d186faff 100644 >>>>> --- a/include/linux/netdevice.h >>>>> +++ b/include/linux/netdevice.h >>>>> @@ -360,6 +360,7 @@ enum { >>>>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ >>>>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ >>>>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ >>>>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ >>>>> }; >>>>> >>>>> enum { >>>>> @@ -372,6 +373,7 @@ enum { >>>>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), >>>>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), >>>>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), >>>>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), >>>>> }; >>>>> >>>>> enum gro_result { >>>>> diff --git a/net/core/dev.c b/net/core/dev.c >>>>> index 
6c5967e80132..d3195a95f30e 100644 >>>>> --- a/net/core/dev.c >>>>> +++ b/net/core/dev.c >>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, >>>>> */ >>>>> thread = READ_ONCE(napi->thread); >>>>> if (thread) { >>>>> + /* Avoid doing set_bit() if the thread is in >>>>> + * INTERRUPTIBLE state, cause napi_thread_wait() >>>>> + * makes sure to proceed with napi polling >>>>> + * if the thread is explicitly woken from here. >>>>> + */ >>>>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) >>>>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); >>>>> wake_up_process(thread); >>>>> return; >>>>> } >>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) >>>>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); >>>>> >>>>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | >>>>> + NAPIF_STATE_SCHED_THREADED | >>>>> NAPIF_STATE_PREFER_BUSY_POLL); >>>>> >>>>> /* If STATE_MISSED was set, leave STATE_SCHED set, >>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) >>>>> >>>>> static int napi_thread_wait(struct napi_struct *napi) >>>>> { >>>>> + bool woken = false; >>>>> + >>>>> set_current_state(TASK_INTERRUPTIBLE); >>>>> >>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { >>>>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { >>>>> + /* Testing SCHED_THREADED bit here to make sure the current >>>>> + * kthread owns this napi and could poll on this napi. >>>>> + * Testing SCHED bit is not enough because SCHED bit might be >>>>> + * set by some other busy poll thread or by napi_disable(). >>>>> + */ >>>>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { >>>>> WARN_ON(!list_empty(&napi->poll_list)); >>>>> __set_current_state(TASK_RUNNING); >>>>> return 0; >>>>> } >>>>> >>>>> schedule(); >>>>> + /* woken being true indicates this thread owns this napi. 
*/ >>>>> + woken = true; >>>>> set_current_state(TASK_INTERRUPTIBLE); >>>>> } >>>>> __set_current_state(TASK_RUNNING); >>>>> -- >>>>> 2.31.0.rc2.261.g7f71774620-goog >>>>> >>>> >>
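The ownership rule described in the quoted patch can be modeled in user space. What follows is only a minimal sketch with C11 atomics, not kernel code: the `model_*` names and the two `MODEL_*` bits are hypothetical stand-ins for `NAPI_STATE_SCHED` and `NAPI_STATE_SCHED_THREADED`, and the `woken` shortcut from `napi_thread_wait()` is left out for brevity.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NAPIF_STATE_SCHED and NAPIF_STATE_SCHED_THREADED. */
#define MODEL_SCHED          (1u << 0)
#define MODEL_SCHED_THREADED (1u << 1)

struct model_napi {
	atomic_uint state;
};

/* Any poller (busy poll, napi_disable(), the kthread path) competes for SCHED.
 * Returns true only for the caller that actually set the bit. */
bool model_grab_sched(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0;
}

/* Scheduling in threaded mode additionally marks the kthread as the owner.
 * Simplification: the real patch skips set_bit() when the thread is sleeping. */
bool model_schedule_threaded(struct model_napi *n)
{
	if (!model_grab_sched(n))
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Completion clears both bits, as napi_complete_done() does after the patch. */
void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Old check: SCHED alone, which a busy-poll thread may have set. */
bool model_kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* New check: only the threaded scheduling path sets this bit. */
bool model_kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}
```

In this model, once another poller has taken `MODEL_SCHED`, the old check lets the kthread proceed while the new check does not, which is the distinction the patch description draws.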
Hi Martin,

Is there a reproducer for this? What kind of traffic is it running?
What is the following config:
cat /proc/sys/net/core/busy_poll
cat /proc/sys/net/core/busy_read
cat /sys/class/net/<ixgbe_dev>/threaded
And is SO_PREFER_BUSY_POLL used?

Thanks.
Wei

On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Eric and Wei
>
> Please see this bug report from last hour ,
> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up
> Uptime before crash : 10day
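For the busy-poll settings raised in this exchange, it may help to note that `SO_PREFER_BUSY_POLL` is a per-socket option that an application sets with `setsockopt()`, unlike the system-wide `busy_poll` and `busy_read` sysctls. The following is a hedged sketch only: the helper names are hypothetical, the fallback value 69 is an assumption taken from the asm-generic socket header for systems whose libc headers predate the option (some architectures use a different number), and enabling the option may require `CAP_NET_ADMIN` depending on the kernel.

```c
#include <stdio.h>
#include <sys/socket.h>

#ifndef SO_PREFER_BUSY_POLL
/* Assumption: asm-generic value; a few architectures define it differently. */
#define SO_PREFER_BUSY_POLL 69
#endif

/* Read a single integer from a procfs/sysfs file such as
 * /proc/sys/net/core/busy_poll. Returns 0 on success, -1 on failure. */
int read_long_setting(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Query the option on one socket: 1 if set, 0 if clear, -1 on error
 * (for example, a bad descriptor or a kernel without the option). */
int query_prefer_busy_poll(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &val, &len) < 0)
		return -1;
	return val != 0;
}

/* Enable the option on one socket. Returns 0 on success, -1 on error. */
int enable_prefer_busy_poll(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &one, sizeof(one));
}
```

Since the option lives on each socket, whether it is in use depends on the applications running on the box (here, for instance, whatever `kresd` does with its sockets), so it cannot be read from `/proc/sys` the way `busy_poll` and `busy_read` can.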
Hi Wei,

The problem is hard to reproduce. This is the second time I have seen it in a period of one month.
The server is used for firewall/NAT/PPPoE and has users connected to it.

cat /proc/sys/net/core/busy_poll - 50
cat /proc/sys/net/core/busy_read - 50
cat /sys/class/net/eth0/threaded - 1
cat /sys/class/net/eth1/threaded - 1

SO_PREFER_BUSY_POLL is probably not used. How can I check and enable it?

P.S. eth0 and eth1 are joined in one common teamd/bond.

Best regards,
Martin

> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
>
> Hi Martin,
>
> Is there a reproducer for this? What kind of traffic is it running?
> What is the following config:
> cat /proc/sys/net/core/busy_poll
> cat /proc/sys/net/core/busy_read
> cat /sys/class/net/<ixgbe_dev>/threaded
> And is SO_PREFER_BUSY_POLL used?
>
> Thanks.
> Wei
>
>
>
> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>
>> Hi Eric and Wei
>>
>> Please see this bug report from last hour ,
>> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up
>> Uptime before crash : 10day
>>
>>
>>
>>
>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------
>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption.
next->prev should be ffff9651016c0b00, but was ffff96511a87c158 >> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 >> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 >> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 >> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 >> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 >> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 >> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff >> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff >> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 >> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 >> Sep 9 12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) 
GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 >> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: >> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 >> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 >> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 >> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 >> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 >> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 >> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 >> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 >> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- >> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 >> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode >> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page >> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 >> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI >> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 >> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 
CR4: 00000000001706e0 >> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: >> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 >> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 >> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 >> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 >> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 >> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 >> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 >> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 >> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 >> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 >> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- >> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> Sep 9 12:49:34 [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 
00000000001706e0 >> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt >> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. >> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. >> >>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote: >>> >>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>> >>>> Hi Eric and Wei >>>> >>>> Please check this log : >>>> >>> >>> Please send a normal report to netdev. >>> >>> This has nothing to to with us (Eric & Wei) >>> >>> Thanks. >>> >>>> >>>> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) >>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 >>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>> [1584289.107263] Call Trace: >>>> [1584289.107266] dump_stack+0x58/0x6b >>>> [1584289.209562] warn_alloc.cold+0x70/0xd4 >>>> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 >>>> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 >>>> [1584289.474009] allocate_slab+0x272/0x450 >>>> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 >>>> [1584289.519147] kmem_cache_alloc+0x110/0x120 >>>> [1584289.541416] build_skb+0x1a/0x200 >>>> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] >>>> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] >>>> [1584289.605528] __napi_poll+0x1f/0x130 >>>> [1584289.625842] napi_threaded_poll+0x110/0x160 >>>> [1584289.646110] ? 
__napi_poll+0x130/0x130 >>>> [1584289.665810] kthread+0xea/0x120 >>>> [1584289.684836] ? kthread_park+0x80/0x80 >>>> [1584289.703440] ret_from_fork+0x1f/0x30 >>>> [1584289.721616] Mem-Info: >>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 >>>> active_file:17408 inactive_file:149 isolated_file:32 >>>> unevictable:1440359 dirty:17500 writeback:0 >>>> slab_reclaimable:43368 slab_unreclaimable:155124 >>>> mapped:817431 shmem:7650 pagetables:32093 bounce:0 >>>> free:17832 free_pcp:113 free_cma:0 >>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no >>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 >>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB >>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 >>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB >>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0 >>>> [1584290.260423] 
Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB >>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB >>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB >>>> [1584290.409087] 1465768 total pagecache pages >>>> [1584290.434531] 4165289 pages RAM >>>> [1584290.459616] 0 pages HighMem/MovableOnly >>>> [1584290.484480] 104766 pages reserved >>>> [1584290.508709] 0 pages hwpoisoned >>>> [1584301.710231] team0: Failed to send options change via netlink (err -105) >>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 >>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 >>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>> [1584302.776532] Call Trace: >>>> [1584302.799361] dump_stack+0x58/0x6b >>>> [1584302.821791] dump_header+0x4c/0x2e6 >>>> [1584302.843580] oom_kill_process.cold+0xb/0x10 >>>> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 >>>> [1584302.886641] out_of_memory+0x54/0xa0 >>>> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 >>>> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 >>>> [1584302.947874] __get_free_pages+0x8/0x30 >>>> [1584302.967246] pgd_alloc+0x21/0x180 >>>> [1584302.986355] mm_alloc+0x1af/0x250 >>>> [1584303.005085] alloc_bprm+0x80/0x2a0 >>>> [1584303.023328] do_execveat_common+0x8b/0x330 >>>> [1584303.041181] __x64_sys_execve+0x2b/0x40 >>>> [1584303.058513] do_syscall_64+0x2d/0x40 >>>> [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>> [1584303.091891] RIP: 0033:0x488376 >>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 
48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 >>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b >>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 >>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 >>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 >>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 >>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 >>>> [1584303.379094] Mem-Info: >>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0 >>>> active_file:12975 inactive_file:168 isolated_file:32 >>>> unevictable:909709 dirty:12864 writeback:10 >>>> slab_reclaimable:42415 slab_unreclaimable:154783 >>>> mapped:39825 shmem:14744 pagetables:26041 bounce:0 >>>> free:537002 free_pcp:1813 free_cma:0 >>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? 
no >>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 >>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB >>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 >>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB >>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0 >>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB >>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB >>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB >>>> [1584304.287094] 933871 total pagecache pages >>>> [1584304.312815] 4165289 pages RAM >>>> [1584304.337915] 0 pages HighMem/MovableOnly >>>> [1584304.362522] 104766 pages reserved >>>> [1584304.386516] 0 pages hwpoisoned >>>> >>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> 
wrote: >>>>> >>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>>> >>>>>> Hi Wei >>>>>> Check this: >>>>>> >>>>>> [ 39.706567] ------------[ cut here ]------------ >>>>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) >>>>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>> >>>>> Probably more relevant to Intel maintainers than Wei :/ >>>>> >>>>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas >>>>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 >>>>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 >>>>>> [ 39.706619] Workqueue: events work_for_cpu_fn >>>>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 >>>>>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 >>>>>> [ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff >>>>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 >>>>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea >>>>>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 >>>>>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 >>>>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 >>>>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 
00000000001706f0 >>>>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>> [ 39.706656] Call Trace: >>>>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] >>>>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] >>>>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 >>>>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 >>>>>> [ 39.706716] ? __kmalloc+0x37/0x160 >>>>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 >>>>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 >>>>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 >>>>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 >>>>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 >>>>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 >>>>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 >>>>>> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 >>>>>> [ 39.706746] local_pci_probe+0x1b/0x40 >>>>>> [ 39.706750] work_for_cpu_fn+0xb/0x20 >>>>>> [ 39.706754] process_one_work+0x1ec/0x350 >>>>>> [ 39.706758] worker_thread+0x24b/0x4d0 >>>>>> [ 39.706760] ? process_one_work+0x350/0x350 >>>>>> [ 39.706762] kthread+0xea/0x120 >>>>>> [ 39.706766] ? kthread_park+0x80/0x80 >>>>>> [ 39.706770] ret_from_fork+0x1f/0x30 >>>>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- >>>>>> >>>>>> Martin >>>>>> >>>>>> >>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: >>>>>>> >>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to >>>>>>> determine if the kthread owns this napi and could call napi->poll() on >>>>>>> it. However, if socket busy poll is enabled, it is possible that the >>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll() >>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll >>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well. 
>>>>>>> This patch tries to fix this race by adding a new bit >>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in >>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared >>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this >>>>>>> bit is set. This helps distinguish the ownership of the napi between >>>>>>> kthread and other scenarios and fixes the race issue. >>>>>>> >>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") >>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com> >>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org> >>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com> >>>>>>> Cc: Eric Dumazet <edumazet@google.com> >>>>>>> Cc: Paolo Abeni <pabeni@redhat.com> >>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> >>>>>>> --- >>>>>>> Change since v3: >>>>>>> - Add READ_ONCE() for thread->state and add comments in >>>>>>> ____napi_schedule(). 
>>>>>>> >>>>>>> include/linux/netdevice.h | 2 ++ >>>>>>> net/core/dev.c | 19 ++++++++++++++++++- >>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>>>>>> index 5b67ea89d5f2..87a5d186faff 100644 >>>>>>> --- a/include/linux/netdevice.h >>>>>>> +++ b/include/linux/netdevice.h >>>>>>> @@ -360,6 +360,7 @@ enum { >>>>>>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ >>>>>>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ >>>>>>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ >>>>>>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ >>>>>>> }; >>>>>>> >>>>>>> enum { >>>>>>> @@ -372,6 +373,7 @@ enum { >>>>>>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), >>>>>>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), >>>>>>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), >>>>>>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), >>>>>>> }; >>>>>>> >>>>>>> enum gro_result { >>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c >>>>>>> index 6c5967e80132..d3195a95f30e 100644 >>>>>>> --- a/net/core/dev.c >>>>>>> +++ b/net/core/dev.c >>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, >>>>>>> */ >>>>>>> thread = READ_ONCE(napi->thread); >>>>>>> if (thread) { >>>>>>> + /* Avoid doing set_bit() if the thread is in >>>>>>> + * INTERRUPTIBLE state, cause napi_thread_wait() >>>>>>> + * makes sure to proceed with napi polling >>>>>>> + * if the thread is explicitly woken from here. 
>>>>>>> + */ >>>>>>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) >>>>>>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); >>>>>>> wake_up_process(thread); >>>>>>> return; >>>>>>> } >>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) >>>>>>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); >>>>>>> >>>>>>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | >>>>>>> + NAPIF_STATE_SCHED_THREADED | >>>>>>> NAPIF_STATE_PREFER_BUSY_POLL); >>>>>>> >>>>>>> /* If STATE_MISSED was set, leave STATE_SCHED set, >>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) >>>>>>> >>>>>>> static int napi_thread_wait(struct napi_struct *napi) >>>>>>> { >>>>>>> + bool woken = false; >>>>>>> + >>>>>>> set_current_state(TASK_INTERRUPTIBLE); >>>>>>> >>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { >>>>>>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { >>>>>>> + /* Testing SCHED_THREADED bit here to make sure the current >>>>>>> + * kthread owns this napi and could poll on this napi. >>>>>>> + * Testing SCHED bit is not enough because SCHED bit might be >>>>>>> + * set by some other busy poll thread or by napi_disable(). >>>>>>> + */ >>>>>>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { >>>>>>> WARN_ON(!list_empty(&napi->poll_list)); >>>>>>> __set_current_state(TASK_RUNNING); >>>>>>> return 0; >>>>>>> } >>>>>>> >>>>>>> schedule(); >>>>>>> + /* woken being true indicates this thread owns this napi. */ >>>>>>> + woken = true; >>>>>>> set_current_state(TASK_INTERRUPTIBLE); >>>>>>> } >>>>>>> __set_current_state(TASK_RUNNING); >>>>>>> -- >>>>>>> 2.31.0.rc2.261.g7f71774620-goog >>>>>>> >>>>>> >>>> >>
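On the question above of how to check and enable SO_PREFER_BUSY_POLL: as far as I know it is a per-socket option that the application itself has to set, and there is no sysctl or sysfs switch for it. A minimal sketch follows. The two helper names are hypothetical. The fallback value 69 is taken from include/uapi/asm-generic/socket.h and is an assumption on architectures that use their own socket.h. On the kernels discussed in this thread, enabling the option appears to require CAP_NET_ADMIN.

```c
#include <errno.h>
#include <sys/socket.h>

/* Older libc headers may not define this option. 69 is the
 * asm-generic value (x86 and most architectures); treat it as an
 * assumption anywhere else.
 */
#ifndef SO_PREFER_BUSY_POLL
#define SO_PREFER_BUSY_POLL 69
#endif

/* Hypothetical helper: returns 1 if the socket prefers busy polling,
 * 0 if it does not, or -1 with errno set when the kernel rejects the
 * option (for example ENOPROTOOPT on a kernel that lacks it).
 */
static int prefer_busy_poll_get(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &val, &len) < 0)
		return -1;
	return val != 0;
}

/* Hypothetical helper: switch the preference on or off for one socket.
 * Returns 0 on success or -1 with errno set (EPERM is expected when
 * enabling without CAP_NET_ADMIN).
 */
static int prefer_busy_poll_set(int fd, int on)
{
	return setsockopt(fd, SOL_SOCKET, SO_PREFER_BUSY_POLL, &on, sizeof(on));
}
```

A daemon would call prefer_busy_poll_set(fd, 1) on its own sockets after creating them. If it never does, the option stays off for that process regardless of the busy_poll and busy_read sysctl values.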
Hi Wei,

Please see this bug log:

Sep 15 08:04:56 [2034411.548669][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 08:04:56 [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
Sep 15 08:04:56 [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 08:04:56 [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Sep 15 08:04:56 [2034411.725536][ T3195] Call Trace:
Sep 15 08:04:56 [2034411.749948][ T3195] netif_receive_skb_list_internal+0x25c/0x2b0
Sep 15 08:04:56 [2034411.774579][ T3195] gro_normal_one+0x6e/0x90
Sep 15 08:04:56 [2034411.798786][ T3195] napi_gro_flush+0xb1/0x100
Sep 15 08:04:56 [2034411.822410][ T3195] napi_complete_done+0x107/0x180
Sep 15 08:04:56 [2034411.845614][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe]
Sep 15 08:04:56 [2034411.868480][ T3195] __napi_poll+0x1f/0x100
Sep 15 08:04:56 [2034411.890899][ T3195] ? __napi_poll+0x100/0x100
Sep 15 08:04:56 [2034411.912799][ T3195] napi_threaded_poll+0x105/0x150
Sep 15 08:04:56 [2034411.934567][ T3195] kthread+0x101/0x120
Sep 15 08:04:56 [2034411.955873][ T3195] ?
set_kthread_struct+0x30/0x30 Sep 15 08:04:56 [2034411.977157][ T3195] ret_from_fork+0x1f/0x30 Sep 15 08:04:56 [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]--- Sep 15 08:04:56 [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000 Sep 15 08:04:56 [2034412.058658][ T3195] #PF: supervisor read access in kernel mode Sep 15 08:04:56 [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page Sep 15 08:04:56 [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0 Sep 15 08:04:56 [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI Sep 15 08:04:56 [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S W O 5.13.12 #1 Sep 15 08:04:56 [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 Sep 15 08:04:56 [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 Sep 15 08:04:56 [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 Sep 15 08:04:56 [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 Sep 15 08:04:56 [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 Sep 15 08:04:56 [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 Sep 15 08:04:56 [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff Sep 15 08:04:56 [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 Sep 15 08:04:57 [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 Sep 15 08:04:57 [2034412.507493][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 Sep 15 08:04:57 [2034412.553528][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 15 08:04:57 
[2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 Sep 15 08:04:57 [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 15 08:04:57 [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Sep 15 08:04:57 [2034412.721656][ T3195] Call Trace: Sep 15 08:04:57 [2034412.746016][ T3195] gro_normal_one+0x6e/0x90 Sep 15 08:04:57 [2034412.770321][ T3195] napi_gro_flush+0xb1/0x100 Sep 15 08:04:57 [2034412.794137][ T3195] napi_complete_done+0x107/0x180 Sep 15 08:04:57 [2034412.817556][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] Sep 15 08:04:57 [2034412.840522][ T3195] __napi_poll+0x1f/0x100 Sep 15 08:04:57 [2034412.862829][ T3195] ? __napi_poll+0x100/0x100 Sep 15 08:04:57 [2034412.884804][ T3195] napi_threaded_poll+0x105/0x150 Sep 15 08:04:57 [2034412.906305][ T3195] kthread+0x101/0x120 Sep 15 08:04:57 [2034412.927502][ T3195] ? set_kthread_struct+0x30/0x30 Sep 15 08:04:57 [2034412.948434][ T3195] ret_from_fork+0x1f/0x30 Sep 15 08:04:57 [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC] Sep 15 08:04:57 [2034413.136792][ T3195] CR2: 0000000000000000 Sep 15 08:04:57 [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]--- Sep 15 08:04:57 [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 Sep 15 08:04:57 [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 
74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 Sep 15 08:04:57 [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 Sep 15 08:04:57 [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 Sep 15 08:04:57 [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 Sep 15 08:04:57 [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff Sep 15 08:04:57 [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 Sep 15 08:04:58 [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 Sep 15 08:04:58 [2034413.487558][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 Sep 15 08:04:58 [2034413.535263][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 15 08:04:58 [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 Sep 15 08:04:58 [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 15 08:04:58 [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Sep 15 08:04:58 [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt Sep 15 08:04:58 [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Sep 15 08:04:58 [2034413.906445][ T3195] Rebooting in 10 seconds.. Sep 15 08:05:08 [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG. > On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote: > > Hi Martin, > > Is there a reproducer for this? What kind of traffic is it running? > What is the following config: > cat /proc/sys/net/core/busy_poll > cat /proc/sys/net/core/busy_read > cat /sys/class/net/<ixgbe_dev>/threaded > And is SO_PREFER_BUSY_POLL used? > > Thanks. 
> Wei > > > > On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote: >> >> Hi Eric and Wei >> >> Please see this bug report from last hour , >> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up >> Uptime before crash : 10day >> >> >> >> >> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ >> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158 >> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 >> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 >> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 >> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 >> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 >> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 >> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff >> 
Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff >> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 >> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 >> Sep 9 12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 >> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: >> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 >> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 >> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 >> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 >> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 >> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 >> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 >> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 >> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- >> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 >> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode >> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page >> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 >> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI >> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 >> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 
CR4: 00000000001706e0 >> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: >> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 >> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 >> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 >> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 >> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 >> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 >> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 >> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 >> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 >> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 >> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- >> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> Sep 9 12:49:34 [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 
00000000001706e0
>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds..
>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>>
>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
>>>
>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>
>>>> Hi Eric and Wei
>>>>
>>>> Please check this log :
>>>>
>>>
>>> Please send a normal report to netdev.
>>>
>>> This has nothing to do with us (Eric & Wei)
>>>
>>> Thanks.
>>>
>>>>
>>>> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null)
>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1
>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019
>>>> [1584289.107263] Call Trace:
>>>> [1584289.107266] dump_stack+0x58/0x6b
>>>> [1584289.209562] warn_alloc.cold+0x70/0xd4
>>>> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0
>>>> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180
>>>> [1584289.474009] allocate_slab+0x272/0x450
>>>> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0
>>>> [1584289.519147] kmem_cache_alloc+0x110/0x120
>>>> [1584289.541416] build_skb+0x1a/0x200
>>>> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe]
>>>> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe]
>>>> [1584289.605528] __napi_poll+0x1f/0x130
>>>> [1584289.625842] napi_threaded_poll+0x110/0x160
>>>> [1584289.646110] ?
__napi_poll+0x130/0x130 >>>> [1584289.665810] kthread+0xea/0x120 >>>> [1584289.684836] ? kthread_park+0x80/0x80 >>>> [1584289.703440] ret_from_fork+0x1f/0x30 >>>> [1584289.721616] Mem-Info: >>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 >>>> active_file:17408 inactive_file:149 isolated_file:32 >>>> unevictable:1440359 dirty:17500 writeback:0 >>>> slab_reclaimable:43368 slab_unreclaimable:155124 >>>> mapped:817431 shmem:7650 pagetables:32093 bounce:0 >>>> free:17832 free_pcp:113 free_cma:0 >>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no >>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 >>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB >>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 >>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB >>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0 >>>> [1584290.260423] 
Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB >>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB >>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB >>>> [1584290.409087] 1465768 total pagecache pages >>>> [1584290.434531] 4165289 pages RAM >>>> [1584290.459616] 0 pages HighMem/MovableOnly >>>> [1584290.484480] 104766 pages reserved >>>> [1584290.508709] 0 pages hwpoisoned >>>> [1584301.710231] team0: Failed to send options change via netlink (err -105) >>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 >>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 >>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>> [1584302.776532] Call Trace: >>>> [1584302.799361] dump_stack+0x58/0x6b >>>> [1584302.821791] dump_header+0x4c/0x2e6 >>>> [1584302.843580] oom_kill_process.cold+0xb/0x10 >>>> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 >>>> [1584302.886641] out_of_memory+0x54/0xa0 >>>> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 >>>> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 >>>> [1584302.947874] __get_free_pages+0x8/0x30 >>>> [1584302.967246] pgd_alloc+0x21/0x180 >>>> [1584302.986355] mm_alloc+0x1af/0x250 >>>> [1584303.005085] alloc_bprm+0x80/0x2a0 >>>> [1584303.023328] do_execveat_common+0x8b/0x330 >>>> [1584303.041181] __x64_sys_execve+0x2b/0x40 >>>> [1584303.058513] do_syscall_64+0x2d/0x40 >>>> [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>> [1584303.091891] RIP: 0033:0x488376 >>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 
48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 >>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b >>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 >>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 >>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 >>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 >>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 >>>> [1584303.379094] Mem-Info: >>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0 >>>> active_file:12975 inactive_file:168 isolated_file:32 >>>> unevictable:909709 dirty:12864 writeback:10 >>>> slab_reclaimable:42415 slab_unreclaimable:154783 >>>> mapped:39825 shmem:14744 pagetables:26041 bounce:0 >>>> free:537002 free_pcp:1813 free_cma:0 >>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? 
no >>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 >>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB >>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 >>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB >>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0 >>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB >>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB >>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB >>>> [1584304.287094] 933871 total pagecache pages >>>> [1584304.312815] 4165289 pages RAM >>>> [1584304.337915] 0 pages HighMem/MovableOnly >>>> [1584304.362522] 104766 pages reserved >>>> [1584304.386516] 0 pages hwpoisoned >>>> >>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> 
wrote: >>>>> >>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>>> >>>>>> Hi Wei >>>>>> Check this: >>>>>> >>>>>> [ 39.706567] ------------[ cut here ]------------ >>>>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) >>>>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>> >>>>> Probably more relevant to Intel maintainers than Wei :/ >>>>> >>>>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas >>>>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 >>>>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 >>>>>> [ 39.706619] Workqueue: events work_for_cpu_fn >>>>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 >>>>>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 >>>>>> [ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff >>>>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 >>>>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea >>>>>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 >>>>>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 >>>>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 >>>>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 
00000000001706f0 >>>>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>> [ 39.706656] Call Trace: >>>>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] >>>>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] >>>>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 >>>>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 >>>>>> [ 39.706716] ? __kmalloc+0x37/0x160 >>>>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 >>>>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 >>>>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 >>>>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 >>>>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 >>>>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 >>>>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 >>>>>> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 >>>>>> [ 39.706746] local_pci_probe+0x1b/0x40 >>>>>> [ 39.706750] work_for_cpu_fn+0xb/0x20 >>>>>> [ 39.706754] process_one_work+0x1ec/0x350 >>>>>> [ 39.706758] worker_thread+0x24b/0x4d0 >>>>>> [ 39.706760] ? process_one_work+0x350/0x350 >>>>>> [ 39.706762] kthread+0xea/0x120 >>>>>> [ 39.706766] ? kthread_park+0x80/0x80 >>>>>> [ 39.706770] ret_from_fork+0x1f/0x30 >>>>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- >>>>>> >>>>>> Martin >>>>>> >>>>>> >>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: >>>>>>> >>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to >>>>>>> determine if the kthread owns this napi and could call napi->poll() on >>>>>>> it. However, if socket busy poll is enabled, it is possible that the >>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll() >>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll >>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well. 
>>>>>>> This patch tries to fix this race by adding a new bit
>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>>>>>> bit is set. This helps distinguish the ownership of the napi between
>>>>>>> kthread and other scenarios and fixes the race issue.
>>>>>>>
>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>>>> ---
>>>>>>> Change since v3:
>>>>>>> - Add READ_ONCE() for thread->state and add comments in
>>>>>>> ____napi_schedule().
>>>>>>>
>>>>>>> include/linux/netdevice.h | 2 ++
>>>>>>> net/core/dev.c | 19 ++++++++++++++++++-
>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
>>>>>>> --- a/include/linux/netdevice.h
>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>> @@ -360,6 +360,7 @@ enum {
>>>>>>>  	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
>>>>>>>  	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
>>>>>>>  	NAPI_STATE_THREADED,	/* The poll is performed inside its own thread*/
>>>>>>> +	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
>>>>>>>  };
>>>>>>>
>>>>>>>  enum {
>>>>>>> @@ -372,6 +373,7 @@ enum {
>>>>>>>  	NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>>>>>  	NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>>>>>  	NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED),
>>>>>>> +	NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED),
>>>>>>>  };
>>>>>>>
>>>>>>>  enum gro_result {
>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>> index 6c5967e80132..d3195a95f30e 100644
>>>>>>> --- a/net/core/dev.c
>>>>>>> +++ b/net/core/dev.c
>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>>>>>  		 */
>>>>>>>  		thread = READ_ONCE(napi->thread);
>>>>>>>  		if (thread) {
>>>>>>> +			/* Avoid doing set_bit() if the thread is in
>>>>>>> +			 * INTERRUPTIBLE state, cause napi_thread_wait()
>>>>>>> +			 * makes sure to proceed with napi polling
>>>>>>> +			 * if the thread is explicitly woken from here.
>>>>>>> +			 */
>>>>>>> +			if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>>>>>> +				set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>>>>>  			wake_up_process(thread);
>>>>>>>  			return;
>>>>>>>  		}
>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>>>>>  		WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>>>>>>
>>>>>>>  		new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>>>>>> +			      NAPIF_STATE_SCHED_THREADED |
>>>>>>>  			      NAPIF_STATE_PREFER_BUSY_POLL);
>>>>>>>
>>>>>>>  		/* If STATE_MISSED was set, leave STATE_SCHED set,
>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>>>>>>
>>>>>>>  static int napi_thread_wait(struct napi_struct *napi)
>>>>>>>  {
>>>>>>> +	bool woken = false;
>>>>>>> +
>>>>>>>  	set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>
>>>>>>>  	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>>>>>> -		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>>>>>> +		/* Testing SCHED_THREADED bit here to make sure the current
>>>>>>> +		 * kthread owns this napi and could poll on this napi.
>>>>>>> +		 * Testing SCHED bit is not enough because SCHED bit might be
>>>>>>> +		 * set by some other busy poll thread or by napi_disable().
>>>>>>> +		 */
>>>>>>> +		if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>>>>>  			WARN_ON(!list_empty(&napi->poll_list));
>>>>>>>  			__set_current_state(TASK_RUNNING);
>>>>>>>  			return 0;
>>>>>>>  		}
>>>>>>>
>>>>>>>  		schedule();
>>>>>>> +		/* woken being true indicates this thread owns this napi. */
>>>>>>> +		woken = true;
>>>>>>>  		set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>  	}
>>>>>>>  	__set_current_state(TASK_RUNNING);
>>>>>>> --
>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
>>>>>>>
>>>>>>
>>>>
>>
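The ownership rule described in the quoted patch can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `model_*` name and both `kthread_may_poll_*` helpers are hypothetical stand-ins, not kernel code, and the model leaves out the `woken` flag and the `TASK_INTERRUPTIBLE` check that the real patch uses. It only shows why testing the SCHED bit alone cannot tell the kthread apart from a busy-poll thread, while a separate SCHED_THREADED bit can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NAPI state bits; not the kernel's values. */
enum {
	MODEL_SCHED          = 1u << 0, /* somebody owns the napi for polling */
	MODEL_SCHED_THREADED = 1u << 1, /* it was scheduled for the kthread   */
};

struct model_napi {
	atomic_uint state;
};

/* Models a busy-poll thread (or napi_disable()) grabbing the SCHED bit.
 * Returns true if the caller became the owner. */
static bool model_grab_sched(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0;
}

/* Models scheduling in threaded mode: take SCHED, then mark the napi as
 * scheduled for the kthread. */
static bool model_schedule_threaded(struct model_napi *n)
{
	if (!model_grab_sched(n))
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Models completion: both ownership bits are cleared together. */
static void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state,
			 ~(unsigned int)(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Pre-patch check: the kthread only looks at SCHED. */
static bool kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Post-patch check: the kthread looks at SCHED_THREADED. */
static bool kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}

/* Walks through the problematic ordering; returns 0 if the model behaves
 * as the patch description says, otherwise the number of the failed step. */
int model_demo(void)
{
	struct model_napi n;

	atomic_init(&n.state, 0);

	/* A busy-poll thread grabs SCHED after the previous poll completed. */
	if (!model_grab_sched(&n))
		return 1;
	/* The old check wrongly concludes that the kthread owns the napi. */
	if (!kthread_may_poll_old(&n))
		return 2;
	/* The new check correctly refuses. */
	if (kthread_may_poll_new(&n))
		return 3;

	model_complete(&n);

	/* Scheduling in threaded mode sets both bits; the kthread may poll. */
	if (!model_schedule_threaded(&n))
		return 4;
	if (!kthread_may_poll_new(&n))
		return 5;

	model_complete(&n);
	return atomic_load(&n.state) == 0 ? 0 : 6;
}
```

Calling `model_demo()` from a `main()` should return 0 under this model. The sketch is single-threaded on purpose, so the ordering that triggers the ambiguity is deterministic rather than timing-dependent.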
Thanks Martin for the report.
Without a reproducer, it might be hard to debug. I will double check the code to check for potential race between kthread poll and busy poll.

Thanks.
Wei

On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Wei
> Please see this bug log :
>
>
> Sep 15 08:04:56 [2034411.548669][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 15 08:04:56 [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> Sep 15 08:04:56 [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 15 08:04:56 [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep 15 08:04:56 [2034411.725536][ T3195] Call Trace:
> Sep 15 08:04:56 [2034411.749948][ T3195] netif_receive_skb_list_internal+0x25c/0x2b0
> Sep 15 08:04:56 [2034411.774579][ T3195] gro_normal_one+0x6e/0x90
> Sep 15 08:04:56 [2034411.798786][ T3195] napi_gro_flush+0xb1/0x100
> Sep 15 08:04:56 [2034411.822410][ T3195] napi_complete_done+0x107/0x180
> Sep 15 08:04:56 [2034411.845614][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe]
> Sep 15 08:04:56 [2034411.868480][ T3195] __napi_poll+0x1f/0x100
> Sep 15 08:04:56 [2034411.890899][ T3195] ? __napi_poll+0x100/0x100
> Sep 15 08:04:56 [2034411.912799][ T3195] napi_threaded_poll+0x105/0x150
> Sep 15 08:04:56 [2034411.934567][ T3195] kthread+0x101/0x120
> Sep 15 08:04:56 [2034411.955873][ T3195] ?
set_kthread_struct+0x30/0x30 > Sep 15 08:04:56 [2034411.977157][ T3195] ret_from_fork+0x1f/0x30 > Sep 15 08:04:56 [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]--- > Sep 15 08:04:56 [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000 > Sep 15 08:04:56 [2034412.058658][ T3195] #PF: supervisor read access in kernel mode > Sep 15 08:04:56 [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page > Sep 15 08:04:56 [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0 > Sep 15 08:04:56 [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI > Sep 15 08:04:56 [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S W O 5.13.12 #1 > Sep 15 08:04:56 [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > Sep 15 08:04:56 [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 > Sep 15 08:04:56 [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 > Sep 15 08:04:56 [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 > Sep 15 08:04:56 [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 > Sep 15 08:04:56 [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 > Sep 15 08:04:56 [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff > Sep 15 08:04:56 [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 > Sep 15 08:04:57 [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 > Sep 15 08:04:57 [2034412.507493][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 > Sep 15 08:04:57 [2034412.553528][ T3195] CS: 0010 DS: 0000 ES: 0000 
CR0: 0000000080050033 > Sep 15 08:04:57 [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 > Sep 15 08:04:57 [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Sep 15 08:04:57 [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Sep 15 08:04:57 [2034412.721656][ T3195] Call Trace: > Sep 15 08:04:57 [2034412.746016][ T3195] gro_normal_one+0x6e/0x90 > Sep 15 08:04:57 [2034412.770321][ T3195] napi_gro_flush+0xb1/0x100 > Sep 15 08:04:57 [2034412.794137][ T3195] napi_complete_done+0x107/0x180 > Sep 15 08:04:57 [2034412.817556][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] > Sep 15 08:04:57 [2034412.840522][ T3195] __napi_poll+0x1f/0x100 > Sep 15 08:04:57 [2034412.862829][ T3195] ? __napi_poll+0x100/0x100 > Sep 15 08:04:57 [2034412.884804][ T3195] napi_threaded_poll+0x105/0x150 > Sep 15 08:04:57 [2034412.906305][ T3195] kthread+0x101/0x120 > Sep 15 08:04:57 [2034412.927502][ T3195] ? 
set_kthread_struct+0x30/0x30 > Sep 15 08:04:57 [2034412.948434][ T3195] ret_from_fork+0x1f/0x30 > Sep 15 08:04:57 [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC] > Sep 15 08:04:57 [2034413.136792][ T3195] CR2: 0000000000000000 > Sep 15 08:04:57 [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]--- > Sep 15 08:04:57 [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 > Sep 15 08:04:57 [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 > Sep 15 08:04:57 [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 > Sep 15 08:04:57 [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 > Sep 15 08:04:57 [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 > Sep 15 08:04:57 [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff > Sep 15 08:04:57 [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 > Sep 15 08:04:58 [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 > Sep 15 08:04:58 [2034413.487558][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 > Sep 15 08:04:58 [2034413.535263][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 
0000000080050033 > Sep 15 08:04:58 [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 > Sep 15 08:04:58 [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Sep 15 08:04:58 [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Sep 15 08:04:58 [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt > Sep 15 08:04:58 [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > Sep 15 08:04:58 [2034413.906445][ T3195] Rebooting in 10 seconds.. > Sep 15 08:05:08 [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG. > > > > > > On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote: > > > > Hi Martin, > > > > Is there a reproducer for this? What kind of traffic is it running? > > What is the following config: > > cat /proc/sys/net/core/busy_poll > > cat /proc/sys/net/core/busy_read > > cat /sys/class/net/<ixgbe_dev>/threaded > > And is SO_PREFER_BUSY_POLL used? > > > > Thanks. > > Wei > > > > > > > > On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote: > >> > >> Hi Eric and Wei > >> > >> Please see this bug report from last hour , > >> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up > >> Uptime before crash : 10day > >> > >> > >> > >> > >> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ > >> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. 
next->prev should be ffff9651016c0b00, but was ffff96511a87c158 > >> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 > >> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > >> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 > >> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > >> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 > >> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 > >> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 > >> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 > >> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff > >> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff > >> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 > >> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 > >> Sep 9 12:49:32 [829554.744221][ T2925] FS: 
00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 > >> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: > >> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 > >> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 > >> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > >> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > >> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 > >> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 > >> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 > >> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 > >> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 > >> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > >> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 > >> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > >> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- > >> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 > >> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode > >> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page > >> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 > >> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI > >> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 > >> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > >> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > >> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > >> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > >> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > >> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > >> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > >> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > >> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > >> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 
00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 > >> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: > >> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 > >> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 > >> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 > >> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > >> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > >> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 > >> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 > >> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 > >> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 > >> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 > >> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > >> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 > >> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > >> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > >> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 > >> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- > >> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > >> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > >> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > >> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > >> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > >> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > >> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > >> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > >> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> Sep 9 12:49:34 [829556.903908][ T2925] CR2: 
00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 > >> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt > >> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > >> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. > >> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. > >> > >>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote: > >>> > >>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote: > >>>> > >>>> Hi Eric and Wei > >>>> > >>>> Please check this log : > >>>> > >>> > >>> Please send a normal report to netdev. > >>> > >>> This has nothing to to with us (Eric & Wei) > >>> > >>> Thanks. 
> >>> > >>>> > >>>> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) > >>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 > >>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 > >>>> [1584289.107263] Call Trace: > >>>> [1584289.107266] dump_stack+0x58/0x6b > >>>> [1584289.209562] warn_alloc.cold+0x70/0xd4 > >>>> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 > >>>> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 > >>>> [1584289.474009] allocate_slab+0x272/0x450 > >>>> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 > >>>> [1584289.519147] kmem_cache_alloc+0x110/0x120 > >>>> [1584289.541416] build_skb+0x1a/0x200 > >>>> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] > >>>> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] > >>>> [1584289.605528] __napi_poll+0x1f/0x130 > >>>> [1584289.625842] napi_threaded_poll+0x110/0x160 > >>>> [1584289.646110] ? __napi_poll+0x130/0x130 > >>>> [1584289.665810] kthread+0xea/0x120 > >>>> [1584289.684836] ? kthread_park+0x80/0x80 > >>>> [1584289.703440] ret_from_fork+0x1f/0x30 > >>>> [1584289.721616] Mem-Info: > >>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 > >>>> active_file:17408 inactive_file:149 isolated_file:32 > >>>> unevictable:1440359 dirty:17500 writeback:0 > >>>> slab_reclaimable:43368 slab_unreclaimable:155124 > >>>> mapped:817431 shmem:7650 pagetables:32093 bounce:0 > >>>> free:17832 free_pcp:113 free_cma:0 > >>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? 
no > >>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > >>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 > >>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB > >>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 > >>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB > >>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0 > >>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB > >>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB > >>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB > >>>> [1584290.409087] 1465768 total pagecache pages > >>>> [1584290.434531] 4165289 pages RAM > >>>> [1584290.459616] 0 pages HighMem/MovableOnly > >>>> [1584290.484480] 104766 pages reserved > >>>> [1584290.508709] 0 pages hwpoisoned > >>>> [1584301.710231] team0: Failed to send options change via netlink (err -105) > >>>> [1584302.635731] telegraf invoked oom-killer: 
gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 > >>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 > >>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 > >>>> [1584302.776532] Call Trace: > >>>> [1584302.799361] dump_stack+0x58/0x6b > >>>> [1584302.821791] dump_header+0x4c/0x2e6 > >>>> [1584302.843580] oom_kill_process.cold+0xb/0x10 > >>>> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 > >>>> [1584302.886641] out_of_memory+0x54/0xa0 > >>>> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 > >>>> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 > >>>> [1584302.947874] __get_free_pages+0x8/0x30 > >>>> [1584302.967246] pgd_alloc+0x21/0x180 > >>>> [1584302.986355] mm_alloc+0x1af/0x250 > >>>> [1584303.005085] alloc_bprm+0x80/0x2a0 > >>>> [1584303.023328] do_execveat_common+0x8b/0x330 > >>>> [1584303.041181] __x64_sys_execve+0x2b/0x40 > >>>> [1584303.058513] do_syscall_64+0x2d/0x40 > >>>> [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>> [1584303.091891] RIP: 0033:0x488376 > >>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 > >>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b > >>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 > >>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 > >>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 > >>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 > >>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 > >>>> [1584303.379094] Mem-Info: > >>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 
isolated_anon:0 > >>>> active_file:12975 inactive_file:168 isolated_file:32 > >>>> unevictable:909709 dirty:12864 writeback:10 > >>>> slab_reclaimable:42415 slab_unreclaimable:154783 > >>>> mapped:39825 shmem:14744 pagetables:26041 bounce:0 > >>>> free:537002 free_pcp:1813 free_cma:0 > >>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no > >>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > >>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 > >>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB > >>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 > >>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB > >>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0 > >>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB > >>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB 
(UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB > >>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB > >>>> [1584304.287094] 933871 total pagecache pages > >>>> [1584304.312815] 4165289 pages RAM > >>>> [1584304.337915] 0 pages HighMem/MovableOnly > >>>> [1584304.362522] 104766 pages reserved > >>>> [1584304.386516] 0 pages hwpoisoned > >>>> > >>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote: > >>>>> > >>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: > >>>>>> > >>>>>> Hi Wei > >>>>>> Check this: > >>>>>> > >>>>>> [ 39.706567] ------------[ cut here ]------------ > >>>>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) > >>>>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 > >>>>> > >>>>> Probably more relevant to Intel maintainers than Wei :/ > >>>>> > >>>>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas > >>>>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 > >>>>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 > >>>>>> [ 39.706619] Workqueue: events work_for_cpu_fn > >>>>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 > >>>>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 > >>>>>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 > >>>>>> 
[ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff > >>>>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 > >>>>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea > >>>>>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 > >>>>>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 > >>>>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 > >>>>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0 > >>>>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>>> [ 39.706656] Call Trace: > >>>>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] > >>>>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] > >>>>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 > >>>>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 > >>>>>> [ 39.706716] ? __kmalloc+0x37/0x160 > >>>>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 > >>>>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 > >>>>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 > >>>>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 > >>>>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 > >>>>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 > >>>>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 > >>>>>> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 > >>>>>> [ 39.706746] local_pci_probe+0x1b/0x40 > >>>>>> [ 39.706750] work_for_cpu_fn+0xb/0x20 > >>>>>> [ 39.706754] process_one_work+0x1ec/0x350 > >>>>>> [ 39.706758] worker_thread+0x24b/0x4d0 > >>>>>> [ 39.706760] ? process_one_work+0x350/0x350 > >>>>>> [ 39.706762] kthread+0xea/0x120 > >>>>>> [ 39.706766] ? 
kthread_park+0x80/0x80 > >>>>>> [ 39.706770] ret_from_fork+0x1f/0x30 > >>>>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- > >>>>>> > >>>>>> Martin > >>>>>> > >>>>>> > >>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: > >>>>>>> > >>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to > >>>>>>> determine if the kthread owns this napi and could call napi->poll() on > >>>>>>> it. However, if socket busy poll is enabled, it is possible that the > >>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll() > >>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll > >>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well. > >>>>>>> This patch tries to fix this race by adding a new bit > >>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in > >>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared > >>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this > >>>>>>> bit is set. This helps distinguish the ownership of the napi between > >>>>>>> kthread and other scenarios and fixes the race issue. > >>>>>>> > >>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") > >>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com> > >>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org> > >>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> > >>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com> > >>>>>>> Cc: Eric Dumazet <edumazet@google.com> > >>>>>>> Cc: Paolo Abeni <pabeni@redhat.com> > >>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> > >>>>>>> --- > >>>>>>> Change since v3: > >>>>>>> - Add READ_ONCE() for thread->state and add comments in > >>>>>>> ____napi_schedule(). 
> >>>>>>> > >>>>>>> include/linux/netdevice.h | 2 ++ > >>>>>>> net/core/dev.c | 19 ++++++++++++++++++- > >>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-) > >>>>>>> > >>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > >>>>>>> index 5b67ea89d5f2..87a5d186faff 100644 > >>>>>>> --- a/include/linux/netdevice.h > >>>>>>> +++ b/include/linux/netdevice.h > >>>>>>> @@ -360,6 +360,7 @@ enum { > >>>>>>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ > >>>>>>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ > >>>>>>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ > >>>>>>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ > >>>>>>> }; > >>>>>>> > >>>>>>> enum { > >>>>>>> @@ -372,6 +373,7 @@ enum { > >>>>>>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), > >>>>>>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), > >>>>>>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), > >>>>>>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), > >>>>>>> }; > >>>>>>> > >>>>>>> enum gro_result { > >>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c > >>>>>>> index 6c5967e80132..d3195a95f30e 100644 > >>>>>>> --- a/net/core/dev.c > >>>>>>> +++ b/net/core/dev.c > >>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, > >>>>>>> */ > >>>>>>> thread = READ_ONCE(napi->thread); > >>>>>>> if (thread) { > >>>>>>> + /* Avoid doing set_bit() if the thread is in > >>>>>>> + * INTERRUPTIBLE state, cause napi_thread_wait() > >>>>>>> + * makes sure to proceed with napi polling > >>>>>>> + * if the thread is explicitly woken from here. 
> >>>>>>> + */ > >>>>>>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) > >>>>>>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); > >>>>>>> wake_up_process(thread); > >>>>>>> return; > >>>>>>> } > >>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) > >>>>>>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); > >>>>>>> > >>>>>>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | > >>>>>>> + NAPIF_STATE_SCHED_THREADED | > >>>>>>> NAPIF_STATE_PREFER_BUSY_POLL); > >>>>>>> > >>>>>>> /* If STATE_MISSED was set, leave STATE_SCHED set, > >>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) > >>>>>>> > >>>>>>> static int napi_thread_wait(struct napi_struct *napi) > >>>>>>> { > >>>>>>> + bool woken = false; > >>>>>>> + > >>>>>>> set_current_state(TASK_INTERRUPTIBLE); > >>>>>>> > >>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { > >>>>>>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { > >>>>>>> + /* Testing SCHED_THREADED bit here to make sure the current > >>>>>>> + * kthread owns this napi and could poll on this napi. > >>>>>>> + * Testing SCHED bit is not enough because SCHED bit might be > >>>>>>> + * set by some other busy poll thread or by napi_disable(). > >>>>>>> + */ > >>>>>>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { > >>>>>>> WARN_ON(!list_empty(&napi->poll_list)); > >>>>>>> __set_current_state(TASK_RUNNING); > >>>>>>> return 0; > >>>>>>> } > >>>>>>> > >>>>>>> schedule(); > >>>>>>> + /* woken being true indicates this thread owns this napi. */ > >>>>>>> + woken = true; > >>>>>>> set_current_state(TASK_INTERRUPTIBLE); > >>>>>>> } > >>>>>>> __set_current_state(TASK_RUNNING); > >>>>>>> -- > >>>>>>> 2.31.0.rc2.261.g7f71774620-goog > >>>>>>> > >>>>>> > >>>> > >> >
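The ownership rule described in the quoted patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: the `model_*` names and bit values are hypothetical, C11 atomics stand in for the kernel bit operations, and the `woken` flag and the `TASK_INTERRUPTIBLE` check from the patch are left out. It shows why testing the SCHED bit alone cannot tell the kthread whether it owns the napi, while a separate SCHED_THREADED bit can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model. The bit names mirror the patch,
 * but the values and helpers below are illustrative only. */
enum {
	MODEL_SCHED          = 1u << 0, /* somebody owns the napi */
	MODEL_SCHED_THREADED = 1u << 1, /* the owner is the napi kthread */
};

struct model_napi {
	atomic_uint state;
};

/* Models scheduling in threaded mode: SCHED is taken and the
 * kthread is marked as the owner (simplified: always sets the bit). */
static inline void model_schedule_threaded(struct model_napi *n)
{
	atomic_fetch_or(&n->state, MODEL_SCHED | MODEL_SCHED_THREADED);
}

/* Models a busy-poll thread (or napi_disable()) grabbing SCHED.
 * Returns true if the caller obtained ownership. */
static inline bool model_busy_poll_grab(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0;
}

/* Models napi_complete_done(): both bits are cleared together. */
static inline void model_complete(struct model_napi *n)
{
	unsigned int mask = MODEL_SCHED | MODEL_SCHED_THREADED;
	unsigned int old = atomic_fetch_and(&n->state, ~mask);

	/* Completing a napi nobody owns would be a bug. */
	assert(old & MODEL_SCHED);
	(void)old;
}

/* Check used before the patch: SCHED alone. */
static inline bool kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Check used after the patch: SCHED_THREADED. */
static inline bool kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}
```

In this model, once `model_complete()` has run and a busy-poll caller then succeeds in `model_busy_poll_grab()`, `kthread_may_poll_old()` still reports true although the kthread does not own the napi, whereas `kthread_may_poll_new()` reports false.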
Hi Wei,

Yes, it is not easy to reproduce. I run this on 20 machines, and the one with this dump ran for the last 10-12 days without a problem.

Martin

> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote:
>
> Thanks Martin for the report.
> Without a reproducer, it might be hard to debug. I will double check
> the code to check for potential race between kthread poll and busy
> poll.
>
> Thanks.
> Wei
>
> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>
>> Hi Wei
>> Please see this bug log :
>>
>>
>> Sep 15 08:04:56 [2034411.548669][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep 15 08:04:56 [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> Sep 15 08:04:56 [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 15 08:04:56 [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 15 08:04:56 [2034411.725536][ T3195] Call Trace:
>> Sep 15 08:04:56 [2034411.749948][ T3195] netif_receive_skb_list_internal+0x25c/0x2b0
>> Sep 15 08:04:56 [2034411.774579][ T3195] gro_normal_one+0x6e/0x90
>> Sep 15 08:04:56 [2034411.798786][ T3195] napi_gro_flush+0xb1/0x100
>> Sep 15 08:04:56 [2034411.822410][ T3195] napi_complete_done+0x107/0x180
>> Sep 15 08:04:56 [2034411.845614][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe]
>> Sep 15 08:04:56 [2034411.868480][ T3195] __napi_poll+0x1f/0x100
>> Sep 15 08:04:56 [2034411.890899][ T3195] ? __napi_poll+0x100/0x100
>> Sep 15 08:04:56 [2034411.912799][ T3195] napi_threaded_poll+0x105/0x150
>> Sep 15 08:04:56 [2034411.934567][ T3195] kthread+0x101/0x120
>> Sep 15 08:04:56 [2034411.955873][ T3195] ? set_kthread_struct+0x30/0x30
>> Sep 15 08:04:56 [2034411.977157][ T3195] ret_from_fork+0x1f/0x30
>> Sep 15 08:04:56 [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]---
>> Sep 15 08:04:56 [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> Sep 15 08:04:56 [2034412.058658][ T3195] #PF: supervisor read access in kernel mode
>> Sep 15 08:04:56 [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page
>> Sep 15 08:04:56 [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0
>> Sep 15 08:04:56 [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI
>> Sep 15 08:04:56 [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S W O 5.13.12 #1
>> Sep 15 08:04:56 [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
>> Sep 15 08:04:56 [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0
>> Sep 15 08:04:56 [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49
>> Sep 15 08:04:56 [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296
>> Sep 15 08:04:56 [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
>> Sep 15 08:04:56 [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003
>> Sep 15 08:04:56 [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff
>> Sep 15 08:04:56 [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00
>> Sep 15 08:04:57 [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00
>> Sep 15 08:04:57 [2034412.507493][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000
>> Sep 15 08:04:57 [2034412.553528][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> Sep 15 08:04:57 [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
>> Sep 15 08:04:57 [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> Sep 15 08:04:57 [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Sep 15 08:04:57 [2034412.721656][ T3195] Call Trace:
>> Sep 15 08:04:57 [2034412.746016][ T3195] gro_normal_one+0x6e/0x90
>> Sep 15 08:04:57 [2034412.770321][ T3195] napi_gro_flush+0xb1/0x100
>> Sep 15 08:04:57 [2034412.794137][ T3195] napi_complete_done+0x107/0x180
>> Sep 15 08:04:57 [2034412.817556][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe]
>> Sep 15 08:04:57 [2034412.840522][ T3195] __napi_poll+0x1f/0x100
>> Sep 15 08:04:57 [2034412.862829][ T3195] ? __napi_poll+0x100/0x100
>> Sep 15 08:04:57 [2034412.884804][ T3195] napi_threaded_poll+0x105/0x150
>> Sep 15 08:04:57 [2034412.906305][ T3195] kthread+0x101/0x120
>> Sep 15 08:04:57 [2034412.927502][ T3195] ? 
set_kthread_struct+0x30/0x30 >> Sep 15 08:04:57 [2034412.948434][ T3195] ret_from_fork+0x1f/0x30 >> Sep 15 08:04:57 [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC] >> Sep 15 08:04:57 [2034413.136792][ T3195] CR2: 0000000000000000 >> Sep 15 08:04:57 [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]--- >> Sep 15 08:04:57 [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 >> Sep 15 08:04:57 [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 >> Sep 15 08:04:57 [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 >> Sep 15 08:04:57 [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 >> Sep 15 08:04:57 [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 >> Sep 15 08:04:57 [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff >> Sep 15 08:04:57 [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 >> Sep 15 08:04:58 [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 >> Sep 15 08:04:58 [2034413.487558][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 >> Sep 15 08:04:58 [2034413.535263][ T3195] CS: 0010 DS: 0000 ES: 0000 
CR0: 0000000080050033 >> Sep 15 08:04:58 [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >> Sep 15 08:04:58 [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> Sep 15 08:04:58 [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Sep 15 08:04:58 [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt >> Sep 15 08:04:58 [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >> Sep 15 08:04:58 [2034413.906445][ T3195] Rebooting in 10 seconds.. >> Sep 15 08:05:08 [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG. >> >> >> >> >>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote: >>> >>> Hi Martin, >>> >>> Is there a reproducer for this? What kind of traffic is it running? >>> What is the following config: >>> cat /proc/sys/net/core/busy_poll >>> cat /proc/sys/net/core/busy_read >>> cat /sys/class/net/<ixgbe_dev>/threaded >>> And is SO_PREFER_BUSY_POLL used? >>> >>> Thanks. >>> Wei >>> >>> >>> >>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>> >>>> Hi Eric and Wei >>>> >>>> Please see this bug report from last hour , >>>> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up >>>> Uptime before crash : 10day >>>> >>>> >>>> >>>> >>>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ >>>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. 
next->prev should be ffff9651016c0b00, but was ffff96511a87c158 >>>> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 >>>> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >>>> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 >>>> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>>> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 >>>> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 >>>> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 >>>> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 >>>> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff >>>> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff >>>> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 >>>> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 >>>> Sep 9 12:49:32 [829554.744221][ T2925] FS: 
00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: >>>> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 >>>> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 >>>> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >>>> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >>>> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 >>>> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 >>>> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 >>>> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 >>>> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 >>>> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >>>> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 >>>> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >>>> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- >>>> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 >>>> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode >>>> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page >>>> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 >>>> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI >>>> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 >>>> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>>> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >>>> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >>>> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >>>> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >>>> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >>>> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >>>> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >>>> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >>>> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 
00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: >>>> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 >>>> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 >>>> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 >>>> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >>>> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >>>> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 >>>> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 >>>> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 >>>> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 >>>> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 >>>> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >>>> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 >>>> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >>>> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >>>> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 >>>> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- >>>> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >>>> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >>>> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >>>> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >>>> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >>>> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >>>> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >>>> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >>>> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> Sep 9 12:49:34 [829556.903908][ T2925] CR2: 
00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt >>>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >>>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. >>>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. >>>> >>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote: >>>>> >>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>>> >>>>>> Hi Eric and Wei >>>>>> >>>>>> Please check this log : >>>>>> >>>>> >>>>> Please send a normal report to netdev. >>>>> >>>>> This has nothing to to with us (Eric & Wei) >>>>> >>>>> Thanks. 
>>>>> >>>>>> >>>>>> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) >>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 >>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>>>> [1584289.107263] Call Trace: >>>>>> [1584289.107266] dump_stack+0x58/0x6b >>>>>> [1584289.209562] warn_alloc.cold+0x70/0xd4 >>>>>> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 >>>>>> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 >>>>>> [1584289.474009] allocate_slab+0x272/0x450 >>>>>> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 >>>>>> [1584289.519147] kmem_cache_alloc+0x110/0x120 >>>>>> [1584289.541416] build_skb+0x1a/0x200 >>>>>> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] >>>>>> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] >>>>>> [1584289.605528] __napi_poll+0x1f/0x130 >>>>>> [1584289.625842] napi_threaded_poll+0x110/0x160 >>>>>> [1584289.646110] ? __napi_poll+0x130/0x130 >>>>>> [1584289.665810] kthread+0xea/0x120 >>>>>> [1584289.684836] ? kthread_park+0x80/0x80 >>>>>> [1584289.703440] ret_from_fork+0x1f/0x30 >>>>>> [1584289.721616] Mem-Info: >>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 >>>>>> active_file:17408 inactive_file:149 isolated_file:32 >>>>>> unevictable:1440359 dirty:17500 writeback:0 >>>>>> slab_reclaimable:43368 slab_unreclaimable:155124 >>>>>> mapped:817431 shmem:7650 pagetables:32093 bounce:0 >>>>>> free:17832 free_pcp:113 free_cma:0 >>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? 
no >>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 >>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB >>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 >>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB >>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0 >>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB >>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB >>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB >>>>>> [1584290.409087] 1465768 total pagecache pages >>>>>> [1584290.434531] 4165289 pages RAM >>>>>> [1584290.459616] 0 pages HighMem/MovableOnly >>>>>> [1584290.484480] 104766 pages reserved >>>>>> [1584290.508709] 0 pages hwpoisoned >>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105) >>>>>> [1584302.635731] telegraf invoked oom-killer: 
gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 >>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 >>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>>>> [1584302.776532] Call Trace: >>>>>> [1584302.799361] dump_stack+0x58/0x6b >>>>>> [1584302.821791] dump_header+0x4c/0x2e6 >>>>>> [1584302.843580] oom_kill_process.cold+0xb/0x10 >>>>>> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 >>>>>> [1584302.886641] out_of_memory+0x54/0xa0 >>>>>> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 >>>>>> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 >>>>>> [1584302.947874] __get_free_pages+0x8/0x30 >>>>>> [1584302.967246] pgd_alloc+0x21/0x180 >>>>>> [1584302.986355] mm_alloc+0x1af/0x250 >>>>>> [1584303.005085] alloc_bprm+0x80/0x2a0 >>>>>> [1584303.023328] do_execveat_common+0x8b/0x330 >>>>>> [1584303.041181] __x64_sys_execve+0x2b/0x40 >>>>>> [1584303.058513] do_syscall_64+0x2d/0x40 >>>>>> [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>>> [1584303.091891] RIP: 0033:0x488376 >>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 >>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b >>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 >>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 >>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 >>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 >>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 >>>>>> [1584303.379094] Mem-Info: >>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 
isolated_anon:0 >>>>>> active_file:12975 inactive_file:168 isolated_file:32 >>>>>> unevictable:909709 dirty:12864 writeback:10 >>>>>> slab_reclaimable:42415 slab_unreclaimable:154783 >>>>>> mapped:39825 shmem:14744 pagetables:26041 bounce:0 >>>>>> free:537002 free_pcp:1813 free_cma:0 >>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no >>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 >>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB >>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 >>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB >>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0 >>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB >>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB 
(UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB >>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB >>>>>> [1584304.287094] 933871 total pagecache pages >>>>>> [1584304.312815] 4165289 pages RAM >>>>>> [1584304.337915] 0 pages HighMem/MovableOnly >>>>>> [1584304.362522] 104766 pages reserved >>>>>> [1584304.386516] 0 pages hwpoisoned >>>>>> >>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote: >>>>>>> >>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>>>>> >>>>>>>> Hi Wei >>>>>>>> Check this: >>>>>>>> >>>>>>>> [ 39.706567] ------------[ cut here ]------------ >>>>>>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) >>>>>>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>>>> >>>>>>> Probably more relevant to Intel maintainers than Wei :/ >>>>>>> >>>>>>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas >>>>>>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 >>>>>>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 >>>>>>>> [ 39.706619] Workqueue: events work_for_cpu_fn >>>>>>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>>>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 >>>>>>>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 >>>>>>>> 
[ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff >>>>>>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 >>>>>>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea >>>>>>>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 >>>>>>>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 >>>>>>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 >>>>>>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0 >>>>>>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>>>> [ 39.706656] Call Trace: >>>>>>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] >>>>>>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] >>>>>>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 >>>>>>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 >>>>>>>> [ 39.706716] ? __kmalloc+0x37/0x160 >>>>>>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 >>>>>>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 >>>>>>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 >>>>>>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 >>>>>>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 >>>>>>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 >>>>>>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 >>>>>>>> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 >>>>>>>> [ 39.706746] local_pci_probe+0x1b/0x40 >>>>>>>> [ 39.706750] work_for_cpu_fn+0xb/0x20 >>>>>>>> [ 39.706754] process_one_work+0x1ec/0x350 >>>>>>>> [ 39.706758] worker_thread+0x24b/0x4d0 >>>>>>>> [ 39.706760] ? process_one_work+0x350/0x350 >>>>>>>> [ 39.706762] kthread+0xea/0x120 >>>>>>>> [ 39.706766] ? 
kthread_park+0x80/0x80
>>>>>>>> [ 39.706770] ret_from_fork+0x1f/0x30
>>>>>>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]---
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>>
>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
>>>>>>>>>
>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
>>>>>>>>> This patch tries to fix this race by adding a new bit
>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between
>>>>>>>>> kthread and other scenarios and fixes the race issue.
>>>>>>>>>
>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com>
>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com>
>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com>
>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>>>>>> ---
>>>>>>>>> Change since v3:
>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in
>>>>>>>>> ____napi_schedule().
>>>>>>>>>
>>>>>>>>> include/linux/netdevice.h | 2 ++
>>>>>>>>> net/core/dev.c | 19 ++++++++++++++++++-
>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644
>>>>>>>>> --- a/include/linux/netdevice.h
>>>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>>>> @@ -360,6 +360,7 @@ enum {
>>>>>>>>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */
>>>>>>>>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/
>>>>>>>>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/
>>>>>>>>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> enum {
>>>>>>>>> @@ -372,6 +373,7 @@ enum {
>>>>>>>>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>>>>>>>>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED),
>>>>>>>>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED),
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> enum gro_result {
>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644
>>>>>>>>> --- a/net/core/dev.c
>>>>>>>>> +++ b/net/core/dev.c
>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>>>>>>>> */
>>>>>>>>> thread = READ_ONCE(napi->thread);
>>>>>>>>> if (thread) {
>>>>>>>>> + /* Avoid doing set_bit() if the thread is in
>>>>>>>>> + * INTERRUPTIBLE state, cause napi_thread_wait()
>>>>>>>>> + * makes sure to proceed with napi polling
>>>>>>>>> + * if the thread is explicitly woken from here.
>>>>>>>>> + */
>>>>>>>>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
>>>>>>>>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>>>>>>>>> wake_up_process(thread);
>>>>>>>>> return;
>>>>>>>>> }
>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>>>>>>>>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>>>>>>>>>
>>>>>>>>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
>>>>>>>>> + NAPIF_STATE_SCHED_THREADED |
>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL);
>>>>>>>>>
>>>>>>>>> /* If STATE_MISSED was set, leave STATE_SCHED set,
>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>>>>>>>>>
>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi)
>>>>>>>>> {
>>>>>>>>> + bool woken = false;
>>>>>>>>> +
>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>>
>>>>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) {
>>>>>>>>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
>>>>>>>>> + /* Testing SCHED_THREADED bit here to make sure the current
>>>>>>>>> + * kthread owns this napi and could poll on this napi.
>>>>>>>>> + * Testing SCHED bit is not enough because SCHED bit might be
>>>>>>>>> + * set by some other busy poll thread or by napi_disable().
>>>>>>>>> + */
>>>>>>>>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>>>>>>>>> WARN_ON(!list_empty(&napi->poll_list));
>>>>>>>>> __set_current_state(TASK_RUNNING);
>>>>>>>>> return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> schedule();
>>>>>>>>> + /* woken being true indicates this thread owns this napi. */
>>>>>>>>> + woken = true;
>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE);
>>>>>>>>> }
>>>>>>>>> __set_current_state(TASK_RUNNING);
>>>>>>>>> --
>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
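The ownership rule described in the commit message quoted above can be modeled in user space. What follows is a minimal sketch, not kernel code: every `model_*` name and both `kthread_may_poll_*` helpers are hypothetical stand-ins invented for illustration, and C11 atomics replace the kernel's bit operations. It deliberately leaves out the `woken` flag and the TASK_INTERRUPTIBLE check from the patch, and only shows why testing SCHED alone cannot tell the kthread apart from a busy-poll thread, while a separate SCHED_THREADED bit can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the NAPI state bits. */
enum {
	MODEL_SCHED          = 1u << 0, /* somebody owns the poll */
	MODEL_SCHED_THREADED = 1u << 1, /* the owner is the napi kthread */
};

struct model_napi {
	atomic_uint state;
	bool threaded; /* threaded mode enabled for this napi */
};

/* Take ownership; succeeds only if SCHED was previously clear. */
static bool model_try_sched(struct model_napi *n)
{
	assert(n != NULL);
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);
	return (old & MODEL_SCHED) == 0;
}

/* Scheduling path: takes SCHED and, in threaded mode, also marks
 * the kthread as the owner. */
static bool model_schedule(struct model_napi *n)
{
	if (!model_try_sched(n))
		return false;
	if (n->threaded)
		atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Busy-poll path: takes SCHED only, never SCHED_THREADED. */
static bool model_busy_poll_grab(struct model_napi *n)
{
	return model_try_sched(n);
}

/* Completion clears both bits, as the patch does for the real bits. */
static void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state,
			 ~(unsigned int)(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Check the kthread made before the patch: SCHED only. */
static bool kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Check the kthread makes after the patch: SCHED_THREADED. */
static bool kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}

/* Replay the interleaving from the commit message.
 * Returns 0 when the model behaves as described. */
static int model_race_demo(void)
{
	struct model_napi n = { .threaded = true };

	atomic_init(&n.state, 0);

	/* 1. The napi is scheduled in threaded mode; the kthread owns it. */
	if (!model_schedule(&n) || !kthread_may_poll_new(&n))
		return 1;

	/* 2. The kthread's poll completes and both bits are cleared. */
	model_complete(&n);

	/* 3. A busy-poll thread grabs SCHED before the kthread re-checks. */
	if (!model_busy_poll_grab(&n))
		return 2;

	/* 4. The old check wrongly reports kthread ownership ... */
	if (!kthread_may_poll_old(&n))
		return 3;

	/* ... while the new check correctly reports that it does not own it. */
	if (kthread_may_poll_new(&n))
		return 4;

	return 0;
}
```

Calling `model_race_demo()` from a small `main()` replays the sequence; a nonzero return value identifies the step at which the model diverged from the description.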
Hi Wei One more bug report from last hours: Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158 Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 Sep 9 12:49:32 [829554.692890][ T2925] 
R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 Sep 9 12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 Sep 9 12:49:33 [829555.913285][ 
T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Sep 9 12:49:34 [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 Sep 9 12:49:34 
[829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. > On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote: > > Thanks Martin for the report. > Without a reproducer, it might be hard to debug. I will double check > the code to check for potential race between kthread poll and busy > poll. > > Thanks. > Wei > > On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote: >> >> Hi Wei >> Please see this bug log : >> >> >> Sep 15 08:04:56 [2034411.548669][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> Sep 15 08:04:56 [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >> Sep 15 08:04:56 [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> Sep 15 08:04:56 [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Sep 15 08:04:56 [2034411.725536][ T3195] Call Trace: >> Sep 15 08:04:56 [2034411.749948][ T3195] netif_receive_skb_list_internal+0x25c/0x2b0 >> Sep 15 08:04:56 [2034411.774579][ T3195] gro_normal_one+0x6e/0x90 >> Sep 15 08:04:56 [2034411.798786][ T3195] napi_gro_flush+0xb1/0x100 >> Sep 15 08:04:56 [2034411.822410][ T3195] napi_complete_done+0x107/0x180 >> Sep 15 08:04:56 [2034411.845614][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] >> Sep 15 08:04:56 [2034411.868480][ T3195] __napi_poll+0x1f/0x100 >> Sep 15 08:04:56 [2034411.890899][ T3195] ? 
__napi_poll+0x100/0x100 >> Sep 15 08:04:56 [2034411.912799][ T3195] napi_threaded_poll+0x105/0x150 >> Sep 15 08:04:56 [2034411.934567][ T3195] kthread+0x101/0x120 >> Sep 15 08:04:56 [2034411.955873][ T3195] ? set_kthread_struct+0x30/0x30 >> Sep 15 08:04:56 [2034411.977157][ T3195] ret_from_fork+0x1f/0x30 >> Sep 15 08:04:56 [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]--- >> Sep 15 08:04:56 [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000 >> Sep 15 08:04:56 [2034412.058658][ T3195] #PF: supervisor read access in kernel mode >> Sep 15 08:04:56 [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page >> Sep 15 08:04:56 [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0 >> Sep 15 08:04:56 [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI >> Sep 15 08:04:56 [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S W O 5.13.12 #1 >> Sep 15 08:04:56 [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >> Sep 15 08:04:56 [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 >> Sep 15 08:04:56 [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 >> Sep 15 08:04:56 [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 >> Sep 15 08:04:56 [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 >> Sep 15 08:04:56 [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 >> Sep 15 08:04:56 [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff >> Sep 15 08:04:56 [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 >> Sep 15 08:04:57 [2034412.463438][ T3195] R13: 0000000000000009 R14: 
ffff9de881f47d80 R15: ffff98a9f1352a00 >> Sep 15 08:04:57 [2034412.507493][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 >> Sep 15 08:04:57 [2034412.553528][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> Sep 15 08:04:57 [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >> Sep 15 08:04:57 [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> Sep 15 08:04:57 [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Sep 15 08:04:57 [2034412.721656][ T3195] Call Trace: >> Sep 15 08:04:57 [2034412.746016][ T3195] gro_normal_one+0x6e/0x90 >> Sep 15 08:04:57 [2034412.770321][ T3195] napi_gro_flush+0xb1/0x100 >> Sep 15 08:04:57 [2034412.794137][ T3195] napi_complete_done+0x107/0x180 >> Sep 15 08:04:57 [2034412.817556][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] >> Sep 15 08:04:57 [2034412.840522][ T3195] __napi_poll+0x1f/0x100 >> Sep 15 08:04:57 [2034412.862829][ T3195] ? __napi_poll+0x100/0x100 >> Sep 15 08:04:57 [2034412.884804][ T3195] napi_threaded_poll+0x105/0x150 >> Sep 15 08:04:57 [2034412.906305][ T3195] kthread+0x101/0x120 >> Sep 15 08:04:57 [2034412.927502][ T3195] ? 
set_kthread_struct+0x30/0x30 >> Sep 15 08:04:57 [2034412.948434][ T3195] ret_from_fork+0x1f/0x30 >> Sep 15 08:04:57 [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC] >> Sep 15 08:04:57 [2034413.136792][ T3195] CR2: 0000000000000000 >> Sep 15 08:04:57 [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]--- >> Sep 15 08:04:57 [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 >> Sep 15 08:04:57 [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 >> Sep 15 08:04:57 [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 >> Sep 15 08:04:57 [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 >> Sep 15 08:04:57 [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 >> Sep 15 08:04:57 [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff >> Sep 15 08:04:57 [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 >> Sep 15 08:04:58 [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 >> Sep 15 08:04:58 [2034413.487558][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 >> Sep 15 08:04:58 [2034413.535263][ T3195] CS: 0010 DS: 0000 ES: 0000 
CR0: 0000000080050033 >> Sep 15 08:04:58 [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >> Sep 15 08:04:58 [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> Sep 15 08:04:58 [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> Sep 15 08:04:58 [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt >> Sep 15 08:04:58 [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >> Sep 15 08:04:58 [2034413.906445][ T3195] Rebooting in 10 seconds.. >> Sep 15 08:05:08 [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG. >> >> >> >> >>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote: >>> >>> Hi Martin, >>> >>> Is there a reproducer for this? What kind of traffic is it running? >>> What are the following settings: >>> cat /proc/sys/net/core/busy_poll >>> cat /proc/sys/net/core/busy_read >>> cat /sys/class/net/<ixgbe_dev>/threaded >>> And is SO_PREFER_BUSY_POLL used? >>> >>> Thanks. >>> Wei >>> >>> >>> >>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>> >>>> Hi Eric and Wei >>>> >>>> Please see this bug report from the last hour. >>>> Kernel 5.13.13, traffic is 7 Gb/s down / 7 Gb/s up >>>> Uptime before crash: 10 days >>>> >>>> >>>> >>>> >>>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ >>>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. 
next->prev should be ffff9651016c0b00, but was ffff96511a87c158 >>>> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 >>>> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >>>> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 >>>> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>>> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 >>>> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 >>>> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 >>>> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 >>>> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff >>>> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff >>>> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 >>>> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 >>>> Sep 9 12:49:32 [829554.744221][ T2925] FS: 
00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: >>>> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 >>>> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 >>>> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >>>> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >>>> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 >>>> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 >>>> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 >>>> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 >>>> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 >>>> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >>>> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 >>>> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >>>> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- >>>> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 >>>> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode >>>> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page >>>> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 >>>> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI >>>> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 >>>> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>>> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >>>> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >>>> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >>>> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >>>> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >>>> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >>>> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >>>> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >>>> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 
00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: >>>> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 >>>> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 >>>> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 >>>> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >>>> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >>>> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 >>>> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 >>>> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 >>>> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 >>>> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 >>>> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >>>> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 >>>> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >>>> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >>>> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 >>>> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- >>>> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >>>> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >>>> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >>>> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >>>> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >>>> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >>>> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >>>> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >>>> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> Sep 9 12:49:34 [829556.903908][ T2925] CR2: 
00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt >>>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >>>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. >>>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. >>>> >>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote: >>>>> >>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>>> >>>>>> Hi Eric and Wei >>>>>> >>>>>> Please check this log: >>>>>> >>>>> >>>>> Please send a normal report to netdev. >>>>> >>>>> This has nothing to do with us (Eric & Wei) >>>>> >>>>> Thanks. 
>>>>> >>>>>> >>>>>> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) >>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 >>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>>>> [1584289.107263] Call Trace: >>>>>> [1584289.107266] dump_stack+0x58/0x6b >>>>>> [1584289.209562] warn_alloc.cold+0x70/0xd4 >>>>>> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 >>>>>> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 >>>>>> [1584289.474009] allocate_slab+0x272/0x450 >>>>>> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 >>>>>> [1584289.519147] kmem_cache_alloc+0x110/0x120 >>>>>> [1584289.541416] build_skb+0x1a/0x200 >>>>>> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] >>>>>> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] >>>>>> [1584289.605528] __napi_poll+0x1f/0x130 >>>>>> [1584289.625842] napi_threaded_poll+0x110/0x160 >>>>>> [1584289.646110] ? __napi_poll+0x130/0x130 >>>>>> [1584289.665810] kthread+0xea/0x120 >>>>>> [1584289.684836] ? kthread_park+0x80/0x80 >>>>>> [1584289.703440] ret_from_fork+0x1f/0x30 >>>>>> [1584289.721616] Mem-Info: >>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 >>>>>> active_file:17408 inactive_file:149 isolated_file:32 >>>>>> unevictable:1440359 dirty:17500 writeback:0 >>>>>> slab_reclaimable:43368 slab_unreclaimable:155124 >>>>>> mapped:817431 shmem:7650 pagetables:32093 bounce:0 >>>>>> free:17832 free_pcp:113 free_cma:0 >>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? 
no >>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 >>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB >>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 >>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB >>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0 >>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB >>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB >>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB >>>>>> [1584290.409087] 1465768 total pagecache pages >>>>>> [1584290.434531] 4165289 pages RAM >>>>>> [1584290.459616] 0 pages HighMem/MovableOnly >>>>>> [1584290.484480] 104766 pages reserved >>>>>> [1584290.508709] 0 pages hwpoisoned >>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105) >>>>>> [1584302.635731] telegraf invoked oom-killer: 
gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 >>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 >>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>>>> [1584302.776532] Call Trace: >>>>>> [1584302.799361] dump_stack+0x58/0x6b >>>>>> [1584302.821791] dump_header+0x4c/0x2e6 >>>>>> [1584302.843580] oom_kill_process.cold+0xb/0x10 >>>>>> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 >>>>>> [1584302.886641] out_of_memory+0x54/0xa0 >>>>>> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 >>>>>> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 >>>>>> [1584302.947874] __get_free_pages+0x8/0x30 >>>>>> [1584302.967246] pgd_alloc+0x21/0x180 >>>>>> [1584302.986355] mm_alloc+0x1af/0x250 >>>>>> [1584303.005085] alloc_bprm+0x80/0x2a0 >>>>>> [1584303.023328] do_execveat_common+0x8b/0x330 >>>>>> [1584303.041181] __x64_sys_execve+0x2b/0x40 >>>>>> [1584303.058513] do_syscall_64+0x2d/0x40 >>>>>> [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>>> [1584303.091891] RIP: 0033:0x488376 >>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 >>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b >>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 >>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 >>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 >>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 >>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 >>>>>> [1584303.379094] Mem-Info: >>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 
isolated_anon:0 >>>>>> active_file:12975 inactive_file:168 isolated_file:32 >>>>>> unevictable:909709 dirty:12864 writeback:10 >>>>>> slab_reclaimable:42415 slab_unreclaimable:154783 >>>>>> mapped:39825 shmem:14744 pagetables:26041 bounce:0 >>>>>> free:537002 free_pcp:1813 free_cma:0 >>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no >>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 >>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB >>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 >>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB >>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0 >>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB >>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB 
(UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB >>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB >>>>>> [1584304.287094] 933871 total pagecache pages >>>>>> [1584304.312815] 4165289 pages RAM >>>>>> [1584304.337915] 0 pages HighMem/MovableOnly >>>>>> [1584304.362522] 104766 pages reserved >>>>>> [1584304.386516] 0 pages hwpoisoned >>>>>> >>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote: >>>>>>> >>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>>>>> >>>>>>>> Hi Wei >>>>>>>> Check this: >>>>>>>> >>>>>>>> [ 39.706567] ------------[ cut here ]------------ >>>>>>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) >>>>>>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>>>> >>>>>>> Probably more relevant to Intel maintainers than Wei :/ >>>>>>> >>>>>>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas >>>>>>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 >>>>>>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 >>>>>>>> [ 39.706619] Workqueue: events work_for_cpu_fn >>>>>>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>>>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 >>>>>>>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 >>>>>>>> 
[ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff >>>>>>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 >>>>>>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea >>>>>>>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 >>>>>>>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 >>>>>>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 >>>>>>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0 >>>>>>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>>>> [ 39.706656] Call Trace: >>>>>>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] >>>>>>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] >>>>>>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 >>>>>>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 >>>>>>>> [ 39.706716] ? __kmalloc+0x37/0x160 >>>>>>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 >>>>>>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 >>>>>>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 >>>>>>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 >>>>>>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 >>>>>>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 >>>>>>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 >>>>>>>> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 >>>>>>>> [ 39.706746] local_pci_probe+0x1b/0x40 >>>>>>>> [ 39.706750] work_for_cpu_fn+0xb/0x20 >>>>>>>> [ 39.706754] process_one_work+0x1ec/0x350 >>>>>>>> [ 39.706758] worker_thread+0x24b/0x4d0 >>>>>>>> [ 39.706760] ? process_one_work+0x350/0x350 >>>>>>>> [ 39.706762] kthread+0xea/0x120 >>>>>>>> [ 39.706766] ? 
kthread_park+0x80/0x80 >>>>>>>> [ 39.706770] ret_from_fork+0x1f/0x30 >>>>>>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- >>>>>>>> >>>>>>>> Martin >>>>>>>> >>>>>>>> >>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: >>>>>>>>> >>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to >>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on >>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the >>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll() >>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll >>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well. >>>>>>>>> This patch tries to fix this race by adding a new bit >>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in >>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared >>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this >>>>>>>>> bit is set. This helps distinguish the ownership of the napi between >>>>>>>>> kthread and other scenarios and fixes the race issue. >>>>>>>>> >>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") >>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com> >>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org> >>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com> >>>>>>>>> Cc: Eric Dumazet <edumazet@google.com> >>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com> >>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> >>>>>>>>> --- >>>>>>>>> Change since v3: >>>>>>>>> - Add READ_ONCE() for thread->state and add comments in >>>>>>>>> ____napi_schedule(). 
>>>>>>>>> >>>>>>>>> include/linux/netdevice.h | 2 ++ >>>>>>>>> net/core/dev.c | 19 ++++++++++++++++++- >>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-) >>>>>>>>> >>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644 >>>>>>>>> --- a/include/linux/netdevice.h >>>>>>>>> +++ b/include/linux/netdevice.h >>>>>>>>> @@ -360,6 +360,7 @@ enum { >>>>>>>>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ >>>>>>>>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ >>>>>>>>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ >>>>>>>>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ >>>>>>>>> }; >>>>>>>>> >>>>>>>>> enum { >>>>>>>>> @@ -372,6 +373,7 @@ enum { >>>>>>>>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), >>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), >>>>>>>>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), >>>>>>>>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), >>>>>>>>> }; >>>>>>>>> >>>>>>>>> enum gro_result { >>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c >>>>>>>>> index 6c5967e80132..d3195a95f30e 100644 >>>>>>>>> --- a/net/core/dev.c >>>>>>>>> +++ b/net/core/dev.c >>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, >>>>>>>>> */ >>>>>>>>> thread = READ_ONCE(napi->thread); >>>>>>>>> if (thread) { >>>>>>>>> + /* Avoid doing set_bit() if the thread is in >>>>>>>>> + * INTERRUPTIBLE state, cause napi_thread_wait() >>>>>>>>> + * makes sure to proceed with napi polling >>>>>>>>> + * if the thread is explicitly woken from here. 
>>>>>>>>> + */ >>>>>>>>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) >>>>>>>>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); >>>>>>>>> wake_up_process(thread); >>>>>>>>> return; >>>>>>>>> } >>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) >>>>>>>>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); >>>>>>>>> >>>>>>>>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | >>>>>>>>> + NAPIF_STATE_SCHED_THREADED | >>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL); >>>>>>>>> >>>>>>>>> /* If STATE_MISSED was set, leave STATE_SCHED set, >>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) >>>>>>>>> >>>>>>>>> static int napi_thread_wait(struct napi_struct *napi) >>>>>>>>> { >>>>>>>>> + bool woken = false; >>>>>>>>> + >>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); >>>>>>>>> >>>>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { >>>>>>>>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { >>>>>>>>> + /* Testing SCHED_THREADED bit here to make sure the current >>>>>>>>> + * kthread owns this napi and could poll on this napi. >>>>>>>>> + * Testing SCHED bit is not enough because SCHED bit might be >>>>>>>>> + * set by some other busy poll thread or by napi_disable(). >>>>>>>>> + */ >>>>>>>>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { >>>>>>>>> WARN_ON(!list_empty(&napi->poll_list)); >>>>>>>>> __set_current_state(TASK_RUNNING); >>>>>>>>> return 0; >>>>>>>>> } >>>>>>>>> >>>>>>>>> schedule(); >>>>>>>>> + /* woken being true indicates this thread owns this napi. */ >>>>>>>>> + woken = true; >>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); >>>>>>>>> } >>>>>>>>> __set_current_state(TASK_RUNNING); >>>>>>>>> -- >>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog >>>>>>>>> >>>>>>>> >>>>>> >>>> >>
Hi Wei, if you find any fix for this, send it to me and I will test it. kthread mode is a very good solution for servers under heavy network load, but we need to find where this bug comes from. Martin > On 22 Sep 2021, at 17:12, Martin Zaharinov <micron10@gmail.com> wrote: > > Hi Wei > > One more bug report from last hours: > > > > Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ > Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158 > Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 > Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 > Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 > Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 > Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 > Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 > Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 
00000000fffbffff RDI: 00000000fffbffff > Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff > Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 > Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 > Sep 9 12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 > Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: > Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 > Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 > Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 > Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 > Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 > Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 > Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 > Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 > Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- > Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 > Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode > Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page > Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 > Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI > Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 > Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 
00000000001706e0 > Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: > Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 > Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 > Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 > Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 > Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 > Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 > Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 > Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 > Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 > Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 > Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- > Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Sep 9 12:49:34 [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 
00000000001706e0 > Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt > Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. > Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. > >> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote: >> >> Thanks Martin for the report. >> Without a reproducer, it might be hard to debug. I will double check >> the code to check for potential race between kthread poll and busy >> poll. >> >> Thanks. >> Wei >> >> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote: >>> >>> Hi Wei >>> Please see this bug log : >>> >>> >>> Sep 15 08:04:56 [2034411.548669][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> Sep 15 08:04:56 [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >>> Sep 15 08:04:56 [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> Sep 15 08:04:56 [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>> Sep 15 08:04:56 [2034411.725536][ T3195] Call Trace: >>> Sep 15 08:04:56 [2034411.749948][ T3195] netif_receive_skb_list_internal+0x25c/0x2b0 >>> Sep 15 08:04:56 [2034411.774579][ T3195] gro_normal_one+0x6e/0x90 >>> Sep 15 08:04:56 [2034411.798786][ T3195] napi_gro_flush+0xb1/0x100 >>> Sep 15 08:04:56 [2034411.822410][ T3195] napi_complete_done+0x107/0x180 >>> Sep 15 08:04:56 [2034411.845614][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] >>> Sep 15 08:04:56 [2034411.868480][ T3195] __napi_poll+0x1f/0x100 >>> Sep 15 08:04:56 [2034411.890899][ 
T3195] ? __napi_poll+0x100/0x100 >>> Sep 15 08:04:56 [2034411.912799][ T3195] napi_threaded_poll+0x105/0x150 >>> Sep 15 08:04:56 [2034411.934567][ T3195] kthread+0x101/0x120 >>> Sep 15 08:04:56 [2034411.955873][ T3195] ? set_kthread_struct+0x30/0x30 >>> Sep 15 08:04:56 [2034411.977157][ T3195] ret_from_fork+0x1f/0x30 >>> Sep 15 08:04:56 [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]--- >>> Sep 15 08:04:56 [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000 >>> Sep 15 08:04:56 [2034412.058658][ T3195] #PF: supervisor read access in kernel mode >>> Sep 15 08:04:56 [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page >>> Sep 15 08:04:56 [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0 >>> Sep 15 08:04:56 [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI >>> Sep 15 08:04:56 [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S W O 5.13.12 #1 >>> Sep 15 08:04:56 [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>> Sep 15 08:04:56 [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 >>> Sep 15 08:04:56 [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 >>> Sep 15 08:04:56 [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 >>> Sep 15 08:04:56 [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 >>> Sep 15 08:04:56 [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 >>> Sep 15 08:04:56 [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff >>> Sep 15 08:04:56 [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 >>> Sep 15 08:04:57 [2034412.463438][ T3195] 
R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 >>> Sep 15 08:04:57 [2034412.507493][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 >>> Sep 15 08:04:57 [2034412.553528][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> Sep 15 08:04:57 [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >>> Sep 15 08:04:57 [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> Sep 15 08:04:57 [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>> Sep 15 08:04:57 [2034412.721656][ T3195] Call Trace: >>> Sep 15 08:04:57 [2034412.746016][ T3195] gro_normal_one+0x6e/0x90 >>> Sep 15 08:04:57 [2034412.770321][ T3195] napi_gro_flush+0xb1/0x100 >>> Sep 15 08:04:57 [2034412.794137][ T3195] napi_complete_done+0x107/0x180 >>> Sep 15 08:04:57 [2034412.817556][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] >>> Sep 15 08:04:57 [2034412.840522][ T3195] __napi_poll+0x1f/0x100 >>> Sep 15 08:04:57 [2034412.862829][ T3195] ? __napi_poll+0x100/0x100 >>> Sep 15 08:04:57 [2034412.884804][ T3195] napi_threaded_poll+0x105/0x150 >>> Sep 15 08:04:57 [2034412.906305][ T3195] kthread+0x101/0x120 >>> Sep 15 08:04:57 [2034412.927502][ T3195] ? 
set_kthread_struct+0x30/0x30 >>> Sep 15 08:04:57 [2034412.948434][ T3195] ret_from_fork+0x1f/0x30 >>> Sep 15 08:04:57 [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC] >>> Sep 15 08:04:57 [2034413.136792][ T3195] CR2: 0000000000000000 >>> Sep 15 08:04:57 [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]--- >>> Sep 15 08:04:57 [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 >>> Sep 15 08:04:57 [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 >>> Sep 15 08:04:57 [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 >>> Sep 15 08:04:57 [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 >>> Sep 15 08:04:57 [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 >>> Sep 15 08:04:57 [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff >>> Sep 15 08:04:57 [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 >>> Sep 15 08:04:58 [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 >>> Sep 15 08:04:58 [2034413.487558][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 >>> Sep 15 08:04:58 [2034413.535263][ T3195] CS: 0010 DS: 
0000 ES: 0000 CR0: 0000000080050033 >>> Sep 15 08:04:58 [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >>> Sep 15 08:04:58 [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> Sep 15 08:04:58 [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>> Sep 15 08:04:58 [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt >>> Sep 15 08:04:58 [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >>> Sep 15 08:04:58 [2034413.906445][ T3195] Rebooting in 10 seconds.. >>> Sep 15 08:05:08 [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG. >>> >>> >>> >>> >>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote: >>>> >>>> Hi Martin, >>>> >>>> Is there a reproducer for this? What kind of traffic is it running? >>>> What is the following config: >>>> cat /proc/sys/net/core/busy_poll >>>> cat /proc/sys/net/core/busy_read >>>> cat /sys/class/net/<ixgbe_dev>/threaded >>>> And is SO_PREFER_BUSY_POLL used? >>>> >>>> Thanks. >>>> Wei >>>> >>>> >>>> >>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>> >>>>> Hi Eric and Wei >>>>> >>>>> Please see this bug report from last hour , >>>>> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up >>>>> Uptime before crash : 10day >>>>> >>>>> >>>>> >>>>> >>>>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ >>>>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. 
next->prev should be ffff9651016c0b00, but was ffff96511a87c158 >>>>> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 >>>>> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >>>>> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 >>>>> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>>>> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 >>>>> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 >>>>> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 >>>>> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 >>>>> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff >>>>> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff >>>>> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 >>>>> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 >>>>> Sep 9 12:49:32 [829554.744221][ 
T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>>> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>>> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: >>>>> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 >>>>> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 >>>>> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >>>>> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >>>>> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 >>>>> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 >>>>> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 >>>>> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 >>>>> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 >>>>> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >>>>> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 >>>>> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >>>>> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- >>>>> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 >>>>> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode >>>>> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page >>>>> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 >>>>> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI >>>>> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 >>>>> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>>>> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >>>>> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >>>>> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >>>>> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >>>>> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >>>>> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >>>>> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >>>>> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >>>>> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>>> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> Sep 9 12:49:33 
[829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>>> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: >>>>> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 >>>>> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 >>>>> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 >>>>> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >>>>> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >>>>> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 >>>>> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 >>>>> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 >>>>> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 >>>>> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 >>>>> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >>>>> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 >>>>> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >>>>> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >>>>> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 >>>>> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- >>>>> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >>>>> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >>>>> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >>>>> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >>>>> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >>>>> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >>>>> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >>>>> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >>>>> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>>> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> Sep 9 12:49:34 [829556.903908][ T2925] CR2: 
00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
>>>>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
>>>>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds..
>>>>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
>>>>>
>>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
>>>>>>
>>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Eric and Wei
>>>>>>>
>>>>>>> Please check this log:
>>>>>>>
>>>>>>
>>>>>> Please send a normal report to netdev.
>>>>>>
>>>>>> This has nothing to do with us (Eric & Wei)
>>>>>>
>>>>>> Thanks.
>>>>>> >>>>>>> >>>>>>> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) >>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 >>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>>>>> [1584289.107263] Call Trace: >>>>>>> [1584289.107266] dump_stack+0x58/0x6b >>>>>>> [1584289.209562] warn_alloc.cold+0x70/0xd4 >>>>>>> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 >>>>>>> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 >>>>>>> [1584289.474009] allocate_slab+0x272/0x450 >>>>>>> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 >>>>>>> [1584289.519147] kmem_cache_alloc+0x110/0x120 >>>>>>> [1584289.541416] build_skb+0x1a/0x200 >>>>>>> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] >>>>>>> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] >>>>>>> [1584289.605528] __napi_poll+0x1f/0x130 >>>>>>> [1584289.625842] napi_threaded_poll+0x110/0x160 >>>>>>> [1584289.646110] ? __napi_poll+0x130/0x130 >>>>>>> [1584289.665810] kthread+0xea/0x120 >>>>>>> [1584289.684836] ? kthread_park+0x80/0x80 >>>>>>> [1584289.703440] ret_from_fork+0x1f/0x30 >>>>>>> [1584289.721616] Mem-Info: >>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 >>>>>>> active_file:17408 inactive_file:149 isolated_file:32 >>>>>>> unevictable:1440359 dirty:17500 writeback:0 >>>>>>> slab_reclaimable:43368 slab_unreclaimable:155124 >>>>>>> mapped:817431 shmem:7650 pagetables:32093 bounce:0 >>>>>>> free:17832 free_pcp:113 free_cma:0 >>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? 
no >>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 >>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB >>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 >>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB >>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0 >>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB >>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB >>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB >>>>>>> [1584290.409087] 1465768 total pagecache pages >>>>>>> [1584290.434531] 4165289 pages RAM >>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly >>>>>>> [1584290.484480] 104766 pages reserved >>>>>>> [1584290.508709] 0 pages hwpoisoned >>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105) >>>>>>> [1584302.635731] telegraf invoked oom-killer: 
gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 >>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 >>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>>>>> [1584302.776532] Call Trace: >>>>>>> [1584302.799361] dump_stack+0x58/0x6b >>>>>>> [1584302.821791] dump_header+0x4c/0x2e6 >>>>>>> [1584302.843580] oom_kill_process.cold+0xb/0x10 >>>>>>> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 >>>>>>> [1584302.886641] out_of_memory+0x54/0xa0 >>>>>>> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 >>>>>>> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 >>>>>>> [1584302.947874] __get_free_pages+0x8/0x30 >>>>>>> [1584302.967246] pgd_alloc+0x21/0x180 >>>>>>> [1584302.986355] mm_alloc+0x1af/0x250 >>>>>>> [1584303.005085] alloc_bprm+0x80/0x2a0 >>>>>>> [1584303.023328] do_execveat_common+0x8b/0x330 >>>>>>> [1584303.041181] __x64_sys_execve+0x2b/0x40 >>>>>>> [1584303.058513] do_syscall_64+0x2d/0x40 >>>>>>> [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>>>> [1584303.091891] RIP: 0033:0x488376 >>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 >>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b >>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 >>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 >>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 >>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 >>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 >>>>>>> [1584303.379094] Mem-Info: >>>>>>> [1584303.398713] active_anon:8159 
inactive_anon:2138194 isolated_anon:0 >>>>>>> active_file:12975 inactive_file:168 isolated_file:32 >>>>>>> unevictable:909709 dirty:12864 writeback:10 >>>>>>> slab_reclaimable:42415 slab_unreclaimable:154783 >>>>>>> mapped:39825 shmem:14744 pagetables:26041 bounce:0 >>>>>>> free:537002 free_pcp:1813 free_cma:0 >>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no >>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 >>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB >>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 >>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB >>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0 >>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB >>>>>>> [1584304.134551] Node 0 
DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB >>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB >>>>>>> [1584304.287094] 933871 total pagecache pages >>>>>>> [1584304.312815] 4165289 pages RAM >>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly >>>>>>> [1584304.362522] 104766 pages reserved >>>>>>> [1584304.386516] 0 pages hwpoisoned >>>>>>> >>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote: >>>>>>>> >>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>>>>>> >>>>>>>>> Hi Wei >>>>>>>>> Check this: >>>>>>>>> >>>>>>>>> [ 39.706567] ------------[ cut here ]------------ >>>>>>>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) >>>>>>>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>>>>> >>>>>>>> Probably more relevant to Intel maintainers than Wei :/ >>>>>>>> >>>>>>>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas >>>>>>>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 >>>>>>>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 >>>>>>>>> [ 39.706619] Workqueue: events work_for_cpu_fn >>>>>>>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>>>>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 >>>>>>>>> [ 
39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 >>>>>>>>> [ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff >>>>>>>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 >>>>>>>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea >>>>>>>>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 >>>>>>>>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 >>>>>>>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 >>>>>>>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0 >>>>>>>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>>>>> [ 39.706656] Call Trace: >>>>>>>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] >>>>>>>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] >>>>>>>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 >>>>>>>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 >>>>>>>>> [ 39.706716] ? __kmalloc+0x37/0x160 >>>>>>>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 >>>>>>>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 >>>>>>>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 >>>>>>>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 >>>>>>>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 >>>>>>>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 >>>>>>>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 >>>>>>>>> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 >>>>>>>>> [ 39.706746] local_pci_probe+0x1b/0x40 >>>>>>>>> [ 39.706750] work_for_cpu_fn+0xb/0x20 >>>>>>>>> [ 39.706754] process_one_work+0x1ec/0x350 >>>>>>>>> [ 39.706758] worker_thread+0x24b/0x4d0 >>>>>>>>> [ 39.706760] ? 
process_one_work+0x350/0x350 >>>>>>>>> [ 39.706762] kthread+0xea/0x120 >>>>>>>>> [ 39.706766] ? kthread_park+0x80/0x80 >>>>>>>>> [ 39.706770] ret_from_fork+0x1f/0x30 >>>>>>>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- >>>>>>>>> >>>>>>>>> Martin >>>>>>>>> >>>>>>>>> >>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: >>>>>>>>>> >>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to >>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on >>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the >>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll() >>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll >>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well. >>>>>>>>>> This patch tries to fix this race by adding a new bit >>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in >>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared >>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this >>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between >>>>>>>>>> kthread and other scenarios and fixes the race issue. >>>>>>>>>> >>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") >>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com> >>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org> >>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com> >>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com> >>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com> >>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> >>>>>>>>>> --- >>>>>>>>>> Change since v3: >>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in >>>>>>>>>> ____napi_schedule(). 
>>>>>>>>>> >>>>>>>>>> include/linux/netdevice.h | 2 ++ >>>>>>>>>> net/core/dev.c | 19 ++++++++++++++++++- >>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-) >>>>>>>>>> >>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644 >>>>>>>>>> --- a/include/linux/netdevice.h >>>>>>>>>> +++ b/include/linux/netdevice.h >>>>>>>>>> @@ -360,6 +360,7 @@ enum { >>>>>>>>>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ >>>>>>>>>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ >>>>>>>>>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ >>>>>>>>>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ >>>>>>>>>> }; >>>>>>>>>> >>>>>>>>>> enum { >>>>>>>>>> @@ -372,6 +373,7 @@ enum { >>>>>>>>>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), >>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), >>>>>>>>>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), >>>>>>>>>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), >>>>>>>>>> }; >>>>>>>>>> >>>>>>>>>> enum gro_result { >>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c >>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644 >>>>>>>>>> --- a/net/core/dev.c >>>>>>>>>> +++ b/net/core/dev.c >>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, >>>>>>>>>> */ >>>>>>>>>> thread = READ_ONCE(napi->thread); >>>>>>>>>> if (thread) { >>>>>>>>>> + /* Avoid doing set_bit() if the thread is in >>>>>>>>>> + * INTERRUPTIBLE state, cause napi_thread_wait() >>>>>>>>>> + * makes sure to proceed with napi polling >>>>>>>>>> + * if the thread is explicitly woken from here. 
>>>>>>>>>> + */ >>>>>>>>>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) >>>>>>>>>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); >>>>>>>>>> wake_up_process(thread); >>>>>>>>>> return; >>>>>>>>>> } >>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) >>>>>>>>>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); >>>>>>>>>> >>>>>>>>>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | >>>>>>>>>> + NAPIF_STATE_SCHED_THREADED | >>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL); >>>>>>>>>> >>>>>>>>>> /* If STATE_MISSED was set, leave STATE_SCHED set, >>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) >>>>>>>>>> >>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi) >>>>>>>>>> { >>>>>>>>>> + bool woken = false; >>>>>>>>>> + >>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); >>>>>>>>>> >>>>>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { >>>>>>>>>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { >>>>>>>>>> + /* Testing SCHED_THREADED bit here to make sure the current >>>>>>>>>> + * kthread owns this napi and could poll on this napi. >>>>>>>>>> + * Testing SCHED bit is not enough because SCHED bit might be >>>>>>>>>> + * set by some other busy poll thread or by napi_disable(). >>>>>>>>>> + */ >>>>>>>>>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { >>>>>>>>>> WARN_ON(!list_empty(&napi->poll_list)); >>>>>>>>>> __set_current_state(TASK_RUNNING); >>>>>>>>>> return 0; >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> schedule(); >>>>>>>>>> + /* woken being true indicates this thread owns this napi. */ >>>>>>>>>> + woken = true; >>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); >>>>>>>>>> } >>>>>>>>>> __set_current_state(TASK_RUNNING); >>>>>>>>>> -- >>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >
Hi Martin,

It looks like there might still be a race between kthread polling and busy polling. I have been looking into the code but have not been able to identify the cause. May I ask why you need to enable both at the same time?

Thanks.
Wei

On Thu, Sep 23, 2021 at 1:31 PM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hey Wei
>
> If you find a fix for this, write to me and I will test it.
>
> kthread is a very good solution for servers under network load, but we need to find where this bug comes from.
>
>
> Martin
>
> > On 22 Sep 2021, at 17:12, Martin Zaharinov <micron10@gmail.com> wrote:
> >
> > Hi Wei
> >
> > One more bug report from the last hours:
> >
> >
> > Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------
> > Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158
> > Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90
> > Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw]
> > Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1
> > Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020
> > Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90
> > Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16
9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 > > Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 > > Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 > > Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff > > Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff > > Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 > > Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 > > Sep 9 12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > > Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 > > Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: > > Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 > > Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 > > Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > > Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > > Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 > > Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 > > Sep 9 12:49:32 [829555.140048][ T2925] ? 
__napi_poll+0x1f/0x100 > > Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 > > Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 > > Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > > Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 > > Sep 9 12:49:32 [829555.248592][ T2925] ? entry_SYSCALL_64_after_hwframe+0x44/0xae > > Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- > > Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 > > Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode > > Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page > > Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 > > Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI > > Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 > > Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > > Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > > Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > > Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > > Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > > Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > > Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > > Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > > Sep 9 12:49:33 
[829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > > Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > > Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 > > Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: > > Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 > > Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 > > Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 > > Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > > Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > > Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 > > Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 > > Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 > > Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 > > Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 > > Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > > Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 > > Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > > Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > > Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 > > Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- > > Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > > Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > > Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > > Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > > Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > > Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > > Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > > Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > > Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > > Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > Sep 9 12:49:34 [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 
0000000115a7c001 CR4: 00000000001706e0
> > Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
> > Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds..
> > Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
> >
> >> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote:
> >>
> >> Thanks Martin for the report.
> >> Without a reproducer, it might be hard to debug. I will double check
> >> the code to check for potential race between kthread poll and busy
> >> poll.
> >>
> >> Thanks.
> >> Wei
> >>
> >> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>
> >>> Hi Wei
> >>> Please see this bug log :
> >>>
> >>>
> >>> Sep 15 08:04:56 [2034411.548669][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> Sep 15 08:04:56 [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> >>> Sep 15 08:04:56 [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> Sep 15 08:04:56 [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> Sep 15 08:04:56 [2034411.725536][ T3195] Call Trace:
> >>> Sep 15 08:04:56 [2034411.749948][ T3195] netif_receive_skb_list_internal+0x25c/0x2b0
> >>> Sep 15 08:04:56 [2034411.774579][ T3195] gro_normal_one+0x6e/0x90
> >>> Sep 15 08:04:56 [2034411.798786][ T3195] napi_gro_flush+0xb1/0x100
> >>> Sep 15 08:04:56 [2034411.822410][ T3195] napi_complete_done+0x107/0x180
> >>> Sep 15 08:04:56 [2034411.845614][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe]
> >>> Sep 15
08:04:56 [2034411.868480][ T3195] __napi_poll+0x1f/0x100 > >>> Sep 15 08:04:56 [2034411.890899][ T3195] ? __napi_poll+0x100/0x100 > >>> Sep 15 08:04:56 [2034411.912799][ T3195] napi_threaded_poll+0x105/0x150 > >>> Sep 15 08:04:56 [2034411.934567][ T3195] kthread+0x101/0x120 > >>> Sep 15 08:04:56 [2034411.955873][ T3195] ? set_kthread_struct+0x30/0x30 > >>> Sep 15 08:04:56 [2034411.977157][ T3195] ret_from_fork+0x1f/0x30 > >>> Sep 15 08:04:56 [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]--- > >>> Sep 15 08:04:56 [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000 > >>> Sep 15 08:04:56 [2034412.058658][ T3195] #PF: supervisor read access in kernel mode > >>> Sep 15 08:04:56 [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page > >>> Sep 15 08:04:56 [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0 > >>> Sep 15 08:04:56 [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI > >>> Sep 15 08:04:56 [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S W O 5.13.12 #1 > >>> Sep 15 08:04:56 [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > >>> Sep 15 08:04:56 [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 > >>> Sep 15 08:04:56 [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 > >>> Sep 15 08:04:56 [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 > >>> Sep 15 08:04:56 [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 > >>> Sep 15 08:04:56 [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 > >>> Sep 15 08:04:56 [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff > >>> Sep 15 08:04:56 
[2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 > >>> Sep 15 08:04:57 [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 > >>> Sep 15 08:04:57 [2034412.507493][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 > >>> Sep 15 08:04:57 [2034412.553528][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> Sep 15 08:04:57 [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 > >>> Sep 15 08:04:57 [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>> Sep 15 08:04:57 [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>> Sep 15 08:04:57 [2034412.721656][ T3195] Call Trace: > >>> Sep 15 08:04:57 [2034412.746016][ T3195] gro_normal_one+0x6e/0x90 > >>> Sep 15 08:04:57 [2034412.770321][ T3195] napi_gro_flush+0xb1/0x100 > >>> Sep 15 08:04:57 [2034412.794137][ T3195] napi_complete_done+0x107/0x180 > >>> Sep 15 08:04:57 [2034412.817556][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] > >>> Sep 15 08:04:57 [2034412.840522][ T3195] __napi_poll+0x1f/0x100 > >>> Sep 15 08:04:57 [2034412.862829][ T3195] ? __napi_poll+0x100/0x100 > >>> Sep 15 08:04:57 [2034412.884804][ T3195] napi_threaded_poll+0x105/0x150 > >>> Sep 15 08:04:57 [2034412.906305][ T3195] kthread+0x101/0x120 > >>> Sep 15 08:04:57 [2034412.927502][ T3195] ? 
set_kthread_struct+0x30/0x30 > >>> Sep 15 08:04:57 [2034412.948434][ T3195] ret_from_fork+0x1f/0x30 > >>> Sep 15 08:04:57 [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC] > >>> Sep 15 08:04:57 [2034413.136792][ T3195] CR2: 0000000000000000 > >>> Sep 15 08:04:57 [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]--- > >>> Sep 15 08:04:57 [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 > >>> Sep 15 08:04:57 [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 > >>> Sep 15 08:04:57 [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 > >>> Sep 15 08:04:57 [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 > >>> Sep 15 08:04:57 [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 > >>> Sep 15 08:04:57 [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff > >>> Sep 15 08:04:57 [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 > >>> Sep 15 08:04:58 [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 > >>> Sep 15 08:04:58 [2034413.487558][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 > >>> Sep 15 08:04:58 
[2034413.535263][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> Sep 15 08:04:58 [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0
> >>> Sep 15 08:04:58 [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>> Sep 15 08:04:58 [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>> Sep 15 08:04:58 [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt
> >>> Sep 15 08:04:58 [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >>> Sep 15 08:04:58 [2034413.906445][ T3195] Rebooting in 10 seconds..
> >>> Sep 15 08:05:08 [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG.
> >>>
> >>>
> >>>
> >>>
> >>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote:
> >>>>
> >>>> Hi Martin,
> >>>>
> >>>> Is there a reproducer for this? What kind of traffic is it running?
> >>>> What is the following config:
> >>>> cat /proc/sys/net/core/busy_poll
> >>>> cat /proc/sys/net/core/busy_read
> >>>> cat /sys/class/net/<ixgbe_dev>/threaded
> >>>> And is SO_PREFER_BUSY_POLL used?
> >>>>
> >>>> Thanks.
> >>>> Wei
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>
> >>>>> Hi Eric and Wei
> >>>>>
> >>>>> Please see this bug report from last hour ,
> >>>>> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up
> >>>>> Uptime before crash : 10day
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------
> >>>>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption.
next->prev should be ffff9651016c0b00, but was ffff96511a87c158 > >>>>> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 > >>>>> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > >>>>> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 > >>>>> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > >>>>> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 > >>>>> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 > >>>>> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 > >>>>> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 > >>>>> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff > >>>>> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff > >>>>> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 > >>>>> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 > >>>>> Sep 9 
12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >>>>> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 > >>>>> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: > >>>>> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 > >>>>> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 > >>>>> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > >>>>> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > >>>>> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 > >>>>> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 > >>>>> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 > >>>>> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 > >>>>> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 > >>>>> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > >>>>> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 > >>>>> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > >>>>> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- > >>>>> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 > >>>>> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode > >>>>> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page > >>>>> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 > >>>>> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI > >>>>> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 > >>>>> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > >>>>> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > >>>>> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > >>>>> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > >>>>> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > >>>>> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > >>>>> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > >>>>> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > >>>>> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > >>>>> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >>>>> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 
0000000080050033 > >>>>> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 > >>>>> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: > >>>>> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 > >>>>> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 > >>>>> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 > >>>>> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > >>>>> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > >>>>> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 > >>>>> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 > >>>>> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 > >>>>> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 > >>>>> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 > >>>>> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > >>>>> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 > >>>>> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > >>>>> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > >>>>> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 > >>>>> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- > >>>>> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > >>>>> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > >>>>> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > >>>>> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > >>>>> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > >>>>> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > >>>>> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > >>>>> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > >>>>> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >>>>> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>> Sep 9 12:49:34 
[829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>>>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
> >>>>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >>>>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds..
> >>>>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
> >>>>>
> >>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>
> >>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Eric and Wei
> >>>>>>>
> >>>>>>> Please check this log :
> >>>>>>>
> >>>>>>
> >>>>>> Please send a normal report to netdev.
> >>>>>>
> >>>>>> This has nothing to do with us (Eric & Wei)
> >>>>>>
> >>>>>> Thanks.
> >>>>>> > >>>>>>> > >>>>>>> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) > >>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 > >>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 > >>>>>>> [1584289.107263] Call Trace: > >>>>>>> [1584289.107266] dump_stack+0x58/0x6b > >>>>>>> [1584289.209562] warn_alloc.cold+0x70/0xd4 > >>>>>>> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 > >>>>>>> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 > >>>>>>> [1584289.474009] allocate_slab+0x272/0x450 > >>>>>>> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 > >>>>>>> [1584289.519147] kmem_cache_alloc+0x110/0x120 > >>>>>>> [1584289.541416] build_skb+0x1a/0x200 > >>>>>>> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] > >>>>>>> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] > >>>>>>> [1584289.605528] __napi_poll+0x1f/0x130 > >>>>>>> [1584289.625842] napi_threaded_poll+0x110/0x160 > >>>>>>> [1584289.646110] ? __napi_poll+0x130/0x130 > >>>>>>> [1584289.665810] kthread+0xea/0x120 > >>>>>>> [1584289.684836] ? kthread_park+0x80/0x80 > >>>>>>> [1584289.703440] ret_from_fork+0x1f/0x30 > >>>>>>> [1584289.721616] Mem-Info: > >>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 > >>>>>>> active_file:17408 inactive_file:149 isolated_file:32 > >>>>>>> unevictable:1440359 dirty:17500 writeback:0 > >>>>>>> slab_reclaimable:43368 slab_unreclaimable:155124 > >>>>>>> mapped:817431 shmem:7650 pagetables:32093 bounce:0 > >>>>>>> free:17832 free_pcp:113 free_cma:0 > >>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? 
no > >>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > >>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 > >>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB > >>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 > >>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB > >>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0 > >>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB > >>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB > >>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB > >>>>>>> [1584290.409087] 1465768 total pagecache pages > >>>>>>> [1584290.434531] 4165289 pages RAM > >>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly > >>>>>>> [1584290.484480] 104766 pages reserved > >>>>>>> [1584290.508709] 0 pages hwpoisoned > >>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105) > >>>>>>> [1584302.635731] 
telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 > >>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 > >>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 > >>>>>>> [1584302.776532] Call Trace: > >>>>>>> [1584302.799361] dump_stack+0x58/0x6b > >>>>>>> [1584302.821791] dump_header+0x4c/0x2e6 > >>>>>>> [1584302.843580] oom_kill_process.cold+0xb/0x10 > >>>>>>> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 > >>>>>>> [1584302.886641] out_of_memory+0x54/0xa0 > >>>>>>> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 > >>>>>>> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 > >>>>>>> [1584302.947874] __get_free_pages+0x8/0x30 > >>>>>>> [1584302.967246] pgd_alloc+0x21/0x180 > >>>>>>> [1584302.986355] mm_alloc+0x1af/0x250 > >>>>>>> [1584303.005085] alloc_bprm+0x80/0x2a0 > >>>>>>> [1584303.023328] do_execveat_common+0x8b/0x330 > >>>>>>> [1584303.041181] __x64_sys_execve+0x2b/0x40 > >>>>>>> [1584303.058513] do_syscall_64+0x2d/0x40 > >>>>>>> [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>>>> [1584303.091891] RIP: 0033:0x488376 > >>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 > >>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b > >>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 > >>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 > >>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 > >>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 > >>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 > 
>>>>>>> [1584303.379094] Mem-Info: > >>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0 > >>>>>>> active_file:12975 inactive_file:168 isolated_file:32 > >>>>>>> unevictable:909709 dirty:12864 writeback:10 > >>>>>>> slab_reclaimable:42415 slab_unreclaimable:154783 > >>>>>>> mapped:39825 shmem:14744 pagetables:26041 bounce:0 > >>>>>>> free:537002 free_pcp:1813 free_cma:0 > >>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no > >>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > >>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 > >>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB > >>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 > >>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB > >>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0 > >>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB 
(U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB > >>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB > >>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB > >>>>>>> [1584304.287094] 933871 total pagecache pages > >>>>>>> [1584304.312815] 4165289 pages RAM > >>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly > >>>>>>> [1584304.362522] 104766 pages reserved > >>>>>>> [1584304.386516] 0 pages hwpoisoned > >>>>>>> > >>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote: > >>>>>>>> > >>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: > >>>>>>>>> > >>>>>>>>> Hi Wei > >>>>>>>>> Check this: > >>>>>>>>> > >>>>>>>>> [ 39.706567] ------------[ cut here ]------------ > >>>>>>>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) > >>>>>>>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 > >>>>>>>> > >>>>>>>> Probably more relevant to Intel maintainers than Wei :/ > >>>>>>>> > >>>>>>>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas > >>>>>>>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 > >>>>>>>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 > >>>>>>>>> [ 39.706619] Workqueue: events work_for_cpu_fn > >>>>>>>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 > >>>>>>>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 
c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 > >>>>>>>>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 > >>>>>>>>> [ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff > >>>>>>>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 > >>>>>>>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea > >>>>>>>>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 > >>>>>>>>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 > >>>>>>>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 > >>>>>>>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>>>>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0 > >>>>>>>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>>>>>> [ 39.706656] Call Trace: > >>>>>>>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] > >>>>>>>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] > >>>>>>>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 > >>>>>>>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 > >>>>>>>>> [ 39.706716] ? __kmalloc+0x37/0x160 > >>>>>>>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 > >>>>>>>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 > >>>>>>>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 > >>>>>>>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 > >>>>>>>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 > >>>>>>>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 > >>>>>>>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 > >>>>>>>>> [ 39.706743] ? 
pci_bus_read_config_word+0x2e/0x40
> >>>>>>>>> [ 39.706746] local_pci_probe+0x1b/0x40
> >>>>>>>>> [ 39.706750] work_for_cpu_fn+0xb/0x20
> >>>>>>>>> [ 39.706754] process_one_work+0x1ec/0x350
> >>>>>>>>> [ 39.706758] worker_thread+0x24b/0x4d0
> >>>>>>>>> [ 39.706760] ? process_one_work+0x350/0x350
> >>>>>>>>> [ 39.706762] kthread+0xea/0x120
> >>>>>>>>> [ 39.706766] ? kthread_park+0x80/0x80
> >>>>>>>>> [ 39.706770] ret_from_fork+0x1f/0x30
> >>>>>>>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]---
> >>>>>>>>>
> >>>>>>>>> Martin
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> >>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on
> >>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the
> >>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> >>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> >>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well.
> >>>>>>>>>> This patch tries to fix this race by adding a new bit
> >>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> >>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> >>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this
> >>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between
> >>>>>>>>>> kthread and other scenarios and fixes the race issue.
> >>>>>>>>>> > >>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") > >>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com> > >>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org> > >>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> > >>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com> > >>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com> > >>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com> > >>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> > >>>>>>>>>> --- > >>>>>>>>>> Change since v3: > >>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in > >>>>>>>>>> ____napi_schedule(). > >>>>>>>>>> > >>>>>>>>>> include/linux/netdevice.h | 2 ++ > >>>>>>>>>> net/core/dev.c | 19 ++++++++++++++++++- > >>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-) > >>>>>>>>>> > >>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > >>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644 > >>>>>>>>>> --- a/include/linux/netdevice.h > >>>>>>>>>> +++ b/include/linux/netdevice.h > >>>>>>>>>> @@ -360,6 +360,7 @@ enum { > >>>>>>>>>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ > >>>>>>>>>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ > >>>>>>>>>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ > >>>>>>>>>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ > >>>>>>>>>> }; > >>>>>>>>>> > >>>>>>>>>> enum { > >>>>>>>>>> @@ -372,6 +373,7 @@ enum { > >>>>>>>>>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), > >>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), > >>>>>>>>>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), > >>>>>>>>>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), > >>>>>>>>>> }; > >>>>>>>>>> > >>>>>>>>>> enum gro_result { > >>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c > >>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644 > 
>>>>>>>>>> --- a/net/core/dev.c > >>>>>>>>>> +++ b/net/core/dev.c > >>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, > >>>>>>>>>> */ > >>>>>>>>>> thread = READ_ONCE(napi->thread); > >>>>>>>>>> if (thread) { > >>>>>>>>>> + /* Avoid doing set_bit() if the thread is in > >>>>>>>>>> + * INTERRUPTIBLE state, cause napi_thread_wait() > >>>>>>>>>> + * makes sure to proceed with napi polling > >>>>>>>>>> + * if the thread is explicitly woken from here. > >>>>>>>>>> + */ > >>>>>>>>>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) > >>>>>>>>>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); > >>>>>>>>>> wake_up_process(thread); > >>>>>>>>>> return; > >>>>>>>>>> } > >>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) > >>>>>>>>>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); > >>>>>>>>>> > >>>>>>>>>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | > >>>>>>>>>> + NAPIF_STATE_SCHED_THREADED | > >>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL); > >>>>>>>>>> > >>>>>>>>>> /* If STATE_MISSED was set, leave STATE_SCHED set, > >>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) > >>>>>>>>>> > >>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi) > >>>>>>>>>> { > >>>>>>>>>> + bool woken = false; > >>>>>>>>>> + > >>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); > >>>>>>>>>> > >>>>>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { > >>>>>>>>>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { > >>>>>>>>>> + /* Testing SCHED_THREADED bit here to make sure the current > >>>>>>>>>> + * kthread owns this napi and could poll on this napi. > >>>>>>>>>> + * Testing SCHED bit is not enough because SCHED bit might be > >>>>>>>>>> + * set by some other busy poll thread or by napi_disable(). 
> >>>>>>>>>> + */ > >>>>>>>>>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { > >>>>>>>>>> WARN_ON(!list_empty(&napi->poll_list)); > >>>>>>>>>> __set_current_state(TASK_RUNNING); > >>>>>>>>>> return 0; > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> schedule(); > >>>>>>>>>> + /* woken being true indicates this thread owns this napi. */ > >>>>>>>>>> + woken = true; > >>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); > >>>>>>>>>> } > >>>>>>>>>> __set_current_state(TASK_RUNNING); > >>>>>>>>>> -- > >>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog > >>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>> > >>> > > >
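The ownership rule described in the quoted patch can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: the `model_*` names and the two-bit state word are hypothetical stand-ins for `napi->state`, C11 atomics replace the kernel's bit operations, and the threaded schedule path is collapsed into a single compare-and-swap for brevity (the patch sets the new bit in a separate step inside ____napi_schedule()). It shows why testing SCHED alone cannot tell the kthread apart from a busy poller, while a separate SCHED_THREADED bit can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model bits; these are NOT the kernel's NAPI_STATE_* values. */
#define MODEL_SCHED          (1u << 0) /* somebody owns the right to poll */
#define MODEL_SCHED_THREADED (1u << 1) /* that owner is the napi kthread  */

struct model_napi {
	atomic_uint state;
};

static void model_init(struct model_napi *n)
{
	atomic_init(&n->state, 0u);
}

/* Try to set `bits` if SCHED is currently clear; false if already owned. */
static bool model_try_own(struct model_napi *n, unsigned int bits)
{
	unsigned int old = atomic_load(&n->state);

	do {
		if (old & MODEL_SCHED)
			return false;
		/* On failure, `old` is refreshed and the check is repeated. */
	} while (!atomic_compare_exchange_weak(&n->state, &old, old | bits));
	return true;
}

/* Busy poll (or napi_disable) style owner: takes SCHED only. */
static bool model_busy_poll_grab(struct model_napi *n)
{
	return model_try_own(n, MODEL_SCHED);
}

/* Threaded-mode schedule: takes SCHED and marks the kthread as owner. */
static bool model_schedule_threaded(struct model_napi *n)
{
	return model_try_own(n, MODEL_SCHED | MODEL_SCHED_THREADED);
}

/* Completion clears both bits, mirroring the napi_complete_done() hunk. */
static void model_complete_done(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Pre-patch style check: true for any owner, including a busy poller. */
static bool model_kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Post-patch style check: true only when the kthread was scheduled. */
static bool model_kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}

/* Walks through the scenario from the patch description. */
static void model_selfcheck(void)
{
	struct model_napi n;

	model_init(&n);
	assert(model_busy_poll_grab(&n));        /* busy poller owns it      */
	assert(model_kthread_may_poll_old(&n));  /* old check is fooled      */
	assert(!model_kthread_may_poll_new(&n)); /* new check is not         */
	model_complete_done(&n);
	assert(model_schedule_threaded(&n));     /* now the kthread owns it  */
	assert(model_kthread_may_poll_new(&n));
}
```

Calling `model_selfcheck()` from a `main()` exercises the sequence. The model deliberately leaves out the `woken` flag and the `TASK_INTERRUPTIBLE` check from the patch, so it says nothing about the later crash reports in this thread.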
Hi Wei,

I think we discussed somewhere in this thread that it should be enabled:

cat /proc/sys/net/core/busy_poll - 50
cat /proc/sys/net/core/busy_read - 50

and one more:

"
packet receipt: high-latency interrupt-based -------------------> poll-based

Busy polling helps reduce latency in the network receive path by
• allowing socket layer code to poll the receive queue of a network device,
• and disabling network interrupts.
This eliminates
• delays caused by the interrupts
• and the resultant context switches.
However, it
• increases CPU utilization.
• It also prevents the CPU from sleeping, which can incur additional power consumption.
Busy polling is disabled by default. Set net.core.busy_poll to a value other than 0 to enable it. This parameter controls the number of microseconds to wait for packets on the device queue for socket poll and selects. Red Hat recommends a value of 50.
Add the SO_BUSY_POLL socket option to the socket.
"

Do you think the problem comes from this?

Martin

> On 24 Sep 2021, at 3:54, Wei Wang <weiwan@google.com> wrote:
>
> Hi Martin,
>
> It looks like there might still be a race between kthread polling and
> busy polling. I am looking into the code but was not able to identify
> the cause.
> May I ask why you need to enable both at the same time?
>
> Thanks.
> Wei
>
>
> On Thu, Sep 23, 2021 at 1:31 PM Martin Zaharinov <micron10@gmail.com> wrote:
>>
>> Hey Wei
>>
>> If you find a fix for this, write to me and I will test it.
>>
>> kthread is a very good solution for a loaded network server, but we need to find where this bug comes from.
>>
>>
>> Martin
>>
>>> On 22 Sep 2021, at 17:12, Martin Zaharinov <micron10@gmail.com> wrote:
>>>
>>> Hi Wei
>>>
>>> One more bug report from the last hours:
>>>
>>>
>>>
>>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------
>>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption.
next->prev should be ffff9651016c0b00, but was ffff96511a87c158 >>> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 >>> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >>> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 >>> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 >>> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 >>> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 >>> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 >>> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff >>> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff >>> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 >>> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 >>> Sep 9 12:49:32 [829554.744221][ T2925] FS: 
00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: >>> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 >>> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 >>> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >>> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >>> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 >>> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 >>> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 >>> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 >>> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 >>> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >>> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 >>> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >>> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- >>> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 >>> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode >>> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page >>> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 >>> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI >>> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 >>> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >>> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >>> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >>> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >>> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >>> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >>> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >>> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >>> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 
CR3: 0000000115a7c001 CR4: 00000000001706e0 >>> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: >>> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 >>> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 >>> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 >>> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >>> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >>> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 >>> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 >>> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 >>> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 >>> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 >>> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >>> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 >>> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >>> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >>> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 >>> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- >>> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >>> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >>> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >>> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >>> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >>> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >>> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >>> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >>> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> Sep 9 12:49:34 [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 
0000000115a7c001 CR4: 00000000001706e0 >>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt >>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. >>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. >>> >>>> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote: >>>> >>>> Thanks Martin for the report. >>>> Without a reproducer, it might be hard to debug. I will double check >>>> the code to check for potential race between kthread poll and busy >>>> poll. >>>> >>>> Thanks. >>>> Wei >>>> >>>> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>> >>>>> Hi Wei >>>>> Please see this bug log : >>>>> >>>>> >>>>> Sep 15 08:04:56 [2034411.548669][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> Sep 15 08:04:56 [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >>>>> Sep 15 08:04:56 [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>> Sep 15 08:04:56 [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>> Sep 15 08:04:56 [2034411.725536][ T3195] Call Trace: >>>>> Sep 15 08:04:56 [2034411.749948][ T3195] netif_receive_skb_list_internal+0x25c/0x2b0 >>>>> Sep 15 08:04:56 [2034411.774579][ T3195] gro_normal_one+0x6e/0x90 >>>>> Sep 15 08:04:56 [2034411.798786][ T3195] napi_gro_flush+0xb1/0x100 >>>>> Sep 15 08:04:56 [2034411.822410][ T3195] napi_complete_done+0x107/0x180 >>>>> Sep 15 08:04:56 [2034411.845614][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] >>>>> Sep 15 
08:04:56 [2034411.868480][ T3195] __napi_poll+0x1f/0x100 >>>>> Sep 15 08:04:56 [2034411.890899][ T3195] ? __napi_poll+0x100/0x100 >>>>> Sep 15 08:04:56 [2034411.912799][ T3195] napi_threaded_poll+0x105/0x150 >>>>> Sep 15 08:04:56 [2034411.934567][ T3195] kthread+0x101/0x120 >>>>> Sep 15 08:04:56 [2034411.955873][ T3195] ? set_kthread_struct+0x30/0x30 >>>>> Sep 15 08:04:56 [2034411.977157][ T3195] ret_from_fork+0x1f/0x30 >>>>> Sep 15 08:04:56 [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]--- >>>>> Sep 15 08:04:56 [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000 >>>>> Sep 15 08:04:56 [2034412.058658][ T3195] #PF: supervisor read access in kernel mode >>>>> Sep 15 08:04:56 [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page >>>>> Sep 15 08:04:56 [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0 >>>>> Sep 15 08:04:56 [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI >>>>> Sep 15 08:04:56 [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S W O 5.13.12 #1 >>>>> Sep 15 08:04:56 [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>>>> Sep 15 08:04:56 [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 >>>>> Sep 15 08:04:56 [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 >>>>> Sep 15 08:04:56 [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 >>>>> Sep 15 08:04:56 [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 >>>>> Sep 15 08:04:56 [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 >>>>> Sep 15 08:04:56 [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff >>>>> Sep 15 08:04:56 
[2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 >>>>> Sep 15 08:04:57 [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 >>>>> Sep 15 08:04:57 [2034412.507493][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 >>>>> Sep 15 08:04:57 [2034412.553528][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> Sep 15 08:04:57 [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >>>>> Sep 15 08:04:57 [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>> Sep 15 08:04:57 [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>> Sep 15 08:04:57 [2034412.721656][ T3195] Call Trace: >>>>> Sep 15 08:04:57 [2034412.746016][ T3195] gro_normal_one+0x6e/0x90 >>>>> Sep 15 08:04:57 [2034412.770321][ T3195] napi_gro_flush+0xb1/0x100 >>>>> Sep 15 08:04:57 [2034412.794137][ T3195] napi_complete_done+0x107/0x180 >>>>> Sep 15 08:04:57 [2034412.817556][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] >>>>> Sep 15 08:04:57 [2034412.840522][ T3195] __napi_poll+0x1f/0x100 >>>>> Sep 15 08:04:57 [2034412.862829][ T3195] ? __napi_poll+0x100/0x100 >>>>> Sep 15 08:04:57 [2034412.884804][ T3195] napi_threaded_poll+0x105/0x150 >>>>> Sep 15 08:04:57 [2034412.906305][ T3195] kthread+0x101/0x120 >>>>> Sep 15 08:04:57 [2034412.927502][ T3195] ? 
set_kthread_struct+0x30/0x30 >>>>> Sep 15 08:04:57 [2034412.948434][ T3195] ret_from_fork+0x1f/0x30 >>>>> Sep 15 08:04:57 [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC] >>>>> Sep 15 08:04:57 [2034413.136792][ T3195] CR2: 0000000000000000 >>>>> Sep 15 08:04:57 [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]--- >>>>> Sep 15 08:04:57 [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 >>>>> Sep 15 08:04:57 [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 >>>>> Sep 15 08:04:57 [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 >>>>> Sep 15 08:04:57 [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 >>>>> Sep 15 08:04:57 [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 >>>>> Sep 15 08:04:57 [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff >>>>> Sep 15 08:04:57 [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 >>>>> Sep 15 08:04:58 [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 >>>>> Sep 15 08:04:58 [2034413.487558][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 >>>>> Sep 15 08:04:58 
[2034413.535263][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> Sep 15 08:04:58 [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >>>>> Sep 15 08:04:58 [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>> Sep 15 08:04:58 [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>> Sep 15 08:04:58 [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt >>>>> Sep 15 08:04:58 [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >>>>> Sep 15 08:04:58 [2034413.906445][ T3195] Rebooting in 10 seconds.. >>>>> Sep 15 08:05:08 [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG. >>>>> >>>>> >>>>> >>>>> >>>>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote: >>>>>> >>>>>> Hi Martin, >>>>>> >>>>>> Is there a reproducer for this? What kind of traffic is it running? >>>>>> What is the following config: >>>>>> cat /proc/sys/net/core/busy_poll >>>>>> cat /proc/sys/net/core/busy_read >>>>>> cat /sys/class/net/<ixgbe_dev>/threaded >>>>>> And is SO_PREFER_BUSY_POLL used? >>>>>> >>>>>> Thanks. >>>>>> Wei >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>>>> >>>>>>> Hi Eric and Wei >>>>>>> >>>>>>> Please see this bug report from last hour , >>>>>>> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up >>>>>>> Uptime before crash : 10day >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ >>>>>>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. 
next->prev should be ffff9651016c0b00, but was ffff96511a87c158 >>>>>>> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 >>>>>>> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >>>>>>> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 >>>>>>> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>>>>>> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 >>>>>>> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 >>>>>>> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 >>>>>>> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 >>>>>>> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff >>>>>>> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff >>>>>>> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 >>>>>>> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 >>>>>>> Sep 9 
12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>>>>> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>>>>> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>>> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: >>>>>>> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 >>>>>>> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 >>>>>>> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >>>>>>> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >>>>>>> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 >>>>>>> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 >>>>>>> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 >>>>>>> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 >>>>>>> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 >>>>>>> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >>>>>>> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 >>>>>>> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >>>>>>> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- >>>>>>> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 >>>>>>> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode >>>>>>> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page >>>>>>> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 >>>>>>> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI >>>>>>> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 >>>>>>> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >>>>>>> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >>>>>>> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >>>>>>> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >>>>>>> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >>>>>>> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >>>>>>> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >>>>>>> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >>>>>>> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >>>>>>> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>>>>> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 
0000000080050033 >>>>>>> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>>>>> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>>> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: >>>>>>> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 >>>>>>> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 >>>>>>> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 >>>>>>> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >>>>>>> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >>>>>>> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 >>>>>>> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 >>>>>>> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 >>>>>>> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 >>>>>>> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 >>>>>>> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >>>>>>> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 >>>>>>> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >>>>>>> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >>>>>>> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 >>>>>>> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- >>>>>>> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >>>>>>> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >>>>>>> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >>>>>>> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >>>>>>> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >>>>>>> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >>>>>>> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >>>>>>> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >>>>>>> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >>>>>>> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>> Sep 9 12:49:34 
[829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >>>>>>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt >>>>>>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >>>>>>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. >>>>>>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. >>>>>>> >>>>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote: >>>>>>>> >>>>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>>>>>> >>>>>>>>> Hi Eric and Wei >>>>>>>>> >>>>>>>>> Please check this log : >>>>>>>>> >>>>>>>> >>>>>>>> Please send a normal report to netdev. >>>>>>>> >>>>>>>> This has nothing to to with us (Eric & Wei) >>>>>>>> >>>>>>>> Thanks. 
>>>>>>>> >>>>>>>>> >>>>>>>>> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) >>>>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 >>>>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>>>>>>> [1584289.107263] Call Trace: >>>>>>>>> [1584289.107266] dump_stack+0x58/0x6b >>>>>>>>> [1584289.209562] warn_alloc.cold+0x70/0xd4 >>>>>>>>> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 >>>>>>>>> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 >>>>>>>>> [1584289.474009] allocate_slab+0x272/0x450 >>>>>>>>> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 >>>>>>>>> [1584289.519147] kmem_cache_alloc+0x110/0x120 >>>>>>>>> [1584289.541416] build_skb+0x1a/0x200 >>>>>>>>> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] >>>>>>>>> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] >>>>>>>>> [1584289.605528] __napi_poll+0x1f/0x130 >>>>>>>>> [1584289.625842] napi_threaded_poll+0x110/0x160 >>>>>>>>> [1584289.646110] ? __napi_poll+0x130/0x130 >>>>>>>>> [1584289.665810] kthread+0xea/0x120 >>>>>>>>> [1584289.684836] ? kthread_park+0x80/0x80 >>>>>>>>> [1584289.703440] ret_from_fork+0x1f/0x30 >>>>>>>>> [1584289.721616] Mem-Info: >>>>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 >>>>>>>>> active_file:17408 inactive_file:149 isolated_file:32 >>>>>>>>> unevictable:1440359 dirty:17500 writeback:0 >>>>>>>>> slab_reclaimable:43368 slab_unreclaimable:155124 >>>>>>>>> mapped:817431 shmem:7650 pagetables:32093 bounce:0 >>>>>>>>> free:17832 free_pcp:113 free_cma:0 >>>>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? 
no >>>>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 >>>>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB >>>>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 >>>>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB >>>>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0 >>>>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB >>>>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB >>>>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB >>>>>>>>> [1584290.409087] 1465768 total pagecache pages >>>>>>>>> [1584290.434531] 4165289 pages RAM >>>>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly >>>>>>>>> [1584290.484480] 104766 pages reserved >>>>>>>>> [1584290.508709] 0 pages hwpoisoned >>>>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105) >>>>>>>>> [1584302.635731] 
telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 >>>>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 >>>>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >>>>>>>>> [1584302.776532] Call Trace: >>>>>>>>> [1584302.799361] dump_stack+0x58/0x6b >>>>>>>>> [1584302.821791] dump_header+0x4c/0x2e6 >>>>>>>>> [1584302.843580] oom_kill_process.cold+0xb/0x10 >>>>>>>>> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 >>>>>>>>> [1584302.886641] out_of_memory+0x54/0xa0 >>>>>>>>> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 >>>>>>>>> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 >>>>>>>>> [1584302.947874] __get_free_pages+0x8/0x30 >>>>>>>>> [1584302.967246] pgd_alloc+0x21/0x180 >>>>>>>>> [1584302.986355] mm_alloc+0x1af/0x250 >>>>>>>>> [1584303.005085] alloc_bprm+0x80/0x2a0 >>>>>>>>> [1584303.023328] do_execveat_common+0x8b/0x330 >>>>>>>>> [1584303.041181] __x64_sys_execve+0x2b/0x40 >>>>>>>>> [1584303.058513] do_syscall_64+0x2d/0x40 >>>>>>>>> [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>>>>>> [1584303.091891] RIP: 0033:0x488376 >>>>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 >>>>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b >>>>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 >>>>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 >>>>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 >>>>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 >>>>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 
>>>>>>>>> [1584303.379094] Mem-Info: >>>>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0 >>>>>>>>> active_file:12975 inactive_file:168 isolated_file:32 >>>>>>>>> unevictable:909709 dirty:12864 writeback:10 >>>>>>>>> slab_reclaimable:42415 slab_unreclaimable:154783 >>>>>>>>> mapped:39825 shmem:14744 pagetables:26041 bounce:0 >>>>>>>>> free:537002 free_pcp:1813 free_cma:0 >>>>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? no >>>>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >>>>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 >>>>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB >>>>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 >>>>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB >>>>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0 >>>>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 
14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB >>>>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB >>>>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB >>>>>>>>> [1584304.287094] 933871 total pagecache pages >>>>>>>>> [1584304.312815] 4165289 pages RAM >>>>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly >>>>>>>>> [1584304.362522] 104766 pages reserved >>>>>>>>> [1584304.386516] 0 pages hwpoisoned >>>>>>>>> >>>>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote: >>>>>>>>>> >>>>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Wei >>>>>>>>>>> Check this: >>>>>>>>>>> >>>>>>>>>>> [ 39.706567] ------------[ cut here ]------------ >>>>>>>>>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) >>>>>>>>>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>>>>>>> >>>>>>>>>> Probably more relevant to Intel maintainers than Wei :/ >>>>>>>>>> >>>>>>>>>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas >>>>>>>>>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 >>>>>>>>>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 >>>>>>>>>>> [ 39.706619] Workqueue: events work_for_cpu_fn >>>>>>>>>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 >>>>>>>>>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 
00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 >>>>>>>>>>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 >>>>>>>>>>> [ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff >>>>>>>>>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 >>>>>>>>>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea >>>>>>>>>>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 >>>>>>>>>>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 >>>>>>>>>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 >>>>>>>>>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>>>>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0 >>>>>>>>>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>>>>>>>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>>>>>>>>>> [ 39.706656] Call Trace: >>>>>>>>>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] >>>>>>>>>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] >>>>>>>>>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 >>>>>>>>>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 >>>>>>>>>>> [ 39.706716] ? __kmalloc+0x37/0x160 >>>>>>>>>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 >>>>>>>>>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 >>>>>>>>>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 >>>>>>>>>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 >>>>>>>>>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 >>>>>>>>>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 >>>>>>>>>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 >>>>>>>>>>> [ 39.706743] ? 
pci_bus_read_config_word+0x2e/0x40 >>>>>>>>>>> [ 39.706746] local_pci_probe+0x1b/0x40 >>>>>>>>>>> [ 39.706750] work_for_cpu_fn+0xb/0x20 >>>>>>>>>>> [ 39.706754] process_one_work+0x1ec/0x350 >>>>>>>>>>> [ 39.706758] worker_thread+0x24b/0x4d0 >>>>>>>>>>> [ 39.706760] ? process_one_work+0x350/0x350 >>>>>>>>>>> [ 39.706762] kthread+0xea/0x120 >>>>>>>>>>> [ 39.706766] ? kthread_park+0x80/0x80 >>>>>>>>>>> [ 39.706770] ret_from_fork+0x1f/0x30 >>>>>>>>>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- >>>>>>>>>>> >>>>>>>>>>> Martin >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to >>>>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on >>>>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the >>>>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll() >>>>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll >>>>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well. >>>>>>>>>>>> This patch tries to fix this race by adding a new bit >>>>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in >>>>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared >>>>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this >>>>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between >>>>>>>>>>>> kthread and other scenarios and fixes the race issue. 
>>>>>>>>>>>> >>>>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") >>>>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com> >>>>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org> >>>>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> >>>>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com> >>>>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com> >>>>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com> >>>>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> >>>>>>>>>>>> --- >>>>>>>>>>>> Change since v3: >>>>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in >>>>>>>>>>>> ____napi_schedule(). >>>>>>>>>>>> >>>>>>>>>>>> include/linux/netdevice.h | 2 ++ >>>>>>>>>>>> net/core/dev.c | 19 ++++++++++++++++++- >>>>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>>>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644 >>>>>>>>>>>> --- a/include/linux/netdevice.h >>>>>>>>>>>> +++ b/include/linux/netdevice.h >>>>>>>>>>>> @@ -360,6 +360,7 @@ enum { >>>>>>>>>>>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ >>>>>>>>>>>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ >>>>>>>>>>>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ >>>>>>>>>>>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ >>>>>>>>>>>> }; >>>>>>>>>>>> >>>>>>>>>>>> enum { >>>>>>>>>>>> @@ -372,6 +373,7 @@ enum { >>>>>>>>>>>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), >>>>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), >>>>>>>>>>>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), >>>>>>>>>>>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), >>>>>>>>>>>> }; >>>>>>>>>>>> >>>>>>>>>>>> enum gro_result { >>>>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c >>>>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644 
>>>>>>>>>>>> --- a/net/core/dev.c >>>>>>>>>>>> +++ b/net/core/dev.c >>>>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, >>>>>>>>>>>> */ >>>>>>>>>>>> thread = READ_ONCE(napi->thread); >>>>>>>>>>>> if (thread) { >>>>>>>>>>>> + /* Avoid doing set_bit() if the thread is in >>>>>>>>>>>> + * INTERRUPTIBLE state, cause napi_thread_wait() >>>>>>>>>>>> + * makes sure to proceed with napi polling >>>>>>>>>>>> + * if the thread is explicitly woken from here. >>>>>>>>>>>> + */ >>>>>>>>>>>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) >>>>>>>>>>>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); >>>>>>>>>>>> wake_up_process(thread); >>>>>>>>>>>> return; >>>>>>>>>>>> } >>>>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) >>>>>>>>>>>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); >>>>>>>>>>>> >>>>>>>>>>>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | >>>>>>>>>>>> + NAPIF_STATE_SCHED_THREADED | >>>>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL); >>>>>>>>>>>> >>>>>>>>>>>> /* If STATE_MISSED was set, leave STATE_SCHED set, >>>>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) >>>>>>>>>>>> >>>>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi) >>>>>>>>>>>> { >>>>>>>>>>>> + bool woken = false; >>>>>>>>>>>> + >>>>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); >>>>>>>>>>>> >>>>>>>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { >>>>>>>>>>>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { >>>>>>>>>>>> + /* Testing SCHED_THREADED bit here to make sure the current >>>>>>>>>>>> + * kthread owns this napi and could poll on this napi. >>>>>>>>>>>> + * Testing SCHED bit is not enough because SCHED bit might be >>>>>>>>>>>> + * set by some other busy poll thread or by napi_disable(). 
>>>>>>>>>>>> + */ >>>>>>>>>>>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { >>>>>>>>>>>> WARN_ON(!list_empty(&napi->poll_list)); >>>>>>>>>>>> __set_current_state(TASK_RUNNING); >>>>>>>>>>>> return 0; >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> schedule(); >>>>>>>>>>>> + /* woken being true indicates this thread owns this napi. */ >>>>>>>>>>>> + woken = true; >>>>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); >>>>>>>>>>>> } >>>>>>>>>>>> __set_current_state(TASK_RUNNING); >>>>>>>>>>>> -- >>>>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>> >>
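The ownership problem described in the patch above can be sketched in user space. This is only a minimal model under stated assumptions, not the kernel code: the `model_*` and `kthread_may_poll_*` names are hypothetical, C11 atomics stand in for the kernel bit operations, and the `woken` flag and the `TASK_INTERRUPTIBLE` check from the patch are left out. It shows why testing SCHED alone cannot tell the kthread whether it owns the napi, while a separate SCHED_THREADED bit can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit numbers are illustrative; only the names echo the patch. */
enum {
	MODEL_STATE_SCHED = 0,          /* someone owns the napi */
	MODEL_STATE_SCHED_THREADED = 1, /* the owner is the kthread */
};

struct model_napi {
	atomic_ulong state;
};

static bool model_test_bit(int nr, struct model_napi *n)
{
	return (atomic_load(&n->state) & (1UL << nr)) != 0;
}

/* Busy poll or napi_disable() style grab: take SCHED only.
 * Returns true if the caller became the owner.
 */
static bool model_try_take_sched(struct model_napi *n)
{
	unsigned long old = atomic_fetch_or(&n->state, 1UL << MODEL_STATE_SCHED);

	return !(old & (1UL << MODEL_STATE_SCHED));
}

/* Threaded scheduling: take SCHED and also mark that the kthread owns it. */
static bool model_schedule_threaded(struct model_napi *n)
{
	if (!model_try_take_sched(n))
		return false;
	atomic_fetch_or(&n->state, 1UL << MODEL_STATE_SCHED_THREADED);
	return true;
}

/* Completion clears both bits, as napi_complete_done() does in the patch. */
static void model_complete(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~((1UL << MODEL_STATE_SCHED) |
				      (1UL << MODEL_STATE_SCHED_THREADED)));
}

/* Check used before the patch: SCHED set means "poll". */
static bool kthread_may_poll_old(struct model_napi *n)
{
	return model_test_bit(MODEL_STATE_SCHED, n);
}

/* Check used after the patch (simplified): SCHED_THREADED set means "poll". */
static bool kthread_may_poll_new(struct model_napi *n)
{
	return model_test_bit(MODEL_STATE_SCHED_THREADED, n);
}

/* Walk through the two ownership cases and print both checks. */
static void model_demo(void)
{
	struct model_napi n = { 0 };

	/* A busy poll thread grabs SCHED after a previous completion. */
	assert(model_try_take_sched(&n));
	printf("busy poll owns napi: old check=%d new check=%d\n",
	       kthread_may_poll_old(&n), kthread_may_poll_new(&n));

	model_complete(&n);

	/* The napi is scheduled for the kthread. */
	assert(model_schedule_threaded(&n));
	printf("kthread owns napi:   old check=%d new check=%d\n",
	       kthread_may_poll_old(&n), kthread_may_poll_new(&n));
}
```

Calling `model_demo()` from a `main()` prints both checks for each case. In the first case the old check would let the kthread poll a napi that busy poll owns, which is the race the patch closes.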
On Thu, Sep 23, 2021 at 11:18 PM Martin Zaharinov <micron10@gmail.com> wrote: > > Hi Wei > > I think we discussed it somewhere here that it should be enabled: > > cat /proc/sys/net/core/busy_poll - 50 > cat /proc/sys/net/core/busy_read - 50 > > and one more: > > " packet receipt: > > high-latency > interrupt-based -------------------> poll-based > > Busy polling helps reduce latency in the network receive path by > > • allowing socket layer code to poll the receive queue of a network device, > • and disabling network interrupts. > This eliminates > > • delays caused by the interrupts > • and the resultant context switches > However, it > > • increases CPU utilization. > • It also prevents the CPU from sleeping, which can incur additional power consumption. > Busy polling is disabled by default. > > Set net.core.busy_poll to a value other than 0 to enable it. > > This parameter controls the number of microseconds to wait for packets on the device queue for socket poll and select. Red Hat recommends a value of 50. > > Add the SO_BUSY_POLL socket option to the socket. " > > > > Do you think it comes from this? Yes. I understand that you enabled busy polling. I was just wondering why you need to enable busy polling + thread polling. Could you help me confirm, on the receive side, whether the application uses either of the following socket options: - SO_BUSY_POLL_BUDGET - SO_PREFER_BUSY_POLL And what kernel version are you using? > > Martin > > > On 24 Sep 2021, at 3:54, Wei Wang <weiwan@google.com> wrote: > > > > Hi Martin, > > > > It looks like there might still be a race between kthread polling and > > busy polling. I am looking into the code but was not able to identify > > the cause. > > May I ask why you need to enable both at the same time? > > > > Thanks. > > Wei > > > > > > On Thu, Sep 23, 2021 at 1:31 PM Martin Zaharinov <micron10@gmail.com> wrote: > >> > >> Hey Wei > >> > >> If you find any fix for this, write to me so I can test it.
> >> > >> kthread is a very good solution for a heavily loaded network server, but we need to find where this bug comes from. > >> > >> > >> Martin > >> > >>> On 22 Sep 2021, at 17:12, Martin Zaharinov <micron10@gmail.com> wrote: > >>> > >>> Hi Wei > >>> > >>> One more bug report from the last few hours: > >>> > >>> > >>> > >>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ > >>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158 > >>> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 > >>> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > >>> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 > >>> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > >>> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 > >>> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 > >>> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 > >>> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 > >>> Sep 9 12:49:32
[829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff > >>> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff > >>> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 > >>> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 > >>> Sep 9 12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >>> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 > >>> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: > >>> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 > >>> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 > >>> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > >>> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > >>> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 > >>> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 > >>> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 > >>> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 > >>> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 > >>> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > >>> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 > >>> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > >>> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- > >>> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 > >>> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode > >>> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page > >>> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 > >>> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI > >>> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 > >>> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > >>> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > >>> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > >>> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > >>> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > >>> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > >>> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > >>> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > >>> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > >>> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >>> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> Sep 9 12:49:33 
[829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 > >>> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: > >>> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 > >>> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 > >>> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 > >>> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > >>> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > >>> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 > >>> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 > >>> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 > >>> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 > >>> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 > >>> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > >>> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 > >>> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > >>> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > >>> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 > >>> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- > >>> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > >>> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > >>> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > >>> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > >>> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > >>> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > >>> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > >>> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > >>> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >>> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> Sep 9 12:49:34 [829556.903908][ T2925] CR2: 
00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 > >>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt > >>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > >>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. > >>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. > >>> > >>>> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote: > >>>> > >>>> Thanks Martin for the report. > >>>> Without a reproducer, it might be hard to debug. I will double check > >>>> the code to check for potential race between kthread poll and busy > >>>> poll. > >>>> > >>>> Thanks. > >>>> Wei > >>>> > >>>> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote: > >>>>> > >>>>> Hi Wei > >>>>> Please see this bug log : > >>>>> > >>>>> > >>>>> Sep 15 08:04:56 [2034411.548669][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>> Sep 15 08:04:56 [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0 > >>>>> Sep 15 08:04:56 [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>> Sep 15 08:04:56 [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>> Sep 15 08:04:56 [2034411.725536][ T3195] Call Trace: > >>>>> Sep 15 08:04:56 [2034411.749948][ T3195] netif_receive_skb_list_internal+0x25c/0x2b0 > >>>>> Sep 15 08:04:56 [2034411.774579][ T3195] gro_normal_one+0x6e/0x90 > >>>>> Sep 15 08:04:56 [2034411.798786][ T3195] napi_gro_flush+0xb1/0x100 > >>>>> Sep 15 08:04:56 [2034411.822410][ T3195] napi_complete_done+0x107/0x180 > >>>>> 
Sep 15 08:04:56 [2034411.845614][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] > >>>>> Sep 15 08:04:56 [2034411.868480][ T3195] __napi_poll+0x1f/0x100 > >>>>> Sep 15 08:04:56 [2034411.890899][ T3195] ? __napi_poll+0x100/0x100 > >>>>> Sep 15 08:04:56 [2034411.912799][ T3195] napi_threaded_poll+0x105/0x150 > >>>>> Sep 15 08:04:56 [2034411.934567][ T3195] kthread+0x101/0x120 > >>>>> Sep 15 08:04:56 [2034411.955873][ T3195] ? set_kthread_struct+0x30/0x30 > >>>>> Sep 15 08:04:56 [2034411.977157][ T3195] ret_from_fork+0x1f/0x30 > >>>>> Sep 15 08:04:56 [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]--- > >>>>> Sep 15 08:04:56 [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000 > >>>>> Sep 15 08:04:56 [2034412.058658][ T3195] #PF: supervisor read access in kernel mode > >>>>> Sep 15 08:04:56 [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page > >>>>> Sep 15 08:04:56 [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0 > >>>>> Sep 15 08:04:56 [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI > >>>>> Sep 15 08:04:56 [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S W O 5.13.12 #1 > >>>>> Sep 15 08:04:56 [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > >>>>> Sep 15 08:04:56 [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 > >>>>> Sep 15 08:04:56 [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 > >>>>> Sep 15 08:04:56 [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 > >>>>> Sep 15 08:04:56 [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 > >>>>> Sep 15 08:04:56 [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 > >>>>> Sep 15 
08:04:56 [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff > >>>>> Sep 15 08:04:56 [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 > >>>>> Sep 15 08:04:57 [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 > >>>>> Sep 15 08:04:57 [2034412.507493][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 > >>>>> Sep 15 08:04:57 [2034412.553528][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>> Sep 15 08:04:57 [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 > >>>>> Sep 15 08:04:57 [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>> Sep 15 08:04:57 [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>> Sep 15 08:04:57 [2034412.721656][ T3195] Call Trace: > >>>>> Sep 15 08:04:57 [2034412.746016][ T3195] gro_normal_one+0x6e/0x90 > >>>>> Sep 15 08:04:57 [2034412.770321][ T3195] napi_gro_flush+0xb1/0x100 > >>>>> Sep 15 08:04:57 [2034412.794137][ T3195] napi_complete_done+0x107/0x180 > >>>>> Sep 15 08:04:57 [2034412.817556][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] > >>>>> Sep 15 08:04:57 [2034412.840522][ T3195] __napi_poll+0x1f/0x100 > >>>>> Sep 15 08:04:57 [2034412.862829][ T3195] ? __napi_poll+0x100/0x100 > >>>>> Sep 15 08:04:57 [2034412.884804][ T3195] napi_threaded_poll+0x105/0x150 > >>>>> Sep 15 08:04:57 [2034412.906305][ T3195] kthread+0x101/0x120 > >>>>> Sep 15 08:04:57 [2034412.927502][ T3195] ? 
set_kthread_struct+0x30/0x30 > >>>>> Sep 15 08:04:57 [2034412.948434][ T3195] ret_from_fork+0x1f/0x30 > >>>>> Sep 15 08:04:57 [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC] > >>>>> Sep 15 08:04:57 [2034413.136792][ T3195] CR2: 0000000000000000 > >>>>> Sep 15 08:04:57 [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]--- > >>>>> Sep 15 08:04:57 [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 > >>>>> Sep 15 08:04:57 [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 > >>>>> Sep 15 08:04:57 [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 > >>>>> Sep 15 08:04:57 [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 > >>>>> Sep 15 08:04:57 [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 > >>>>> Sep 15 08:04:57 [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff > >>>>> Sep 15 08:04:57 [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 > >>>>> Sep 15 08:04:58 [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 > >>>>> Sep 15 08:04:58 [2034413.487558][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 > >>>>> 
Sep 15 08:04:58 [2034413.535263][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>> Sep 15 08:04:58 [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 > >>>>> Sep 15 08:04:58 [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>> Sep 15 08:04:58 [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>> Sep 15 08:04:58 [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt > >>>>> Sep 15 08:04:58 [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > >>>>> Sep 15 08:04:58 [2034413.906445][ T3195] Rebooting in 10 seconds.. > >>>>> Sep 15 08:05:08 [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG. > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote: > >>>>>> > >>>>>> Hi Martin, > >>>>>> > >>>>>> Is there a reproducer for this? What kind of traffic is it running? > >>>>>> What is the following config: > >>>>>> cat /proc/sys/net/core/busy_poll > >>>>>> cat /proc/sys/net/core/busy_read > >>>>>> cat /sys/class/net/<ixgbe_dev>/threaded > >>>>>> And is SO_PREFER_BUSY_POLL used? > >>>>>> > >>>>>> Thanks. > >>>>>> Wei > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote: > >>>>>>> > >>>>>>> Hi Eric and Wei > >>>>>>> > >>>>>>> Please see this bug report from last hour , > >>>>>>> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up > >>>>>>> Uptime before crash : 10day > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ > >>>>>>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. 
next->prev should be ffff9651016c0b00, but was ffff96511a87c158 > >>>>>>> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 > >>>>>>> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > >>>>>>> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 > >>>>>>> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > >>>>>>> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 > >>>>>>> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 > >>>>>>> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 > >>>>>>> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 > >>>>>>> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff > >>>>>>> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff > >>>>>>> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 > >>>>>>> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: 
ffff9651016c0b00 > >>>>>>> Sep 9 12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >>>>>>> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>>>> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 > >>>>>>> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>>> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>>>> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: > >>>>>>> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 > >>>>>>> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 > >>>>>>> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > >>>>>>> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > >>>>>>> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 > >>>>>>> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 > >>>>>>> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 > >>>>>>> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 > >>>>>>> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 > >>>>>>> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > >>>>>>> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 > >>>>>>> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > >>>>>>> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- > >>>>>>> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 > >>>>>>> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode > >>>>>>> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page > >>>>>>> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 > >>>>>>> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI > >>>>>>> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 > >>>>>>> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 > >>>>>>> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > >>>>>>> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > >>>>>>> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > >>>>>>> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > >>>>>>> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > >>>>>>> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > >>>>>>> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > >>>>>>> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > >>>>>>> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >>>>>>> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 
DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>>>> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 > >>>>>>> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>>> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>>>> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: > >>>>>>> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 > >>>>>>> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 > >>>>>>> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 > >>>>>>> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 > >>>>>>> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] > >>>>>>> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 > >>>>>>> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 > >>>>>>> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 > >>>>>>> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 > >>>>>>> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 > >>>>>>> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 > >>>>>>> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 > >>>>>>> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae > >>>>>>> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] > >>>>>>> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 > >>>>>>> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- > >>>>>>> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 > >>>>>>> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 > >>>>>>> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 > >>>>>>> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 > >>>>>>> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 > >>>>>>> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 > >>>>>>> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 > >>>>>>> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 > >>>>>>> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 > >>>>>>> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
> >>>>>>> Sep 9 12:49:34 [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0
> >>>>>>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt
> >>>>>>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> >>>>>>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds..
> >>>>>>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG.
> >>>>>>>
> >>>>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote:
> >>>>>>>>
> >>>>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Eric and Wei
> >>>>>>>>>
> >>>>>>>>> Please check this log:
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Please send a normal report to netdev.
> >>>>>>>>
> >>>>>>>> This has nothing to do with us (Eric & Wei)
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>> > >>>>>>>>> > >>>>>>>>> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) > >>>>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 > >>>>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 > >>>>>>>>> [1584289.107263] Call Trace: > >>>>>>>>> [1584289.107266] dump_stack+0x58/0x6b > >>>>>>>>> [1584289.209562] warn_alloc.cold+0x70/0xd4 > >>>>>>>>> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 > >>>>>>>>> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 > >>>>>>>>> [1584289.474009] allocate_slab+0x272/0x450 > >>>>>>>>> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 > >>>>>>>>> [1584289.519147] kmem_cache_alloc+0x110/0x120 > >>>>>>>>> [1584289.541416] build_skb+0x1a/0x200 > >>>>>>>>> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] > >>>>>>>>> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] > >>>>>>>>> [1584289.605528] __napi_poll+0x1f/0x130 > >>>>>>>>> [1584289.625842] napi_threaded_poll+0x110/0x160 > >>>>>>>>> [1584289.646110] ? __napi_poll+0x130/0x130 > >>>>>>>>> [1584289.665810] kthread+0xea/0x120 > >>>>>>>>> [1584289.684836] ? 
kthread_park+0x80/0x80 > >>>>>>>>> [1584289.703440] ret_from_fork+0x1f/0x30 > >>>>>>>>> [1584289.721616] Mem-Info: > >>>>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 > >>>>>>>>> active_file:17408 inactive_file:149 isolated_file:32 > >>>>>>>>> unevictable:1440359 dirty:17500 writeback:0 > >>>>>>>>> slab_reclaimable:43368 slab_unreclaimable:155124 > >>>>>>>>> mapped:817431 shmem:7650 pagetables:32093 bounce:0 > >>>>>>>>> free:17832 free_pcp:113 free_cma:0 > >>>>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no > >>>>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > >>>>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 > >>>>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB > >>>>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 > >>>>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB > >>>>>>>>> [1584290.237051] lowmem_reserve[]: 0 0 0 0 > 
>>>>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB > >>>>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB > >>>>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB > >>>>>>>>> [1584290.409087] 1465768 total pagecache pages > >>>>>>>>> [1584290.434531] 4165289 pages RAM > >>>>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly > >>>>>>>>> [1584290.484480] 104766 pages reserved > >>>>>>>>> [1584290.508709] 0 pages hwpoisoned > >>>>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105) > >>>>>>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 > >>>>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 > >>>>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 > >>>>>>>>> [1584302.776532] Call Trace: > >>>>>>>>> [1584302.799361] dump_stack+0x58/0x6b > >>>>>>>>> [1584302.821791] dump_header+0x4c/0x2e6 > >>>>>>>>> [1584302.843580] oom_kill_process.cold+0xb/0x10 > >>>>>>>>> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 > >>>>>>>>> [1584302.886641] out_of_memory+0x54/0xa0 > >>>>>>>>> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 > >>>>>>>>> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 > >>>>>>>>> [1584302.947874] __get_free_pages+0x8/0x30 > >>>>>>>>> [1584302.967246] pgd_alloc+0x21/0x180 > >>>>>>>>> [1584302.986355] mm_alloc+0x1af/0x250 > >>>>>>>>> [1584303.005085] alloc_bprm+0x80/0x2a0 > >>>>>>>>> [1584303.023328] do_execveat_common+0x8b/0x330 > >>>>>>>>> [1584303.041181] __x64_sys_execve+0x2b/0x40 > >>>>>>>>> [1584303.058513] do_syscall_64+0x2d/0x40 > >>>>>>>>> [1584303.075281] 
entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>>>>>> [1584303.091891] RIP: 0033:0x488376 > >>>>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 > >>>>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b > >>>>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 > >>>>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 > >>>>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 > >>>>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 > >>>>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 > >>>>>>>>> [1584303.379094] Mem-Info: > >>>>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0 > >>>>>>>>> active_file:12975 inactive_file:168 isolated_file:32 > >>>>>>>>> unevictable:909709 dirty:12864 writeback:10 > >>>>>>>>> slab_reclaimable:42415 slab_unreclaimable:154783 > >>>>>>>>> mapped:39825 shmem:14744 pagetables:26041 bounce:0 > >>>>>>>>> free:537002 free_pcp:1813 free_cma:0 > >>>>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? 
no > >>>>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > >>>>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 > >>>>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB > >>>>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 > >>>>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB > >>>>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0 > >>>>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB > >>>>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB > >>>>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB > >>>>>>>>> [1584304.287094] 933871 total pagecache pages > >>>>>>>>> [1584304.312815] 4165289 pages RAM > >>>>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly > >>>>>>>>> [1584304.362522] 104766 pages reserved > >>>>>>>>> 
[1584304.386516] 0 pages hwpoisoned > >>>>>>>>> > >>>>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote: > >>>>>>>>>> > >>>>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Hi Wei > >>>>>>>>>>> Check this: > >>>>>>>>>>> > >>>>>>>>>>> [ 39.706567] ------------[ cut here ]------------ > >>>>>>>>>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) > >>>>>>>>>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 > >>>>>>>>>> > >>>>>>>>>> Probably more relevant to Intel maintainers than Wei :/ > >>>>>>>>>> > >>>>>>>>>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas > >>>>>>>>>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 > >>>>>>>>>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 > >>>>>>>>>>> [ 39.706619] Workqueue: events work_for_cpu_fn > >>>>>>>>>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 > >>>>>>>>>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 > >>>>>>>>>>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 > >>>>>>>>>>> [ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff > >>>>>>>>>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 > >>>>>>>>>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea > >>>>>>>>>>> [ 39.706643] R10: 00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 > >>>>>>>>>>> [ 39.706645] R13: 0000000000000000 
R14: ffff8e024a88ba00 R15: 0000000000000000 > >>>>>>>>>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 > >>>>>>>>>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>>>>>>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0 > >>>>>>>>>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >>>>>>>>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > >>>>>>>>>>> [ 39.706656] Call Trace: > >>>>>>>>>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] > >>>>>>>>>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] > >>>>>>>>>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 > >>>>>>>>>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 > >>>>>>>>>>> [ 39.706716] ? __kmalloc+0x37/0x160 > >>>>>>>>>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 > >>>>>>>>>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 > >>>>>>>>>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 > >>>>>>>>>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 > >>>>>>>>>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 > >>>>>>>>>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 > >>>>>>>>>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 > >>>>>>>>>>> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 > >>>>>>>>>>> [ 39.706746] local_pci_probe+0x1b/0x40 > >>>>>>>>>>> [ 39.706750] work_for_cpu_fn+0xb/0x20 > >>>>>>>>>>> [ 39.706754] process_one_work+0x1ec/0x350 > >>>>>>>>>>> [ 39.706758] worker_thread+0x24b/0x4d0 > >>>>>>>>>>> [ 39.706760] ? process_one_work+0x350/0x350 > >>>>>>>>>>> [ 39.706762] kthread+0xea/0x120 > >>>>>>>>>>> [ 39.706766] ? 
kthread_park+0x80/0x80 > >>>>>>>>>>> [ 39.706770] ret_from_fork+0x1f/0x30 > >>>>>>>>>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- > >>>>>>>>>>> > >>>>>>>>>>> Martin > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to > >>>>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on > >>>>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the > >>>>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll() > >>>>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll > >>>>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well. > >>>>>>>>>>>> This patch tries to fix this race by adding a new bit > >>>>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in > >>>>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared > >>>>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this > >>>>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between > >>>>>>>>>>>> kthread and other scenarios and fixes the race issue. > >>>>>>>>>>>> > >>>>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") > >>>>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com> > >>>>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org> > >>>>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> > >>>>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com> > >>>>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com> > >>>>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com> > >>>>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> > >>>>>>>>>>>> --- > >>>>>>>>>>>> Change since v3: > >>>>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in > >>>>>>>>>>>> ____napi_schedule(). 
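The ownership problem described in the commit message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `struct model_napi`, the `MODEL_*` bits and every `model_*` helper are hypothetical names invented for illustration, C11 atomics stand in for the kernel bit operations, and setting the "threaded" bit is folded into the same step as taking the SCHED bit, which the real patch does separately and conditionally in ____napi_schedule(). The point it shows is that a check on the SCHED bit alone cannot tell who owns the napi, while a separate "scheduled in threaded mode" bit can.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NAPI_STATE_SCHED and NAPI_STATE_SCHED_THREADED. */
#define MODEL_SCHED          0x1u	/* somebody owns the napi */
#define MODEL_SCHED_THREADED 0x2u	/* the owner is the napi kthread */

struct model_napi {
	atomic_uint state;
};

/* Take ownership unless the SCHED bit is already held by someone else. */
static inline bool model_try_own(struct model_napi *n, unsigned int extra)
{
	unsigned int old = atomic_load(&n->state);

	do {
		if (old & MODEL_SCHED)
			return false;
	} while (!atomic_compare_exchange_weak(&n->state, &old,
					       old | MODEL_SCHED | extra));
	return true;
}

/* Scheduling for the kthread also marks the kthread as the owner. */
static inline bool model_schedule_threaded(struct model_napi *n)
{
	return model_try_own(n, MODEL_SCHED_THREADED);
}

/* A busy-poll thread takes only the SCHED bit. */
static inline bool model_busy_poll_grab(struct model_napi *n)
{
	return model_try_own(n, 0);
}

/* Completion releases ownership and clears both bits. */
static inline void model_complete_done(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Check modelled on the old behaviour: any owner looks like the kthread. */
static inline bool model_kthread_old_check(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Check modelled on the new behaviour: only a threaded schedule counts. */
static inline bool model_kthread_new_check(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}
```

In this model, after a busy-poll thread grabs the napi, the old-style check still reports that the kthread may poll, which corresponds to the race in the report, while the new-style check does not.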
> >>>>>>>>>>>> > >>>>>>>>>>>> include/linux/netdevice.h | 2 ++ > >>>>>>>>>>>> net/core/dev.c | 19 ++++++++++++++++++- > >>>>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-) > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > >>>>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644 > >>>>>>>>>>>> --- a/include/linux/netdevice.h > >>>>>>>>>>>> +++ b/include/linux/netdevice.h > >>>>>>>>>>>> @@ -360,6 +360,7 @@ enum { > >>>>>>>>>>>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ > >>>>>>>>>>>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ > >>>>>>>>>>>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ > >>>>>>>>>>>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ > >>>>>>>>>>>> }; > >>>>>>>>>>>> > >>>>>>>>>>>> enum { > >>>>>>>>>>>> @@ -372,6 +373,7 @@ enum { > >>>>>>>>>>>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), > >>>>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), > >>>>>>>>>>>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), > >>>>>>>>>>>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), > >>>>>>>>>>>> }; > >>>>>>>>>>>> > >>>>>>>>>>>> enum gro_result { > >>>>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c > >>>>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644 > >>>>>>>>>>>> --- a/net/core/dev.c > >>>>>>>>>>>> +++ b/net/core/dev.c > >>>>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, > >>>>>>>>>>>> */ > >>>>>>>>>>>> thread = READ_ONCE(napi->thread); > >>>>>>>>>>>> if (thread) { > >>>>>>>>>>>> + /* Avoid doing set_bit() if the thread is in > >>>>>>>>>>>> + * INTERRUPTIBLE state, cause napi_thread_wait() > >>>>>>>>>>>> + * makes sure to proceed with napi polling > >>>>>>>>>>>> + * if the thread is explicitly woken from here. 
> >>>>>>>>>>>> + */ > >>>>>>>>>>>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) > >>>>>>>>>>>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); > >>>>>>>>>>>> wake_up_process(thread); > >>>>>>>>>>>> return; > >>>>>>>>>>>> } > >>>>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) > >>>>>>>>>>>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); > >>>>>>>>>>>> > >>>>>>>>>>>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | > >>>>>>>>>>>> + NAPIF_STATE_SCHED_THREADED | > >>>>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL); > >>>>>>>>>>>> > >>>>>>>>>>>> /* If STATE_MISSED was set, leave STATE_SCHED set, > >>>>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) > >>>>>>>>>>>> > >>>>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi) > >>>>>>>>>>>> { > >>>>>>>>>>>> + bool woken = false; > >>>>>>>>>>>> + > >>>>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); > >>>>>>>>>>>> > >>>>>>>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { > >>>>>>>>>>>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { > >>>>>>>>>>>> + /* Testing SCHED_THREADED bit here to make sure the current > >>>>>>>>>>>> + * kthread owns this napi and could poll on this napi. > >>>>>>>>>>>> + * Testing SCHED bit is not enough because SCHED bit might be > >>>>>>>>>>>> + * set by some other busy poll thread or by napi_disable(). > >>>>>>>>>>>> + */ > >>>>>>>>>>>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { > >>>>>>>>>>>> WARN_ON(!list_empty(&napi->poll_list)); > >>>>>>>>>>>> __set_current_state(TASK_RUNNING); > >>>>>>>>>>>> return 0; > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> schedule(); > >>>>>>>>>>>> + /* woken being true indicates this thread owns this napi. 
*/ > >>>>>>>>>>>> + woken = true; > >>>>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); > >>>>>>>>>>>> } > >>>>>>>>>>>> __set_current_state(TASK_RUNNING); > >>>>>>>>>>>> -- > >>>>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>> > >>> > >> >
On Fri, Sep 24, 2021 at 9:50 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi Wei
>
> App is pppd server
> and not use
> - SO_BUSY_POLL_BUDGET
> - SO_PREFER_BUSY_POLL
> kernel with crash is 5.13.13

I see. Thanks.

>
> i move to 5.14.7 but disable kthread until I talk with you.
>
> I check and intel card 82599 not use busy poll .
>
> And may be not need to use busy poll.
>
> If not enable busypoll is will go to same crash.

Just to confirm, are you saying that if you disable busypoll, the issue
no longer happens?

> and may be need to add documents that when is enable kthread need to by disable busypoll

In theory, it should still work. But there is probably some race hidden
somewhere. If it is OK for your workload to disable busy poll, maybe
it's better to do that for now. I will dig a bit more and see what I can
find.

>
>
> Martin
>
>
> On Fri, Sep 24, 2021, 19:42 Wei Wang <weiwan@google.com> wrote:
>>
>> On Thu, Sep 23, 2021 at 11:18 PM Martin Zaharinov <micron10@gmail.com> wrote:
>> >
>> > Hi Wei
>> >
>> > I think we discussed it somewhere here that it should be enabled:
>> >
>> > cat /proc/sys/net/core/busy_poll - 50
>> > cat /proc/sys/net/core/busy_read - 50
>> >
>> > and one more :
>> >
>> > “ packet receipt:
>> >
>> > high-latency
>> > interrupt-based -------------------> poll-based
>> >
>> > Busy polling helps reduce latency in the network receive path by
>> >
>> > • allowing socket layer code to poll the receive queue of a network device,
>> > • and disable network interrupts.
>> > This eliminates
>> >
>> > • delays caused by the interrupts
>> > • and the resultant context switches
>> > However, it
>> >
>> > • increases CPU utilization.
>> > • Also prevent the CPU from sleeping, which can incur additional power consumption.
>> > Busy polling is disabled by default.
>> >
>> > Set net.core.busy_poll to a value other than 0 to enable it.
>> >
>> > This parameter controls the number of microseconds to wait for packets on the device queue for socket poll and selects. Red Hat recommends a value of 50.
>> >
>> > Add the SO_BUSY_POLL socket option to the socket. "
>> >
>> >
>> >
>> > do you think it comes from him?
>>
>> Yes. I understand that you enabled busy polling. I was just wondering
>> why you need to enable busy polling + thread polling.
>> Could you help me confirm on the receive side, if the application uses
>> either of the following socket options:
>> - SO_BUSY_POLL_BUDGET
>> - SO_PREFER_BUSY_POLL
>>
>> And what kernel version are you using?
>>
>> >
>> > Martin
>> >
>> > > On 24 Sep 2021, at 3:54, Wei Wang <weiwan@google.com> wrote:
>> > >
>> > > Hi Martin,
>> > >
>> > > It looks like there might still be a race between kthread polling and
>> > > busy polling. I am looking into the code but was not able to identify
>> > > the cause.
>> > > May I ask why you need to enable both at the same time?
>> > >
>> > > Thanks.
>> > > Wei
>> > >
>> > >
>> > > On Thu, Sep 23, 2021 at 1:31 PM Martin Zaharinov <micron10@gmail.com> wrote:
>> > >>
>> > >> Hey Wei
>> > >>
>> > >> If you find any fix for this write me to test .
>> > >>
>> > >> kthread is a very good solution for network load server but need to find from where is come this bug .
>> > >>
>> > >>
>> > >> Martin
>> > >>
>> > >>> On 22 Sep 2021, at 17:12, Martin Zaharinov <micron10@gmail.com> wrote:
>> > >>>
>> > >>> Hi Wei
>> > >>>
>> > >>> One more bug report from last hours:
>> > >>>
>> > >>>
>> > >>>
>> > >>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------
>> > >>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption.
next->prev should be ffff9651016c0b00, but was ffff96511a87c158 >> > >>> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 >> > >>> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >> > >>> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 >> > >>> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >> > >>> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 >> > >>> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 >> > >>> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 0018:ffffa90ec1affd00 EFLAGS: 00010286 >> > >>> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 >> > >>> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff >> > >>> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff >> > >>> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 >> > >>> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 >> > 
>>> Sep 9 12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> > >>> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > >>> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 >> > >>> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> > >>> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> > >>> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: >> > >>> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 >> > >>> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 >> > >>> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >> > >>> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >> > >>> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 >> > >>> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 >> > >>> Sep 9 12:49:32 [829555.140048][ T2925] ? __napi_poll+0x1f/0x100 >> > >>> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 >> > >>> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 >> > >>> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >> > >>> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 >> > >>> Sep 9 12:49:32 [829555.248592][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >> > >>> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- >> > >>> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 >> > >>> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode >> > >>> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page >> > >>> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 >> > >>> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI >> > >>> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 >> > >>> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >> > >>> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >> > >>> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >> > >>> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >> > >>> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >> > >>> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >> > >>> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >> > >>> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >> > >>> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >> > >>> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> > >>> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 
CR0: 0000000080050033 >> > >>> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >> > >>> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> > >>> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> > >>> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: >> > >>> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 >> > >>> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 >> > >>> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 >> > >>> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >> > >>> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >> > >>> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 >> > >>> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 >> > >>> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 >> > >>> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 >> > >>> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 >> > >>> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >> > >>> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 >> > >>> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >> > >>> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >> > >>> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 >> > >>> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- >> > >>> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >> > >>> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >> > >>> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >> > >>> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >> > >>> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >> > >>> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >> > >>> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >> > >>> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >> > >>> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> > >>> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > >>> Sep 
9 12:49:34 [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >> > >>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> > >>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> > >>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt >> > >>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >> > >>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. >> > >>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. >> > >>> >> > >>>> On 15 Sep 2021, at 18:45, Wei Wang <weiwan@google.com> wrote: >> > >>>> >> > >>>> Thanks Martin for the report. >> > >>>> Without a reproducer, it might be hard to debug. I will double check >> > >>>> the code to check for potential race between kthread poll and busy >> > >>>> poll. >> > >>>> >> > >>>> Thanks. 
>> > >>>> Wei >> > >>>> >> > >>>> On Wed, Sep 15, 2021 at 7:22 AM Martin Zaharinov <micron10@gmail.com> wrote: >> > >>>>> >> > >>>>> Hi Wei >> > >>>>> Please see this bug log : >> > >>>>> >> > >>>>> >> > >>>>> Sep 15 08:04:56 [2034411.548669][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > >>>>> Sep 15 08:04:56 [2034411.574642][ T3195] CR2: 00007f1e8cf58000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >> > >>>>> Sep 15 08:04:56 [2034411.625156][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> > >>>>> Sep 15 08:04:56 [2034411.675495][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> > >>>>> Sep 15 08:04:56 [2034411.725536][ T3195] Call Trace: >> > >>>>> Sep 15 08:04:56 [2034411.749948][ T3195] netif_receive_skb_list_internal+0x25c/0x2b0 >> > >>>>> Sep 15 08:04:56 [2034411.774579][ T3195] gro_normal_one+0x6e/0x90 >> > >>>>> Sep 15 08:04:56 [2034411.798786][ T3195] napi_gro_flush+0xb1/0x100 >> > >>>>> Sep 15 08:04:56 [2034411.822410][ T3195] napi_complete_done+0x107/0x180 >> > >>>>> Sep 15 08:04:56 [2034411.845614][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] >> > >>>>> Sep 15 08:04:56 [2034411.868480][ T3195] __napi_poll+0x1f/0x100 >> > >>>>> Sep 15 08:04:56 [2034411.890899][ T3195] ? __napi_poll+0x100/0x100 >> > >>>>> Sep 15 08:04:56 [2034411.912799][ T3195] napi_threaded_poll+0x105/0x150 >> > >>>>> Sep 15 08:04:56 [2034411.934567][ T3195] kthread+0x101/0x120 >> > >>>>> Sep 15 08:04:56 [2034411.955873][ T3195] ? 
set_kthread_struct+0x30/0x30 >> > >>>>> Sep 15 08:04:56 [2034411.977157][ T3195] ret_from_fork+0x1f/0x30 >> > >>>>> Sep 15 08:04:56 [2034411.997922][ T3195] ---[ end trace 83b8d17d2762bc73 ]--- >> > >>>>> Sep 15 08:04:56 [2034412.018439][ T3195] BUG: kernel NULL pointer dereference, address: 0000000000000000 >> > >>>>> Sep 15 08:04:56 [2034412.058658][ T3195] #PF: supervisor read access in kernel mode >> > >>>>> Sep 15 08:04:56 [2034412.078866][ T3195] #PF: error_code(0x0000) - not-present page >> > >>>>> Sep 15 08:04:56 [2034412.098648][ T3195] PGD 12fabb067 P4D 12fabb067 PUD 17d7fa067 PMD 0 >> > >>>>> Sep 15 08:04:56 [2034412.118230][ T3195] Oops: 0000 [#1] SMP NOPTI >> > >>>>> Sep 15 08:04:56 [2034412.137240][ T3195] CPU: 2 PID: 3195 Comm: napi/eth0-513 Tainted: G S W O 5.13.12 #1 >> > >>>>> Sep 15 08:04:56 [2034412.174538][ T3195] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >> > >>>>> Sep 15 08:04:56 [2034412.211613][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 >> > >>>>> Sep 15 08:04:56 [2034412.230852][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 >> > >>>>> Sep 15 08:04:56 [2034412.287806][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 >> > >>>>> Sep 15 08:04:56 [2034412.306525][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 >> > >>>>> Sep 15 08:04:56 [2034412.343308][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 >> > >>>>> Sep 15 08:04:56 [2034412.381399][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff >> > >>>>> Sep 15 08:04:56 [2034412.421437][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 >> > >>>>> Sep 15 08:04:57 [2034412.463438][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 >> > >>>>> Sep 15 08:04:57 
[2034412.507493][ T3195] FS: 0000000000000000(0000) GS:ffff98b01fa80000(0000) knlGS:0000000000000000 >> > >>>>> Sep 15 08:04:57 [2034412.553528][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > >>>>> Sep 15 08:04:57 [2034412.577424][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >> > >>>>> Sep 15 08:04:57 [2034412.624853][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> > >>>>> Sep 15 08:04:57 [2034412.672633][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> > >>>>> Sep 15 08:04:57 [2034412.721656][ T3195] Call Trace: >> > >>>>> Sep 15 08:04:57 [2034412.746016][ T3195] gro_normal_one+0x6e/0x90 >> > >>>>> Sep 15 08:04:57 [2034412.770321][ T3195] napi_gro_flush+0xb1/0x100 >> > >>>>> Sep 15 08:04:57 [2034412.794137][ T3195] napi_complete_done+0x107/0x180 >> > >>>>> Sep 15 08:04:57 [2034412.817556][ T3195] ixgbe_poll+0x10e/0x240 [ixgbe] >> > >>>>> Sep 15 08:04:57 [2034412.840522][ T3195] __napi_poll+0x1f/0x100 >> > >>>>> Sep 15 08:04:57 [2034412.862829][ T3195] ? __napi_poll+0x100/0x100 >> > >>>>> Sep 15 08:04:57 [2034412.884804][ T3195] napi_threaded_poll+0x105/0x150 >> > >>>>> Sep 15 08:04:57 [2034412.906305][ T3195] kthread+0x101/0x120 >> > >>>>> Sep 15 08:04:57 [2034412.927502][ T3195] ? 
set_kthread_struct+0x30/0x30 >> > >>>>> Sep 15 08:04:57 [2034412.948434][ T3195] ret_from_fork+0x1f/0x30 >> > >>>>> Sep 15 08:04:57 [2034412.969117][ T3195] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_dtvqos(O) xt_TCPMSS xt_comment iptable_mangle xt_MASQUERADE xt_nat iptable_nat ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: DTVIAC] >> > >>>>> Sep 15 08:04:57 [2034413.136792][ T3195] CR2: 0000000000000000 >> > >>>>> Sep 15 08:04:57 [2034413.157289][ T3195] ---[ end trace 83b8d17d2762bc74 ]--- >> > >>>>> Sep 15 08:04:57 [2034413.177534][ T3195] RIP: 0010:netif_receive_skb_list_internal+0x224/0x2b0 >> > >>>>> Sep 15 08:04:57 [2034413.197775][ T3195] Code: 20 ff ff ff 4c 89 74 24 18 4c 89 74 24 20 e9 42 ff ff ff 4c 8b 65 00 49 8b 1c 24 4d 89 e7 4c 39 e5 74 70 4c 8d 74 24 18 eb 15 <48> 8b 03 49 89 df 49 89 dc 48 39 eb 0f 84 94 fe ff ff 48 89 c3 49 >> > >>>>> Sep 15 08:04:57 [2034413.258524][ T3195] RSP: 0018:ffff9de881f47d68 EFLAGS: 00010296 >> > >>>>> Sep 15 08:04:57 [2034413.278591][ T3195] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001 >> > >>>>> Sep 15 08:04:57 [2034413.317532][ T3195] RDX: ffff98b01fc626f8 RSI: 0000000000000009 RDI: 0000000000000003 >> > >>>>> Sep 15 08:04:57 [2034413.356936][ T3195] RBP: ffff98a8e29d8158 R08: 0000000000000001 R09: 00000000fffbffff >> > >>>>> Sep 15 08:04:57 [2034413.398408][ T3195] R10: ffff98b01d600000 R11: ffff9de881f47bb0 R12: ffff98a9f1352a00 >> > >>>>> Sep 15 08:04:58 [2034413.441949][ T3195] R13: 0000000000000009 R14: ffff9de881f47d80 R15: ffff98a9f1352a00 >> > >>>>> Sep 15 08:04:58 [2034413.487558][ T3195] FS: 0000000000000000(0000) 
GS:ffff98b01fa80000(0000) knlGS:0000000000000000 >> > >>>>> Sep 15 08:04:58 [2034413.535263][ T3195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > >>>>> Sep 15 08:04:58 [2034413.560012][ T3195] CR2: 0000000000000000 CR3: 0000000183cb4003 CR4: 00000000001706e0 >> > >>>>> Sep 15 08:04:58 [2034413.609277][ T3195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> > >>>>> Sep 15 08:04:58 [2034413.658501][ T3195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> > >>>>> Sep 15 08:04:58 [2034413.707535][ T3195] Kernel panic - not syncing: Fatal exception in interrupt >> > >>>>> Sep 15 08:04:58 [2034413.856962][ T3195] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >> > >>>>> Sep 15 08:04:58 [2034413.906445][ T3195] Rebooting in 10 seconds.. >> > >>>>> Sep 15 08:05:08 [2034423.930880][ T3195] ACPI MEMORY or I/O RESET_REG. >> > >>>>> >> > >>>>> >> > >>>>> >> > >>>>> >> > >>>>>> On 10 Sep 2021, at 3:30, Wei Wang <weiwan@google.com> wrote: >> > >>>>>> >> > >>>>>> Hi Martin, >> > >>>>>> >> > >>>>>> Is there a reproducer for this? What kind of traffic is it running? >> > >>>>>> What is the following config: >> > >>>>>> cat /proc/sys/net/core/busy_poll >> > >>>>>> cat /proc/sys/net/core/busy_read >> > >>>>>> cat /sys/class/net/<ixgbe_dev>/threaded >> > >>>>>> And is SO_PREFER_BUSY_POLL used? >> > >>>>>> >> > >>>>>> Thanks. 
>> > >>>>>> Wei >> > >>>>>> >> > >>>>>> >> > >>>>>> >> > >>>>>> On Thu, Sep 9, 2021 at 4:18 AM Martin Zaharinov <micron10@gmail.com> wrote: >> > >>>>>>> >> > >>>>>>> Hi Eric and Wei >> > >>>>>>> >> > >>>>>>> Please see this bug report from last hour , >> > >>>>>>> Kernel 5.13.13, Traffic is 7Gb/s down/ 7Gb/s up >> > >>>>>>> Uptime before crash : 10day >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> >> > >>>>>>> Sep 9 12:49:31 [829553.899833][ T2925] ------------[ cut here ]------------ >> > >>>>>>> Sep 9 12:49:31 [829553.927316][ T2925] list_del corruption. next->prev should be ffff9651016c0b00, but was ffff96511a87c158 >> > >>>>>>> Sep 9 12:49:31 [829553.981630][ T2925] WARNING: CPU: 3 PID: 2925 at lib/list_debug.c:54 __list_del_entry_valid+0x8a/0x90 >> > >>>>>>> Sep 9 12:49:31 [829554.035795][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >> > >>>>>>> Sep 9 12:49:31 [829554.254022][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G O 5.13.13 #1 >> > >>>>>>> Sep 9 12:49:31 [829554.307304][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >> > >>>>>>> Sep 9 12:49:31 [829554.360188][ T2925] RIP: 0010:__list_del_entry_valid+0x8a/0x90 >> > >>>>>>> Sep 9 12:49:31 [829554.386671][ T2925] Code: 46 00 0f 0b 31 c0 c3 48 89 f2 48 89 fe 48 c7 c7 40 33 16 9a e8 0b 69 46 00 0f 0b 31 c0 c3 48 c7 c7 80 33 16 9a e8 fa 68 46 00 <0f> 0b 31 c0 c3 cc 89 f8 48 85 d2 74 1f 48 01 f2 0f b6 0e 48 ff c6 >> > >>>>>>> Sep 9 12:49:31 [829554.465378][ T2925] RSP: 
0018:ffffa90ec1affd00 EFLAGS: 00010286 >> > >>>>>>> Sep 9 12:49:31 [829554.491219][ T2925] RAX: 0000000000000054 RBX: ffff9651016c0a00 RCX: 0000000000000001 >> > >>>>>>> Sep 9 12:49:32 [829554.541672][ T2925] RDX: 00000000ffffffea RSI: 00000000fffbffff RDI: 00000000fffbffff >> > >>>>>>> Sep 9 12:49:32 [829554.592016][ T2925] RBP: ffff96511a87c158 R08: 0000000000000001 R09: 00000000fffbffff >> > >>>>>>> Sep 9 12:49:32 [829554.642131][ T2925] R10: ffff96546d600000 R11: ffffa90ec1affb50 R12: ffff9651016c0b00 >> > >>>>>>> Sep 9 12:49:32 [829554.692890][ T2925] R13: 0000000000000008 R14: ffffa90ec1affd20 R15: ffff9651016c0b00 >> > >>>>>>> Sep 9 12:49:32 [829554.744221][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> > >>>>>>> Sep 9 12:49:32 [829554.795701][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > >>>>>>> Sep 9 12:49:32 [829554.821598][ T2925] CR2: 00007f3854c0a000 CR3: 0000000115a7c001 CR4: 00000000001706e0 >> > >>>>>>> Sep 9 12:49:32 [829554.872045][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> > >>>>>>> Sep 9 12:49:32 [829554.922284][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> > >>>>>>> Sep 9 12:49:32 [829554.972250][ T2925] Call Trace: >> > >>>>>>> Sep 9 12:49:32 [829554.996597][ T2925] netif_receive_skb_list_internal+0x25c/0x2b0 >> > >>>>>>> Sep 9 12:49:32 [829555.021270][ T2925] busy_poll_stop+0x113/0x140 >> > >>>>>>> Sep 9 12:49:32 [829555.045679][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >> > >>>>>>> Sep 9 12:49:32 [829555.069833][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >> > >>>>>>> Sep 9 12:49:32 [829555.093659][ T2925] napi_busy_loop+0x212/0x280 >> > >>>>>>> Sep 9 12:49:32 [829555.117046][ T2925] ep_poll+0xba/0x380 >> > >>>>>>> Sep 9 12:49:32 [829555.140048][ T2925] ? 
__napi_poll+0x1f/0x100 >> > >>>>>>> Sep 9 12:49:32 [829555.162477][ T2925] do_epoll_wait+0xa6/0xc0 >> > >>>>>>> Sep 9 12:49:32 [829555.184504][ T2925] do_epoll_pwait.part.0+0x9/0x70 >> > >>>>>>> Sep 9 12:49:32 [829555.206138][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >> > >>>>>>> Sep 9 12:49:32 [829555.227619][ T2925] ? do_syscall_64+0x3a/0x70 >> > >>>>>>> Sep 9 12:49:32 [829555.248592][ T2925] ? entry_SYSCALL_64_after_hwframe+0x44/0xae >> > >>>>>>> Sep 9 12:49:32 [829555.269414][ T2925] ---[ end trace 7792cf332872df55 ]--- >> > >>>>>>> Sep 9 12:49:32 [829555.317238][ T2925] BUG: unable to handle page fault for address: 00000000000496c9 >> > >>>>>>> Sep 9 12:49:32 [829555.357231][ T2925] #PF: supervisor read access in kernel mode >> > >>>>>>> Sep 9 12:49:32 [829555.377314][ T2925] #PF: error_code(0x0000) - not-present page >> > >>>>>>> Sep 9 12:49:32 [829555.396972][ T2925] PGD 12ad0f067 P4D 12ad0f067 PUD 12ad11067 PMD 0 >> > >>>>>>> Sep 9 12:49:32 [829555.416441][ T2925] Oops: 0000 [#1] SMP NOPTI >> > >>>>>>> Sep 9 12:49:32 [829555.435328][ T2925] CPU: 3 PID: 2925 Comm: kresd Tainted: G W O 5.13.13 #1 >> > >>>>>>> Sep 9 12:49:32 [829555.472060][ T2925] Hardware name: Supermicro Super Server/X10SRD-F, BIOS 3.3 10/28/2020 >> > >>>>>>> Sep 9 12:49:32 [829555.508998][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >> > >>>>>>> Sep 9 12:49:33 [829555.527449][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >> > >>>>>>> Sep 9 12:49:33 [829555.582511][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >> > >>>>>>> Sep 9 12:49:33 [829555.600509][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >> > >>>>>>> Sep 9 12:49:33 [829555.636779][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >> > >>>>>>> Sep 9 12:49:33 [829555.674280][ T2925] RBP: ffff96546fae26f8 R08: 
0000000000000001 R09: ffff96519f7f1900 >> > >>>>>>> Sep 9 12:49:33 [829555.713363][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >> > >>>>>>> Sep 9 12:49:33 [829555.754362][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >> > >>>>>>> Sep 9 12:49:33 [829555.797754][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> > >>>>>>> Sep 9 12:49:33 [829555.843229][ T2925] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > >>>>>>> Sep 9 12:49:33 [829555.866726][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >> > >>>>>>> Sep 9 12:49:33 [829555.913285][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> > >>>>>>> Sep 9 12:49:33 [829555.960278][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> > >>>>>>> Sep 9 12:49:33 [829556.008563][ T2925] Call Trace: >> > >>>>>>> Sep 9 12:49:33 [829556.032547][ T2925] ? enqueue_to_backlog+0x81/0x250 >> > >>>>>>> Sep 9 12:49:33 [829556.056686][ T2925] netif_receive_skb_list_internal+0x24d/0x2b0 >> > >>>>>>> Sep 9 12:49:33 [829556.080870][ T2925] busy_poll_stop+0x113/0x140 >> > >>>>>>> Sep 9 12:49:33 [829556.104559][ T2925] ? ep_destroy_wakeup_source+0x20/0x20 >> > >>>>>>> Sep 9 12:49:33 [829556.128028][ T2925] ? ixgbe_clean_rx_irq+0x790/0x790 [ixgbe] >> > >>>>>>> Sep 9 12:49:33 [829556.151405][ T2925] napi_busy_loop+0x212/0x280 >> > >>>>>>> Sep 9 12:49:33 [829556.174478][ T2925] ep_poll+0xba/0x380 >> > >>>>>>> Sep 9 12:49:33 [829556.196887][ T2925] ? __napi_poll+0x1f/0x100 >> > >>>>>>> Sep 9 12:49:33 [829556.219070][ T2925] do_epoll_wait+0xa6/0xc0 >> > >>>>>>> Sep 9 12:49:33 [829556.240778][ T2925] do_epoll_pwait.part.0+0x9/0x70 >> > >>>>>>> Sep 9 12:49:33 [829556.262203][ T2925] __x64_sys_epoll_pwait+0x6a/0x100 >> > >>>>>>> Sep 9 12:49:33 [829556.283188][ T2925] ? do_syscall_64+0x3a/0x70 >> > >>>>>>> Sep 9 12:49:33 [829556.303666][ T2925] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xae >> > >>>>>>> Sep 9 12:49:33 [829556.323995][ T2925] Modules linked in: xsk_diag unix_diag nf_conntrack_netlink nfnetlink vlan_mon(O) pppoe pppox ppp_generic slhc team_mode_loadbalance team xt_MASQUERADE xt_nat iptable_nat xt_TCPMSS xt_comment iptable_mangle ip_tables netconsole coretemp ixgbe mdio mdio_devres libphy nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_xnatlog(O) acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos [last unloaded: iptable_raw] >> > >>>>>>> Sep 9 12:49:33 [829556.487037][ T2925] CR2: 00000000000496c9 >> > >>>>>>> Sep 9 12:49:33 [829556.507006][ T2925] ---[ end trace 7792cf332872df56 ]--- >> > >>>>>>> Sep 9 12:49:33 [829556.526984][ T2925] RIP: 0010:get_rps_cpu+0x1b/0x2f0 >> > >>>>>>> Sep 9 12:49:34 [829556.546524][ T2925] Code: 0f 0b e8 28 e7 1b 00 0f 1f 84 00 00 00 00 00 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 55 48 89 f5 53 48 83 ec 20 0f b7 56 7c <48> 8b 87 e8 02 00 00 66 85 d2 74 1f ff ca 0f b7 ca 44 8b 87 f4 02 >> > >>>>>>> Sep 9 12:49:34 [829556.604787][ T2925] RSP: 0018:ffffa90ec1affcb0 EFLAGS: 00010282 >> > >>>>>>> Sep 9 12:49:34 [829556.623841][ T2925] RAX: ffff9651847be000 RBX: ffff9651847be000 RCX: 0000000000200015 >> > >>>>>>> Sep 9 12:49:34 [829556.662385][ T2925] RDX: 0000000000009654 RSI: ffff96546fae26f8 RDI: 00000000000493e1 >> > >>>>>>> Sep 9 12:49:34 [829556.702267][ T2925] RBP: ffff96546fae26f8 R08: 0000000000000001 R09: ffff96519f7f1900 >> > >>>>>>> Sep 9 12:49:34 [829556.743605][ T2925] R10: 0000000000000000 R11: 00000000000262c8 R12: ffff96546fae26f8 >> > >>>>>>> Sep 9 12:49:34 [829556.786604][ T2925] R13: 00000000000493e1 R14: ffffa90ec1affd08 R15: ffff96546fae26f8 >> > >>>>>>> Sep 9 12:49:34 [829556.831749][ T2925] FS: 00007f38717a5900(0000) GS:ffff96546fac0000(0000) knlGS:0000000000000000 >> > >>>>>>> Sep 9 12:49:34 [829556.879295][ T2925] CS: 0010 
DS: 0000 ES: 0000 CR0: 0000000080050033 >> > >>>>>>> Sep 9 12:49:34 [829556.903908][ T2925] CR2: 00000000000496c9 CR3: 0000000115a7c001 CR4: 00000000001706e0 >> > >>>>>>> Sep 9 12:49:34 [829556.952783][ T2925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> > >>>>>>> Sep 9 12:49:34 [829557.001612][ T2925] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> > >>>>>>> Sep 9 12:49:34 [829557.050193][ T2925] Kernel panic - not syncing: Fatal exception in interrupt >> > >>>>>>> Sep 9 12:49:34 [829557.182948][ T2925] Kernel Offset: 0x18000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >> > >>>>>>> Sep 9 12:49:34 [829557.231174][ T2925] Rebooting in 10 seconds.. >> > >>>>>>> Sep 9 12:49:44 [829567.255206][ T2925] ACPI MEMORY or I/O RESET_REG. >> > >>>>>>> >> > >>>>>>>> On 30 Mar 2021, at 16:39, Eric Dumazet <edumazet@google.com> wrote: >> > >>>>>>>> >> > >>>>>>>> On Tue, Mar 30, 2021 at 11:25 AM Martin Zaharinov <micron10@gmail.com> wrote: >> > >>>>>>>>> >> > >>>>>>>>> Hi Eric and Wei >> > >>>>>>>>> >> > >>>>>>>>> Please check this log : >> > >>>>>>>>> >> > >>>>>>>> >> > >>>>>>>> Please send a normal report to netdev. >> > >>>>>>>> >> > >>>>>>>> This has nothing to to with us (Eric & Wei) >> > >>>>>>>> >> > >>>>>>>> Thanks. 
>> > >>>>>>>> >> > >>>>>>>>> >> > >>>>>>>>> 1584288.951272] napi/eth0-523: page allocation failure: order:0, mode:0x40a20(GFP_ATOMIC|__GFP_COMP), nodemask=(null) >> > >>>>>>>>> [1584289.003674] CPU: 4 PID: 3179 Comm: napi/eth0-523 Tainted: G O 5.11.4 #1 >> > >>>>>>>>> [1584289.055545] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >> > >>>>>>>>> [1584289.107263] Call Trace: >> > >>>>>>>>> [1584289.107266] dump_stack+0x58/0x6b >> > >>>>>>>>> [1584289.209562] warn_alloc.cold+0x70/0xd4 >> > >>>>>>>>> [1584289.209569] __alloc_pages_slowpath.constprop.0+0xd57/0xfb0 >> > >>>>>>>>> [1584289.209574] __alloc_pages_nodemask+0x15a/0x180 >> > >>>>>>>>> [1584289.474009] allocate_slab+0x272/0x450 >> > >>>>>>>>> [1584289.496731] ___slab_alloc.constprop.0+0x41e/0x4d0 >> > >>>>>>>>> [1584289.519147] kmem_cache_alloc+0x110/0x120 >> > >>>>>>>>> [1584289.541416] build_skb+0x1a/0x200 >> > >>>>>>>>> [1584289.563121] ixgbe_clean_rx_irq+0x5fc/0xa10 [ixgbe] >> > >>>>>>>>> [1584289.584618] ixgbe_poll+0xeb/0x2a0 [ixgbe] >> > >>>>>>>>> [1584289.605528] __napi_poll+0x1f/0x130 >> > >>>>>>>>> [1584289.625842] napi_threaded_poll+0x110/0x160 >> > >>>>>>>>> [1584289.646110] ? __napi_poll+0x130/0x130 >> > >>>>>>>>> [1584289.665810] kthread+0xea/0x120 >> > >>>>>>>>> [1584289.684836] ? 
kthread_park+0x80/0x80 >> > >>>>>>>>> [1584289.703440] ret_from_fork+0x1f/0x30 >> > >>>>>>>>> [1584289.721616] Mem-Info: >> > >>>>>>>>> [1584289.739066] active_anon:8157 inactive_anon:2100191 isolated_anon:0 >> > >>>>>>>>> active_file:17408 inactive_file:149 isolated_file:32 >> > >>>>>>>>> unevictable:1440359 dirty:17500 writeback:0 >> > >>>>>>>>> slab_reclaimable:43368 slab_unreclaimable:155124 >> > >>>>>>>>> mapped:817431 shmem:7650 pagetables:32093 bounce:0 >> > >>>>>>>>> free:17832 free_pcp:113 free_cma:0 >> > >>>>>>>>> [1584289.842614] Node 0 active_anon:32628kB inactive_anon:8400764kB active_file:69312kB inactive_file:880kB unevictable:5761436kB isolated(anon):0kB isolated(file):128kB mapped:3269724kB dirty:69740kB writeback:0kB shmem:30600kB writeback_tmp:0kB kernel_stack:5376kB pagetables:128372kB all_unreclaimable? no >> > >>>>>>>>> [1584289.913793] Node 0 DMA free:13836kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >> > >>>>>>>>> [1584289.986882] lowmem_reserve[]: 0 1741 15726 15726 >> > >>>>>>>>> [1584290.005519] Node 0 DMA32 free:54448kB min:9780kB low:11560kB high:13340kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:760844kB active_file:51532kB inactive_file:388kB unevictable:885428kB writepending:51744kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:684kB local_pcp:0kB free_cma:0kB >> > >>>>>>>>> [1584290.104980] lowmem_reserve[]: 0 0 13985 13985 >> > >>>>>>>>> [1584290.125807] Node 0 Normal free:2288kB min:78608kB low:92928kB high:107248kB reserved_highatomic:32768KB active_anon:27524kB inactive_anon:7639920kB active_file:17776kB inactive_file:1304kB unevictable:4876016kB writepending:17736kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:288kB local_pcp:28kB free_cma:0kB >> > >>>>>>>>> 
[1584290.237051] lowmem_reserve[]: 0 0 0 0 >> > >>>>>>>>> [1584290.260423] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 0*2048kB 3*4096kB (M) = 13836kB >> > >>>>>>>>> [1584290.308847] Node 0 DMA32: 12500*4kB (UMEH) 553*8kB (MH) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54424kB >> > >>>>>>>>> [1584290.358363] Node 0 Normal: 0*4kB 25*8kB (H) 0*16kB 5*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 424kB >> > >>>>>>>>> [1584290.409087] 1465768 total pagecache pages >> > >>>>>>>>> [1584290.434531] 4165289 pages RAM >> > >>>>>>>>> [1584290.459616] 0 pages HighMem/MovableOnly >> > >>>>>>>>> [1584290.484480] 104766 pages reserved >> > >>>>>>>>> [1584290.508709] 0 pages hwpoisoned >> > >>>>>>>>> [1584301.710231] team0: Failed to send options change via netlink (err -105) >> > >>>>>>>>> [1584302.635731] telegraf invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=1, oom_score_adj=0 >> > >>>>>>>>> [1584302.682874] CPU: 0 PID: 3494492 Comm: telegraf Tainted: G O 5.11.4 #1 >> > >>>>>>>>> [1584302.729535] Hardware name: Supermicro SYS-5038MR-H8TRF/X10SRD-F, BIOS 3.2 11/22/2019 >> > >>>>>>>>> [1584302.776532] Call Trace: >> > >>>>>>>>> [1584302.799361] dump_stack+0x58/0x6b >> > >>>>>>>>> [1584302.821791] dump_header+0x4c/0x2e6 >> > >>>>>>>>> [1584302.843580] oom_kill_process.cold+0xb/0x10 >> > >>>>>>>>> [1584302.865223] out_of_memory.part.0+0x125/0x5f0 >> > >>>>>>>>> [1584302.886641] out_of_memory+0x54/0xa0 >> > >>>>>>>>> [1584302.907302] __alloc_pages_slowpath.constprop.0+0xb03/0xfb0 >> > >>>>>>>>> [1584302.927913] __alloc_pages_nodemask+0x15a/0x180 >> > >>>>>>>>> [1584302.947874] __get_free_pages+0x8/0x30 >> > >>>>>>>>> [1584302.967246] pgd_alloc+0x21/0x180 >> > >>>>>>>>> [1584302.986355] mm_alloc+0x1af/0x250 >> > >>>>>>>>> [1584303.005085] alloc_bprm+0x80/0x2a0 >> > >>>>>>>>> [1584303.023328] do_execveat_common+0x8b/0x330 >> > >>>>>>>>> 
[1584303.041181] __x64_sys_execve+0x2b/0x40 >> > >>>>>>>>> [1584303.058513] do_syscall_64+0x2d/0x40 >> > >>>>>>>>> [1584303.075281] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> > >>>>>>>>> [1584303.091891] RIP: 0033:0x488376 >> > >>>>>>>>> [1584303.108045] Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 1b 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30 >> > >>>>>>>>> [1584303.159632] RSP: 002b:000000c001108528 EFLAGS: 00000206 ORIG_RAX: 000000000000003b >> > >>>>>>>>> [1584303.195446] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000488376 >> > >>>>>>>>> [1584303.231451] RDX: 000000c002b1a080 RSI: 000000c0028432e0 RDI: 000000c000cae660 >> > >>>>>>>>> [1584303.267407] RBP: 000000c0011086c8 R08: 0000000000000018 R09: 0000000000000000 >> > >>>>>>>>> [1584303.303594] R10: 0000000000000008 R11: 0000000000000206 R12: 000000000047f258 >> > >>>>>>>>> [1584303.340218] R13: 000000000000000e R14: 000000000000000d R15: 0000000000000100 >> > >>>>>>>>> [1584303.379094] Mem-Info: >> > >>>>>>>>> [1584303.398713] active_anon:8159 inactive_anon:2138194 isolated_anon:0 >> > >>>>>>>>> active_file:12975 inactive_file:168 isolated_file:32 >> > >>>>>>>>> unevictable:909709 dirty:12864 writeback:10 >> > >>>>>>>>> slab_reclaimable:42415 slab_unreclaimable:154783 >> > >>>>>>>>> mapped:39825 shmem:14744 pagetables:26041 bounce:0 >> > >>>>>>>>> free:537002 free_pcp:1813 free_cma:0 >> > >>>>>>>>> [1584303.547011] Node 0 active_anon:32636kB inactive_anon:8552776kB active_file:51476kB inactive_file:1112kB unevictable:3638836kB isolated(anon):0kB isolated(file):128kB mapped:159480kB dirty:51024kB writeback:28kB shmem:58976kB writeback_tmp:0kB kernel_stack:5392kB pagetables:104164kB all_unreclaimable? 
no >> > >>>>>>>>> [1584303.640025] Node 0 DMA free:13428kB min:12kB low:24kB high:36kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15968kB managed:15884kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB >> > >>>>>>>>> [1584303.739414] lowmem_reserve[]: 0 1741 15726 15726 >> > >>>>>>>>> [1584303.764535] Node 0 DMA32 free:121320kB min:5872kB low:7652kB high:9432kB reserved_highatomic:24576KB active_anon:5104kB inactive_anon:761140kB active_file:37160kB inactive_file:772kB unevictable:885428kB writepending:37672kB present:1965124kB managed:1899588kB mlocked:0kB bounce:0kB free_pcp:1448kB local_pcp:0kB free_cma:0kB >> > >>>>>>>>> [1584303.888935] lowmem_reserve[]: 0 0 13985 13985 >> > >>>>>>>>> [1584303.913532] Node 0 Normal free:1970692kB min:78608kB low:92928kB high:107248kB reserved_highatomic:126976KB active_anon:27524kB inactive_anon:7812248kB active_file:13664kB inactive_file:1528kB unevictable:2753408kB writepending:12888kB present:14680064kB managed:14326620kB mlocked:0kB bounce:0kB free_pcp:4076kB local_pcp:0kB free_cma:0kB >> > >>>>>>>>> [1584304.036531] lowmem_reserve[]: 0 0 0 0 >> > >>>>>>>>> [1584304.060733] Node 0 DMA: 1*4kB (U) 40*8kB (U) 37*16kB (U) 32*32kB (U) 24*64kB (U) 14*128kB (U) 8*256kB (U) 2*512kB (U) 1*1024kB (U) 0*2048kB 1*4096kB (M) = 13460kB >> > >>>>>>>>> [1584304.134551] Node 0 DMA32: 15098*4kB (UMEH) 6204*8kB (UMEH) 662*16kB (UMEH) 20*32kB (UMEH) 1*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 121320kB >> > >>>>>>>>> [1584304.209349] Node 0 Normal: 1038*4kB (UEH) 392*8kB (EH) 56*16kB (UEH) 28*32kB (UEH) 14*64kB (MEH) 25*128kB (MEH) 6*256kB (MH) 3*512kB (UMH) 3*1024kB (MH) 5*2048kB (UMH) 472*4096kB (U) = 1962872kB >> > >>>>>>>>> [1584304.287094] 933871 total pagecache pages >> > >>>>>>>>> [1584304.312815] 4165289 pages RAM >> > >>>>>>>>> [1584304.337915] 0 pages HighMem/MovableOnly >> > >>>>>>>>> [1584304.362522] 104766 
pages reserved >> > >>>>>>>>> [1584304.386516] 0 pages hwpoisoned >> > >>>>>>>>> >> > >>>>>>>>>> On 20 Mar 2021, at 11:55, Eric Dumazet <edumazet@google.com> wrote: >> > >>>>>>>>>> >> > >>>>>>>>>> On Sat, Mar 20, 2021 at 9:45 AM Martin Zaharinov <micron10@gmail.com> wrote: >> > >>>>>>>>>>> >> > >>>>>>>>>>> Hi Wei >> > >>>>>>>>>>> Check this: >> > >>>>>>>>>>> >> > >>>>>>>>>>> [ 39.706567] ------------[ cut here ]------------ >> > >>>>>>>>>>> [ 39.706568] RTNL: assertion failed at net/ipv4/udp_tunnel_nic.c (557) >> > >>>>>>>>>>> [ 39.706585] WARNING: CPU: 0 PID: 429 at net/ipv4/udp_tunnel_nic.c:557 __udp_tunnel_nic_reset_ntf+0xea/0x100 >> > >>>>>>>>>> >> > >>>>>>>>>> Probably more relevant to Intel maintainers than Wei :/ >> > >>>>>>>>>> >> > >>>>>>>>>>> [ 39.706594] Modules linked in: i40e(+) nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos megaraid_sas >> > >>>>>>>>>>> [ 39.706614] CPU: 0 PID: 429 Comm: kworker/0:2 Tainted: G O 5.11.7 #1 >> > >>>>>>>>>>> [ 39.706618] Hardware name: Supermicro X11DPi-N(T)/X11DPi-NT, BIOS 3.4 11/23/2020 >> > >>>>>>>>>>> [ 39.706619] Workqueue: events work_for_cpu_fn >> > >>>>>>>>>>> [ 39.706627] RIP: 0010:__udp_tunnel_nic_reset_ntf+0xea/0x100 >> > >>>>>>>>>>> [ 39.706631] Code: c0 79 f1 00 00 0f 85 4e ff ff ff ba 2d 02 00 00 48 c7 c6 45 3c 3a 93 48 c7 c7 40 de 39 93 c6 05 a0 79 f1 00 01 e8 f5 ad 0c 00 <0f> 0b e9 28 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 >> > >>>>>>>>>>> [ 39.706634] RSP: 0018:ffffa8390d9b3b38 EFLAGS: 00010292 >> > >>>>>>>>>>> [ 39.706637] RAX: 0000000000000039 RBX: ffff8e02630b2768 RCX: 00000000ffdfffff >> > >>>>>>>>>>> [ 39.706639] RDX: 00000000ffdfffff RSI: ffff8e80ad400000 RDI: 0000000000000001 >> > >>>>>>>>>>> [ 39.706641] RBP: ffff8e025df72000 R08: ffff8e80bb3fffe8 R09: 00000000ffffffea >> > >>>>>>>>>>> [ 39.706643] R10: 
00000000ffdfffff R11: 80000000ffe00000 R12: ffff8e02630b2008 >> > >>>>>>>>>>> [ 39.706645] R13: 0000000000000000 R14: ffff8e024a88ba00 R15: 0000000000000000 >> > >>>>>>>>>>> [ 39.706646] FS: 0000000000000000(0000) GS:ffff8e40bf800000(0000) knlGS:0000000000000000 >> > >>>>>>>>>>> [ 39.706649] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> > >>>>>>>>>>> [ 39.706651] CR2: 00000000004d8f40 CR3: 0000002ca140a002 CR4: 00000000001706f0 >> > >>>>>>>>>>> [ 39.706652] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> > >>>>>>>>>>> [ 39.706654] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> > >>>>>>>>>>> [ 39.706656] Call Trace: >> > >>>>>>>>>>> [ 39.706658] i40e_setup_pf_switch+0x617/0xf80 [i40e] >> > >>>>>>>>>>> [ 39.706683] i40e_probe.part.0.cold+0x8dc/0x109e [i40e] >> > >>>>>>>>>>> [ 39.706708] ? acpi_ns_check_object_type+0xd4/0x193 >> > >>>>>>>>>>> [ 39.706713] ? acpi_ns_check_package_list+0xfd/0x205 >> > >>>>>>>>>>> [ 39.706716] ? __kmalloc+0x37/0x160 >> > >>>>>>>>>>> [ 39.706720] ? kmem_cache_alloc+0xcb/0x120 >> > >>>>>>>>>>> [ 39.706723] ? irq_get_irq_data+0x5/0x20 >> > >>>>>>>>>>> [ 39.706726] ? mp_check_pin_attr+0xe/0xf0 >> > >>>>>>>>>>> [ 39.706729] ? irq_get_irq_data+0x5/0x20 >> > >>>>>>>>>>> [ 39.706731] ? mp_map_pin_to_irq+0xb7/0x2c0 >> > >>>>>>>>>>> [ 39.706735] ? acpi_register_gsi_ioapic+0x86/0x150 >> > >>>>>>>>>>> [ 39.706739] ? pci_conf1_read+0x9f/0xf0 >> > >>>>>>>>>>> [ 39.706743] ? pci_bus_read_config_word+0x2e/0x40 >> > >>>>>>>>>>> [ 39.706746] local_pci_probe+0x1b/0x40 >> > >>>>>>>>>>> [ 39.706750] work_for_cpu_fn+0xb/0x20 >> > >>>>>>>>>>> [ 39.706754] process_one_work+0x1ec/0x350 >> > >>>>>>>>>>> [ 39.706758] worker_thread+0x24b/0x4d0 >> > >>>>>>>>>>> [ 39.706760] ? process_one_work+0x350/0x350 >> > >>>>>>>>>>> [ 39.706762] kthread+0xea/0x120 >> > >>>>>>>>>>> [ 39.706766] ? 
kthread_park+0x80/0x80 >> > >>>>>>>>>>> [ 39.706770] ret_from_fork+0x1f/0x30 >> > >>>>>>>>>>> [ 39.706774] ---[ end trace 7a203f3ec972a377 ]--- >> > >>>>>>>>>>> >> > >>>>>>>>>>> Martin >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>>>>> On 17 Mar 2021, at 0:36, Wei Wang <weiwan@google.com> wrote: >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to >> > >>>>>>>>>>>> determine if the kthread owns this napi and could call napi->poll() on >> > >>>>>>>>>>>> it. However, if socket busy poll is enabled, it is possible that the >> > >>>>>>>>>>>> busy poll thread grabs this SCHED bit (after the previous napi->poll() >> > >>>>>>>>>>>> invokes napi_complete_done() and clears SCHED bit) and tries to poll >> > >>>>>>>>>>>> on the same napi. napi_disable() could grab the SCHED bit as well. >> > >>>>>>>>>>>> This patch tries to fix this race by adding a new bit >> > >>>>>>>>>>>> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in >> > >>>>>>>>>>>> ____napi_schedule() if the threaded mode is enabled, and gets cleared >> > >>>>>>>>>>>> in napi_complete_done(), and we only poll the napi in kthread if this >> > >>>>>>>>>>>> bit is set. This helps distinguish the ownership of the napi between >> > >>>>>>>>>>>> kthread and other scenarios and fixes the race issue. 
>> > >>>>>>>>>>>> >> > >>>>>>>>>>>> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") >> > >>>>>>>>>>>> Reported-by: Martin Zaharinov <micron10@gmail.com> >> > >>>>>>>>>>>> Suggested-by: Jakub Kicinski <kuba@kernel.org> >> > >>>>>>>>>>>> Signed-off-by: Wei Wang <weiwan@google.com> >> > >>>>>>>>>>>> Cc: Alexander Duyck <alexanderduyck@fb.com> >> > >>>>>>>>>>>> Cc: Eric Dumazet <edumazet@google.com> >> > >>>>>>>>>>>> Cc: Paolo Abeni <pabeni@redhat.com> >> > >>>>>>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> >> > >>>>>>>>>>>> --- >> > >>>>>>>>>>>> Change since v3: >> > >>>>>>>>>>>> - Add READ_ONCE() for thread->state and add comments in >> > >>>>>>>>>>>> ____napi_schedule(). >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> include/linux/netdevice.h | 2 ++ >> > >>>>>>>>>>>> net/core/dev.c | 19 ++++++++++++++++++- >> > >>>>>>>>>>>> 2 files changed, 20 insertions(+), 1 deletion(-) >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >> > >>>>>>>>>>>> index 5b67ea89d5f2..87a5d186faff 100644 >> > >>>>>>>>>>>> --- a/include/linux/netdevice.h >> > >>>>>>>>>>>> +++ b/include/linux/netdevice.h >> > >>>>>>>>>>>> @@ -360,6 +360,7 @@ enum { >> > >>>>>>>>>>>> NAPI_STATE_IN_BUSY_POLL, /* sk_busy_loop() owns this NAPI */ >> > >>>>>>>>>>>> NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/ >> > >>>>>>>>>>>> NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/ >> > >>>>>>>>>>>> + NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */ >> > >>>>>>>>>>>> }; >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> enum { >> > >>>>>>>>>>>> @@ -372,6 +373,7 @@ enum { >> > >>>>>>>>>>>> NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL), >> > >>>>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL), >> > >>>>>>>>>>>> NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED), >> > >>>>>>>>>>>> + NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED), 
>> > >>>>>>>>>>>> }; >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> enum gro_result { >> > >>>>>>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c >> > >>>>>>>>>>>> index 6c5967e80132..d3195a95f30e 100644 >> > >>>>>>>>>>>> --- a/net/core/dev.c >> > >>>>>>>>>>>> +++ b/net/core/dev.c >> > >>>>>>>>>>>> @@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd, >> > >>>>>>>>>>>> */ >> > >>>>>>>>>>>> thread = READ_ONCE(napi->thread); >> > >>>>>>>>>>>> if (thread) { >> > >>>>>>>>>>>> + /* Avoid doing set_bit() if the thread is in >> > >>>>>>>>>>>> + * INTERRUPTIBLE state, cause napi_thread_wait() >> > >>>>>>>>>>>> + * makes sure to proceed with napi polling >> > >>>>>>>>>>>> + * if the thread is explicitly woken from here. >> > >>>>>>>>>>>> + */ >> > >>>>>>>>>>>> + if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE) >> > >>>>>>>>>>>> + set_bit(NAPI_STATE_SCHED_THREADED, &napi->state); >> > >>>>>>>>>>>> wake_up_process(thread); >> > >>>>>>>>>>>> return; >> > >>>>>>>>>>>> } >> > >>>>>>>>>>>> @@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done) >> > >>>>>>>>>>>> WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED)); >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED | >> > >>>>>>>>>>>> + NAPIF_STATE_SCHED_THREADED | >> > >>>>>>>>>>>> NAPIF_STATE_PREFER_BUSY_POLL); >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> /* If STATE_MISSED was set, leave STATE_SCHED set, >> > >>>>>>>>>>>> @@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll) >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> static int napi_thread_wait(struct napi_struct *napi) >> > >>>>>>>>>>>> { >> > >>>>>>>>>>>> + bool woken = false; >> > >>>>>>>>>>>> + >> > >>>>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> while (!kthread_should_stop() && !napi_disable_pending(napi)) { >> > >>>>>>>>>>>> - if (test_bit(NAPI_STATE_SCHED, &napi->state)) { >> > >>>>>>>>>>>> + /* Testing SCHED_THREADED bit here 
to make sure the current >> > >>>>>>>>>>>> + * kthread owns this napi and could poll on this napi. >> > >>>>>>>>>>>> + * Testing SCHED bit is not enough because SCHED bit might be >> > >>>>>>>>>>>> + * set by some other busy poll thread or by napi_disable(). >> > >>>>>>>>>>>> + */ >> > >>>>>>>>>>>> + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { >> > >>>>>>>>>>>> WARN_ON(!list_empty(&napi->poll_list)); >> > >>>>>>>>>>>> __set_current_state(TASK_RUNNING); >> > >>>>>>>>>>>> return 0; >> > >>>>>>>>>>>> } >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> schedule(); >> > >>>>>>>>>>>> + /* woken being true indicates this thread owns this napi. */ >> > >>>>>>>>>>>> + woken = true; >> > >>>>>>>>>>>> set_current_state(TASK_INTERRUPTIBLE); >> > >>>>>>>>>>>> } >> > >>>>>>>>>>>> __set_current_state(TASK_RUNNING); >> > >>>>>>>>>>>> -- >> > >>>>>>>>>>>> 2.31.0.rc2.261.g7f71774620-goog >> > >>>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>> >> > >>>>>>> >> > >>>>> >> > >>> >> > >> >> >
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5b67ea89d5f2..87a5d186faff 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -360,6 +360,7 @@ enum {
 	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
 	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
+	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
 };
 
 enum {
@@ -372,6 +373,7 @@ enum {
 	NAPIF_STATE_IN_BUSY_POLL	= BIT(NAPI_STATE_IN_BUSY_POLL),
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
 	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
+	NAPIF_STATE_SCHED_THREADED	= BIT(NAPI_STATE_SCHED_THREADED),
 };
 
 enum gro_result {
diff --git a/net/core/dev.c b/net/core/dev.c
index 6c5967e80132..d3195a95f30e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4294,6 +4294,13 @@ static inline void ____napi_schedule(struct softnet_data *sd,
 		 */
 		thread = READ_ONCE(napi->thread);
 		if (thread) {
+			/* Avoid doing set_bit() if the thread is in
+			 * INTERRUPTIBLE state, cause napi_thread_wait()
+			 * makes sure to proceed with napi polling
+			 * if the thread is explicitly woken from here.
+			 */
+			if (READ_ONCE(thread->state) != TASK_INTERRUPTIBLE)
+				set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
 			wake_up_process(thread);
 			return;
 		}
@@ -6486,6 +6493,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 		WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
 
 		new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
+			      NAPIF_STATE_SCHED_THREADED |
 			      NAPIF_STATE_PREFER_BUSY_POLL);
 
 		/* If STATE_MISSED was set, leave STATE_SCHED set,
@@ -6968,16 +6976,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 
 static int napi_thread_wait(struct napi_struct *napi)
 {
+	bool woken = false;
+
 	set_current_state(TASK_INTERRUPTIBLE);
 
 	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
-		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+		/* Testing SCHED_THREADED bit here to make sure the current
+		 * kthread owns this napi and could poll on this napi.
+		 * Testing SCHED bit is not enough because SCHED bit might be
+		 * set by some other busy poll thread or by napi_disable().
+		 */
+		if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
 			WARN_ON(!list_empty(&napi->poll_list));
 			__set_current_state(TASK_RUNNING);
 			return 0;
 		}
 
 		schedule();
+		/* woken being true indicates this thread owns this napi. */
+		woken = true;
 		set_current_state(TASK_INTERRUPTIBLE);
 	}
 	__set_current_state(TASK_RUNNING);
Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
determine if the kthread owns this napi and could call napi->poll() on
it. However, if socket busy poll is enabled, it is possible that the
busy poll thread grabs this SCHED bit (after the previous napi->poll()
invokes napi_complete_done() and clears SCHED bit) and tries to poll
on the same napi. napi_disable() could grab the SCHED bit as well.
This patch tries to fix this race by adding a new bit
NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
____napi_schedule() if the threaded mode is enabled, and gets cleared
in napi_complete_done(), and we only poll the napi in kthread if this
bit is set. This helps distinguish the ownership of the napi between
kthread and other scenarios and fixes the race issue.

Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
Reported-by: Martin Zaharinov <micron10@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
Change since v3:
  - Add READ_ONCE() for thread->state and add comments in
    ____napi_schedule().

 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 19 ++++++++++++++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)
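
For readers who want to see the ownership problem in isolation, here is a
minimal userspace sketch of the two state bits described in the commit
message. It is only a model, not the kernel code: every model_* name is
hypothetical, C11 atomics stand in for the kernel bit operations, the
scheduling step is collapsed into one helper, and the TASK_INTERRUPTIBLE
check and the "woken" flag from the patch are not modelled.

```c
/* Userspace model of the SCHED / SCHED_THREADED ownership bits.
 * Assumption: all model_* names are invented for illustration; only the
 * bit names mirror the patch. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum {
	MODEL_SCHED          = 1u << 0, /* mirrors NAPIF_STATE_SCHED */
	MODEL_SCHED_THREADED = 1u << 1, /* mirrors NAPIF_STATE_SCHED_THREADED */
};

struct model_napi {
	atomic_uint state;
};

/* Any poller (busy poll, napi_disable(), ...) claims the instance by
 * atomically setting SCHED. Returns true if this caller won the bit. */
static bool model_claim_sched(struct model_napi *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0;
}

/* Threaded-mode scheduling: after winning SCHED, additionally record
 * that the kthread is the intended owner. */
static bool model_schedule_threaded(struct model_napi *n)
{
	if (!model_claim_sched(n))
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Check used before the patch: SCHED alone. */
static bool model_kthread_may_poll_old(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* Check used after the patch: the dedicated ownership bit. */
static bool model_kthread_may_poll_new(struct model_napi *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}

/* Completion clears both bits, as napi_complete_done() does in the patch.
 * The assert is a stand-in for the kernel's WARN_ON_ONCE(). */
static void model_complete_done(struct model_napi *n)
{
	unsigned int mask = MODEL_SCHED | MODEL_SCHED_THREADED;
	unsigned int old = atomic_fetch_and(&n->state, ~mask);

	assert(old & MODEL_SCHED);
}
```

In this model, if a busy poller calls model_claim_sched() first, the old
check reports that the kthread may poll even though it does not own the
instance, while the new check does not. That is the distinction the
SCHED_THREADED bit is meant to provide.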