[net,v3] net: fix race between napi kthread mode and busy poll

Message ID: 20210302012113.1432412-1-weiwan@google.com
State: Changes Requested
Delegated to: Netdev Maintainers
Series: [net,v3] net: fix race between napi kthread mode and busy poll

Checks

Context                         Check    Description
netdev/cover_letter             success  Link
netdev/fixes_present            success  Link
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for net
netdev/subject_prefix           success  Link
netdev/cc_maintainers           warning  5 maintainers not CCed: cong.wang@bytedance.com ap420073@gmail.com daniel@iogearbox.net andriin@fb.com ast@kernel.org
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Link
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              success  Errors and warnings before: 6946 this patch: 6946
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  Link
netdev/checkpatch               warning  WARNING: line length of 81 exceeds 80 columns; WARNING: line length of 90 exceeds 80 columns
netdev/build_allmodconfig_warn  success  Errors and warnings before: 7159 this patch: 7159
netdev/header_inline            success  Link
netdev/stable                   success  Stable not CCed

Commit Message

Wei Wang March 2, 2021, 1:21 a.m. UTC
Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
determine if the kthread owns this napi and could call napi->poll() on
it. However, if socket busy poll is enabled, it is possible that the
busy poll thread grabs this SCHED bit (after the previous napi->poll()
invokes napi_complete_done() and clears SCHED bit) and tries to poll
on the same napi. napi_disable() could grab the SCHED bit as well.
This patch tries to fix this race by adding a new bit
NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
____napi_schedule() if the threaded mode is enabled, and gets cleared
in napi_complete_done(), and we only poll the napi in kthread if this
bit is set. This helps distinguish the ownership of the napi between
kthread and other scenarios and fixes the race issue.

Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
Reported-by: Martin Zaharinov <micron10@gmail.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 14 +++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)
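
To make the ownership problem in the commit message concrete, here is a minimal user-space sketch using C11 atomics. It is not kernel code: the `napi_model` struct, the `MODEL_*` bits, and all `model_*` functions are hypothetical stand-ins invented for illustration. The sketch also folds the kernel's separate "take SCHED" and "schedule" steps into one function and ignores the `woken` logic of the actual patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NAPI state bits (not the kernel's values). */
enum {
	MODEL_SCHED          = 1u << 0,	/* somebody owns the poll */
	MODEL_SCHED_THREADED = 1u << 1,	/* that somebody is the kthread */
};

struct napi_model {
	atomic_uint state;
};

/* Models busy poll (or napi_disable()) grabbing SCHED only.
 * Returns true if the caller acquired the bit.
 */
static bool model_busy_poll_grab(struct napi_model *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	return (old & MODEL_SCHED) == 0;
}

/* Models scheduling in threaded mode: take SCHED, then mark the kthread
 * as the owner. Simplification: the kernel does these in separate steps.
 */
static bool model_schedule_threaded(struct napi_model *n)
{
	unsigned int old = atomic_fetch_or(&n->state, MODEL_SCHED);

	if (old & MODEL_SCHED)
		return false;
	atomic_fetch_or(&n->state, MODEL_SCHED_THREADED);
	return true;
}

/* Models napi_complete_done() clearing both bits together. */
static void model_complete(struct napi_model *n)
{
	atomic_fetch_and(&n->state,
			 ~(unsigned int)(MODEL_SCHED | MODEL_SCHED_THREADED));
}

/* Old check: SCHED alone cannot tell who the owner is. */
static bool model_kthread_may_poll_old(struct napi_model *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED) != 0;
}

/* New check: only true when the kthread was the one scheduled. */
static bool model_kthread_may_poll_new(struct napi_model *n)
{
	return (atomic_load(&n->state) & MODEL_SCHED_THREADED) != 0;
}
```

In this model, after `model_busy_poll_grab()` succeeds the old check returns true (the kthread would wrongly poll), while the new check returns false. After `model_schedule_threaded()` the new check returns true, and `model_complete()` resets both bits.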

Comments

Jakub Kicinski March 3, 2021, 12:18 a.m. UTC | #1
On Mon,  1 Mar 2021 17:21:13 -0800 Wei Wang wrote:
> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> determine if the kthread owns this napi and could call napi->poll() on
> it. However, if socket busy poll is enabled, it is possible that the
> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> on the same napi. napi_disable() could grab the SCHED bit as well.
> This patch tries to fix this race by adding a new bit
> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> in napi_complete_done(), and we only poll the napi in kthread if this
> bit is set. This helps distinguish the ownership of the napi between
> kthread and other scenarios and fixes the race issue.
> 
> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> Reported-by: Martin Zaharinov <micron10@gmail.com>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Cc: Alexander Duyck <alexanderduyck@fb.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Thanks!
Eric Dumazet March 3, 2021, 9:53 a.m. UTC | #2
On Tue, Mar 2, 2021 at 2:21 AM Wei Wang <weiwan@google.com> wrote:
>
> Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> determine if the kthread owns this napi and could call napi->poll() on
> it. However, if socket busy poll is enabled, it is possible that the
> busy poll thread grabs this SCHED bit (after the previous napi->poll()
> invokes napi_complete_done() and clears SCHED bit) and tries to poll
> on the same napi. napi_disable() could grab the SCHED bit as well.
> This patch tries to fix this race by adding a new bit
> NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> ____napi_schedule() if the threaded mode is enabled, and gets cleared
> in napi_complete_done(), and we only poll the napi in kthread if this
> bit is set. This helps distinguish the ownership of the napi between
> kthread and other scenarios and fixes the race issue.
>
> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> Reported-by: Martin Zaharinov <micron10@gmail.com>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Wei Wang <weiwan@google.com>
> Cc: Alexander Duyck <alexanderduyck@fb.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  include/linux/netdevice.h |  2 ++
>  net/core/dev.c            | 14 +++++++++++++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index ddf4cfc12615..682908707c1a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -360,6 +360,7 @@ enum {
>         NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
>         NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
>         NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
> +       NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
>  };
>
>  enum {
> @@ -372,6 +373,7 @@ enum {
>         NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
>         NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>         NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
> +       NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
>  };
>
>  enum gro_result {
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6c5967e80132..03c4763de351 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4294,6 +4294,8 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>                  */
>                 thread = READ_ONCE(napi->thread);
>                 if (thread) {
> +                       if (thread->state != TASK_INTERRUPTIBLE)

How safe is this read?

Presumably KMSAN will detect that another CPU/thread is able to change
thread->state under us, so a READ_ONCE() (or data_race()) would be needed.

Nowhere else in the kernel can we find a similar construct. I find it
unfortunate to bury in net/core/dev.c something that might be incorrect
in the future.
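
As a rough illustration of the marked read being asked for, the following user-space sketch wraps the state load in a volatile access. Everything here is an assumption made for the example: `MODEL_READ_ONCE()` is a simplified imitation of the kernel macro (it relies on the GCC/Clang `__typeof__` extension), and `task_model` and the `MODEL_TASK_*` constants are hypothetical, not the kernel's definitions.

```c
#include <stdbool.h>

/* Simplified imitation of READ_ONCE(): force a single, untorn-looking load
 * that the compiler may not merge, repeat, or hoist. Requires __typeof__.
 */
#define MODEL_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical state values for the model only. */
#define MODEL_TASK_RUNNING		0L
#define MODEL_TASK_INTERRUPTIBLE	1L

struct task_model {
	long state;	/* may be changed concurrently by the task itself */
};

/* Returns true when the thread is not sleeping in the interruptible state,
 * which is the condition the patch tests before set_bit().
 */
static bool model_thread_not_sleeping(struct task_model *t)
{
	long state = MODEL_READ_ONCE(t->state);	/* one marked load */

	return state != MODEL_TASK_INTERRUPTIBLE;
}
```

The marked load does not make the check any less racy in the logical sense: the state can still change right after it is read. It only documents that the race is intentional and keeps the compiler from transforming the access.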

> +                               set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
>                         wake_up_process(thread);
>                         return;
>                 }
> @@ -6486,6 +6488,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>                 WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
>
>                 new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
> +                             NAPIF_STATE_SCHED_THREADED |
>                               NAPIF_STATE_PREFER_BUSY_POLL);
>
>                 /* If STATE_MISSED was set, leave STATE_SCHED set,
> @@ -6968,16 +6971,25 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
>
>  static int napi_thread_wait(struct napi_struct *napi)
>  {
> +       bool woken = false;
> +
>         set_current_state(TASK_INTERRUPTIBLE);
>
>         while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> -               if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> +               /* Testing SCHED_THREADED bit here to make sure the current
> +                * kthread owns this napi and could poll on this napi.
> +                * Testing SCHED bit is not enough because SCHED bit might be
> +                * set by some other busy poll thread or by napi_disable().
> +                */
> +               if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
>                         WARN_ON(!list_empty(&napi->poll_list));
>                         __set_current_state(TASK_RUNNING);
>                         return 0;
>                 }
>
>                 schedule();
> +               /* woken being true indicates this thread owns this napi. */
> +               woken = true;
>                 set_current_state(TASK_INTERRUPTIBLE);
>         }
>         __set_current_state(TASK_RUNNING);
> --
> 2.30.1.766.gb4fecdf3b7-goog
>
Eric Dumazet March 3, 2021, 9:55 a.m. UTC | #3
On Wed, Mar 3, 2021 at 10:53 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Mar 2, 2021 at 2:21 AM Wei Wang <weiwan@google.com> wrote:
> >
> > Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> > determine if the kthread owns this napi and could call napi->poll() on
> > it. However, if socket busy poll is enabled, it is possible that the
> > busy poll thread grabs this SCHED bit (after the previous napi->poll()
> > invokes napi_complete_done() and clears SCHED bit) and tries to poll
> > on the same napi. napi_disable() could grab the SCHED bit as well.
> > This patch tries to fix this race by adding a new bit
> > NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> > ____napi_schedule() if the threaded mode is enabled, and gets cleared
> > in napi_complete_done(), and we only poll the napi in kthread if this
> > bit is set. This helps distinguish the ownership of the napi between
> > kthread and other scenarios and fixes the race issue.
> >
> > Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> > Reported-by: Martin Zaharinov <micron10@gmail.com>
> > Suggested-by: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Wei Wang <weiwan@google.com>
> > Cc: Alexander Duyck <alexanderduyck@fb.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > ---
> >  include/linux/netdevice.h |  2 ++
> >  net/core/dev.c            | 14 +++++++++++++-
> >  2 files changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index ddf4cfc12615..682908707c1a 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -360,6 +360,7 @@ enum {
> >         NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
> >         NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
> >         NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
> > +       NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
> >  };
> >
> >  enum {
> > @@ -372,6 +373,7 @@ enum {
> >         NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
> >         NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
> >         NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
> > +       NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
> >  };
> >
> >  enum gro_result {
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 6c5967e80132..03c4763de351 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4294,6 +4294,8 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> >                  */
> >                 thread = READ_ONCE(napi->thread);
> >                 if (thread) {
> > +                       if (thread->state != TASK_INTERRUPTIBLE)
>
> How safe is this read ?
>
> Presumably KMSAN will detect that another cpu/thread is able to change
> thread->state under us,
> so a READ_ONCE() (or data_race()) would be needed.
>

Of course I meant KCSAN here.

> Nowhere else in the kernel can we find a similar construct, I find
> unfortunate to bury
> in net/core/dev.c something that might be incorrect in the future.
Wei Wang March 3, 2021, 9:46 p.m. UTC | #4
On Wed, Mar 3, 2021 at 1:55 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Mar 3, 2021 at 10:53 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Mar 2, 2021 at 2:21 AM Wei Wang <weiwan@google.com> wrote:
> > >
> > > Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> > > determine if the kthread owns this napi and could call napi->poll() on
> > > it. However, if socket busy poll is enabled, it is possible that the
> > > busy poll thread grabs this SCHED bit (after the previous napi->poll()
> > > invokes napi_complete_done() and clears SCHED bit) and tries to poll
> > > on the same napi. napi_disable() could grab the SCHED bit as well.
> > > This patch tries to fix this race by adding a new bit
> > > NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
> > > ____napi_schedule() if the threaded mode is enabled, and gets cleared
> > > in napi_complete_done(), and we only poll the napi in kthread if this
> > > bit is set. This helps distinguish the ownership of the napi between
> > > kthread and other scenarios and fixes the race issue.
> > >
> > > Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> > > Reported-by: Martin Zaharinov <micron10@gmail.com>
> > > Suggested-by: Jakub Kicinski <kuba@kernel.org>
> > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > Cc: Alexander Duyck <alexanderduyck@fb.com>
> > > Cc: Eric Dumazet <edumazet@google.com>
> > > Cc: Paolo Abeni <pabeni@redhat.com>
> > > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > ---
> > >  include/linux/netdevice.h |  2 ++
> > >  net/core/dev.c            | 14 +++++++++++++-
> > >  2 files changed, 15 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index ddf4cfc12615..682908707c1a 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -360,6 +360,7 @@ enum {
> > >         NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
> > >         NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
> > >         NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
> > > +       NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
> > >  };
> > >
> > >  enum {
> > > @@ -372,6 +373,7 @@ enum {
> > >         NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
> > >         NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
> > >         NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
> > > +       NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
> > >  };
> > >
> > >  enum gro_result {
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 6c5967e80132..03c4763de351 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -4294,6 +4294,8 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> > >                  */
> > >                 thread = READ_ONCE(napi->thread);
> > >                 if (thread) {
> > > +                       if (thread->state != TASK_INTERRUPTIBLE)
> >
> > How safe is this read ?
> >
> > Presumably KMSAN will detect that another cpu/thread is able to change
> > thread->state under us,
> > so a READ_ONCE() (or data_race()) would be needed.
> >
>
> Of course I meant KCSAN here.
>
> > Nowhere else in the kernel can we find a similar construct, I find
> > unfortunate to bury
> > in net/core/dev.c something that might be incorrect in the future.
> >
Indeed. Not much code seems to read and check the thread state, and I am
not sure whether doing so carries any risk.
The reason for checking the state before setting the bit is to avoid
some atomic operations here. The test I ran showed it working properly,
but the workload I tested does not represent all scenarios.
I am not sure what to do here. Should we remove the if () check and
unconditionally set the SCHED_THREADED bit?
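
The trade-off between the two options can be sketched in user space as follows. This is only a model under stated assumptions: `sketch_napi`, the `SKETCH_*` constants, and both `sketch_mark_*` functions are hypothetical names, and the thread state is kept in an atomic variable so the sketch itself is free of data races. Each function returns whether it performed an atomic read-modify-write, which is the cost the conditional variant tries to avoid.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical constants for the model only. */
enum { SKETCH_SCHED_THREADED = 1u << 1 };
enum { SKETCH_TASK_RUNNING = 0, SKETCH_TASK_INTERRUPTIBLE = 1 };

struct sketch_napi {
	atomic_uint state;		/* models napi->state */
	atomic_int thread_state;	/* models the kthread's task state */
};

/* Variant as posted in v3: skip the atomic OR when the kthread is asleep,
 * relying on the wakeup itself to signal ownership.
 */
static bool sketch_mark_conditional(struct sketch_napi *n)
{
	int ts = atomic_load_explicit(&n->thread_state, memory_order_relaxed);

	if (ts != SKETCH_TASK_INTERRUPTIBLE) {
		atomic_fetch_or(&n->state, SKETCH_SCHED_THREADED);
		return true;
	}
	return false;
}

/* Alternative raised in the reply: always pay for the atomic OR and drop
 * the read of the thread state entirely.
 */
static bool sketch_mark_unconditional(struct sketch_napi *n)
{
	atomic_fetch_or(&n->state, SKETCH_SCHED_THREADED);
	return true;
}
```

With the modeled thread asleep, the conditional variant leaves the bit clear and does no read-modify-write, while the unconditional variant always sets it. The unconditional form is simpler to reason about because the waiter then needs only the bit, not a separate wakeup flag, at the price of one extra atomic operation per schedule.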


Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ddf4cfc12615..682908707c1a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -360,6 +360,7 @@  enum {
 	NAPI_STATE_IN_BUSY_POLL,	/* sk_busy_loop() owns this NAPI */
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
 	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
+	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
 };
 
 enum {
@@ -372,6 +373,7 @@  enum {
 	NAPIF_STATE_IN_BUSY_POLL	= BIT(NAPI_STATE_IN_BUSY_POLL),
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
 	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
+	NAPIF_STATE_SCHED_THREADED	= BIT(NAPI_STATE_SCHED_THREADED),
 };
 
 enum gro_result {
diff --git a/net/core/dev.c b/net/core/dev.c
index 6c5967e80132..03c4763de351 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4294,6 +4294,8 @@  static inline void ____napi_schedule(struct softnet_data *sd,
 		 */
 		thread = READ_ONCE(napi->thread);
 		if (thread) {
+			if (thread->state != TASK_INTERRUPTIBLE)
+				set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
 			wake_up_process(thread);
 			return;
 		}
@@ -6486,6 +6488,7 @@  bool napi_complete_done(struct napi_struct *n, int work_done)
 		WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
 
 		new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
+			      NAPIF_STATE_SCHED_THREADED |
 			      NAPIF_STATE_PREFER_BUSY_POLL);
 
 		/* If STATE_MISSED was set, leave STATE_SCHED set,
@@ -6968,16 +6971,25 @@  static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 
 static int napi_thread_wait(struct napi_struct *napi)
 {
+	bool woken = false;
+
 	set_current_state(TASK_INTERRUPTIBLE);
 
 	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
-		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+		/* Testing SCHED_THREADED bit here to make sure the current
+		 * kthread owns this napi and could poll on this napi.
+		 * Testing SCHED bit is not enough because SCHED bit might be
+		 * set by some other busy poll thread or by napi_disable().
+		 */
+		if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
 			WARN_ON(!list_empty(&napi->poll_list));
 			__set_current_state(TASK_RUNNING);
 			return 0;
 		}
 
 		schedule();
+		/* woken being true indicates this thread owns this napi. */
+		woken = true;
 		set_current_state(TASK_INTERRUPTIBLE);
 	}
 	__set_current_state(TASK_RUNNING);