Message ID | 996c4bb33166b5cf8d881871ea8b61e54ad4da24.1617230551.git.pabeni@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: fix hangup on napi_disable for threaded napi | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | fail | 1 blamed authors not CCed: hannes@stressinduktion.org; 7 maintainers not CCed: daniel@iogearbox.net andriin@fb.com cong.wang@bytedance.com ast@kernel.org ap420073@gmail.com bjorn@kernel.org hannes@stressinduktion.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 10 this patch: 10 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 14 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 10 this patch: 10 |
netdev/header_inline | success | Link |
On Thu, 1 Apr 2021 00:46:18 +0200 Paolo Abeni wrote: > I hit an hangup on napi_disable(), when the threaded > mode is enabled and the napi is under heavy traffic. > > If the relevant napi has been scheduled and the napi_disable() > kicks in before the next napi_threaded_wait() completes - so > that the latter quits due to the napi_disable_pending() condition, > the existing code leaves the NAPI_STATE_SCHED bit set and the > napi_disable() loop waiting for such bit will hang. > > Address the issue explicitly clearing the SCHED_BIT on napi_thread > termination, if the thread is owns the napi. > > Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > net/core/dev.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index b4c67a5be606d..e2e716ba027b8 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -7059,6 +7059,14 @@ static int napi_thread_wait(struct napi_struct *napi) > set_current_state(TASK_INTERRUPTIBLE); > } > __set_current_state(TASK_RUNNING); > + > + /* if the thread owns this napi, and the napi itself has been disabled > + * in-between napi_schedule() and the above napi_disable_pending() > + * check, we need to clear the SCHED bit here, or napi_disable > + * will hang waiting for such bit being cleared > + */ > + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) > + clear_bit(NAPI_STATE_SCHED, &napi->state); Not sure this covers 100% of the cases. We depend on the ability to go through schedule() "unnecessarily" when the napi gets scheduled after we go into TASK_INTERRUPTIBLE. If we just check woken outside of the loop it may be false even though we got a "wake event". Looking closer now I don't really understand where we ended up with disable handling :S Seems like the thread exits on napi_disable(), but is reaped by netif_napi_del(). Some drivers (*cough* nfp) will go napi_disable() -> napi_enable()... and that will break. Am I missing something? Should we not stay in the wait loop on napi_disable()?
On Wed, 2021-03-31 at 18:41 -0700, Jakub Kicinski wrote: > On Thu, 1 Apr 2021 00:46:18 +0200 Paolo Abeni wrote: > > I hit an hangup on napi_disable(), when the threaded > > mode is enabled and the napi is under heavy traffic. > > > > If the relevant napi has been scheduled and the napi_disable() > > kicks in before the next napi_threaded_wait() completes - so > > that the latter quits due to the napi_disable_pending() condition, > > the existing code leaves the NAPI_STATE_SCHED bit set and the > > napi_disable() loop waiting for such bit will hang. > > > > Address the issue explicitly clearing the SCHED_BIT on napi_thread > > termination, if the thread is owns the napi. > > > > Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > net/core/dev.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index b4c67a5be606d..e2e716ba027b8 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -7059,6 +7059,14 @@ static int napi_thread_wait(struct napi_struct *napi) > > set_current_state(TASK_INTERRUPTIBLE); > > } > > __set_current_state(TASK_RUNNING); > > + > > + /* if the thread owns this napi, and the napi itself has been disabled > > + * in-between napi_schedule() and the above napi_disable_pending() > > + * check, we need to clear the SCHED bit here, or napi_disable > > + * will hang waiting for such bit being cleared > > + */ > > + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) > > + clear_bit(NAPI_STATE_SCHED, &napi->state); > > Not sure this covers 100% of the cases. We depend on the ability to go > through schedule() "unnecessarily" when the napi gets scheduled after > we go into TASK_INTERRUPTIBLE. Empirically this patch fixes my test case (napi_disable/napi_enable in a loop with the relevant napi under a lot of UDP traffic). If I understand correctly, the critical scenario you see is something alike: CPU0 CPU1 CPU2 // napi_threaded_poll() main loop napi_complete_done() // napi_threaded_poll() loop completes napi_schedule() // set SCHED bit // NOT set SCHED_THREAD // wake_up_process() is // a no op napi_disable() // set DISABLE bit napi_thread_wait() set_current_state(TASK_INTERRUPTIBLE); // napi_thread_wait() loop completes, // SCHED_THREAD bit is cleared and // wake is false > If we just check woken outside of the loop it may be false even though > we got a "wake event". I think in the above example even the normal processing will be fooled?!? e.g. even without the napi_disable(), napi_thread_wait() will will miss the event/will not understand to it really own the napi and will call schedule(). It looks a different problem to me ?!? I *think* that replacing inside the napi_thread_wait() loop: if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) with: unsigned long state = READ_ONCE(napi->state); if (state & NAPIF_STATE_SCHED && !(state & (NAPIF_STATE_IN_BUSY_POLL | NAPIF_STATE_DISABLE)) should solve it and should also allow removing the NAPI_STATE_SCHED_THREADED bit. I feel like I'm missing some relevant point here. > Looking closer now I don't really understand where we ended up with > disable handling :S Seems like the thread exits on napi_disable(), > but is reaped by netif_napi_del(). Some drivers (*cough* nfp) will > go napi_disable() -> napi_enable()... and that will break. > > Am I missing something? > > Should we not stay in the wait loop on napi_disable()? Here I do not follow?!? Modulo the tiny race (which i was unable to trigger so far) above napi_disable()/napi_enable() loops work correctly here. Could you please re-phrase? Thanks! Paolo
On Thu, 01 Apr 2021 11:55:45 +0200 Paolo Abeni wrote: > On Wed, 2021-03-31 at 18:41 -0700, Jakub Kicinski wrote: > > On Thu, 1 Apr 2021 00:46:18 +0200 Paolo Abeni wrote: > > > I hit an hangup on napi_disable(), when the threaded > > > mode is enabled and the napi is under heavy traffic. > > > > > > If the relevant napi has been scheduled and the napi_disable() > > > kicks in before the next napi_threaded_wait() completes - so > > > that the latter quits due to the napi_disable_pending() condition, > > > the existing code leaves the NAPI_STATE_SCHED bit set and the > > > napi_disable() loop waiting for such bit will hang. > > > > > > Address the issue explicitly clearing the SCHED_BIT on napi_thread > > > termination, if the thread is owns the napi. > > > > > > Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > > --- > > > net/core/dev.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > index b4c67a5be606d..e2e716ba027b8 100644 > > > --- a/net/core/dev.c > > > +++ b/net/core/dev.c > > > @@ -7059,6 +7059,14 @@ static int napi_thread_wait(struct napi_struct *napi) > > > set_current_state(TASK_INTERRUPTIBLE); > > > } > > > __set_current_state(TASK_RUNNING); > > > + > > > + /* if the thread owns this napi, and the napi itself has been disabled > > > + * in-between napi_schedule() and the above napi_disable_pending() > > > + * check, we need to clear the SCHED bit here, or napi_disable > > > + * will hang waiting for such bit being cleared > > > + */ > > > + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) > > > + clear_bit(NAPI_STATE_SCHED, &napi->state); > > > > Not sure this covers 100% of the cases. We depend on the ability to go > > through schedule() "unnecessarily" when the napi gets scheduled after > > we go into TASK_INTERRUPTIBLE. > > Empirically this patch fixes my test case (napi_disable/napi_enable in > a loop with the relevant napi under a lot of UDP traffic). > > If I understand correctly, the critical scenario you see is something > alike: > > CPU0 CPU1 CPU2 > // napi_threaded_poll() main loop > napi_complete_done() > // napi_threaded_poll() loop completes > > napi_schedule() > // set SCHED bit > // NOT set SCHED_THREAD Why does it not set SCHED_THREAD if task is RUNNING? > // wake_up_process() is > // a no op > napi_disable() > // set DISABLE bit > > napi_thread_wait() > set_current_state(TASK_INTERRUPTIBLE); > // napi_thread_wait() loop completes, > // SCHED_THREAD bit is cleared and > // wake is false I was thinking of: CPU0 CPU1 CPU2 ==== ==== ==== napi_complete_done() set INTERRUPTIBLE napi_schedule set RUNNING napi_disable() if (should_stop() || disable_pending()) // does not enter loop // test from this patch: if (SCHED_THREADED || woken) // .. is false > > If we just check woken outside of the loop it may be false even though > > we got a "wake event". > > I think in the above example even the normal processing will be > fooled?!? e.g. even without the napi_disable(), napi_thread_wait() will > will miss the event/will not understand to it really own the napi and > will call schedule(). > > It looks a different problem to me ?!? > > I *think* that replacing inside the napi_thread_wait() loop: > > if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) > > with: > > unsigned long state = READ_ONCE(napi->state); > > if (state & NAPIF_STATE_SCHED && > !(state & (NAPIF_STATE_IN_BUSY_POLL | NAPIF_STATE_DISABLE)) > > should solve it and should also allow removing the > NAPI_STATE_SCHED_THREADED bit. I feel like I'm missing some relevant > point here. Heh, that's closer to the proposal Eric put forward. I strongly dislike the idea that every NAPI consumer needs to be aware of all the other consumers to make things work. That's n^2 mental complexity. > > Looking closer now I don't really understand where we ended up with > > disable handling :S Seems like the thread exits on napi_disable(), > > but is reaped by netif_napi_del(). Some drivers (*cough* nfp) will > > go napi_disable() -> napi_enable()... and that will break. > > > > Am I missing something? > > > > Should we not stay in the wait loop on napi_disable()? > > Here I do not follow?!? Modulo the tiny race (which i was unable to > trigger so far) above napi_disable()/napi_enable() loops work correctly > here. > > Could you please re-phrase? After napi_disable() the thread will exit right? (napi_thread_wait() returns -1, the loop in napi_threaded_poll() breaks, and the function returns). napi_enable() will not re-start the thread. What driver are you testing with? You driver must always call netif_napi_del() and netif_napi_add().
Hello, I'm sorry for the lag, On Thu, 2021-04-01 at 16:44 -0700, Jakub Kicinski wrote: > On Thu, 01 Apr 2021 11:55:45 +0200 Paolo Abeni wrote: > > On Wed, 2021-03-31 at 18:41 -0700, Jakub Kicinski wrote: > > > On Thu, 1 Apr 2021 00:46:18 +0200 Paolo Abeni wrote: > > > > I hit an hangup on napi_disable(), when the threaded > > > > mode is enabled and the napi is under heavy traffic. > > > > > > > > If the relevant napi has been scheduled and the napi_disable() > > > > kicks in before the next napi_threaded_wait() completes - so > > > > that the latter quits due to the napi_disable_pending() condition, > > > > the existing code leaves the NAPI_STATE_SCHED bit set and the > > > > napi_disable() loop waiting for such bit will hang. > > > > > > > > Address the issue explicitly clearing the SCHED_BIT on napi_thread > > > > termination, if the thread is owns the napi. > > > > > > > > Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > > > --- > > > > net/core/dev.c | 8 ++++++++ > > > > 1 file changed, 8 insertions(+) > > > > > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > > index b4c67a5be606d..e2e716ba027b8 100644 > > > > --- a/net/core/dev.c > > > > +++ b/net/core/dev.c > > > > @@ -7059,6 +7059,14 @@ static int napi_thread_wait(struct napi_struct *napi) > > > > set_current_state(TASK_INTERRUPTIBLE); > > > > } > > > > __set_current_state(TASK_RUNNING); > > > > + > > > > + /* if the thread owns this napi, and the napi itself has been disabled > > > > + * in-between napi_schedule() and the above napi_disable_pending() > > > > + * check, we need to clear the SCHED bit here, or napi_disable > > > > + * will hang waiting for such bit being cleared > > > > + */ > > > > + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) > > > > + clear_bit(NAPI_STATE_SCHED, &napi->state); > > > > > > Not sure this covers 100% of the cases. We depend on the ability to go > > > through schedule() "unnecessarily" when the napi gets scheduled after > > > we go into TASK_INTERRUPTIBLE. > > > > Empirically this patch fixes my test case (napi_disable/napi_enable in > > a loop with the relevant napi under a lot of UDP traffic). > > > > If I understand correctly, the critical scenario you see is something > > alike: > > > > CPU0 CPU1 CPU2 > > // napi_threaded_poll() main loop > > napi_complete_done() > > // napi_threaded_poll() loop completes > > > > napi_schedule() > > // set SCHED bit > > // NOT set SCHED_THREAD > > Why does it not set SCHED_THREAD if task is RUNNING? Because I'm dumb, I saw the race only on paper, and I mismatched the napi thread status. The actual race I was thinking is what you wrote below ;) > > // wake_up_process() is > > // a no op > > napi_disable() > > // set DISABLE bit > > > > napi_thread_wait() > > set_current_state(TASK_INTERRUPTIBLE); > > // napi_thread_wait() loop completes, > > // SCHED_THREAD bit is cleared and > > // wake is false > > I was thinking of: > > CPU0 CPU1 CPU2 > ==== ==== ==== > napi_complete_done() > set INTERRUPTIBLE > napi_schedule > set RUNNING > napi_disable() > if (should_stop() || > disable_pending()) > // does not enter loop > // test from this patch: > if (SCHED_THREADED || woken) > // .. is false > > > > > If we just check woken outside of the loop it may be false even though > > > we got a "wake event". > > > > I think in the above example even the normal processing will be > > fooled?!? e.g. even without the napi_disable(), napi_thread_wait() will > > will miss the event/will not understand to it really own the napi and > > will call schedule(). > > > > It looks a different problem to me ?!? > > > > I *think* that replacing inside the napi_thread_wait() loop: > > > > if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) > > > > with: > > > > unsigned long state = READ_ONCE(napi->state); > > > > if (state & NAPIF_STATE_SCHED && > > !(state & (NAPIF_STATE_IN_BUSY_POLL | NAPIF_STATE_DISABLE)) > > > > should solve it and should also allow removing the > > NAPI_STATE_SCHED_THREADED bit. I feel like I'm missing some relevant > > point here. > > Heh, that's closer to the proposal Eric put forward. > > I strongly dislike I guess that can't be addressed ;) > the idea that every NAPI consumer needs to be aware > of all the other consumers to make things work. That's n^2 mental > complexity. IMHO the overall complexity is not that bad: both napi_disable() and NAPI poll already set their own specific NAPI bit when acquiring the NAPI instance, they don't need to be aware of any other NAPI consumer internal. The only NAPI user that needs to be aware of others is napi threaded, and I guess/hope we are not going to add more different kind of NAPI users. If you have strong opinion against the above, the only other option I can think of is patching napi_schedule_prep() to set both NAPI_STATE_SCHED and NAPI_STATE_SCHED_THREADED if threaded mode is enabled for the running NAPI. That looks more complex and error prone, so I really would avoid that. Any other better option? Side note: regardless of the above, I think we still need something similar to the code in this patch, can we address the different issues separately? > > > Looking closer now I don't really understand where we ended up with > > > disable handling :S Seems like the thread exits on napi_disable(), > > > but is reaped by netif_napi_del(). Some drivers (*cough* nfp) will > > > go napi_disable() -> napi_enable()... and that will break. > > > > > > Am I missing something? > > > > > > Should we not stay in the wait loop on napi_disable()? > > > > Here I do not follow?!? Modulo the tiny race (which i was unable to > > trigger so far) above napi_disable()/napi_enable() loops work correctly > > here. > > > > Could you please re-phrase? > > After napi_disable() the thread will exit right? (napi_thread_wait() > returns -1, the loop in napi_threaded_poll() breaks, and the function > returns). > > napi_enable() will not re-start the thread. > > What driver are you testing with? You driver must always call > netif_napi_del() and netif_napi_add(). veth + some XDP dummy prog - used just to enable NAPI. Yep, it does a full netif_napi_del()/netif_napi_add(). Looks like plain napi_disable()/napi_enable() is not going to work in threaded mode. Cheers, Paolo
On Wed, 07 Apr 2021 16:54:29 +0200 Paolo Abeni wrote: > > > I think in the above example even the normal processing will be > > > fooled?!? e.g. even without the napi_disable(), napi_thread_wait() will > > > will miss the event/will not understand to it really own the napi and > > > will call schedule(). > > > > > > It looks a different problem to me ?!? > > > > > > I *think* that replacing inside the napi_thread_wait() loop: > > > > > > if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) > > > > > > with: > > > > > > unsigned long state = READ_ONCE(napi->state); > > > > > > if (state & NAPIF_STATE_SCHED && > > > !(state & (NAPIF_STATE_IN_BUSY_POLL | NAPIF_STATE_DISABLE)) > > > > > > should solve it and should also allow removing the > > > NAPI_STATE_SCHED_THREADED bit. I feel like I'm missing some relevant > > > point here. > > > > Heh, that's closer to the proposal Eric put forward. > > > > I strongly dislike > > I guess that can't be addressed ;) I'm not _that_ unreasonable, I hope :) if there is multiple people disagreeing with me then so be it. > > the idea that every NAPI consumer needs to be aware > > of all the other consumers to make things work. That's n^2 mental > > complexity. > > IMHO the overall complexity is not that bad: both napi_disable() and > NAPI poll already set their own specific NAPI bit when acquiring the > NAPI instance, they don't need to be aware of any other NAPI consumer > internal. > > The only NAPI user that needs to be aware of others is napi threaded, > and I guess/hope we are not going to add more different kind of NAPI > users. I thought we agreed that we should leave the door open for other pollers as a condition of merging this simplistic thread thing. > If you have strong opinion against the above, the only other option I > can think of is patching napi_schedule_prep() to set > both NAPI_STATE_SCHED and NAPI_STATE_SCHED_THREADED if threaded mode is > enabled for the running NAPI. That looks more complex and error prone, > so I really would avoid that. > > Any other better option? > > Side note: regardless of the above, I think we still need something > similar to the code in this patch, can we address the different issues > separately? Not sure what issues you're referring to. > > > Here I do not follow?!? Modulo the tiny race (which i was unable to > > > trigger so far) above napi_disable()/napi_enable() loops work correctly > > > here. > > > > > > Could you please re-phrase? > > > > After napi_disable() the thread will exit right? (napi_thread_wait() > > returns -1, the loop in napi_threaded_poll() breaks, and the function > > returns). > > > > napi_enable() will not re-start the thread. > > > > What driver are you testing with? You driver must always call > > netif_napi_del() and netif_napi_add(). > > veth + some XDP dummy prog - used just to enable NAPI. > > Yep, it does a full netif_napi_del()/netif_napi_add(). > > Looks like plain napi_disable()/napi_enable() is not going to work in > threaded mode. Right, I think the problem is disable_pending check is out of place. How about this: diff --git a/net/core/dev.c b/net/core/dev.c index 9d1a8fac793f..e53f8bfed6a1 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7041,7 +7041,7 @@ static int napi_thread_wait(struct napi_struct *napi) set_current_state(TASK_INTERRUPTIBLE); - while (!kthread_should_stop() && !napi_disable_pending(napi)) { + while (!kthread_should_stop()) { /* Testing SCHED_THREADED bit here to make sure the current * kthread owns this napi and could poll on this napi. * Testing SCHED bit is not enough because SCHED bit might be @@ -7049,8 +7049,14 @@ static int napi_thread_wait(struct napi_struct *napi) */ if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { WARN_ON(!list_empty(&napi->poll_list)); - __set_current_state(TASK_RUNNING); - return 0; + if (unlikely(napi_disable_pending(napi))) { + clear_bit(NAPI_STATE_SCHED, &napi->state); + clear_bit(NAPI_STATE_SCHED_THREADED, + &napi->state); + } else { + __set_current_state(TASK_RUNNING); + return 0; + } } schedule();
On Wed, 2021-04-07 at 11:13 -0700, Jakub Kicinski wrote: > On Wed, 07 Apr 2021 16:54:29 +0200 Paolo Abeni wrote: > > > > I think in the above example even the normal processing will be > > > > fooled?!? e.g. even without the napi_disable(), napi_thread_wait() will > > > > will miss the event/will not understand to it really own the napi and > > > > will call schedule(). > > > > > > > > It looks a different problem to me ?!? > > > > > > > > I *think* that replacing inside the napi_thread_wait() loop: > > > > > > > > if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) > > > > > > > > with: > > > > > > > > unsigned long state = READ_ONCE(napi->state); > > > > > > > > if (state & NAPIF_STATE_SCHED && > > > > !(state & (NAPIF_STATE_IN_BUSY_POLL | NAPIF_STATE_DISABLE)) > > > > > > > > should solve it and should also allow removing the > > > > NAPI_STATE_SCHED_THREADED bit. I feel like I'm missing some relevant > > > > point here. > > > > > > Heh, that's closer to the proposal Eric put forward. > > > > > > I strongly dislike > > > > I guess that can't be addressed ;) > > I'm not _that_ unreasonable, I hope :) if there is multiple people > disagreeing with me then so be it. I'm sorry, I mean no offence! Just joking about the fact that is usually hard changing preferences ;) > > If you have strong opinion against the above, the only other option I > > can think of is patching napi_schedule_prep() to set > > both NAPI_STATE_SCHED and NAPI_STATE_SCHED_THREADED if threaded mode is > > enabled for the running NAPI. That looks more complex and error prone, > > so I really would avoid that. > > > > Any other better option? > > > > Side note: regardless of the above, I think we still need something > > similar to the code in this patch, can we address the different issues > > separately? > > Not sure what issues you're referring to. The patch that started this thread was ment to address a slightly different race: napi_disable() hanging because napi_threaded_poll() don't clear the NAPI_STATE_SCHED even when owning the napi instance. > Right, I think the problem is disable_pending check is out of place. > > How about this: > > diff --git a/net/core/dev.c b/net/core/dev.c > index 9d1a8fac793f..e53f8bfed6a1 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -7041,7 +7041,7 @@ static int napi_thread_wait(struct napi_struct *napi) > > set_current_state(TASK_INTERRUPTIBLE); > > - while (!kthread_should_stop() && !napi_disable_pending(napi)) { > + while (!kthread_should_stop()) { > /* Testing SCHED_THREADED bit here to make sure the current > * kthread owns this napi and could poll on this napi. > * Testing SCHED bit is not enough because SCHED bit might be > @@ -7049,8 +7049,14 @@ static int napi_thread_wait(struct napi_struct *napi) > */ > if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { > WARN_ON(!list_empty(&napi->poll_list)); > - __set_current_state(TASK_RUNNING); > - return 0; > + if (unlikely(napi_disable_pending(napi))) { > + clear_bit(NAPI_STATE_SCHED, &napi->state); > + clear_bit(NAPI_STATE_SCHED_THREADED, > + &napi->state); > + } else { > + __set_current_state(TASK_RUNNING); > + return 0; > + } > } > > schedule(); It looks like the above works, and also fixes the problem I originally reported. I think it can be simplified as the following - if NAPIF_STATE_DISABLE is set, napi_threaded_poll()/__napi_poll() will return clearing the SCHEDS bits after trying to poll the device: --- diff --git a/net/core/dev.c b/net/core/dev.c index d9db02d4e044..5cb6f411043d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6985,7 +6985,7 @@ static int napi_thread_wait(struct napi_struct *napi) set_current_state(TASK_INTERRUPTIBLE); - while (!kthread_should_stop() && !napi_disable_pending(napi)) { + while (!kthread_should_stop()) { /* Testing SCHED_THREADED bit here to make sure the current * kthread owns this napi and could poll on this napi. * Testing SCHED bit is not enough because SCHED bit might be --- And works as intended here. I'll submit that formally, unless there are objection. Thanks! Paolo >
On 4/9/21 11:24 AM, Paolo Abeni wrote: > On Wed, 2021-04-07 at 11:13 -0700, Jakub Kicinski wrote: >> On Wed, 07 Apr 2021 16:54:29 +0200 Paolo Abeni wrote: >>>>> I think in the above example even the normal processing will be >>>>> fooled?!? e.g. even without the napi_disable(), napi_thread_wait() will >>>>> will miss the event/will not understand to it really own the napi and >>>>> will call schedule(). >>>>> >>>>> It looks a different problem to me ?!? >>>>> >>>>> I *think* that replacing inside the napi_thread_wait() loop: >>>>> >>>>> if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) >>>>> >>>>> with: >>>>> >>>>> unsigned long state = READ_ONCE(napi->state); >>>>> >>>>> if (state & NAPIF_STATE_SCHED && >>>>> !(state & (NAPIF_STATE_IN_BUSY_POLL | NAPIF_STATE_DISABLE)) >>>>> >>>>> should solve it and should also allow removing the >>>>> NAPI_STATE_SCHED_THREADED bit. I feel like I'm missing some relevant >>>>> point here. >>>> >>>> Heh, that's closer to the proposal Eric put forward. >>>> >>>> I strongly dislike >>> >>> I guess that can't be addressed ;) >> >> I'm not _that_ unreasonable, I hope :) if there is multiple people >> disagreeing with me then so be it. > > I'm sorry, I mean no offence! Just joking about the fact that is > usually hard changing preferences ;) > >>> If you have strong opinion against the above, the only other option I >>> can think of is patching napi_schedule_prep() to set >>> both NAPI_STATE_SCHED and NAPI_STATE_SCHED_THREADED if threaded mode is >>> enabled for the running NAPI. That looks more complex and error prone, >>> so I really would avoid that. >>> >>> Any other better option? >>> >>> Side note: regardless of the above, I think we still need something >>> similar to the code in this patch, can we address the different issues >>> separately? >> >> Not sure what issues you're referring to. > > The patch that started this thread was ment to address a slightly > different race: napi_disable() hanging because napi_threaded_poll() > don't clear the NAPI_STATE_SCHED even when owning the napi instance. > >> Right, I think the problem is disable_pending check is out of place. >> >> How about this: >> >> diff --git a/net/core/dev.c b/net/core/dev.c >> index 9d1a8fac793f..e53f8bfed6a1 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -7041,7 +7041,7 @@ static int napi_thread_wait(struct napi_struct *napi) >> >> set_current_state(TASK_INTERRUPTIBLE); >> >> - while (!kthread_should_stop() && !napi_disable_pending(napi)) { >> + while (!kthread_should_stop()) { >> /* Testing SCHED_THREADED bit here to make sure the current >> * kthread owns this napi and could poll on this napi. >> * Testing SCHED bit is not enough because SCHED bit might be >> @@ -7049,8 +7049,14 @@ static int napi_thread_wait(struct napi_struct *napi) >> */ >> if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) { >> WARN_ON(!list_empty(&napi->poll_list)); >> - __set_current_state(TASK_RUNNING); >> - return 0; >> + if (unlikely(napi_disable_pending(napi))) { >> + clear_bit(NAPI_STATE_SCHED, &napi->state); >> + clear_bit(NAPI_STATE_SCHED_THREADED, >> + &napi->state); >> + } else { >> + __set_current_state(TASK_RUNNING); >> + return 0; >> + } >> } >> >> schedule(); > > It looks like the above works, and also fixes the problem I originally > reported. > > I think it can be simplified as the following - if NAPIF_STATE_DISABLE > is set, napi_threaded_poll()/__napi_poll() will return clearing the > SCHEDS bits after trying to poll the device: > > --- > diff --git a/net/core/dev.c b/net/core/dev.c > index d9db02d4e044..5cb6f411043d 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -6985,7 +6985,7 @@ static int napi_thread_wait(struct napi_struct *napi) > > set_current_state(TASK_INTERRUPTIBLE); > > - while (!kthread_should_stop() && !napi_disable_pending(napi)) { > + while (!kthread_should_stop()) { > /* Testing SCHED_THREADED bit here to make sure the current > * kthread owns this napi and could poll on this napi. > * Testing SCHED bit is not enough because SCHED bit might be > > --- > > And works as intended here. I'll submit that formally, unless there are > objection. > This looks much better ;) > Thanks! > > Paolo >> >
On Fri, 09 Apr 2021 11:24:07 +0200 Paolo Abeni wrote: > I think it can be simplified as the following - if NAPIF_STATE_DISABLE > is set, napi_threaded_poll()/__napi_poll() will return clearing the > SCHEDS bits after trying to poll the device: Indeed! > diff --git a/net/core/dev.c b/net/core/dev.c > index d9db02d4e044..5cb6f411043d 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -6985,7 +6985,7 @@ static int napi_thread_wait(struct napi_struct *napi) > > set_current_state(TASK_INTERRUPTIBLE); > > - while (!kthread_should_stop() && !napi_disable_pending(napi)) { > + while (!kthread_should_stop()) { > /* Testing SCHED_THREADED bit here to make sure the current > * kthread owns this napi and could poll on this napi. > * Testing SCHED bit is not enough because SCHED bit might be > > --- > > And works as intended here. I'll submit that formally, unless there are > objection. Please and thank you!
diff --git a/net/core/dev.c b/net/core/dev.c index b4c67a5be606d..e2e716ba027b8 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7059,6 +7059,14 @@ static int napi_thread_wait(struct napi_struct *napi) set_current_state(TASK_INTERRUPTIBLE); } __set_current_state(TASK_RUNNING); + + /* if the thread owns this napi, and the napi itself has been disabled + * in-between napi_schedule() and the above napi_disable_pending() + * check, we need to clear the SCHED bit here, or napi_disable + * will hang waiting for such bit being cleared + */ + if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) + clear_bit(NAPI_STATE_SCHED, &napi->state); return -1; }
I hit an hangup on napi_disable(), when the threaded mode is enabled and the napi is under heavy traffic. If the relevant napi has been scheduled and the napi_disable() kicks in before the next napi_threaded_wait() completes - so that the latter quits due to the napi_disable_pending() condition, the existing code leaves the NAPI_STATE_SCHED bit set and the napi_disable() loop waiting for such bit will hang. Address the issue explicitly clearing the SCHED_BIT on napi_thread termination, if the thread is owns the napi. Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support") Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/core/dev.c | 8 ++++++++ 1 file changed, 8 insertions(+)