Message ID | 20210305074456.88015-1-ljp@linux.ibm.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Series | [RFC,net] ibmvnic: complete dev->poll nicely during adapter reset |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | fail | 1 blamed authors not CCed: tlfalcon@linux.vnet.ibm.com; 5 maintainers not CCed: paulus@samba.org benh@kernel.crashing.org linuxppc-dev@lists.ozlabs.org tlfalcon@linux.vnet.ibm.com mpe@ellerman.id.au |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 3 this patch: 3 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 20 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
Lijun Pan [ljp@linux.ibm.com] wrote:
> The reset path will call ibmvnic_cleanup->ibmvnic_napi_disable
> ->napi_disable(). This is supposed to stop the polling.
> Commit 21ecba6c48f9 ("ibmvnic: Exit polling routine correctly
> during adapter reset") reported that during device reset,
> polling routine never completed and napi_disable slept indefinitely.
> In order to solve that problem, resetting bit was checked and
> napi_complete_done was called before dev->poll::ibmvnic_poll exited.
>
> Checking for resetting bit in dev->poll is racy because resetting
> bit may be false while being checked, but turns true immediately
> afterwards.

Yes, have been testing a fix for that.

> Hence we call napi_complete in ibmvnic_napi_disable, which avoids
> the racing with resetting, and makes sure dev->poll and napi_disable

napi_complete() will prevent a new call to ibmvnic_poll() but what if
ibmvnic_poll() is already executing and attempting to access the scrqs
while the reset path is freeing them?

Sukadev
On Fri, Mar 5, 2021 at 12:44 PM Sukadev Bhattiprolu
<sukadev@linux.ibm.com> wrote:
>
> Lijun Pan [ljp@linux.ibm.com] wrote:
> > The reset path will call ibmvnic_cleanup->ibmvnic_napi_disable
> > ->napi_disable(). This is supposed to stop the polling.
> > Commit 21ecba6c48f9 ("ibmvnic: Exit polling routine correctly
> > during adapter reset") reported that during device reset,
> > polling routine never completed and napi_disable slept indefinitely.
> > In order to solve that problem, resetting bit was checked and
> > napi_complete_done was called before dev->poll::ibmvnic_poll exited.
> >
> > Checking for resetting bit in dev->poll is racy because resetting
> > bit may be false while being checked, but turns true immediately
> > afterwards.
>
> Yes, have been testing a fix for that.
> >
> > Hence we call napi_complete in ibmvnic_napi_disable, which avoids
> > the racing with resetting, and makes sure dev->poll and napi_disable
>
> napi_complete() will prevent a new call to ibmvnic_poll() but what if
> ibmvnic_poll() is already executing and attempting to access the scrqs
> while the reset path is freeing them?
>

napi_complete() and napi_disable() are called in the earlier stages of
reset path, i.e. before reset path actually calls the functions to
free the scrqs. So I don't think this is an issue here.
Lijun Pan [lijunp213@gmail.com] wrote:
> On Fri, Mar 5, 2021 at 12:44 PM Sukadev Bhattiprolu
> <sukadev@linux.ibm.com> wrote:
> >
> > Lijun Pan [ljp@linux.ibm.com] wrote:
> > > The reset path will call ibmvnic_cleanup->ibmvnic_napi_disable
> > > ->napi_disable(). This is supposed to stop the polling.
> > > Commit 21ecba6c48f9 ("ibmvnic: Exit polling routine correctly
> > > during adapter reset") reported that during device reset,
> > > polling routine never completed and napi_disable slept indefinitely.
> > > In order to solve that problem, resetting bit was checked and
> > > napi_complete_done was called before dev->poll::ibmvnic_poll exited.
> > >
> > > Checking for resetting bit in dev->poll is racy because resetting
> > > bit may be false while being checked, but turns true immediately
> > > afterwards.
> >
> > Yes, have been testing a fix for that.
> > >
> > > Hence we call napi_complete in ibmvnic_napi_disable, which avoids
> > > the racing with resetting, and makes sure dev->poll and napi_disable
> >
> > napi_complete() will prevent a new call to ibmvnic_poll() but what if
> > ibmvnic_poll() is already executing and attempting to access the scrqs
> > while the reset path is freeing them?
> >
> napi_complete() and napi_disable() are called in the earlier stages of
> reset path, i.e. before reset path actually calls the functions to
> free the scrqs.

Yes, those will prevent a _new_ call to poll, right? But what if poll
is already executing? What prevents it from accessing an scrq that the
reset path will free?

> So I don't think this is an issue here.
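The question raised in the replies above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct model_napi` and every `model_*` function are hypothetical names, and the model assumes that napi_disable() waits by repeatedly trying to take the SCHED bit, which an in-flight poll holds until it completes. Under that assumption, clearing the bit from the disable path lets the disable step succeed while the poll body may still be running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the SCHED handshake; not kernel code. */
struct model_napi {
	bool sched;        /* stands in for the NAPI SCHED state bit */
	bool poll_running; /* true while the modelled poll body executes */
};

/* A scheduled poll starts running: it owns the SCHED bit. */
static void model_poll_begin(struct model_napi *n)
{
	n->sched = true;
	n->poll_running = true;
}

/* The poll body returns. */
static void model_poll_end(struct model_napi *n)
{
	n->poll_running = false;
}

/* Stand-in for napi_complete(): releases the SCHED bit. */
static void model_complete(struct model_napi *n)
{
	n->sched = false;
}

/* One iteration of the assumed disable wait loop: take SCHED if it is free. */
static bool model_try_disable(struct model_napi *n)
{
	if (n->sched)
		return false; /* real code would sleep and retry */
	n->sched = true;      /* disable now owns the bit */
	return true;
}

/* True if disable succeeds while the poll body is still running. */
bool model_disable_overlaps_poll(bool complete_from_disable_path)
{
	struct model_napi n = { false, false };

	model_poll_begin(&n);
	if (complete_from_disable_path)
		model_complete(&n); /* completion issued by the disable path */
	return model_try_disable(&n) && n.poll_running;
}

/* True if disable is blocked during poll and succeeds once poll has finished. */
bool model_disable_waits_for_poll(void)
{
	struct model_napi n = { false, false };
	bool blocked;

	model_poll_begin(&n);
	blocked = !model_try_disable(&n);
	model_complete(&n); /* completion issued by poll itself */
	model_poll_end(&n);
	return blocked && model_try_disable(&n) && !n.poll_running;
}
```

In this model, `model_disable_overlaps_poll(true)` reports an overlap and `model_disable_overlaps_poll(false)` does not. That is the distinction the thread is debating. Whether the real reset path can reach the scrq teardown inside that window is not settled by this sketch.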
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index b6102ccf9b90..338d3d071cec 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -785,6 +785,7 @@ static void ibmvnic_napi_disable(struct ibmvnic_adapter *adapter)
 
 	for (i = 0; i < adapter->req_rx_queues; i++) {
 		netdev_dbg(adapter->netdev, "Disabling napi[%d]\n", i);
+		napi_complete(&adapter->napi[i]);
 		napi_disable(&adapter->napi[i]);
 	}
 
@@ -2455,13 +2456,6 @@ static int ibmvnic_poll(struct napi_struct *napi, int budget)
 		u16 offset;
 		u8 flags = 0;
 
-		if (unlikely(test_bit(0, &adapter->resetting) &&
-			     adapter->reset_reason != VNIC_RESET_NON_FATAL)) {
-			enable_scrq_irq(adapter, rx_scrq);
-			napi_complete_done(napi, frames_processed);
-			return frames_processed;
-		}
-
 		if (!pending_scrq(adapter, rx_scrq))
 			break;
 		next = ibmvnic_next_scrq(adapter, rx_scrq);
The reset path will call ibmvnic_cleanup->ibmvnic_napi_disable
->napi_disable(). This is supposed to stop the polling.
Commit 21ecba6c48f9 ("ibmvnic: Exit polling routine correctly
during adapter reset") reported that during device reset,
polling routine never completed and napi_disable slept indefinitely.
In order to solve that problem, resetting bit was checked and
napi_complete_done was called before dev->poll::ibmvnic_poll exited.

Checking for resetting bit in dev->poll is racy because resetting
bit may be false while being checked, but turns true immediately
afterwards.

Hence we call napi_complete in ibmvnic_napi_disable, which avoids
the racing with resetting, and makes sure dev->poll and napi_disable
complete before reset routine actually releases resources.

Fixes: 21ecba6c48f9 ("ibmvnic: Exit polling routine correctly during adapter reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)
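The check-then-act window described in the commit message can be shown with a deterministic userspace sketch. It is not driver code: `model_poll_iteration`, the `reset_step` parameter, and the outcome names are all hypothetical. One poll iteration is modelled as a check followed by work, and the caller chooses where the reset begins relative to those two steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of one modelled poll iteration. */
enum model_outcome {
	MODEL_EXITED_EARLY,       /* check saw resetting, poll bailed out */
	MODEL_PROCESSED_SAFELY,   /* no reset began during the iteration */
	MODEL_PROCESSED_IN_RESET  /* check passed, reset began before the work */
};

/*
 * reset_step selects where the modelled reset sets the flag:
 *   0 = before the check, 1 = between check and work, other = not at all.
 */
enum model_outcome model_poll_iteration(int reset_step)
{
	bool resetting = false;

	if (reset_step == 0)
		resetting = true;

	/* Step 1: the kind of check the removed hunk performed. */
	if (resetting)
		return MODEL_EXITED_EARLY;

	/* Window: nothing stops the reset from starting here. */
	if (reset_step == 1)
		resetting = true;

	/* Step 2: the work runs with whatever state exists now. */
	return resetting ? MODEL_PROCESSED_IN_RESET : MODEL_PROCESSED_SAFELY;
}
```

Calling `model_poll_iteration(1)` yields `MODEL_PROCESSED_IN_RESET`: the check passed, yet the work ran after the reset had started. A real race depends on timing across CPUs. The sketch only fixes the interleaving by hand to make the window visible.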