
[RFC,net] ibmvnic: complete dev->poll nicely during adapter reset

Message ID 20210305074456.88015-1-ljp@linux.ibm.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Series [RFC,net] ibmvnic: complete dev->poll nicely during adapter reset

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success
netdev/cc_maintainers fail 1 blamed authors not CCed: tlfalcon@linux.vnet.ibm.com; 5 maintainers not CCed: paulus@samba.org benh@kernel.crashing.org linuxppc-dev@lists.ozlabs.org tlfalcon@linux.vnet.ibm.com mpe@ellerman.id.au
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/verify_fixes success
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 20 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/header_inline success
netdev/stable success Stable not CCed

Commit Message

Lijun Pan March 5, 2021, 7:44 a.m. UTC
The reset path calls ibmvnic_cleanup() -> ibmvnic_napi_disable() ->
napi_disable(), which is supposed to stop the polling.
Commit 21ecba6c48f9 ("ibmvnic: Exit polling routine correctly
during adapter reset") reported that during device reset, the
polling routine never completed and napi_disable() slept indefinitely.
To solve that problem, the resetting bit was checked and
napi_complete_done() was called before dev->poll (ibmvnic_poll())
exited.

Checking the resetting bit in dev->poll is racy because the bit
may be false when it is checked but become true immediately
afterwards.
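The race described here is a check-then-act (time-of-check to time-of-use) pattern. The sketch below replays one problematic interleaving sequentially in plain C. struct model_adapter, backing_store and the helper functions are hypothetical stand-ins invented for illustration, not ibmvnic code, and the single-threaded replay ignores real concurrency and memory ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: none of these names exist in ibmvnic. */
struct model_adapter {
	bool resetting;   /* stands in for the adapter's resetting bit */
	int *rx_queue;    /* stands in for an scrq that reset releases */
};

static int backing_store = 42;

/* The poller's check at the top of its loop. */
static bool poller_checks_resetting(const struct model_adapter *a)
{
	return a->resetting;
}

/* The reset path, running between the poller's check and its use. */
static void reset_path_runs(struct model_adapter *a)
{
	a->resetting = true;
	a->rx_queue = NULL;   /* models the queue being released */
}

/* Replays: check (bit still clear) -> reset runs -> poller continues.
 * Returns true if the poller would go on using a released queue. */
static bool replay_interleaving(void)
{
	struct model_adapter a = { .resetting = false,
				   .rx_queue = &backing_store };
	bool saw_reset = poller_checks_resetting(&a); /* false: check passes */

	reset_path_runs(&a);                          /* bit flips right after */
	if (saw_reset)
		return false;                         /* poller would bail out */
	return a.rx_queue == NULL;                    /* stale decision is used */
}
```

The check returned "not resetting", so the poller keeps going even though the reset path has already run; re-checking the bit later only narrows the window rather than closing it.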

Hence, call napi_complete() in ibmvnic_napi_disable(), which avoids
the race with resetting and makes sure dev->poll and napi_disable()
complete before the reset routine actually releases resources.
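The fix relies on napi_complete() releasing the "scheduled" state so that napi_disable() can stop waiting. The sketch below is a simplified, hypothetical model of that interaction: struct model_napi and its helpers are invented for illustration, the two flags only roughly correspond to NAPI_STATE_SCHED and NAPI_STATE_DISABLE in struct napi_struct's state field, and the real kernel manipulates those bits atomically and sleeps in a retry loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not the kernel implementation. */
struct model_napi {
	bool sched;    /* roughly NAPI_STATE_SCHED: a poll owns the context */
	bool disable;  /* roughly NAPI_STATE_DISABLE: disabling in progress */
};

/* Interrupt side: schedule a poll unless disabled or already scheduled. */
static bool model_schedule(struct model_napi *n)
{
	if (n->disable || n->sched)
		return false;
	n->sched = true;
	return true;
}

/* Completing a poll releases ownership of the context. */
static void model_complete(struct model_napi *n)
{
	n->sched = false;
}

/* One attempt of the disable loop: returns false where the real
 * napi_disable() would have to sleep and try again. */
static bool model_disable_try(struct model_napi *n)
{
	n->disable = true;
	if (n->sched)
		return false;  /* still owned by a poll: keep waiting */
	n->sched = true;       /* take ownership so no new poll starts */
	return true;
}
```

In this model, model_disable_try() keeps failing for as long as nothing calls model_complete(), which mirrors the indefinite sleep reported in commit 21ecba6c48f9; once the scheduled state is released, disabling succeeds and further scheduling is refused.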

Fixes: 21ecba6c48f9 ("ibmvnic: Exit polling routine correctly during adapter reset")
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

Comments

Sukadev Bhattiprolu March 5, 2021, 6:41 p.m. UTC | #1
Lijun Pan [ljp@linux.ibm.com] wrote:
> The reset path will call ibmvnic_cleanup->ibmvnic_napi_disable
> ->napi_disable(). This is supposed to stop the polling.
> Commit 21ecba6c48f9 ("ibmvnic: Exit polling routine correctly
> during adapter reset") reported that the during device reset,
> polling routine never completed and napi_disable slept indefinitely.
> In order to solve that problem, resetting bit was checked and
> napi_complete_done was called before dev->poll::ibmvnic_poll exited.
> 
> Checking for resetting bit in dev->poll is racy because resetting
> bit may be false while being checked, but turns true immediately
> afterwards.

Yes, I have been testing a fix for that.
> 
> Hence we call napi_complete in ibmvnic_napi_disable, which avoids
> the racing with resetting, and makes sure dev->poll and napi_disalbe

napi_complete() will prevent a new call to ibmvnic_poll() but what if
ibmvnic_poll() is already executing and attempting to access the scrqs
while the reset path is freeing them?

Sukadev
Lijun Pan March 5, 2021, 6:52 p.m. UTC | #2
On Fri, Mar 5, 2021 at 12:44 PM Sukadev Bhattiprolu
<sukadev@linux.ibm.com> wrote:
>
> napi_complete() will prevent a new call to ibmvnic_poll() but what if
> ibmvnic_poll() is already executing and attempting to access the scrqs
> while the reset path is freeing them?
>
napi_complete() and napi_disable() are called in the earlier stages of
the reset path, i.e. before the reset path actually calls the functions
that free the scrqs.
So I don't think this is an issue here.
Sukadev Bhattiprolu March 5, 2021, 7:05 p.m. UTC | #3
Lijun Pan [lijunp213@gmail.com] wrote:
> On Fri, Mar 5, 2021 at 12:44 PM Sukadev Bhattiprolu
> <sukadev@linux.ibm.com> wrote:
> >
> > napi_complete() will prevent a new call to ibmvnic_poll() but what if
> > ibmvnic_poll() is already executing and attempting to access the scrqs
> > while the reset path is freeing them?
> >
> napi_complete() and napi_disable() are called in the earlier stages of
> reset path, i.e. before reset path actually calls the functions to
> freeing scrqs.

Yes, those will prevent a _new_ call to poll, right?

But what if poll is already executing? What prevents it from accessing
an scrq that the reset path will free?

> So I don't think this is a issue here.
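The open question in this thread is whether calling napi_complete() from the reset path is enough when ibmvnic_poll() is already running. The sketch below replays that interleaving in a hypothetical model: struct model_queue and its helpers are invented for illustration, the in_poll flag stands for "the poll loop body is executing", and atomicity and memory ordering are ignored. It only shows that, in this model, the ordering "complete and disable before free" does not by itself exclude a poller that entered before napi_complete() was called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not ibmvnic or kernel code. */
struct model_queue {
	bool sched;    /* held by the poller while it owns the context */
	bool disable;  /* disabling in progress */
	bool in_poll;  /* poller is inside its loop, touching the queue */
	bool freed;    /* reset path has released the queue */
};

/* A poll starts: it owns the context and begins using the queue. */
static void poller_enter(struct model_queue *q)
{
	q->sched = true;
	q->in_poll = true;
}

/* Reset path as proposed: complete, then disable, then free. */
static bool reset_complete_disable_free(struct model_queue *q)
{
	q->sched = false;      /* napi_complete() called from the reset path */
	q->disable = true;
	if (q->sched)
		return false;  /* napi_disable() would still be waiting */
	q->sched = true;       /* napi_disable() has taken ownership */
	q->freed = true;       /* a later reset stage frees the queue */
	return true;
}

/* True if the running poller would touch the queue after it was freed. */
static bool poller_touches_freed_queue(const struct model_queue *q)
{
	return q->in_poll && q->freed;
}
```

Because the reset path clears the scheduled state itself, model_disable_try-style waiting never observes the poller that is still inside its loop, so the model reaches a state where the queue is freed while in_poll is still true.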

Patch

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index b6102ccf9b90..338d3d071cec 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -785,6 +785,7 @@  static void ibmvnic_napi_disable(struct ibmvnic_adapter *adapter)
 
 	for (i = 0; i < adapter->req_rx_queues; i++) {
 		netdev_dbg(adapter->netdev, "Disabling napi[%d]\n", i);
+		napi_complete(&adapter->napi[i]);
 		napi_disable(&adapter->napi[i]);
 	}
 
@@ -2455,13 +2456,6 @@  static int ibmvnic_poll(struct napi_struct *napi, int budget)
 		u16 offset;
 		u8 flags = 0;
 
-		if (unlikely(test_bit(0, &adapter->resetting) &&
-			     adapter->reset_reason != VNIC_RESET_NON_FATAL)) {
-			enable_scrq_irq(adapter, rx_scrq);
-			napi_complete_done(napi, frames_processed);
-			return frames_processed;
-		}
-
 		if (!pending_scrq(adapter, rx_scrq))
 			break;
 		next = ibmvnic_next_scrq(adapter, rx_scrq);